
Results of HPO #976

Closed
3 tasks done
ahmedibatta opened this issue Jun 13, 2022 · 15 comments
Labels
question Further information is requested

Comments

@ahmedibatta

What is your question

I am using HPO to optimize the number of epochs, batch size, embedding dimension, negatives per positive (num_negs_per_pos), and learning rate.
I got the best model and the best hyperparameters with the highest MRR,
but when I take the same hyperparameter values and build the model again, I get a higher MRR value than the one reported in the HPO results themselves. Is this normal?
Why do some trials have a much higher MRR in the HPO but still failed?

How can I know the default value of the margin in the margin ranking loss function that models use by default, and is it the same for all models?

Environment

Key Value
OS posix
Platform Linux
Release 5.10.107+
Time Mon Jun 13 09:04:32 2022
Python 3.7.12
PyKEEN 1.8.1
PyKEEN Hash UNHASHED
PyKEEN Branch  
PyTorch 1.11.0
CUDA Available? true
CUDA Version 11.0
cuDNN Version 8005

Issue Template Checks

  • This is not a bug report (use a different issue template if it is)
  • This is not a feature request (use a different issue template if it is)
  • I've read the text explaining why including environment information is important and understand if I omit this information that my issue will be dismissed
@ahmedibatta ahmedibatta added the question Further information is requested label Jun 13, 2022
@mberr
Member

mberr commented Jun 14, 2022

I am using HPO to optimize the number of epochs, batch size, embedding dimension, negatives per positive (num_negs_per_pos), and learning rate.
I got the best model and the best hyperparameters with the highest MRR,
but when I take the same hyperparameter values and build the model again, I get a higher MRR value than the one reported in the HPO results themselves. Is this normal?

I am not exactly sure about your procedure, but one thing you could check is whether those MRRs are computed on the same set of evaluation triples: for selecting hyperparameters we typically use a validation set which is different from the test set used for the final performance assessment. Since the hyperparameters are chosen on this validation set, it can happen that the performance drops a bit when switching to the test set. Moreover, there is some inherent stochasticity in the training process which can lead to slightly fluctuating results. Thus, one often trains one configuration multiple times with different random seeds to assess how much variation there is between those runs.
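
As an illustration of the last point, a minimal sketch of training the same configuration with a few different random seeds could look like this (assuming the same training/validation/testing splits as in the HPO call, and that get_metric('mrr') resolves to the mean reciprocal rank in this PyKEEN version):

from pykeen.pipeline import pipeline

mrrs = []
for seed in (0, 1, 2):
    result = pipeline(
        training=training,
        validation=validation,
        testing=testing,
        model='BoxE',
        random_seed=seed,  # only the seed changes between runs
    )
    mrrs.append(result.metric_results.get_metric('mrr'))

print(mrrs)  # inspect the spread across the runs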

Why do some trials have a much higher MRR in the HPO but still failed?
Could you explain what you mean here?

How can I know the default value of the margin in the margin ranking loss function that models use by default, and is it the same for all models?

This seems to be a different question. You can find an answer at https://pykeen.readthedocs.io/en/stable/api/pykeen.losses.MarginRankingLoss.html. If you did not include this parameter in the HPO grid, or manually changed it in between, then it should be the same for all models you trained.

Notice that if you trained multiple different models, they sometimes have different default loss functions, e.g., ConvE defaults to BCEAfterSigmoidLoss instead of the margin ranking loss.
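
To check this programmatically, a small sketch like the following only introspects the constructor defaults of the loss class (nothing here depends on a trained model):

import inspect

from pykeen.losses import MarginRankingLoss

# Print the default value of every constructor parameter, including the margin.
for name, param in inspect.signature(MarginRankingLoss.__init__).parameters.items():
    if param.default is not inspect.Parameter.empty:
        print(name, '=', param.default)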

@ahmedibatta
Author

Thanks for your reply and this good tip; I will try training multiple times.

Could you explain what you mean here?

In the final HPO results file (trials.tsv), the MRR values of some trials are higher than that of the final best model, but the status of these trials is PRUNED, even though I did not specify any value for the pruner in hpo_pipeline().

@cthoyt
Member

cthoyt commented Jun 15, 2022

Can't provide any more help without a full code example.

@ahmedibatta
Author

ahmedibatta commented Jun 15, 2022

My code:

from pykeen.hpo import hpo_pipeline

hpo_pipeline_result = hpo_pipeline(
    n_trials=100,
    training=training,
    testing=testing,
    validation=validation,
    model='BoxE',
    model_kwargs_ranges=dict(
        embedding_dim=dict(
            type=int,
            low=20,
            high=512,
            q=10,
        ),
    ),
    optimizer='adam',
    optimizer_kwargs_ranges=dict(
        lr=dict(
            type=float,
            low=0.001,
            high=0.1,
            scale='log',
        ),
    ),
    loss='marginranking',
    training_loop='slcwa',
    training_kwargs_ranges=dict(
        num_epochs=dict(
            type=int,
            low=10,
            high=500,
            q=10,
        ),
        batch_size=dict(
            type=int,
            low=16,
            high=200,
            q=16,
        ),
    ),
    negative_sampler='basic',
    negative_sampler_kwargs_ranges=dict(
        num_negs_per_pos=dict(
            type=int,
            low=1,
            high=100,
            q=10,
        ),
    ),
    evaluator_kwargs=dict(filtered=True),
    stopper='early',
    stopper_kwargs=dict(frequency=5, patience=2, relative_delta=0.002),
    filter_validation_when_testing=True,
)

@mberr
Member

mberr commented Jun 15, 2022

Even though I did not specify any value for the pruner in hpo_pipeline()

To select a pruner, we use class_resolver.contrib.optuna.pruner_resolver, which defaults to MedianPruner, cf. https://github.com/cthoyt/class-resolver/blob/8e56e153513cae1a13b9fc12b57cbcd4ade79948/src/class_resolver/contrib/optuna.py#L49-L54. If you want to disable pruning, use pruner="Nop" instead. Notice that not pruning trials will likely increase the runtime of the HPO.
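
For example, a minimal sketch of disabling the pruner (reusing the arguments from the call above, abbreviated here) might look like:

from pykeen.hpo import hpo_pipeline

hpo_pipeline_result = hpo_pipeline(
    n_trials=100,
    training=training,
    testing=testing,
    validation=validation,
    model='BoxE',
    pruner='Nop',  # disable pruning so every trial runs to completion
    # ... remaining arguments as in the original call ...
)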

In the final HPO results file (trials.tsv), the MRR values of some trials are higher than that of the final best model, but the status of these trials is PRUNED.

Could you also share the trials.tsv (or the relevant part of it)? Also, to what extent are the results higher?

@ahmedibatta
Author

I do not know how to share the whole file with you here, so I just took the trials around the one with the highest MRR (0.45), which was PRUNED (trial 80). However, the HPO in the end chose another trial (trial 81) with MRR = 0.36.

Excerpt from trials.tsv (trial number, objective MRR, duration, remaining parameter columns, state; the long evaluation-metric columns of the COMPLETE trials are not reproduced here):

trial  MRR       duration                 remaining parameter columns           state
77     0.304797  0 days 00:00:42.510877   1.324229 130 2 41 0.073823 48 180     PRUNED
78     0.416046  0 days 00:01:10.448824   2.295811 260 2 51 0.030079 80 370     PRUNED
79     0.437965  0 days 00:02:57.659087   2.915244 370 2 31 0.020549 64 130     PRUNED
80     0.454508  0 days 00:03:26.646425   2.908837 410 2 31 0.018712 64 160     PRUNED
81     0.369003  0 days 00:15:07.258732   2.766026 320 2 31 0.014421 64 160     COMPLETE
82     0.455335  0 days 00:03:29.028143   2.745786 330 2 41 0.015867 48 140     PRUNED
83     0.33292   0 days 00:00:24.109553   2.832544 350 2 31 0.013573 160 330    PRUNED
84     0.351913  0 days 00:02:33.586149   2.693525 300 2 21 0.023356 176 220    COMPLETE
85     0.317204  0 days 00:00:17.782669   2.564328 300 2 11 0.042262 192 250    PRUNED

@ahmedibatta
Author

There is also another issue with the HPO results.

1. I saved all models from the HPO. When I retrain with the best hyperparameter values of the best trial (from trials.tsv or best_pipeline), I get almost the same MRR as reported in trials.tsv (0.37), but when I load the saved model of that trial and evaluate it, I get a higher MRR of 0.44.

This is my code for evaluating the saved best model:

import torch

from pykeen.evaluation import RankBasedEvaluator

# Load the trained model that was saved during HPO (trial 82).
paire_model = torch.load('./models/82/trained_model.pkl')

evaluator = RankBasedEvaluator(filtered=True)
results = evaluator.evaluate(
    model=paire_model,
    mapped_triples=testing.mapped_triples,
    additional_filter_triples=[
        training.mapped_triples,
        validation.mapped_triples,
    ],
)
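
To read the MRR off these results for a direct comparison, something like the following should work (assuming that get_metric('mrr') resolves to the mean reciprocal rank in this PyKEEN version):

print(results.get_metric('mrr'))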

@mberr
Member

mberr commented Jun 16, 2022

How big is your custom dataset?

@ahmedibatta
Author

How big is your custom dataset?

About 12000 triples, 4296 entities and 12 relations

@mberr
Member

mberr commented Jun 18, 2022

Assuming that you use a sufficiently large portion of that as validation/test, the results should be somewhat stable 😕

So right now, my best guess is that your metrics fluctuate based on the random initialization.

@ahmedibatta
Author

ahmedibatta commented Jun 18, 2022

Assuming that you use a sufficiently large portion of that as validation/test, the results should be somewhat stable 😕

So right now, my best guess is that your metrics fluctuate based on the random initialization.

Okay, I will try with different random seeds

My question is exactly why the same model that I saved through HPO gives a different result than the model I train with the values mentioned in the best_pipeline file. In both cases I used the same testing set, so the fluctuation does not depend on the testing set or the data split.

@mberr
Member

mberr commented Jun 18, 2022

My question is exactly why the same model that I saved through HPO gives a different result than the model I train with the values mentioned in the best_pipeline file. In both cases, I used the same testing set, so the fluctuation does not depend on the testing set or the data split.

In HPO, the model is always evaluated on validation data (which is intended). From your script, it looks like you evaluate on test data instead.
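
For a like-for-like comparison with the numbers in trials.tsv, one could evaluate the saved model on the validation triples instead. A minimal sketch, reusing the evaluator and model from the snippet above (here only the training triples are passed as additional filter triples):

results_val = evaluator.evaluate(
    model=paire_model,
    mapped_triples=validation.mapped_triples,
    additional_filter_triples=[training.mapped_triples],
)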

@ahmedibatta
Author

In HPO, the model is always evaluated on validation data (which is intended). From your script, it looks like you evaluate on test data instead.

Firstly, thanks for your quick reply.

Yeah, I know. What I mean is that during HPO I used save_model_directory='./models/' to save the model of each trial,
and from the best_pipeline file I saw that, for example, trial number 82 is the best, so I just load the model from trial 82 with
paire_model = torch.load('./models/82/trained_model.pkl'), as mentioned in my code above, and then evaluate that model.

And also, from the hyperparameter values in the best_pipeline file, I built the model from scratch with the same values and tested it on the testing set.
So now I am evaluating both models on the same testing set but getting different results.

@mberr
Member

mberr commented Jun 19, 2022

And also, from the hyperparameter values in the best_pipeline file, I built the model from scratch with the same values and tested it on the testing set.

Okay, so then you have two different models (with the same hyperparameters), correct?

@ahmedibatta
Author

Okay, so then you have two different models (with the same hyperparameters), correct?

Yes. I thought I would get almost the same MRR, or at least a value in the same range, but I forgot that I did not fix a random seed for the HPO, so the randomness of the two models differs; this may be the cause of the difference.
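
If the goal is to re-train the best configuration in a controlled way, one option might be to replicate the best pipeline directly from the HPO result (a sketch, assuming the replicate_best_pipeline method of the HPO result is available in this PyKEEN version):

hpo_pipeline_result.replicate_best_pipeline(
    directory='./best_replicates',  # hypothetical output directory
    replicates=3,  # number of times to re-train the best configuration
)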

@pykeen pykeen locked and limited conversation to collaborators Jun 30, 2022
@cthoyt cthoyt converted this issue into discussion #1003 Jun 30, 2022
