Results of HPO #976
Comments
I am not exactly sure about your procedure, but one thing you could check is whether those MRRs are computed on the same set of evaluation triples: for selecting hyperparameters we typically use a validation set which is different from the test set used for the final performance assessment. Since the hyperparameters are chosen on this validation set, it can happen that the performance drops a bit when switching to the test set. Moreover, there is some inherent stochasticity in the training process which can lead to slightly fluctuating results. Thus, one often trains one configuration multiple times with different random seeds to assess how much variation there is between those runs.
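To quantify that variation, you could re-run one fixed configuration with several seeds; a minimal sketch (the hyperparameter values are placeholders, and 'mrr' is assumed to be an accepted metric name in get_metric):

import numpy as np
from pykeen.pipeline import pipeline

# Train the same configuration with several random seeds and report the
# spread of the resulting MRR values.
mrrs = []
for seed in (0, 1, 2, 3, 4):
    result = pipeline(
        training=training,
        testing=testing,
        validation=validation,
        model='BoxE',
        model_kwargs=dict(embedding_dim=200),  # placeholder, not your HPO value
        random_seed=seed,
    )
    mrrs.append(result.metric_results.get_metric('mrr'))

print(f'MRR: {np.mean(mrrs):.3f} +/- {np.std(mrrs):.3f}')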
This seems to be a different question. You can find an answer at https://pykeen.readthedocs.io/en/stable/api/pykeen.losses.MarginRankingLoss.html. If you did not include this parameter in the HPO grid, or manually changed it in between, then it should be the same for all models you trained. Notice that if you trained multiple different models, they sometimes have different default loss functions.
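You can also inspect the defaults programmatically; a small sketch (the loss_default / loss_default_kwargs class attributes and the margin attribute are assumptions based on PyKEEN's base classes):

from pykeen.losses import MarginRankingLoss
from pykeen.models import BoxE, TransE

# Instantiating the loss without arguments exposes its default margin.
loss = MarginRankingLoss()
print(loss.margin)

# Each model class declares its own default loss (and default kwargs),
# so the default loss, and hence the margin, need not be the same
# across models.
print(TransE.loss_default, TransE.loss_default_kwargs)
print(BoxE.loss_default, BoxE.loss_default_kwargs)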
Thanks for your reply and this good tip; I will try training multiple times.
In the final HPO results file (trials.tsv), the MRR values of some trials are higher than that of the final best model, but the status of those trials is PRUNED, even though I did not specify any value for the pruner.
I can't provide any more help without a full code example.
My code:

from pykeen.hpo import hpo_pipeline

hpo_pipeline_result = hpo_pipeline(
    n_trials=100,
    training=training,
    testing=testing,
    validation=validation,
    model='BoxE',
    model_kwargs_ranges=dict(
        embedding_dim=dict(type=int, low=20, high=512, q=10),
    ),
    optimizer='adam',
    optimizer_kwargs_ranges=dict(
        lr=dict(type=float, low=0.001, high=0.1, scale='log'),
    ),
    loss='marginranking',
    training_loop='slcwa',
    training_kwargs_ranges=dict(
        num_epochs=dict(type=int, low=10, high=500, q=10),
        batch_size=dict(type=int, low=16, high=200, q=16),
    ),
    negative_sampler='basic',
    negative_sampler_kwargs_ranges=dict(
        num_negs_per_pos=dict(type=int, low=1, high=100, q=10),
    ),
    evaluator_kwargs=dict(filtered=True),
    stopper='early',
    stopper_kwargs=dict(frequency=5, patience=2, relative_delta=0.002),
    filter_validation_when_testing=True,
)
To select a pruner, we use the pruner argument of hpo_pipeline; if it is not set, Optuna's default pruner is used.
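For example, to disable pruning entirely so that no trial ends up with status PRUNED, something like this should work (a sketch; it assumes the pruner argument accepts Optuna pruner classes):

from optuna.pruners import NopPruner
from pykeen.hpo import hpo_pipeline

# With NopPruner, every trial runs to completion, which makes the
# reported MRRs of all trials directly comparable.
hpo_pipeline_result = hpo_pipeline(
    n_trials=100,
    training=training,
    testing=testing,
    validation=validation,
    model='BoxE',
    pruner=NopPruner,
)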
Could you also share the trials file (trials.tsv)?
I do not know how to share the whole file with you here, so I just took the trial with the highest MRR value (0.45), which was PRUNED (trial 80). However, the HPO in the end chose another trial with MRR = 0.36.
There is also another issue in the HPO results: I saved all models from the HPO, and when I tried the best values of the best trial with my evaluation code for the best model, I got a different result …
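For reference, a minimal sketch of how a saved PyKEEN model can be re-evaluated on the test triples (the file path and variable names are hypothetical):

import torch
from pykeen.evaluation import RankBasedEvaluator

# Load a model saved during HPO (the path is a placeholder).
model = torch.load('best_trial/trained_model.pkl')

# Filtered rank-based evaluation on the test split; training and
# validation triples are passed as additional filter triples so that
# known true triples do not count as ranking errors.
evaluator = RankBasedEvaluator(filtered=True)
results = evaluator.evaluate(
    model=model,
    mapped_triples=testing.mapped_triples,
    additional_filter_triples=[
        training.mapped_triples,
        validation.mapped_triples,
    ],
)
print(results.get_metric('mrr'))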
How big is your custom dataset?
About 12,000 triples, 4,296 entities, and 12 relations.
Assuming that you use a sufficiently large portion of that as validation/test, the results should be somewhat stable 😕 So right now, my best guess is that your metrics fluctuate based on the random initialization.
Okay, I will try with different random seeds. My question is exactly why the model I saved through HPO gives a different result than the model I build with the values mentioned in the best_pipeline file. In both cases I used the same test set, so the fluctuation does not depend on the test set or the split of the data.
In HPO, the model is always evaluated on validation data (which is intended). From your script, it looks like you evaluate on test data instead.
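To make the numbers comparable, you can evaluate one and the same trained model on both splits; a short sketch (same hypothetical evaluator calls as above):

from pykeen.evaluation import RankBasedEvaluator

# Validation MRR is what HPO optimizes; test MRR is what the final run
# reports. The two are generally close but not identical.
evaluator = RankBasedEvaluator(filtered=True)
val_results = evaluator.evaluate(
    model=model,
    mapped_triples=validation.mapped_triples,
    additional_filter_triples=[training.mapped_triples],
)
test_results = evaluator.evaluate(
    model=model,
    mapped_triples=testing.mapped_triples,
    additional_filter_triples=[
        training.mapped_triples,
        validation.mapped_triples,
    ],
)
print(val_results.get_metric('mrr'), test_results.get_metric('mrr'))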
Firstly, thanks for your quick reply. Yeah, I know; I mean I used the validation set during HPO. And also, from the hyperparameter values in the best_pipeline file, I built the model from the beginning with the same values and tested it on the test set.
Okay, so then you have two different models (with the same hyperparameters), correct?
Yes. I thought I would get almost the same MRR, or at least a value in the same range, but I forgot that HPO does not fix a random seed, so the randomness of the two models is different; this may be the cause of the difference.
What is your question
I am using HPO to optimize the number of epochs, the batch size, the embedding dimension, the number of negatives per positive, and the learning rate.
I got the best model with the best hyperparameters and the highest MRR.
But when I take the same hyperparameter values and build the model, I get an MRR value higher than the one in the HPO results themselves. Is this normal?
Why do some trials have a much higher MRR in HPO but still fail (i.e., get pruned)?
How can I know the default value for the margin in the marginranking loss function in the default case for models, and is it the same for all models?