
Results of HPO #976

Closed
3 tasks done
ahmedibatta opened this issue Jun 13, 2022 · 15 comments
Labels
question Further information is requested

Comments

@ahmedibatta

What is your question

I am using HPO to optimize the number of epochs, batch size, embedding dimension, negatives per positive (num_negs_per_pos), and learning rate.
I got the best model and the best hyperparameters with the highest MRR,
but when I take the same hyperparameter values and build the model again, I get a higher MRR value than the one reported in the HPO results themselves. Is this normal?
Why do some trials have a much higher MRR in the HPO but still failed?

How can I know the default value of the margin in the margin ranking loss function that models use by default, and is it the same for all models?

Environment

Key Value
OS posix
Platform Linux
Release 5.10.107+
Time Mon Jun 13 09:04:32 2022
Python 3.7.12
PyKEEN 1.8.1
PyKEEN Hash UNHASHED
PyKEEN Branch  
PyTorch 1.11.0
CUDA Available? true
CUDA Version 11.0
cuDNN Version 8005

Issue Template Checks

  • This is not a bug report (use a different issue template if it is)
  • This is not a feature request (use a different issue template if it is)
  • I've read the text explaining why including environment information is important and understand if I omit this information that my issue will be dismissed
@ahmedibatta ahmedibatta added the question Further information is requested label Jun 13, 2022
@mberr
Member

mberr commented Jun 14, 2022

I am using HPO to optimize the number of epochs, batch size, embedding dimension, negatives per positive (num_negs_per_pos), and learning rate.
I got the best model and the best hyperparameters with the highest MRR,
but when I take the same hyperparameter values and build the model again, I get a higher MRR value than the one reported in the HPO results themselves. Is this normal?

I am not exactly sure about your procedure, but one thing you could check is whether those MRRs are computed on the same set of evaluation triples: for selecting hyperparameters we typically use a validation set which is different from the test set used for the final performance assessment. Since the hyperparameters are chosen on this validation set, it can happen that the performance drops a bit when switching to the test set. Moreover, there is some inherent stochasticity in the training process which can lead to slightly fluctuating results. Thus, one often trains one configuration multiple times with different random seeds to assess how much variation there is between those runs.
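
As an illustration of the last point, a minimal sketch of training the same configuration with a few different random seeds could look like this (assuming the same training/validation/testing splits as in the HPO call, and that get_metric('mrr') resolves to the mean reciprocal rank in this PyKEEN version):

from pykeen.pipeline import pipeline

mrrs = []
for seed in (0, 1, 2):
    result = pipeline(
        training=training,
        validation=validation,
        testing=testing,
        model='BoxE',
        random_seed=seed,  # only the seed changes between runs
    )
    mrrs.append(result.metric_results.get_metric('mrr'))

print(mrrs)  # inspect the spread across the runs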

Why do some trials have a much higher MRR in the HPO but still failed?
Could you explain what you mean here?

How can I know the default value of the margin in the margin ranking loss function that models use by default, and is it the same for all models?

This seems to be a different question. You can find an answer at https://pykeen.readthedocs.io/en/stable/api/pykeen.losses.MarginRankingLoss.html. If you did not include this parameter in the HPO grid, or manually changed it in between, then it should be the same for all models you trained.

Notice that if you trained multiple different models, they sometimes have different default loss functions, e.g., ConvE defaults to BCEAfterSigmoidLoss instead of the margin ranking loss.
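
To check this programmatically, a small sketch like the following only introspects the constructor defaults of the loss class (nothing here depends on a trained model):

import inspect

from pykeen.losses import MarginRankingLoss

# Print the default value of every constructor parameter, including the margin.
for name, param in inspect.signature(MarginRankingLoss.__init__).parameters.items():
    if param.default is not inspect.Parameter.empty:
        print(name, '=', param.default)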

@ahmedibatta
Author

Thanks for your reply and this good tip; I will try training multiple times.

Could you explain what you mean here?

In the final HPO results file (trials.tsv), the MRR values of some trials are higher than that of the final best model, but the status of these trials is PRUNED, even though I did not specify any value for the pruner in hpo_pipeline().

@cthoyt
Member

cthoyt commented Jun 15, 2022

Can't provide any more help without a full code example.

@ahmedibatta
Author

ahmedibatta commented Jun 15, 2022

My code:

from pykeen.hpo import hpo_pipeline

hpo_pipeline_result = hpo_pipeline(
    n_trials=100,
    training=training,
    testing=testing,
    validation=validation,
    model='BoxE',
    model_kwargs_ranges=dict(
        embedding_dim=dict(
            type=int,
            low=20,
            high=512,
            q=10,
        ),
    ),
    optimizer='adam',
    optimizer_kwargs_ranges=dict(
        lr=dict(
            type=float,
            low=0.001,
            high=0.1,
            scale='log',
        ),
    ),
    loss='marginranking',
    training_loop='slcwa',
    training_kwargs_ranges=dict(
        num_epochs=dict(
            type=int,
            low=10,
            high=500,
            q=10,
        ),
        batch_size=dict(
            type=int,
            low=16,
            high=200,
            q=16,
        ),
    ),
    negative_sampler='basic',
    negative_sampler_kwargs_ranges=dict(
        num_negs_per_pos=dict(
            type=int,
            low=1,
            high=100,
            q=10,
        ),
    ),
    evaluator_kwargs=dict(filtered=True),
    stopper='early',
    stopper_kwargs=dict(frequency=5, patience=2, relative_delta=0.002),
    filter_validation_when_testing=True,
)

@mberr
Member

mberr commented Jun 15, 2022

Even though I did not specify any value for the pruner in hpo_pipeline()

To select a pruner, we use class_resolver.contrib.optuna.pruner_resolver, which defaults to MedianPruner, cf. https://github.com/cthoyt/class-resolver/blob/8e56e153513cae1a13b9fc12b57cbcd4ade79948/src/class_resolver/contrib/optuna.py#L49-L54. If you want to disable pruning, use pruner="Nop" instead. Notice that not pruning trials will likely increase the runtime of the HPO.
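
For example, a minimal sketch of disabling the pruner (reusing the arguments from the call above, abbreviated here) might look like:

from pykeen.hpo import hpo_pipeline

hpo_pipeline_result = hpo_pipeline(
    n_trials=100,
    training=training,
    testing=testing,
    validation=validation,
    model='BoxE',
    pruner='Nop',  # disable pruning so every trial runs to completion
    # ... remaining arguments as in the original call ...
)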

In the final HPO results file (trials.tsv), the MRR values of some trials are higher than that of the final best model, but the status of these trials is PRUNED.

Could you also share the trials.tsv (or the relevant part of it)? Also, to what extent are the results higher?

@ahmedibatta
Author

I do not know how to share the whole file with you here, so I just took the trials around the one with the highest MRR (0.45), which was PRUNED (trial 80). However, the HPO in the end chose another trial (trial 81) with MRR = 0.36.

Excerpt from trials.tsv (trial number, objective MRR, duration, remaining parameter columns, state; the long evaluation-metric columns of the COMPLETE trials are not reproduced here):

trial  MRR       duration                 remaining parameter columns           state
77     0.304797  0 days 00:00:42.510877   1.324229 130 2 41 0.073823 48 180     PRUNED
78     0.416046  0 days 00:01:10.448824   2.295811 260 2 51 0.030079 80 370     PRUNED
79     0.437965  0 days 00:02:57.659087   2.915244 370 2 31 0.020549 64 130     PRUNED
80     0.454508  0 days 00:03:26.646425   2.908837 410 2 31 0.018712 64 160     PRUNED
81     0.369003  0 days 00:15:07.258732   2.766026 320 2 31 0.014421 64 160     COMPLETE
82     0.455335  0 days 00:03:29.028143   2.745786 330 2 41 0.015867 48 140     PRUNED
83     0.33292   0 days 00:00:24.109553   2.832544 350 2 31 0.013573 160 330    PRUNED
84     0.351913  0 days 00:02:33.586149   2.693525 300 2 21 0.023356 176 220    COMPLETE
85     0.317204  0 days 00:00:17.782669   2.564328 300 2 11 0.042262 192 250    PRUNED

@ahmedibatta
Author

There is also another issue with the HPO results.

1. I saved all models from the HPO. When I retrain with the best hyperparameter values of the best trial (from trials.tsv or best_pipeline), I get almost the same MRR as reported in trials.tsv (0.37), but when I load the saved model of that trial and evaluate it, I get a higher MRR of 0.44.

This is my code for evaluating the saved best model:

import torch

from pykeen.evaluation import RankBasedEvaluator

# Load the trained model that was saved during HPO (trial 82).
paire_model = torch.load('./models/82/trained_model.pkl')

evaluator = RankBasedEvaluator(filtered=True)
results = evaluator.evaluate(
    model=paire_model,
    mapped_triples=testing.mapped_triples,
    additional_filter_triples=[
        training.mapped_triples,
        validation.mapped_triples,
    ],
)
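
To read the MRR off these results for a direct comparison, something like the following should work (assuming that get_metric('mrr') resolves to the mean reciprocal rank in this PyKEEN version):

print(results.get_metric('mrr'))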

@mberr
Member

mberr commented Jun 16, 2022

How big is your custom dataset?

@ahmedibatta
Author

How big is your custom dataset?

About 12000 triples, 4296 entities and 12 relations

@mberr
Member

mberr commented Jun 18, 2022

Assuming that you use a sufficiently large portion of that as validation/test, the results should be somewhat stable 😕

So right now, my best guess is that your metrics fluctuate based on the random initialization.

@ahmedibatta
Author

ahmedibatta commented Jun 18, 2022

Assuming that you use a sufficiently large portion of that as validation/test, the results should be somewhat stable 😕

So right now, my best guess is that your metrics fluctuate based on the random initialization.

Okay, I will try with different random seeds

My question is exactly why the same model that I saved through HPO gives a different result than the model I train with the values mentioned in the best_pipeline file. In both cases I used the same testing set, so the fluctuation does not depend on the testing set or the data split.

@mberr
Member

mberr commented Jun 18, 2022

My question is exactly why the same model that I saved through HPO gives a different result than the model I train with the values mentioned in the best_pipeline file. In both cases, I used the same testing set, so the fluctuation does not depend on the testing set or the data split.

In HPO, the model is always evaluated on validation data (which is intended). From your script, it looks like you evaluate on test data instead.
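
For a like-for-like comparison with the numbers in trials.tsv, one could evaluate the saved model on the validation triples instead. A minimal sketch, reusing the evaluator and model from the snippet above (here only the training triples are passed as additional filter triples):

results_val = evaluator.evaluate(
    model=paire_model,
    mapped_triples=validation.mapped_triples,
    additional_filter_triples=[training.mapped_triples],
)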

@ahmedibatta
Author

In HPO, the model is always evaluated on validation data (which is intended). From your script, it looks like you evaluate on test data instead.

Firstly, thanks for your quick reply.

Yeah, I know. What I mean is that during HPO I used save_model_directory='./models/' to save the model of each trial,
and from the best_pipeline file I saw that, for example, trial number 82 is the best, so I just load the model from trial 82 with
paire_model = torch.load('./models/82/trained_model.pkl'), as mentioned in my code above, and then evaluate that model.

And also, from the hyperparameter values in the best_pipeline file, I built the model from scratch with the same values and tested it on the testing set.
So now I am evaluating both models on the same testing set but getting different results.

@mberr
Member

mberr commented Jun 19, 2022

And also, from the hyperparameter values in the best_pipeline file, I built the model from scratch with the same values and tested it on the testing set.

Okay, so then you have two different models (with the same hyperparameters), correct?

@ahmedibatta
Author

Okay, so then you have two different models (with the same hyperparameters), correct?

Yes. I thought I would get almost the same MRR, or at least a value in the same range, but I forgot that I did not fix a random seed for the HPO, so the randomness of the two models differs; this may be the cause of the difference.
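
If the goal is to re-train the best configuration in a controlled way, one option might be to replicate the best pipeline directly from the HPO result (a sketch, assuming the replicate_best_pipeline method of the HPO result is available in this PyKEEN version):

hpo_pipeline_result.replicate_best_pipeline(
    directory='./best_replicates',  # hypothetical output directory
    replicates=3,  # number of times to re-train the best configuration
)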

@pykeen pykeen locked and limited conversation to collaborators Jun 30, 2022
@cthoyt cthoyt converted this issue into discussion #1003 Jun 30, 2022
