I am trying to optimize AutoSF on a custom dataset. However, this triggers a device-side assert error in CUDA.
Here is the full trace:
[I 2024-02-19 07:58:28,133] A new study created in memory with name: no-name-54ecbfdb-b81a-4379-b02a-ef5ffdd29652
INFO:pykeen.hpo.hpo:Using model: <class 'pykeen.models.unimodal.auto_sf.AutoSF'>
INFO:pykeen.hpo.hpo:Using loss: <class 'pykeen.losses.MarginRankingLoss'>
INFO:pykeen.hpo.hpo:Using optimizer: <class 'torch.optim.adam.Adam'>
INFO:pykeen.hpo.hpo:Using training loop: <class 'pykeen.training.slcwa.SLCWATrainingLoop'>
INFO:pykeen.hpo.hpo:Using negative sampler: <class 'pykeen.sampling.basic_negative_sampler.BasicNegativeSampler'>
INFO:pykeen.hpo.hpo:Using evaluator: <class 'pykeen.evaluation.rank_based_evaluator.RankBasedEvaluator'>
INFO:pykeen.hpo.hpo:Attempting to maximize both.realistic.inverse_harmonic_mean_rank
INFO:pykeen.hpo.hpo:Filter validation triples when testing: True
WARNING:pykeen.pipeline.api:No random seed is specified. Setting to 4229552334.
[W 2024-02-19 07:58:28,139] Trial 0 failed with parameters: {'model.embedding_dim': 128, 'loss.margin': 1.633297580856592, 'optimizer.lr': 0.04577728396873623, 'negative_sampler.num_negs_per_pos': 11, 'training.num_epochs': 400, 'training.batch_size': 4096} because of the following error: RuntimeError('CUDA error: device-side assert triggered\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n').
Traceback (most recent call last):
File "/home/synthesisproject/anaconda3/envs/vineeth_14/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
value_or_values = func(trial)
File "/home/synthesisproject/anaconda3/envs/vineeth_14/lib/python3.10/site-packages/pykeen/hpo/hpo.py", line 309, in __call__
raise e
File "/home/synthesisproject/anaconda3/envs/vineeth_14/lib/python3.10/site-packages/pykeen/hpo/hpo.py", line 259, in __call__
result = pipeline(
File "/home/synthesisproject/anaconda3/envs/vineeth_14/lib/python3.10/site-packages/pykeen/pipeline/api.py", line 1487, in pipeline
set_random_seed(_random_seed)
File "/home/synthesisproject/anaconda3/envs/vineeth_14/lib/python3.10/site-packages/pykeen/utils.py", line 298, in set_random_seed
generator = torch.manual_seed(seed=seed)
File "/home/synthesisproject/anaconda3/envs/vineeth_14/lib/python3.10/site-packages/torch/random.py", line 40, in manual_seed
torch.cuda.manual_seed_all(seed)
File "/home/synthesisproject/anaconda3/envs/vineeth_14/lib/python3.10/site-packages/torch/cuda/random.py", line 113, in manual_seed_all
_lazy_call(cb, seed_all=True)
File "/home/synthesisproject/anaconda3/envs/vineeth_14/lib/python3.10/site-packages/torch/cuda/__init__.py", line 183, in _lazy_call
callable()
File "/home/synthesisproject/anaconda3/envs/vineeth_14/lib/python3.10/site-packages/torch/cuda/random.py", line 111, in cb
default_generator.manual_seed(seed)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
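CUDA errors are reported asynchronously, so the frame shown in the trace (here the seeding call) is not necessarily where the assert originated. A minimal sketch of one way to get a more precise trace, assuming the script can be modified before any CUDA work starts: set `CUDA_LAUNCH_BLOCKING=1` so kernel launches run synchronously. The helper name `enable_blocking_cuda_launches` is hypothetical and not part of PyKEEN or PyTorch.

```python
import os
import sys


def enable_blocking_cuda_launches() -> bool:
    """Request synchronous CUDA kernel launches for this process.

    Hypothetical helper for debugging only. Returns True if torch had not
    been imported yet when the variable was set, which is the safest point
    to set it (before CUDA can have been initialised).
    """
    torch_not_yet_imported = "torch" not in sys.modules
    # With CUDA_LAUNCH_BLOCKING=1 each kernel launch blocks until it finishes,
    # so a failure is raised at the call that caused it instead of at a later,
    # unrelated CUDA call.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return torch_not_yet_imported
```

Calling this at the very top of the script (or exporting the variable in the shell) and re-running in a fresh Python process should give a trace that points closer to the failing operation. A fresh process matters because a device-side assert leaves the CUDA context of the current process unusable.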
I have tried various combinations of parameters to see if that solves the problem, but it does not work even in this simplest case.
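Because the failure does not depend on the hyperparameters, the custom dataset itself may be worth checking. One common (but here unconfirmed) cause of device-side asserts is an index that falls outside an embedding table. The sketch below is a plain-Python check of ID-mapped triples; the function name `find_out_of_range_triples` and the `(head, relation, tail)` column order are assumptions for illustration.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[int, int, int]  # assumed order: (head_id, relation_id, tail_id)


def find_out_of_range_triples(
    triples: Iterable[Triple],
    num_entities: int,
    num_relations: int,
) -> List[Tuple[int, Triple]]:
    """Return (row_index, triple) for every triple with an invalid ID.

    An ID is invalid if it is negative or not smaller than the size of the
    corresponding vocabulary (entities for head/tail, relations for relation).
    """
    bad: List[Tuple[int, Triple]] = []
    for row, (h, r, t) in enumerate(triples):
        heads_tails_ok = 0 <= h < num_entities and 0 <= t < num_entities
        relation_ok = 0 <= r < num_relations
        if not (heads_tails_ok and relation_ok):
            bad.append((row, (h, r, t)))
    return bad
```

With PyKEEN, the inputs could come from a `TriplesFactory`, for example `factory.mapped_triples.tolist()` together with the `num_entities` and `num_relations` of the training factory. Running the check on the validation and testing triples against the training factory's counts would show whether any split refers to IDs the model has no embedding for.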
Environment
| Key | Value |
|---|---|
| OS | posix |
| Platform | Linux |
| Release | 4.18.0-305.19.1.el8_4.x86_64 |
| Time | Mon Feb 19 08:04:23 2024 |
| Python | 3.10.11 |
| PyKEEN | 1.10.1 |
| PyKEEN Hash | UNHASHED |
| PyKEEN Branch | |
| PyTorch | 2.0.1 |
| CUDA Available? | true |
| CUDA Version | 11.8 |
| cuDNN Version | 8700 |
Additional information
No response
Issue Template Checks
This is not a feature request (use a different issue template if it is)
This is not a question (use the discussions forum instead)
I've read the text explaining why including environment information is important and understand if I omit this information that my issue will be dismissed