
CUDA error, possibly related to max_length #1

Closed
avacaondata opened this issue Feb 20, 2022 · 4 comments

@avacaondata commented Feb 20, 2022

I think the library should be installed in isolation from transformers, because if one already has a different version of transformers (with custom models or other changes), installing it needlessly breaks the environment.

But the important point here is that it's not possible to train robertuito:

C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:699: block: [142,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:699: block: [142,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:699: block: [142,0,0], thread: [66,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
[the same assertion is repeated for threads [67,0,0] through [95,0,0] of block [142,0,0]]
[W 2022-02-20 18:25:42,448] Trial 0 failed because of the following error: RuntimeError('CUDA error: device-side assert triggered\nCUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.')
Traceback (most recent call last):
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\optuna\study\_optimize.py", line 213, in _run_trial
    value_or_values = func(trial)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\integrations.py", line 150, in _objective
    trainer.train(resume_from_checkpoint=checkpoint, trial=trial)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\trainer.py", line 1365, in train
    tr_loss_step = self.training_step(model, inputs)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\trainer.py", line 1950, in training_step
    self.scaler.scale(loss).backward()
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\torch\_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\torch\autograd\__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Trying models on dataset exist22_1_es:   0%|                                                                                                                                             | 0/1 [00:49<?, ?it/s] 
Iterating over datasets...: 0it [00:49, ?it/s]
Traceback (most recent call last):
  File "run_experiments.py", line 3279, in <module>
    benchmarker()
  File "run_experiments.py", line 1196, in __call__
    self.optuna_hp_search()
  File "run_experiments.py", line 1470, in optuna_hp_search
    sampler=TPESampler(seed=420)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\trainer.py", line 1847, in hyperparameter_search
    best_run = backend_dict[backend](self, n_trials, direction, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\integrations.py", line 160, in run_hp_search_optuna
    study.optimize(_objective, n_trials=n_trials, timeout=timeout, n_jobs=n_jobs)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\optuna\study\study.py", line 409, in optimize
    show_progress_bar=show_progress_bar,
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\optuna\study\_optimize.py", line 76, in _optimize
    progress_bar=progress_bar,
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\optuna\study\_optimize.py", line 163, in _optimize_sequential
    trial = _run_trial(study, func, catch)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\optuna\study\_optimize.py", line 264, in _run_trial
    raise func_err
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\optuna\study\_optimize.py", line 213, in _run_trial
    value_or_values = func(trial)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\integrations.py", line 150, in _objective
    trainer.train(resume_from_checkpoint=checkpoint, trial=trial)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\trainer.py", line 1365, in train
    tr_loss_step = self.training_step(model, inputs)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\trainer.py", line 1950, in training_step
    self.scaler.scale(loss).backward()
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\torch\_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\torch\autograd\__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

I have tried many other Spanish models and this doesn't happen, so it seems directly related to your model and not to the model architecture (which comes from transformers).
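For reference, the PyTorch message in the log above suggests re-running with CUDA_LAUNCH_BLOCKING=1, which makes kernel launches synchronous so the Python stack trace points at the operation that actually triggered the assert. A minimal sketch of that, assuming the variable is set before the first CUDA call (the environment-variable approach is standard PyTorch advice, not something specific to this repository):

```python
# Sketch: make CUDA launches synchronous so the device-side assert surfaces
# at the failing op instead of at a later, unrelated API call.
import os

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before the first CUDA call
# (equivalently, set it in the shell before launching the script)

import torch  # imported after setting the variable, to be safe

# ... then launch the training / hyperparameter search as usual ...
```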

@finiteautomata (Collaborator) commented Feb 20, 2022

Hi @alexvaca0. If you want to use RoBERTuito, you don't really need to install this package -- just install pysentimiento and use the preprocessing step described in the README. If there is a problem with the installation of that package, please report it in the other repository and we can work out that dependency issue.
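For anyone following along, a minimal sketch of that suggestion; the `preprocess_tweet` helper and the `pysentimiento/robertuito-base-uncased` checkpoint name are taken from the pysentimiento/RoBERTuito README rather than from this thread:

```python
# Sketch: preprocess tweets the way RoBERTuito expects, then tokenize as usual.
from pysentimiento.preprocessing import preprocess_tweet
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "pysentimiento/robertuito-base-uncased"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

text = preprocess_tweet("jajajaja @usuario qué bueno")  # normalizes user mentions, hashtags, laughter
inputs = tokenizer(text, truncation=True, return_tensors="pt")
outputs = model(**inputs)
```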

Regarding your stack trace, could you provide an example (if possible, on a Colab notebook) of what you are running? I think that there might be an issue with the max length of the tokenizer.
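One quick way to test that hypothesis is to compare the tokenizer's declared limit with the model's position-embedding table: if tokenized sequences are longer than that table, the embedding lookup indexes out of range and CUDA raises exactly this kind of `srcIndex < srcSelectDimSize` device-side assert. A hedged sketch (checkpoint name assumed, not confirmed in this thread):

```python
# Sketch: check whether tokenized inputs can exceed the position-embedding size.
from transformers import AutoConfig, AutoTokenizer

name = "pysentimiento/robertuito-base-uncased"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(name)
cfg = AutoConfig.from_pretrained(name)

print("tokenizer.model_max_length:", tok.model_max_length)
print("config.max_position_embeddings:", cfg.max_position_embeddings)

# Any sequence tokenized past max_position_embeddings will trigger the
# device-side assert in the embedding lookup on GPU.
```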

@avacaondata (Author) commented Feb 21, 2022

Actually, what I meant is that I installed pysentimiento with pip install pysentimiento, and it installs transformers as a dependency.

Providing an example of what I'm running is quite hard because I'm using my own Benchmarker, which is a huge class with a lot of functionality. Let me first check whether it is the max length; if it isn't, I can write a complete example to show you. What is the expected max length for RoBERTuito? @finiteautomata

@avacaondata changed the title from "CUDA error, only with your model." to "CUDA error, possibly related to max_length" on Feb 21, 2022
@avacaondata (Author)

Thanks a lot for the suggestion @finiteautomata, I just checked the config files in this repo and corrected my code to hard-code a max length of 128, and everything is solved. Thank you very much! :)
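For anyone landing here later, a minimal sketch of that fix, assuming a datasets-style map over a hypothetical "text" column (the column and dataset names are placeholders, not taken from the benchmark code above):

```python
# Sketch: cap inputs at 128 tokens so indices never exceed RoBERTuito's position embeddings.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("pysentimiento/robertuito-base-uncased")  # assumed checkpoint

def tokenize_fn(batch):
    return tokenizer(
        batch["text"],       # hypothetical text column
        truncation=True,
        max_length=128,      # the max length RoBERTuito was pretrained with
        padding="max_length",
    )

# tokenized = dataset.map(tokenize_fn, batched=True)  # hypothetical datasets.Dataset
```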

@finiteautomata (Collaborator)

Great!
