
CUDA error, possibly related to max_length #1

Closed
avacaondata opened this issue Feb 20, 2022 · 4 comments

@avacaondata commented Feb 20, 2022

I think the library should be installed in isolation from transformers, because if one already has a different version of transformers (with custom models or other changes), installing it needlessly breaks the environment.

But the important point here is that it's not possible to train robertuito:

C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:699: block: [142,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:699: block: [142,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:699: block: [142,0,0], thread: [66,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
[the same assertion is repeated for threads [67,0,0] through [95,0,0] of block [142,0,0]]
[W 2022-02-20 18:25:42,448] Trial 0 failed because of the following error: RuntimeError('CUDA error: device-side assert triggered\nCUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.')
Traceback (most recent call last):
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\optuna\study\_optimize.py", line 213, in _run_trial
    value_or_values = func(trial)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\integrations.py", line 150, in _objective
    trainer.train(resume_from_checkpoint=checkpoint, trial=trial)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\trainer.py", line 1365, in train
    tr_loss_step = self.training_step(model, inputs)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\trainer.py", line 1950, in training_step
    self.scaler.scale(loss).backward()
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\torch\_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\torch\autograd\__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Trying models on dataset exist22_1_es:   0%|                                                                                                                                             | 0/1 [00:49<?, ?it/s] 
Iterating over datasets...: 0it [00:49, ?it/s]
Traceback (most recent call last):
  File "run_experiments.py", line 3279, in <module>
    benchmarker()
  File "run_experiments.py", line 1196, in __call__
    self.optuna_hp_search()
  File "run_experiments.py", line 1470, in optuna_hp_search
    sampler=TPESampler(seed=420)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\trainer.py", line 1847, in hyperparameter_search
    best_run = backend_dict[backend](self, n_trials, direction, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\integrations.py", line 160, in run_hp_search_optuna
    study.optimize(_objective, n_trials=n_trials, timeout=timeout, n_jobs=n_jobs)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\optuna\study\study.py", line 409, in optimize
    show_progress_bar=show_progress_bar,
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\optuna\study\_optimize.py", line 76, in _optimize
    progress_bar=progress_bar,
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\optuna\study\_optimize.py", line 163, in _optimize_sequential
    trial = _run_trial(study, func, catch)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\optuna\study\_optimize.py", line 264, in _run_trial
    raise func_err
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\optuna\study\_optimize.py", line 213, in _run_trial
    value_or_values = func(trial)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\integrations.py", line 150, in _objective
    trainer.train(resume_from_checkpoint=checkpoint, trial=trial)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\trainer.py", line 1365, in train
    tr_loss_step = self.training_step(model, inputs)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\trainer.py", line 1950, in training_step
    self.scaler.scale(loss).backward()
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\torch\_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\torch\autograd\__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

I have tried many other Spanish models and this doesn't happen, so it seems directly related to your model and not to the model architecture (which comes from transformers).
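For reference, the PyTorch message in the log above suggests re-running with CUDA_LAUNCH_BLOCKING=1, which makes kernel launches synchronous so the Python stack trace points at the operation that actually triggered the assert. A minimal sketch of that, assuming the variable is set before the first CUDA call (the environment-variable approach is standard PyTorch advice, not something specific to this repository):

```python
# Sketch: make CUDA launches synchronous so the device-side assert surfaces
# at the failing op instead of at a later, unrelated API call.
import os

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before the first CUDA call
# (equivalently, set it in the shell before launching the script)

import torch  # imported after setting the variable, to be safe

# ... then launch the training / hyperparameter search as usual ...
```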

@finiteautomata (Collaborator) commented Feb 20, 2022

Hi @alexvaca0. If you want to use RoBERTuito, you don't really need to install this package -- just install pysentimiento and use the preprocessing step described in the README. If there is a problem with the installation of that package, please report it in the other repository and we can work out that dependency issue.
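For anyone following along, a minimal sketch of that suggestion; the `preprocess_tweet` helper and the `pysentimiento/robertuito-base-uncased` checkpoint name are taken from the pysentimiento/RoBERTuito README rather than from this thread:

```python
# Sketch: preprocess tweets the way RoBERTuito expects, then tokenize as usual.
from pysentimiento.preprocessing import preprocess_tweet
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "pysentimiento/robertuito-base-uncased"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

text = preprocess_tweet("jajajaja @usuario qué bueno")  # normalizes user mentions, hashtags, laughter
inputs = tokenizer(text, truncation=True, return_tensors="pt")
outputs = model(**inputs)
```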

Regarding your stack trace, could you provide an example (if possible, on a Colab notebook) of what you are running? I think that there might be an issue with the max length of the tokenizer.
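One quick way to test that hypothesis is to compare the tokenizer's declared limit with the model's position-embedding table: if tokenized sequences are longer than that table, the embedding lookup indexes out of range and CUDA raises exactly this kind of `srcIndex < srcSelectDimSize` device-side assert. A hedged sketch (checkpoint name assumed, not confirmed in this thread):

```python
# Sketch: check whether tokenized inputs can exceed the position-embedding size.
from transformers import AutoConfig, AutoTokenizer

name = "pysentimiento/robertuito-base-uncased"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(name)
cfg = AutoConfig.from_pretrained(name)

print("tokenizer.model_max_length:", tok.model_max_length)
print("config.max_position_embeddings:", cfg.max_position_embeddings)

# Any sequence tokenized past max_position_embeddings will trigger the
# device-side assert in the embedding lookup on GPU.
```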

@avacaondata (Author) commented Feb 21, 2022

Actually, what I meant is that I installed pysentimiento with pip install pysentimiento, and it installs transformers as a dependency.

Providing an example of what I'm running is quite hard because I'm using my own Benchmarker, which is a huge class with a lot of functionality. Let me first check whether it is the max length; if it isn't, I can write a complete example to show you. What is the expected max length for RoBERTuito? @finiteautomata

@avacaondata changed the title from "CUDA error, only with your model." to "CUDA error, possibly related to max_length" on Feb 21, 2022
@avacaondata (Author)

Thanks a lot for the suggestion @finiteautomata, I just checked the config files in this repo and corrected my code to hard-code a max length of 128, and everything is solved. Thank you very much! :)
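For anyone landing here later, a minimal sketch of that fix, assuming a datasets-style map over a hypothetical "text" column (the column and dataset names are placeholders, not taken from the benchmark code above):

```python
# Sketch: cap inputs at 128 tokens so indices never exceed RoBERTuito's position embeddings.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("pysentimiento/robertuito-base-uncased")  # assumed checkpoint

def tokenize_fn(batch):
    return tokenizer(
        batch["text"],       # hypothetical text column
        truncation=True,
        max_length=128,      # the max length RoBERTuito was pretrained with
        padding="max_length",
    )

# tokenized = dataset.map(tokenize_fn, batched=True)  # hypothetical datasets.Dataset
```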

@finiteautomata (Collaborator)

Great!
