Kernel restarted #69

Open
cahya-wirawan opened this issue May 8, 2020 · 1 comment

Comments

@cahya-wirawan
Contributor

I have run into some further problems running the notebook CLS-DE.ipynb. If I use conda and install the default pytorch (1.3.1), then after the command

exp.finetune_lm.train_(cls_dataset, num_epochs=20)

I get the following error message:

ImportError: /tmp/torch_extensions/forget_mult_cuda/forget_mult_cuda.so: undefined symbol: _ZN3c106Symbol14fromQualStringERKSs

Then I installed pytorch from the pytorch channel as follows:

conda install pytorch=1.3.1 torchvision cudatoolkit=10.0 -c pytorch

The "undefined symbol" error is gone, but the kernel restarted during the first epoch of exp.finetune_lm.train_(cls_dataset, num_epochs=20).

Is this a known problem? Here are the Python packages that may be relevant:

$ conda list| egrep 'torch|^fastai|cuda|nvid'
_pytorch_select           0.2                       gpu_0  
cudatoolkit               10.0.130                      0  
cudnn                     7.6.5                cuda10.0_0  
fastai                    1.0.61                        1    fastai
nvidia-ml-py3             7.352.0                    py_0    fastai
pytorch                   1.3.1           cuda100py37h53c1284_0  
torchvision               0.4.2           cuda100py37hecfc37a_0  

Thanks.
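
For anyone comparing their own environment against the listing above, here is a minimal sketch that pulls the CUDA version hinted at by each package's version or build string and checks that they agree. The helper name `cuda_tags` and the constant `CONDA_LIST` are made up for illustration, and the build-string pattern is an assumption based only on the strings shown in this listing (`cuda10.0_0`, `cuda100py37...`), not a documented conda format.

```python
import re

# Output copied from the `conda list` listing in this comment.
CONDA_LIST = """\
_pytorch_select           0.2                       gpu_0
cudatoolkit               10.0.130                      0
cudnn                     7.6.5                cuda10.0_0
fastai                    1.0.61                        1    fastai
nvidia-ml-py3             7.352.0                    py_0    fastai
pytorch                   1.3.1           cuda100py37h53c1284_0
torchvision               0.4.2           cuda100py37hecfc37a_0
"""


def cuda_tags(listing):
    """Map package name -> CUDA "major.minor" hinted at by its version or build."""
    tags = {}
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        name, version, build = parts[0], parts[1], parts[2]
        if name == "cudatoolkit":
            # The toolkit's own version is the CUDA version, e.g. 10.0.130.
            tags[name] = ".".join(version.split(".")[:2])
            continue
        # Assumed pattern: "cuda" + major digits + optional dot + one minor digit.
        match = re.search(r"cuda(\d+)\.?(\d)(?!\d)", build)
        if match:
            tags[name] = f"{match.group(1)}.{match.group(2)}"
    return tags


tags = cuda_tags(CONDA_LIST)
print(tags)
print("consistent:", len(set(tags.values())) == 1)
```

For this listing every tagged package points at CUDA 10.0, so the packages at least agree with each other.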

@cahya-wirawan
Contributor Author

The kernel stopped restarting once I switched from CUDA 10.0 to CUDA 9.2. It seems the model doesn't like the latest CUDA version. Now the notebook runs properly to the end.
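
To confirm which CUDA version the installed PyTorch build actually targets before rerunning the notebook, something like the sketch below may help. The function name `torch_cuda_report` is hypothetical. `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()` are regular PyTorch attributes. The import is guarded so the snippet also runs where PyTorch is not installed.

```python
def torch_cuda_report():
    """Return a small dict describing the installed PyTorch build, if any."""
    try:
        import torch
    except ImportError:
        # PyTorch is not importable in this environment.
        return {"torch": None}
    return {
        "torch": torch.__version__,
        # CUDA toolkit version this build was compiled against (None for CPU-only builds).
        "built_for_cuda": torch.version.cuda,
        # Whether a usable GPU and driver were found at runtime.
        "cuda_available": torch.cuda.is_available(),
    }


print(torch_cuda_report())
```

Note that `built_for_cuda` reflects the toolkit the PyTorch binary was built with, not the version of the installed NVIDIA driver.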
