
ModelOnCPU() has destructive side effects? #2541

Closed
kennysong opened this issue Apr 9, 2020 · 6 comments

Comments

@kennysong

kennysong commented Apr 9, 2020

Describe the bug

I'm training a `language_model_learner` to generate company names.

Surprisingly, I noticed that calling `learn.export()` mutates `learn` in some way that significantly drops the quality of the model.

I manually called each line of `learn.export()` to identify where things were going wrong, and found that the model quality drops on entering the `with ModelOnCPU(learn.model) as m:` context.

You can see what happens here:

[screenshot: Screen Shot 2020-04-09 at 4 07 25 PM]

(FYI, loading the exported model with `load_learner` also gives the same garbage predictions.)

Provide your installation details

```
=== Software ===
python        : 3.7.6
fastai        : 1.0.60
fastprogress  : 0.2.2
torch         : 1.4.0
nvidia driver : 418.87
torch cuda    : 10.1 / is available
torch cudnn   : 7603 / is enabled

=== Hardware ===
nvidia gpus   : 1
torch devices : 1
  - gpu0      : 15079MB | Tesla T4

=== Environment ===
platform      : Linux-4.9.0-12-amd64-x86_64-with-debian-9.12
distro        : #1 SMP Debian 4.9.210-1 (2020-01-20)
conda env     : base
python        : /opt/conda/bin/python
sys.path      : /home/jupyter/data/models
                /opt/conda/lib/python37.zip
                /opt/conda/lib/python3.7
                /opt/conda/lib/python3.7/lib-dynload
                /opt/conda/lib/python3.7/site-packages
                /opt/conda/lib/python3.7/site-packages/IPython/extensions
```

To Reproduce

Perhaps you can reproduce with a smaller example? (It takes a few hours to train the model.)

If not, here is my code:

1. To download the dataset: https://gist.github.com/kennysong/daf7b77a5f42860059b312dd57968512
2. To train the language model and generate predictions: https://gist.github.com/kennysong/9572da21461e10c9c6e7d7c8f6902ab4

Expected behavior

`learn.export()` should not alter `learn` at all.

Screenshots

See above.

Additional context

Also, a similar drop in performance happens if I do `learn.save()` and `learn2.load()`. `learn2` will generate the same garbage predictions, while `learn` is unmodified.

@sgugger
Contributor

sgugger commented Apr 9, 2020

Since you have the model available, it's going to be easier for you to test than for me. `ModelOnCPU` just does

```python
model.cpu()
model.to(previous_device)
```

Could you try those two lines directly (instead of `with ModelOnCPU(m) as m: pass`) and see if it still gives you garbage predictions?
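For anyone following along, the round-trip check described above can be sketched like this (a minimal sketch: `nn.Linear` stands in for the real learner's model, and on a CPU-only machine the move is a no-op, so reproducing the bug requires the actual GPU setup):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)          # stand-in for learn.model
x = torch.randn(1, 4)

before = model(x).detach().clone()

# The two lines ModelOnCPU effectively runs:
device = next(model.parameters()).device
model.cpu()
model.to(device)

after = model(x).detach()
# A well-behaved model round-trips exactly; a mismatch here would mean
# the device move itself is destructive.
print(torch.allclose(before, after))
```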

@kennysong
Author

Yep, it happens if I call those two lines as well.

[screenshot: Screen Shot 2020-04-09 at 11 06 01 PM]

Let me know if there's something else I should try.

@sgugger
Contributor

sgugger commented Apr 11, 2020

Just an update: I managed to reproduce this bug on a smaller scale and am trying to build a minimal reproducer to pin down its cause. I'm not sure yet whether it's inside fastai or PyTorch. It's a very subtle one, so it might take me a while to get to the root of it.

@kennysong
Author

Got it, thanks for the update – let me know if I can help.

@sgugger
Contributor

sgugger commented Apr 13, 2020

OK, I finally got to the bottom of this. It was due to `WeightDropout` not actually working. I fixed that, but note that you may need to adjust your weight dropout value.
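For readers wondering how a wrapper like `WeightDropout` can break this way, here is a torch-free toy analogy (hypothetical names, not fastai's actual code): the wrapper keeps a raw master weight alongside a derived effective weight, and an operation that rebuilds only the registered parameter leaves the derived copy pointing at stale storage.

```python
class ToyWeightDrop:
    """Toy analogy, not fastai code: holds a raw master weight
    (the registered parameter) and a derived effective weight."""
    def __init__(self, w):
        self.weight_raw = w   # master copy, what .cpu()/.to() would migrate
        self.weight = w       # effective copy used by forward()

    def to_device(self):
        # Simulated device move: only the registered parameter gets new
        # storage; the plain 'weight' attribute is left untouched.
        self.weight_raw = list(self.weight_raw)

    def forward(self, x):
        return [x * w for w in self.weight]

m = ToyWeightDrop([1.0, 2.0])
m.weight_raw[0] = 5.0          # before any move, both names share storage
print(m.forward(1.0))          # [5.0, 2.0]

m.to_device()
m.weight_raw[0] = 9.0          # e.g. loading saved weights
print(m.forward(1.0))          # still [5.0, 2.0]: forward() sees stale weights
```

Once the two copies diverge like this, the model silently runs on outdated weights, which matches the "garbage predictions after export/load" symptom in this thread.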

@kennysong
Author

Awesome, verified that it's working correctly on fastai master. Thank you Sylvain!
