Save load #1502
Good suggestion. I may change it a little bit because I'm not sure what happens if we put a GPU as a device (since there are other things than the model in the state), so it may end up just being a flag.
@sgugger The changes you made after this merge have broken this again. Trying to load an exported model on CPU-only machines gets the CUDA error in 1.0.42 and master.
You have to re-export any previously saved
Before making my comment I trained a
You should check the fastai version on the machine where you export your 'export.pkl' file, then provide us with the code you're running. I just confirmed I had no problem loading an exported file on a CPU-only instance.
Double checked both machines. Both running 1.0.42. Code used to generate the model:

```python
from fastai import *
from fastai.vision import *

# Loading data
bs = 32
df = pd.read_csv('./labels.csv', header='infer')
data = (ImageItemList.from_df(df, path='.', folder='train')
        .random_split_by_pct()
        .label_from_df(label_cls=FloatList)
        .transform(get_transforms(), size=299)
        .databunch(bs=bs).normalize(imagenet_stats))

# Create model
learn = create_cnn(data, models.resnet101)

# Use mixed precision
learn.to_fp16()

# Find learning rate
learn.lr_find()
learn.recorder.plot()

# Fit
learn.fit_one_cycle(10)
learn.save('stage-1-101')

# Prep for fine-tuning
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()

# Fine-tuning
learn.fit_one_cycle(10, max_lr=slice(1e-6, 3e-4))
learn.save('stage-2-101')

# Export
learn.to_fp32()
learn.export('prod-101.pkl')
```
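The `learn.to_fp32()` call before `export` is worth highlighting: a model trained with mixed precision holds half-precision weights, which are converted back to full precision before serialization. As a rough stdlib-only sketch of that idea (fastai does this on real torch tensors; the tagged-tuple representation here is purely illustrative):

```python
def to_fp32(state):
    """Recursively convert every half-precision entry in a nested
    state dict to full precision. Values are illustrated as
    ("dtype", value) tuples standing in for real tensors."""
    if isinstance(state, dict):
        return {k: to_fp32(v) for k, v in state.items()}
    if isinstance(state, tuple) and len(state) == 2 and state[0] == "fp16":
        return ("fp32", state[1])  # promote the tagged value
    return state

state = {"model": {"weight": ("fp16", 0.5)}, "epoch": 10}
print(to_fp32(state))  # {'model': {'weight': ('fp32', 0.5)}, 'epoch': 10}
```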
Ok, the problem was different:
Hello.
You'll need to install fastai from master if you want this fix, as it's not in a release yet (assuming you have the same issue with fp16). The flag was removed because the logic was changed to always save models for CPU.
I have installed the latest version of fastai with
I think it's more flexible to be able to choose while loading rather than at saving. |
The boolean `cpu` wasn't working: there was still an error on some CPU-only machines, which is why it was removed. If you still get the error, it means you have some tensors on the GPU saved, and they shouldn't really be there. Sharing your code would help us debug this.
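To find out which part of a saved state still sits on the GPU, one can walk the state recursively and report the path of anything whose device isn't the CPU. A minimal sketch (the `FakeTensor` class and state layout are invented for illustration; on a real checkpoint you would inspect `tensor.device` the same way):

```python
class FakeTensor:
    """Stand-in for a torch tensor that only records its device."""
    def __init__(self, device):
        self.device = device

def find_gpu_tensors(state, path=""):
    """Yield dotted paths of every tensor-like value not on the CPU."""
    if isinstance(state, dict):
        for k, v in state.items():
            yield from find_gpu_tensors(v, f"{path}.{k}" if path else k)
    elif isinstance(state, FakeTensor) and state.device != "cpu":
        yield path

state = {
    "model": {"weight": FakeTensor("cpu")},
    "loss_func": {"weight": FakeTensor("cuda:0")},  # e.g. loss-function weights left on the GPU
}
print(list(find_gpu_tensors(state)))  # ['loss_func.weight']
```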
You'll have a déjà-vu, it's based on your DeepFrench notebook ;)
Ok, so it's the weights of your loss function that are the culprit. I'll try to find a more general fix that goes through everything in the state and saves it on the CPU. In the meantime, I've pushed a fix that will load the Learner on CPU if that's the default device. Since I had reports that `map_location` wasn't working before (though I couldn't reproduce them), it may not work in 100% of cases.
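The general fix described here amounts to walking the whole saved state and moving every tensor to the CPU before pickling. A hedged stdlib-only sketch of that outline (`MockTensor` stands in for a torch tensor with its `.device` attribute and `.cpu()` method; the real implementation operates on actual tensors):

```python
class MockTensor:
    """Stand-in for a torch tensor with torch-like device handling."""
    def __init__(self, device="cuda:0"):
        self.device = device
    def cpu(self):
        return MockTensor("cpu")  # torch-style: returns a CPU copy

def state_to_cpu(state):
    """Recursively return a copy of `state` with every tensor on the CPU."""
    if isinstance(state, dict):
        return {k: state_to_cpu(v) for k, v in state.items()}
    if isinstance(state, list):
        return [state_to_cpu(v) for v in state]
    if isinstance(state, MockTensor):
        return state.cpu()
    return state

state = {"loss_func": {"weight": MockTensor()}, "lr": 0.003}
moved = state_to_cpu(state)
print(moved["loss_func"]["weight"].device)  # cpu
```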
Nice catch! |
Looking forward to the new release! I'll update Render's starter repo as soon as it's out.
> On Mon, Feb 04, 2019 at 7:27 AM, Legrand Thomas wrote:
> Nice catch!
> It works when I save after removing .cuda() for the metrics.
> Thank you for your help.
Seeing the same error as @njaremko with
It's only fixed in master, there hasn't been a new release yet. |
Thanks @sgugger. Any estimates on the next release date? |
I think sometime next week. There have been a lot of changes with my digging-out of kwargs everywhere, so we want to make sure everything is stable and nothing is broken before the next release.
The simplest way to accomplish that is with:
The `export` and `load_learner` methods of `Learner` were only working when a GPU with CUDA was available, so there was no possibility to export a model and then load it on a CPU-only device. You can now do that by specifying `device='cpu'` when calling `load_learner`.
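The device-selection logic this describes amounts to: use the device the caller asked for, otherwise fall back to the GPU when CUDA is available and the CPU when not. A small sketch under those assumptions (`pick_map_location` is an invented helper, not the fastai API, and the `cuda_available` parameter stands in for a real availability check like `torch.cuda.is_available()`):

```python
def pick_map_location(device=None, cuda_available=lambda: False):
    """Choose where tensors should land when a saved Learner is unpickled."""
    if device is not None:
        return device  # caller chose explicitly, e.g. device='cpu'
    return "cuda" if cuda_available() else "cpu"

print(pick_map_location(device="cpu"))                 # cpu
print(pick_map_location(cuda_available=lambda: True))  # cuda
print(pick_map_location())                             # cpu
```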