
learn.freeze_to and learn.fit_one_cycle will accept params that presuppose state of the model that is not correct #228

Closed
radekosmulski opened this issue Sep 28, 2019 · 2 comments

@radekosmulski
Contributor

The core of the issue here is 1) the lack of a mechanism to easily gauge the state of the model (number of param groups, which param groups are unfrozen), and 2) methods accepting, without complaint, params that are in some way incompatible with the model.

I don't know what is the correct fix here and this is more for me about sharing ideas and observations. Maybe 1) is completely not needed, but there were times even with fastai 1x where I would have appreciated having a mechanism that would print out to me whether param groups were frozen or not. But I think this is at best a nice to have - once everything works as intended and blows up when there are issues I don't think this will add that much value.

2 is a bigger issue. Right now, even if I have just a single param group, I can call learn.freeze_to(-2). If I don't realize that the cutting of the model didn't go as planned (as is the case right now, where the cutting doesn't seem to work), I will never be informed of the problem (I can still most likely infer that this is the case from the training time, etc., but that requires deeper understanding and paying attention).

Same for learn.fit. Right now I can call `learn.fit([1, 1, ...])` with an arbitrarily long list of lrs and the method will not complain, regardless of how many param groups there are.

Calling learn.freeze_to with an argument that is incompatible with the model should probably raise. With learn.fit and the lrs I am not sure how to handle this; the two options that come to mind are accepting a single lr, and, with multiple lrs, raising either when len(lrs) != len(param_groups) or when len(lrs) != len(trainable_param_groups).
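A rough sketch of what these two checks might look like, with illustrative names (`n_groups` stands in for the number of param groups on the learner; none of this is fastai's actual API):

```python
# Hypothetical validation sketch for the two proposals above.

def check_freeze_to(n, n_groups):
    """Raise if the freeze index refers to a nonexistent param group."""
    if not -n_groups <= n <= n_groups:
        raise ValueError(
            f"freeze_to({n}) is incompatible with a model that has "
            f"{n_groups} parameter group(s)")

def check_lrs(lrs, n_groups):
    """Accept a single lr, or require exactly one lr per param group."""
    if isinstance(lrs, (int, float)):
        return [lrs] * n_groups
    if len(lrs) != n_groups:
        raise ValueError(
            f"got {len(lrs)} learning rates for {n_groups} parameter group(s)")
    return list(lrs)

print(check_lrs(1e-3, 3))  # → [0.001, 0.001, 0.001]
```

With a single param group, `check_freeze_to(-2, 1)` would raise instead of silently succeeding, which is exactly the freeze_to(-2) case described above.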

@tianjianjiang

I have had a similar experience and ended up creating a data class to store them.
Besides the info about frozen layers and parameter groups, extensions from callbacks can be additional sources. For example, I have to remember the dynamic loss scale of fp16.
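A sketch of such a data class, with made-up field names; the fp16 loss scale is one example of callback state worth capturing alongside the frozen-group info:

```python
from dataclasses import dataclass

# Hypothetical container for training state that is otherwise scattered
# across the learner and its callbacks. Field names are illustrative.
@dataclass
class TrainState:
    n_param_groups: int
    frozen_to: int = 0                   # groups below this index are frozen
    fp16_loss_scale: float = 2.0 ** 16   # dynamic loss scale from the fp16 callback

state = TrainState(n_param_groups=3, frozen_to=2)
print(state)
```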

@sgugger
Contributor

sgugger commented Sep 30, 2019

Fixed in several ways:

  • The state of the model can be seen in learn.summary() (defined in 15). This will show which parameters are frozen and unfrozen, and contains a line `Model frozen to parameter group number xxx`
  • Trying to freeze up to a layer index bigger than the number of parameter groups will issue a warning that the whole model is frozen
  • Trying to set a hyper-parameter with a collection containing more items than there are parameter groups will raise an exception (we chose to use the number of parameter groups, not trainable parameter groups)
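The behavior of the last two points can be sketched as follows; the function and parameter names here are illustrative stand-ins, not fastai's actual implementation:

```python
import warnings

# Sketch of the fixed behavior described above (illustrative names).

def freeze_to(n, n_groups):
    """Freeze groups up to `n`; warn and clamp if `n` exceeds the model."""
    if n >= n_groups:
        warnings.warn("Freezing beyond the last parameter group: "
                      "the whole model is now frozen")
        n = n_groups
    return n  # index the model is actually frozen to

def set_hyper(name, values, n_groups):
    """Broadcast one value, or require at most one value per group."""
    if not isinstance(values, (list, tuple)):
        return [values] * n_groups
    if len(values) > n_groups:
        raise ValueError(
            f"Trying to set {len(values)} values for {name!r} but the model "
            f"has only {n_groups} parameter groups")
    return list(values)
```

So `freeze_to(5, 3)` warns and freezes everything, while `set_hyper('lr', [1, 1, 1], 2)` raises.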
