The core of the issue here is: 1) the lack of a mechanism to easily gauge the state of the model (the number of param groups, which param groups are unfrozen), and 2) methods accepting params that are in some way incompatible with the model, without complaint.
I don't know what the correct fix is here; this is more about sharing ideas and observations. Maybe 1) is not needed at all, but there were times even with fastai v1 where I would have appreciated a mechanism that printed out whether param groups were frozen or not. I think this is at best a nice-to-have: once everything works as intended and blows up when there are issues, I don't think it would add that much value.
2) is the bigger issue. Right now, even if I have just a single param group, I can call `learn.freeze_to(-2)`. If I don't realize that the cutting of the model didn't go as planned (as is the case right now, where the cutting doesn't seem to work), I will never be informed of the problem (I can still most likely infer it from the training time, etc., but that requires deeper understanding and paying attention).
The same applies to `learn.fit`. I can currently call `learn.fit([1, 1, ...])` with an arbitrarily long list of lrs, and the method will not complain, regardless of how many param groups there are.
Calling `learn.freeze_to` with an argument that is incompatible with the model should probably raise. With `learn.fit` and the lrs I am not sure how to handle this; the two options that come to mind are accepting a single lr, and, with multiple lrs, raising either when len(lrs) != len(param_groups) or when len(lrs) != len(trainable_param_groups).
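One way the lrs case could be handled is a small validation helper along these lines (the name `check_lrs` and the scalar-broadcast behavior are my own assumptions for illustration, not fastai API):

```python
def check_lrs(lrs, n_param_groups):
    """Broadcast or validate learning rates against the optimizer's param groups.

    A scalar lr is broadcast to every group; a collection must match the
    number of param groups exactly, so extra lrs raise instead of being
    silently ignored.
    """
    if isinstance(lrs, (int, float)):
        return [lrs] * n_param_groups
    lrs = list(lrs)
    if len(lrs) != n_param_groups:
        raise ValueError(
            f"Got {len(lrs)} learning rates for {n_param_groups} param groups; "
            "pass a single lr or exactly one lr per param group."
        )
    return lrs
```

The same check could alternatively compare against trainable param groups only, which is the open design question above.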
I have had a similar experience and ended up creating a data class to store this state.
Besides information about frozen layers and parameter groups, callback extensions can be additional sources of state. For example, I have to keep track of the dynamic loss scale for fp16.
The state of the model can be seen in learn.summary() (defined in 15). This shows which parameters are frozen and which are unfrozen, and contains a line "model frozen to parameter group number xxx".
Trying to freeze up to an index larger than the number of parameter groups will issue a warning that the whole model is frozen.
Trying to set a hyper-parameter with a collection containing more items than there are parameter groups will raise an exception (we chose to use the number of parameter groups, not trainable parameter groups).
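The two behaviors described above could be sketched roughly as follows (hypothetical helper names, not fastai's actual code):

```python
import warnings

def validate_freeze_to(n, n_param_groups):
    """Clamp a freeze_to index to the number of param groups.

    Freezing past the last group warns and freezes the whole model,
    mirroring the warn-don't-fail behavior described above.
    """
    if n > n_param_groups:
        warnings.warn(
            f"freeze_to({n}) with only {n_param_groups} param groups: "
            "freezing the whole model."
        )
        return n_param_groups
    return n

def validate_hyper(values, n_param_groups):
    """Raise when a per-group hyper-parameter collection is too long."""
    if len(values) > n_param_groups:
        raise ValueError(
            f"Got {len(values)} values for {n_param_groups} param groups."
        )
    return values
```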