
Transfer Learning #1

Closed
ThomasNorr opened this issue Aug 4, 2022 · 2 comments

Comments

@ThomasNorr

Hello,

thanks for your fascinating work. I am trying to use the B-cos network (the densenet121 named “densenet_121_cossched”) in my research, but I am struggling to transfer it effectively to smaller datasets, e.g. CUB-200-2011. It overfits much more (much worse final test accuracy) and improves much more slowly than the conventional densenet; in fact, retraining only the final layer leads to no learning whatsoever across a range of hyperparameters that all work for the conventional one (roughly the setup sketched after the list below). Since you have experience with training this network, I figured I would ask you directly:

  • How sensitive was the training to the choice of hyperparameters? Do I “just” need to tune the regularization etc.?
  • Have you done any experiments on transfer learning and perhaps found effective methods?
  • Could B-cos networks be less suited for fine-grained classification?
  • How exactly did you train the model? As far as I can tell, your training code does not work as-is, since the trainer class does not receive a “loss” argument, or am I missing something? I am encountering NaN losses when using BCE.
  • Did you investigate the impact of norming w in Equation 3 of the paper (maybe with an increased B), so that the model rescales the outputs itself?
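For reference, this is roughly the “retrain only the final layer” setup described above, as a minimal PyTorch sketch. It is not taken from this repository: the loader name load_bcos_densenet121 and the classifier attribute are placeholders for however the pretrained “densenet_121_cossched” checkpoint is actually loaded, and the real B-cos classification head is not a plain nn.Linear.

    import torch
    import torch.nn as nn

    # Hypothetical loader for the pretrained B-cos densenet121 checkpoint.
    model = load_bcos_densenet121(pretrained=True)

    # Freeze the backbone so only the new head is trained.
    for p in model.parameters():
        p.requires_grad = False

    num_classes = 200  # CUB-200-2011
    in_features = model.classifier.in_features  # assumed attribute name
    model.classifier = nn.Linear(in_features, num_classes)

    optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
    criterion = nn.BCEWithLogitsLoss()  # BCE on logits, as used in the question

    def train_step(images, targets_onehot):
        # targets_onehot: float tensor of shape (batch, num_classes)
        optimizer.zero_grad()
        loss = criterion(model(images), targets_onehot)
        loss.backward()
        optimizer.step()
        return loss.item()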

Any answers would be greatly appreciated :)

Greetings

@moboehle
Owner

moboehle commented Sep 9, 2022

Hi Gnabe, sorry for the delay in answering and thanks for your interest in the project!

Since there are no normalisation layers in the model, it is indeed somewhat more sensitive to a good choice of hyperparameters; in fact, we are currently investigating how to facilitate training for the B-cos networks. Regarding your questions:

  • Since the weight vectors are normalised before being applied to the input, an L2 penalty (weight decay) on the model weights will probably not change much about the learning behaviour. If you need to regularise the models, I would recommend using dropout or augmenting your data. (For B-cos models, an L1 regularisation of the weights might potentially work as well; see the sketch after this list.)
  • So far, I have not used B-cos models for transfer learning, but I would not have expected any major problems. What issue are you facing exactly? If the loss is NaN or vanishingly small, you might need to adjust the layer scaling (scale_fact parameter in BcosConv2d) to ensure that the model output lies in a reasonable range.
  • Since the B-cos networks work well on ImageNet, which already involves a lot of fine-grained classification (e.g. among dog breeds), I would say they generally seem to work for fine-grained classification too. However, fine-grained settings such as CUB might lead to less interpretable explanations: the individual classes share many of the same features, so the explanations for different classes might look similar.
  • Sorry for the confusion in the code. The trainer receives the loss argument via the exp_params, e.g.
    "loss": CombinedLosses(LogitsBCE()),
  • I assume you mean not norming w in equation 3? While this could potentially work, it gives the model more 'slack' w.r.t. how to solve the classification problem, which might hurt the interpretability. It might, of course, help the optimisation.
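Regarding the L1 regularisation mentioned in the first point, a minimal sketch in plain PyTorch (the penalty weight 1e-5 is only an illustrative value, not a recommendation):

    import torch

    def l1_penalty(model, l1_weight=1e-5):
        # Sum of absolute values over all trainable parameters.
        return l1_weight * sum(p.abs().sum() for p in model.parameters() if p.requires_grad)

    # Inside the training step:
    # loss = criterion(model(images), targets) + l1_penalty(model)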

I hope this helps!

Best
Moritz

@ThomasNorr
Author

Hi Moritz,

thanks for the answer.

  • That regularisation is hard is something I found out too. Dropout did not work for me (I suspect it changes the angles too much?), and I also tried adding Gaussian noise to slightly perturb the angles (roughly the sketch after this list), but that was equally ineffective. Data augmentation has been very effective, but not sufficient to prevent overfitting. I will try L1.
  • I think the problem mostly lies in the small dataset size typical of transfer learning, which leads to overfitting.
  • Good point regarding the localization, thanks.
  • I see, thanks.
  • Okay, thanks, I will try that too.
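For completeness, a minimal sketch of the Gaussian-noise idea from the first point above. The comment does not say whether the noise was added to the inputs or the weights; this version perturbs the inputs, and sigma is only an illustrative value:

    import torch

    def add_gaussian_noise(x, sigma=0.01):
        # Slightly perturb the inputs, which also slightly perturbs the
        # weight-input angles that the B-cos transform depends on.
        return x + sigma * torch.randn_like(x)

    # Inside the training loop:
    # loss = criterion(model(add_gaussian_noise(images)), targets)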

Thanks a lot :)

Best
