How to draw flatness curve in Figure 3? #11

Closed
FrankZhangRp opened this issue Apr 30, 2022 · 5 comments

Comments

@FrankZhangRp

Hi,
Thank you so much for providing this repo, the work is awesome!
How can we reproduce the loss gap curve in Figure 3 of the paper? How is gamma added to the model parameters, and what distance metric is on the x-axis? I flattened the model's parameter dict into one vector and added a noise vector with norm 1.0, and got a loss gap of about 0.2 on the p domain test, so I must have made a mistake in the Monte-Carlo sampling.
Thanks a lot!

@khanrc
Owner

khanrc commented May 2, 2022

Hi, thanks for your interest in our study.

We first sample a unit direction vector and compute the loss gap by shifting the model parameters along it by the radius gamma, so the parameter difference is gamma * unit_direction_vector. The reported value is averaged over 100 sampled direction vectors. The x-axis is gamma.
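
Written out, one reading of the procedure above is the following Monte-Carlo estimate. The symbols ($\widehat{F}_\gamma$ for the averaged loss gap, $\mathcal{L}$ for the evaluation loss, $\theta$ for the flattened parameters, $u_i$ for the sampled unit directions) are notation introduced here for illustration and are not taken from the paper; the Gaussian draw $z_i$ matches the `torch.randn` call in the pseudocode.

$$
\widehat{F}_\gamma(\theta) = \frac{1}{N}\sum_{i=1}^{N}\Big[\mathcal{L}(\theta + \gamma\,u_i) - \mathcal{L}(\theta)\Big],
\qquad u_i = \frac{z_i}{\lVert z_i \rVert_2},\quad z_i \sim \mathcal{N}(0, I),\quad N = 100
$$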

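A simple PyTorch-style sketch (`model`, `evaluate`, and `gamma_list` are placeholders you supply):

import copy
import torch
from torch.nn.utils import parameters_to_vector, vector_to_parameters

theta = parameters_to_vector(model.parameters()).detach()  # all parameters as one vector
direction_vector = torch.randn_like(theta)
unit_direction_vector = direction_vector / torch.norm(direction_vector)
base_loss = evaluate(model)
for gamma in gamma_list:
    noised_model = copy.deepcopy(model)  # leave the original model untouched
    vector_to_parameters(theta + gamma * unit_direction_vector, noised_model.parameters())
    loss_gap = evaluate(noised_model) - base_loss  # repeat over sampled directions and average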
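
The snippet above draws a single direction. A fuller, self-contained sketch that also averages over several sampled directions, as described in the reply, might look like the following. It is only an illustration under stated assumptions, not the repository's actual plotting code: `mean_loss` and `loss_gap_curve` are hypothetical helper names, the loss is computed on one fixed batch as a stand-in for a full test pass, and only `model.parameters()` are perturbed (buffers such as BatchNorm statistics are left as they are).

```python
import copy

import torch
from torch import nn
from torch.nn.utils import parameters_to_vector, vector_to_parameters


@torch.no_grad()
def mean_loss(model, inputs, targets, loss_fn):
    """Loss of `model` on one fixed batch (hypothetical stand-in for a full test pass)."""
    model.eval()
    return loss_fn(model(inputs), targets).item()


@torch.no_grad()
def loss_gap_curve(model, inputs, targets, loss_fn, gammas, n_directions=100, seed=0):
    """Return {gamma: loss gap averaged over `n_directions` random unit directions}."""
    generator = torch.Generator().manual_seed(seed)
    # Flatten every parameter of every layer into one vector.
    theta = parameters_to_vector(model.parameters()).clone()
    base_loss = mean_loss(model, inputs, targets, loss_fn)
    noised = copy.deepcopy(model)  # scratch copy; the weights of `model` stay untouched
    totals = {gamma: 0.0 for gamma in gammas}
    for _ in range(n_directions):
        # Gaussian draw normalised to unit L2 norm, so gamma is the perturbation radius.
        direction = torch.randn(theta.numel(), generator=generator).to(theta)
        unit_direction = direction / direction.norm()
        for gamma in gammas:
            vector_to_parameters(theta + gamma * unit_direction, noised.parameters())
            totals[gamma] += mean_loss(noised, inputs, targets, loss_fn) - base_loss
    return {gamma: total / n_directions for gamma, total in totals.items()}


# Toy usage on random data (assumption: replace with the real model and test set).
torch.manual_seed(0)
toy_model = nn.Linear(4, 3)
x, y = torch.randn(32, 4), torch.randint(0, 3, (32,))
curve = loss_gap_curve(
    toy_model, x, y, nn.CrossEntropyLoss(), gammas=[0.0, 0.5, 1.0], n_directions=10
)
```

Plotting the values of `curve` against its keys gives one loss gap curve. Reusing the same sampled direction for every gamma, as the pseudocode does, keeps the points along the x-axis comparable within a single draw.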

@FrankZhangRp
Author

got it! Very clear! Thanks a lot!

@khanrc closed this as completed May 2, 2022

@Wang-pengfei

> got it! Very clear! Thanks a lot!

The loss gap I get seems to be wrong. Did you solve this problem?

@brisker

brisker commented Nov 18, 2022

About the Figure 3 plot discussed here:

  1. Is the model used to plot Figure 3 the final converged model, or a model taken during training?
  2. Is weight noise added to all parameters in every layer?

@khanrc

@khanrc
Owner

khanrc commented Nov 18, 2022

@brisker

  1. Three converged models are used. The models converge before 1000 steps (see Fig. 5), and we use the models from steps 2500, 3500, and 4500.
  2. Yes.
