implement_gsam_jax #4

juntang-zhuang · 2022-05-18T06:34:57Z

Implements the GSAM algorithm proposed in "Surrogate Gap Minimization Improves Sharpness-Aware Training" (ICLR 2022), an improvement over SAM (Sharpness-Aware Minimization).

When config.rho_max == config.rho_min and config.alpha == 0.0, the GSAM algorithm reduces to SAM.
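To make the reduction concrete, here is a minimal JAX sketch of one GSAM update direction. It is written from the paper's description, not copied from this PR. The function names `tree_dot` and `gsam_direction` and the `lr_max`/`lr_min` arguments are hypothetical. The rule that rho moves linearly with the learning rate between `rho_min` and `rho_max` is also an assumption here, and it is the reason the trainer needs the current learning rate inside the GSAM step.

```python
import jax
import jax.numpy as jnp


def tree_dot(a, b):
  # Inner product of two pytrees with the same structure (real-valued leaves).
  return sum(jnp.sum(x * y) for x, y in
             zip(jax.tree_util.tree_leaves(a), jax.tree_util.tree_leaves(b)))


def gsam_direction(loss_fn, params, lr, *, rho_max, rho_min, alpha,
                   lr_max, lr_min, eps=1e-12):
  """Hypothetical helper: returns direction d; the optimizer steps w <- w - lr * d."""
  # Gradient of the clean loss f at w.
  g = jax.grad(loss_fn)(params)
  # Assumed schedule: rho moves linearly with the learning rate.
  if lr_max == lr_min:
    rho = rho_max
  else:
    rho = rho_min + (rho_max - rho_min) * (lr - lr_min) / (lr_max - lr_min)
  # SAM-style ascent step to the perturbed point w + rho * g / ||g||.
  g_norm = jnp.sqrt(tree_dot(g, g))
  perturbed = jax.tree_util.tree_map(
      lambda w, gi: w + rho * gi / (g_norm + eps), params, g)
  # Gradient of the perturbed loss f_p (usual SAM approximation: grad of f there).
  g_p = jax.grad(loss_fn)(perturbed)
  # Component of the clean gradient g orthogonal to g_p.
  coef = tree_dot(g, g_p) / (tree_dot(g_p, g_p) + eps)
  g_orth = jax.tree_util.tree_map(lambda gi, gpi: gi - coef * gpi, g, g_p)
  # Descend on f_p; the -alpha * g_orth term makes the step move along +g_orth,
  # raising f while leaving f_p unchanged to first order, which shrinks the
  # surrogate gap f_p - f.
  return jax.tree_util.tree_map(lambda gpi, go: gpi - alpha * go, g_p, g_orth)
```

With `alpha=0.0` the returned direction is just `g_p`, the SAM gradient, and with `rho_max == rho_min` the perturbation radius no longer depends on the learning rate, which matches the reduction to SAM stated above.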

akolesnikoff · 2022-05-27T11:55:15Z

Hi,

Thank you for the contribution. As stated in the README, we normally do not accept external contributions, but we are happy to make an exception for open-source implementations of published projects developed in big_vision.

However, according to the codebase principles, project-specific code should not add complexity to core parts of the library, such as the main train loop. Standalone projects are therefore expected to fork the main train loop into big_vision/trainers/proj/<project name>/... and apply the necessary modifications there. We plan to submit an example of how this works soon (~2 weeks from now). Could you wait for the example and then update this pull request accordingly?

juntang-zhuang · 2022-06-01T03:15:56Z

Thanks a lot for the clarification! I will reformat and resubmit later, following the example.

lucasb-eyer · 2022-06-24T22:57:02Z

hey, we now have an example of a project-specific trainer here: https://github.com/google-research/big_vision/tree/main/big_vision/trainers/proj/distill

If you are still interested in submitting gsam (we would like it!), could you sync to head and, instead of modifying the core train.py, fork it into trainers/proj/gsam/train.py and make the modifications there?

Sorry for the delay on our side!

juntang-zhuang · 2022-06-25T07:22:26Z

Thanks a lot for the example! I have moved all changes to big_vision/trainers/proj/gsam, please let me know if it looks good.

lucasb-eyer

Thanks for your patience!

Would it be possible to add an example config? Ideally, one that reproduces a reference run from the paper. It would live in configs/proj/gsam/whatever.py. You would probably fork it from https://github.com/google-research/big_vision/blob/main/big_vision/configs/vit_i1k.py.

Also, in an ideal world, you would actually run this config, show that it matches a number in the paper, and link the result here or at the top of the config. Is that still possible, or can you no longer do that?
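As a rough picture of what such a config could add, here is a hypothetical fragment of configs/proj/gsam/whatever.py. It assumes the usual `get_config()` returning an `ml_collections.ConfigDict`, as in vit_i1k.py. The option names are the ones this PR reads (`rho_max`, `rho_min`, `alpha`, `adaptive_perturbation`, `minimize_fp`). The values are placeholders, not hyperparameters taken from the paper.

```python
import ml_collections as mlc


def get_config():
  """Hypothetical GSAM config fragment; other fields would come from vit_i1k.py."""
  config = mlc.ConfigDict()
  # ... dataset, model, optimizer and schedule fields copied from vit_i1k.py ...

  # GSAM options read by this PR's trainer. Placeholder values, not from the paper.
  config.rho_max = 0.1
  config.rho_min = 0.01
  config.alpha = 0.05
  config.adaptive_perturbation = False
  config.minimize_fp = True
  return config
```

Per the PR description, setting `rho_max == rho_min` and `alpha = 0.0` in the same file would give a SAM baseline for comparison.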

lucasb-eyer · 2022-07-13T19:04:14Z

big_vision/trainers/proj/gsam/train.py

+# limitations under the License.
+
+"""Training loop example.
+This is a basic variant of a training loop, good starting point for fancy ones.


You should probably update this to something like "Trainer that implements SAM/GSAM optimizers"?

lucasb-eyer · 2022-07-13T19:15:15Z

big_vision/trainers/proj/gsam/train.py

+
+    if config.get("GSAM", False):
+      # Get the current learning rate.
+      learning_rate = sched_fns_cpu[0](step)


I highly doubt this is what you want. Note that you're calling a function that has been jit'ed onto the CPU from within a function that is pmap'ed onto GPU/TPU, so a host-device transfer happens at every single step here.

Why not call sched_fns[0](step) instead?
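A minimal sketch of the suggested pattern. A hypothetical jnp-only `sched_fn` (linear warmup, then cosine decay) stands in for the trainer's on-device schedule function, and `update_fn` is a hypothetical plain SGD step. Because the schedule is called inside the compiled update function, the learning rate is computed as part of the same program. `jax.jit` is used so the sketch runs on a single CPU; a Python function called inside a `jax.pmap`-ed function is traced into it in the same way.

```python
import jax
import jax.numpy as jnp


def sched_fn(step, base_lr=0.1, warmup_steps=10, total_steps=100):
  # Hypothetical schedule built only from jnp ops, so it can be traced and compiled.
  step = jnp.asarray(step, jnp.float32)
  warmup = step / warmup_steps
  progress = jnp.clip((step - warmup_steps) / (total_steps - warmup_steps), 0.0, 1.0)
  cosine = 0.5 * (1.0 + jnp.cos(jnp.pi * progress))
  return base_lr * jnp.where(step < warmup_steps, warmup, cosine)


@jax.jit
def update_fn(params, grads, step):
  # The learning rate is computed inside the compiled step: no call to a
  # separately jitted CPU function and no host round-trip per step.
  lr = sched_fn(step)
  new_params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
  return new_params, lr
```

The CPU-compiled copy of a schedule remains useful for host-side work such as logging, where the value is needed on the host anyway.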

lucasb-eyer · 2022-07-13T19:17:36Z

big_vision/trainers/proj/gsam/train.py

+      return getattr(u, config.get("loss", "sigmoid_xent"))(
+          logits=logits, labels=labels)
+
+    if config.get("GSAM", False):


Since we're specifically in gsam/train.py here, we can remove this config variable and the if statement, and always execute the GSAM branch.

lucasb-eyer · 2022-07-13T19:18:31Z

big_vision/trainers/proj/gsam/train.py

+
+  ALPHA = config.get("alpha", 0.05)
+  ADAPTIVE_PERTURBATION = config.get("adaptive_perturbation", False)
+  MINIMIZE_FP = config.get("minimize_fp", True)


Each of these is used exactly once below, so in our code style we would not assign them to variables but inline them where they are used; see the comment below.
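A small sketch of the suggested call-site style. `gsam_gradient` here is a hypothetical stand-in that only echoes its keyword arguments, and a plain dict stands in for the config (both a dict and an `ml_collections.ConfigDict` support `.get(key, default)`). Each option is read once, at its only use, with its default inline, and there is no `config.get("GSAM", False)` guard because this trainer always runs GSAM.

```python
def gsam_gradient(*, alpha, adaptive_perturbation, minimize_fp):
  # Hypothetical stand-in for the trainer's GSAM gradient function.
  return {"alpha": alpha, "adaptive_perturbation": adaptive_perturbation,
          "minimize_fp": minimize_fp}


# Plain dict standing in for the experiment config.
config = {"alpha": 0.1}

# Options inlined at the call site instead of module-level ALPHA,
# ADAPTIVE_PERTURBATION and MINIMIZE_FP constants.
grad_kwargs = gsam_gradient(
    alpha=config.get("alpha", 0.05),
    adaptive_perturbation=config.get("adaptive_perturbation", False),
    minimize_fp=config.get("minimize_fp", True),
)
```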

lucasb-eyer · 2022-07-13T19:27:16Z