Add support and test for KFAC-expand and KFAC-reduce #26
Conversation
Thanks for the PR, I will take a look now. Regarding the scaling issues: do people usually write custom loss functions for the settings you describe, or do they rely on built-in ones? I think our implementation should prioritize easy usage together with PyTorch's built-in modular losses.
They usually rely on built-in ones, and I agree that we should prioritize those. In other words, this discrepancy arises because we can apply both approximations, KFAC-expand and KFAC-reduce, in the expand AND the reduce setting.
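To make the two settings concrete, here is a minimal PyTorch sketch (an illustration, not code from this PR; the mean-pooling step is just one way a reduce setting can arise):

```python
import torch

batch_size, seq_len, num_classes = 4, 8, 10
logits = torch.randn(batch_size, seq_len, num_classes)
targets = torch.randint(num_classes, (batch_size, seq_len))

loss_fn = torch.nn.CrossEntropyLoss(reduction="mean")

# Expand setting: one loss term per token, so the mean runs over
# batch_size * seq_len terms (typical for language modelling).
expand_loss = loss_fn(logits.flatten(0, 1), targets.flatten())

# Reduce setting: per-token outputs are pooled into one prediction per
# example first, so the mean runs over batch_size terms only.
pooled = logits.mean(dim=1)                   # (batch_size, num_classes)
reduce_loss = loss_fn(pooled, targets[:, 0])  # one label per example
```

Either KFAC approximation can be applied in either of these two settings, which is where the scaling discrepancy comes from.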
These are the comments from my first pass through the PR.
There are ~5 bigger refactorings that might make sense to work on before giving it a second pass. I also complained about missing documentation, but don't worry too much about that for now.
Thanks a lot for the thorough review! I think I have addressed all comments, and the test is already looking much better now.
Getting there! I only have one major refactoring request left + many smaller things.
Think we will need 1-2 more rounds to merge 👍
Only very minor things. Ping me once you've gone through them and I will finish things off 👍
Looks good to me! Applied minor refactoring and fixes. Currently running CI and will merge if passing.
Resolves #14 and partially resolves #13.
`singd/optim/utils.py` can probably still be improved; for now I focused on supporting KFAC-expand and KFAC-reduce and on their correctness. However, I think it is already clean enough for a first release.

A note on one design choice I made: here, I assume that the final loss is averaged over `batch_size` terms, since this is always the case for the losses we consider when conv layers are used. In contrast, here I assume that if the KFAC-expand approximation is used for linear layers, the loss was also averaged over the sequence dimension when `batch_averaged=True`. This holds for language modelling, but e.g. with a vision transformer, a classification task, `kfac_approx="expand"`, and `batch_averaged=True`, this will lead to a mismatch between the scale of the preconditioner and the gradient. We could consider adding an additional flag for this or modifying the `batch_averaged` flag.
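As a back-of-the-envelope illustration of this mismatch (a sketch with made-up numbers, not code from the PR):

```python
batch_size, seq_len = 4, 16

# kfac_approx="expand" with batch_averaged=True assumes the loss is a mean
# over batch_size * seq_len terms, as in language modelling.
assumed_terms = batch_size * seq_len

# A vision transformer used for classification emits one prediction per
# image, so its loss is a mean over batch_size terms only.
actual_terms = batch_size

# The preconditioner's scale then disagrees with the gradient's by:
print(assumed_terms / actual_terms)  # 16.0, i.e. seq_len
```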