Adds "full eval" HOWTO. #2111
Adds "full eval" HOWTO. #2111
Conversation
Codecov Report
@@            Coverage Diff             @@
##             main    #2111      +/-   ##
==========================================
+ Coverage   74.91%   75.10%    +0.19%
==========================================
  Files          59       59
  Lines        5042     5094       +52
==========================================
+ Hits         3777     3826       +49
- Misses       1265     1268        +3

Continue to review full report at Codecov.
Looks great!
does not form a complete batch at the end.

The problem
It seems the problem is two-fold: double compilation (both in training and eval), and incorrect metric results (in eval). Is this correct? Maybe it is worth emphasizing this. Currently you state that it is especially important during eval, but you don't explain why.
I see it more like this:

1. In eval, we care about processing the full dataset because otherwise the metrics are off. (In training we usually do multiple epochs, and using some examples one time less for training does not matter.)
2. When we want to avoid losing data at the end, we run into other problems (e.g. multiple compilations).

I think 1. is mentioned in the first paragraph "Especially when evaluating a model, it is important that we process all examples", and 2. is mentioned further down as a disadvantage of some solutions.
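To make the first point concrete, here is a tiny numeric sketch (the numbers are made up): dropping the incomplete last batch skews the metric, while padding the batch and masking the dummy examples keeps it exact.

```python
import jax.numpy as jnp

# Hypothetical numbers: 10 examples, per-device batch size 4, so the last
# 2 examples do not form a complete batch.
correct = jnp.array([1., 1., 0., 1., 1., 1., 0., 1., 1., 1.])  # 8/10 correct

# Dropping the incomplete batch skews the metric:
acc_dropped = correct[:8].mean()                    # 0.75 instead of 0.8

# Padding to a full batch and masking the padded examples keeps it exact:
padded = jnp.concatenate([correct, jnp.zeros(2)])   # 2 dummy examples
mask = jnp.concatenate([jnp.ones(10), jnp.zeros(2)])
acc_padded = (padded * mask).sum() / mask.sum()     # 0.8
```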
One case that might be worth discussing is what to do if you update the metric state inside a compiled `eval_step()`.
The added sub-section "Computing metrics in ``eval_step()``" and the corresponding Colab cell show how to use the new argument to compute metrics inside the `eval_step()`, in response to the comment by @cgarciae above.
@cgarciae, please check out the added section about using a …
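For context, a hedged sketch (not the HOWTO's actual cell; `model` and the batch fields are placeholders) of what updating metric counts inside a compiled, pmapped `eval_step()` with a per-example mask can look like:

```python
import functools
import jax
import jax.numpy as jnp

# The batch carries a `mask` entry that is 1 for real examples and 0 for
# padding, so the metric computed inside the compiled step ignores padding.
@functools.partial(jax.pmap, axis_name='batch')
def eval_step(params, batch):
  logits = model.apply({'params': params}, batch['image'])  # `model` assumed
  correct = (logits.argmax(-1) == batch['label']) * batch['mask']
  # Sum over all devices so every device returns the same global counts.
  count_correct = jax.lax.psum(correct.sum(), axis_name='batch')
  count_total = jax.lax.psum(batch['mask'].sum(), axis_name='batch')
  return count_correct, count_total
```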
Adding "infinite padding" | ||
~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Above solution works in most cases, but it has some limitations: |
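A hedged sketch of the "infinite padding" idea named by this heading (not the HOWTO's code; the limitations referred to above are discussed in the HOWTO itself): a batch iterator keeps yielding all-padding batches after the real data is exhausted, so every host can keep calling the same compiled eval step until a shared stopping criterion fires.

```python
import numpy as np

# `examples` is assumed to be a list of dicts of arrays. Each yielded batch
# tags every example with a `mask` (1.0 = real, 0.0 = padding); once the real
# data is exhausted, all-padding batches are yielded forever.
def infinite_padded_batches(examples, batch_size):
  dummy = {k: np.zeros_like(v) for k, v in examples[0].items()}
  batch = []
  for example in examples:
    batch.append({**example, 'mask': 1.0})
    if len(batch) == batch_size:
      yield batch
      batch = []
  while True:                          # "infinite padding"
    while len(batch) < batch_size:
      batch.append({**dummy, 'mask': 0.0})
    yield batch
    batch = []
```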
Another solution is to let each host process independently and do the pmean(metrics) at the very end in a separate pmapped program.
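A rough sketch of that alternative (summing counts rather than a literal pmean; `host_correct` and `host_total` are made-up names for this host's accumulated counts):

```python
import functools
import jax
import jax.numpy as jnp

# A separate pmapped program that combines per-host counts across all devices
# (and therefore across all hosts, since every host runs it).
@functools.partial(jax.pmap, axis_name='all')
def combine_counts(correct, total):
  return jax.lax.psum(correct, 'all'), jax.lax.psum(total, 'all')

# Usage sketch: place each host's counts on its first local device only
# (zeros elsewhere), so the global psum counts every host exactly once.
n = jax.local_device_count()
correct, total = combine_counts(
    jnp.zeros([n]).at[0].set(host_correct),
    jnp.zeros([n]).at[0].set(host_total))
accuracy = correct[0] / total[0]
```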
Yes, this is actually very similar to what I do with `count_correct_p()` below.

Note that the above two sections, Adding "infinite padding" and Computing metrics in `eval_step()`, can easily be combined, so I think the different use cases are covered with the subsections. But if you feel there is a specific combination that should be added, feel free to add some more specific comments and I'll write it up.
Ah, indeed, a parallel stopping criterion is not so different when you use padding. I was thinking more of a note on what to do when you don't want to pad. You might not want to include that route for simplicity, though.
But without padding, different hosts would have different batch sizes, and this would also lead to re-compilations?

(In that case I think we should keep it simple with "solution=padding", since it seems superior at the cost of a little more complicated code, and that added complication is quite small, especially in the case where one computes the metrics in the main eval loop.)
Left a new comment about …
LGTM @andsteing! Enjoyed the read, it also made me realize all `clu` and `jax_metrics` metrics should support masking.
@cgarciae yes exactly, ideally all metrics support a single `mask` argument.
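To illustrate what such masking support could look like in the simplest case (this is a plain-JAX sketch, not the `clu` or `jax_metrics` API):

```python
import jax.numpy as jnp

# A metric helper that takes a per-example `mask` and ignores padded examples.
def masked_mean(values, mask):
  return (values * mask).sum() / jnp.maximum(mask.sum(), 1)

# A padded batch of 4 where only the first 3 examples are real:
correct = jnp.array([1., 0., 1., 1.])   # the last entry is a padded dummy
mask = jnp.array([1., 1., 1., 0.])
print(masked_mean(correct, mask))        # 2/3, the padded example is ignored
```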
Adds a HOWTO explaining the problem and possible solutions for "full dataset processing", which is especially relevant for computing evaluation metrics.
The added function `flax.jax_utils.pad_shard_unpad()` is copied verbatim from `big_vision` and was created by @lucasb-eyer - thanks! (A rough usage sketch is included at the end of this description.)

For reviewing this PR, see:
https://flax--2111.org.readthedocs.build/en/2111/howtos/full_eval.html
https://colab.research.google.com/github/andsteing/flax/blob/doc/docs/notebooks/full_eval.ipynb
Note that for running the Colab you'll need to replace the line … with …
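As mentioned above, a rough usage sketch of the new wrapper (the linked HOWTO is the authoritative reference; `model`, `variables` and `images` are placeholders):

```python
import functools
import flax
import jax

# `pad_shard_unpad()` pads the (possibly incomplete) last batch so it can be
# sharded over all devices, calls the wrapped pmapped function, and removes
# the padding from the per-example outputs again; the first argument (the
# variables) is passed through unchanged by default.
@functools.partial(jax.pmap, axis_name='batch')
def eval_step(variables, images):
  return model.apply(variables, images).argmax(-1)

predictions = flax.jax_utils.pad_shard_unpad(eval_step)(variables, images)
```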