
Make CodeEval respect device_eval_batch_size #2969

Merged: josejg merged 29 commits into dev on Feb 15, 2024

Conversation

@josejg (Contributor) commented Feb 7, 2024

What does this PR do?

This PR re-implements our CodeEval so that it respects the device eval batch size.

What issue(s) does this change relate to?

Since the previous implementation silently overrode device_eval_batch_size to be generations_per_sample, the rewrite has two clear benefits (see the sketch after the list):

  • Better utilization when generations_per_sample < device_eval_batch_size
  • Prevents OOM when generations_per_sample >> device_eval_batch_size
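
For intuition, here is a minimal sketch of the batching idea, not the PR's actual code: generation requests are flattened (each prompt repeated generations_per_sample times) and then re-chunked so that no generation batch exceeds device_eval_batch_size. The function and variable names below are hypothetical.

```python
def chunk_generation_requests(prompts, generations_per_sample, device_eval_batch_size):
    """Expand each prompt into `generations_per_sample` copies, then yield
    batches of at most `device_eval_batch_size` prompts (hypothetical helper)."""
    expanded = [p for p in prompts for _ in range(generations_per_sample)]
    for start in range(0, len(expanded), device_eval_batch_size):
        yield expanded[start:start + device_eval_batch_size]


# 3 prompts x 20 generations each = 60 requests. With device_eval_batch_size=8
# this yields 8 batches (7 of size 8, 1 of size 4) instead of forcing one batch
# of 20 generations per prompt, which is where the utilization/OOM benefits come from.
batches = list(chunk_generation_requests(["p0", "p1", "p2"], 20, 8))
assert [len(b) for b in batches] == [8, 8, 8, 8, 8, 8, 8, 4]
```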

Regression testing

These are regression tests comparing the PR code against the previous implementation (plus a patch to use temperature 0.2, which was not possible before @maxisawesome's foundry PR; that PR was not merged at the time I ran these experiments). These experiments use generations_per_sample=20, so some variance is expected (see the pass@k note after the table).

| Model | This PR | Before |
| --- | --- | --- |
| meta-llama/Llama-2-7b-hf | 12.47 | 13.69 |
| codellama/CodeLlama-7b-hf | 27.99 | 27.68 |
| codellama/CodeLlama-13b-hf | 30.73 | 30.37 |
| deepseek-ai/deepseek-coder-6.7b-base | 39.82 | 40.3 |
| codellama/CodeLlama-34b-hf | 41.31 | 42.29 |
| deepseek-ai/deepseek-coder-33b-base | 47.96 | 47.56 |
| Phind/Phind-CodeLlama-34B-v2 | 65.12 | 65.27 |
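
On the expected variance: code-eval scores of this kind are typically pass@k estimates computed from a finite number of sampled completions per problem, e.g. with the unbiased estimator from the Codex paper (Chen et al., 2021). The sketch below illustrates that estimator and why n=20 samples per problem leaves room for the roughly one-point run-to-run differences in the table; it is not necessarily Composer's exact implementation.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n sampled completions, c of them correct
    (Chen et al., 2021). Equals 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# With n=20 samples at temperature 0.2, a single extra correct sample on one
# problem moves that problem's pass@1 estimate by 5 points (1/20), so small
# aggregate shifts between runs are expected.
print(pass_at_k(n=20, c=3, k=1))  # 0.15
print(pass_at_k(n=20, c=4, k=1))  # 0.20
```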

Before submitting

  • Have you read the contributor guidelines?
  • Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
  • Did you update any related docs and document your change?
  • Did you update any related tests and add any new tests related to your change? (see testing)
  • Did you run the tests locally to make sure they pass?
  • Did you run pre-commit on your change? (see the pre-commit section of prerequisites)

@dakinggg (Contributor) left a comment

Overall LGTM. The metric thing is a little gross, but the current design doesn't afford any other options that I see, and I think it's safe, so I'm good with it.

@dakinggg (Contributor) left a comment

Could you include in the PR description evidence that results don't change before and after this PR (assuming you carefully set the hparams to be the same)? Or, if that isn't possible for some reason, at least a number on a popular model that we can match against something. Otherwise, LGTM; I will approve once you have that.

@josejg (Contributor, Author) commented Feb 14, 2024


Added experiments on public models to the PR description.

@josejg (Contributor, Author) commented Feb 14, 2024

I think we have a couple of tests that have the right intention but are testing the wrong thing:

The idea is to make sure that inputs aren't overly left-padded. The issue is that the ICL dataset maps using dataset.map and decides the left padding based on the entire dataset; we were not catching that before because we were only looking across all the examples.

To be fair, the test is checking for the ideal behaviour; the issue is that our ICL code doesn't do that, and that contribution feels out of scope for this PR.

@josejg (Contributor, Author) commented Feb 14, 2024

Reworked some of the tests. However, I'd like to know whether we can remove unnecessary left padding on a per-batch basis.
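
For illustration only (not code from this PR), here is a minimal sketch of what per-batch trimming of excess left padding could look like, assuming left-padded input_ids with a matching attention_mask; the function name and the pad-token id of 0 are assumptions for the example.

```python
import torch


def trim_left_padding(input_ids: torch.Tensor, attention_mask: torch.Tensor):
    """Drop leading columns that are padding for every row in this batch.

    Assumes left padding, i.e. each row's attention_mask looks like
    [0, ..., 0, 1, ..., 1]; columns that are 0 for all rows carry no tokens.
    """
    col_is_pad = ~attention_mask.bool().any(dim=0)            # True where the whole column is padding
    shared_pad = int(col_is_pad.long().cumprod(dim=0).sum())  # count of leading all-padding columns
    return input_ids[:, shared_pad:], attention_mask[:, shared_pad:]


# The dataset-wide padding length is 8, but this particular batch only needs 5.
ids = torch.tensor([[0, 0, 0, 0, 0, 11, 12, 13],
                    [0, 0, 0, 7, 8, 9, 10, 11]])   # 0 is the assumed pad token id
mask = (ids != 0).long()
trimmed_ids, trimmed_mask = trim_left_padding(ids, mask)
assert trimmed_ids.shape == (2, 5)
```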

@dakinggg (Contributor) left a comment

@josejg It looks good to me, and you are right that the tests were simply incorrect before. Are you saying you want to trim the extra left padding from each individual batch (since padding is determined based on the full dataset as opposed to each individual batch)?

@dakinggg (Contributor) left a comment

🚢

josejg merged commit 6a5972f into dev on Feb 15, 2024 (14 checks passed).
josejg deleted the batch_code_eval branch on February 15, 2024 at 16:27.
josejg restored the batch_code_eval branch on February 16, 2024 at 08:00.