
Dynamically set max_new_tokens based on output feature length, GMSL and model window size #3713

Merged
merged 14 commits into master from max_sequence_length on Oct 13, 2023

Conversation

arnavgarg1
Contributor

@arnavgarg1 arnavgarg1 commented Oct 10, 2023

TLDR; This fixes a bug in model.evaluate() that caused underreporting of performance metrics in a majority of cases. By setting max_new_tokens to the largest value that could be needed, we ensure that the metrics reported by model.evaluate() are a good representation of true model performance.


This PR updates the config validation setter methods to ensure that the generation.max_new_tokens parameter in the model configuration is set correctly based on the maximum sequence length of the output features. This ensures accurate token generation for LLM model types. It also handles the different fallback cases described below and logs any resulting configuration changes.

Specifically, max_new_tokens is resolved in the following order (sketched in code after this list):

  1. If the user explicitly sets generation.max_new_tokens to a value other than the default, that value is respected as is.
  2. Otherwise, we fall back to the max_sequence_length set on the output feature, since that is the maximum number of tokens the model learns to generate during fine-tuning.
  3. Otherwise, we fall back to the global_max_sequence_length if neither case 1 nor case 2 applies. This overestimates the number of required tokens, but it is acceptable for now.
  4. Finally, if none of cases 1, 2, or 3 applies, we set max_new_tokens to half of the model's context window size. This feels like a good tradeoff between inference/evaluation time and covering the majority of the tokens allocated to the output feature; the value can be revisited in the future.
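Below is a minimal sketch of that resolution order. The function name, default constant, and config attribute paths are assumptions for illustration, not the exact code in ludwig/schema/model_types/utils.py:

```python
# Illustrative sketch only: the names and config attribute paths below are
# assumptions, not the exact implementation in this PR.
DEFAULT_MAX_NEW_TOKENS = 32  # placeholder for the schema's default value


def resolve_max_new_tokens(config, model_window_size: int) -> int:
    generation = config.generation

    # Case 1: the user explicitly overrode max_new_tokens, so respect it as is.
    if generation.max_new_tokens != DEFAULT_MAX_NEW_TOKENS:
        return generation.max_new_tokens

    # Case 2: fall back to the output feature's max_sequence_length, the
    # longest target the model sees during fine-tuning.
    output_feature = config.output_features[0]
    max_seq_len = getattr(output_feature.preprocessing, "max_sequence_length", None)
    if max_seq_len is not None:
        return max_seq_len

    # Case 3: fall back to the global max sequence length (an overestimate, but safe).
    gmsl = getattr(config.preprocessing, "global_max_sequence_length", None)
    if gmsl is not None:
        return gmsl

    # Case 4: no length information available; use half the model's context
    # window as a tradeoff between coverage and inference/evaluation time.
    return model_window_size // 2
```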

@github-actions

github-actions bot commented Oct 10, 2023

Unit Test Results

6 files, 6 suites, 23m 2s ⏱️
12 tests: 9 ✔️ passed, 3 💤 skipped, 0 failed
60 runs: 42 ✔️ passed, 18 💤 skipped, 0 failed

Results for commit d40b75f.

♻️ This comment has been updated with latest results.

@arnavgarg1 arnavgarg1 changed the title Dynamically set max_new_tokens based on output feature length, GMSL and model window size [WIP] Dynamically set max_new_tokens based on output feature length, GMSL and model window size Oct 10, 2023
@arnavgarg1 arnavgarg1 marked this pull request as draft October 10, 2023 23:37
@arnavgarg1 arnavgarg1 changed the title [WIP] Dynamically set max_new_tokens based on output feature length, GMSL and model window size Dynamically set max_new_tokens based on output feature length, GMSL and model window size Oct 11, 2023
@arnavgarg1 arnavgarg1 marked this pull request as ready for review October 11, 2023 19:06
@@ -57,6 +59,29 @@ def test_set_pad_token_already_exists():
assert tokenizer.pad_token_id == 1


class TestSetContextLen:
Collaborator

Interesting mechanic for grouping tests - curious if you saw this pattern recommended somewhere?

Contributor Author

@justinxzhao Definitely didn't see it recommended anywhere, but I wanted to find a logical way to group these tests together since they're about the same "topic" but testing different aspects of it, so decided to write a class. Is that fine, or would you like me to just write 4 individual tests?

Contributor Author

@arnavgarg1 arnavgarg1 Oct 12, 2023

The idea in general is that since they're all testing the same function but different scenarios, it makes sense to either put them all in the same dedicated module for clarity or in some sort of container like a class. Typically you can just use parameterization but this one would require a lot of conditionals to be used in the test so I decided to skip it. Alternatively, I could also combine them all into one test. All options are ok - no strong preference.
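A hypothetical, self-contained illustration of the two options discussed in this thread (a plain test class for grouping related scenarios vs. a single parametrized test); the function under test is a stand-in, not code from this PR:

```python
import pytest


def pick_limit(user_value=None, feature_len=None, default=256):
    """Stand-in for the resolution logic under test (hypothetical)."""
    if user_value is not None:
        return user_value
    if feature_len is not None:
        return feature_len
    return default


class TestPickLimit:
    """Option A: group related scenarios in a class without shared state."""

    def test_user_override_wins(self):
        assert pick_limit(user_value=64) == 64

    def test_falls_back_to_feature_length(self):
        assert pick_limit(feature_len=128) == 128


# Option B: a single parametrized test covering the same scenarios.
@pytest.mark.parametrize(
    "kwargs,expected",
    [({"user_value": 64}, 64), ({"feature_len": 128}, 128), ({}, 256)],
)
def test_pick_limit_parametrized(kwargs, expected):
    assert pick_limit(**kwargs) == expected
```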

Collaborator

I can see this being a useful way to organize tests particularly for very large test files. It's a bit more to maintain, but seems sufficiently lightweight.

Contributor Author

Got it, let me split them up!

ludwig/schema/model_types/utils.py (outdated review thread, resolved)
ludwig/schema/model_types/utils.py (outdated review thread, resolved)

@arnavgarg1 arnavgarg1 merged commit cb1f7d9 into master Oct 13, 2023
17 checks passed
@arnavgarg1 arnavgarg1 deleted the max_sequence_length branch October 13, 2023 08:17