
Dynamically set max_new_tokens based on output feature length, GMSL and model window size #3713

Merged
merged 14 commits into master from max_sequence_length on Oct 13, 2023

Conversation

arnavgarg1
Contributor

@arnavgarg1 arnavgarg1 commented Oct 10, 2023

TLDR; This fixes a bug in model.evaluate() that caused underreporting of performance metrics in a majority of cases. By setting max_new_tokens to the largest value that could be needed, we ensure that the metrics reported by model.evaluate() are a good representation of true model performance.


This PR updates the config validation setter methods to ensure that the generation.max_new_tokens parameter in the model configuration is set correctly based on the maximum sequence length of the output features. This ensures accurate token generation for LLM model types. It also handles the different fallback cases described below and logs any resulting configuration changes.

Specifically, max_new_tokens is resolved in the following order (sketched in code after this list):

  1. If the user explicitly sets generation.max_new_tokens to a value other than the default, that value is respected as is.
  2. Otherwise, we fall back to the max_sequence_length set on the output feature, since that is the maximum number of tokens the model learns to generate during fine-tuning.
  3. Otherwise, we fall back to the global_max_sequence_length if neither case 1 nor case 2 applies. This overestimates the number of required tokens, but it is acceptable for now.
  4. Finally, if none of cases 1, 2, or 3 applies, we set max_new_tokens to half of the model's context window size. This feels like a good tradeoff between inference/evaluation time and covering the majority of the tokens allocated to the output feature; the value can be revisited in the future.
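Below is a minimal sketch of that resolution order. The function name, default constant, and config attribute paths are assumptions for illustration, not the exact code in ludwig/schema/model_types/utils.py:

```python
# Illustrative sketch only: the names and config attribute paths below are
# assumptions, not the exact implementation in this PR.
DEFAULT_MAX_NEW_TOKENS = 32  # placeholder for the schema's default value


def resolve_max_new_tokens(config, model_window_size: int) -> int:
    generation = config.generation

    # Case 1: the user explicitly overrode max_new_tokens, so respect it as is.
    if generation.max_new_tokens != DEFAULT_MAX_NEW_TOKENS:
        return generation.max_new_tokens

    # Case 2: fall back to the output feature's max_sequence_length, the
    # longest target the model sees during fine-tuning.
    output_feature = config.output_features[0]
    max_seq_len = getattr(output_feature.preprocessing, "max_sequence_length", None)
    if max_seq_len is not None:
        return max_seq_len

    # Case 3: fall back to the global max sequence length (an overestimate, but safe).
    gmsl = getattr(config.preprocessing, "global_max_sequence_length", None)
    if gmsl is not None:
        return gmsl

    # Case 4: no length information available; use half the model's context
    # window as a tradeoff between coverage and inference/evaluation time.
    return model_window_size // 2
```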

@github-actions

github-actions bot commented Oct 10, 2023

Unit Test Results

6 files, 6 suites, 23m 2s ⏱️
12 tests: 9 ✔️ passed, 3 💤 skipped, 0 failed
60 runs: 42 ✔️ passed, 18 💤 skipped, 0 failed

Results for commit d40b75f.

♻️ This comment has been updated with latest results.

@arnavgarg1 arnavgarg1 changed the title Dynamically set max_new_tokens based on output feature length, GMSL and model window size [WIP] Dynamically set max_new_tokens based on output feature length, GMSL and model window size Oct 10, 2023
@arnavgarg1 arnavgarg1 marked this pull request as draft October 10, 2023 23:37
@arnavgarg1 arnavgarg1 changed the title [WIP] Dynamically set max_new_tokens based on output feature length, GMSL and model window size Dynamically set max_new_tokens based on output feature length, GMSL and model window size Oct 11, 2023
@arnavgarg1 arnavgarg1 marked this pull request as ready for review October 11, 2023 19:06
@@ -57,6 +59,29 @@ def test_set_pad_token_already_exists():
assert tokenizer.pad_token_id == 1


class TestSetContextLen:
Collaborator

Interesting mechanic for grouping tests - curious if you saw this pattern recommended somewhere?

Contributor Author

@justinxzhao Definitely didn't see it recommended anywhere, but I wanted to find a logical way to group these tests together since they're about the same "topic" but testing different aspects of it, so decided to write a class. Is that fine, or would you like me to just write 4 individual tests?

Contributor Author

@arnavgarg1 arnavgarg1 Oct 12, 2023

The idea in general is that since they're all testing the same function but different scenarios, it makes sense to either put them all in the same dedicated module for clarity or in some sort of container like a class. Typically you can just use parameterization but this one would require a lot of conditionals to be used in the test so I decided to skip it. Alternatively, I could also combine them all into one test. All options are ok - no strong preference.
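A hypothetical, self-contained illustration of the two options discussed in this thread (a plain test class for grouping related scenarios vs. a single parametrized test); the function under test is a stand-in, not code from this PR:

```python
import pytest


def pick_limit(user_value=None, feature_len=None, default=256):
    """Stand-in for the resolution logic under test (hypothetical)."""
    if user_value is not None:
        return user_value
    if feature_len is not None:
        return feature_len
    return default


class TestPickLimit:
    """Option A: group related scenarios in a class without shared state."""

    def test_user_override_wins(self):
        assert pick_limit(user_value=64) == 64

    def test_falls_back_to_feature_length(self):
        assert pick_limit(feature_len=128) == 128


# Option B: a single parametrized test covering the same scenarios.
@pytest.mark.parametrize(
    "kwargs,expected",
    [({"user_value": 64}, 64), ({"feature_len": 128}, 128), ({}, 256)],
)
def test_pick_limit_parametrized(kwargs, expected):
    assert pick_limit(**kwargs) == expected
```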

Collaborator

I can see this being a useful way to organize tests particularly for very large test files. It's a bit more to maintain, but seems sufficiently lightweight.

Contributor Author

Got it, let me split them up!

ludwig/schema/model_types/utils.py (outdated review thread, resolved)
ludwig/schema/model_types/utils.py (outdated review thread, resolved)

@arnavgarg1 arnavgarg1 merged commit cb1f7d9 into master Oct 13, 2023
17 checks passed
@arnavgarg1 arnavgarg1 deleted the max_sequence_length branch October 13, 2023 08:17