Add batch size tuning for LLMs #3871
Conversation
ludwig/trainers/trainer_llm.py (Outdated)
```python
input_msl = input_feature.input_shape[0]
output_msl = output_feature.output_shape[0]
```
Is this the most reliable way to get the MSL? Or should we be looking up properties in the feature's preprocessing configuration?
Based on what I see here, it looks like at least the input_shape should provide a tighter upper bound than looking at the preprocessing configuration, but maybe I'm misinterpreting that line. However, I do think output_shape needs to be changed.
Ok! Can you leave a quick comment about this in the code?
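A hedged sketch of what that requested inline comment might look like (the wording is a suggestion based on this thread, not the merged code):

```python
# input_shape[0] reflects the longest input sequence observed in the training
# data, so it gives a tighter upper bound than the preprocessing config's
# max_sequence_length, which is only a cap.
input_msl = input_feature.input_shape[0]
# output_shape[0] alone can be too loose; a later commit in this PR tightens it
# with the preprocessing max_sequence_length when that is set (see below).
output_msl = output_feature.output_shape[0]
```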
ludwig/trainers/trainer_llm.py (Outdated)
```python
snapshot_weights: bool = True,
on_best_batch_size_updated: Optional[Callable[[int, float, int], None]] = None,
tune_for_training: bool = True,
max_sequence_length: Optional[int] = None,
```
nit: Rename to `global_max_sequence_length`?
ludwig/trainers/trainer_llm.py (Outdated)
```python
if not self.vocab_size:
    self.vocab_size = len(trainer.model.config_obj.input_features[0].encoder.vocab)
```
nit: Since the `trainer` object is available outside of the `_TrainerBatchSizeEvaluator` class, we don't actually need to add this if condition. Instead, we can just do

```python
# This is useful to create the synthetic input and target data, which will be a
# random sequence of integers between 0 and vocab_size.
self.vocab_size = len(trainer.model.config_obj.input_features[0].encoder.vocab)
```

in the constructor itself.
ludwig/trainers/trainer_llm.py (Outdated)
```python
if trainer.model.config_obj.output_features[0].preprocessing.max_sequence_length:
    self.output_msl = trainer.model.config_obj.output_features[0].preprocessing.max_sequence_length
```
I think this can also move into the constructor, something like this:

```python
# Get the length of the longest output sequence from the training data.
self.output_msl = self.output_feature.output_shape[0]
if trainer.model.config_obj.output_features[0].preprocessing.max_sequence_length:
    # max_sequence_length here is the smaller value between the model's global
    # max sequence length and the model's context length.
    self.output_msl = trainer.model.config_obj.output_features[0].preprocessing.max_sequence_length
```
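Putting the two constructor suggestions together, `__init__` might end up roughly like the sketch below. This is illustrative, not the merged code; it assumes `self.output_feature` is resolved earlier in the constructor (not shown):

```python
def __init__(self, trainer):
    self.trainer = trainer
    # self.output_feature is assumed to be set above; only the two suggested
    # additions from this review are sketched here.

    # Useful for creating the synthetic input and target data, which will be
    # random sequences of integers between 0 and vocab_size.
    self.vocab_size = len(trainer.model.config_obj.input_features[0].encoder.vocab)

    # Start from the longest output sequence in the training data, then
    # tighten it to the preprocessing max_sequence_length when that is set
    # (the smaller of the model's global max sequence length and its
    # context length).
    self.output_msl = self.output_feature.output_shape[0]
    if trainer.model.config_obj.output_features[0].preprocessing.max_sequence_length:
        self.output_msl = trainer.model.config_obj.output_features[0].preprocessing.max_sequence_length
```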
ludwig/utils/batch_size_tuner.py (Outdated)
```diff
@@ -51,7 +52,9 @@ def _is_valid_batch_size(batch_size):
         gc.collect()

         try:
-            samples_per_sec = self.evaluate(batch_size, total_steps=5)
+            samples_per_sec = self.evaluate(
+                batch_size, total_steps=5, global_max_sequence_length=global_max_sequence_length
+            )
```
nit: Make this a constant in the file.
I'm not quite sure I understand. Why should `samples_per_sec` be a constant?
@Infernaught I think @justinxzhao was referring to the 5 in `total_steps=5`.
Ohh I see
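For the record, the nit sketched out: hoist the literal 5 into a module-level constant in ludwig/utils/batch_size_tuner.py. The constant name and the wrapper function below are my own illustration, not the merged code:

```python
TOTAL_STEPS = 5  # evaluation steps run per candidate batch size


def evaluate_candidate(evaluator, batch_size, global_max_sequence_length):
    # Mirrors the call site in the diff above, with the magic number replaced.
    return evaluator.evaluate(
        batch_size, total_steps=TOTAL_STEPS, global_max_sequence_length=global_max_sequence_length
    )
```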
```python
def test_llm_batch_size_tuning():
    dataset = pd.DataFrame({"instruction": ["a"] * 100, "output": ["a"] * 100})
    config = yaml.safe_load(
        """
        model_type: llm
        input_features:
          - name: instruction
            type: text
        output_features:
          - name: output
            type: text
        prompt:
          template: >-
            {instruction}
```
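The excerpt cuts off before the trainer section. For context, what would actually exercise the tuner in a config like this is a trainer block along the following lines; this is a sketch based on Ludwig's `batch_size: auto` behavior, not the test's literal YAML:

```python
# Assumed continuation of a config like the one above: batch_size "auto" is
# what triggers batch size tuning for the finetune trainer.
trainer_section = yaml.safe_load(
    """
    trainer:
      type: finetune
      batch_size: auto
    """
)
```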
@Infernaught Seeing the same test twice? On line 1258 and line 1348.
I think pre-commit is also complaining about this
Interesting. Probably an issue with merging?
Likely so
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Arnav Garg <arnav@predibase.com>
This PR extends Ludwig's batch size tuning functionality to LLMs.

For each candidate batch size, we generate synthetic data as follows. We consider three values:

(1) The sum of the max_sequence_lengths of the input feature and the output feature.
(2) The global_max_sequence_length.
(3) The model's context length.

If (1) is the smallest, we generate synthetic inputs and outputs at their respective max_sequence_lengths.
If (2) is the smallest, we generate synthetic inputs and outputs, each of length global_max_sequence_length/2 + 1.
If (3) is the smallest, we generate synthetic inputs and outputs, each of length context_len/2 + 1.
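A minimal sketch of that selection logic plus the synthetic batch construction, assuming integer division for the /2 and using the names `input_msl`, `output_msl`, `context_len`, and `vocab_size` from the threads above (the helper names are hypothetical):

```python
import torch


def pick_synthetic_lengths(input_msl, output_msl, global_max_sequence_length, context_len):
    """Choose (input_len, output_len) for the synthetic tuning batch."""
    total_msl = input_msl + output_msl  # value (1)
    smallest = min(total_msl, global_max_sequence_length, context_len)
    if smallest == total_msl:
        # (1) wins: use each feature's own max sequence length.
        return input_msl, output_msl
    # (2) or (3) wins: input and output each get half the winning budget,
    # plus one, per the description above.
    return smallest // 2 + 1, smallest // 2 + 1


def make_synthetic_batch(batch_size, vocab_size, input_len, output_len):
    """Random token ids in [0, vocab_size), standing in for real data."""
    inputs = torch.randint(0, vocab_size, (batch_size, input_len))
    targets = torch.randint(0, vocab_size, (batch_size, output_len))
    return inputs, targets
```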