
Add effective_batch_size to auto-adjust gradient accumulation #3533

Merged: 29 commits into master on Aug 23, 2023

Conversation

tgaddair (Collaborator)

For LLM fine-tuning, it's often the case that the batch size per GPU (batch_size in Ludwig) is very small (1 or 2), but the ideal batch size for model convergence is 32 or so. In these cases, we want to use gradient accumulation to compensate for the low batch size. Additionally, adding training workers with DeepSpeed further increases the effective batch size, meaning the user needs to do a lot of quick math to figure out how to set the gradient_accumulation_steps.

This PR adds a new trainer param called effective_batch_size that auto-adjusts gradient accumulation based on this value.
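For illustration, a minimal config sketch using the new parameter might look like the following. Everything except `effective_batch_size` (the parameter this PR adds) and the existing `batch_size`/`gradient_accumulation_steps` trainer params is a hypothetical placeholder, not something prescribed by this PR:

```python
# Sketch of a Ludwig-style config dict; feature names are hypothetical placeholders.
config = {
    "input_features": [{"name": "prompt", "type": "text"}],
    "output_features": [{"name": "response", "type": "text"}],
    "trainer": {
        "batch_size": 2,                        # per-GPU micro-batch that fits in memory
        "effective_batch_size": 32,             # new in this PR: total samples per weight update
        "gradient_accumulation_steps": "auto",  # derived from the values above and num_workers
    },
}
```

With 4 training workers, the trainer would then need 32 / (2 * 4) = 4 accumulation steps, which is exactly the arithmetic the user no longer has to do by hand.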

github-actions bot commented Aug 15, 2023

Unit Test Results

6 files ±0, 6 suites ±0, duration 1h 6m 47s (−2m 38s)
34 tests ±0: 29 passed ✔️ ±0, 5 skipped 💤 ±0, 0 ±0
88 runs ±0: 72 passed ✔️ ±0, 16 skipped 💤 ±0, 0 ±0

Results for commit 49d264a, compared against base commit 090918d.


Comment on lines +193 to +200
"The effective batch size is the total number of samples used to compute a single gradient update "
"to the model weights. This differs from `batch_size` by taking `gradient_accumulation_steps` and number "
"of training worker processes into account. In practice, "
"`effective_batch_size = batch_size * gradient_accumulation_steps * num_workers`. "
"If 'auto', the effective batch size is derivied implicitly from `batch_size`, but if set explicitly, then "
"one of `batch_size` or `gradient_accumulation_steps` must be set to something other than 'auto', and "
"consequently will be set following the formula given above."
),
Contributor

Nice, this is a very clear description of how this works
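As a quick worked example of the formula in this docstring, here is a standalone sketch (not Ludwig code; the helper name is invented for illustration) that backs out `gradient_accumulation_steps` from an explicitly set effective batch size:

```python
def derive_grad_accum_steps(effective_batch_size: int, batch_size: int, num_workers: int) -> int:
    """Hypothetical helper: solve
    effective_batch_size = batch_size * gradient_accumulation_steps * num_workers
    for gradient_accumulation_steps, with a floor of 1 step."""
    return max(1, effective_batch_size // (batch_size * num_workers))

# e.g. per-GPU batch_size=2, 4 DeepSpeed workers, target effective batch size 32:
# 32 // (2 * 4) = 4 accumulation steps.
assert derive_grad_accum_steps(32, batch_size=2, num_workers=4) == 4
```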

Comment on lines +799 to +801
if not self.config_obj.trainer.can_tune_batch_size():
# Models like GBMs don't have batch sizes to be tuned
return
Contributor

Is it worth logging a message here to indicate this very clearly?

Collaborator Author

This code path always gets executed regardless of user config, so I wouldn't add a message. Would likely confuse the user.

@arnavgarg1 (Contributor) left a comment

LGTM! This is an awesome change.

One thing I wanted to confirm is the exact behavior when batch_size, effective_batch_size, and gradient_accumulation_steps are all set to auto. Is it that the first call to self.config_obj.trainer.update_batch_size_grad_accum(num_workers) leaves batch_size and gradient_accumulation_steps unchanged, batch size tuning then sets batch_size to a concrete number, and once we have both the workers and the tuned batch size we recompute: batch_size stays the same, gradient_accumulation_steps is set to 1, and effective_batch_size stays auto? And in the case where effective_batch_size is not auto, we calculate gradient_accumulation_steps because we now have batch_size, num_workers, and the desired effective batch size?

@tgaddair (Collaborator Author)

@arnavgarg1 if everything is set to auto (the default), then it's equivalent to the current behavior where we just set the batch size to maximize GPU utilization and then set gradient accumulation to 1.
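Sketching the two cases from this exchange in code (an illustrative standalone function, not Ludwig's actual `update_batch_size_grad_accum` logic; the name and signature are made up):

```python
def resolve_grad_accum(effective_batch_size, tuned_batch_size: int, num_workers: int) -> int:
    """Assumed behavior after batch size tuning has picked a concrete `tuned_batch_size`."""
    if effective_batch_size == "auto":
        # All-auto case: keep the tuned batch size and use no gradient accumulation.
        return 1
    # Explicit effective batch size: invert
    # effective_batch_size = batch_size * gradient_accumulation_steps * num_workers.
    return max(1, effective_batch_size // (tuned_batch_size * num_workers))

print(resolve_grad_accum("auto", tuned_batch_size=16, num_workers=2))  # 1
print(resolve_grad_accum(64, tuned_batch_size=2, num_workers=4))       # 8
```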

tgaddair merged commit 8d4c96b into master on Aug 23, 2023 (15 of 16 checks passed).
tgaddair deleted the total-bs branch on August 23, 2023 at 16:21.