feat(train): add accelerate for multi gpu training #2154

pkooij · 2025-10-09T13:18:38Z

This PR adds Accelerate integration and docs to LeRobot. We keep it basic for now and do not yet expose all Accelerate options (DeepSpeed, etc.).

- We use Accelerate even with a single GPU. This keeps the code in `lerobot_train.py` cleaner and lets us rely on its utilities for device discovery, etc.
- Added docs.
- Only the main process does logging, uploading and dataset downloading.
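The "only the main process logs, uploads and downloads" rule boils down to gating side effects on a per-process flag. A minimal sketch, assuming the flag is passed in by the caller (in an Accelerate script it would typically be `accelerator.is_main_process`); the decorator name `main_process_only` is hypothetical and is not part of LeRobot or Accelerate:

```python
import functools
from typing import Any, Callable, List, Optional


def main_process_only(is_main_process: bool) -> Callable[[Callable[..., Any]], Callable[..., Optional[Any]]]:
    """Hypothetical decorator: run the wrapped function only on the main process.

    On every other process the call is skipped and returns None, so side effects
    such as logging, pushing a checkpoint to the Hub or downloading a dataset
    happen exactly once per run.
    """
    def decorator(fn: Callable[..., Any]) -> Callable[..., Optional[Any]]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Optional[Any]:
            if not is_main_process:
                return None  # non-main process: skip the side effect
            return fn(*args, **kwargs)
        return wrapper
    return decorator


# Simulate 4 processes; in a real run each process executes this once with its own flag.
uploads: List[int] = []
for rank in range(4):
    @main_process_only(is_main_process=(rank == 0))
    def upload_checkpoint(step: int) -> str:
        uploads.append(rank)
        return f"rank {rank} uploaded step {step}"

    result = upload_checkpoint(1_000)

print(uploads)  # [0]
```

For dataset downloading, gating alone is not enough: the other processes must not read the dataset before the main process has finished fetching it, so a barrier such as `accelerator.wait_for_everyone()` would follow the main-process download.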

Tested:

- single-GPU training on Mac
- single-GPU training on H100
- 4x multi-GPU training on H100

- Added support for multi-GPU training by introducing an `accelerator` parameter in training functions.
- Updated `update_policy` to handle gradient updates based on the presence of an accelerator.
- Modified logging to prevent duplicate messages in non-main processes.
- Enhanced the `set_seed` and `get_safe_torch_device` functions to accommodate accelerator usage.
- Updated `MetricsTracker` to account for the number of processes when calculating metrics.
- Introduced a new feature in `pyproject.toml` for the `accelerate` library dependency.
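Why `MetricsTracker` needs the number of processes can be shown with a simplified, hypothetical tracker (`SketchMetricsTracker`, not LeRobot's actual class). It assumes each process draws its own batch of `batch_size` samples per optimizer step, so one global step consumes `batch_size * num_processes` samples; in an Accelerate script `num_processes` would come from `accelerator.num_processes`.

```python
from dataclasses import dataclass


@dataclass
class SketchMetricsTracker:
    """Hypothetical, simplified tracker; not LeRobot's MetricsTracker.

    Assumes every process consumes its own per-process batch each step.
    """

    batch_size: int          # per-process batch size
    num_frames: int          # total number of frames in the dataset
    num_processes: int = 1   # e.g. accelerator.num_processes
    steps: int = 0
    samples: int = 0

    def step(self) -> None:
        """Record one optimizer step across all processes."""
        self.steps += 1
        self.samples += self.batch_size * self.num_processes

    @property
    def epochs(self) -> float:
        """Fraction of the dataset seen so far, summed over all processes."""
        return self.samples / self.num_frames


single = SketchMetricsTracker(batch_size=8, num_frames=1000)
multi = SketchMetricsTracker(batch_size=8, num_frames=1000, num_processes=4)
for _ in range(100):
    single.step()
    multi.step()

print(single.samples, single.epochs)  # 800 0.8
print(multi.samples, multi.epochs)    # 3200 3.2
```

Without the `num_processes` factor, a 4-GPU run would under-report samples and epochs by 4x for the same number of steps.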

…esses

- Added `init_logging` calls to ensure proper logging setup both when using the accelerator and in standard training mode.
- This change enhances the clarity and consistency of logging during training sessions.
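One common way to keep multi-process logs readable is to give each process a logger whose level depends on whether it is the main process. A minimal sketch using only the standard `logging` module; `init_logging_for_rank` is a hypothetical helper (not LeRobot's `init_logging` signature), and the `is_main_process` flag is assumed to come from `accelerator.is_main_process`:

```python
import logging


def init_logging_for_rank(is_main_process: bool, name: str = "train_sketch") -> logging.Logger:
    """Hypothetical helper: INFO on the main process, WARNING elsewhere.

    Every process still reports warnings and errors, but routine INFO lines
    (step metrics, checkpoint paths) are printed once instead of once per GPU.
    """
    logger = logging.getLogger(name)
    logger.handlers.clear()  # make repeated calls idempotent
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(levelname)s %(asctime)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO if is_main_process else logging.WARNING)
    logger.propagate = False  # avoid a second copy via the root logger
    return logger


main_logger = init_logging_for_rank(is_main_process=True, name="rank0")
worker_logger = init_logging_for_rank(is_main_process=False, name="rank1")
main_logger.info("step 100 | example message")    # emitted
worker_logger.info("step 100 | example message")  # suppressed
```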

HuggingFaceDocBuilderDev · 2025-10-09T13:22:25Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

…gface/lerobot into feat/accelerate-melt-gpus

michel-aractingi

First look at the PR: it's done very well, great job!
I just have two comments; I'll take a deeper dive tomorrow and test it.

src/lerobot/utils/utils.py

src/lerobot/scripts/lerobot_train.py

src/lerobot/utils/utils.py

michel-aractingi

For me it's LGTM; waiting for @imstevenpmwork's approval.

AdilZouitine and others added 3 commits October 2, 2025 18:11

add docs and only push model once

4b7cd72

pkooij assigned AdilZouitine and pkooij Oct 9, 2025

pkooij added enhancement Suggestions for new features or improvements policies Items related to robot policies labels Oct 9, 2025

Merge branch 'main' into feat/accelerate-melt-gpus

52751e8

pkooij and others added 21 commits October 10, 2025 10:09

Merge branch 'main' into feat/accelerate-melt-gpus

629bbca

Place logging under accelerate and update docs

95b6035

Merge branch 'feat/accelerate-melt-gpus' of https://github.com/huggin…

d709acf

…gface/lerobot into feat/accelerate-melt-gpus

fix pre commit

771b03c

Merge branch 'main' into feat/accelerate-melt-gpus

deaeb42

only log in main process

b65172f

Merge branch 'feat/accelerate-melt-gpus' of https://github.com/huggin…

8ebda30

…gface/lerobot into feat/accelerate-melt-gpus

main logging

63fcebd

try with local rank

a74affa

add tests

c711a62

change runner

4c40be5

fix test

43bef1d

dont push to hub in multi gpu tests

252bca9

Merge branch 'main' into feat/accelerate-melt-gpus

ed267d4

pre download dataset in tests

0d79130

Merge branch 'feat/accelerate-melt-gpus' of https://github.com/huggin…

2bc154e

…gface/lerobot into feat/accelerate-melt-gpus

small fixes

6486982

fix path optimizer state

a86cea5

update docs, and small improvements in train

50ff388

simplify accelerate main process detection

cabc47c

small improvements in train

a0d0b00

pkooij added 5 commits October 14, 2025 14:48

cleanup

4170d1b

fix bug

9950bfd

scale lr decay if we reduce steps

a66b50d

cleanup logging

f8a185f

fix formatting

ebf64bd
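The "scale lr decay if we reduce steps" commit above is not explained in the thread. One plausible reading: when each optimizer step sees `num_processes` times as many samples and the run is shortened accordingly, the warmup and decay lengths of the learning-rate schedule are shrunk by the same factor so the curve still spans the whole run. The sketch below assumes a linear-warmup + cosine-decay schedule; `scale_schedule` and `cosine_with_warmup` are hypothetical names, not LeRobot code.

```python
import math
from typing import Tuple


def scale_schedule(num_steps: int, warmup_steps: int, decay_steps: int,
                   num_processes: int) -> Tuple[int, int, int]:
    """Hypothetical helper: shrink all schedule lengths by the same factor.

    Assumes each optimizer step now sees num_processes times as many samples,
    so the run is shortened by that factor and the LR schedule must follow.
    """
    def scale(n: int) -> int:
        return max(1, math.ceil(n / num_processes))

    return scale(num_steps), scale(warmup_steps), scale(decay_steps)


def cosine_with_warmup(step: int, peak_lr: float, warmup_steps: int,
                       decay_steps: int, min_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr, then cosine decay to min_lr by step decay_steps."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    span = max(1, decay_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / span)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Single-GPU baseline vs. the same run spread over 4 processes.
steps_1, warmup_1, decay_1 = 100_000, 1_000, 100_000
steps_4, warmup_4, decay_4 = scale_schedule(steps_1, warmup_1, decay_1, num_processes=4)
print(steps_4, warmup_4, decay_4)  # 25000 250 25000

peak = 1e-4
# Halfway through the decay phase both runs use the same learning rate.
lr_1 = cosine_with_warmup(50_500, peak, warmup_1, decay_1)
lr_4 = cosine_with_warmup(12_625, peak, warmup_4, decay_4)
print(math.isclose(lr_1, lr_4))  # True
```

If the decay length were left unscaled, the shortened 4-process run would stop while the learning rate is still close to its peak.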

pkooij requested a review from michel-aractingi October 14, 2025 15:41

Merge branch 'main' into feat/accelerate-melt-gpus

5730b0e

pkooij marked this pull request as ready for review October 14, 2025 15:43

Merge branch 'main' into feat/accelerate-melt-gpus

1d86482

michel-aractingi reviewed Oct 14, 2025

src/lerobot/utils/utils.py (resolved)

src/lerobot/scripts/lerobot_train.py (outdated, resolved)

pkooij added 3 commits October 15, 2025 12:57

incorporate PR feedback

300d614

add min memory to cpu tests

c775d8d

use accelerate to determine logging

8a32764

michel-aractingi reviewed Oct 15, 2025

src/lerobot/utils/utils.py (outdated, resolved)

pkooij and others added 2 commits October 16, 2025 14:46

Merge branch 'main' into feat/accelerate-melt-gpus

ccdf06f

fix precommit and fix tests

8765b57

michel-aractingi previously approved these changes Oct 16, 2025

imstevenpmwork dismissed michel-aractingi’s stale review via 317a0bc October 16, 2025 15:13

imstevenpmwork changed the title ~~Add Accelerate -> melt gpus~~ feat(train): add accelerate for multi gpu training Oct 16, 2025

chore: minor details

0ef404c

imstevenpmwork force-pushed the feat/accelerate-melt-gpus branch from 317a0bc to 0ef404c October 16, 2025 15:18

imstevenpmwork requested a review from michel-aractingi October 16, 2025 15:20

michel-aractingi approved these changes Oct 16, 2025

imstevenpmwork merged commit e82e7a0 into main Oct 16, 2025
17 checks passed

imstevenpmwork deleted the feat/accelerate-melt-gpus branch October 16, 2025 15:41

This was referenced Oct 17, 2025

Add distributed training with accelerate #317

Closed

Support multi-gpus training with accelerate #778

Closed

[WIP] Multi-gpus training with accelerate #1246

Closed

imstevenpmwork mentioned this pull request Oct 17, 2025

Release 0.4.0 #1654

Open

jadechoghari mentioned this pull request Oct 17, 2025

Are there plans to support distributed training? #1632

Closed


6 participants
