
[docs] training on specific hardware #44799

Open
stevhliu wants to merge 6 commits into huggingface:main from stevhliu:hardware

Conversation

@stevhliu
Member

updates the Hardware section of the docs for training:

  • combine CPU/Distributed CPU into a single doc
  • add more info to the Gaudi doc (mixed precision, torch.compile, distributed training)
  • add more info to the MPS doc (mixed precision, model loading + device selection)
  • remove the GPU doc since all that info is covered elsewhere now, making it redundant
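The MPS device-selection point above can be sketched in a few lines. This is an illustrative snippet, not text from the PR; it assumes PyTorch is installed and uses the common fall-back-to-CPU pattern:

```python
import torch

# Pick Apple's MPS backend when it's available, otherwise fall back to CPU.
# torch.backends.mps.is_available() returns False on non-Apple hardware,
# so this snippet runs anywhere.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(device.type)
```

A model would then be moved to the selected device with `model.to(device)` before training.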

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@@ -15,23 +15,64 @@ rendered properly in your Markdown viewer.

# Intel Gaudi
Member Author


@regisss, would you mind taking a look here please? 🙏

Contributor


Thanks for this work @stevhliu, I just left one comment :)

@stevhliu stevhliu requested a review from SunMarc March 17, 2026 18:38
Member

@pcuenca pcuenca left a comment


Took a quick look at the mps section, it looks good. Happy to take a look at the rest if you need it @stevhliu!

@stevhliu
Member Author

thanks @pcuenca! happy to get your feedback on the rest if you don't mind/have the time!

Comment on lines -176 to -177
- local: perf_train_cpu_many
  title: Distributed CPUs
Member


Should we redirect to perf_train_cpu?

Member Author


I think it's better to merge the two CPU docs rather than redirect. The single perf_train_cpu doc is already quite thin, and perf_train_cpu_many doesn't really fit the other docs in the section, which are more focused on methods rather than hardware

Member


Yes, I agree! I'm talking about avoiding a 404 when users visit https://huggingface.co/docs/transformers/en/perf_train_cpu_many after it's gone.

So adding an entry here.
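For context, the Hugging Face doc builder supports a redirects file that maps old doc names to new ones; a hypothetical entry (the exact file name and key/value format should follow the repo's existing redirects file, which is assumed here) would look like:

```yaml
# hypothetical redirect entry: old doc name -> merged doc name
perf_train_cpu_many: perf_train_cpu
```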

Member Author


Ohhh yes, my bad, I misunderstood!

Refer to the [Gaudi docs](https://docs.habana.ai/en/latest/index.html) for more details.

## Mixed precision

All Gaudi generations support bf16 natively. Only Gaudi 2 and Gaudi 3 support fp16.
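The bf16 behavior discussed above can be exercised with `torch.autocast`. This is an illustrative sketch, not from the PR: it targets `device_type="cpu"` so it runs anywhere, whereas on Gaudi hardware the equivalent would target the HPU device through Habana's PyTorch bridge:

```python
import torch

# bf16 autocast sketch: matmuls inside the context run in bfloat16.
# On Gaudi, the device type would be "hpu" (via Habana's framework),
# which is not assumed to be available here.
x = torch.randn(4, 4)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = x @ x
print(y.dtype)  # torch.bfloat16
```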
Contributor


fp16 is not supported on any Gaudi generation 😁

Member Author


Ah thanks, I must've gotten confused here!

Member


lol is that a bug then?

Contributor


Good catch! I'm going to take a look at it and open a PR in Transformers :)
