Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
> The examples below show how to subclass some of these methods.
> ## get_train_dataloader

maybe you can review these sections @qgallouedec since they're TRL examples? 🙏
> @@ -0,0 +1,30 @@
> # Callbacks

you can ignore this section as it's covered in #44239
SunMarc left a comment:

Thanks a lot! Left a couple of comments, but overall really good!
docs/source/en/perf_train_gpu_one.md (Outdated)
> ## GaLore
>
> [Gradient Low-Rank Projection (GaLore)](https://hf.co/papers/2403.03507) significantly reduces memory usage when training large language models (LLMs). One of GaLore's key benefits is *full-parameter* learning, unlike low-rank adaptation methods such as [LoRA](https://hf.co/papers/2106.09685), which produces better model performance.
>
> Install the [GaLore](https://github.com/jiaweizzhao/GaLore) and [TRL](https://hf.co/docs/trl/index) libraries.
>
> ```bash
> pip install galore-torch trl
> ```
>
> Pick a GaLore optimizer (`"galore_adamw"`, `"galore_adafactor"`, or `"galore_adamw_8bit"`) and pass it to the `optim` parameter in [`trl.SFTConfig`]. Use the `optim_target_modules` parameter to specify which modules to adapt (can be a list of strings, regex, or a full path).
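A minimal sketch of the config described in the quoted section, assuming `galore-torch` and `trl` are installed; the `output_dir` and `optim_target_modules` values below are illustrative placeholders, not from the PR:

```python
from trl import SFTConfig

# Sketch only: pick a GaLore optimizer and point it at target modules.
# The regex patterns here are hypothetical examples.
config = SFTConfig(
    output_dir="galore-sft",
    optim="galore_adamw",
    optim_target_modules=[r".*attn.*", r".*mlp.*"],  # modules to apply GaLore to
)
```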
Not sure if I would put that here. The doc here is quite nice as it is super light, so adding this GaLore section will incite people in the future to put everything there, as we did in trainer.md.

good point!

- i'll move galore --> optimizers.md since that's what it actually is
- move liger --> performance > speed optimizations > kernels (will do in a future PR once i have those sections)
- move neftune --> trainer cookbook recipes (will do in a future PR once i have those sections)

this way we can keep this doc light so it focuses on general GPU training techniques?
> Subclass [`Trainer`] methods to change training behavior without rewriting the entire loop. Subclassing modifies the *training loop*, for example the forward pass or loss computation.
>
> Before subclassing, consider whether you need to change *what* [`Trainer`] computes or *when* and *whether* it acts. For timing and conditional logic, use a [Callback](./trainer_callbacks) instead. Callbacks control when things happen (logging, evaluation, early stopping), while subclassing changes what happens (loss computation, data loading, optimization).
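As a concrete illustration of the subclassing route (a hypothetical sketch, not code from this PR), overriding `compute_loss` to apply class weights on an imbalanced classification task could look like:

```python
import torch
from transformers import Trainer


class WeightedLossTrainer(Trainer):
    """Hypothetical example: override compute_loss to weight classes."""

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        # Illustrative weights for an imbalanced 3-class task
        weights = torch.tensor([1.0, 2.0, 2.0], device=logits.device)
        loss_fct = torch.nn.CrossEntropyLoss(weight=weights)
        loss = loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))
        return (loss, outputs) if return_outputs else loss
```

Everything else (optimizer, scheduler, logging, checkpointing) stays inherited from [`Trainer`]; only the loss computation changes.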
maybe add some examples of `Trainer` in TRL or Axolotl

added link to examples in TRL/Axolotl at the bottom
qgallouedec left a comment:

nice, I shared some suggestions :)
docs/source/en/trainer_customize.md (Outdated)
> | method | description |
> |---|---|
> | [`~Trainer.get_train_dataloader`] | create a training DataLoader |
> | [`~Trainer.get_eval_dataloader`] | create an evaluation DataLoader |
> | [`~Trainer.get_test_dataloader`] | create a test DataLoader |
> | [`~Trainer.log`] | log information about the training process |
> | [`~Trainer.create_optimizer_and_scheduler`] | create an optimizer and learning rate scheduler (can also be separately customized with [`~Trainer.create_optimizer`] and [`~Trainer.create_scheduler`] if they weren't passed in `__init__`) |
> | [`~Trainer.compute_loss`] | compute the loss of a batch of training inputs |
> | [`~Trainer.training_step`] | perform the training step |
> | [`~Trainer.prediction_step`] | perform the prediction and test step |
> | [`~Trainer.evaluate`] | evaluate the model and return the evaluation metric |
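For instance (a hypothetical sketch, not from the PR), overriding `get_train_dataloader` from the table above to build the training DataLoader by hand might look like:

```python
from torch.utils.data import DataLoader
from transformers import Trainer


class CustomLoaderTrainer(Trainer):
    # Hypothetical sketch: construct the DataLoader manually,
    # e.g. to control shuffling or plug in a custom sampler.
    def get_train_dataloader(self) -> DataLoader:
        return DataLoader(
            self.train_dataset,
            batch_size=self.args.per_device_train_batch_size,
            shuffle=True,
            collate_fn=self.data_collator,
        )
```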
not convinced by the added value of this table. it seems redundant with the `[[autodoc]] Trainer`

removed in favor of `[[autodoc]] Trainer`!
docs/source/en/training.md (Outdated)
> metric = evaluate.load("accuracy")
>
> - Set `bf16=True` for fast mixed precision training if your hardware supports it (Ampere+ GPUs). Otherwise, fall back to `fp16=True` on older hardware.
> - Enable `gradient_accumulation_steps` and `gradient_checkpointing` to simulate training on larger batches and reduce memory usage.
mixing these two seems a bit weird. Maybe you meant `per_device_train_batch_size` instead of `gradient_checkpointing`?

split into two separate points to avoid conflating the two!
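For reference, the two settings discussed above could be sketched in `TrainingArguments` like this (values are illustrative, and `bf16=True` assumes Ampere+ hardware):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    bf16=True,                      # Ampere+ GPUs; use fp16=True on older hardware
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,  # effective batch size of 32 per device
    gradient_checkpointing=True,    # recompute activations to reduce memory
)
```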
part 1 of refactoring the `Trainer` docs

- `toctree`: reorganized a bit to accommodate new sections and docs
- `trainer.md`: to be a clearer entry point (will expand the `## Next steps` section as we continue for better navigation). everything else here is either moved to its relevant section or removed because it's duplicate content
- `training.md`: tutorial updated to show training a language model instead of BERT lol
- `trainer_customize.md`: guide showing how to subclass `get_train_dataloader` and `compute_loss` using real-world examples from TRL (we can add more examples here later)
Trainerdocstoctreea bit to accommodate new sections and docstrainer.mdto be a clearer entry point (will expand the## Next stepssection as we continue for better navigation). everything else here is either moved to their relevant sections or removed because its duplicate contenttraining.mdtutorial to show training a language model instead of BERT loltrainer_customize.mdguide showing how to subclassget_train_dataloaderandcompute_lossusing real-world examples from TRL (we can add more examples here later)