Recipe tutorial + TorchTune Overview #316

Merged - 16 commits merged into main on Feb 7, 2024

Conversation

@kartikayk (Contributor) commented Feb 6, 2024

Context

As per title

Changelog

As per title

Test plan

[image attachment]

netlify bot commented Feb 6, 2024

Deploy Preview for torchtune-preview ready!

- Latest commit: e5058f4
- Latest deploy log: https://app.netlify.com/sites/torchtune-preview/deploys/65c3bfc89ced2f0008961287
- Deploy Preview: https://deploy-preview-316--torchtune-preview.netlify.app/examples/recipe_deepdive

@facebook-github-bot added the "CLA Signed" label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Feb 6, 2024
@ebsmothers (Contributor) left a comment

First off, I love this PR. I'm really happy to see that we are making a concerted effort to properly educate users on extending and writing their own training recipes. Aside from the inline comments, a couple other general thoughts:

  • I feel we are mixing the "how" with the "why" a bit here. While some users may care about why we structured our recipes this way or what we don't envision our recipes as, other users will just want to understand the different pieces at some level of detail and how they fit together. (One counterpoint to this is that we need to educate users, but my counter-counterpoint is that we need to engage them first.)
  • Focusing just on the "how", we should think a lot about the most engaging way to get users to (a) easily use our recipes, (b) extend our recipes, and (c) write their own recipes. I claim it cannot be done in a single tutorial.
    • For (a) I could imagine a more config-focused tutorial, like "here's how to experiment with different stuff out of the box using recipe X".
    • I think you start addressing (b) at the end of this document, but I imagine we could definitely do a deep-dive tutorial on "training Llama2 on dataset Y", where we walk through writing a dataset class and integrating it into a recipe with minimal changes (e.g. we just change one of the _setup_* methods).
    • For (c) I think we should really provide an end-to-end walkthrough of building up an entire recipe in detail. This is also a really great way to show off all the different features of our library in a single place.


Recipes are the primary entry points for TorchTune users. These can be thought of as "targeted" end-to-end pipelines for training and optionally evaluating LLMs. Each recipe implements a training method (eg: full fine-tuning) with a set of meaningful features (eg: FSDP + Activation Checkpointing + Gradient Accumulation + Mixed Precision training) applied to a given model family (eg: Llama2).

As model training gets more and more complex, it becomes harder to anticipate new model architectures and training methodologies while also reasoning about every possible trade-off (eg: memory vs model quality). We believe a) users are best suited to make trade-offs specific to their use cases and b) there's no one-size-fits-all solution. As a result, recipes are meant to be easy to understand, extend and debug, *and not* generalized entry points for all possible settings.
Contributor:

Imo this line is a bit out of place; we are kind of getting into our philosophies in a way that's not immediately helpful to a user.

Comment on lines 21 to 25
Each recipe consists of three components:

- **Configurable parameters**, specified through yaml configs [example](../recipes/configs/alpaca_llama2_full_finetune.yaml), command-line overrides and dataclasses [example](../recipes/params.py)
- **Recipe Script**, entry-point which puts everything together including parsing and validating configs, setting up the environment, and correctly using the recipe class
- **Recipe Class**, core logic needed for training, exposed to users through a set of APIs [interface](../recipes/interfaces.py)
Contributor:

Really like the way you've laid this out here
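
For illustration, here is a rough sketch of how the three components might relate. The names below are hypothetical stand-ins, not the actual params.py or full_finetune.py:

```
from dataclasses import dataclass


# Hypothetical params dataclass: the fields a yaml config and CLI overrides would populate.
@dataclass
class FullFinetuneParams:
    model_checkpoint: str
    tokenizer_path: str
    dataset: str = "alpaca"
    batch_size: int = 2
    epochs: int = 1
    lr: float = 2e-5


# Hypothetical recipe class: holds the core training logic, configured from the params.
class FullFinetuneRecipe:
    def __init__(self, params: FullFinetuneParams) -> None:
        self.params = params

    def setup(self) -> None:
        """Build the model, tokenizer, optimizer and dataloader from self.params."""

    def train(self) -> None:
        """Run the training loop for self.params.epochs epochs."""
```

The recipe script (not shown here) would then parse and validate the params and drive this class.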

Comment on lines 31 to 35
### What Recipes are not?

- Monolithic Trainers. A recipe is **not** a monolithic trainer meant to support every possible feature through 100s of flags.
- Genealized entry-points. A recipe is **not** meant to support every possible model archichtecture or fine-tuning method.
- Wrappers around external frameworks. A recipe is **not** meant to be a wrapper around external frameworks. These are fully written in native-PyTorch using TorchTune building blocks. Dependencies are primarily in the form of additional utilities or interoperability with the surrounding ecosystem (eg: EluetherAI's evaluation harness).
Contributor:

Maybe a controversial opinion, but I think we should take this out. I know we said that we wanna be super strict about proper recipe usage, but imo a tutorial should have the tone of "let's try some stuff out and see what happens", while this does not.

Contributor Author:

I touched upon this a bit in my other comment, but I really want to drive home the "why configs are structured this way" in this deep dive. Does that make sense?

Contributor:

Configs -> recipes? Otherwise makes sense

Contributor Author:

Oops yeh sorry, recipes


If you're new to TorchTune or to LLMs generally, configs would be the first concept to understand and get familiar with. If you're an advanced user writing your own recipes, adding config files will improve your experimentation velocity and ability to collaborate on experiments.

For more information on the structure of TorchTune configs, refer to the [Recipes README](../recipes/README.md)
Contributor:

While I think it's fine to point to additional resources, if we are claiming this as a tutorial then we should at least provide some minimal example of a config here. Personally I as a user would like to see some sort of end-to-end recipe definition broken up into chunks that are elucidated by the surrounding text.
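
As an example of the kind of minimal config snippet being asked for, a hedged sketch; the field names are invented for illustration and are not copied from alpaca_llama2_full_finetune.yaml:

```
import yaml  # requires PyYAML

# Hypothetical yaml config; each key maps onto a field of a params dataclass.
CONFIG = """
model_checkpoint: /tmp/llama2_native.pt
tokenizer_path: /tmp/tokenizer.model
dataset: alpaca
batch_size: 2
epochs: 1
lr: 2.0e-5
"""

params = yaml.safe_load(CONFIG)
print(params["dataset"], params["lr"])  # alpaca 2e-05
```

Command-line overrides would then simply replace individual keys before the params dataclass is constructed.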


```
# Extract state dict from checkpoint
ckpt_dict = self.load_checkpoint(ckpt_path=params.model_checkpoint)
```
Contributor:

Would also add some comment about how each of these _setup_* methods is user-defined, with pointers to the ones in full_finetune.py as a starting point.
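
To illustrate the kind of pointer being suggested, a simplified sketch of how the _setup_* helpers could hang together; the bodies here are toy placeholders rather than the real full_finetune.py code:

```
import torch
from torch.utils.data import DataLoader, TensorDataset


class FullFinetuneRecipe:
    """Skeleton only; each _setup_* method is user-defined and can be swapped out."""

    def setup(self, params) -> None:
        # Extract state dict from checkpoint, then build everything else from it.
        ckpt_dict = self.load_checkpoint(ckpt_path=params.model_checkpoint)
        self._model = self._setup_model(model_state_dict=ckpt_dict["model"])
        self._optimizer = self._setup_optimizer(lr=params.lr)
        self._dataloader = self._setup_data(batch_size=params.batch_size)

    def load_checkpoint(self, ckpt_path: str) -> dict:
        # How a checkpoint is read is entirely up to the recipe author.
        return torch.load(ckpt_path, map_location="cpu")

    def _setup_model(self, model_state_dict: dict) -> torch.nn.Module:
        model = torch.nn.Linear(8, 8)  # stand-in for the real model builder
        model.load_state_dict(model_state_dict, strict=False)  # tolerant load for this toy stand-in
        return model

    def _setup_optimizer(self, lr: float) -> torch.optim.Optimizer:
        return torch.optim.AdamW(self._model.parameters(), lr=lr)

    def _setup_data(self, batch_size: int) -> DataLoader:
        dataset = TensorDataset(torch.randn(16, 8))  # placeholder data
        return DataLoader(dataset, batch_size=batch_size)
```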

Example script for [full fine-tuning](../recipes/full_finetune.py):

```
# Launch using TuneCLI which uses TorchRun under the hood
```
Contributor:

So I think the order that we exposit these concepts is really dependent on who we want to read and understand the tutorial. If we just want to touch on all the individual concepts as a jumping-off point, then I think this order makes sense. But in contrast, if we want to really hold a user's hand through the entire process of writing their own recipe, then the main loop should be the last piece we show them to tie everything together.

```
for curr_epoch in range(self.epochs_run, self.total_epochs):

    for idx, batch in enumerate(
        pbar := tqdm(self._dataloader, disable=not (rank == 0))
```
Contributor:

nit: if we are omitting some details (which I think you do below), I would also take out pbar/tqdm here
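
For reference, a stripped-down version of such a loop with the progress-bar details omitted; this is a sketch, not the exact code from full_finetune.py:

```
def train(model, dataloader, optimizer, loss_fn, epochs_run, total_epochs, save_checkpoint):
    """Minimal epoch/step loop; assumes all components were already built in setup()."""
    for curr_epoch in range(epochs_run, total_epochs):
        for idx, batch in enumerate(dataloader):
            input_ids, labels = batch
            logits = model(input_ids)
            loss = loss_fn(logits, labels)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        save_checkpoint(epoch=curr_epoch)  # checkpoint once per epoch
```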

Comment on lines 197 to 217
**Adding a new dataset**
- Add a new dataset to [datasets](../torchtune/datasets/)
- Add the new dataset and associated params to the [params dataclass](../recipes/params.py)
- If needed:
  - Clone the recipe into a new file
  - Update the ```_setup_data``` method to configure the new dataset, dataloader and sampler
  - Update the ```train``` method to read the samples/batches correctly

**Adding a new model**
- Add a new model to [models](../torchtune/models/) with associated building blocks in [modules](../torchtune/modules/). More details in [this tutorial](../tutorials/)
- If needed:
  - Clone the recipe into a new file
  - Update the ```_setup_model``` method to correctly instantiate model and load the state dict
  - Update the ```train``` method to call ```forward``` correctly

**Adding a new training method**
- TODO: Update this section after LoRA Recipe lands
Contributor:

Personally I think any one of these could serve as its own standalone tutorial

Contributor:

+1 and may be better to define a new interface for this one so that the flexibility is clear
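
As a concrete illustration of the "clone and override _setup_data" flow described above, a hedged sketch; the dataset and recipe names are hypothetical, and the base class is the skeleton sketched earlier in this thread:

```
import torch
from torch.utils.data import DataLoader, Dataset


class MyInstructDataset(Dataset):
    """Hypothetical dataset that would live under torchtune/datasets/."""

    def __init__(self, max_seq_len: int = 512):
        self._max_seq_len = max_seq_len
        # Stand-in samples; a real dataset would load and tokenize data from disk or the Hub.
        self._samples = [(torch.arange(8), torch.arange(8))]

    def __len__(self) -> int:
        return len(self._samples)

    def __getitem__(self, idx: int):
        tokens, labels = self._samples[idx]
        return tokens[: self._max_seq_len], labels[: self._max_seq_len]


class MyFullFinetuneRecipe(FullFinetuneRecipe):  # the skeleton recipe sketched above
    """Cloned recipe that overrides only the data setup; everything else is inherited."""

    def _setup_data(self, batch_size: int) -> DataLoader:
        return DataLoader(MyInstructDataset(), batch_size=batch_size, shuffle=True)
```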


## What are Recipes?

Recipes are the primary entry points for TorchTune users. These can be thought of as "targeted" end-to-end pipelines for training and optionally evaluating LLMs. Each recipe implements a training method (eg: full fine-tuning) with a set of meaningful features (eg: FSDP + Activation Checkpointing + Gradient Accumulation + Mixed Precision training) applied to a given model family (eg: Llama2).
Member:

Please keep lines short; otherwise it's impossible to make a review comment on a specific portion.

@@ -0,0 +1,217 @@
# Training Recipe Deep-dive
Member:

This is creating a tutorials folder within the repo (not within the docs); is that by design? We already have a tutorials section in the docs, so what's the plan with this newly added one?

@kartikayk (Contributor Author) commented:

@ebsmothers all really good points. I think this first version is a bit of a brain dump from my side and I now need to clean this up. All of your questions are really helpful here.

While some users may care about why we structured our recipes this way or what we don't envision our recipes as, other users will just want to understand the different pieces at some level of detail and how they fit together

I very much agree with this. I think this is focusing on "why we structured our recipes this way or what we don't envision our recipes as". Let me comment on "users will just want to understand the different pieces" below.

a) easily use our recipes, (b) extend our recipes, and (c) write their own recipes. I claim it cannot be done in a single tutorial

Again, this breakdown makes a lot of sense. @joecummings is working on a tutorial which should cover (a). I think (b) should definitely be a different tutorial. Let me update this to add (c) here since it makes sense for it to be in this tutorial. Does this make sense?

@RdoubleA (Contributor) left a comment

let's discuss offline how this will integrate with tutorials on how to create a params dataclass for a recipe: #311


 

### What Recipes are not?
Contributor:

Suggested change
### What Recipes are not?
### What are Recipes not?

Contributor Author:

Hmm, not sure about this change

Contributor:

According to Meta AI, both are grammatically correct. I've tried saying both in my head numerous times and now they both sound the same. So I guess your choice :)

### What Recipes are not?

- Monolithic Trainers. A recipe is **not** a monolithic trainer meant to support every possible feature through 100s of flags.
- Genealized entry-points. A recipe is **not** meant to support every possible model archichtecture or fine-tuning method.
Contributor:

Suggested change
- Genealized entry-points. A recipe is **not** meant to support every possible model archichtecture or fine-tuning method.
- Generalized entry-points. A recipe is **not** meant to support every possible model architecture or fine-tuning method.

- Extract and validate training params
- Intialize [Recipe Class](#recipe-class) which in-turn intializes recipe state
- Load and Validate checkpoint to update recipe state if resuming training
- Initialize recipe components (model, tokeinzer, optimizer, loss and dataloader) from checkpoint (if applicable)
Contributor:

Suggested change
- Initialize recipe components (model, tokeinzer, optimizer, loss and dataloader) from checkpoint (if applicable)
- Initialize recipe components (model, tokenizer, optimizer, loss and dataloader) from checkpoint (if applicable)


## Recipe Class

The Recipe Class carries the core logic for training a model. Each class implements a relevant [interface](../recipes/interfaces.py) and exposes a set of APIs. For fine-tuning, the structure of this class [[full finetune example](../recipes/full_finetune.py)] is as follows:
Contributor:

I missed earlier conversations on the structure of recipes, but what I understood is that this serves more as a suggestion or a starting point. We should emphasize that here; otherwise this reads as if we're creating another Lightning API that users must follow.

Contributor Author:

Structured classes != Trainers :)

@hardikjshah (Contributor) left a comment

Left some small comments

- Intialize [Recipe Class](#recipe-class) which in-turn intializes recipe state
- Load and Validate checkpoint to update recipe state if resuming training
- Initialize recipe components (model, tokeinzer, optimizer, loss and dataloader) from checkpoint (if applicable)
- Train the model, with checkpoints at the end of every epoch
Contributor:

with checkpoints at the end of every epoch

This might be too prescriptive and better to drop or generalize as "save checkpoints"


**Adding a new dataset**
- Add a new dataset to [datasets](../torchtune/datasets/)
- Add the new dataset and associated params to the [params dataclass](../recipes/params.py)
Contributor:

I think there is an open question around params that are recipe-specific vs. params that are common across all recipes.

I think we should go with recipe-specific params; otherwise that one params.py file will become huge and will not have clear ownership, as everyone will add their own params to the same file.


#### How do I write my own recipe?

Before writing a new recipe, check the [recipes folder](../recipes/) to see if an existing recipe satisfies your use case. If not, the following are some common scenarios.
Contributor:

Well, it's not clear what criteria one should use to decide whether to update an existing recipe or create a new one. Do you want to add some context around that?

For e.g., if a new dataset, a new model, and a new training method are all new recipes, then when is it better to edit an existing recipe, and if you do that, how do you ensure that you do not break anything for other users of that recipe?

- Add the new dataset and associated params to the [params dataclass](../recipes/params.py)
- If needed:
- Clone the recipe into a new file
- Update the ```_setup_data``` method to configure the new dataset, dataloader and sampler
Contributor:

I think this is ending up being a strong recommendation to use this class structure only.
Asking people to copy and update private methods seems a bit hacky.

Can we not define a more detailed protocol on top of the basic one (FTRecipeInterface) where these methods are not private? Users should realize that these interfaces are a suggestion and that they can reuse them or define their own thing from scratch; the tutorial currently feels a bit too prescriptive.
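
In the spirit of that suggestion, a hedged sketch of what a more explicit, non-private protocol could look like; this is illustrative only and is not the existing FTRecipeInterface:

```
from typing import Protocol


class FineTuneRecipeProtocol(Protocol):
    """Public extension points a recipe could expose instead of private helpers."""

    def setup_model(self, model_state_dict: dict) -> None: ...
    def setup_data(self, batch_size: int) -> None: ...
    def setup_optimizer(self, lr: float) -> None: ...
    def train(self) -> None: ...
    def save_checkpoint(self, epoch: int) -> None: ...
    def cleanup(self) -> None: ...
```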


@kartikayk (Contributor Author) commented:

Thanks so much everyone for the helpful comments! I reduced the scope of the tutorial and focused primarily on the recipe deep dive. We can follow up with a "step-by-step tutorial for writing a recipe" separately. Some of the other suggested changes would require longer discussions.

@kartikayk (Contributor Author) commented:

Also FYI - I snuck in the TorchTune Overview section into this PR as well :)

@kartikayk changed the title from "Recipe tutorial" to "Recipe tutorial + TorchTune Overview" on Feb 7, 2024

The library provides:

- Native-PyTorch implementations of popular LLMs, with convertors to transform checkpoints into TorchTune's format
Member:

On the team chat, @rohan-varma suggested avoiding the term "TorchTune", and I can confirm that from a personal point of view, without further information about what that "TorchTune format" can be, I feel a bit put off by having to deal with yet another kind of format.

This could probably be reformulated to feel less scary? (Sorry I don't have a suggestion)

Contributor Author:

Sorry, missed this - so as scary as it sounds, we do introduce a new format. "Native-PyTorch implementation" can mean a bunch of different things (in fact there are 3 different ways in which our own implementation could have evolved), but for the models we support, the checkpoint format (state dict) is closely tied to our architecture implementation. You won't be able to load an HF ckpt in our repo without converting it, for example. Maybe there's a better term than "format"?

Member:

the checkpoint format (state dict) is closely tied to our architecture implementation

I actually don't think this is true. The state_dicts that we support are just expected to be fqn-to-tensor mappings. Of course, to load them into actual nn.Module instantiations, keys need to match up appropriately. But there's nothing specific here to TorchTune.
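
To make the fqn-to-tensor point concrete, converting a checkpoint largely amounts to renaming keys; a toy example where the key names are made up and do not reflect the real HF or TorchTune mappings:

```
import torch

# Toy source checkpoint: fully-qualified parameter names mapped to tensors.
hf_state_dict = {
    "model.layers.0.self_attn.q_proj.weight": torch.randn(8, 8),
    "model.layers.0.self_attn.k_proj.weight": torch.randn(8, 8),
}

# Hypothetical rename table; a real convertor would cover every parameter.
KEY_MAP = {
    "model.layers.0.self_attn.q_proj.weight": "layers.0.attn.q_proj.weight",
    "model.layers.0.self_attn.k_proj.weight": "layers.0.attn.k_proj.weight",
}

native_state_dict = {KEY_MAP[key]: value for key, value in hf_state_dict.items()}
print(sorted(native_state_dict))
```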


- Native-PyTorch implementations of popular LLMs, with convertors to transform checkpoints into TorchTune's format
- Training recipes for popular fine-tuning techniques with reference benchmarks and comprehensive correctness checks
- Integration with HuggingFace Datasets for training and EleutherAI's Eval Harness for evaluation
Member:

Maybe add a link to both?

Suggested change
- Integration with HuggingFace Datasets for training and EleutherAI's Eval Harness for evaluation
- Integration with `HuggingFace Datasets <https://huggingface.co/docs/datasets/en/index>`_ for training and `EleutherAI's Eval <https://github.com/EleutherAI/lm-evaluation-harness>`_ Harness for evaluation

Design Principles
-----------------

TorchTune embodies `PyTorch’s design philosophy <https://pytorch.org/docs/stable/community/design.html>`_, especially "usability over everything else".
Member:

over everything else

Lol, I see what you did there

Recipe Script
-------------

This is the primary entry point for each recipe and provides the user with control over how the recipe is setup, model(s) is(are)
Member:

This felt a bit overkill when reading

Suggested change
This is the primary entry point for each recipe and provides the user with control over how the recipe is setup, model(s) is(are)
This is the primary entry point for each recipe and provides the user with control over how the recipe is setup, how models are

docs/source/examples/recipe_deepdive.rst (resolved)

def setup(...):

# Load checkpoint from specified path
Member:

This comment and the vast majority of the other comments below are largely useless IMHO as they're not adding any new info to what the code already does. E.g.

        # Load tokenizer
        self._tokenizer = self._setup_tokenizer(...)

I'm all for code comments but they should be explaining "why" or "how", not "what". The "what" is very self-explanatory from this single line of code already.

Contributor Author:

I removed a few obvious ones!
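
A small, hypothetical illustration of the distinction being drawn:

```
def setup_tokenizer(path: str) -> object:
    return object()  # stand-in for the real tokenizer builder

# A "what" comment merely restates the call below and adds nothing:
# Load tokenizer
tokenizer = setup_tokenizer("/tmp/tokenizer.model")

# A "why" comment records something the line cannot say for itself, e.g.:
# Build the tokenizer before the dataset so special-token ids are fixed once.
tokenizer = setup_tokenizer("/tmp/tokenizer.model")
```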

Training Recipe Deep-Dive
=========================

This tutorial will walk you through the design of training-recipes in TorchTune.
Member:

2 general comments:

There's no link to existing recipes in this tutorial. This could be helpful to illustrate some of the points that are being made, or to "follow-along". Otherwise, this whole tutorial feels very abstract and not particularly actionable. As a first-time reader, I don't really know what to make of it.

Also I don't know if this is planned for later, but I feel like a main section is missing, i.e. "how to use existing recipes" (this should probably come after "What are Recipes" and before "How should I structure my own recipe").

Contributor Author:

I thought about this and it seemed better as a new tutorial instead of adding it to this one. I think some of this is reflected in the tutorials that @joecummings and @RdoubleA are landing, so we can do a quick refactor after all of those are in.

Each recipe consists of three components:

- **Configurable parameters**, specified through yaml configs, command-line overrides and dataclasses
- **Recipe Script**, entry-point which puts everything together including parsing and validating configs, setting up the environment, and correctly using the recipe class
Contributor:

Is the Recipe Script the code in the recipe_main() method in full_finetune.py? I'm trying to understand the difference between the Recipe Script and the Recipe Class in full_finetune.py.

Contributor Author:

That's right - it's pretty simple right now. But this can get quite complex, especially with multi-stage training, etc.
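
For readers following along, the split being described is roughly the following; the names are the hypothetical stand-ins from the sketch earlier in this thread, not the actual full_finetune.py code:

```
# Recipe Script: parse and validate params, set up the environment, drive the class.
def recipe_main() -> None:
    params = FullFinetuneParams(          # hypothetical params dataclass
        model_checkpoint="/tmp/llama2.pt",
        tokenizer_path="/tmp/tokenizer.model",
    )
    recipe = FullFinetuneRecipe(params)   # Recipe Class: the core training logic
    recipe.setup()
    recipe.train()


if __name__ == "__main__":
    recipe_main()
```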

@kartikayk merged commit 6a4c561 into main on Feb 7, 2024
11 of 15 checks passed
@kartikayk deleted the recipe_tutorial branch on February 7, 2024 17:47
@RdoubleA mentioned this pull request on Feb 8, 2024