Add documentation on adding new recipe params #311

Merged
merged 17 commits into main from rafiayub/config_documentation on Feb 9, 2024

Conversation

@RdoubleA (Contributor) commented on Feb 6, 2024

Context

We need to provide users guidance on how params for new recipes should be added and how they integrate with the existing dataclass/config system. I've added this as a tutorial. In a follow-up, we should consider consolidating this tutorial with the recipe tutorial in #316.

Addresses #283

Test plan

(screenshot)


netlify bot commented Feb 6, 2024

Deploy Preview for torchtune-preview ready!

Latest commit: b77213a
Latest deploy log: https://app.netlify.com/sites/torchtune-preview/deploys/65c57aef6862af000845e650
Deploy Preview: https://deploy-preview-311--torchtune-preview.netlify.app/examples/configs

@facebook-github-bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Feb 6, 2024
In the dataclass, all fields should have defaults assigned to them. If a reasonable value cannot be assigned or it is a required argument, use the null value for that data type as the default and ensure that it is set by the user in the `__post_init__` (see Parameter Validation). The dataclass should go in the `recipes/params/` folder and the name of the file should match the name of the recipe file you are creating.
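
To make the convention concrete, here is a minimal sketch of such a dataclass (the field names are hypothetical, not taken from an actual recipe):

```
from dataclasses import dataclass


@dataclass
class MyRecipeParams:
    # Required arguments: no reasonable default exists, so use the null
    # value for the type and validate in __post_init__.
    model: str = ""
    tokenizer_path: str = ""
    # Optional arguments with reasonable defaults.
    lr: float = 3e-4
    epochs: int = 1

    def __post_init__(self):
        # Parameter Validation: required fields must be set by the user.
        for name in ("model", "tokenizer_path"):
            if not getattr(self, name):
                raise ValueError(f"{name} must be set in the config or CLI")
```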

To link the dataclass object with config and CLI parsing, you can use the `TuneArgumentParser` object and funnel the parsed arguments into your dataclass.
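
A hedged sketch of that funneling step, assuming an argparse-style API on `TuneArgumentParser` (the import path is an assumption) and mirroring the `FullFinetuneParams(**args)` pattern quoted later in this thread:

```
from torchtune.utils import TuneArgumentParser  # import path assumed

if __name__ == "__main__":
    parser = TuneArgumentParser(description="My recipe")
    # Parse config + CLI into a plain dict, then funnel into the dataclass.
    args = vars(parser.parse_args())
    params = MyRecipeParams(**args)
```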
RdoubleA (author):
TODO: make this a helper function so it's much easier to set up in any recipe

# ROOT_DIR and FullFinetuneParams are defined/imported in the test module.
config_to_params = {
    os.path.join(ROOT_DIR, "alpaca_llama2_full_finetune.yaml"): FullFinetuneParams,
    ...,
}
RdoubleA (author):
TODO: figure out how to automate this

@NicolasHug (Member):

@RdoubleA do you need a review, or is this still [WIP] (as the title prefix suggests)?

@@ -53,3 +53,60 @@ To run the generation recipe, run this command from inside the main `/torchtune`
```
python -m recipes.alpaca_generate --native-checkpoint-path /tmp/finetune-llm/model_0.ckpt --tokenizer-path ~/llama/tokenizer.model --input "What is some cool music from the 1920s?"
```

 
Contributor:

This is good for now maybe, but I would push for this to be in our actual documentation, not the README.

@RdoubleA changed the title from "[WIP] Add documentation on adding new recipes" to "[WIP] Add documentation on adding new recipe params" on Feb 6, 2024
@RdoubleA changed the title from "[WIP] Add documentation on adding new recipe params" to "Add documentation on adding new recipe params" on Feb 6, 2024
@RdoubleA (author) commented Feb 6, 2024:

apologies for the confusion, this was pending some discussions around where to put tutorials and how they interact with other tutorials. It is ready for review now @NicolasHug @kartikayk

@@ -0,0 +1,68 @@
======================================
Creating parameters for custom recipes
Contributor:

This feels so specific and without much context. Could this be part of a "Create a custom recipe" tutorial? On its own, I don't know if it stands.

RdoubleA (author):

yeah I agree, I think the plan is to merge with @kartikayk's tutorial once we're both checked in

Contributor:

In that case, I love it!

Not sure I understand the reasoning of checking both in, then combining. Would it make more sense to wait for Kartikay to get his checked in, then rebase and add this stuff on?

RdoubleA (author):

why not the other way around :)

ya you're right, that's simpler; there just seemed to be a lot of discussion on the recipe tutorial, so I might as well get this in first. I'm ok to wait and rebase too, either works

Creating parameters for custom recipes
======================================

In general, you should expose the minimal set of parameters you need to run and experiment with your recipes. These should be collected in a dataclass object that is passed into the recipe.
Contributor:

This sounds a bit arbitrary without the reasoning, i.e. you want to be thoughtful about what you expose because a) not everything actually needs to be configurable and b) you don't want to bloat configs, since bloated configs are more error-prone (typos, untested combinations of params, etc.), harder to read, and harder to manage.

RdoubleA (author):

yeah, I didn't want to go too much into the philosophy of what to expose in configs. But since users are the ones designing them, it might be useful to share these details.

Contributor:

Also a line capturing the reasoning around why we have a dataclass, i.e. single source of truth, easier to validate, etc.


Write config
------------
Now that you've set up the recipe, the parameters dataclass, and the parser, you can create a simple config in :code:`recipes/configs/` that specifies values for any of the fields you defined in the dataclass. Anything that is not specified should have a default value in the dataclass.
Contributor:

Maybe - "Let's assume you already have a recipe and want to add a config".
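
To illustrate the config-to-dataclass round trip, a sketch using PyYAML directly (hypothetical file contents; in practice the parsing goes through `TuneArgumentParser`):

```
import yaml

# Hypothetical contents of recipes/configs/my_recipe.yaml. Anything
# omitted (e.g. epochs) falls back to the dataclass default.
config_str = """
model: llama2_7b
tokenizer_path: /tmp/tokenizer.model
lr: 1.0e-4
"""

cfg = yaml.safe_load(config_str)
params = MyRecipeParams(**cfg)  # dataclass sketched earlier in this thread
```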


Testing configs
---------------
TorchTune has testing for every config added to the library, namely ensuring that it instantiates the dataclass and runs the recipe correctly. To add your config to this test suite, simply update the dictionary in :code:`recipes/tests/configs/test_configs.py`.
Contributor:

Users don't need to do this for configs which won't be checked in? So maybe prefix this with "if you plan on adding a PR for landing this in the repo" or something like that.

@RdoubleA force-pushed the rafiayub/config_documentation branch 3 times, most recently from d90c067 to ab1c112, on February 8, 2024 at 18:22
@@ -54,35 +54,25 @@ What Recipes are not?
- **Wrappers around external frameworks.** A recipe is **not** meant to be a wrapper around external frameworks. These are fully written in native-PyTorch using TorchTune building blocks. Dependencies are primarily in the form of additional utilities or interoperability with the surrounding ecosystem (eg: EluetherAI's evaluation harness).
Contributor:

while you're here: Eluether -> Eleuther

Contributor:

Guilty as charged

---------------
If you plan on contributing your config to the repo, we recommend adding it to the testing suite. TorchTune has testing for every config added to the library, namely ensuring that it instantiates the dataclass and runs the recipe correctly.

To add your config to this test suite, simply update the dictionary in :code:`recipes/tests/configs/test_configs.py`.
Contributor:

Is there a way to point directly to a variable or something? I can easily imagine something like this getting stale if we ever move things around

RdoubleA (author):

ah good point, yes there is
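
For reference, registering a new config in that dictionary might look like the following (a sketch assuming the shape quoted earlier in this PR; the `my_recipe` names are hypothetical):

```
import os

# In recipes/tests/configs/test_configs.py; ROOT_DIR, FullFinetuneParams,
# and MyRecipeParams are assumed to be defined or imported in that module.
config_to_params = {
    os.path.join(ROOT_DIR, "alpaca_llama2_full_finetune.yaml"): FullFinetuneParams,
    os.path.join(ROOT_DIR, "my_recipe.yaml"): MyRecipeParams,
}
```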


Testing configs
---------------
If you plan on contributing your config to the repo, we recommend adding it to the testing suite. TorchTune has testing for every config added to the library, namely ensuring that it instantiates the dataclass and runs the recipe correctly.
Contributor:

lol i didn't even know this

Contributor:

Tutorial already doing its job.. #353

RdoubleA (author):

r u even a torchtune dev

Command-line overrides
----------------------
To enable quick experimentation, you can specify override values to parameters in your config
via the :code:`tune` command. These should be specified with the flag :code:`--override k1=v1 k2=v2 ...`
Contributor:

Personally I would give an explicit example here, I screwed this up the first few times I tried it. Might also be worth making this section a bit more visible tbh
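
For example, a concrete invocation might look like this (recipe name, config name, and override values are hypothetical; the generic form appears later in this thread):

```
tune full_finetune --config alpaca_llama2_full_finetune.yaml --override lr=1e-4 epochs=1
```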

Defining parameters for custom recipes
--------------------------------------

In general, you should expose the minimal set of parameters you need to run and experiment with your recipes.
Contributor:

So this is a nit because overall I think this tutorial is pretty well-structured. But I feel like you are leading with config best practices when it may be better to lead with "how can I get something simple running". Only after that should you tell someone "hey, now that you're ready to write your own stuff, here are best practices you should keep in mind".

Comment on lines +79 to +80
logger = utils.get_logger("DEBUG")
logger.info(msg=f"Running finetune_llm.py with parameters {params}")
Contributor:

I know it's not part of this PR per se, but do we plan to do something like this by default eventually?

RdoubleA (author):

see #329


.. code-block:: bash

tune <recipe> --config <config> --override ...
Contributor:

Doesn't it also need to be added here first? Maybe I'm missing something obvious

RdoubleA (author):

I didn't know about this. I'll update the tutorial for now but we should revisit this design

@RdoubleA force-pushed the rafiayub/config_documentation branch from 1751fb2 to 0444e6f on February 8, 2024 at 20:08
@kartikayk (Contributor):

@RdoubleA seems like some duplication since "Finetune your first LLM" appears twice. We should remove this from under tutorials.

(screenshot: "Finetune your first LLM" appears twice in the docs navigation)


There are two primary entry points for users to configure parameters: configs and
CLI overrides. Configs are YAML files that define all the parameters that the user
wishes to run a recipe with in a single location. These can be overridden on the
Contributor:

nit

Suggested change:
- wishes to run a recipe with in a single location. These can be overridden on the
+ wishes to run a recipe within a single location. These can be overridden on the

Contributor:

Oh wait, maybe ignore this... I realized the "with" comes from "the user wishes to run a recipe with". But personally I would still change the phrasing here; why not "all the parameters needed to run a recipe", or something like that?

@RdoubleA force-pushed the rafiayub/config_documentation branch from fe93b53 to f5e4ecf on February 9, 2024 at 00:20
* Understand the :ref:`fundamentals of recipes<recipe_deepdive>`


Where parameters are sourced
Contributor:

Sourced here reads really weirdly. Maybe "Where do parameters live?"

RdoubleA and others added 4 commits February 8, 2024 17:02
Co-authored-by: Kartikay Khandelwal <47255723+kartikayk@users.noreply.github.com>
Co-authored-by: Kartikay Khandelwal <47255723+kartikayk@users.noreply.github.com>
Co-authored-by: Kartikay Khandelwal <47255723+kartikayk@users.noreply.github.com>
Co-authored-by: Kartikay Khandelwal <47255723+kartikayk@users.noreply.github.com>
Configs serve as the primary entry point for running recipes in TorchTune. They are
expected to be YAML files and simply list out values for parameters you want to define
for a particular run. The config parameters should be a subset of the dataclass parameters;
there should not be any new fields that are not in the dataclass. Any parameters that
Contributor:

Suggested change:
- there should not be any new fields that are not in the dataclass. Any parameters that
+ there should not be any new fields that are not already in the dataclass. Any parameters that
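
Worth noting: because the config is funneled through the dataclass constructor, the subset rule is enforced automatically. A sketch, reusing the hypothetical `MyRecipeParams` from earlier in this thread:

```
cfg = {"model": "llama2_7b", "tokenizer_path": "/tmp/tokenizer.model"}
cfg["not_a_real_field"] = 1  # a field that does not exist on the dataclass

try:
    MyRecipeParams(**cfg)
except TypeError as err:
    # Fails loudly: unexpected keyword argument 'not_a_real_field'
    print(err)
```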

@kartikayk (Contributor) left a comment:

Some nits and suggestions, but overall looks good. Thanks for adding this!

@RdoubleA merged commit 1da83d6 into main on Feb 9, 2024
15 checks passed
params = FullFinetuneParams(**args)

logger = utils.get_logger("DEBUG")
logger.info(msg=f"Running finetune_llm.py with parameters {params}")
Member:

it's now called full finetune

logger = utils.get_logger("DEBUG")
logger.info(msg=f"Running finetune_llm.py with parameters {params}")

recipe(params)
Member:

We call recipe(params), but I don't see where recipe is defined / imported. Is this expected to be executable?

RdoubleA (author):

nope, it's meant to be a placeholder for the main function of the user's recipe. Can add a comment to clarify that in a follow-up, since this whole code chunk could change after #329.
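
For readers of this thread, the intended shape with the placeholder spelled out might look like this (a sketch only; `args` is assumed to come from the parser, and the logger message reflects the "full finetune" rename noted above):

```
def recipe(params: FullFinetuneParams) -> None:
    """Placeholder for the main function of the user's recipe."""
    ...


params = FullFinetuneParams(**args)  # args assumed parsed upstream

logger = utils.get_logger("DEBUG")  # utils as imported in the recipe module
logger.info(msg=f"Running full finetune with parameters {params}")

recipe(params)
```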
