
E2E Tutorial #690

Merged · 10 commits merged into main · Apr 13, 2024

Conversation

kartikayk (Contributor):

Context

Adds an e2e tutorial capturing the complete flow within TorchTune. The tutorial is a good overview of the library's capabilities, so it goes in the Getting Started section instead of being buried in the Tutorials section.

Changelog

  • Add said tutorial
  • A couple of other fixes, including changing the eval and quantize configs to default to HF models
  • A short fix to the quantize API

Test plan

  • Ran all of the recipes

Tutorial Rendered

Screenshot 2024-04-10 at 6 03 38 PM

Updates to the entry page rendered

Screenshot 2024-04-10 at 6 04 12 PM

pytorch-bot bot commented Apr 11, 2024:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/690

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ac0aae2 with merge base a783aca:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the "CLA Signed" label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Apr 11, 2024
will depend on factors such as the model, amount and nature of training data, your hardware
setup and the end task for which the model will be used
- Evaluate the model on some benchmarks to validate model quality
- Run some generations to make sure the model output looks reasonable
Contributor:
I would switch "run generations" and "evaluate model". I find it faster and easier to run generations to check reasonable output first.

TorchTune, and how TorchTune makes it easy to use popular tools and libraries from the ecosystem.

We'll use the Llama2 7B model for this tutorial. You can find a complete set of models supported
by TorchTune `here <https://github.com/pytorch/torchtune/blob/main/README.md#introduction>`_.
Contributor:
We should change this eventually. The official set of models should live in the docs.

Contributor:
Also not seeing Gemma2b in that link.

Contributor (author):
Yeah, we need to update the README to include Gemma2 - that's next on deck!

Contributor:
I can add it here - #668

freezes the base LLM and adds a very small percentage of learnable parameters. This helps keep
memory associated with gradients and optimizer state low. Using TorchTune, you should be able to
fine-tune a Llama2 7B model with LoRA in less than 16GB of GPU memory using bfloat16 on a
RTX 3090/4090. For more information on how to use LoRA, take a look at our
Contributor:
Add link for RTX 3090/4090
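For readers following along, a minimal sketch of kicking off this LoRA run from the command line, assuming the ``llama2/7B_lora_single_device`` config name used by TorchTune's built-in recipes (verify the exact names with ``tune ls``):

.. code-block:: bash

    # Fine-tune Llama2 7B with LoRA on a single device in bf16;
    # peak memory should stay under ~16GB on an RTX 3090/4090.
    tune run lora_finetune_single_device --config llama2/7B_lora_single_device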

The "merged weights" (see the :ref:`LoRA Tutorial <lora_finetune_label>` for more details)
are split across two checkpoint files similar to the source checkpoints from the HF Hub.
In fact the keys would be identical between these checkpoints. For more details see the
checkpointing tutorial. We also have a third checkpoint file which is much smaller in size
Contributor:
Link to checkpointing tutorial

Run Evaluation using EleutherAI's Eval Harness
----------------------------------------------

We've fine-tuned a model. But how well does this model really do? Let's run some Evaluations!
Contributor:
Suggested change
We've fine-tuned a model. But how well does this model really do? Let's run some Evaluations!
We've fine-tuned a model. But how well does this model really do? Let's run some evaluations!

Contributor (author):
not sure about this one!



Once the config is updated, let's kick off evaluation! We'll use the
``truthfulqa_mc2`` task which is also the default in the config.
Contributor:
Link to what the TruthfulQA MC2 task is and also add a sentence explaining it.
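As a rough sketch, launching the harness looks something like the following, assuming the ``eleuther_eval`` recipe and ``eleuther_evaluation`` config names (check ``tune ls`` for the exact spellings):

.. code-block:: bash

    # Run EleutherAI's eval harness on the fine-tuned checkpoint;
    # truthfulqa_mc2 is the default task set in the config.
    tune run eleuther_eval --config eleuther_evaluation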


[evaluator.py:324] Running loglikelihood requests
[eleuther_eval.py:195] Eval completed in 121.27 seconds.
[eleuther_eval.py:197] truthfulqa_mc2: {'acc,none': 0.48919958543950917 ...}
Contributor:
Stay tuned, I want to change our output for this.

[eleuther_eval.py:195] Eval completed in 121.27 seconds.
[eleuther_eval.py:197] truthfulqa_mc2: {'acc,none': 0.48919958543950917 ...}

So seems like our fine-tuned model gets ~48% on this task. Which is pretty good.
Contributor:
Suggested change
So seems like our fine-tuned model gets ~48% on this task. Which is pretty good.
So seems like our fine-tuned model gets ~48% on this task, which is pretty good.

Contributor:
I feel like this line is awkward. Why not just combine with the subsequent sentences plus a bit describing what truthfulqa_mc2 is actually doing? Basically something like:

The Truthful QA dataset measures a model's propensity to be truthful when answering questions. Specifically, we will evaluate our model on the truthfulqa_mc2 task, which measures the model's zero-shot accuracy on a question followed by one or more true responses and one or more false responses. We can run evaluation on our downloaded checkpoints first as a baseline
... (hopefully the command should just be a config change to the directory)

Now, we evaluate our fine-tuned model
...

We can see that the fine-tuned model yields an 8% overall improvement in zero-shot accuracy on this task.

Contributor:
+1 to Evan's paragraphs, much more compelling and informative

Contributor (author):
Updated!

Comment on lines +357 to +358
Once the config is updated, let's kick off quantization! We'll use the default
quantization method from the config.
Contributor:
Wondering if for each of these recipes (where it's relevant), we want to point to something like "for all available quantization/(fine-tuning) methods available in TorchTune, see here"
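A hedged sketch of the quantization launch, assuming the recipe and config are named ``quantize`` and ``quantization`` respectively:

.. code-block:: bash

    # Quantize the fine-tuned checkpoint using the method set in the config
    tune run quantize --config quantization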


.. code-block:: yaml

checkpointer:
Contributor:
Just curious, I am seeing a lot of checkpointer across all these various tasks. Wondering if for some of the cases we want to just override things via CLI? E.g. in some cases (though not here) can't we just set checkpointer.checkpoint_dir=<checkpoint_dir> and call it a day? Or is that too much of a black box/defeats the purpose of showing off tune cp

Contributor (author):
Nope, this is a good point. I modified this in the first one but left it as is where we have a ton of files to change.
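For reference, the CLI override this thread describes would look roughly like the sketch below (recipe/config names and the checkpoint path are illustrative; any config key can be overridden the same way):

.. code-block:: bash

    # Point the checkpointer at a different directory without editing the config
    tune run eleuther_eval --config eleuther_evaluation \
        checkpointer.checkpoint_dir=/tmp/Llama-2-7b-hf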

@@ -53,7 +53,7 @@ def _setup_model(
with utils.set_default_dtype(self._dtype), self._device:
model = config.instantiate(model_cfg)

model.load_state_dict(model_state_dict, assign=True)
Member:
why's this change needed when using HF?

--------

Fine-tuning an LLM is almost never itself the end goal. Usually this is one step in a much
larger worfklow. An example workflow might look something like this:
Contributor:
Suggested change
larger worfklow. An example workflow might look something like this:
larger workflow. An example workflow might look something like this:

setup and the end task for which the model will be used
- Evaluate the model on some benchmarks to validate model quality
- Run some generations to make sure the model output looks reasonable
- Quantize the model for efficient inference followed by optionally exporting it for specific
Contributor:
is this a standard offramp or should this bullet be about more general offramping?

docs/source/examples/e2e_flow.rst: 2 resolved review comments (outdated)

Indeed, the bridge is pretty cool! Seems like our LLM knows what it's talking
about.

Contributor:
Should we add some intuition on what we should be looking for when checking the generated output? How do we know something is off?

Contributor (author):
too subjective I think
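For context, producing the sample output above is just another recipe invocation; a sketch, assuming the ``generate`` recipe and ``generation`` config names:

.. code-block:: bash

    # Generate from the fine-tuned model; eyeball the output for fluency,
    # topical relevance, and obvious repetition or gibberish.
    tune run generate --config generation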

-----------------------------------------

We saw that the generation recipe took around 11.6 seconds to generate 300 tokens.
One technique commonly used to speed up inference is quantization. TorchTune provides
Contributor:
explain briefly what quantization is

Contributor:
Yeah this and/or provide a link to a reference or 4-bit weights-only quantization
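To make the concept concrete, here is an illustrative (not TorchTune-specific) sketch of weight-only quantization in plain PyTorch: weights are stored at low precision with a per-tensor scale and dequantized on the fly, trading a little accuracy for smaller checkpoints and faster memory-bound inference:

.. code-block:: python

    import torch

    def quantize_weight_int8(w: torch.Tensor):
        # One symmetric per-tensor scale maps float weights onto the int8 range
        scale = w.abs().max() / 127.0
        q = torch.clamp((w / scale).round(), min=-128, max=127).to(torch.int8)
        return q, scale

    def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        # Recover an approximation of the original float weights at compute time
        return q.to(torch.float32) * scale

    w = torch.randn(4096, 4096)
    q, scale = quantize_weight_int8(w)
    print((w - dequantize(q, scale)).abs().max())  # small reconstruction error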

.. note::
Unlike the fine-tuned checkpoints, this outputs a single checkpoint file. This is
because our quantization APIs currently don't support any conversion across formats.
As a result you won't be able to use these quantized models outside of TorchTune.
Contributor:
but can you export these quantized models with executorch? or HF inference? (sorry not familiar with our offramp options)

Contributor (author):
For now, unfortunately no. Quantized models tie to a particular model definition, so we don't have any way to use these outside tune.

docs/source/examples/e2e_flow.rst: 2 resolved review comments
@RdoubleA (Contributor):

One high level comment - will this replace the first fine-tuning tutorial? It seems there's a lot of overlap between the two

recipes/README.md: 2 resolved review comments (outdated)
docs/source/examples/e2e_flow.rst: 4 resolved review comments (outdated)
------------------------------------------------

As we mentioned above, one of the benefits of handling of the checkpoint
conversion is that users can directly work with standard formats. This helps
Contributor:
😃

Suggested change
conversion is that users can directly work with standard formats. This helps
conversion is that you can directly work with standard formats. This helps

Contributor:
Actually personally I'm 50/50 on these whole first two sentences. It's a nice sentiment but I feel you show the benefits below, no need to spell it out here too.

Contributor (author):
I mainly use it as a flow thing - otherwise it's too much of an abrupt change?

docs/source/examples/e2e_flow.rst: 3 resolved review comments (outdated)
Comment on lines 469 to 470
sd_1 = torch.load('/tmp/Llama-2-7b-hf/hf_model_0001_0.pt', mmap=True, map_location='cpu')
sd_2 = torch.load('/tmp/Llama-2-7b-hf/hf_model_0002_0.pt', mmap=True, map_location='cpu')
Contributor:
Should the folder here be <checkpoint_dir>? (I might just be misunderstanding)

Contributor (author):
nope you're right! good catch
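Putting the fix together, a minimal self-contained sketch of the corrected snippet (replace the <checkpoint_dir> placeholder with the directory holding the fine-tuned output):

.. code-block:: python

    import torch

    # mmap=True memory-maps the file instead of reading it fully into RAM;
    # map_location='cpu' keeps the tensors off the GPU while inspecting them.
    sd_1 = torch.load('<checkpoint_dir>/hf_model_0001_0.pt', mmap=True, map_location='cpu')
    sd_2 = torch.load('<checkpoint_dir>/hf_model_0002_0.pt', mmap=True, map_location='cpu')

    # The fine-tuned shards should carry the same keys as the source HF checkpoints
    print(len(sd_1), len(sd_2))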

@kartikayk merged commit ada5224 into main Apr 13, 2024 · 27 checks passed
@kartikayk deleted the e2e_tutorial branch April 13, 2024 02:14