
E2E Tutorial #690

Merged · 10 commits merged into main · Apr 13, 2024

Conversation

kartikayk (Contributor):

Context

Adds an e2e tutorial capturing the complete flow within TorchTune. The tutorial is a good overview of the library's capabilities, so it goes in the Getting Started section instead of being buried in the Tutorials section.

Changelog

  • Add said tutorial
  • A couple of other fixes, including changing the eval and quantize configs to default to HF models
  • A short fix to the quantize API

Test plan

  • Ran all of the recipes

Tutorial Rendered

Screenshot 2024-04-10 at 6 03 38 PM

Updates to the entry page rendered

Screenshot 2024-04-10 at 6 04 12 PM

pytorch-bot bot commented Apr 11, 2024:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/690

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ac0aae2 with merge base a783aca:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the "CLA Signed" label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Apr 11, 2024
will depend on factors such as the model, amount and nature of training data, your hardware
setup and the end task for which the model will be used
- Evaluate the model on some benchmarks to validate model quality
- Run some generations to make sure the model output looks reasonable
Contributor:
I would switch "run generations" and "evaluate model". I find it faster and easier to run generations to check reasonable output first.

TorchTune, and how TorchTune makes it easy to use popular tools and libraries from the ecosystem.

We'll use the Llama2 7B model for this tutorial. You can find a complete set of models supported
by TorchTune `here <https://github.com/pytorch/torchtune/blob/main/README.md#introduction>`_.
Contributor:
We should change this eventually. The official set of models should live in the docs.

Contributor:
Also not seeing Gemma2b in that link.

Contributor (author):
Yeah, we need to update the README to include Gemma2 - that's next on deck!

Contributor:
I can add it here - #668

freezes the base LLM and adds a very small percentage of learnable parameters. This helps keep
memory associated with gradients and optimizer state low. Using TorchTune, you should be able to
fine-tune a Llama2 7B model with LoRA in less than 16GB of GPU memory using bfloat16 on a
RTX 3090/4090. For more information on how to use LoRA, take a look at our
Contributor:
Add link for RTX 3090/4090
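For readers following along, a minimal sketch of kicking off this LoRA run from the command line, assuming the ``llama2/7B_lora_single_device`` config name used by TorchTune's built-in recipes (verify the exact names with ``tune ls``):

.. code-block:: bash

    # Fine-tune Llama2 7B with LoRA on a single device in bf16;
    # peak memory should stay under ~16GB on an RTX 3090/4090.
    tune run lora_finetune_single_device --config llama2/7B_lora_single_device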

The "merged weights" (see the :ref:`LoRA Tutorial <lora_finetune_label>` for more details)
are split across two checkpoint files similar to the source checkpoints from the HF Hub.
In fact the keys would be identical between these checkpoints. For more details see the
checkpointing tutorial. We also have a third checkpoint file which is much smaller in size
Contributor:
Link to checkpointing tutorial

Run Evaluation using EleutherAI's Eval Harness
----------------------------------------------

We've fine-tuned a model. But how well does this model really do? Let's run some Evaluations!
Contributor:
Suggested change
We've fine-tuned a model. But how well does this model really do? Let's run some Evaluations!
We've fine-tuned a model. But how well does this model really do? Let's run some evaluations!

Contributor (author):
not sure about this one!



Once the config is updated, let's kick off evaluation! We'll use the
``truthfulqa_mc2`` task which is also the default in the config.
Contributor:
Link to what the TruthfulQA MC2 task is and also add a sentence explaining it.
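As a rough sketch, launching the harness looks something like the following, assuming the ``eleuther_eval`` recipe and ``eleuther_evaluation`` config names (check ``tune ls`` for the exact spellings):

.. code-block:: bash

    # Run EleutherAI's eval harness on the fine-tuned checkpoint;
    # truthfulqa_mc2 is the default task set in the config.
    tune run eleuther_eval --config eleuther_evaluation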


[evaluator.py:324] Running loglikelihood requests
[eleuther_eval.py:195] Eval completed in 121.27 seconds.
[eleuther_eval.py:197] truthfulqa_mc2: {'acc,none': 0.48919958543950917 ...}
Contributor:
Stay tuned, I want to change our output for this.

[eleuther_eval.py:195] Eval completed in 121.27 seconds.
[eleuther_eval.py:197] truthfulqa_mc2: {'acc,none': 0.48919958543950917 ...}

So seems like our fine-tuned model gets ~48% on this task. Which is pretty good.
Contributor:
Suggested change
So seems like our fine-tuned model gets ~48% on this task. Which is pretty good.
So seems like our fine-tuned model gets ~48% on this task, which is pretty good.

Contributor:
I feel like this line is awkward. Why not just combine with the subsequent sentences plus a bit describing what truthfulqa_mc2 is actually doing? Basically something like:

The Truthful QA dataset measures a model's propensity to be truthful when answering questions. Specifically, we will evaluate our model on the truthfulqa_mc2 task, which measures the model's zero-shot accuracy on a question followed by one or more true responses and one or more false responses. We can run evaluation on our downloaded checkpoints first as a baseline
... (hopefully the command should just be a config change to the directory)

Now, we evaluate our fine-tuned model
...

We can see that the fine-tuned model yields an 8% overall improvement in zero-shot accuracy on this task.

Contributor:
+1 to Evan's paragraphs, much more compelling and informative

Contributor (author):
Updated!

Comment on lines +357 to +358
Once the config is updated, let's kick off quantization! We'll use the default
quantization method from the config.
Contributor:
Wondering if for each of these recipes (where it's relevant), we want to point to something like "for all available quantization/(fine-tuning) methods available in TorchTune, see here"
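A hedged sketch of the quantization launch, assuming the recipe and config are named ``quantize`` and ``quantization`` respectively:

.. code-block:: bash

    # Quantize the fine-tuned checkpoint using the method set in the config
    tune run quantize --config quantization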


.. code-block:: yaml

checkpointer:
Contributor:
Just curious, I am seeing a lot of checkpointer across all these various tasks. Wondering if for some of the cases we want to just override things via CLI? E.g. in some cases (though not here) can't we just set checkpointer.checkpoint_dir=<checkpoint_dir> and call it a day? Or is that too much of a black box/defeats the purpose of showing off tune cp

Contributor (author):
Nope, this is a good point. I modified this in the first one but left it as is where we have a ton of files to change.
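For reference, the CLI override this thread describes would look roughly like the sketch below (recipe/config names and the checkpoint path are illustrative; any config key can be overridden the same way):

.. code-block:: bash

    # Point the checkpointer at a different directory without editing the config
    tune run eleuther_eval --config eleuther_evaluation \
        checkpointer.checkpoint_dir=/tmp/Llama-2-7b-hf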

@@ -53,7 +53,7 @@ def _setup_model(
with utils.set_default_dtype(self._dtype), self._device:
model = config.instantiate(model_cfg)

model.load_state_dict(model_state_dict, assign=True)
Member:
why's this change needed when using HF?

--------

Fine-tuning an LLM is almost never itself the end goal. Usually this is one step in a much
larger worfklow. An example workflow might look something like this:
Contributor:
Suggested change
larger worfklow. An example workflow might look something like this:
larger workflow. An example workflow might look something like this:

setup and the end task for which the model will be used
- Evaluate the model on some benchmarks to validate model quality
- Run some generations to make sure the model output looks reasonable
- Quantize the model for efficient inference followed by optionally exporting it for specific
Contributor:
is this a standard offramp or should this bullet be about more general offramping?

docs/source/examples/e2e_flow.rst: 2 resolved review comments (outdated)

Indeed, the bridge is pretty cool! Seems like our LLM knows what it's talking
about.

Contributor:
Should we add some intuition on what we should be looking for when checking the generated output? How do we know something is off?

Contributor (author):
too subjective I think
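For context, producing the sample output above is just another recipe invocation; a sketch, assuming the ``generate`` recipe and ``generation`` config names:

.. code-block:: bash

    # Generate from the fine-tuned model; eyeball the output for fluency,
    # topical relevance, and obvious repetition or gibberish.
    tune run generate --config generation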

-----------------------------------------

We saw that the generation recipe took around 11.6 seconds to generate 300 tokens.
One technique commonly used to speed up inference is quantization. TorchTune provides
Contributor:
explain briefly what quantization is

Contributor:
Yeah this and/or provide a link to a reference or 4-bit weights-only quantization
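To make the concept concrete, here is an illustrative (not TorchTune-specific) sketch of weight-only quantization in plain PyTorch: weights are stored at low precision with a per-tensor scale and dequantized on the fly, trading a little accuracy for smaller checkpoints and faster memory-bound inference:

.. code-block:: python

    import torch

    def quantize_weight_int8(w: torch.Tensor):
        # One symmetric per-tensor scale maps float weights onto the int8 range
        scale = w.abs().max() / 127.0
        q = torch.clamp((w / scale).round(), min=-128, max=127).to(torch.int8)
        return q, scale

    def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        # Recover an approximation of the original float weights at compute time
        return q.to(torch.float32) * scale

    w = torch.randn(4096, 4096)
    q, scale = quantize_weight_int8(w)
    print((w - dequantize(q, scale)).abs().max())  # small reconstruction error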

.. note::
Unlike the fine-tuned checkpoints, this outputs a single checkpoint file. This is
because our quantization APIs currently don't support any conversion across formats.
As a result you won't be able to use these quantized models outside of TorchTune.
Contributor:
but can you export these quantized models with executorch? or HF inference? (sorry not familiar with our offramp options)

Contributor (author):
For now, unfortunately no. Quantized models tie to a particular model definition, so we don't have any way to use these outside tune.

docs/source/examples/e2e_flow.rst: 2 resolved review comments
@RdoubleA (Contributor):

One high level comment - will this replace the first fine-tuning tutorial? It seems there's a lot of overlap between the two

recipes/README.md: 2 resolved review comments (outdated)
docs/source/examples/e2e_flow.rst: 4 resolved review comments (outdated)
------------------------------------------------

As we mentioned above, one of the benefits of handling of the checkpoint
conversion is that users can directly work with standard formats. This helps
Contributor:
😃

Suggested change
conversion is that users can directly work with standard formats. This helps
conversion is that you can directly work with standard formats. This helps

Contributor:
Actually personally I'm 50/50 on these whole first two sentences. It's a nice sentiment but I feel you show the benefits below, no need to spell it out here too.

Contributor (author):
I mainly use it as a flow thing - otherwise it's too much of an abrupt change?

docs/source/examples/e2e_flow.rst: 3 resolved review comments (outdated)
Comment on lines 469 to 470
sd_1 = torch.load('/tmp/Llama-2-7b-hf/hf_model_0001_0.pt', mmap=True, map_location='cpu')
sd_2 = torch.load('/tmp/Llama-2-7b-hf/hf_model_0002_0.pt', mmap=True, map_location='cpu')
Contributor:
Should the folder here be <checkpoint_dir>? (I might just be misunderstanding)

Contributor (author):
nope you're right! good catch
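Putting the fix together, a minimal self-contained sketch of the corrected snippet (replace the <checkpoint_dir> placeholder with the directory holding the fine-tuned output):

.. code-block:: python

    import torch

    # mmap=True memory-maps the file instead of reading it fully into RAM;
    # map_location='cpu' keeps the tensors off the GPU while inspecting them.
    sd_1 = torch.load('<checkpoint_dir>/hf_model_0001_0.pt', mmap=True, map_location='cpu')
    sd_2 = torch.load('<checkpoint_dir>/hf_model_0002_0.pt', mmap=True, map_location='cpu')

    # The fine-tuned shards should carry the same keys as the source HF checkpoints
    print(len(sd_1), len(sd_2))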

@kartikayk merged commit ada5224 into main Apr 13, 2024 · 27 checks passed
@kartikayk deleted the e2e_tutorial branch April 13, 2024 02:14