Add Phi3 Mini 4K Instruct Model to torchtune #876

Merged: 13 commits merged into main from the phi3 branch on Apr 28, 2024
Conversation

kartikayk (Contributor) commented Apr 26, 2024

Important note: the tokenizer still needs some work; this will be a follow-up PR.

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)

Please link to any issues this PR addresses.

Changelog

This PR adds support for the Phi3 Mini 4K Instruct model to torchtune. Specifically, we add the following:

  • Phi3's RoPE module. This is not numerically equivalent to the Llama2 or Llama3 RoPE, and care needs to be taken to get correct behavior for bf16 training (see the sketch after this list)
  • State-dict conversion logic accounting for the fused qkv and gate_up projection matrices
  • Config for multi-GPU full finetuning
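
To illustrate the bf16 point in the first bullet, here is a minimal sketch of an HF-style rotate-half RoPE that computes its cos/sin values in fp32 and casts back to the input dtype at the end. This is only an illustration of the general technique; the function name, defaults, and shapes are assumptions, and it is not the Phi3RotaryPositionalEmbeddings module added in this PR.

import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Swap the two halves of the head dim with a sign flip: [x1, x2] -> [-x2, x1].
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope_fp32(x: torch.Tensor, base: int = 10000) -> torch.Tensor:
    # x has shape [b, s, n_h, h_d]; compute frequencies and the rotation in fp32
    # so bf16 inputs do not lose precision, then cast back at the end.
    b, s, n_h, h_d = x.shape
    theta = 1.0 / (base ** (torch.arange(0, h_d, 2, dtype=torch.float32) / h_d))
    freqs = torch.outer(torch.arange(s, dtype=torch.float32), theta)  # [s, h_d // 2]
    emb = torch.cat((freqs, freqs), dim=-1)                           # [s, h_d]
    cos = emb.cos()[None, :, None, :]                                 # broadcast over b, n_h
    sin = emb.sin()[None, :, None, :]
    x_fp32 = x.float()
    out = x_fp32 * cos + rotate_half(x_fp32) * sin
    return out.to(x.dtype)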

Test plan

Please make sure to do each of the following if applicable to your PR. (If you're not sure about any one of these just ask and we will happily help.)

  • run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
  • add unit tests for any new functionality
  • update docstrings for any new or updated methods or classes
  • run unit tests via pytest tests
  • run recipe tests via pytest tests -m integration_test
  • manually run any new or modified recipes with sufficient proof of correctness
    • include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)

Unit Tests

Detailed comparisons against the reference implementation:

pytest tests/torchtune

Full-finetune Recipe

[Screenshot: output of the full-finetune recipe run, Apr 25 2024]


pytorch-bot bot commented Apr 26, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/876

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 872911a with merge base 3890200:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label on Apr 26, 2024
# pip install bitsandbytes
#
# To launch on a single device, run the following command from root:
# tune run full_finetune_single_device --config recipes/config/phi3/mini_full_low_memory.yaml
Contributor:

Suggested change:
- # tune run full_finetune_single_device --config recipes/config/phi3/mini_full_low_memory.yaml
+ # tune run full_finetune_single_device --config phi3/mini_full_low_memory.yaml

kartikayk (Author):

Oh, I haven't added this to the registry, so this won't work. I'll let you take care of that.

# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training
# you can run:
# tune run full_finetune_single_device --config recipes/config/phi3/mini_full_low_memory.yaml checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
Contributor:

Suggested change:
- # tune run full_finetune_single_device --config recipes/config/phi3/mini_full_low_memory.yaml checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
+ # tune run full_finetune_single_device --config phi3/mini_full_low_memory.yaml checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>

kartikayk (Author):

Same as above.

x_out = rope_phi3(input)

# check the numerics of the computed tensor
assert_expected(x_out.mean(), tensor(-0.0005), atol=1e-4)
Contributor:

Can you include in the PR where these numbers come from?

kartikayk (Author):

Info's in the gist! I don't really want to replicate all of that context here again.

from torch import nn, Tensor


class Phi3RotaryPositionalEmbeddings(nn.Module):
Contributor:

You're a hero.

num_kv_heads=self._config["num_key_value_heads"],
dim=self._config["hidden_size"],
)
if self._model_type == ModelType.PHI3_MINI:
Contributor:

Does capitalization change this? B/c in the config it's PHI_MINI and in the enum it's phi_mini.

kartikayk (Author):

Yup, it matters!
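
For context on the state-dict conversion mentioned in the changelog, below is a rough sketch of how a fused qkv_proj and gate_up_proj weight could be split into separate projections during checkpoint conversion. The function and key names are hypothetical placeholders; this is not the actual torchtune conversion code in this PR.

import torch

def split_fused_projections(state_dict, num_heads, num_kv_heads, head_dim):
    # Hypothetical sketch: HF Phi3 checkpoints fuse q/k/v into qkv_proj and
    # gate/up into gate_up_proj; split them so separate modules can load them.
    out = {}
    q_dim = num_heads * head_dim
    kv_dim = num_kv_heads * head_dim
    for key, value in state_dict.items():
        if "qkv_proj" in key:
            q, k, v = value.split([q_dim, kv_dim, kv_dim], dim=0)
            out[key.replace("qkv_proj", "q_proj")] = q
            out[key.replace("qkv_proj", "k_proj")] = k
            out[key.replace("qkv_proj", "v_proj")] = v
        elif "gate_up_proj" in key:
            gate, up = value.chunk(2, dim=0)
            out[key.replace("gate_up_proj", "gate_proj")] = gate
            out[key.replace("gate_up_proj", "up_proj")] = up
        else:
            out[key] = value
    return out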

joecummings (Contributor) left a comment:

Mostly nits! Thanks @kartikayk


def phi3_tokenizer(path: str) -> SentencePieceTokenizer:
tokenizer = SentencePieceTokenizer(path)
tokenizer.pad_id = 32000
Contributor:

Looking at the HF config, it says that eos_id and pad_id are the same. Is that right?

kartikayk (Author):

Yes, but I'll also let @joecummings confirm that!
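
As an aside, one quick way to check the eos/pad question is to load the Hugging Face tokenizer directly. This assumes the transformers package and the microsoft/Phi-3-mini-4k-instruct repo are available; it is not part of this PR.

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
# If eos and pad really map to the same special token, both ids print as 32000.
print(tok.eos_token_id, tok.pad_token_id)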

(Resolved review threads on torchtune/models/phi3/_position_embeddings.py are collapsed and not shown here.)
TODO: The implementation below can be made more efficient
for inference.
"""
# input tensor has shape [b, s, n_h, n_d]
Contributor:

(based on the above docstring)

Suggested change:
- # input tensor has shape [b, s, n_h, n_d]
+ # input tensor has shape [b, s, n_h, h_d]

kartikayk (Author):

Oh, good catch!

joecummings (Contributor) left a comment:

Looks good!

kartikayk merged commit 23c5585 into main on Apr 28, 2024
27 checks passed
kartikayk deleted the phi3 branch on April 28, 2024 at 15:44