353 changes: 353 additions & 0 deletions distributed_training/llama2/README.md

Large diffs are not rendered by default.

191 changes: 0 additions & 191 deletions distributed_training/llama2/demo.md

This file was deleted.

113 changes: 113 additions & 0 deletions distributed_training/llama2/load-back-FSDP-checkpoints.ipynb
@@ -0,0 +1,113 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "58901f38",
"metadata": {},
"source": [
"# Loading back FSDP checkpoints\n",
"\n",
"For more information: https://github.com/facebookresearch/llama-recipes/blob/main/docs/inference.md#loading-back-fsdp-checkpoints"
]
},
{
"cell_type": "markdown",
"id": "98b57d30",
"metadata": {},
"source": [
"## All of the code in this notebook should be run in the OCI Data Science Notebook Terminal!"
]
},
{
"cell_type": "markdown",
"id": "05a59132",
"metadata": {},
"source": [
"Before you start make sure that you've installed the `pytorch20_p39_gpu_v2` Conda and activate it in the `Terminal`\n",
"\n",
"```bash\n",
"odsc conda install -s pytorch20_p39_gpu_v2\n",
"```\n",
"\n",
"... then activate it\n",
"\n",
"```bash\n",
"conda activate /home/datascience/conda/pytorch20_p39_gpu_v2\n",
"```\n",
"\n",
"Then install all of the required dependancies\n",
"\n",
"```bash\n",
"!pip install tokenizers==0.13.3 -U && pip install transformers -U && pip install llama-recipes==0.0.1\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "6e2aecea",
"metadata": {},
"source": [
"Following commands work best when you execute them in the `terminal` too!\n",
"\n",
"First you have to login to access the Llama2 model\n",
"```bash\n",
"!huggingface-cli login\n",
"```\n",
"\n",
"Then run the checkpoint conververter, it looks like following\n",
"\n",
"```bash\n",
"python -m llama_recipes.inference.checkpoint_converter_fsdp_hf --fsdp_checkpoint_path /mnt/llama2/outputs/lvp-7b/ocid1.datasciencejob.oc1.eu-frankfurt-1.amaaaaaan/fine-tuned-meta-llama/Llama-2-7b-hf --consolidated_model_path /mnt/llama2/fsdp_consolidated_checkpoints --HF_model_path_or_name \"meta-llama/Llama-2-13b-hf\"\n",
"```\n",
"\n",
"Replace the `--fsdp_checkpoint_path` with the folder you specified by the `--dist_checkpoint_root_folder` which will be the location at your object storage bucket, as per the example above. Notice that we ran this in OCI Data Science Notebooks and mounted the object storage bucket used to store the FSDP checkpoints under `/mnt/llama2`. The `--consolidated_model_path` is the path where the consolidated weights will be stored back. The `--HF_model_path_or_name` is the name of the model used for the fine-tuning, or if you downloaded the model locally, the location of the downloaded model.\n",
"\n",
"If the merging process was successful, you should see in your `--consolidated_model_path` folder something like this:\n",
"\n",
"```bash\n",
" 0 drwxr-xr-x. 1 datascience users 0 Oct 18 15:48 .\n",
" 0 drwxr-xr-x. 1 datascience users 0 Oct 18 14:38 ..\n",
" 512 -rw-r--r--. 1 datascience users 42 Oct 18 16:35 added_tokens.json\n",
"1.0K -rw-r--r--. 1 datascience users 656 Oct 18 16:35 config.json\n",
" 512 -rw-r--r--. 1 datascience users 111 Oct 18 16:35 generation_config.json\n",
"9.2G -rw-r--r--. 1 datascience users 9.2G Oct 18 16:35 pytorch_model-00001-of-00003.bin\n",
"9.3G -rw-r--r--. 1 datascience users 9.3G Oct 18 16:36 pytorch_model-00002-of-00003.bin\n",
"6.7G -rw-r--r--. 1 datascience users 6.7G Oct 18 16:36 pytorch_model-00003-of-00003.bin\n",
" 24K -rw-r--r--. 1 datascience users 24K Oct 18 16:36 pytorch_model.bin.index.json\n",
" 512 -rw-r--r--. 1 datascience users 72 Oct 18 16:35 special_tokens_map.json\n",
"1.5K -rw-r--r--. 1 datascience users 1.2K Oct 18 16:35 tokenizer_config.json\n",
"489K -rw-r--r--. 1 datascience users 489K Oct 18 16:35 tokenizer.model\n",
"```"
]
},
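{
"cell_type": "markdown",
"id": "3f7c9a21",
"metadata": {},
"source": [
"As a quick sanity check, you can load the consolidated weights back with `transformers`. The cell below is a minimal sketch, not part of the original recipe; it assumes the hypothetical `--consolidated_model_path` from the example above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d4e8b02",
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch: load the consolidated checkpoint to verify the merge worked.\n",
"# The path is the hypothetical --consolidated_model_path used above.\n",
"from transformers import AutoModelForCausalLM, AutoTokenizer\n",
"\n",
"consolidated_path = \"/mnt/llama2/fsdp_consolidated_checkpoints\"\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(consolidated_path)\n",
"model = AutoModelForCausalLM.from_pretrained(consolidated_path)\n",
"\n",
"# Generate a few tokens to confirm the weights load and run end to end.\n",
"prompt = \"Hello, my name is\"\n",
"inputs = tokenizer(prompt, return_tensors=\"pt\")\n",
"outputs = model.generate(**inputs, max_new_tokens=20)\n",
"print(tokenizer.decode(outputs[0], skip_special_tokens=True))"
]
},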
{
"cell_type": "code",
"execution_count": null,
"id": "2407ae40",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:pytorch20_p39_gpu_v2]",
"language": "python",
"name": "conda-env-pytorch20_p39_gpu_v2-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}