<a href="https://colab.research.google.com/github/orynycz/llm/blob/main/walkthrough.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Step 0: Get access to the model at https://huggingface.co/meta-llama/Llama-2-7b-hf

In [None]:
# Step 1: Select A100 as the hardware accelerator

# Runtime -> Change runtime type -> Hardware accelerator -> A100 GPU
# This cell contains only comments and runs nothing. Change the runtime type manually in the Colab interface.
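
Before continuing, it can help to confirm which accelerator the runtime actually attached. The cell below is a minimal sketch and not part of the original walkthrough. `attached_gpu_name` is a hypothetical helper, and it assumes the standard `nvidia-smi` tool is on the PATH whenever a GPU is present.

```python
import shutil
import subprocess


def attached_gpu_name():
    """Return the name of the first visible NVIDIA GPU, or None if there is none."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return lines[0] if lines else None


name = attached_gpu_name()
print(name if name else "No GPU visible; check the runtime type.")
```

If the printed name is not the accelerator you selected, change the runtime type again before installing anything, because switching runtimes resets the session.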


In [None]:
%%capture
# %%capture suppresses the cell output and must be the first line of the cell; remove it to troubleshoot

# Step 2: Install torchtune
# Takes 1 minute 30 seconds
# Reference: https://pytorch.org/torchtune/main/install.html

# Install stable version of pre-requisite PyTorch libraries using pip
!pip install torch torchvision torchao

!pip install torchtune

# Install the EleutherAI evaluation harness (quoted so the shell does not expand the *)
!pip install "lm_eval==0.4.*"

In [None]:
# Step 3: Set storage and workspace
# Takes 11 seconds

project_name = 'walkthru' # set this!
experiment_number = '00' # set this!

# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Specify directories
import os
os.environ['STORAGE'] = f'/content/drive/MyDrive/{project_name}/{experiment_number}'
os.environ['LAB'] = '/content'

# Ensure checkpoint and output directories exist
!mkdir -p $STORAGE/untuned
!mkdir -p $STORAGE/tuned
!mkdir -p $LAB/tuned

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
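
The three `mkdir -p` calls above can also be written in Python, which makes the directory layout easy to reuse or test. This is a minimal sketch: `ensure_workspace` is a hypothetical helper, and the demonstration uses a temporary directory in place of the real `STORAGE` and `LAB` locations.

```python
import tempfile
from pathlib import Path


def ensure_workspace(storage, lab):
    """Create the checkpoint and output directories; existing ones are left untouched."""
    directories = [
        Path(storage) / "untuned",
        Path(storage) / "tuned",
        Path(lab) / "tuned",
    ]
    for directory in directories:
        # parents=True and exist_ok=True together behave like `mkdir -p`.
        directory.mkdir(parents=True, exist_ok=True)
    return directories


# Demonstration in a throwaway directory. In the notebook, pass
# os.environ["STORAGE"] and os.environ["LAB"] instead.
with tempfile.TemporaryDirectory() as tmp:
    created = ensure_workspace(Path(tmp) / "storage", Path(tmp) / "lab")
    all_exist = all(directory.is_dir() for directory in created)

print(all_exist)
```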


In [None]:
# Step 4: Download the Hugging Face format Llama2 7B model
# Takes 2 minutes 13 seconds

# IMPORTANT: Click the key symbol on the left sidebar to open the Secrets section.
# Ensure there is an entry named "huggingface" and that "Notebook access" is checked.
# Ensure the secret value is a valid Hugging Face access token. If you don't have one, create it at https://huggingface.co/settings/tokens
# The token's account must also have been granted access to the model: https://huggingface.co/meta-llama/Llama-2-7b-hf

from google.colab import userdata
import os
os.environ['ACCESS_TOKEN'] = userdata.get('huggingface')

!tune download \
meta-llama/Llama-2-7b-hf \
--output-dir $STORAGE/untuned \
--hf-token $ACCESS_TOKEN

Ignoring files matching the following patterns: None
.gitattributes: 100% 1.58k/1.58k [00:00<00:00, 11.9MB/s]
LICENSE.txt: 100% 7.02k/7.02k [00:00<00:00, 37.0MB/s]
README.md: 100% 22.3k/22.3k [00:00<00:00, 87.5MB/s]
Responsible-Use-Guide.pdf: 100% 1.25M/1.25M [00:00<00:00, 21.6MB/s]
USE_POLICY.md: 100% 4.77k/4.77k [00:00<00:00, 37.9MB/s]
config.json: 100% 609/609 [00:00<00:00, 3.40MB/s]
generation_config.json: 100% 188/188 [00:00<00:00, 1.68MB/s]
model-00001-of-00002.safetensors: 100% 9.98G/9.98G [00:40<00:00, 249MB/s]
model-00002-of-00002.safetensors: 100% 3.50G/3.50G [00:14<00:00, 242MB/s]
model.safetensors.index.json: 100% 26.8k/26.8k [00:00<00:00, 36.7MB/s]
pytorch_model-00001-of-00002.bin: 100% 9.98G/9.98G [00:48<00:00, 206MB/s]
pytorch_model-00002-of-00002.bin: 100% 3.50G/3.50G [00:16<00:00, 215MB/s]
pytorch_model.bin.index.json: 100% 26.8k/26.8k [00:00<00:00, 36.9MB/s]
special_tokens_map.json: 100% 414/414 [00:00<00:00, 3.77MB/s]
tokenizer.json: 100% 1.84M/1.84M [00:00<00:00, 7.
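
Because the download is large, a quick check that the files used by later steps are present can save a failed run. The sketch below is an assumption-laden illustration: `missing_files` is a hypothetical helper, the file names are the ones the later cells reference, and the demonstration runs against a temporary directory instead of `$STORAGE/untuned`.

```python
import tempfile
from pathlib import Path

# File names referenced by the generation and evaluation cells in this notebook.
EXPECTED_FILES = [
    "tokenizer.model",
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
]


def missing_files(checkpoint_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in expected if not (root / name).is_file()]


# Demonstration: only the tokenizer file exists in this throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "tokenizer.model").touch()
    missing = missing_files(tmp)

print(missing)
```

In the notebook, calling `missing_files(os.path.join(os.environ["STORAGE"], "untuned"))` should return an empty list once the download has finished.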

In [None]:
# Step 5: Copy checkpoint folder from storage to working directory
# Takes 4 minutes 20 seconds

!cp -r $STORAGE/untuned $LAB/untuned
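
One thing to watch with `cp -r source target`: if `target` already exists, the copy lands in `target/untuned` rather than in `target`. The sketch below shows a Python alternative that behaves the same on the first and on repeated runs. It assumes Python 3.8 or later for `dirs_exist_ok`; `copy_checkpoint` is a hypothetical helper, demonstrated on temporary directories.

```python
import shutil
import tempfile
from pathlib import Path


def copy_checkpoint(source, target):
    """Copy a checkpoint folder into target, merging if target already exists."""
    shutil.copytree(source, target, dirs_exist_ok=True)
    return Path(target)


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "storage" / "untuned"
    source.mkdir(parents=True)
    (source / "config.json").write_text("{}")

    target = Path(tmp) / "lab" / "untuned"
    copy_checkpoint(source, target)
    copy_checkpoint(source, target)  # second run must not create target/untuned

    copied = sorted(path.name for path in target.iterdir())

print(copied)
```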


In [None]:
# Step 6: Generate an answer to a question using the untuned model

!mkdir -p $LAB/generated

# Takes 1 minute 19 seconds on T4 GPU hardware accelerator
!tune run generate --config generation \
  output_dir=$LAB/generated \
  checkpoint_dir=$LAB/untuned \
  checkpointer.checkpoint_dir=$LAB/untuned \
  checkpointer.checkpoint_files=[model-00001-of-00002.safetensors,model-00002-of-00002.safetensors] \
  tokenizer.path=$LAB/untuned/tokenizer.model \
  temperature=0.6 \
  prompt.user="Is it true that two plus two equals five?"

INFO:torchtune.utils._logging:Running InferenceRecipe with resolved config:

checkpoint_dir: /content/untuned
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /content/untuned
  checkpoint_files:
  - model-00001-of-00002.safetensors
  - model-00002-of-00002.safetensors
  model_type: LLAMA2
  output_dir: /content/generated
device: cuda
dtype: bf16
enable_kv_cache: true
max_new_tokens: 300
model:
  _component_: torchtune.models.llama2.llama2_7b
output_dir: /content/generated
prompt:
  system: null
  user: Is it true that two plus two equals five?
quantizer: null
seed: 1234
temperature: 0.6
tokenizer:
  _component_: torchtune.models.llama2.llama2_tokenizer
  max_seq_len: null
  path: /content/untuned/tokenizer.model
  prompt_template: null
top_k: 300

DEBUG:torchtune.utils._logging:Setting manual seed to local seed 1234. Local seed is seed + rank = 1234 + 0
INFO:torchtune.utils._logging:Model is initialized with precision torch.bfloat16.
INFO:torch
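
The resolved config above sets `temperature: 0.6` and `top_k: 300`. To make those two settings concrete, here is a minimal pure-Python sketch of temperature scaling combined with top-k filtering. It illustrates the general technique only and is not torchtune's implementation; `sample_top_k` and the toy logits are made up for the example.

```python
import math
import random


def sample_top_k(logits, temperature=0.6, top_k=300, rng=None):
    """Sample one token index using temperature scaling and top-k filtering."""
    if rng is None:
        rng = random.Random(1234)
    # Keep only the indices of the top_k highest logits.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = ranked[:top_k]
    # Dividing by a temperature below 1 sharpens the distribution.
    scaled = [logits[i] / temperature for i in kept]
    # Softmax weights, with the maximum subtracted for numerical stability.
    peak = max(scaled)
    weights = [math.exp(value - peak) for value in scaled]
    return rng.choices(kept, weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]
picks = [
    sample_top_k(toy_logits, top_k=2, rng=random.Random(seed))
    for seed in range(100)
]
print(sorted(set(picks)))
```

With `top_k=2`, only the two highest-scoring indices can ever be sampled, whatever the seed. Lowering the temperature shifts more of the probability mass onto the top index.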

In [None]:
# Step 7: Evaluate the untuned model
# Takes 2 minutes 50 seconds

!tune run eleuther_eval \
--config eleuther_evaluation \
checkpointer.checkpoint_dir=$LAB/untuned \
tokenizer.path=$LAB/untuned/tokenizer.model \
batch_size=4


2025-02-19 09:32:11.709802: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-02-19 09:32:11.728239: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1739957531.750140   34990 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1739957531.756824   34990 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-19 09:32:11.778818: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instr

In [None]:
# Step 8: Perform fine-tuning
# Takes 2-3 hours

!tune run lora_finetune_single_device \
--config llama2/7B_lora_single_device \
checkpointer.checkpoint_dir=$LAB/untuned \
tokenizer.path=$LAB/untuned/tokenizer.model \
checkpointer.output_dir=$LAB/tuned

# Save a copy of the config to storage
!tune cp llama2/7B_lora_single_device $STORAGE/recipe.yaml

# Copy the fine-tuned output to storage
!cp -r $LAB/tuned $STORAGE


INFO:torchtune.utils._logging:Running LoRAFinetuneRecipeSingleDevice with resolved config:

batch_size: 2
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  adapter_checkpoint: null
  checkpoint_dir: /content/checkpoint
  checkpoint_files:
  - pytorch_model-00001-of-00002.bin
  - pytorch_model-00002-of-00002.bin
  model_type: LLAMA2
  output_dir: /content/output
  recipe_checkpoint: null
compile: false
dataset:
  _component_: torchtune.datasets.alpaca_cleaned_dataset
  packed: false
device: cuda
dtype: bf16
enable_activation_checkpointing: true
enable_activation_offloading: false
epochs: 1
gradient_accumulation_steps: 8
log_every_n_steps: 1
log_peak_memory_stats: true
loss:
  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler:
  _component_: torchtune.training.lr_schedulers.get_cosine_schedule_with_warmup
  num_warmup_steps: 100
max_steps_per_epoch: null
metric_logger:
  _component_: torchtune.training.metric_logging.DiskLogger
  log_dir: /t
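
Two values in the resolved config above are worth working through: the effective batch size implied by `batch_size` and `gradient_accumulation_steps`, and the shape of a cosine schedule with 100 warmup steps. The sketch below uses the commonly published warmup-plus-cosine formula, which is assumed (not confirmed here) to match the scheduler named in the log. `cosine_with_warmup` is an illustrative function, and `total_steps` is a hypothetical value, since the real step count depends on the dataset size.

```python
import math


def cosine_with_warmup(step, num_warmup_steps, num_training_steps, num_cycles=0.5):
    """Learning-rate multiplier: linear warmup, then cosine decay towards zero."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(
        1, num_training_steps - num_warmup_steps
    )
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress)))


# Values taken from the resolved config.
batch_size = 2
gradient_accumulation_steps = 8
# Examples seen per optimizer update: 2 * 8 = 16.
effective_batch_size = batch_size * gradient_accumulation_steps
print("effective batch size:", effective_batch_size)

total_steps = 1000  # hypothetical, for illustration only
for step in (0, 50, 100, 550, 1000):
    multiplier = cosine_with_warmup(step, 100, total_steps)
    print(step, round(multiplier, 3))
```

The multiplier rises linearly to 1 over the first 100 steps and then decays along a half cosine, reaching 0 at the final step.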

In [None]:
# Step 9: Copy the fine-tuned model from storage to the lab
# Takes 4 minutes 33 seconds

!cp -r $STORAGE/tuned $LAB

In [None]:
# Step 10: Evaluate the fine-tuned model
# Takes 2 minutes 50 seconds

!tune run eleuther_eval \
--config eleuther_evaluation \
checkpointer.checkpoint_dir=$LAB/tuned/epoch_0 \
checkpointer.checkpoint_files=[ft-model-00001-of-00002.safetensors,ft-model-00002-of-00002.safetensors] \
tokenizer.path=$LAB/tuned/epoch_0/tokenizer.model \
batch_size=4


/bin/bash: -c: line 2: syntax error: unexpected end of file
2025-02-19 09:52:16.965744: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-02-19 09:52:16.982921: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1739958737.004422   40614 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1739958737.010992   40614 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-19 09:52:17.032586: I tensorflow/core/platform/cpu_feature_guard.cc:210] Th
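
The `checkpointer.checkpoint_files=[...]` override is easy to mistype by hand. The sketch below builds it from a prefix and a shard count. Both helpers are hypothetical conveniences rather than part of torchtune, and they assume the `prefix-0000N-of-0000M.safetensors` naming pattern used in the cells of this notebook.

```python
def shard_names(prefix, num_shards, extension="safetensors"):
    """Return shard file names such as prefix-00001-of-00002.safetensors."""
    return [
        f"{prefix}-{index:05d}-of-{num_shards:05d}.{extension}"
        for index in range(1, num_shards + 1)
    ]


def as_override(key, values):
    """Format a key and a list of values as a key=[a,b] command-line override."""
    return f"{key}=[{','.join(values)}]"


files = shard_names("ft-model", 2)
override = as_override("checkpointer.checkpoint_files", files)
print(override)
```

The printed string can be pasted into the `tune run` command as is. Using `shard_names("model", 2)` yields the file names of the untuned checkpoint.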

In [None]:
# Step 11: Generate an answer to the same question using the fine-tuned model

!mkdir -p $LAB/generated

# Takes 45 seconds on T4 GPU hardware accelerator
!tune run generate --config generation \
  output_dir=$LAB/generated \
  checkpoint_dir=$LAB/tuned/epoch_0 \
  checkpointer.checkpoint_dir=$LAB/tuned/epoch_0 \
  checkpointer.checkpoint_files=[ft-model-00001-of-00002.safetensors,ft-model-00002-of-00002.safetensors] \
  tokenizer.path=$LAB/tuned/epoch_0/tokenizer.model \
  temperature=0.6 \
  prompt.user="Is it true that two plus two equals five?"

INFO:torchtune.utils._logging:Running InferenceRecipe with resolved config:

checkpoint_dir: /content/tuned/epoch_0
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /content/tuned/epoch_0
  checkpoint_files:
  - ft-model-00001-of-00002.safetensors
  - ft-model-00002-of-00002.safetensors
  model_type: LLAMA2
  output_dir: /content/generated
device: cuda
dtype: bf16
enable_kv_cache: true
max_new_tokens: 300
model:
  _component_: torchtune.models.llama2.llama2_7b
output_dir: /content/generated
prompt:
  system: null
  user: Is it true that two plus two equals five?
quantizer: null
seed: 1234
temperature: 0.6
tokenizer:
  _component_: torchtune.models.llama2.llama2_tokenizer
  max_seq_len: null
  path: /content/tuned/epoch_0/tokenizer.model
  prompt_template: null
top_k: 300

DEBUG:torchtune.utils._logging:Setting manual seed to local seed 1234. Local seed is seed + rank = 1234 + 0
INFO:torchtune.utils._logging:Model is initialized with precision to