# 🦙 Simple LLaMA Finetuner
Many thanks to LXE: https://github.com/lxe/simple-llama-finetuner

The v0 of this notebook allows you to...

In [None]:
#@title Check GPU (5 seconds, run once per session)
#@markdown Check the type of GPU and the VRAM available. A minimum of 15000 MiB of VRAM is required. If no GPU is attached, go to the Colab menu Runtime > Change Runtime Type and select GPU.

# !if nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader | grep -q 'MiB'; then \
#   echo "You have a gpu and can proceed! Your GPU is:" ; \
# fi
# !nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader

from subprocess import getoutput
import os

gpu_info = getoutput('nvidia-smi  --query-gpu=name --format=csv,noheader')
if("A100" in gpu_info):
    which_gpu = "A100"
    # os.system(f"pip install -q https://github.com/camenduru/stable-diffusion-webui-colab/releases/download/0.0.15/xformers-0.0.15.dev0+4c06c79.d20221205-cp38-cp38-linux_x86_64.whl")
elif("T4" in gpu_info):
    which_gpu = "T4"
    # os.system(f"pip install --force-reinstall -q https://github.com/camenduru/stable-diffusion-webui-colab/releases/download/0.0.15/xformers-0.0.15.dev0+1515f77.d20221130-cp38-cp38-linux_x86_64.whl")
elif("V100" in gpu_info):
    which_gpu = "V100"
elif "failed" in gpu_info:
    print(gpu_info)
    which_gpu = "CPU"
else:
    which_gpu = gpu_info

if which_gpu != "CPU":
    free_vram = getoutput('nvidia-smi  --query-gpu=memory.free --format=csv,noheader')
    print(f"Congrats! You've got a GPU! It's a {which_gpu} with {free_vram} of free VRAM")
else:
    print("Boo, no gpu available")
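
The cell above reports the GPU model and free VRAM but does not compare it against the 15000 MiB minimum stated in its description. A minimal sketch of such a check follows, assuming `nvidia-smi --query-gpu=memory.total --format=csv,noheader` prints one `<number> MiB` line per GPU. The names `MIN_VRAM_MIB`, `parse_mib_values`, and `has_enough_vram` are hypothetical and not part of the notebook.

```python
import re
from typing import List

# Threshold quoted in the cell description above (assumed to refer to total VRAM).
MIN_VRAM_MIB = 15000


def parse_mib_values(nvidia_smi_output: str) -> List[int]:
    """Extract every '<number> MiB' reading from nvidia-smi CSV output."""
    return [int(value) for value in re.findall(r"(\d+)\s*MiB", nvidia_smi_output)]


def has_enough_vram(nvidia_smi_output: str, minimum_mib: int = MIN_VRAM_MIB) -> bool:
    """Return True if at least one GPU reports minimum_mib or more."""
    values = parse_mib_values(nvidia_smi_output)
    # An empty list means nvidia-smi failed or no GPU is attached.
    return bool(values) and max(values) >= minimum_mib
```

In the notebook this could be called as `has_enough_vram(getoutput('nvidia-smi --query-gpu=memory.total --format=csv,noheader'))`, reusing the `getoutput` import from the cell above.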

In [None]:
#@title Allow Google Drive & set paths (2 minutes, run once per session)
import os, sys
from google.colab import drive
drive.mount('/content/drive')
# local_path = '/content/env'
# env_path = '/content/drive/MyDrive/colab_envs/asdf'
# os.makedirs(env_path, exist_ok=True)
# !ln -s $env_path $local_path
# sys.path.insert(0,env_path)

save_to_gdrive = True

#@markdown Hugging Face name/path of the initial model, i.e. its repository path on the Hugging Face Hub (you probably don't want to change this).
MODEL_PATH = "decapoda-research/llama-7b-hf" #@param {type:"string"}

#@markdown Enter the directory name in which to save your newly trained model weights. If using Google Drive, the directory is created in your Drive root folder.
OUTPUT_DIR = "simple-llama-finetuner" #@param {type:"string"}
if save_to_gdrive:
    OUTPUT_DIR = "/content/drive/MyDrive/" + OUTPUT_DIR
else:
    OUTPUT_DIR = "/content/" + OUTPUT_DIR

#@markdown Enter a Hugging Face access token (create one at https://huggingface.co/settings/tokens). You have to be a registered user on 🤗 Hugging Face Hub. If the model repository is license-gated, read and accept the license on its model card first.
# https://huggingface.co/settings/tokens
!mkdir -p ~/.huggingface
HUGGINGFACE_TOKEN = "" #@param {type:"string"}
!echo -n "{HUGGINGFACE_TOKEN}" > ~/.huggingface/token

print("Done!")



In [None]:
#@title Model/dependency setup (2 minutes, run once per session)

import os.path

if not os.path.isfile(OUTPUT_DIR+"/main.py"):
  !git clone https://github.com/jimmoffet/simple-llama-finetuner.git $OUTPUT_DIR
else:
  print("Working directory looks good.")

try:
  import peft
  print("Dependencies look good.")
except ImportError:
  # peft is missing, so install the repository's requirements and update the checkout.
  !cd $OUTPUT_DIR && pip install -r requirements.txt && git pull
  print("Dependencies installed.")

print("Done!")
# !git clone https://github.com/lxe/simple-llama-finetuner.git $OUTPUT_DIR
# !cd $OUTPUT_DIR && git pull && pip install -r requirements.txt

In [None]:
#@title Run gradio UI (1 minute, run once per session)
#@markdown NOTE: The first inference takes a long time (~400 s) because the models have to be downloaded, etc.

!cd $OUTPUT_DIR && python main.py --share --path $MODEL_PATH

TODOs

1. We should be able to save a standard checkpoint file by adding the output dir etc. to the training args. Use save_total_limit = 1 and load_best_model_at_end=True, do NOT use save_strategy = "no", then call trainer.save_model('checkpoint_latest_best') after trainer.train. See: https://discuss.huggingface.co/t/save-only-best-model-in-trainer/8442/4

2. trainer.train should resume from a standard checkpoint file, if one exists (see https://github.com/tloen/alpaca-lora/issues/44). We should also use a single standard LoRA model name and remove the choice from the gradio UI, so that we are always training from the last checkpoint. We should probably save a timestamped copy of each latest_best checkpoint as well...

3. We still can't replicate the example results for prompts like: [Write a Python program that prints the first 10 Fibonacci numbers.](https://github.com/tloen/alpaca-lora#:~:text=Write%20a%20Python%20program%20that%20prints%20the%20first%2010%20Fibonacci%20numbers) We should probably run https://github.com/tloen/alpaca-lora/blob/main/generate.py on a Colab T4 and compare the results. We should also try running this notebook as is on an A100.

4. Figure out a better way than the gradio proxy to expose the Colab localhost?
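
The resume-and-archive idea in item 2 could be approached along the following lines. This is a minimal, stdlib-only sketch, assuming checkpoints are written as `checkpoint-<step>` subdirectories of the output directory (the layout the Hugging Face Trainer uses). `find_latest_checkpoint` and `archive_checkpoint` are hypothetical helper names, not part of this repository.

```python
import re
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional

# Matches directory names such as "checkpoint-200" (assumed Trainer layout).
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint directory with the highest step number, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    candidates = []
    for child in root.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match:
            candidates.append((int(match.group(1)), child))
    if not candidates:
        return None
    # Compare by numeric step so checkpoint-1000 beats checkpoint-200.
    return max(candidates, key=lambda item: item[0])[1]


def archive_checkpoint(checkpoint: Path, archive_root: str) -> Path:
    """Copy a checkpoint directory to a timestamped sibling under archive_root."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(archive_root) / f"{checkpoint.name}-{stamp}"
    shutil.copytree(checkpoint, target)
    return target
```

The path returned by `find_latest_checkpoint` could then be passed to `trainer.train(resume_from_checkpoint=...)`. Whether that fits the way main.py sets up training has not been verified here.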

NOTE: On a T4, the app sometimes chokes and runs forever without throwing errors. If a request has run for more than 100 s and doesn't say "processing" to the left of the seconds count, copy the prompt, refresh the URL, go back to inference, reload the models (sometimes this chokes, too), paste the prompt, and start again.

NOTE: The PEFT model used for text generation in the gradio app above is currently hardcoded to "tloen/alpaca-lora-7b".