## Fine-tuning Mistral 7B with AutoTrain

### Set Up the Runtime
Fine-tuning Mistral 7B requires a GPU instance. To select one in Colab:

- Go to `Runtime` (located in the top menu bar).
- Select `Change Runtime Type`.
- Choose `T4 GPU` (or a comparable option).

## Step 1: Set Up the Environment

```python
!pip install pandas autotrain-advanced -q
```

```python
!autotrain setup --update-torch
```

```text
INFO    Installing latest xformers
INFO    Successfully installed latest xformers
INFO    Installing latest PyTorch
INFO    Successfully installed latest PyTorch
```


## Step 2: Connect to Hugging Face for Model Upload

### Logging in to Hugging Face
To upload the fine-tuned model so that it can be used for inference, you need to log in to the Hugging Face Hub.

### Getting a Hugging Face token
Steps:

1. Navigate to this URL: https://huggingface.co/settings/tokens
2. Create a write `token` and copy it to your clipboard
3. Run the code below and enter your `token`

```python
from huggingface_hub import notebook_login
notebook_login()
```


## Step 3: Upload your dataset

Upload your dataset to the root directory of the Colab runtime (`/content`) and name it `train.csv`. The AutoTrain command looks for a file with that name in the directory passed to `--data-path`.
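Before starting a long training run, it can help to confirm the file is where the command expects it. The AutoTrain log in Step 4 shows `text_column='text'`, so this sketch assumes the training examples live in a column named `text`. The helper `check_training_csv` is hypothetical (not part of AutoTrain) and uses only the standard library.

```python
import csv
from pathlib import Path


def check_training_csv(data_path=".", filename="train.csv", text_column="text"):
    """Hypothetical pre-flight check: the CSV exists and has a text column.

    Returns the number of data rows; raises if the file or the column is missing.
    """
    csv_path = Path(data_path) / filename
    if not csv_path.is_file():
        raise FileNotFoundError(f"{csv_path} not found; AutoTrain reads it from --data-path")
    with csv_path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # fieldnames is read lazily from the header row
        if text_column not in (reader.fieldnames or []):
            raise ValueError(f"{csv_path} has no '{text_column}' column: {reader.fieldnames}")
        # Count the remaining data rows (quoted multi-line fields count once)
        return sum(1 for _ in reader)
```

Calling `check_training_csv(".")` in the Colab root should return the number of examples, or fail early with a readable message.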

#### Don't have a dataset and want to try fine-tuning on an example dataset?
If you don't have a dataset, run the commands below to download an example dataset and save it as `train.csv`:

```python
!git clone https://github.com/joshbickett/finetune-llama-2.git
%cd finetune-llama-2
%mv train.csv ../train.csv
%cd ..
```

```text
Cloning into 'finetune-llama-2'...
remote: Enumerating objects: 70, done.
remote: Counting objects: 100% (70/70), done.
remote: Compressing objects: 100% (50/50), done.
remote: Total 70 (delta 38), reused 48 (delta 19), pack-reused 0
Receiving objects: 100% (70/70), 25.13 KiB | 12.57 MiB/s, done.
Resolving deltas: 100% (38/38), done.
/content/finetune-llama-2
/content
```


```python
import pandas as pd
df = pd.read_csv("train.csv")
df
```

| Unnamed: 0 | Concept | Funny Description Prompt | text |
|---|---|---|---|
| 0 | A cactus at a dance party | A cactus, wearing disco lights and surrounded ... | `###Human:\nGenerate a midjourney prompt for A ...` |
| 1 | A robot on a first date | A robot, with a bouquet of USB cables, nervous... | `###Human:\nGenerate a midjourney prompt for A ...` |
| 2 | A snail at a speed contest | A snail, with a mini rocket booster, confident... | `###Human:\nGenerate a midjourney prompt for A ...` |
| 3 | A penguin at a beach party | A penguin, with sunscreen and a surfboard, try... | `###Human:\nGenerate a midjourney prompt for A ...` |
| 4 | A cloud in a bad mood | A cloud, grumbling and dropping mini lightning... | `###Human:\nGenerate a midjourney prompt for A ...` |
| ... | ... | ... | ... |
| 112 | A donut feeling the hole emptiness | A donut, in existential bakery therapy, ponder... | `###Human:\nGenerate a midjourney prompt for A ...` |
| 113 | A pineapple with a prickly attitude | A pineapple, in a prickly personality class, s... | `###Human:\nGenerate a midjourney prompt for A ...` |
| 114 | A calculator crunching life's problems | A calculator, at a problem-solving workshop, c... | `###Human:\nGenerate a midjourney prompt for A ...` |
| 115 | A kite reaching new heights | A kite, in an altitude adjustment session, unt... | `###Human:\nGenerate a midjourney prompt for A ...` |


```python
df['text'][15]
```

```text
'###Human:\nGenerate a midjourney prompt for A book on a mystery adventure\n\n###Assistant:\nA book, wearing detective glasses, flipping through its own pages, trying to solve the cliffhanger it was left on.'
```
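Each `text` value follows a fixed template: a `###Human:` turn with the instruction, a blank line, and a `###Assistant:` turn with the target description. If you want to add your own examples in the same layout, a minimal sketch like the one below could build the `text` column from concept/description pairs and write `train.csv`. The template is inferred from the row printed above, and the helper names `format_example` and `write_train_csv` are hypothetical.

```python
import csv

# Template inferred from the example row above (df['text'][15])
TEMPLATE = "###Human:\nGenerate a midjourney prompt for {concept}\n\n###Assistant:\n{description}"


def format_example(concept, description):
    """Render one training example in the ###Human/###Assistant layout."""
    return TEMPLATE.format(concept=concept, description=description)


def write_train_csv(pairs, path="train.csv"):
    """Write (concept, description) pairs using the example dataset's named columns.

    The unnamed index column of the example file is omitted; the training
    command only reads the `text` column.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Concept", "Funny Description Prompt", "text"])
        for concept, description in pairs:
            writer.writerow([concept, description, format_example(concept, description)])
```

`format_example("A book on a mystery adventure", ...)` with the description shown above reproduces row 15 exactly, which is a quick way to check the template before generating new rows.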

## Step 4: Overview of the AutoTrain Command

#### What the command flags do

- `!autotrain`: The `!` prefix runs the line as a shell command from a Jupyter/Colab cell; `autotrain` is the AutoTrain Advanced command-line tool.

- `llm`: The subcommand that selects the task type, here LLM fine-tuning.

- `--train`: Starts the training process.

- `--project-name mistral-7b-mj-finetuned_en`: Sets the name of the project.

- `--model filipealmeida/Mistral-7B-Instruct-v0.1-sharded`: Specifies the base model, hosted on the Hugging Face Hub as "Mistral-7B-Instruct-v0.1-sharded" under the "filipealmeida" account.

- `--data-path .`: The path to the training data. The "." refers to the current directory, which must contain `train.csv`.

- `--use-peft`: Trains a LoRA adapter (parameter-efficient fine-tuning) instead of updating all of the model's weights.

- `--quantization int4`: Loads the base model with 4-bit (INT4) quantization to reduce memory use, at the cost of some precision.

- `--lr 2e-4`: Sets the learning rate to 0.0002.

- `--batch-size 12`: Sets the training batch size to 12.

- `--epochs 3`: Iterates over the dataset 3 times.

- `--trainer sft`: Uses the supervised fine-tuning (SFT) trainer.

- `--target_modules q_proj,v_proj`: Applies LoRA to the attention query (`q_proj`) and value (`v_proj`) projection layers.

- `--push-to-hub`, `--token`, `--repo-id`: Upload the trained model to the Hugging Face Hub repository named by `--repo-id`, authenticating with the token passed to `--token`.
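As a rough sanity check on training length, the number of optimizer steps follows from the dataset size, `--batch-size`, and `--epochs`. This sketch assumes the 116-row example dataset shown in Step 3 (rows 0 to 115), one training example per CSV row (no sequence packing), a single GPU, no gradient accumulation, and that a final partial batch still counts as a step. AutoTrain's real step count may differ if any of those assumptions does not hold; `training_steps` is a hypothetical helper.

```python
import math


def training_steps(num_examples, batch_size, epochs):
    """Estimate optimizer steps, counting a final partial batch as a full step."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


# 116 example rows, --batch-size 12, --epochs 3
per_epoch, total = training_steps(num_examples=116, batch_size=12, epochs=3)
print(per_epoch, total)  # 10 30
```

With only about 30 steps in total, the `warmup_ratio=0.1` shown in the training log amounts to just a few warmup steps.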

### Steps needed before running
Update the `!autotrain` code cell below as follows:

1. After `--project-name`, replace `mistral-7b-mj-finetuned_en` with the name you'd like to give the project.
2. After `--repo-id`, replace `Swapnilg915/mistral-7b-mj-finetuned_en` with `username/repository`, using your Hugging Face username and the repository name you want. You don't need to create this repository beforehand; it is created automatically and the model is uploaded once training completes.
3. Confirm that `train.csv` is in the root directory of the Colab runtime. The `--data-path .` flag makes AutoTrain look for your data there.
4. Make sure the LoRA target modules are set with `--target_modules q_proj,v_proj` (no space after the comma, or the shell splits it into two arguments).
5. After `--token`, pass your own Hugging Face write token, and don't leave a real token in a notebook you share.
6. Once you've made these changes, you're all set: run the command below.
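Instead of pasting a token into the command line, one option is to assemble the command in Python and read the token from an environment variable. This is a hedged sketch: the `HF_TOKEN` variable name and the `build_autotrain_command` and `redacted` helpers are assumptions, and the flags are copied from the command in this step rather than taken from AutoTrain's documentation.

```python
import os
import shlex


def build_autotrain_command(project_name, repo_id, token_env="HF_TOKEN"):
    """Hypothetical helper: the AutoTrain argument list used in this step.

    The token is read from an environment variable instead of being hardcoded.
    """
    token = os.environ.get(token_env)
    if not token:
        raise RuntimeError(f"Set {token_env} to a Hugging Face write token first")
    return [
        "autotrain", "llm", "--train",
        "--project-name", project_name,
        "--model", "filipealmeida/Mistral-7B-Instruct-v0.1-sharded",
        "--data-path", ".",
        "--use-peft",
        "--quantization", "int4",
        "--lr", "2e-4",
        "--batch-size", "12",
        "--epochs", "3",
        "--trainer", "sft",
        "--target_modules", "q_proj,v_proj",
        "--push-to-hub",
        "--token", token,
        "--repo-id", repo_id,
    ]


def redacted(cmd):
    """Render the command as a shell string with the token value masked."""
    shown = list(cmd)
    shown[shown.index("--token") + 1] = "***"
    return shlex.join(shown)
```

The list can be passed to `subprocess.run(cmd, check=True)`, and `redacted(cmd)` is safe to print or log. Passing a list rather than a string also keeps `q_proj,v_proj` as a single argument.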

```python
!autotrain llm --train --project-name mistral-7b-mj-finetuned_en --model filipealmeida/Mistral-7B-Instruct-v0.1-sharded --data-path . --use-peft --quantization int4 --lr 2e-4 --batch-size 12 --epochs 3 --trainer sft --target_modules q_proj,v_proj --push-to-hub --token YOUR_HF_WRITE_TOKEN --repo-id Swapnilg915/mistral-7b-mj-finetuned_en
```

```text
INFO    Running LLM
INFO    Params: Namespace(version=False, text_column='text', rejected_text_column='rejected', prompt_text_column='prompt', model_ref=None, warmup_ratio=0.1, optimizer='adamw_torch', scheduler='linear', weight_decay=0.0, max_grad_norm=1.0, add_eos_token=False, block_size=-1, peft=True, lora_r=16, lora_alpha=32, lora_dropout=0.05, logging_steps=-1, evaluation_strategy='epoch', save_total_limit=1, save_strategy='epoch', auto_find_batch_size=False, mixed_precision=None, quantization='int4', model_max_length=1024, trainer='sft', target_modules='q_proj,v_proj', merge_adapter=False, use_flash_attention_2=False, dpo_beta=0.1, apply_chat_template=False, padding=None, train=True, deploy=False, inference=False, username=None, backend='local-cli', token='<redacted>', repo_id='Swapnilg915/mistral-7b-mj-finetuned_en', push_to_hub=True, model='filipealmeida/Mistral-7B-Instruct-v0.1-sharded', project_name='mistral-7b-mj-finetuned_en', seed
```

## Step 5: Completed 🎉
Once the command above completes, the trained LoRA adapter is uploaded to the Hugging Face Hub repository given by `--repo-id`.

#### Learn more about AutoTrain (optional)
To see all of the command-line flags that are available, run:

```python
!autotrain llm -h
```

```text
usage: autotrain <command> [<args>] llm [-h] [--text_column TEXT_COLUMN]
                                        [--rejected_text_column REJECTED_TEXT_COLUMN]
                                        [--prompt-text-column PROMPT_TEXT_COLUMN]
                                        [--model-ref MODEL_REF] [--warmup_ratio WARMUP_RATIO]
                                        [--optimizer OPTIMIZER] [--scheduler SCHEDULER]
                                        [--weight_decay WEIGHT_DECAY]
                                        [--max_grad_norm MAX_GRAD_NORM] [--add_eos_token]
                                        [--block_size BLOCK_SIZE] [--peft] [--lora_r LORA_R]
                                        [--lora_alpha LORA_ALPHA] [--lora_dropout LORA_DROPOUT]
                                        [--logging_steps LOGGING_STEPS]
                                        [--evaluation_strategy EVALUATION_STRATEGY]
                                        [--save_total_limit SAVE_TOTAL_L
```

## Step 6: Inference Engine

```python
!pip install -q peft accelerate bitsandbytes safetensors
```

```python
import torch
import transformers
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# LoRA adapter repository; should match the --repo-id used in the AutoTrain command
adapters_name = "Swapnilg915/mistral-7b-mj-finetuned_en"
# Base model used for training (alternative: "mistralai/Mistral-7B-Instruct-v0.1")
model_name = "filipealmeida/Mistral-7B-Instruct-v0.1-sharded"

device = "cuda"  # the device to load the model onto
```

```python
# 4-bit NF4 quantization with double quantization; matmuls are computed in bfloat16
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
```
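To see why 4-bit loading matters on a T4 (16 GB of GPU memory), a back-of-the-envelope estimate of weight memory is parameters × bits ÷ 8 bytes. The sketch below derives the parameter count from the standard Mistral-7B-v0.1 configuration values (assumed here; check the model's `config.json`) and ignores quantization constants, any layers that stay unquantized (such as the output head), activations, and the KV cache, so real usage is higher. The function names are hypothetical.

```python
# Assumed Mistral-7B-v0.1 config values
VOCAB, HIDDEN, LAYERS, INTERMEDIATE = 32000, 4096, 32, 14336
HEADS, KV_HEADS = 32, 8
HEAD_DIM = HIDDEN // HEADS  # 128


def mistral_param_count():
    """Count weights layer by layer (the model has no bias terms)."""
    attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_HEADS * HEAD_DIM  # q, o + k, v (grouped-query)
    mlp = 3 * HIDDEN * INTERMEDIATE  # gate, up, and down projections
    norms = 2 * HIDDEN  # two RMSNorm weight vectors per layer
    embeddings = 2 * VOCAB * HIDDEN  # input embeddings + untied output head
    return LAYERS * (attn + mlp + norms) + embeddings + HIDDEN  # + final norm


def weight_gb(num_params, bits):
    """Weight storage in gigabytes (1e9 bytes) at the given bits per parameter."""
    return num_params * bits / 8 / 1e9


n = mistral_param_count()
print(f"{n:,} parameters")  # 7,241,732,096 parameters
print(f"bf16: ~{weight_gb(n, 16):.1f} GB, 4-bit: ~{weight_gb(n, 4):.1f} GB")  # bf16: ~14.5 GB, 4-bit: ~3.6 GB
```

At roughly 14.5 GB, the bf16 weights alone would nearly fill a T4, while the 4-bit weights leave room for activations and generation.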

```python
# Load the base model in 4-bit; the quantization settings come from bnb_config,
# so load_in_4bit is not passed again as a separate argument
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
    device_map="auto",  # place the weights on the available GPU
)
```


## Step 7: Load the Uploaded Adapter with PEFT

```python
# Attach the fine-tuned LoRA adapter to the quantized base model
model = PeftModel.from_pretrained(model, adapters_name)
```
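The downloaded `adapter_model.safetensors` file is about 27.3 MB, which is consistent with a quick count of LoRA parameters. For each adapted linear layer mapping `d_in` features to `d_out` features, LoRA adds a matrix A of shape r × d_in and a matrix B of shape d_out × r, i.e. r·(d_in + d_out) parameters. The sketch assumes r = 16 (the `lora_r` value in the training log), adapters on `q_proj` and `v_proj` as set by `--target_modules`, the standard Mistral-7B shapes (hidden size 4096, 32 layers, 8 key/value heads of dimension 128), and float32 storage; `lora_param_count` is a hypothetical helper.

```python
HIDDEN, LAYERS, KV_HEADS, HEAD_DIM = 4096, 32, 8, 128  # assumed Mistral-7B shapes
LORA_R = 16  # lora_r from the AutoTrain log

# (in_features, out_features) of each module listed in --target_modules
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, KV_HEADS * HEAD_DIM),  # grouped-query attention: 4096 -> 1024
}


def lora_param_count(r=LORA_R, targets=TARGETS, layers=LAYERS):
    """A is r x in_features and B is out_features x r, for every target in every layer."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in targets.values())
    return layers * per_layer


params = lora_param_count()
size_mb = params * 4 / 1e6  # float32 = 4 bytes per parameter
print(params, round(size_mb, 1))  # 6815744 27.3
```

About 6.8 million trainable parameters is under 0.1% of the base model, which is why the adapter can be shared separately from the 7B weights.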

```python
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.bos_token_id = 1  # id of Mistral's <s> token

stop_token_ids = [0]  # not used by the cells below

print(f"Successfully loaded the model {model_name} into memory")
```

```text
Successfully loaded the model filipealmeida/Mistral-7B-Instruct-v0.1-sharded into memory
```


```python
text = "[INST] generate a midjourney prompt for A person walks in the rain [/INST]"

# Tokenize without adding <s>, and move the input tensors to the GPU.
# device_map="auto" already placed the model, so model.to(device) is not needed.
model_input = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(device)
generated_ids = model.generate(**model_input, max_new_tokens=200, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```

```text
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.

[INST] generate a midjourney prompt for A person walks in the rain [/INST] "As the rain beats down on his umbrella, the protagonist finds himself lost in the city he thought he knew so well. Take him on a journey through the rainy landscape, leading him through unexpected twists and turns towards a surprising encounter."</s>
```
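The prompt above uses Mistral's `[INST]` instruction markup, while the `train.csv` rows shown in Step 3 use the `###Human:`/`###Assistant:` layout. If you want inference prompts to mirror the training data, one option is the sketch below: it builds a prompt that ends exactly where the assistant's answer began in the training rows, then cuts the answer out of the decoded text. The helper names are hypothetical, and whether this layout gives better outputs for this adapter is an assumption, not something shown in this notebook.

```python
def build_training_style_prompt(concept):
    """Prompt that stops where the ###Assistant: answer starts in the training rows."""
    return f"###Human:\nGenerate a midjourney prompt for {concept}\n\n###Assistant:\n"


def extract_reply(decoded_text, marker="###Assistant:\n", eos="</s>"):
    """Return the text after the first assistant marker, cut at EOS or a new ###Human: turn.

    Returns an empty string if the marker is missing.
    """
    _, _, reply = decoded_text.partition(marker)
    reply = reply.split(eos, 1)[0]
    reply = reply.split("###Human:", 1)[0]
    return reply.strip()
```

In the notebook, `text = build_training_style_prompt("A person walks in the rain")` could replace the `[INST]` prompt in the cell above, followed by `print(extract_reply(decoded[0]))`.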




```python
%pip install streamlit
```


```python
%pip install langchain
```


```python
%pip install gradio
```

```python
import gradio as gr
```

```python
def format_prompt(message, history):
    """Build a multi-turn Mistral-instruct prompt from (user, bot) history pairs."""
    prompt = "<s>"
    for user_prompt, bot_response in history:
        prompt += f"[INST] {user_prompt} [/INST]"
        prompt += f" {bot_response}</s> "
    prompt += f"[INST] {message} [/INST]"
    return prompt
```
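`format_prompt` unpacks each `history` item as a (user, assistant) pair, which matches how older Gradio `ChatInterface` versions pass the chat history. Newer Gradio releases can instead pass a list of `{"role": ..., "content": ...}` dictionaries (for example with `type="messages"`). A hedged sketch of a variant that accepts either shape; the name `format_prompt_any` is hypothetical, and the prompt layout is copied from `format_prompt`.

```python
def format_prompt_any(message, history):
    """Like format_prompt, but history may hold (user, bot) pairs or role/content dicts.

    A user turn that has no assistant reply yet is dropped.
    """
    turns = []  # (user_text, assistant_text) pairs
    pending_user = None
    for item in history:
        if isinstance(item, dict):
            if item.get("role") == "user":
                pending_user = item.get("content", "")
            elif item.get("role") == "assistant" and pending_user is not None:
                turns.append((pending_user, item.get("content", "")))
                pending_user = None
        else:
            user_text, bot_text = item  # tuple or two-element list
            turns.append((user_text, bot_text))

    prompt = "<s>"
    for user_text, bot_text in turns:
        prompt += f"[INST] {user_text} [/INST]"
        prompt += f" {bot_text}</s> "
    prompt += f"[INST] {message} [/INST]"
    return prompt
```

For a pair-style history of `[("Hi", "Hello")]` and the equivalent message dictionaries, both calls produce the same prompt string.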

```python
def generate_one(prompt, history):
    print("\n prompt : ", prompt)
    # formatted_prompt = format_prompt(prompt, history)  # multi-turn formatting, currently disabled
    # Move the input tensors to the GPU; the model was already placed by device_map="auto"
    model_input = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(device)
    generated_ids = model.generate(**model_input, max_new_tokens=200, do_sample=True)
    decoded = tokenizer.batch_decode(generated_ids)
    # Strip special and instruction tokens; the decoded text still begins with the prompt
    response = (
        decoded[0]
        .replace("<s>", "")
        .replace("</s>", "")
        .replace("[INST]", "")
        .replace("[/INST]", "")
    )
    print("response : ", response)
    return response
```
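`generate_one` decodes the whole output sequence, so each response starts by repeating the user's message (as the logs of the Gradio session show). For decoder-only models such as Mistral, `generate` returns the prompt tokens followed by the newly generated tokens, so one way to return only the continuation is to drop the first `len(prompt)` positions before decoding. This sketch shows that slicing on plain Python lists; `split_prompt_and_completion` is a hypothetical helper.

```python
def split_prompt_and_completion(generated_ids, prompt_ids):
    """Return only the newly generated tokens of each sequence.

    Assumes a decoder-only model, whose generate() output starts with the prompt tokens.
    """
    n = len(prompt_ids)
    completions = []
    for sequence in generated_ids:
        if list(sequence[:n]) != list(prompt_ids):
            raise ValueError("sequence does not start with the prompt tokens")
        completions.append(list(sequence[n:]))
    return completions
```

With the tensors inside `generate_one`, the equivalent slice is `generated_ids[:, model_input["input_ids"].shape[1]:]`, which can be passed to `tokenizer.batch_decode(..., skip_special_tokens=True)` in place of the chain of `.replace` calls.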

```python
with gr.Blocks() as demo:
    gr.ChatInterface(
        generate_one
    )

demo.queue().launch(debug=True)
```

```text
Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://6126eaee8b326f631e.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.

 prompt :  what is the capital of Norway
response :  what is the capital of Norway?
Answer: Oslo

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.

 prompt :  what is the capital of UAE?
response :  what is the capital of UAE?
The capital of UAE is Abu Dhabi.
Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7862 <> https://6126eaee8b326f631e.gradio.live
```


