# Fine-tuning Llama 2 7B on your own data

In this notebook, we'll walk you through the steps to fine-tune Llama using your dataset. 
Follow along by running each cell in order!

### Package Installation

Before we get started, let's ensure we have all the necessary packages installed.

In [None]:
!pip install pandas


In [None]:
!pip install autotrain-advanced


### Set up AutoTrain
This step is required if you are using Google Colab.

In [None]:
!autotrain setup --update-torch


### Run Autotrain LLM help tool (optional)
Run this cell if you want to see which command-line flags are available.

In [None]:
!autotrain llm -h

### Log in to the Hugging Face Hub

To upload and share the model through the Hugging Face platform, you need to log in to the Hugging Face Hub.

#### Locating your Hugging Face token
You can create User Access Tokens at this URL: https://huggingface.co/settings/tokens

Steps:
1. Navigate to the URL above.
2. Create a `write` token and copy it to your clipboard.
3. Run the code below and enter your token.



In [None]:
from huggingface_hub import notebook_login
notebook_login()
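
Outside an interactive notebook (for example in a script), the login widget is not available. A minimal sketch of an alternative, assuming the token has been stored in an environment variable named `HF_TOKEN`, passes it to `huggingface_hub.login` instead. The helper names `read_token` and `login_from_env` are hypothetical and not part of any library.

```python
import os


def read_token(env_var="HF_TOKEN"):
    """Return the token stored in the given environment variable, or None."""
    token = os.environ.get(env_var, "").strip()
    return token or None


def login_from_env(env_var="HF_TOKEN"):
    """Log in with huggingface_hub.login if a token is set.

    Returns True if a login was attempted, False if no token was found.
    """
    token = read_token(env_var)
    if token is None:
        print(f"{env_var} is not set; use notebook_login() instead.")
        return False
    # Imported lazily so the helpers can be defined before the package is installed.
    from huggingface_hub import login

    login(token=token)
    return True
```

Calling `login_from_env()` then replaces the `notebook_login()` cell. Keeping the token in an environment variable avoids pasting it into the notebook itself.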

### Run the Autotrain command

- `!autotrain`: The `!` prefix runs a shell command from a Jupyter notebook cell. `autotrain` is the command-line tool installed by the `autotrain-advanced` package.

- `llm`: The sub-command that selects the LLM fine-tuning task.

- `--train`: Initiates the training process.

- `--project_name my-llm`: Sets the name of the project or task to "my-llm".

- `--model abhishek/llama-2-7b-hf-small-shards`: Specifies the base model, here "llama-2-7b-hf-small-shards", hosted on Hugging Face under the "abhishek" account.

- `--data_path .`: The path to the dataset for training. The "." refers to the current directory. The `training.csv` file needs to be located in this directory. 

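- `--use_peft`: Enables parameter-efficient fine-tuning (PEFT) through the Hugging Face `peft` library. Instead of updating all of the model's weights, training updates a small set of adapter weights (such as LoRA), which greatly reduces memory use.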

- `--use_int4`: Loads the model with INT4 (4-bit) quantization, which reduces its memory footprint at the cost of some precision.

- `--learning_rate 2e-4`: Sets the learning rate for training to 0.0002.

- `--train_batch_size 12`: Sets the batch size for training to 12.

- `--num_train_epochs 3`: The training process will iterate over the dataset 3 times.

- `--trainer sft`: Sets the trainer to "sft", which stands for supervised fine-tuning.
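
Because `--data_path .` expects `training.csv` in the current directory, the file can be written with pandas before training starts. This is a minimal sketch under stated assumptions: the single `text` column, the prompt template, and the example rows are all illustrative, so check the output of `!autotrain llm -h` for the column layout your AutoTrain version expects.

```python
import pandas as pd

# Hypothetical instruction/response pairs; replace with your own data.
examples = [
    {"instruction": "Translate 'hello' to French.", "response": "bonjour"},
    {"instruction": "What is 2 + 2?", "response": "4"},
]


def format_example(example):
    """Join one pair into a single training string (assumed template)."""
    return (
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['response']}"
    )


# One row per example, stored in a single column named "text" (an assumption).
df = pd.DataFrame({"text": [format_example(e) for e in examples]})

# Write the file into the current directory, where --data_path . points.
df.to_csv("training.csv", index=False)
print(df.shape)
```

Each row holds one complete training example, so the prompt template used here should match the one you plan to use at inference time.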


In [None]:
!autotrain llm --train --project_name my-llm --model abhishek/llama-2-7b-hf-small-shards --data_path . --use_peft --use_int4 --learning_rate 2e-4 --train_batch_size 12 --num_train_epochs 3 --trainer sft
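
The one-line command above is hard to read and edit. As a sketch, the same flags can be kept in Python data structures and joined with `shlex.join` (Python 3.8+). The variable names are hypothetical, and only flags already used in this notebook appear.

```python
import shlex

# Flags that take a value, copied from the command above.
options = {
    "--project_name": "my-llm",
    "--model": "abhishek/llama-2-7b-hf-small-shards",
    "--data_path": ".",
    "--learning_rate": "2e-4",
    "--train_batch_size": "12",
    "--num_train_epochs": "3",
    "--trainer": "sft",
}

# Flags that act as on/off switches and take no value.
switches = ["--use_peft", "--use_int4"]

# Start with the tool, the sub-command, and the training switch.
args = ["autotrain", "llm", "--train"]
for flag, value in options.items():
    args += [flag, value]
args += switches

# shlex.join quotes any argument that needs it for the shell.
command = shlex.join(args)
print(command)
```

In a notebook cell, `!{command}` runs the assembled string. The switches appear at the end here rather than in the middle, which should not matter to the command-line parser.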