## 📦 Installation
To start fine-tuning your own LLMs with the Simplifine library, install it directly from its GitHub repository:

In [None]:
# Install the latest Simplifine library from the GitHub repository
!pip install git+https://github.com/simplifine-llm/Simplifine.git -q

# The 'pip install' command is used to install Python packages.
# The '-q' option stands for 'quiet', which minimizes the amount of output produced during the installation.
# 'git+https://github.com/simplifine-llm/Simplifine.git' specifies the URL of the GitHub repository from which to install the package.
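
Because `-q` hides most of the installer output, it can be worth confirming that the package is importable before moving on. The following is a minimal sketch, not part of Simplifine itself: `is_installed` is a hypothetical helper, and the module name `simplifine_alpha` is assumed from the import statement used later in this notebook.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# 'simplifine_alpha' is the module name assumed from the notebook's import line.
print('simplifine_alpha installed:', is_installed('simplifine_alpha'))
```

If this prints `False` right after installation, restarting the notebook runtime and re-running the check is a reasonable first step.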


## 🚀 Fine-Tuning LLaMA-3 8B Model

In this section, we fine-tune the LLaMA-3 8B model with the Simplifine library. The cell below disables WandB logging, defines the dataset template, and configures the Simplifine client before launching a cloud training job.

In [None]:
from simplifine_alpha.train_utils import Client
import wandb
import os

# Disabling WandB logging. Change this if you'd like to enable it.
# Note that you will need a WandB token if you enable logging.
wandb.init(mode='disabled')

# Define your dataset template and response keys.
# Be sure to adjust the keys, response template, and dataset accordingly.
template = '''### TITLE: {title}\n ### ABSTRACT: {abstract}\n ###EXPLANATION: {explanation}'''
response_template = '\n ###EXPLANATION:'
keys = ['title', 'abstract', 'explanation']
dataset_name = ''  # Provide a Hugging Face dataset name if applicable.

# Set the model name to LLaMA-3 8B. Note that larger models may cause OOM (Out of Memory) errors.
model_name = 'meta-llama/Meta-Llama-3-8B'
hf_token = ''  # Insert your Hugging Face token here to access the LLaMA-3 model.

from_hf = True  # Set to False if using custom data.

# Option to use your own dataset. Change `own_data` to True if you have custom data.
own_data = False
if own_data:
    from_hf = False
    data = {}  # Insert your custom dataset here.

# Set up the Simplifine client with your API key and GPU type.
simplifine_api_key = ''
gpu_type = 'a100'  # Options are 'l4' or 'a100'

client = Client(api_key=simplifine_api_key, gpu_type=gpu_type)

# Start fine-tuning LLaMA-3 8B in the cloud.
# `use_zero` and `use_ddp` select the parallelization strategy; this run enables ZeRO only.
client.sft_train_cloud(
    model_name=model_name,
    from_hf=from_hf,
    dataset_name=dataset_name,
    keys=keys,
    template=template,
    job_name='ddp_job',
    response_template=response_template,
    use_zero=True,
    use_ddp=False,
    hf_token=hf_token
)
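
To see how `template`, `keys`, and `response_template` relate to each other, the sketch below fills the template with one made-up record and splits the result at the response template. It assumes the placeholders are filled `str.format`-style and that the response template marks where the completion begins; Simplifine's actual internals are not shown in this notebook and may differ. The `row` values are hypothetical.

```python
template = '''### TITLE: {title}\n ### ABSTRACT: {abstract}\n ###EXPLANATION: {explanation}'''
response_template = '\n ###EXPLANATION:'
keys = ['title', 'abstract', 'explanation']

# Hypothetical record; a real dataset row would supply these fields.
row = {'title': 'title 1', 'abstract': 'abstract 1', 'explanation': 'explanation 1'}

# Fill each {placeholder} in the template with the matching field (assumed behavior).
text = template.format(**{k: row[k] for k in keys})

# The response template must appear verbatim in the filled text,
# otherwise the prompt and completion cannot be told apart.
assert response_template in text

prompt_part, completion_part = text.split(response_template, 1)
print(repr(prompt_part))
print(repr(completion_part))
```

A quick check like this catches mismatches in whitespace or `#` characters between the two templates before a cloud job is launched.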

## 📊 Checking Job Status

After initiating the fine-tuning process, you might want to check the status of your training jobs. The following code will help you extract and display the statuses of the most recent jobs.

In [None]:
# Retrieve the status of all jobs from the client.
status = client.get_all_jobs()

# Display the status of the five most recent jobs.
for num, job_status in enumerate(status[-5:]):
    print(f'Job {num} status: {job_status}\n')

## 💾 Downloading the Trained Model

Once your fine-tuning job is complete, the next step is to download the trained model. Follow the steps below to create a folder and save the model locally.

In [None]:
job_id = ''  # Get your job ID from the list of job statuses above.

# Create a folder to store the trained model (no error if it already exists).
# The same absolute path is used for creation and extraction.
model_dir = '/content/sf_trained_model_ZeRO'
os.makedirs(model_dir, exist_ok=True)

# Download and save the model to the specified folder.
# This might take some time, so relax and enjoy a cup of coffee! :)
client.download_model(job_id=job_id, extract_to=model_dir)
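
Before loading the download, it may help to confirm that the folder holds something `transformers` can read. The sketch below uses two hypothetical helpers, `prepare_model_dir` and `has_model_config`, and runs against a temporary directory so it is safe to execute anywhere. It assumes a full Hugging Face checkpoint with a `config.json` at the top level of the folder; the layout that `download_model` actually extracts is not shown here and may nest files differently.

```python
import tempfile
from pathlib import Path


def prepare_model_dir(path: str) -> Path:
    """Create the target folder (and parents) if missing; safe to call repeatedly."""
    model_dir = Path(path)
    model_dir.mkdir(parents=True, exist_ok=True)
    return model_dir


def has_model_config(path: str) -> bool:
    """Return True if the folder contains a top-level config.json file."""
    return (Path(path) / 'config.json').is_file()


# Demonstration in a throwaway directory instead of /content.
with tempfile.TemporaryDirectory() as tmp:
    target = prepare_model_dir(f'{tmp}/sf_trained_model_ZeRO')
    prepare_model_dir(str(target))  # Second call does nothing and raises no error.
    print('config.json present:', has_model_config(str(target)))
```

In the notebook itself, the same check could be pointed at the download folder once the job has finished.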


## 🧪 Testing Your Fine-Tuned Model

Now that you've downloaded your fine-tuned model, it's time to test it. We'll load the model and tokenizer using the `transformers` library and generate a sample output.

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Define the path where the trained model is stored.
path = '/content/sf_trained_model_ZeRO'

# Load the fine-tuned model and tokenizer.
sf_model = AutoModelForCausalLM.from_pretrained(path)
sf_tokenizer = AutoTokenizer.from_pretrained(path)

# Create an example input for the model, following the training template.
prompt = '''### TITLE: title 1\n ### ABSTRACT: abstract 1\n ###EXPLANATION: '''

# Tokenize the input example.
inputs = sf_tokenizer(prompt, return_tensors='pt')

# Generate output from the fine-tuned model.
# `max_new_tokens` caps only the generated tokens; `max_length` would also count the prompt.
# `early_stopping` is omitted because it only affects beam search.
output = sf_model.generate(inputs['input_ids'],
                           attention_mask=inputs['attention_mask'],
                           max_new_tokens=30,
                           eos_token_id=sf_tokenizer.eos_token_id,
                           pad_token_id=sf_tokenizer.eos_token_id)

# Decode and print the generated output (the prompt is included).
print(sf_tokenizer.decode(output[0]))
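
For a decoder-only model such as LLaMA, `generate` returns the prompt tokens followed by the newly generated ones, so the decoded text starts with the prompt. The sketch below illustrates the bookkeeping with plain Python lists and placeholder token IDs rather than real tensors; `split_generation` and `new_token_budget` are hypothetical helpers written for this illustration.

```python
def split_generation(output_ids, prompt_len):
    """Split a generated sequence into (prompt tokens, newly generated tokens)."""
    return output_ids[:prompt_len], output_ids[prompt_len:]


def new_token_budget(prompt_len, max_length):
    """Tokens left for generation when the cap is on total length (prompt included)."""
    return max(0, max_length - prompt_len)


# Placeholder IDs standing in for a tokenized prompt and two generated tokens.
prompt_ids = [101, 7, 8, 9]
output_ids = prompt_ids + [42, 43]

_, completion_ids = split_generation(output_ids, len(prompt_ids))
print(completion_ids)  # [42, 43]

# A total-length cap of 30 leaves only 26 new tokens after a 4-token prompt.
print(new_token_budget(prompt_len=4, max_length=30))  # 26
```

With real tensors, the same idea amounts to slicing `output[0]` at the prompt length before decoding, which prints only the generated explanation.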