<a href="https://colab.research.google.com/github/mudogruer/SLMs/blob/main/SciQ_Mixtral.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **SciqMixtral: LLM Fine-Tuning with Predibase**

This quickstart shows how to prompt, fine-tune, and deploy LLMs in Predibase. We follow a science question-answering use case: the end result is a fine-tuned Mixtral-8x7B model that takes a science question from the SciQ dataset as input and returns a short answer as output.

In [None]:
pip install -U predibase --quiet

# **Setup**

You'll first need to initialize your PredibaseClient object and configure your API token.

In [None]:
from predibase import PredibaseClient

pc = PredibaseClient(token="API-KEY")
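
Hardcoding the token in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of reading it from the environment instead is shown below. The helper name `read_api_token` and the variable name `PREDIBASE_API_TOKEN` are assumptions made for this example, not names defined by Predibase.

```python
import os


def read_api_token(env_var: str = "PREDIBASE_API_TOKEN") -> str:
    """Return the API token stored in the given environment variable.

    The default variable name is an assumption for this sketch; use whatever
    name you exported the token under.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your API token.")
    return token


# Usage in the notebook (not executed here):
# pc = PredibaseClient(token=read_api_token())
```

In Colab, the same helper works if the token is first copied from the notebook secrets into `os.environ`.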

# **Prompt a deployed LLM**

For our question-answering use case, let's first see how Mixtral-8x7B-Instruct performs out of the box.

If you are in the Predibase SaaS environment, you have access to shared [serverless LLM deployments](https://docs.predibase.com/ui-guide/llms/query-llm/shared_deployments), including the Mixtral-8x7B-Instruct deployment used below.

If you are in a VPC environment, you'll need to first [deploy a pretrained LLM](https://docs.predibase.com/user-guide/inference/dedicated_deployments#pretrained-llm-deployment).

In [None]:
llm_deployment = pc.LLM("pb://deployments/mixtral-8x7b-instruct-v0-1")
result = llm_deployment.prompt("""
    Answer the following question based on the provided text.

    ### Question: What is the most common element in the world?

    ### Answer:
""", max_new_tokens=256)
print(result.response)

The most common element in the world is oxygen. It makes up about 46.6% of the Earth's crust by mass and is a crucial component of the air we breathe, the water we drink, and the earth we tread on.


# **Fine-tune a pretrained LLM**

Next we'll upload a dataset and fine-tune to see if we can get better performance.

The SciQ dataset contains crowdsourced science exam questions. The version uploaded here for fine-tuning consists of the following columns:

- `question`: the science question to answer
- `answer`: the expected answer, used as the fine-tuning target
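
A two-column file like this can be produced from SciQ-style records with the standard library. The sketch below assumes the records carry the `question` and `correct_answer` fields of the Hugging Face `sciq` dataset and renames `correct_answer` to `answer`. The function name `records_to_csv` and the sample row are made up for illustration.

```python
import csv
import io


def records_to_csv(records):
    """Serialize SciQ-style records to CSV text with `question` and `answer` columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["question", "answer"], lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({
            "question": record["question"].strip(),
            # SciQ calls the target column `correct_answer`; rename it to `answer`.
            "answer": record["correct_answer"].strip(),
        })
    return buffer.getvalue()


# Illustrative row only, not taken from the dataset.
sample = [
    {"question": "What gas do plants absorb during photosynthesis?",
     "correct_answer": "carbon dioxide"},
]
csv_text = records_to_csv(sample)
print(csv_text)
```

Writing `csv_text` to a file gives something that could be passed to an upload call in place of the placeholder file name.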


In [None]:
!wget https://predibase-public-us-west-2.s3.us-west-2.amazonaws.com/datasets/code_alpaca_800.csv

--2023-10-06 20:55:05--  https://predibase-public-us-west-2.s3.us-west-2.amazonaws.com/datasets/code_alpaca_800.csv
Resolving predibase-public-us-west-2.s3.us-west-2.amazonaws.com (predibase-public-us-west-2.s3.us-west-2.amazonaws.com)... 52.92.152.234, 52.218.182.242, 52.218.221.121, ...
Connecting to predibase-public-us-west-2.s3.us-west-2.amazonaws.com (predibase-public-us-west-2.s3.us-west-2.amazonaws.com)|52.92.152.234|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 234707 (229K) [text/csv]
Saving to: ‘code_alpaca_800.csv’


2023-10-06 20:55:05 (2.06 MB/s) - ‘code_alpaca_800.csv’ saved [234707/234707]



**Now we will perform the following actions to start our fine-tuning job:**
1. Upload the dataset to Predibase for training
2. Create a prompt template to use for fine-tuning
3. Select the LLM we want to fine-tune
4. Kick off the fine-tuning job


In [None]:
# Upload the dataset to Predibase (estimated time: 2 minutes, because Predibase also builds a dataset profile):
# dataset = pc.upload_dataset("xyz.csv")
# If you've already uploaded the dataset, skip the upload and fetch it directly:
dataset = pc.get_dataset("sciq_dataset", "file_uploads")

In [None]:
dataset

Dataset(id=9470, name=sciq_dataset, object_name=91feae762151461e93db755f03db1768, connection_id=6363, author=mustafa.dogruer@iu-study.org, created=2024-03-09T16:21:48.401064Z, updated=2024-03-09T16:21:48.401064Z)

In [None]:
# Define the template used to prompt the model for each example
# Note the 4-space indentation, which is necessary for the YAML templating.
prompt_template = """
    Answer the following question based on the provided text.

    ### Question: {question}

    ### Answer:
"""

# Specify the Hugging Face LLM you want to fine-tune
llm = pc.LLM("hf://mistralai/Mixtral-8x7B-Instruct-v0.1")

# Kick off a fine-tuning job on the uploaded dataset
job = llm.finetune(
    prompt_template=prompt_template,
    target="answer",
    dataset=dataset,
    repo="mixtral_sciq"
)

# Wait for the job to finish and get training updates and metrics
model = job.get()

✓ Queued 0:00:33   
✓ Preprocessing 0:00:25   


┌──────────┬──────────┬──────────────────┬──────────────────────────┬──────────┬──────────┬──────────┐
│  epochs  │   time   │     feature      │          metric          │  train   │   val    │   test   │
├──────────┼──────────┼──────────────────┼──────────────────────────┼──────────┼──────────┼──────────┤
│    0     │ 00:35:01 │      answer      │           bleu           │    0     │          │    0     │
│          │          │                  │           loss           │  5.4782  │          │  1.7131  │
│          │          │                  │  next_token_perplexity   │ 21983.99…│          │ 18028.70…│
│          │          │                  │        perplexity        │ 29359.39…│          │ 28417.15…│
│          │          │                  │     word_error_rate      │ 19.5176  │          │ 20.8505  │
│       

# **Prompt your fine-tuned LLM**

Predibase supports both real-time inference, as well as [batch inference](https://docs.predibase.com/user-guide/inference/batch_prediction).

#### **Real-time inference using _LoRAX_** (Recommended)

[LoRA eXchange (LoRAX)](https://predibase.com/blog/lorax-the-open-source-framework-for-serving-100s-of-fine-tuned-llms-in) allows you to prompt your fine-tuned LLM without needing to create a new deployment for each model you want to prompt. Predibase automatically loads your fine-tuned weights on top of a shared LLM deployment on demand. While this means that there will be a small amount of additional latency, the benefit is that a single LLM deployment can support many different fine-tuned model versions without requiring additional compute.

Note: Inference using dynamic adapter deployments is available to both SaaS and VPC users. Predibase provides shared [serverless base LLM deployments](https://docs.predibase.com/user-guide/inference/serverless_deployments) for use in our SaaS environment. VPC users need to [deploy their own base model](https://docs.predibase.com/user-guide/inference/dedicated_deployments#pretrained-llm-deployment).
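
To make the idea of loading fine-tuned weights on top of a shared deployment concrete, here is a toy sketch of the LoRA arithmetic, y = W x + B (A x), where W is the shared base weight and (A, B) is a small adapter pair chosen per request. It is not LoRAX's implementation; the matrices, adapter names, and function names are invented for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(base_weight, adapter, x):
    """Compute W x, plus the low-rank update B (A x) when an adapter is given."""
    base_out = matvec(base_weight, x)
    if adapter is None:
        return base_out
    a_matrix, b_matrix = adapter
    low_rank_out = matvec(b_matrix, matvec(a_matrix, x))
    return [b + d for b, d in zip(base_out, low_rank_out)]


# One shared base weight (2x2 identity) and two rank-1 adapters: A is 1x2, B is 2x1.
BASE_WEIGHT = [[1.0, 0.0], [0.0, 1.0]]
ADAPTERS = {
    "adapter_a": ([[1.0, 1.0]], [[0.5], [0.0]]),
    "adapter_b": ([[1.0, -1.0]], [[0.0], [2.0]]),
}

x = [1.0, 2.0]
base_only = lora_forward(BASE_WEIGHT, None, x)
with_a = lora_forward(BASE_WEIGHT, ADAPTERS["adapter_a"], x)
with_b = lora_forward(BASE_WEIGHT, ADAPTERS["adapter_b"], x)
print(base_only, with_a, with_b)
```

The base weight is stored once, while each adapter adds only a few extra numbers, which is why one deployment can serve many fine-tuned variants.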

In [None]:
# Since our model was fine-tuned from a Mixtral-8x7B-Instruct base, we'll use the shared deployment with the same model type.
base_deployment = pc.LLM("pb://deployments/mixtral-8x7b-instruct-v0-1")

# Now we just specify the adapter to use, which is the model we fine-tuned.
model = pc.get_model("mixtral_sciq")
adapter_deployment = base_deployment.with_adapter(model)

# Recall that our model was fine-tuned using a template that accepts a {question}.
# This template is automatically applied when prompting.
result = adapter_deployment.prompt("What is the formula of sugar?", max_new_tokens=256)

print(result.response)

c 12 h 22 o 11
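
To see roughly what the model receives once the fine-tuning template is applied to a question, the template can be filled in locally. This is a sketch using plain `str.format`; how Predibase applies the template internally is not shown in this notebook, and `render_prompt` is a helper name made up for the example.

```python
import textwrap

# Same template text as the one passed to the fine-tuning job.
PROMPT_TEMPLATE = """
    Answer the following question based on the provided text.

    ### Question: {question}

    ### Answer:
"""


def render_prompt(question: str, template: str = PROMPT_TEMPLATE) -> str:
    """Fill the {question} placeholder of the template."""
    return template.format(question=question)


preview = render_prompt("What is the formula of sugar?")
# Dedent only for display; the rendered string keeps its indentation.
print(textwrap.dedent(preview).strip())
```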


In [None]:
!pip install -q -U transformers bert-score evaluate datasets

In [None]:
from datasets import load_dataset
dataset_test = load_dataset('sciq',split='test')



In [None]:
dataset_test

Dataset({
    features: ['question', 'distractor3', 'distractor1', 'distractor2', 'correct_answer', 'support'],
    num_rows: 1000
})

In [None]:
# Query the fine-tuned adapter once per test question and collect the responses
answers = []
for question in dataset_test["question"]:
    answer = adapter_deployment.prompt(question, temperature=0.1, max_new_tokens=256)
    answers.append(answer.response)
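
A loop of 1,000 remote calls can be interrupted by a single transient failure. One possible way to make it more robust is a small retry wrapper, sketched below. `collect_answers`, `max_retries`, and `wait_seconds` are hypothetical names for this example, and `prompt_fn` stands for any callable that maps a question to a response string.

```python
import time


def collect_answers(prompt_fn, questions, max_retries=3, wait_seconds=1.0):
    """Call prompt_fn on each question, retrying a failed call a few times."""
    answers = []
    for question in questions:
        for attempt in range(1, max_retries + 1):
            try:
                answers.append(prompt_fn(question))
                break
            except Exception:
                # Give up after the last attempt; otherwise wait a bit longer each time.
                if attempt == max_retries:
                    raise
                time.sleep(wait_seconds * attempt)
    return answers


# Usage in the notebook (not executed here):
# answers = collect_answers(
#     lambda q: adapter_deployment.prompt(q, temperature=0.1, max_new_tokens=256).response,
#     dataset_test["question"],
# )
```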

In [None]:
answers[20:30]

['diffusion',
 'plant cell',
 'goosebumps',
 'bone fractures',
 'bonds',
 'vitamins',
 'nitrogen',
 'asexual',
 'stomach',
 'reproduction']

In [None]:
from evaluate import load
import numpy as np

# Score the generated answers against the SciQ reference answers with BERTScore
bertscore = load("bertscore")
predictions = answers
references = dataset_test["correct_answer"]
results = bertscore.compute(predictions=predictions, references=references, model_type="distilbert-base-uncased")

# BERTScore returns one value per example, so report the mean of each metric
print("precision: ", round(np.mean(results["precision"]), 5))
print("recall: ", round(np.mean(results["recall"]), 5))
print("f1: ", round(np.mean(results["f1"]), 5))


precision:  0.91723
recall:  0.91379
f1:  0.91498
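
BERTScore measures embedding similarity, so a wrong but related answer can still score high. Because SciQ answers are short, a normalized exact-match rate may be a useful complement. The sketch below is one possible implementation; the normalization rules (lowercasing, stripping punctuation and English articles) are choices made for this example, and the sample strings are invented.

```python
import re
import string


def normalize(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_rate(predictions, references):
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    matches = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return matches / len(predictions)


# Illustrative strings only: two of the three pairs match after normalization.
demo_rate = exact_match_rate(
    ["The stomach.", "nitrogen", "oxygen"],
    ["stomach", "Nitrogen", "carbon"],
)
print(demo_rate)
```

In the notebook this could be called as `exact_match_rate(answers, dataset_test["correct_answer"])`.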


In [None]:
comparison_dataset = load_dataset('sciq',split='test[:20%]')

In [None]:
# Query the fine-tuned adapter on the first 20% of the test split
answers_comparison = []
for question in comparison_dataset["question"]:
    answer = adapter_deployment.prompt(question, temperature=0.1, max_new_tokens=256)
    answers_comparison.append(answer.response)

In [None]:
# Re-import in case the runtime was restarted since the earlier evaluation cell;
# without these imports, `load` raises a NameError.
from evaluate import load
import numpy as np

bertscore = load("bertscore")
predictions = answers_comparison
references = comparison_dataset["correct_answer"]
results = bertscore.compute(predictions=predictions, references=references, model_type="distilbert-base-uncased")
print("precision: ", round(np.mean(results["precision"]), 5))
print("recall: ", round(np.mean(results["recall"]), 5))
print("f1: ", round(np.mean(results["f1"]), 5))

ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.



NameError: name 'load' is not defined

During handling of the above exception, another exception occurred:

AttributeError: 'NameError' object has no attribute '_render_traceback_'

During handling of the above exception, another exception occurred:

AssertionError

During handling of the 