<a href="https://colab.research.google.com/github/jiangzhx/colab/blob/main/Quickstart_LLM_Fine_Tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Quickstart: LLM Fine-Tuning with Predibase**

This quickstart will show you how to prompt, fine-tune, and deploy LLMs in Predibase. We'll be following a code generation use case where our end result will be a fine-tuned Llama 2 7B model that takes in natural language as input and returns code as output.

In [None]:
pip install -U predibase --quiet


# **Setup**

You'll first need to initialize your PredibaseClient object and configure your API token.

In [None]:
from predibase import PredibaseClient

pc = PredibaseClient(token="{your-api-token}")
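To keep the token out of the notebook, it can be read from an environment variable instead of being pasted inline. The snippet below is a minimal sketch; the variable name `PREDIBASE_API_TOKEN` and the helper `load_token` are assumptions made for illustration, not part of the quickstart.

```python
import os


def load_token(env=os.environ, var_name="PREDIBASE_API_TOKEN"):
    """Return the API token from the environment, failing early if it is unset.

    The variable name is an assumption; use whatever name you export.
    """
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return token


# Usage with the client from the cell above:
# pc = PredibaseClient(token=load_token())
```

Failing early with a clear message avoids a less obvious authentication error later in the notebook.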

# **Prompt a deployed LLM**

For our code generation use case, let's first see how Llama 2 7B performs out of the box.

If you are in the Predibase SaaS environment, you have access to shared [serverless LLM deployments](https://docs.predibase.com/ui-guide/llms/query-llm/shared_deployments), including Llama 2 7B.

If you are in a VPC environment, you'll need to first [deploy a pretrained LLM](https://docs.predibase.com/user-guide/inference/dedicated_deployments#pretrained-llm-deployment).

In [None]:
llm_deployment = pc.LLM("pb://deployments/llama-2-7b")
result = llm_deployment.prompt("""
    Below is an instruction that describes a task, paired with an input
    that may provide further context. Write a response that appropriately
    completes the request.

    ### Instruction: Write an algorithm in Java to reverse the words in a string.

    ### Input: The quick brown fox

    ### Response:
""", max_new_tokens=256)
print(result.response)


    ### Instruction: Write an algorithm in Java to reverse the words in a string.

    ### Input: The quick brown fox

    ### Response:

    ### Instruction: Write an algorithm in Java to reverse the words in a string.

    ### Input: The quick brown fox

    ### Response:

    ### Instruction: Write an algorithm in Java to reverse the words in a string.

    ### Input: The quick brown fox

    ### Response:

    ### Instruction: Write an algorithm in Java to reverse the words in a string.

    ### Input: The quick brown fox

    ### Response:

    ### Instruction: Write an algorithm in Java to reverse the words in a string.

    ### Input: The quick brown fox

    ### Response:

    ### Instruction: Write an algorithm in Java to reverse the words in a string.

    ### Input: The quick brown fox

    ### Response:

    ### Instruction: Write an algorithm in Java to reverse the words in a string.

    ### Input: The quick brown fox

    ###


# **Fine-tune a pretrained LLM**

Next we'll upload a dataset and fine-tune to see if we can get better performance.

The [Code Alpaca](https://github.com/sahil280114/codealpaca) dataset is used to fine-tune large language models to follow instructions and produce code from natural language. It consists of the following columns:

- `instruction` that describes a task
- `input` when additional context is required for the instruction
- the expected `output`


For the sake of this quickstart, we've created a version of the Code Alpaca dataset with fewer rows so that the model trains significantly faster.

In [None]:
!wget https://predibase-public-us-west-2.s3.us-west-2.amazonaws.com/datasets/code_alpaca_800.csv

--2023-10-06 20:55:05--  https://predibase-public-us-west-2.s3.us-west-2.amazonaws.com/datasets/code_alpaca_800.csv
Resolving predibase-public-us-west-2.s3.us-west-2.amazonaws.com (predibase-public-us-west-2.s3.us-west-2.amazonaws.com)... 52.92.152.234, 52.218.182.242, 52.218.221.121, ...
Connecting to predibase-public-us-west-2.s3.us-west-2.amazonaws.com (predibase-public-us-west-2.s3.us-west-2.amazonaws.com)|52.92.152.234|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 234707 (229K) [text/csv]
Saving to: ‘code_alpaca_800.csv’


2023-10-06 20:55:05 (2.06 MB/s) - ‘code_alpaca_800.csv’ saved [234707/234707]
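Before uploading, it can be worth confirming that the downloaded file has the three columns described above. The following is a rough sketch using only the standard library; the helper name `missing_columns` and the in-memory sample row are hypothetical and are not taken from the dataset.

```python
import csv
import io

EXPECTED_COLUMNS = {"instruction", "input", "output"}


def missing_columns(csv_file) -> set:
    """Return the expected columns that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    return EXPECTED_COLUMNS - set(reader.fieldnames or [])


# Self-contained demo on an in-memory sample (hypothetical row).
sample = io.StringIO('instruction,input,output\n"Add two numbers","1 2","3"\n')
print(missing_columns(sample))

# With the downloaded file:
# with open("code_alpaca_800.csv", newline="", encoding="utf-8") as f:
#     print(missing_columns(f))
```

An empty set means all three columns are present, including the `output` column that the fine-tuning job uses as its target.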



**Now we will perform the following actions to start our fine-tuning job:**
1. Upload the dataset to Predibase for training
2. Create a prompt template to use for fine-tuning
3. Select the LLM we want to fine-tune
4. Kick off the fine-tuning job

The fine-tuning job should take around 35-45 minutes in total. Queueing time depends on how quickly we're able to acquire resources and on what other jobs are ahead in the queue. The training itself should take around 25-30 minutes. As the model trains, you can receive updated metrics in your notebook or terminal. You can also see metrics and visualizations in the Predibase UI.

In [None]:
# Upload the dataset to Predibase. This takes about 2 minutes because Predibase
# also creates a dataset profile.
# If you've already uploaded the dataset, skip the upload and fetch it directly:
#   dataset = pc.get_dataset("code_alpaca_800", "file_uploads")
dataset = pc.upload_dataset("code_alpaca_800.csv")

In [None]:
# Define the template used to prompt the model for each example
# Note the 4-space indentation, which is necessary for the YAML templating.
prompt_template = """Below is an instruction that describes a task, paired with an input
    that may provide further context. Write a response that appropriately
    completes the request.

    ### Instruction: {instruction}

    ### Input: {input}

    ### Response:
"""

# Specify the Hugging Face LLM you want to fine-tune,
# then kick off a fine-tuning job on the uploaded dataset
llm = pc.LLM("hf://meta-llama/Llama-2-7b-hf")
job = llm.finetune(
    prompt_template=prompt_template,
    target="output",
    dataset=dataset,
    # repo="optional-custom-model-repository-name"
)

# Wait for the job to finish and get training updates and metrics
model = job.get()

✓ Queued 00:07:59   
✓ Preprocessing 00:08:09   


┌──────────┬──────────┬──────────────────┬──────────────────────────┬──────────┬──────────┬──────────┐
│  epochs  │   time   │     feature      │          metric          │  train   │   val    │   test   │
├──────────┼──────────┼──────────────────┼──────────────────────────┼──────────┼──────────┼──────────┤
│    1     │ 00:14:22 │     combined     │           loss           │  0.8729  │          │  1.5568  │
│          │          │      output      │           bleu           │  0.1988  │          │  0.1616  │
│          │          │                  │     char_error_rate      │  1.3440  │          │  1.3493  │
│          │          │                  │           loss           │  0.8729  │          │  1.5568  │
│          │          │                  │  next_token_perplexity   │ 15032.72…│          │ 17560.66…│
│       

✓ Evaluating 00:40:59   
✓ Visualizing 00:41:01   
Ready
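For reference, the prompt template used for this job has two placeholders, `{instruction}` and `{input}`, which correspond to dataset columns. The sketch below shows that substitution with Python's built-in `str.format`. This is an assumption made for illustration only: Predibase applies the template internally and its rendering may differ. The helper `render_prompt` is hypothetical, and the example values are the ones used for prompting elsewhere in this quickstart.

```python
# Illustration only: plain str.format stands in for Predibase's internal templating.
PROMPT_TEMPLATE = """Below is an instruction that describes a task, paired with an input
    that may provide further context. Write a response that appropriately
    completes the request.

    ### Instruction: {instruction}

    ### Input: {input}

    ### Response:
"""


def render_prompt(row: dict) -> str:
    """Fill the template's placeholders from one dataset row."""
    return PROMPT_TEMPLATE.format(instruction=row["instruction"], input=row["input"])


example_row = {
    "instruction": "Write an algorithm in Java to reverse the words in a string.",
    "input": "The quick brown fox",
}
rendered = render_prompt(example_row)
print(rendered)
```

The rendered text ends at `### Response:`, so the model's completion is compared against the `output` column during training.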


# **Prompt your fine-tuned LLM**

Predibase supports both real-time inference and [batch inference](https://docs.predibase.com/user-guide/inference/batch_prediction).

#### **Real-time inference using _LoRAX_** (Recommended)

[LoRA eXchange (LoRAX)](https://predibase.com/blog/lorax-the-open-source-framework-for-serving-100s-of-fine-tuned-llms-in) allows you to prompt your fine-tuned LLM without needing to create a new deployment for each model you want to prompt. Predibase automatically loads your fine-tuned weights on top of a shared LLM deployment on demand. While this means that there will be a small amount of additional latency, the benefit is that a single LLM deployment can support many different fine-tuned model versions without requiring additional compute.

Note: Inference using dynamic adapter deployments is available to both SaaS and VPC users. Predibase provides shared [serverless base LLM deployments](https://docs.predibase.com/user-guide/inference/serverless_deployments) for use in our SaaS environment. VPC users need to [deploy their own base model](https://docs.predibase.com/user-guide/inference/dedicated_deployments#pretrained-llm-deployment).

In [None]:
# Since our model was fine-tuned from a Llama-2-7b base, we'll use the shared deployment with the same model type.
base_deployment = pc.LLM("pb://deployments/llama-2-7b")

# Now we just specify the adapter to use, which is the model we fine-tuned.
model = pc.get_model("Llama-2-7b-hf-code_alpaca_800")
adapter_deployment = base_deployment.with_adapter(model)

# Recall that our model was fine-tuned using a template that accepts an {instruction}
# and an {input}. This template is automatically applied when prompting.
result = adapter_deployment.prompt(
    {
      "instruction": "Write an algorithm in Java to reverse the words in a string.",
      "input": "The quick brown fox"
    },
    max_new_tokens=256)

print(result.response)

   public static String reverseWords(String str) {
        String[] words = str.split(" ");
        StringBuilder sb = new StringBuilder();
        for (int i = words.length - 1; i >= 0; i--) {
            sb.append(words[i]).append(" ");
        }
        return sb.toString();
    }


#### **Real-time inference using a _Dedicated Deployment_** (VPC and Premium SaaS)

Once deployed, you can use the prompt method in the SDK to query your model or use the Query Editor in the Predibase UI. Deploying the fine-tuned LLM from this Quickstart guide should take around 10 minutes.

Note: Only **VPC and Premium SaaS users with the Admin role** will be able to deploy a fine-tuned LLM.

In [None]:
finetuned_llm = model.deploy("llama-2-7b-finetune").get()

# Recall that our model was fine-tuned using a template that accepts an {instruction}
# and an {input}. This template is automatically applied when prompting.
result = finetuned_llm.prompt(
    {
        "instruction": "Write an algorithm in Java to reverse the words in a string.",
        "input": "The quick brown fox"
    },
    max_new_tokens=256)

print(result.response)

# **What's Next?**

*   [Advanced fine-tuning customization](https://docs.predibase.com/user-guide/training/finetune#customizing-fine-tuning-with-different-parameter-values-in-line)
*   [Prompt via REST](https://docs.predibase.com/user-guide/inference/rest_api)
