## Preparing the Data

We will demonstrate fine-tuning an open-source model to classify sports newsgroup posts (email-like messages from the 20 Newsgroups dataset) into two categories, baseball and hockey, based on their content.

We take the first 1,000 posts, use 950 of them as examples to fine-tune on, and hold out the remaining 50 to test the performance of our fine-tune.

In [None]:
%pip install scikit-learn pandas scale-llm-engine

In [None]:
from sklearn.datasets import fetch_20newsgroups
import pandas as pd

categories = ['rec.sport.baseball', 'rec.sport.hockey']
sports_dataset = fetch_20newsgroups(subset='train', shuffle=True, random_state=42, categories=categories)

labels = [sports_dataset.target_names[x].split('.')[-1] for x in sports_dataset['target']]
texts = [text.strip() for text in sports_dataset['data']]
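df = pd.DataFrame(zip(texts, labels), columns=['raw_prompt', 'response'])[:1000]
# Copy the slices so that adding or dropping columns later modifies independent frames, not views of df.
df_train = df[:950].copy()
df_test = df[950:].copy()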

Let's first take a look at our data:

In [None]:
df_train['raw_prompt'].iloc[0]

In [None]:
df_train['response'].value_counts()

In [None]:
df_test['response'].value_counts()
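Because `df_train` and `df_test` are plain slices of the shuffled data, the class balance of the 50 test examples is whatever the shuffle happened to produce. If you want both classes represented in proportion, one option (not what this notebook does) is a stratified split with scikit-learn's `train_test_split`. The sketch below runs on a small synthetic frame with the same `raw_prompt`/`response` columns; the helper name `stratified_split` is hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def stratified_split(df: pd.DataFrame, test_size: int, seed: int = 42):
    """Split df so each label appears in the test set in proportion to its frequency."""
    train, test = train_test_split(
        df,
        test_size=test_size,       # an int means an absolute number of test rows
        stratify=df["response"],   # preserve the label ratio in both splits
        random_state=seed,
    )
    # Return independent copies so later column assignments don't act on views.
    return train.copy(), test.copy()


# Synthetic stand-in for the real data: 60 "baseball" rows and 40 "hockey" rows.
toy_df = pd.DataFrame({
    "raw_prompt": [f"post {i}" for i in range(100)],
    "response": ["baseball"] * 60 + ["hockey"] * 40,
})
toy_train, toy_test = stratified_split(toy_df, test_size=10)
print(toy_test["response"].value_counts())
```

On the real data you would call `stratified_split(df, test_size=50)` in place of the two slices.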

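Since we are fine-tuning a text-generation model, let's do a bit of (extremely basic) prompt engineering so we can use it for classification: each post is wrapped in a fixed template, and the model learns to complete the template with the category.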

In [None]:
def build_prompt(text: str) -> str:
    # Wrap the raw post in a fixed template; the model learns to complete it with the label.
    return f"Prompt: {text}\nCategory: "

def prepare_df(df: pd.DataFrame) -> None:
    # Replace the raw_prompt column with the templated prompt column, in place.
    df['prompt'] = df['raw_prompt'].apply(build_prompt)
    df.drop('raw_prompt', axis=1, inplace=True)

In [None]:
prepare_df(df_train)

In [None]:
df_train.head()

The data needs to end up in a publicly accessible CSV file with exactly two columns: `prompt` and `response`.

In [None]:
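# index=False keeps the DataFrame index from being written as an extra, unnamed column.
df_train.to_csv("sports_training_dataset.csv", index=False)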

Currently, the data must be hosted at a publicly accessible HTTP or HTTPS URL so that it can be read for fine-tuning. Support for privately sharing data with the LLM Engine API is coming shortly. For quick iteration, you can use tools like Pastebin or GitHub Gists to host your CSV files publicly. We created an example GitHub Gist, which you can see [here](https://gist.github.com/tigss/7cec73251a37de72756a3b15eace9965). To use a gist, take the URL you get when you click its “Raw” button ([URL](https://gist.githubusercontent.com/tigss/7cec73251a37de72756a3b15eace9965/raw/85d9742890e1e6b0c06468507292893b820c13c9/llm_sample_data.csv)).

We've uploaded our CSV file to `s3://scale-demo-datasets/sports/sports_training_dataset.csv`, which maps to a URL of `https://scale-demo-datasets.s3.us-west-2.amazonaws.com/sports/sports_training_dataset.csv`.
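Before launching a fine-tune, it can be worth reading the file back the way a consumer of the URL would. `pandas.read_csv` accepts both local paths and HTTP(S) URLs, so the same check works on `sports_training_dataset.csv` and on the public S3 URL above. The helper `check_training_csv` and its specific checks are our own assumptions about a valid file, based only on the two-column requirement stated earlier; the demo uses temporary local files so it runs offline.

```python
import os
import tempfile

import pandas as pd

REQUIRED_COLUMNS = {"prompt", "response"}


def check_training_csv(path_or_url: str) -> pd.DataFrame:
    """Read a training CSV (local path or public URL) and check its basic shape."""
    df = pd.read_csv(path_or_url)
    missing = REQUIRED_COLUMNS - set(df.columns)
    extra = set(df.columns) - REQUIRED_COLUMNS
    if missing or extra:
        # An "Unnamed: 0" extra column usually means the index was written out.
        raise ValueError(f"missing columns {sorted(missing)}, unexpected columns {sorted(extra)}")
    if df[sorted(REQUIRED_COLUMNS)].isna().any().any():
        raise ValueError("some prompt/response cells are empty")
    return df


# Offline demo: one file written correctly, one written with the default index column.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.csv")
    bad = os.path.join(tmp, "bad.csv")
    sample = pd.DataFrame({"response": ["hockey"], "prompt": ["Prompt: Go Leafs\nCategory: "]})
    sample.to_csv(good, index=False)
    sample.to_csv(bad)  # index=True by default, which adds an "Unnamed: 0" column on read
    good_shape = check_training_csv(good).shape
    try:
        check_training_csv(bad)
        bad_rejected = False
    except ValueError as err:
        bad_rejected = True
        print("rejected:", err)
print(good_shape)
```

In the notebook you would call `check_training_csv("https://scale-demo-datasets.s3.us-west-2.amazonaws.com/sports/sports_training_dataset.csv")`; this also confirms that the URL is reachable without credentials from your machine.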

## Fine-Tuning the Model

Next, we create the fine-tune from our training file via the FineTune API. Note: this can take roughly 15-20 minutes with a few hundred examples, because jobs wait in a queue before they run.

For this section, you will need an API key to interact with Scale. To get one, head to [Scale Spellbook](https://spellbook.scale.com/) and copy your API key from the [settings](https://spellbook.scale.com/settings) page.
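The note in the next cell says the key must be in the `SCALE_API_KEY` environment variable. A small guard like the one below fails fast with a readable message when it is missing. `require_api_key` is a hypothetical helper of ours, not part of `llmengine`; it only assumes the variable name given in that note.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ, var_name: str = "SCALE_API_KEY") -> str:
    """Return the API key from the environment, or raise a readable error if it is missing."""
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Copy your key from the Spellbook settings page "
            f"and set it before running the next cell."
        )
    return key
```

Calling `require_api_key()` before `FineTune.validate_api_key()` turns a missing or blank variable into an immediate, explicit error.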

In [None]:
# Note: you must have the environment variable SCALE_API_KEY set to your Spellbook API key. 

from llmengine import FineTune, Completion, Model

FineTune.validate_api_key()

In [None]:
# Launch a fine-tune of llama-7b on the public training CSV,
# with no validation file and no hyperparameter overrides.
create_fine_tune_response = FineTune.create(
    model="llama-7b",
    training_file="https://scale-demo-datasets.s3.us-west-2.amazonaws.com/sports/sports_training_dataset.csv",
    validation_file=None,
    hyperparameters={},
    suffix="my-first-fine-tune"
)

fine_tune_id = create_fine_tune_response.fine_tune_id

In [None]:
import time

# Wait for the fine-tune to complete by polling its status once a minute.
while True:
    fine_tune_status = FineTune.retrieve(fine_tune_id).status
    print(fine_tune_status)
    if fine_tune_status == "SUCCESS":
        print("Fine-Tune Succeeded!")
        break
    if fine_tune_status in ["FAILURE", "CANCELLED"]:
        raise ValueError(f"Fine-tune failed with status {fine_tune_status}")
    time.sleep(60)


If this is your first time running this notebook, you can find your fine-tune by listing all available models and keeping only the private ones.

In [None]:
all_models = Model.list().model_endpoints

# Keep only your own (non-public) fine-tuned models.
your_personal_fine_tunes = [model for model in all_models if not model.spec.public_inference]

# Take the first one; this is only unambiguous if you have a single fine-tune.
your_fine_tuned_model = your_personal_fine_tunes[0].name
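Taking `your_personal_fine_tunes[0]` works when you have exactly one private model, but becomes ambiguous after several runs. The hard-coded name in the next cell suggests that names follow a `<base model>.<suffix>.<timestamp>` pattern. Assuming that pattern holds (it is inferred from one example, not documented), a sketch like the following picks the most recent fine-tune for a given suffix from a list of names such as `[m.name for m in your_personal_fine_tunes]`. The function `latest_fine_tune` and the sample names are hypothetical.

```python
from typing import Iterable, Optional


def latest_fine_tune(names: Iterable[str], base_model: str, suffix: str) -> Optional[str]:
    """Return the newest name of the assumed form '<base_model>.<suffix>.<timestamp>'.

    Assumes the timestamp is 'YYYY-MM-DD-HH-MM-SS', which sorts correctly as a string.
    """
    prefix = f"{base_model}.{suffix}."
    matches = [name for name in names if name.startswith(prefix)]
    # Compare only the timestamp part; default=None covers the no-match case.
    return max(matches, key=lambda name: name[len(prefix):], default=None)


names = [
    "llama-7b.my-first-fine-tune.2023-07-17-19-44-20",
    "llama-7b.my-first-fine-tune.2023-07-18-09-05-00",
    "llama-7b.other-run.2023-07-19-00-00-00",
]
print(latest_fine_tune(names, "llama-7b", "my-first-fine-tune"))
```

Matching on the suffix passed to `FineTune.create` keeps models from unrelated runs out of the selection.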

If you've already created a fine-tune in a previous run, you can instead use its name directly:

In [None]:
# Hard-coded name from a previous run of this notebook; replace it with your own model's name.
your_fine_tuned_model = "llama-7b.my-first-finetune.2023-07-17-19-44-20"

## Test the Fine-Tune

Next, we run our model on the test dataset via the Completion API. Since we trained on prompts built with `build_prompt`, we apply the same template when making predictions. Note: you may have to wait a few minutes after the fine-tune succeeds for your model to be loaded.

In [None]:
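def get_classification(text: str) -> str:
    # Retry up to 5 times, e.g. while the model is still being loaded.
    for _ in range(5):
        try:
            response = Completion.create(
                model_name=your_fine_tuned_model,
                prompt=build_prompt(text),  # same template used for training
                max_new_tokens=2,
                temperature=0.01
            )
            # Strip surrounding whitespace so the output can be compared to the label.
            return response.outputs[0].text.strip()
        except Exception as e:
            print(e)
    # All attempts failed.
    return "Error"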
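A prediction only counts as correct below if it matches the label string exactly. If you see near misses (different casing, stray whitespace, or trailing text after the label), one option is to map each generation onto the known label set before scoring. `normalize_prediction` is a hypothetical helper, not something the notebook or `llmengine` provides.

```python
from typing import Sequence

LABELS = ("baseball", "hockey")


def normalize_prediction(raw: str, labels: Sequence[str] = LABELS) -> str:
    """Map raw model output onto one of the known labels, or 'unknown'."""
    cleaned = raw.strip().lower()
    for label in labels:
        # Accept outputs that start with the label, e.g. "hockey\n" or "baseball."
        if cleaned.startswith(label):
            return label
    return "unknown"


print(normalize_prediction("  Hockey\n"))  # hockey
print(normalize_prediction("baseball."))   # baseball
print(normalize_prediction("Error"))       # unknown
```

After predicting, you could apply it with `df_test["predicted_response"].map(normalize_prediction)`; failed requests that returned `"Error"` then show up as `"unknown"`.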

In [None]:
df_test["predicted_response"] = df_test["raw_prompt"].apply(get_classification)

Let's peek at the data and calculate our test accuracy!

In [None]:
df_test.head()

In [None]:
num_correct = (df_test["predicted_response"] == df_test["response"]).sum()

In [None]:
num_correct / len(df_test)
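Overall accuracy hides which way the model errs. A per-class breakdown with `pandas.crosstab` shows, for each true label, how many test posts were predicted as each label. The small made-up frame below only illustrates the shape of the output; in the notebook you would pass `df_test` directly.

```python
import pandas as pd


def confusion_table(df: pd.DataFrame) -> pd.DataFrame:
    """Rows are true labels, columns are predicted labels, cells are counts."""
    return pd.crosstab(
        df["response"],
        df["predicted_response"],
        rownames=["true"],
        colnames=["predicted"],
    )


toy_results = pd.DataFrame({
    "response":           ["baseball", "baseball", "hockey", "hockey", "hockey"],
    "predicted_response": ["baseball", "hockey",   "hockey", "hockey", "Error"],
})
table = confusion_table(toy_results)
accuracy = (toy_results["response"] == toy_results["predicted_response"]).mean()
print(table)
print(accuracy)
```

Off-diagonal cells are the misclassifications that the next cell lists row by row; a predicted column such as `Error` points to failed requests rather than wrong answers.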

In [None]:
df_test[df_test["predicted_response"] != df_test["response"]]