[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openlayer-ai/examples-gallery/blob/main/llms/general-llm/product-names.ipynb)


# <a id="top">Product names with LLMs</a>

This notebook illustrates how general LLMs can be uploaded to the Openlayer platform.

## <a id="toc">Table of contents</a>

1. [**Problem statement**](#problem) 

2. [**Downloading the dataset**](#dataset-download)

3. [**Adding the model outputs to the dataset**](#model-output)

4. [**Uploading to the Openlayer platform**](#upload)
    - [Instantiating the client](#client)
    - [Creating a project](#project)
    - [Uploading datasets](#dataset)
    - [Uploading models](#model)
        - [Direct-to-API](#direct-to-api)
    - [Committing and pushing to the platform](#commit)

In [None]:
%%bash

if [ ! -e "requirements.txt" ]; then
    curl "https://raw.githubusercontent.com/openlayer-ai/examples-gallery/main/llms/general-llm/requirements.txt" --output "requirements.txt"
fi

In [None]:
!pip install -r requirements.txt

## <a id="problem">1. Problem statement </a>

[Back to top](#top)


In this notebook, we will use an LLM to generate product names -- similar to [this example from OpenAI](https://platform.openai.com/examples/default-product-name-gen).

The LLM receives a short product description and a few seed words. It should then generate product name suggestions and describe the target customer for such a product, returning both as JSON.

For example, if the input is:
```
description: A home milkshake maker
seed words: fast, healthy, compact
```
the output should be something like:
```
{
    "names": ["QuickBlend", "FitShake", "MiniMix"],
    "target_custommer": "College students that are into fitness and healthy living"
}

```
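
Since later analysis depends on the output being parseable, it can help to check each response before using it. The sketch below is one minimal way to do that with Python's standard `json` module. The helper name `is_valid_output` is hypothetical and not part of `openlayer`; the key names are the ones from the example above.

```python
import json

# Keys the LLM is asked to return, spelled as in the example above.
REQUIRED_KEYS = {"names", "target_custommer"}


def is_valid_output(raw_output: str) -> bool:
    """Return True if `raw_output` is a JSON object with the expected shape."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        # Not parseable as JSON at all (e.g. a missing comma).
        return False
    if not isinstance(parsed, dict) or not REQUIRED_KEYS.issubset(parsed):
        return False
    names = parsed["names"]
    # `names` must be a list of strings and the customer a single string.
    return (
        isinstance(names, list)
        and all(isinstance(name, str) for name in names)
        and isinstance(parsed["target_custommer"], str)
    )
```

A response that drops the comma between the two attributes, for instance, fails to parse and is rejected.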

## <a id="dataset-download">2. Downloading the dataset </a>

[Back to top](#top)

The dataset we'll use to evaluate the LLM is stored in an S3 bucket. Run the cells below to download it and inspect it:

In [None]:
%%bash

if [ ! -e "product_descriptions.csv" ]; then
    curl "https://openlayer-static-assets.s3.us-west-2.amazonaws.com/examples-datasets/llms/llm-base/product_descriptions.csv" --output "product_descriptions.csv"
fi

In [None]:
import pandas as pd

In [None]:
dataset = pd.read_csv("product_descriptions.csv")

dataset.head()

Our dataset has two columns, one with descriptions and one with seed words. These are the input variables to our LLM. We will now use them to get the LLM's output for each row.

## <a id="model-output">3. Adding the model outputs to the dataset </a>

[Back to top](#top)

As mentioned, we now want to add an extra column to our dataset: the `model_output` column with the LLM's prediction for each row.

There are many ways to achieve this goal, and you can pursue the path you're most comfortable with. 

One of the possibilities is using the `openlayer` Python Client with one of the supported LLMs, such as GPT-4. 

We will show how to do it now. **This assumes you have an OpenAI API key.** **If you prefer not to make requests to OpenAI**, you can [skip to this cell and download the resulting dataset with the model outputs](#download-model-output).

First, let's pip install `openlayer`:

In [None]:
!pip install openlayer

The `openlayer` Python client comes with LLM runners, which are wrappers around common LLMs -- such as OpenAI's. The idea is that these LLM runners adhere to a common interface and can be called to make predictions on pandas dataframes. 

To use `openlayer`'s LLM runners, follow these steps:

**1. Create a new directory**

This directory will house all the configs and files related to the LLM of our choice. Let's call ours `llm_package`:

In [None]:
!mkdir llm_package

**2. Write a YAML config file**

Now, we can write a YAML config file called `model_config.yaml` to our newly created directory:

In [None]:
# One of the pieces of information that will go into our config is the `prompt`
prompt_template = """
You will be provided with a product description and seed words, and your task is to generate a list
of product names and provide a short description of the target customer for such product. The output
must be a valid JSON with attributes `names` and `target_custommer`.

For example, given:
```
description: A home milkshake maker
seed words: fast, healthy, compact
```
the output should be something like:
```
{
    "names": ["QuickBlend", "FitShake", "MiniMix"],
    "target_custommer": "College students that are into fitness and healthy living"
}

```

description: {{ description }}
seed words: {{ seed_words }}
"""
prompt = [
    {"role": "system", "content": "You are a helpful assistant."}, 
    {"role": "user", "content": prompt_template}
]

In [None]:
import yaml

# Note the camelCase for the keys
model_config = {
    "prompt": prompt,
    "inputVariableNames": ["description", "seed_words"],
    "modelProvider": "OpenAI",
    "model": "gpt-3.5-turbo",
    "modelParameters": {
        "temperature": 0
    },
    "modelType": "api",
    "name": "Product name suggestor",
    "architectureType": "llm",
}

with open("llm_package/model_config.yaml", "w") as model_config_file:
    yaml.dump(model_config, model_config_file, default_flow_style=False)

You can check out the details for every field of the `model_config.yaml` file in our documentation. 

To highlight a few important fields:
- `prompt`: this is the prompt that is sent to the LLM. Notice that our variables are referred to in the prompt template with double handlebars `{{ }}`. When we make the request, the input variable values from the pandas dataframe are injected into the prompt. Also, we follow OpenAI's convention of messages with `role` and `content`, regardless of the LLM provider you choose.
- `inputVariableNames`: this is a list with the names of the input variables. Each input variable should be a column in the pandas dataframe that we will use. These are also the variables referenced in the `prompt` with the handlebars.
- `modelProvider`: one of the supported model providers, such as `OpenAI`.
- `model`: name of the model from the `modelProvider`. In our case `gpt-3.5-turbo`.
- `modelParameters`: a dictionary with the model parameters for that specific `model`. For `gpt-3.5-turbo`, for example, we could specify the `temperature`, the `tokenLimit`, etc.
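
To make the handlebar substitution concrete, here is a minimal sketch of how a `{{ variable }}` template could be filled in from one row of data. This is not Openlayer's implementation. The `render_prompt` helper and the `PLACEHOLDER` pattern are hypothetical names used only for illustration.

```python
import re

# Matches `{{ name }}`, allowing optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, row: dict) -> str:
    """Replace each `{{ name }}` placeholder with the matching value from `row`."""

    def substitute(match):
        name = match.group(1)
        if name not in row:
            raise KeyError(f"No input variable named {name!r}")
        return str(row[name])

    return PLACEHOLDER.sub(substitute, template)


template = "description: {{ description }}\nseed words: {{ seed_words }}"
row = {"description": "A home milkshake maker", "seed_words": "fast, healthy, compact"}
rendered = render_prompt(template, row)
print(rendered)
```

Raising on an unknown placeholder, rather than leaving it untouched, surfaces a mismatch between the template and `inputVariableNames` early.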

**3. Get the model runner**

Now we can import `models` from `openlayer` and call the `get_model_runner` function, which will return a `ModelRunner` object. This is where we'll pass the OpenAI API key. For a different LLM `modelProvider` you might need to pass a different argument -- refer to our documentation for details.
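
If you would rather not paste the key directly into the notebook, one option is to read it from an environment variable. The sketch below assumes the key is stored under `OPENAI_API_KEY`, a common convention but an assumption here. The `read_api_key` helper is hypothetical and not part of `openlayer`.

```python
import os


def read_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this cell."
        )
    return key
```

The returned string could then be passed as the `openai_api_key` argument instead of a hardcoded literal.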

In [None]:
from openlayer import models, tasks

llm_runner = models.get_model_runner(
    task_type=tasks.TaskType.LLM,
    model_package="llm_package",
    openai_api_key="YOUR_OPENAI_API_KEY_HERE"
)

In [None]:
llm_runner

**4. Run the LLM to get the predictions**

Every model runner comes with a `run` method. This method takes a pandas dataframe containing the input variables and returns a pandas dataframe with a single column: the predictions.

For example, to get the output for the first few rows of our dataset:

In [None]:
llm_runner.run(dataset[:3])

Now, we can get the predictions for our full dataset and add them to the column `model_output`. 

**Note that this can take some time and incurs costs.**

In [None]:
# There are costs in running this cell!
dataset["model_output"] = llm_runner.run(dataset)["output"]

<a id="download-model-output">**Run the cell below if you prefer not to make requests to OpenAI:**</a>

In [None]:
%%bash

if [ ! -e "product_descriptions_with_outputs.csv" ]; then
    curl "https://openlayer-static-assets.s3.us-west-2.amazonaws.com/examples-datasets/llms/llm-base/product_descriptions_with_outputs.csv" --output "product_descriptions_with_outputs.csv"
fi

In [None]:
dataset = pd.read_csv("product_descriptions_with_outputs.csv")

In [None]:
dataset.head()

## <a id="upload">4. Uploading to the Openlayer platform </a>

[Back to top](#top)

Now it's time to upload the datasets and model to the Openlayer platform.

### <a id="client">Instantiating the client</a>

In [None]:
import openlayer

client = openlayer.OpenlayerClient("YOUR_API_KEY_HERE")

### <a id="project">Creating a project on the platform</a>

In [None]:
from openlayer.tasks import TaskType

project = client.create_or_load_project(
    name="Product Suggestions Project",
    task_type=TaskType.LLM,
    description="Evaluating an LLM used for product development."
)

### <a id="dataset">Uploading datasets</a>

Before adding the datasets to a project, we need to prepare a `dataset_config.yaml` file.

This is a file that contains all the information needed by the Openlayer platform to utilize the dataset. It should include the column names, the input variable names, etc. For details on the fields of the `dataset_config.yaml` file, see the [API reference](https://reference.openlayer.com/reference/api/openlayer.OpenlayerClient.add_dataset.html#openlayer.OpenlayerClient.add_dataset).
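
Because the config refers to dataframe columns by name, a mismatch is easy to introduce. As a minimal sketch, assuming the field names used in this notebook (`columnNames`, `inputVariableNames`, `outputColumnName`), a hypothetical helper such as `find_config_problems` could flag names that do not appear among the columns. It is not part of `openlayer`, and the platform may perform its own validation.

```python
def find_config_problems(config: dict) -> list:
    """Return a list of human-readable problems found in a dataset config."""
    columns = set(config.get("columnNames", []))
    problems = []
    # Every input variable is expected to be one of the dataset's columns.
    for name in config.get("inputVariableNames", []):
        if name not in columns:
            problems.append(f"input variable {name!r} is not in columnNames")
    # Assumption: the output column, when given, must also be a dataset column.
    output_column = config.get("outputColumnName")
    if output_column is not None and output_column not in columns:
        problems.append(f"output column {output_column!r} is not in columnNames")
    return problems


# Illustrative config mirroring the columns used in this notebook.
example_config = {
    "columnNames": ["description", "seed_words", "model_output"],
    "inputVariableNames": ["description", "seed_words"],
    "label": "validation",
    "outputColumnName": "model_output",
}
problems = find_config_problems(example_config)
print(problems)
```

An empty list means the names are consistent. Each entry otherwise describes one mismatch to fix before writing the YAML file.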

Let's prepare the `dataset_config.yaml` file for our validation set:

In [None]:
# Some variables that will go into the `dataset_config.yaml` file
column_names = list(dataset.columns)
input_variable_names = ["description", "seed_words"]
output_column_name = "model_output"

In [None]:
import yaml 

validation_dataset_config = {
    "columnNames": column_names,
    "inputVariableNames": input_variable_names,
    "label": "validation",
    "outputColumnName": output_column_name,
}

with open("validation_dataset_config.yaml", "w") as dataset_config_file:
    yaml.dump(validation_dataset_config, dataset_config_file, default_flow_style=False)

In [None]:
# Validation set
project.add_dataframe(
    dataset_df=dataset,
    dataset_config_file_path="validation_dataset_config.yaml",
)

We can confirm that the validation set is now staged using the `project.status()` method. 

In [None]:
project.status()

### <a id="model">Uploading models</a>

When it comes to uploading models to the Openlayer platform, there are a few options:

- The first one is to upload a **shell model**. Shell models are the most straightforward way to get started. They consist of metadata, and all of the analyses are done via their predictions (which are [uploaded with the datasets](#dataset), in the `outputColumnName`).
- The second one is to upload a **direct-to-API model**. This is analogous to using one of `openlayer`'s model runners in the notebook environment. By doing so, you'll be able to interact with the LLM using the platform's UI and also perform a series of robustness assessments on the model using data that is not in your dataset.


Since we used an LLM runner in this notebook, we'll follow the **direct-to-API** approach. Refer to the other notebooks for shell model examples.

#### <a id="direct-to-api"> Direct-to-API </a>

To upload a direct-to-API LLM to Openlayer, you will need to create (or point to) a model config YAML file. This model config contains the `prompt`, the `modelProvider`, and so on: essentially everything the Openlayer platform needs to make direct requests to the LLM you're using.

Note that to use a direct-to-API model on the platform, you'll need to **provide your model provider's API key (such as the OpenAI API key) using the platform's UI**, under the project settings.

Since we used an LLM runner in this notebook, we already wrote a model config YAML file. We will write it again just for completeness:

In [None]:
import yaml

# Note the camelCase for the keys
model_config = {
    "prompt": prompt,
    "inputVariableNames": ["description", "seed_words"],
    "modelProvider": "OpenAI",
    "model": "gpt-3.5-turbo",
    "modelParameters": {
        "temperature": 0
    },
    "modelType": "api",
    "name": "Product name suggestor",
    "architectureType": "llm",
}

with open("llm_package/model_config.yaml", "w") as model_config_file:
    yaml.dump(model_config, model_config_file, default_flow_style=False)

In [None]:
# Adding the model
project.add_model(
    model_config_file_path="llm_package/model_config.yaml",
)

We can confirm that both the model and the validation set are now staged using the `project.status()` method. 

In [None]:
project.status()

### <a id="commit"> Committing and pushing to the platform </a>

Finally, we can commit the first project version to the platform. 

In [None]:
project.commit("Initial commit!")

In [None]:
project.status()

In [None]:
project.push()