[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openlayer-ai/examples-gallery/blob/main/llms/translation/portuguese-translations.ipynb)


# <a id="top">Translating sentences with LLMs</a>

This notebook illustrates how an LLM used for translation can be uploaded to the Openlayer platform.

## <a id="toc">Table of contents</a>

1. [**Problem statement**](#problem) 

2. [**Downloading the dataset**](#dataset-download)

3. [**Adding the model outputs to the dataset**](#model-output)

4. [**Uploading to the Openlayer platform**](#upload)
    - [Instantiating the client](#client)
    - [Creating a project](#project)
    - [Uploading datasets](#dataset)
    - [Uploading models](#model)
        - [Shell models](#shell)
    - [Committing and pushing to the platform](#commit)

In [None]:
%%bash

if [ ! -e "requirements.txt" ]; then
    curl "https://raw.githubusercontent.com/openlayer-ai/examples-gallery/main/llms/translation/requirements.txt" --output "requirements.txt"
fi

In [None]:
!pip install -r requirements.txt

## <a id="problem">1. Problem statement </a>

[Back to top](#top)


In this notebook, we will use an LLM to translate English sentences into Portuguese.

To do so, we start with a dataset of sentences and ground truth translations, use an LLM to get translations, and finally upload the dataset and LLM to the Openlayer platform to evaluate the results.

## <a id="dataset-download">2. Downloading the dataset </a>

[Back to top](#top)

The dataset we'll use to evaluate the LLM is stored in an S3 bucket. Run the cells below to download it and inspect it:

In [None]:
%%bash

if [ ! -e "translation_pairs.csv" ]; then
    curl "https://openlayer-static-assets.s3.us-west-2.amazonaws.com/examples-datasets/llms/translation/translation_pairs.csv" --output "translation_pairs.csv"
fi

In [None]:
import pandas as pd

In [None]:
dataset = pd.read_csv("translation_pairs.csv")

dataset.head()

Our dataset has two columns: one named `english` -- with the original sentence in English -- and one named `portuguese` -- with the ground truth translations to Portuguese. 

Note that even though we have ground truths available in our case, they are not required to use Openlayer. You can check out other Jupyter Notebook examples where we work on problems without access to ground truths.

We will now use an LLM to translate from English to Portuguese.

## <a id="model-output">3. Adding model outputs to the dataset </a>

[Back to top](#top)

As mentioned, we now want to add an extra column to our dataset: the `model_translation` column with the LLM's prediction for each row.

There are many ways to achieve this goal, and you can pursue the path you're most comfortable with. 

Here, we provide a dataset that already contains the `model_translation` column, which we obtained by giving the following prompt to OpenAI's GPT-4:

```
You will be provided with a sentence in English, and your task is to translate it into Portuguese (Brazil).

{{ english }}
```

Run the cell below to download the dataset with the extra `model_translation` column.

In [None]:
%%bash

if [ ! -e "translation_pairs_with_output.csv" ]; then
    curl "https://openlayer-static-assets.s3.us-west-2.amazonaws.com/examples-datasets/llms/translation/translation_pairs_with_output.csv" --output "translation_pairs_with_output.csv"
fi

In [None]:
dataset = pd.read_csv("translation_pairs_with_output.csv")

dataset.head()
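If you would rather generate the `model_translation` column yourself, the sketch below shows one possible way to fill the `{{ english }}` placeholder in the prompt above and collect one output per row. It is only an illustration: `render_prompt`, `add_model_translations`, and `placeholder_translate` are hypothetical helpers that are not part of `openlayer`, and `placeholder_translate` stands in for a real LLM call that you would supply.

```python
from typing import Callable, Dict, List

# Same prompt text as shown above, with the `{{ english }}` placeholder.
PROMPT_TEMPLATE = (
    "You will be provided with a sentence in English, and your task is to "
    "translate it into Portuguese (Brazil).\n\n{{ english }}"
)


def render_prompt(template: str, variables: Dict[str, str]) -> str:
    """Replace each `{{ name }}` placeholder with the matching value."""
    rendered = template
    for name, value in variables.items():
        rendered = rendered.replace("{{ " + name + " }}", str(value))
    return rendered


def add_model_translations(
    rows: List[Dict[str, str]], translate: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Return copies of the rows with an extra `model_translation` key."""
    translated = []
    for row in rows:
        prompt = render_prompt(PROMPT_TEMPLATE, {"english": row["english"]})
        translated.append({**row, "model_translation": translate(prompt)})
    return translated


def placeholder_translate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; it only echoes the sentence."""
    sentence = prompt.split("\n\n", 1)[1]
    return f"[untranslated] {sentence}"


rows = [{"english": "Good morning.", "portuguese": "Bom dia."}]
rows_with_output = add_model_translations(rows, placeholder_translate)
print(rows_with_output[0]["model_translation"])
```

With a pandas DataFrame, the same idea could be applied to `dataset.to_dict("records")`, replacing `placeholder_translate` with a function that calls the LLM of your choice.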

## <a id="upload">4. Uploading to the Openlayer platform </a>

[Back to top](#top)

Now it's time to upload the datasets and model to the Openlayer platform.

### <a id="client">Instantiating the client</a>

In [None]:
import openlayer

client = openlayer.OpenlayerClient("YOUR_API_KEY_HERE")

### <a id="project">Creating a project on the platform</a>

In [None]:
from openlayer.tasks import TaskType

project = client.create_or_load_project(
    name="Translation with LLMs",
    task_type=TaskType.LLMTranslation,
    description="Evaluating translations with an LLM from En -> Pt."
)

### <a id="dataset">Uploading datasets</a>

Before adding the datasets to a project, we need to prepare a `dataset_config.yaml` file.

This is a file that contains all the information needed by the Openlayer platform to utilize the dataset. It should include the column names, the input variable names, etc. For details on the fields of the `dataset_config.yaml` file, see the [API reference](https://reference.openlayer.com/reference/api/openlayer.OpenlayerClient.add_dataset.html#openlayer.OpenlayerClient.add_dataset).

Let's prepare the `dataset_config.yaml` file for our validation set:

In [None]:
# Some variables that will go into the `dataset_config.yaml` file
column_names = list(dataset.columns)
input_variable_names = ["english"]
ground_truth_column_name = "portuguese"
output_column_name = "model_translation"

In [None]:
import yaml 

validation_dataset_config = {
    "columnNames": column_names,
    "inputVariableNames": input_variable_names,
    "label": "validation",
    "outputColumnName": output_column_name,
    "groundTruthColumnName": ground_truth_column_name
}

with open("validation_dataset_config.yaml", "w") as dataset_config_file:
    yaml.dump(validation_dataset_config, dataset_config_file, default_flow_style=False)
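Before staging the dataset, it can help to confirm that every column the config refers to is actually listed in `columnNames`. The following is a minimal local sketch of such a check. `find_config_problems` is a hypothetical helper, not part of the `openlayer` library, and it only looks at the keys used in this notebook. The platform may run its own, different validation.

```python
from typing import Any, Dict, List


def find_config_problems(config: Dict[str, Any]) -> List[str]:
    """List referenced column names that are missing from `columnNames`."""
    columns = set(config.get("columnNames", []))

    # Gather every column name the config points to.
    referenced = list(config.get("inputVariableNames", []))
    for key in ("outputColumnName", "groundTruthColumnName"):
        if key in config:
            referenced.append(config[key])

    return [
        f"'{name}' is not listed in columnNames"
        for name in referenced
        if name not in columns
    ]


# Example config mirroring the one built in this notebook.
example_config = {
    "columnNames": ["english", "portuguese", "model_translation"],
    "inputVariableNames": ["english"],
    "label": "validation",
    "outputColumnName": "model_translation",
    "groundTruthColumnName": "portuguese",
}

problems = find_config_problems(example_config)
print(problems)
```

In the notebook, the same function could be called on `validation_dataset_config`. An empty list means all referenced columns were found.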

In [None]:
# Validation set
project.add_dataframe(
    dataset_df=dataset,
    dataset_config_file_path="validation_dataset_config.yaml",
)

We can confirm that the validation set is now staged using the `project.status()` method. 

In [None]:
project.status()

### <a id="model">Uploading models</a>

When it comes to uploading models to the Openlayer platform, there are a few options:

- The first one is to upload a **shell model**. Shell models are the most straightforward way to get started. They consist of metadata, and all of the analysis is done via their predictions (which are [uploaded with the datasets](#dataset), in the `outputColumnName`).
- The second one is to upload a **direct-to-API model**. This is analogous to using one of `openlayer`'s model runners in the notebook environment. By doing so, you'll be able to interact with the LLM using the platform's UI and also perform a series of robustness assessments on the model using data that is not in your dataset.


In this notebook, we will follow the **shell model** approach. Refer to the other notebooks for direct-to-API examples.

#### <a id="shell"> Shell models </a>

To upload a shell model, we only need to define its name, the architecture type, and add some metadata that will be rendered in the platform to help us identify it. This information should be saved to a `model_config.yaml` file.

Let's create a `model_config.yaml` file for our model:

In [None]:
prompt_template = """
You will be provided with a sentence in English, and your task is to translate it into Portuguese (Brazil).

{{ english }}"""
prompt = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt_template}
]

In [None]:
import yaml

# Note the camelCase for the keys
model_config = {
    "prompt": prompt,  # Optional for shell models
    "inputVariableNames": ["english"],
    "model": "gpt-3.5-turbo", # Optional for shell models
    "modelType": "shell",
    "name": "Translator",
    "architectureType": "llm",
    "metadata": {  # Can add anything here, as long as it is a dict
        "context_used": False,
        "embedding_db": False,
        "max_token_sequence": 150
    },
}

with open("model_config.yaml", "w") as model_config_file:
    yaml.dump(model_config, model_config_file, default_flow_style=False)
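As a quick local sanity check, the sketch below tests a shell model config against the requirements stated in this section: a name, an architecture type, and metadata given as a dict. `find_shell_model_config_problems` is a hypothetical helper based only on that description. It is not the platform's own validation, which may enforce other rules.

```python
from typing import Any, Dict, List


def find_shell_model_config_problems(config: Dict[str, Any]) -> List[str]:
    """Check the fields this notebook describes for a shell model config."""
    problems = []

    # The name and the architecture type are described as needed above.
    for key in ("name", "architectureType"):
        if not config.get(key):
            problems.append(f"missing '{key}'")

    # Metadata can hold anything, as long as it is a dict.
    if "metadata" in config and not isinstance(config["metadata"], dict):
        problems.append("'metadata' must be a dict")

    return problems


# Example config with the same shape as the one in this notebook.
example_model_config = {
    "inputVariableNames": ["english"],
    "modelType": "shell",
    "name": "Translator",
    "architectureType": "llm",
    "metadata": {"context_used": False, "max_token_sequence": 150},
}

model_problems = find_shell_model_config_problems(example_model_config)
print(model_problems)
```

Calling the function on `model_config` before `project.add_model` would surface a missing name or a malformed `metadata` entry early.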

In [None]:
# Adding the model
project.add_model(
    model_config_file_path="model_config.yaml",
)

We can confirm that both the model and the validation set are now staged using the `project.status()` method. 

In [None]:
project.status()

### <a id="commit"> Committing and pushing to the platform </a>

Finally, we can commit the first project version to the platform. 

In [None]:
project.commit("Initial commit!")

In [None]:
project.status()

In [None]:
project.push()