[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openlayer-ai/examples-gallery/blob/main/llms/question-answering/website-faq.ipynb)


# <a id="top">Answering questions about a website with LLMs</a>

This notebook illustrates how an LLM used for QA can be uploaded to the Openlayer platform.

## <a id="toc">Table of contents</a>

1. [**Problem statement**](#problem) 

2. [**Downloading the dataset**](#dataset-download)

3. [**Adding the model outputs to the dataset**](#model-output)

4. [**Uploading to the Openlayer platform**](#upload)
    - [Instantiating the client](#client)
    - [Creating a project](#project)
    - [Uploading datasets](#dataset)
    - [Uploading models](#model)
        - [Shell models](#shell)
    - [Committing and pushing to the platform](#commit)

In [None]:
%%bash

if [ ! -e "requirements.txt" ]; then
    curl "https://raw.githubusercontent.com/openlayer-ai/examples-gallery/main/llms/question-answering/requirements.txt" --output "requirements.txt"
fi

In [None]:
!pip install -r requirements.txt

## <a id="problem">1. Problem statement </a>

[Back to top](#top)


In this notebook, we will use an LLM to answer questions about a crawled website. It illustrates how the [LLM used in OpenAI's tutorial](https://platform.openai.com/docs/tutorials/web-qa-embeddings) can be used with the Openlayer platform.

Interested readers are encouraged to follow OpenAI's tutorial, which uses the Embeddings API and then supplies the crawled website as context for the LLM. Here, we focus on how such an LLM can be uploaded to the Openlayer platform for evaluation.

## <a id="dataset-download">2. Downloading the dataset </a>

[Back to top](#top)

The dataset we'll use to evaluate the LLM is stored in an S3 bucket. Run the cells below to download it and inspect it:

In [None]:
%%bash

if [ ! -e "openai_questions.csv" ]; then
    curl "https://openlayer-static-assets.s3.us-west-2.amazonaws.com/examples-datasets/llms/qa/openai_questions.csv" --output "openai_questions.csv"
fi

In [None]:
import pandas as pd

In [None]:
dataset = pd.read_csv("openai_questions.csv")

dataset.head()

Our dataset has a single column with questions for the LLM. We will now use the LLM built in OpenAI's tutorial to get an answer for each row.

## <a id="model-output">3. Adding model outputs to the dataset </a>

[Back to top](#top)

As mentioned, we now want to add an extra column to our dataset with the LLM's prediction for each row. This is the model output column, named `answers` in the dataset used in the rest of this notebook.

There are many ways to achieve this goal. Here, we will assume that you have run the LLM the same way OpenAI outlines in their tutorial. The [code for it can be found here](https://github.com/openai/openai-cookbook/blob/c651bfdda64ac049747c2a174cde1c946e2baf1d/apps/web-crawl-q-and-a/web-qa.ipynb).
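
As one minimal sketch of this step, the snippet below applies a question-answering function to every row and stores the result under an `answers` key. The `answer_question` function and the example row are hypothetical placeholders, not part of OpenAI's tutorial or the `openlayer` library. In practice, `answer_question` would wrap whatever call your LLM exposes.

```python
from typing import Callable, Dict, List


def answer_question(question: str) -> str:
    """Hypothetical placeholder: replace the body with a call to your LLM."""
    return f"(model answer for: {question})"


def add_model_outputs(
    rows: List[Dict[str, str]],
    answer_fn: Callable[[str], str],
    question_key: str = "questions",
    output_key: str = "answers",
) -> List[Dict[str, str]]:
    """Return copies of `rows` with the model's answer stored under `output_key`."""
    return [{**row, output_key: answer_fn(row[question_key])} for row in rows]


# A made-up row, used only to illustrate the shape of the data.
rows = [{"questions": "What day is it?"}]
rows_with_answers = add_model_outputs(rows, answer_question)
print(rows_with_answers[0]["answers"])
```

With a pandas DataFrame, the same idea can be written as `dataset["answers"] = dataset["questions"].apply(answer_question)`, assuming the question column is named `questions`.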

Run the cell below to download the dataset with the extra `answers` column.

In [None]:
%%bash

if [ ! -e "openai_questions_and_answers.csv" ]; then
    curl "https://openlayer-static-assets.s3.us-west-2.amazonaws.com/examples-datasets/llms/qa/openai_questions_and_answers.csv" --output "openai_questions_and_answers.csv"
fi

In [None]:
dataset = pd.read_csv("openai_questions_and_answers.csv")

dataset.head()

## <a id="upload">4. Uploading to the Openlayer platform </a>

[Back to top](#top)

Now it's time to upload the dataset and the model to the Openlayer platform.

### <a id="client">Instantiating the client</a>

In [None]:
import openlayer

client = openlayer.OpenlayerClient("YOUR_API_KEY_HERE")
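
Rather than pasting the key into the notebook, one option is to read it from an environment variable. The sketch below assumes a variable named `OPENLAYER_API_KEY`. That name and the `get_api_key` helper are choices made for this example, not something the `openlayer` library defines.

```python
import os
from typing import Mapping, Optional


def get_api_key(
    env: Optional[Mapping[str, str]] = None,
    var_name: str = "OPENLAYER_API_KEY",  # assumed name, pick any you like
) -> str:
    """Read the API key from the environment and fail early if it is missing."""
    env = os.environ if env is None else env
    api_key = env.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running this notebook."
        )
    return api_key


# In the notebook, this would replace the hard-coded string:
# client = openlayer.OpenlayerClient(get_api_key())
```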

### <a id="project">Creating a project on the platform</a>

In [None]:
from openlayer.tasks import TaskType

project = client.create_or_load_project(
    name="QA with LLMs",
    task_type=TaskType.LLMQuestionAnswering,
    description="Evaluating an LLM used for QA."
)

### <a id="dataset">Uploading datasets</a>

Before adding the datasets to a project, we need to prepare a `dataset_config`.

This is a Python dictionary that contains all the information needed by the Openlayer platform to utilize the dataset. It should include the column names, the input variable names, etc. For details on the `dataset_config` items, see the [API reference](https://reference.openlayer.com/reference/api/openlayer.OpenlayerClient.add_dataset.html#openlayer.OpenlayerClient.add_dataset).

Let's prepare the `dataset_config` for our validation set:

In [None]:
# Some variables that will go into the `dataset_config`
input_variable_names = ["questions"]
output_column_name = "answers"

In [None]:
validation_dataset_config = {
    "inputVariableNames": input_variable_names,
    "label": "validation",
    "outputColumnName": output_column_name,
}
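
Before staging the dataset, it can help to confirm that every column named in the config actually exists in the DataFrame. The helper below is a small sketch written for this notebook. `find_missing_columns` is a hypothetical name, not part of the `openlayer` API, and it only looks at the `inputVariableNames` and `outputColumnName` keys used above.

```python
from typing import Any, Dict, Iterable, List


def find_missing_columns(
    dataset_config: Dict[str, Any], columns: Iterable[str]
) -> List[str]:
    """Return the columns named in `dataset_config` that are absent from `columns`."""
    available = set(columns)
    expected = list(dataset_config.get("inputVariableNames", []))
    output_column = dataset_config.get("outputColumnName")
    if output_column is not None:
        expected.append(output_column)
    return [name for name in expected if name not in available]


example_config = {
    "inputVariableNames": ["questions"],
    "label": "validation",
    "outputColumnName": "answers",
}

# In the notebook, pass `dataset.columns` instead of a hard-coded list.
print(find_missing_columns(example_config, ["questions", "answers"]))  # prints []
print(find_missing_columns(example_config, ["questions"]))  # prints ['answers']
```

An empty list means the config and the dataset agree on column names.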

In [None]:
# Validation set
project.add_dataframe(
    dataset_df=dataset,
    dataset_config=validation_dataset_config,
)

We can confirm that the validation set is now staged using the `project.status()` method. 

In [None]:
project.status()

### <a id="model">Uploading models</a>

When it comes to uploading models to the Openlayer platform, there are a few options:

- The first one is to upload a **shell model**. Shell models are the most straightforward way to get started. They consist of metadata only, and all of the analysis is done via their predictions (which are [uploaded with the datasets](#dataset), in the column given by `outputColumnName`).
- The second one is to upload a **direct-to-API model**. This is analogous to using one of `openlayer`'s model runners in the notebook environment. By doing so, you'll be able to interact with the LLM through the platform's UI and run a series of robustness assessments on the model using data that is not in your dataset.


In this notebook, we will follow the **shell model** approach. Refer to the other notebooks for direct-to-API examples.

#### <a id="shell"> Shell models </a>

To upload a shell model, we only need to prepare its `model_config` Python dictionary.

Let's create a `model_config` for our model:

In [None]:
# Note the camelCase for the keys
model_config = {
    "inputVariableNames": ["questions"],
    "modelType": "shell",
    "metadata": {  # Can add anything here, as long as it is a dict
        "context_used": True,
        "embedding_db": False,
        "max_token_sequence": 150
    }
}
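
Because a shell model is analyzed only through the predictions uploaded with the dataset, a quick consistency check between the two configs can catch typos early. The sketch below is illustrative only. `check_shell_model_config` is a hypothetical helper, and the rule that both configs should list the same input variables is an assumption made here, not a documented platform requirement.

```python
from typing import Any, Dict, List


def check_shell_model_config(
    model_config: Dict[str, Any], dataset_config: Dict[str, Any]
) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems: List[str] = []
    if model_config.get("modelType") != "shell":
        problems.append("modelType should be 'shell' for a shell model")
    if not isinstance(model_config.get("metadata", {}), dict):
        problems.append("metadata should be a dict")
    model_inputs = list(model_config.get("inputVariableNames", []))
    dataset_inputs = list(dataset_config.get("inputVariableNames", []))
    # Assumption: the model and the dataset should agree on the input variables.
    if model_inputs != dataset_inputs:
        problems.append(
            f"input variables differ: model={model_inputs}, dataset={dataset_inputs}"
        )
    return problems


example_model_config = {
    "inputVariableNames": ["questions"],
    "modelType": "shell",
    "metadata": {"context_used": True},
}
example_dataset_config = {
    "inputVariableNames": ["questions"],
    "outputColumnName": "answers",
}
print(check_shell_model_config(example_model_config, example_dataset_config))  # prints []
```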

In [None]:
# Adding the model
project.add_model(
    model_config=model_config,
)

We can confirm that both the model and the validation set are now staged using the `project.status()` method. 

In [None]:
project.status()

### <a id="commit"> Committing and pushing to the platform </a>

Finally, we can commit the first project version to the platform. 

In [None]:
project.commit("Initial commit!")

In [None]:
project.status()

In [None]:
project.push()