[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openlayer-ai/examples-gallery/blob/main/development/llms/langchain/question-answering/question-answering.ipynb)


# <a id="top">Using a LangChain chain to answer Python questions</a>

This notebook illustrates how to evaluate a LangChain chain on the Openlayer platform by uploading the chain's outputs together with a shell model.

## <a id="toc">Table of contents</a>

1. [**Problem statement**](#problem) 

2. [**Constructing the chain**](#chain)

3. [**Constructing the dataset**](#dataset-output)

4. [**Uploading to the Openlayer platform**](#upload)
    - [Instantiating the client](#client)
    - [Creating a project](#project)
    - [Uploading datasets](#dataset)
    - [Uploading models](#model)
    - [Committing and pushing to the platform](#commit)

In [None]:
%%bash

if [ ! -e "requirements.txt" ]; then
    curl "https://raw.githubusercontent.com/openlayer-ai/examples-gallery/main/development/llms/langchain/question-answering/requirements.txt" --output "requirements.txt"
fi

In [None]:
!pip install -r requirements.txt

## <a id="problem">1. Problem statement </a>

[Back to top](#top)


In this notebook, we will create a LangChain chain similar to the one from the [Quickstart](https://python.langchain.com/docs/get_started/quickstart).

Then, we will use the chain to construct a dataset and, finally, upload the dataset and a model to the Openlayer platform to evaluate the LLM's performance.

## <a id="chain">2. Constructing the chain </a>

[Back to top](#top)


**Defining the LLM:**

In [None]:
from langchain.chat_models import ChatOpenAI


# temperature=0 matches the value logged in the model_config metadata later on
llm = ChatOpenAI(openai_api_key="YOUR_OPENAI_API_KEY_HERE", temperature=0)

**Defining the prompt:**

In [None]:
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

template = """You are a helpful assistant who answers user's questions about Python.
A user will pass in a question, and you should answer it very objectively.
Use AT MOST 5 sentences. If you need more than 5 sentences to answer, say that the
user should make their question more objective."""
system_message_prompt = SystemMessagePromptTemplate.from_template(template)

human_template = "{question}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

In [None]:
chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])
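
To make the prompt structure concrete, here is a plain-Python sketch of what the two templates amount to once a question is filled in. It is an approximation written for illustration, not LangChain's implementation; `render_messages` is a hypothetical helper, and the output uses the role/content message format that the `prompt` entry of the `model_config` also follows.

```python
# Plain-Python sketch, NOT LangChain's implementation: it mimics how the system
# and human templates combine into the role/content messages a chat model sees.
SYSTEM_TEMPLATE = """You are a helpful assistant who answers user's questions about Python.
A user will pass in a question, and you should answer it very objectively.
Use AT MOST 5 sentences. If you need more than 5 sentences to answer, say that the
user should make their question more objective."""
HUMAN_TEMPLATE = "{question}"


def render_messages(question: str) -> list[dict[str, str]]:
    """Fill the templates and return messages in role/content format (hypothetical helper)."""
    return [
        # The system template has no placeholders, so it is sent unchanged.
        {"role": "system", "content": SYSTEM_TEMPLATE},
        # The human template's {question} placeholder is replaced by the input.
        {"role": "user", "content": HUMAN_TEMPLATE.format(question=question)},
    ]


messages = render_messages("How can I define a class?")
for message in messages:
    # Print the role and the first 60 characters of each message.
    print(f"{message['role']}: {message['content'][:60]}")
```

Only the human message changes from question to question, which is why `question` is the single input variable used throughout the rest of the notebook.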

**Defining the chain:**

In [None]:
from langchain.chains import LLMChain

chain = LLMChain(
    llm=llm,
    prompt=chat_prompt,
)

**Using the chain:**

In [None]:
chain.run("How can I define a class?")

## <a id="dataset-output">3. Constructing the dataset </a>

[Back to top](#top)


Now, let's say we have a list of questions that our chain can answer. Let's use the chain we created and capture its output to construct a dataset.

**The next cells assume you have a valid OpenAI API key and are willing to use it.** If you prefer not to make the LLM requests, you can [skip ahead to the cell that downloads the resulting dataset with the model outputs](#download-model-output).
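
If you want to dry-run the dataset-building step without spending API credits, one option is to swap `chain.run` for a stand-in. The sketch below assumes a hypothetical `fake_run` stub that returns canned text, and uses only the standard library (instead of pandas) to build one row per question and serialize the rows as CSV with the `question` and `answer` columns that the rest of the notebook relies on.

```python
import csv
import io


def fake_run(question: str) -> str:
    """Hypothetical stand-in for chain.run: returns a canned answer, no API call."""
    return f"(placeholder answer to: {question})"


questions = [
    "What is Python and why is it popular?",
    "How do I write a single-line comment in Python?",
]

# One row per question, mirroring dataset["answer"] = dataset["question"].apply(...)
rows = [{"question": q, "answer": fake_run(q)} for q in questions]

# Serialize to CSV in memory with the same two columns the notebook uses.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["question", "answer"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Once the stubbed pipeline behaves as expected, replacing `fake_run` with `chain.run` is the only change needed to produce real answers.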

In [None]:
questions_list = [
    "What is Python and why is it popular?",
    "How do I write a single-line comment in Python?",
    "What is the purpose of indentation in Python?",
    "Can you explain the difference between Python 2 and Python 3?",
    "What is the Python Standard Library?",
    "How do I declare a variable in Python?",
    "What are data types and how do they work in Python?",
    "How can I convert one data type to another?",
    "What is the 'print()' function used for?",
    "How do I get user input in Python?",
    "What are strings and how can I manipulate them?",
    "How do I format strings in Python?",
    "What is a list and how do I create one?",
    "How do I access elements in a list?",
    "What is a tuple and how is it different from a list?",
    "How can I add or remove items from a list?",
    "What is a dictionary and how can I use it?",
    "How do I loop through data using 'for' loops?",
    "What is a 'while' loop and how do I use it?",
    "How do I write conditional statements in Python?",
    "What does 'if', 'elif', and 'else' do?",
    "What is a function and how do I define one?",
    "How do I call a function?",
    "What is the return statement in a function?",
    "How can I reuse code using functions?",
    "What are modules and how do I use them?",
    "How can I handle errors and exceptions in Python?",
    "What is object-oriented programming (OOP)?",
    "What are classes and objects?",
    "How can I create and use a class?",
    "What is inheritance and why is it useful?",
    "How do I import classes and functions from other files?",
    "What is the purpose of '__init__()' in a class?",
    "How can I override methods in a subclass?",
    "What are instance variables and class variables?",
    "What is encapsulation in OOP?",
    "What are getter and setter methods?",
    "How do I read and write files in Python?",
    "What is the 'with' statement used for?",
    "How can I handle CSV and JSON files?",
    "What is list comprehension?",
    "How can I sort and filter data in a list?",
    "What are lambda functions?",
    "What is the difference between a shallow copy and a deep copy?",
    "How do I work with dates and times in Python?",
    "What is recursion and when is it useful?",
    "How do I install external packages using 'pip'?",
    "What is a virtual environment and why should I use one?",
    "How can I work with APIs in Python?",
    "What are decorators?",
    "Can you explain the Global Interpreter Lock (GIL)?"
]

In [None]:
# Creating the dataset (a pandas df)
import pandas as pd

dataset = pd.DataFrame({"question": questions_list})

In [None]:
dataset.head()

In [None]:
# Using the chain and capturing its output
dataset["answer"] = dataset["question"].apply(chain.run)

In [None]:
dataset.head()

<a id="download-model-output">**Run the cell below if you skipped the LLM requests:**</a>

In [None]:
%%bash

if [ ! -e "python_questions_and_answers.csv" ]; then
    curl "https://openlayer-static-assets.s3.us-west-2.amazonaws.com/examples-datasets/llms/langchain/python_questions_and_answers.csv" --output "python_questions_and_answers.csv"
fi

In [None]:
import pandas as pd

dataset = pd.read_csv("python_questions_and_answers.csv")

dataset.head()

## <a id="upload">4. Uploading to the Openlayer platform </a>

[Back to top](#top)

Now it's time to upload the dataset and the model to the Openlayer platform.

In [None]:
!pip install openlayer

### <a id="client">Instantiating the client</a>

In [None]:
import openlayer

# Use your Openlayer API key here (not the OpenAI key from earlier)
client = openlayer.OpenlayerClient("YOUR_OPENLAYER_API_KEY_HERE")

### <a id="project">Creating a project on the platform</a>

In [None]:
from openlayer.tasks import TaskType

project = client.create_or_load_project(
    name="QA with LangChain",
    task_type=TaskType.LLM,
    description="Evaluating an LLM that answers Python questions."
)

### <a id="dataset">Uploading datasets</a>

Before adding the dataset to the project, we need to prepare a `dataset_config`.

This is a Python dictionary that contains all the information needed by the Openlayer platform to utilize the dataset. It should include the column names, the input variable names, etc. For details on the `dataset_config` items, see the [API reference](https://reference.openlayer.com/reference/api/openlayer.OpenlayerClient.add_dataset.html#openlayer.OpenlayerClient.add_dataset).
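
Because the config refers to the DataFrame only by column name, a typo there is easy to miss. The sketch below is a hypothetical pre-upload check, not part of the `openlayer` package: it takes the list of column names and a config dictionary and reports every input variable or output column that the config references but the dataset lacks.

```python
def check_dataset_config(columns: list[str], config: dict) -> list[str]:
    """Hypothetical pre-upload check (not an openlayer API).

    Returns a list of problems; an empty list means every column the config
    references exists in the dataset.
    """
    problems = []
    # Every input variable must be a column of the dataset.
    for name in config.get("inputVariableNames", []):
        if name not in columns:
            problems.append(f"input variable '{name}' is not a column")
    # The column holding the model outputs must exist too, if one is given.
    output = config.get("outputColumnName")
    if output is not None and output not in columns:
        problems.append(f"output column '{output}' is not a column")
    return problems


config = {
    "inputVariableNames": ["question"],
    "label": "validation",
    "outputColumnName": "answer",
}
print(check_dataset_config(["question", "answer"], config))  # → []
print(check_dataset_config(["question"], config))
```

In this notebook, you could call it as `check_dataset_config(list(dataset.columns), validation_dataset_config)` once the config below is defined.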

Let's prepare the `dataset_config` for our validation set:

In [None]:
# Some variables that will go into the `dataset_config`
input_variable_names = ["question"]
output_column_name = "answer"

In [None]:
validation_dataset_config = {
    "inputVariableNames": input_variable_names,
    "label": "validation",
    "outputColumnName": output_column_name,
}

In [None]:
# Validation set
project.add_dataframe(
    dataset_df=dataset,
    dataset_config=validation_dataset_config,
)

We can confirm that the validation set is now staged using the `project.status()` method. 

In [None]:
project.status()

### <a id="model">Uploading models</a>

When it comes to uploading models to the Openlayer platform, there are a few options.

In our case, since we're using LangChain, we'll follow the **shell model** route.

Shell models are the most straightforward way to get started. They consist only of metadata, and all the analysis is done via their predictions (which are [uploaded with the datasets](#dataset), in the `outputColumnName` column).

To upload a shell model, we only need to prepare its `model_config` Python dictionary.
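
Since a shell model is only metadata, it is worth making sure its logged prompt is consistent with its inputs. The sketch below is a hypothetical check, not an Openlayer API: it uses the standard library's `string.Formatter().parse` to collect the `{placeholder}` names in the prompt messages and flags any that are not listed in `inputVariableNames`. It also flags roles outside `system`, `user`, and `assistant`, a restriction chosen for this sketch only.

```python
from string import Formatter

VALID_ROLES = {"system", "user", "assistant"}  # assumption made for this sketch


def prompt_placeholders(prompt: list[dict[str, str]]) -> set[str]:
    """Collect the {placeholder} names used across all prompt messages."""
    names = set()
    for message in prompt:
        # Formatter().parse yields (literal, field_name, format_spec, conversion);
        # field_name is None for trailing literal text.
        for _, field_name, _, _ in Formatter().parse(message["content"]):
            if field_name:
                names.add(field_name)
    return names


def check_model_config(config: dict) -> list[str]:
    """Hypothetical consistency check for a shell model_config (not an openlayer API)."""
    problems = []
    prompt = config.get("prompt", [])
    for message in prompt:
        if message.get("role") not in VALID_ROLES:
            problems.append(f"unexpected role: {message.get('role')!r}")
    # Placeholders with no matching input variable could never be filled in.
    missing = prompt_placeholders(prompt) - set(config.get("inputVariableNames", []))
    for name in sorted(missing):
        problems.append(f"placeholder '{{{name}}}' is not an input variable")
    return problems


config = {
    "inputVariableNames": ["question"],
    "modelType": "shell",
    "prompt": [
        {"role": "system", "content": "Answer questions about Python."},
        {"role": "user", "content": "{question}"},
    ],
}
print(check_model_config(config))  # → []
```

Note that `Formatter().parse` raises `ValueError` on unbalanced braces, so a prompt containing literal `{` or `}` characters would need them doubled before running this check.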

Let's create a `model_config` for our model:

In [None]:
# Useful variable that will also go into our config
template = """You are a helpful assistant who answers user's questions about Python.
A user will pass in a question, and you should answer it very objectively.
Use AT MOST 5 sentences. If you need more than 5 sentences to answer, say that the
user should make their question more objective."""

In [None]:
# Note the camelCase for the keys
model_config = {
    "inputVariableNames": ["question"],
    "modelType": "shell",
    "prompt": [  # Optionally log the prompt, using OpenAI's chat message format
        {"role": "system", "content": template}, 
        {"role": "user", "content": "{question}"}
    ], 
    "metadata": {  # Can add anything here, as long as it is a dict
        "output_parser": None,
        "vector_db_used": False,
        "temperature": 0
    }
}

In [None]:
# Adding the model
project.add_model(
    model_config=model_config
)

We can confirm that both the model and the validation set are now staged using the `project.status()` method. 

In [None]:
project.status()

### <a id="commit"> Committing and pushing to the platform </a>

Finally, we can commit the staged dataset and model as the first project version and then push it to the platform.

In [None]:
project.commit("Initial commit!")

In [None]:
project.status()

In [None]:
project.push()