[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openlayer-ai/examples-gallery/blob/main/development/llms/langchain/question-answering-with-context/web_retrieval.ipynb)


# <a id="top">Using a LangChain chain to retrieve information from Wikipedia</a>

This notebook shows how to build a LangChain chain that answers questions using information retrieved from Wikipedia, and how to upload it to the Openlayer platform.

## <a id="toc">Table of contents</a>

1. [**Problem statement**](#problem) 

2. [**Constructing the chain**](#chain)

3. [**Constructing the dataset**](#dataset-output)

4. [**Uploading to the Openlayer platform**](#upload)
    - [Instantiating the client](#client)
    - [Creating a project](#project)
    - [Uploading datasets](#dataset)
    - [Uploading models](#model)
    - [Committing and pushing to the platform](#commit)

In [None]:
%%bash

if [ ! -e "requirements.txt" ]; then
    curl "https://raw.githubusercontent.com/openlayer-ai/examples-gallery/main/development/llms/langchain/question-answering-with-context/requirements.txt" --output "requirements.txt"
fi

In [None]:
!pip install -r requirements.txt

## <a id="problem">1. Problem statement </a>

[Back to top](#top)


In this notebook, we will create a LangChain chain that retrieves relevant context from a Wikipedia article to answer questions.

Then, we will use it to construct a dataset, and, finally, upload it to the Openlayer platform to evaluate the LLM's performance.

## <a id="chain">2. Constructing a web retrieval class </a>

[Back to top](#top)


### Imports and OpenAI setup

In [None]:
import os
import pandas as pd

from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders.web_base import WebBaseLoader
from langchain.indexes import VectorstoreIndexCreator

In [None]:
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY_HERE"
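
Hard-coding the key in a notebook cell makes it easy to leak when the notebook is shared. As a minimal sketch, assuming you would rather read the key from the environment and only prompt when it is missing, something like the following could be used. `resolve_api_key` is a hypothetical helper written for this example; it is not part of LangChain, OpenAI, or Openlayer.

```python
import os
from getpass import getpass
from typing import Callable


def resolve_api_key(name: str, prompt: Callable[[str], str] = getpass) -> str:
    """Return the key stored in the environment variable `name`.

    Falls back to an interactive prompt (hidden input by default) when the
    variable is unset or empty. Hypothetical helper for illustration only.
    """
    key = os.environ.get(name, "").strip()
    if not key:
        # Nothing in the environment: ask the user instead of hard-coding it.
        key = prompt(f"Enter {name}: ").strip()
    if not key:
        raise ValueError(f"No value provided for {name}")
    return key
```

With this in place, the cell above could become `os.environ["OPENAI_API_KEY"] = resolve_api_key("OPENAI_API_KEY")`.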

### Defining the class

In [None]:
from typing import Dict


class BasicLangChainWebReader:
    """
    Read web content and process the text for conversational purposes.
    """

    def __init__(self, url: str):
        """
        Initialize the reader with a URL.
        """
        self.url = url
        vectorstore = self._get_vectorstore_from_url()
        self.qa_chain = self._get_qa_chain(vectorstore)

    def ask(self, query: str) -> Dict[str, object]:
        """
        Ask a question related to the content of the web page.
        """
        result = self.qa_chain({"query": query})
        answer = result.get("result")
        contexts = []
        for document in result["source_documents"]:
            if isinstance(document, dict):
                contexts.append(document["page_content"])
            else:
                contexts.append(document.page_content)
        
        return {
            "answer": answer,
            "context": contexts
        }

    def _get_vectorstore_from_url(self):
        """
        Load the web page and create a vectorstore index.
        """
        loader = WebBaseLoader([self.url])
        index = VectorstoreIndexCreator().from_loaders([loader])
        return index.vectorstore

    def _get_qa_chain(self, vectorstore):
        """
        Create a QA chain from the vector store.
        """
        llm = ChatOpenAI()
        return RetrievalQA.from_chain_type(
            llm, retriever=vectorstore.as_retriever(), return_source_documents=True
        )
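
The `ask` method reshapes the chain's raw result into an answer plus a list of context strings. To see that step in isolation, without an API key or network access, here is a minimal sketch that applies the same logic to a hand-written result. The function name `extract_answer_and_contexts` and the contents of `fake_result` are assumptions made for this example; they are not real model output.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def extract_answer_and_contexts(result: Dict[str, Any]) -> Dict[str, Any]:
    """Mirror the post-processing that `ask` applies to a chain result."""
    contexts: List[str] = []
    for document in result.get("source_documents", []):
        # Source documents may be plain dicts or objects with a
        # `page_content` attribute, so handle both shapes.
        if isinstance(document, dict):
            contexts.append(document["page_content"])
        else:
            contexts.append(document.page_content)
    return {"answer": result.get("result"), "context": contexts}


# Hand-written stand-in for what the chain returns; not real model output.
fake_result = {
    "query": "Who are the founders of Apple?",
    "result": "placeholder answer",
    "source_documents": [
        SimpleNamespace(page_content="first retrieved chunk"),
        {"page_content": "second retrieved chunk"},
    ],
}

parsed = extract_answer_and_contexts(fake_result)
print(parsed["answer"])
print(parsed["context"])
```

Note that `context` is a list with one string per retrieved chunk, which matters later when the dataset is saved to or loaded from a file.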

### Using the web reader

In [None]:
web_reader = BasicLangChainWebReader("https://en.wikipedia.org/wiki/Apple_Inc.")

In [None]:
response = web_reader.ask("Who are the founders of Apple?")

In [None]:
print(f"Answer: {response['answer']} \n\nContext: {response['context']}")

## <a id="dataset-output">3. Constructing the dataset </a>

[Back to top](#top)


Now, let's say we have a list of questions that our chain can answer. Let's use the chain we created and capture its output to construct a dataset.

**This assumes you have a valid OpenAI API key and are willing to use it.** **If you prefer not to make the LLM requests**, you can [skip to this cell](#download-model-output) and download the resulting dataset, which already contains the model outputs.

In [None]:
questions_and_answers = [
    ["Who is the founder of Apple?", "Steve Jobs, Steve Wozniak, and Ronald Wayne"],
    ["When was Apple founded?", "April 1, 1976"],
    ["what is Apple's mission?", "Apple's mission statement is “to create technology that empowers people and enriches their lives.”"],
    ["what was apple's first product", "The company's first product was the Apple I"],
    ["When did apple go public", "December 12, 1980"]
]

In [None]:
dataset = pd.DataFrame(questions_and_answers, columns=['query', 'ground_truth'])

In [None]:
dataset.head()

In [None]:
answers_and_contexts = dataset["query"].apply(lambda x: pd.Series(web_reader.ask(x)))

In [None]:
dataset["answer"] = answers_and_contexts["answer"]
dataset["context"] = answers_and_contexts["context"]
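
The `apply` call above makes one chain call per row and spreads each returned dictionary across columns. As a rough plain-Python sketch of the same idea, the loop below collects one row per question. Both `collect_outputs` and `fake_ask` are hypothetical names introduced here; `fake_ask` stands in for `web_reader.ask` and returns canned text so the example runs without an LLM.

```python
from typing import Callable, Dict, List


def collect_outputs(
    queries: List[str],
    ask: Callable[[str], Dict[str, object]],
) -> List[Dict[str, object]]:
    """Call `ask` once per query and keep the query next to its output."""
    rows = []
    for query in queries:
        output = ask(query)
        rows.append(
            {"query": query, "answer": output["answer"], "context": output["context"]}
        )
    return rows


def fake_ask(query: str) -> Dict[str, object]:
    # Stand-in for `web_reader.ask`; returns canned text instead of calling an LLM.
    return {"answer": f"answer to: {query}", "context": [f"chunk about: {query}"]}


rows = collect_outputs(["When was Apple founded?", "When did apple go public"], fake_ask)
for row in rows:
    print(row)
```

A list of dictionaries like `rows` can be passed straight to `pd.DataFrame(rows)` to get the same `query`, `answer`, and `context` columns.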

In [None]:
dataset.head()

<a id="download-model-output">**Run the cells below if you did not make the LLM requests:**</a>

In [None]:
%%bash

if [ ! -e "answers_and_contexts.csv" ]; then
    curl "https://openlayer-static-assets.s3.us-west-2.amazonaws.com/examples-datasets/llms/langchain/answers_and_contexts.csv" --output "answers_and_contexts.csv"
fi

In [None]:
dataset = pd.read_csv("answers_and_contexts.csv")

dataset.head()
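
A CSV file stores every cell as text, so a column that originally held Python lists (such as `context`) may come back as strings. Whether that applies to the downloaded file is an assumption; inspecting `dataset["context"].iloc[0]` will tell. If it does, a minimal sketch for converting the cells back could look like this, where `parse_context` is a hypothetical helper:

```python
import ast
from typing import List, Union


def parse_context(cell: Union[str, List[str]]) -> List[str]:
    """Turn a CSV cell back into a list of context strings.

    Assumes that a string cell holds the repr of a Python list, for
    example "['chunk one', 'chunk two']". Anything that cannot be parsed
    that way is kept as a single-item list.
    """
    if isinstance(cell, list):
        return cell
    try:
        # literal_eval only evaluates Python literals, never arbitrary code.
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError, TypeError):
        return [cell]
    if isinstance(value, list):
        return [str(item) for item in value]
    return [cell]
```

It could then be applied with `dataset["context"] = dataset["context"].apply(parse_context)`.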

## <a id="upload">4. Uploading to the Openlayer platform </a>

[Back to top](#top)

Now it's time to upload the dataset and the model to the Openlayer platform.

In [None]:
!pip install openlayer

### <a id="client">Instantiating the client</a>

In [None]:
import openlayer

client = openlayer.OpenlayerClient("YOUR_API_KEY_HERE")

### <a id="project">Creating a project on the platform</a>

In [None]:
from openlayer.tasks import TaskType

project = client.create_or_load_project(
    name="Web Retrieval with LangChain",
    task_type=TaskType.LLM,
    description="Evaluating an LLM that retrieves data from Wikipedia."
)

### <a id="dataset">Uploading datasets</a>

Before adding the datasets to a project, we need to prepare a `dataset_config`.

This is a Python dictionary that contains all the information needed by the Openlayer platform to utilize the dataset. It should include the column names, the input variable names, etc. For details on the `dataset_config` items, see the [API reference](https://reference.openlayer.com/reference/api/openlayer.OpenlayerClient.add_dataset.html#openlayer.OpenlayerClient.add_dataset).

Let's prepare the `dataset_config` for our validation set:

In [None]:
validation_dataset_config = {
    "contextColumnName": "context",
    "questionColumnName": "query",
    "inputVariableNames": ["query", "context"],
    "label": "validation",
    "groundTruthColumnName": "ground_truth",
    "outputColumnName": "answer",
    
}
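
Because the config refers to columns by name, a typo only shows up once the upload fails. As a local sanity check (not something Openlayer requires), the sketch below lists every column the config mentions that is missing from the data. `missing_config_columns` is a hypothetical helper, and `example_config` simply repeats the values shown above so the block runs on its own.

```python
from typing import Dict, Iterable, List


def missing_config_columns(config: Dict[str, object], columns: Iterable[str]) -> List[str]:
    """List column names referenced by `config` that are absent from `columns`.

    Looks at every key ending in "ColumnName" plus "inputVariableNames".
    """
    available = set(columns)
    referenced: List[str] = []
    for key, value in config.items():
        if key.endswith("ColumnName") and isinstance(value, str):
            referenced.append(value)
    referenced.extend(config.get("inputVariableNames", []))
    # dict.fromkeys drops duplicates while preserving first-seen order.
    return [name for name in dict.fromkeys(referenced) if name not in available]


example_config = {
    "contextColumnName": "context",
    "questionColumnName": "query",
    "inputVariableNames": ["query", "context"],
    "label": "validation",
    "groundTruthColumnName": "ground_truth",
    "outputColumnName": "answer",
}

# All referenced columns present: prints []
print(missing_config_columns(example_config, ["query", "ground_truth", "answer", "context"]))
# Two columns absent: prints ['context', 'answer']
print(missing_config_columns(example_config, ["query", "ground_truth"]))
```

In this notebook the call would be `missing_config_columns(validation_dataset_config, dataset.columns)`, and an empty list means every referenced column exists.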

In [None]:
# Validation set
project.add_dataframe(
    dataset_df=dataset,
    dataset_config=validation_dataset_config,
)

We can confirm that the validation set is now staged using the `project.status()` method. 

In [None]:
project.status()

### <a id="model">Uploading models</a>

When it comes to uploading models to the Openlayer platform, there are a few options.

In our case, since we're using LangChain, we'll follow the **shell model** route.

Shell models are the most straightforward way to get started. They consist only of metadata, and all the analysis is done on their predictions (which are [uploaded with the datasets](#dataset), in the column named by `outputColumnName`).

To upload a shell model, we only need to prepare its `model_config` Python dictionary.

Let's create a `model_config` for our model:

In [None]:
# Note the camelCase for the keys
model_config = {
    "inputVariableNames": ["query", "context"],
    "modelType": "shell",
    "metadata": {  # Can add anything here, as long as it is a dict
        "output_parser": None,
        "vector_db_used": True,  # the chain retrieves from a vector store index
        "temperature": 0
    }
}
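
Since a shell model is only described by its config, it can help to check that the config agrees with the dataset config before staging it. The sketch below is a local check under two assumptions that the source does not state: that the model and dataset should list the same `inputVariableNames`, and that `metadata` should be JSON-serializable. `check_shell_model_config` is a hypothetical helper, not an Openlayer function.

```python
import json
from typing import Dict, List


def check_shell_model_config(
    model_config: Dict[str, object],
    dataset_config: Dict[str, object],
) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []
    # Assumption: model and dataset should agree on the input variables.
    if model_config.get("inputVariableNames") != dataset_config.get("inputVariableNames"):
        problems.append("inputVariableNames differ between model and dataset configs")
    metadata = model_config.get("metadata", {})
    if not isinstance(metadata, dict):
        problems.append("metadata must be a dict")
    else:
        try:
            # Assumption: metadata should survive serialization to JSON.
            json.dumps(metadata)
        except (TypeError, ValueError):
            problems.append("metadata is not JSON-serializable")
    return problems
```

Calling `check_shell_model_config(model_config, validation_dataset_config)` with the dictionaries defined in this notebook should return an empty list.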

In [None]:
# Adding the model
project.add_model(
    model_config=model_config
)

We can confirm that both the model and the validation set are now staged using the `project.status()` method. 

In [None]:
project.status()

### <a id="commit"> Committing and pushing to the platform </a>

Finally, we can commit the first project version to the platform. 

In [None]:
project.commit("Initial commit!")

In [None]:
project.status()

In [None]:
project.push()