# Quickstart to LangChain with OpenAI
In this notebook, we'll get up and running with LangChain and OpenAI.

To use the OpenAI API, you'll need to sign up for API access, which is a paid service, and acquire an API key. For instructions [see this link](). Generate a new API key and save it in a local `.env` file as:
```
OPENAI_API_KEY=sk-XXXXXX
```
Replace `sk-XXXXXX` with your API key. **Don't share this with ANYONE**. Add `.env` to the `.gitignore` file, so that this file is never uploaded to GitHub (assuming you are using Git for version control).

In [1]:
import os
from dotenv import load_dotenv, find_dotenv

# load API keys from local .env file
load_dotenv(find_dotenv())

True
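
Once `load_dotenv` has run, it can help to confirm that the key is actually present in the environment without echoing it in full. The following is a minimal sketch using only the standard library; `mask_key` and `require_api_key` are hypothetical helper names chosen for this example and are not part of `python-dotenv` or LangChain.

```python
import os


def mask_key(key: str, visible: int = 4) -> str:
    """Return the key with all but the last `visible` characters hidden."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Fetch the key from the environment, failing early if it is missing."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; check your .env file")
    return key


# Report the outcome without ever printing the full secret.
try:
    print("Key loaded:", mask_key(require_api_key()))
except RuntimeError as err:
    print(err)
```

Failing early here gives a clearer message than an authentication error raised later by the first API call.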

In [2]:
# instantiate the LLM instance
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-3.5-turbo",
    temperature=0.0,
)

In [3]:
# modify the question below & try
question = """
What is Deepika Padukone's birthday? Which country won the cricket world cup the year she was born?
"""
response = llm.invoke(question)
print(response.content)

Deepika Padukone's birthday is on January 5th. She was born in 1986. The cricket world cup in 1986 was won by India.


In [4]:
# let's use a prompt template instead
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a world class technical documentation writer."),
        ("user", "{input}"),
    ]
)
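
Conceptually, a chat prompt template is a list of (role, template) pairs whose placeholders are filled in when the chain is invoked. The sketch below mimics that idea with plain Python string formatting. It is an illustration only: `render_messages` is a hypothetical helper, and this is not how LangChain implements `ChatPromptTemplate`.

```python
from typing import Dict, List, Tuple

# The same (role, template) pairs used in the ChatPromptTemplate above.
MESSAGE_TEMPLATES: List[Tuple[str, str]] = [
    ("system", "You are a world class technical documentation writer."),
    ("user", "{input}"),
]


def render_messages(
    templates: List[Tuple[str, str]], variables: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Fill each template's placeholders and return (role, text) pairs."""
    return [(role, text.format(**variables)) for role, text in templates]


messages = render_messages(
    MESSAGE_TEMPLATES, {"input": "How can RAG help LLMs with Q&A?"}
)
for role, text in messages:
    print(f"{role}: {text}")
```

The system message has no placeholder, so it passes through unchanged; only the user message picks up the value supplied for `input`.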

In [5]:
chain = prompt | llm
response = chain.invoke({"input": "How can RAG help LLMs with Q&A?"})
print(response.content)

RAG (Red, Amber, Green) status indicators can be a helpful tool for LLMs (Large Language Models) when dealing with Q&A (Question and Answer) tasks. Here's how RAG can assist LLMs with Q&A:

1. **Prioritization**: By using RAG indicators, LLMs can prioritize their responses based on the urgency or importance of the questions. Questions marked as red (critical) can be addressed first, followed by amber (important) and green (less urgent) questions.

2. **Quality Control**: LLMs can use RAG indicators to assess the quality of their responses. Red flags can indicate potential inaccuracies or gaps in information that need to be addressed before providing the final answer.

3. **Efficiency**: RAG can help LLMs streamline their workflow by quickly identifying which questions require immediate attention and which ones can be addressed later. This can improve efficiency and ensure that time is spent on the most critical tasks.

4. **Feedback Loop**: By using RAG indicators, LLMs can track their

In [6]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

In [7]:
chain2 = prompt | llm | output_parser

In [8]:
# chain2 ends with StrOutputParser, so invoke() returns a plain string
response = chain2.invoke({"input": "How can RAG help LLMs with Q&A?"})
print(response)

RAG (Red, Amber, Green) status indicators can be a helpful tool for LLMs (Large Language Models) when dealing with Q&A (Question and Answer) tasks. Here's how RAG can assist LLMs with Q&A:

1. **Red**: When a question is marked as "Red," it indicates that the LLM has low confidence in the answer provided. This can prompt the LLM to either seek more information or provide a disclaimer that the answer may not be accurate.

2. **Amber**: An "Amber" status signifies moderate confidence in the answer. The LLM can use this as a signal to double-check the information before presenting it as a final response.

3. **Green**: A "Green" status indicates high confidence in the answer provided by the LLM. This can give assurance to the user that the information is likely to be accurate.

By using RAG status indicators, LLMs can better manage the quality of their responses during Q&A tasks, leading to more reliable and trustworthy information being provided to users.


Now let us query a web page about the 1987 ICC Cricket World Cup, which was co-hosted by India and Pakistan, and ask some questions about it. By default, our LLM _may not_ have this knowledge, so we will help it by adding context to our query using RAG (retrieval-augmented generation).
In this process, a _Retriever_ looks up the documents most relevant to the question, and those documents are then passed to our LLM as additional context. We'll populate a local vector store and use it as the retriever.

Please ensure that you have installed `beautifulsoup4` to help parse data from a webpage.
```bash
$> pip install beautifulsoup4
```
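
To make the retrieve-then-answer idea concrete before using the real components, here is a toy sketch that needs only the standard library. It assumes a bag-of-words count vector as a stand-in for an embedding and ranks chunks by cosine similarity; the names `embed`, `cosine`, `retrieve`, and `build_prompt`, as well as the sample chunks, are made up for this illustration and are not LangChain or FAISS APIs.

```python
import math
import re
from collections import Counter

# Toy "documents". In the notebook these would be chunks of the web page.
CHUNKS = [
    "The 1987 Cricket World Cup was co-hosted by India and Pakistan.",
    "A vector store keeps embeddings so that similar text can be looked up.",
    "Prompt templates fill placeholders with user input.",
]


def embed(text: str) -> Counter:
    """Stand-in embedding: word counts, not a neural embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list, k: int = 1) -> list:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context_chunks: list) -> str:
    """Place the retrieved chunks in front of the question."""
    context = "\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


question = "Which countries hosted the 1987 Cricket World Cup?"
top = retrieve(question, CHUNKS, k=1)
print(build_prompt(question, top))
```

The cells that follow keep the same shape but use OpenAI embeddings for the vectors, a FAISS index for the similarity lookup, and the LLM to answer from the assembled prompt.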

In [9]:
from langchain_community.document_loaders import WebBaseLoader

# load the documents - this code fragment just loads the documents. It
# does not vectorize anything
world_cup_url = "https://en.wikipedia.org/wiki/1987_Cricket_World_Cup"
loader = WebBaseLoader(world_cup_url)

docs = loader.load()

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [10]:
# here is the code to embed the web page into a local vector store
import os, pathlib
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# create directory to save vector store
save_path = pathlib.Path.cwd() / "vecstore"
save_path.mkdir(parents=True, exist_ok=True)
vec_store_path = save_path / "faiss_index"

embeddings = OpenAIEmbeddings()

if vec_store_path.exists():
    # if already created, just load it
    print("Loading from local vector store")
    vector = FAISS.load_local(
        str(vec_store_path), embeddings, allow_dangerous_deserialization=True
    )
else:
    # create & save it
    print("Creating new vector store from URL...")
    text_splitter = RecursiveCharacterTextSplitter()
    documents = text_splitter.split_documents(docs)
    vector = FAISS.from_documents(documents, embeddings)
    vector.save_local(str(vec_store_path))
    print(f"Vector store saved to {vec_store_path}")

Loading from local vector store
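
The splitting step in the cell above breaks the page into smaller chunks before they are embedded. As a rough illustration of that idea only, the sketch below cuts text into fixed-size windows that overlap slightly, so a sentence straddling a boundary still appears whole in at least one chunk. This is a simplified stand-in and not the algorithm `RecursiveCharacterTextSplitter` uses; `split_with_overlap`, `chunk_size`, and `overlap` are names chosen for this example.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 20, overlap: int = 5) -> List[str]:
    """Cut text into windows of chunk_size characters sharing `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "India and Pakistan co-hosted the 1987 Cricket World Cup."
for piece in split_with_overlap(sample, chunk_size=24, overlap=6):
    print(repr(piece))
```

Smaller chunks make retrieval more precise but give the LLM less surrounding context per hit, so the chunk size is worth tuning for your own documents.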


In [13]:
from langchain.chains.combine_documents import create_stuff_documents_chain

prompt_wc = ChatPromptTemplate.from_template(
    """Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}"""
)

document_chain = create_stuff_documents_chain(llm, prompt_wc)

# create a retrieval chain to get the answer
from langchain.chains import create_retrieval_chain

retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)

In [14]:
# modify the question below & try
wc_question = """
How many groups were the participating countries divided into? In which group did India play
and who were the other countries in that group?
"""
response = retrieval_chain.invoke({"input": wc_question})
print(response["answer"])

The participating countries were divided into two groups. India played in Group A along with Australia, New Zealand, and Zimbabwe.


In [16]:
wc_question2 = """
With which countries did England play in the Group B matches? What were the results of all 
those matches?
"""
response = retrieval_chain.invoke({"input": wc_question2})
print(response["answer"])

England played against Pakistan, West Indies, and Sri Lanka in the Group B matches. The results of those matches were as follows:
- England vs Pakistan: Australia won by 18 runs
- England vs West Indies: England won
- England vs Sri Lanka: England won


In [17]:
wc_question3 = """
Did the West Indies qualify for the semi-finals? Where were the semi-finals and finals played?
Who were the teams that played the semi-finals and finals? Who won the finals and by how many runs?
"""
response = retrieval_chain.invoke({"input": wc_question3})
print(response["answer"])

No, the West Indies did not qualify for the semi-finals. The semi-finals and finals were played in Lahore, Calcutta, and Bombay, India. The teams that played in the semi-finals were Australia, Pakistan, England, and India. Australia won the finals against England by seven runs.


In [18]:
wc_question4 = """
Who were the two leading run scorers and the two leading wicket takers?
"""
response = retrieval_chain.invoke({"input": wc_question4})
print(response["answer"])

The two leading run scorers were Graham Gooch from England with 471 runs and David Boon from Australia with 447 runs. The two leading wicket takers were Craig McDermott from Australia with 18 wickets and Imran Khan from Pakistan with 17 wickets.
