# Retrieval Augmented Generation
In this notebook, we'll learn how to do [Retrieval Augmented Generation](https://www.promptingguide.ai/techniques/rag) (RAG) with ChromaDB and OpenAI to answer questions about the Wimbledon 2023 tennis tournament.

## Load data from Wikipedia
We'll start by loading the Wikipedia page for the 2023 Wimbledon Championships and splitting it into small, overlapping chunks of text.

In [242]:
from langchain.document_loaders import WikipediaLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [243]:
search_term = "2023 Wimbledon Championships"
docs = WikipediaLoader(query=search_term, load_max_docs=1).load()

In [244]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20,
    length_function = len,
    is_separator_regex = False,
)

data = text_splitter.split_documents(docs)
data[:3]

[Document(page_content='The 2023 Wimbledon Championships was a Grand Slam tennis tournament that took place at the All', metadata={'title': '2023 Wimbledon Championships', 'summary': 'The 2023 Wimbledon Championships was a Grand Slam tennis tournament that took place at the All England Lawn Tennis and Croquet Club in Wimbledon, London, United Kingdom.', 'source': 'https://en.wikipedia.org/wiki/2023_Wimbledon_Championships'}),
 Document(page_content='place at the All England Lawn Tennis and Croquet Club in Wimbledon, London, United Kingdom.', metadata={'title': '2023 Wimbledon Championships', 'summary': 'The 2023 Wimbledon Championships was a Grand Slam tennis tournament that took place at the All England Lawn Tennis and Croquet Club in Wimbledon, London, United Kingdom.', 'source': 'https://en.wikipedia.org/wiki/2023_Wimbledon_Championships'}),
 Document(page_content='== Tournament ==', metadata={'title': '2023 Wimbledon Championships', 'summary': 'The 2023 Wimbledon Championships was 
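
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of overlapping chunking in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally: that class prefers to break on separators such as paragraphs and spaces, which is why the chunks above end on word boundaries. The `chunk_text` helper is a hypothetical name used only for illustration.

```python
def chunk_text(text: str, chunk_size: int = 100, chunk_overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows that overlap (illustrative only)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# A 250-character dummy string, so the window boundaries are easy to inspect.
sample = "".join(str(i % 10) for i in range(250))
chunks = chunk_text(sample, chunk_size=100, chunk_overlap=20)

for i, chunk in enumerate(chunks):
    print(i, len(chunk))

# The tail of one chunk is repeated at the head of the next.
print(chunks[0][-20:] == chunks[1][:20])
```

The overlap means a sentence cut at a chunk boundary still appears partly in the neighbouring chunk, which can help retrieval when chunks are as small as they are here.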

## Storing embeddings in ChromaDB
Next, let's embed those chunks of text and store the embeddings in ChromaDB.

In [245]:
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

In [246]:
embeddings = OpenAIEmbeddings()

In [247]:
store = Chroma.from_documents(
    data, 
    embeddings, 
    ids = [f"{item.metadata['source']}-{index}" for index, item in enumerate(data)],
    collection_name="Wimbledon-Embeddings", 
    persist_directory='db',
)
store.persist()
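
As a rough intuition for what the vector store does at query time, the sketch below ranks a few chunks by cosine similarity to a question. Everything in it is an assumption made for illustration: `toy_embed` is a bag-of-words stand-in (OpenAI embeddings are dense vectors produced by a model), and the `index` dictionary is nothing like Chroma's actual index. Only the idea of "embed, then find the nearest vectors" carries over.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real embedding model returns a dense vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Shortened chunk texts taken from the notebook outputs.
chunks = [
    "The tournament was played on grass courts",
    "Qualifying matches were played from 26 to 29 June 2023",
    "Carlos Alcaraz def. Novak Djokovic",
]

# Store each chunk under an id, alongside its vector.
index = {f"chunk-{i}": (text, toy_embed(text)) for i, text in enumerate(chunks)}


def search(query: str, k: int = 2) -> list[str]:
    """Return the k chunk texts most similar to the query."""
    query_vector = toy_embed(query)
    ranked = sorted(
        index.values(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


top = search("When were the qualifying matches played?", k=1)
print(top)
```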

## Asking questions about Wimbledon 2023
Now let's use OpenAI, augmented by ChromaDB, to ask some questions about the tournament.

In [248]:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI
import pprint

In [249]:
template = """You are a bot that answers questions about Wimbledon 2023, using only the context provided.
If you don't know the answer, simply state that you don't know.

{context}

Question: {question}"""

PROMPT = PromptTemplate(
    template=template, input_variables=["context", "question"]
)
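
To see what the model will eventually receive, the sketch below fills the same template with plain `str.format`. This only approximates what `PromptTemplate` does when the chain runs, and the two chunks are copied from a later output in this notebook rather than retrieved live.

```python
template = """You are a bot that answers questions about Wimbledon 2023, using only the context provided.
If you don't know the answer, simply state that you don't know.

{context}

Question: {question}"""

# Chunk texts copied from the notebook output, standing in for retrieved documents.
retrieved_chunks = [
    "at the All England Lawn Tennis and Croquet Club, Wimbledon, from 3 to 16 July 2023. Qualifying",
    "2023. Qualifying matches were played from 26 to 29 June 2023 at the Bank of England Sports Ground",
]

# Join the chunks into one context string and substitute both placeholders.
filled = template.format(
    context="\n\n".join(retrieved_chunks),
    question="When and where was Wimbledon 2023 held?",
)
print(filled)
```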

In [250]:
llm = ChatOpenAI(temperature=0, model="gpt-4")

In [251]:
qa_with_source = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=store.as_retriever(),
    chain_type_kwargs={"prompt": PROMPT},
    return_source_documents=True,
)
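
Conceptually, a retrieval chain with `chain_type="stuff"` retrieves the relevant chunks, places all of them into a single prompt, and calls the model once. The sketch below mimics that flow without LangChain or an API key. `answer_with_sources`, `fake_retrieve`, and `fake_llm` are hypothetical names invented for this illustration, not part of any library.

```python
from typing import Callable


def answer_with_sources(
    question: str,
    retrieve: Callable[[str], list[str]],
    llm: Callable[[str], str],
    prompt_template: str,
) -> dict:
    """Retrieve chunks, 'stuff' them into one prompt, and call the model once."""
    chunks = retrieve(question)
    prompt = prompt_template.format(context="\n\n".join(chunks), question=question)
    return {"query": question, "result": llm(prompt), "source_documents": chunks}


# Hypothetical stand-ins so the sketch runs offline.
def fake_retrieve(question: str) -> list[str]:
    # A real retriever would run a similarity search against the vector store.
    return ["Carlos Alcaraz def.  Novak Djokovic 1–6, 7–6(8–6), 6–1, 3–6, 6–4"]


def fake_llm(prompt: str) -> str:
    # A real model would generate an answer from the prompt.
    return f"(model reply to a {len(prompt)}-character prompt)"


response = answer_with_sources(
    "Who won the men's singles title?",
    fake_retrieve,
    fake_llm,
    "{context}\n\nQuestion: {question}",
)
print(response["result"])
print(response["source_documents"])
```

Because every retrieved chunk goes into one prompt, this approach is simple but only works while the combined chunks fit in the model's context window.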

In [252]:
pprint.pprint(
    qa_with_source("When and where was Wimbledon 2023 held?")
)

{'query': 'When and where was Wimbledon 2023 held?',
 'result': 'Wimbledon 2023 was held at the All England Lawn Tennis and Croquet '
           'Club in Wimbledon, London, United Kingdom from 3 to 16 July 2023.',
 'source_documents': [Document(page_content='at the All England Lawn Tennis and Croquet Club, Wimbledon, from 3 to 16 July 2023. Qualifying', metadata={'source': 'https://en.wikipedia.org/wiki/2023_Wimbledon_Championships', 'summary': 'The 2023 Wimbledon Championships was a Grand Slam tennis tournament that took place at the All England Lawn Tennis and Croquet Club in Wimbledon, London, United Kingdom.', 'title': '2023 Wimbledon Championships'}),
                      Document(page_content='2023. Qualifying matches were played from 26 to 29 June 2023 at the Bank of England Sports Ground', metadata={'source': 'https://en.wikipedia.org/wiki/2023_Wimbledon_Championships', 'summary': 'The 2023 Wimbledon Championships was a Grand Slam tennis tournament that took place at the All E
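
The raw output is hard to read because every source document repeats the full page metadata. One possible way to condense it is sketched below. The `summarize` helper is a hypothetical name, and the small `Document` dataclass is only a stand-in with the same `page_content` and `metadata` attributes seen in the output above, so the example runs without LangChain.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Minimal stand-in exposing the two attributes used below."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize(response: dict, max_chars: int = 60) -> str:
    """Format the question, the answer, and a short snippet of each source."""
    lines = [f"Q: {response['query']}", f"A: {response['result']}"]
    for i, doc in enumerate(response["source_documents"], start=1):
        snippet = doc.page_content[:max_chars]
        source = doc.metadata.get("source", "unknown source")
        lines.append(f"  [{i}] {snippet} ({source})")
    return "\n".join(lines)


# Values copied from the notebook output above.
url = "https://en.wikipedia.org/wiki/2023_Wimbledon_Championships"
response = {
    "query": "When and where was Wimbledon 2023 held?",
    "result": "Wimbledon 2023 was held at the All England Lawn Tennis and Croquet "
              "Club in Wimbledon, London, United Kingdom from 3 to 16 July 2023.",
    "source_documents": [
        Document(
            "at the All England Lawn Tennis and Croquet Club, Wimbledon, "
            "from 3 to 16 July 2023. Qualifying",
            {"source": url},
        ),
    ],
}

summary = summarize(response)
print(summary)
```

In the notebook itself, the same helper could be applied to a real response, for example `print(summarize(qa_with_source("When and where was Wimbledon 2023 held?")))`.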

In [253]:
pprint.pprint(
    qa_with_source("Who won the men's singles title and what was the score?")
)

{'query': "Who won the men's singles title and what was the score?",
 'result': "Carlos Alcaraz won the men's singles title. The score was 1–6, "
           '7–6(8–6), 6–1, 3–6, 6–4.',
 'source_documents': [Document(page_content="Carlos Alcaraz def.  Novak Djokovic 1–6, 7–6(8–6), 6–1, 3–6, 6–4\n\n\n=== Ladies' singles ===", metadata={'source': 'https://en.wikipedia.org/wiki/2023_Wimbledon_Championships', 'summary': 'The 2023 Wimbledon Championships was a Grand Slam tennis tournament that took place at the All England Lawn Tennis and Croquet Club in Wimbledon, London, United Kingdom.', 'title': '2023 Wimbledon Championships'}),
                      Document(page_content="== Events ==\n\n\n=== Gentlemen's singles ===", metadata={'source': 'https://en.wikipedia.org/wiki/2023_Wimbledon_Championships', 'summary': 'The 2023 Wimbledon Championships was a Grand Slam tennis tournament that took place at the All England Lawn Tennis and Croquet Club in Wimbledon, London, United Kingdom.', 'title

In [254]:
pprint.pprint(
    qa_with_source("Were Russian players allowed to play?")
)

{'query': 'Were Russian players allowed to play?',
 'result': 'The context provided does not specify whether Russian players were '
           'allowed to play in Wimbledon 2023.',
 'source_documents': [Document(page_content='players, after they were banned from the previous edition due to the Russian invasion of Ukraine.', metadata={'source': 'https://en.wikipedia.org/wiki/2023_Wimbledon_Championships', 'summary': 'The 2023 Wimbledon Championships was a Grand Slam tennis tournament that took place at the All England Lawn Tennis and Croquet Club in Wimbledon, London, United Kingdom.', 'title': '2023 Wimbledon Championships'}),
                      Document(page_content='The tournament was played on grass courts, with all main draw matches played at the All England', metadata={'source': 'https://en.wikipedia.org/wiki/2023_Wimbledon_Championships', 'summary': 'The 2023 Wimbledon Championships was a Grand Slam tennis tournament that took place at the All England Lawn Tennis and Croquet C

In [255]:
pprint.pprint(
    qa_with_source("Did Russian players play?")
)

{'query': 'Did Russian players play?',
 'result': 'Yes, Russian players returned to play in Wimbledon 2023 after '
           'being banned from the previous edition.',
 'source_documents': [Document(page_content='players, after they were banned from the previous edition due to the Russian invasion of Ukraine.', metadata={'source': 'https://en.wikipedia.org/wiki/2023_Wimbledon_Championships', 'summary': 'The 2023 Wimbledon Championships was a Grand Slam tennis tournament that took place at the All England Lawn Tennis and Croquet Club in Wimbledon, London, United Kingdom.', 'title': '2023 Wimbledon Championships'}),
                      Document(page_content='Mate Pavić /  Lyudmyla Kichenok def.  Joran Vliegen /  Xu Yifan, 6–4, 6–7(9–11), 6–3', metadata={'source': 'https://en.wikipedia.org/wiki/2023_Wimbledon_Championships', 'summary': 'The 2023 Wimbledon Championships was a Grand Slam tennis tournament that took place at the All England Lawn Tennis and Croquet Club in Wimbledon, Londo

In [256]:
pprint.pprint(
    qa_with_source("Did British players play?")
)

{'query': 'Did British players play?',
 'result': 'The text does not provide information on whether British players '
           'participated in Wimbledon 2023.',
 'source_documents': [Document(page_content='players, after they were banned from the previous edition due to the Russian invasion of Ukraine.', metadata={'source': 'https://en.wikipedia.org/wiki/2023_Wimbledon_Championships', 'summary': 'The 2023 Wimbledon Championships was a Grand Slam tennis tournament that took place at the All England Lawn Tennis and Croquet Club in Wimbledon, London, United Kingdom.', 'title': '2023 Wimbledon Championships'}),
                      Document(page_content='The tournament was played on grass courts, with all main draw matches played at the All England', metadata={'source': 'https://en.wikipedia.org/wiki/2023_Wimbledon_Championships', 'summary': 'The 2023 Wimbledon Championships was a Grand Slam tennis tournament that took place at the All England Lawn Tennis and Croquet Club in Wimbledon,

In [257]:
pprint.pprint(
    qa_with_source("Were any extra events held?")
)

{'query': 'Were any extra events held?',
 'result': 'Yes, in addition to the tournament, a special event was held where '
           'Swiss former tennis player Roger Federer was honoured.',
 'source_documents': [Document(page_content='== Special events ==', metadata={'source': 'https://en.wikipedia.org/wiki/2023_Wimbledon_Championships', 'summary': 'The 2023 Wimbledon Championships was a Grand Slam tennis tournament that took place at the All England Lawn Tennis and Croquet Club in Wimbledon, London, United Kingdom.', 'title': '2023 Wimbledon Championships'}),
                      Document(page_content='In addition to the tournament taking place, Swiss former tennis player Roger Federer was honoured', metadata={'source': 'https://en.wikipedia.org/wiki/2023_Wimbledon_Championships', 'summary': 'The 2023 Wimbledon Championships was a Grand Slam tennis tournament that took place at the All England Lawn Tennis and Croquet Club in Wimbledon, London, United Kingdom.', 'title': '2023 Wimble