## Covered Topics

In [30]:
%pip install langchain-community langchain_chroma langchainhub

Installing collected packages: types-urllib3, types-requests, langchainhub
Successfully installed langchainhub-0.1.15 types-requests-2.31.0.6 types-urllib3-1.26.25.14
Note: you may need to restart the kernel to use updated packages.


In [5]:
import os
from langchain_openai import ChatOpenAI
from langchain_community.document_loaders import TextLoader
from langchain import hub
from langchain_chroma import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

## RAG: what is it?

One **limitation** of LLMs is that **they have a knowledge cutoff**: they can only answer questions based on the data they were trained on, up to a certain date. They cannot learn about anything after this cutoff without further training or integration with up-to-date data sources.

### Introducing RAG: Retrieval-Augmented Generation

To address this limitation, a method known as Retrieval-Augmented Generation (RAG) has been developed. RAG is a technique that combines the power of language models with external information retrieval systems. This allows the model to access a broader range of up-to-date information during its operations.

#### How RAG Works

1. **Retrieval Step**: When a question or a prompt is given to a RAG system, it first retrieves relevant documents or data snippets from a large external database. This database can be continuously updated with new information, allowing the RAG system to access current data.

2. **Generation Step**: The retrieved information is then passed to the language model together with the original question. The model uses both its pre-trained knowledge and the retrieved information to generate a response that is informed and up-to-date.
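
The two steps can be sketched in plain Python. This is a toy illustration under stated assumptions, not how LangChain implements them: `retrieve` ranks snippets by shared words instead of embeddings, `build_prompt` only assembles the text an LLM would receive, and all function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return the set of words it contains."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def retrieve(question, snippets, k=2):
    """Retrieval step: return the k snippets sharing the most words with the question."""
    q_words = tokenize(question)
    ranked = sorted(snippets, key=lambda s: len(q_words & tokenize(s)), reverse=True)
    return ranked[:k]


def build_prompt(question, context):
    """Generation step (input side): combine retrieved context and question."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"


snippets = [
    "Customer - I wear a size 50.",
    "Salesperson - The fitting rooms are down there on the right.",
    "Customer - I really don't like brown.",
]

question = "What size does the customer wear?"
context = retrieve(question, snippets, k=1)
prompt = build_prompt(question, context)
print(prompt)  # this text is what would be sent to the language model
```

In a real system the word-overlap ranking is replaced by a similarity search over embeddings, and the prompt is sent to an LLM.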

#### Retrieval step

Here's a breakdown of the retrieval step: 

- **Load**: Data from sources such as text files, JSON documents, or web pages is loaded into the system.
- **Split**: The loaded data is split into smaller chunks that are easier to embed and search.
- **Embed**: Each chunk is transformed into a numerical vector, known as an embedding, which represents its semantic content.
- **Store**: The embeddings are stored in a vector store, which lets the system efficiently search for the chunks most relevant to a query.
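
As a rough sketch of these four stages without any external library, the example below loads text from a string, splits it on blank lines, and "embeds" each chunk as a word-count vector. The word-count embedding and the in-memory list used as a store are assumptions made for illustration; real pipelines use a learned embedding model and a vector database.

```python
from collections import Counter

# Load: in a real pipeline this text would come from a file or a URL.
raw_text = (
    "At the clothing store\n\n"
    "Salesperson - Good morning, can I help you?\n\n"
    "Customer - I wear a size 50."
)

# Split: break the text into chunks on blank lines.
chunks = [part.strip() for part in raw_text.split("\n\n") if part.strip()]


def embed(text):
    """Toy embedding: a word-count vector (a stand-in for a learned model)."""
    return Counter(text.lower().split())


# Embed + Store: keep each chunk next to its vector in an in-memory list.
store = [(chunk, embed(chunk)) for chunk in chunks]

for chunk, vector in store:
    print(chunk, "->", dict(vector))
```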

<img src="assets/rag-architecture.png" />


In [17]:
loader = TextLoader('data/dialog.txt')
documents = loader.load()

documents

[Document(page_content="At the clothing store\n\nSalesperson - Good morning, can I help you?\nCustomer - Yes, thank you. I saw a pair of black trousers in the window, can I try them on?\nSalesperson - Of course, what size?\nCustomer - I wear a size 50.\nSalesperson - Here they are!\nCustomer - Where are the fitting rooms?\nSalesperson - The fitting rooms are down there on the right, next to the stairs.\nCustomer - Perfect, thank you.\n\nSalesperson - How are they?\nCustomer - I like the style, but they're a bit tight. Can I try a larger size?\nSalesperson - Oh, I'm sorry, we're out of size 52 in this color, would you like to try the same model in brown?\nCustomer - No, thank you, I really don't like brown. Don't you have other colors in this size?\nSalesperson - Well, in size 52 we have brown, red, and gray.\nCustomer - Alright, I'll try them in gray, let's see how they fit.\n\nCustomer - I tried on the trousers, they are really nice in gray and the size is perfect! How much are they?\

In [18]:
llm = ChatOpenAI(
    model="gpt-3.5-turbo-0125",
    # Read the key from the environment instead of hard-coding it
    openai_api_key=os.environ["OPENAI_API_KEY"]
)

## Recursive Splitting with overlap

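`RecursiveCharacterTextSplitter` breaks a document into chunks of at most `chunk_size` characters. It tries a list of separators in order (by default paragraph breaks, then line breaks, then spaces, then individual characters) and only falls back to the next separator when a piece is still too long, so chunks tend to end at natural boundaries. `chunk_overlap` sets how many characters adjacent chunks may share, which helps preserve context that would otherwise be cut at a chunk boundary.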
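
The idea of overlap can be illustrated with a fixed-size sliding window. This is a simplified sketch, not the algorithm `RecursiveCharacterTextSplitter` uses: it ignores separators and cuts purely by character position, and `chunk_with_overlap` is a hypothetical helper.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(text, chunk_size=10, chunk_overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, so text that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.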

In [22]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=50,    # deliberately small so the overlap is easy to see; try 200 for real use
    chunk_overlap=10  # try 33 together with chunk_size=200
)

splits = text_splitter.split_documents(documents)

splits

[Document(page_content='At the clothing store', metadata={'source': 'data/dialog.txt'}),
 Document(page_content='Salesperson - Good morning, can I help you?', metadata={'source': 'data/dialog.txt'}),
 Document(page_content='Customer - Yes, thank you. I saw a pair of black', metadata={'source': 'data/dialog.txt'}),
 Document(page_content='of black trousers in the window, can I try them', metadata={'source': 'data/dialog.txt'}),
 Document(page_content='try them on?', metadata={'source': 'data/dialog.txt'}),
 Document(page_content='Salesperson - Of course, what size?', metadata={'source': 'data/dialog.txt'}),
 Document(page_content='Customer - I wear a size 50.', metadata={'source': 'data/dialog.txt'}),
 Document(page_content='Salesperson - Here they are!', metadata={'source': 'data/dialog.txt'}),
 Document(page_content='Customer - Where are the fitting rooms?', metadata={'source': 'data/dialog.txt'}),
 Document(page_content='Salesperson - The fitting rooms are down there on', metadata={'

## Vectorization

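Each chunk is converted into an embedding vector with `OpenAIEmbeddings`, and `Chroma.from_documents` stores the vectors together with the chunk text so they can be searched by similarity.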
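
Once text is represented as vectors, "relevant" becomes "close in vector space". The sketch below uses cosine similarity on hand-written 3-dimensional vectors that stand in for real embeddings; the numbers are invented for illustration, and the metric a particular vector store uses depends on its configuration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for the embeddings of three chunks.
vectors = {
    "Customer - I wear a size 50.": [0.9, 0.1, 0.0],
    "Salesperson - The fitting rooms are down there": [0.1, 0.9, 0.1],
    "Customer - I really don't like brown.": [0.0, 0.2, 0.9],
}

# Made-up vector standing in for the embedding of a question about size.
query_vector = [0.8, 0.2, 0.1]

# Pick the chunk whose vector points in the direction closest to the query.
best = max(vectors, key=lambda text: cosine_similarity(vectors[text], query_vector))
print(best)
```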

In [24]:
vectorstore = Chroma.from_documents(
    documents=splits,
    # Read the key from the environment instead of hard-coding it
    embedding=OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"])
)

In [25]:
vectorstore

<langchain_chroma.vectorstores.Chroma at 0x119090310>

In [28]:
vectorstore.embeddings

OpenAIEmbeddings(client=<openai.resources.embeddings.Embeddings object at 0x109d9ca90>, async_client=<openai.resources.embeddings.AsyncEmbeddings object at 0x119091fc0>, model='text-embedding-ada-002', dimensions=None, deployment='text-embedding-ada-002', openai_api_version='', openai_api_base=None, openai_api_type='', openai_proxy='', embedding_ctx_length=8191, openai_api_key=SecretStr('**********'), openai_organization=None, allowed_special=None, disallowed_special=None, chunk_size=1000, max_retries=2, request_timeout=None, headers=None, tiktoken_enabled=True, tiktoken_model_name=None, show_progress_bar=False, model_kwargs={}, skip_empty=False, default_headers=None, default_query=None, retry_min_seconds=4, retry_max_seconds=20, http_client=None, http_async_client=None, check_embedding_ctx_length=True)

In [31]:
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")

In [32]:
prompt

ChatPromptTemplate(input_variables=['context', 'question'], metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"))])

In [35]:
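def format_docs(docs):
    # Join the text of the retrieved chunks into a single context string
    return "\n\n".join(doc.page_content for doc in docs)


# The retriever output is formatted into "context"; the question passes through unchanged.
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt             # fill the RAG prompt template
    | llm                # generate the answer
    | StrOutputParser()  # return the answer as a plain string
)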
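
To make the data flow explicit, here is a rough plain-Python equivalent of what the chain does for a single question. It is a sketch, not LangChain's implementation: `Doc`, `fake_retriever`, `fake_llm`, and `run_chain` are hypothetical stand-ins, and the template string is copied from the `rlm/rag-prompt` output shown earlier.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document."""
    page_content: str


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


TEMPLATE = (
    "You are an assistant for question-answering tasks. Use the following pieces "
    "of retrieved context to answer the question. If you don't know the answer, "
    "just say that you don't know. Use three sentences maximum and keep the "
    "answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"
)


def fake_retriever(question):
    # Stand-in for the vector store retriever: always returns the same chunks.
    return [Doc("Customer - I wear a size 50."), Doc("Salesperson - Here they are!")]


def fake_llm(prompt_text):
    # Stand-in for the model call: reports the prompt size instead of answering.
    return f"(model reply to a prompt of {len(prompt_text)} characters)"


def run_chain(question):
    # {"context": retriever | format_docs, "question": RunnablePassthrough()}
    inputs = {"context": format_docs(fake_retriever(question)), "question": question}
    # | prompt
    prompt_text = TEMPLATE.format(**inputs)
    # | llm | StrOutputParser()
    return prompt_text, fake_llm(prompt_text)


prompt_text, reply = run_chain("what is the size of the customer?")
print(prompt_text)
```

Printing the assembled prompt like this is a simple way to check which chunks the model actually saw when an answer looks wrong.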

In [38]:
rag_chain.invoke("what is the size of the customer?")

'The size of the customer is 50.'

In [39]:
rag_chain.invoke("what did the customer buy?")

'The customer bought items that were on sale and paid for them with a credit card.'