# LangChain RAG 201

Tentative techniques to explore:

1. MultiQueryRetriever
2. Contextual Compression
3. Ensemble Retrievers
4. Self-querying Retrievers
5. Time-weighted vector store retrievers
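
Of the techniques listed, ensemble retrieval is the easiest to illustrate without a model. As far as I know, LangChain's ensemble retriever merges the ranked lists of its member retrievers with weighted Reciprocal Rank Fusion. The sketch below is a dependency-free approximation of that idea, not the library's code; the function name, the document IDs, and the constant `c = 60` are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]],
    weights: Sequence[float],
    c: int = 60,  # assumed smoothing constant; 60 is a commonly used value
) -> List[str]:
    """Fuse several ranked lists of document IDs into a single ranking."""
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        # Ranks start at 1, so the top hit contributes weight / (1 + c).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results from a keyword retriever and a vector retriever.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]
print(reciprocal_rank_fusion([keyword_hits, vector_hits], weights=[0.5, 0.5]))
```

Documents that rank well in both lists (`doc_b`, `doc_a`) come out ahead of documents that appear in only one.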

In [1]:
%pip install --quiet langchain-community
%pip install --quiet tiktoken
%pip install --quiet chromadb
%pip install --quiet langchain
%pip install --quiet pysqlite3-binary
%pip install --quiet typing-inspect==0.8.0 typing_extensions==4.5.0
%pip install --quiet pydantic==1.10.8

Note: you may need to restart the kernel to use updated packages.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spacy 3.4.1 requires pydantic!=1.8,!=1.8.1,<1.10.0,>=1.7.4, but you have pydantic 1.10.8 which is incompatible.
spacy 3.4.1 requires typer<0.5.0,>=0.3.0, but you have typer 0.9.0 which is incompatible.

In [2]:
!python --version
!nvidia-smi

Python 3.9.16
Sun Feb 25 07:15:32 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.116.04   Driver Version: 525.116.04   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA RTX A6000    Off  | 00000000:00:05.0 Off |                  Off |
| 30%   36C    P8    24W / 300W |    607MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-------------------------------------------------------------------------

### Configure SQLite

In [3]:
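# Chroma needs a newer SQLite than many system Pythons bundle, so swap the
# standard-library sqlite3 module for the pysqlite3 wheel installed above.
__import__('pysqlite3')
import sys
sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')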
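
To confirm that the swap took effect, it can help to check which SQLite library the interpreter actually sees. This is a minimal sketch; the `MIN_SQLITE` threshold (3.35.0, which I believe is what Chroma asks for) and the helper name `sqlite_is_recent_enough` are assumptions for illustration.

```python
import sqlite3

# Assumed minimum version; adjust if your Chroma release documents another one.
MIN_SQLITE = (3, 35, 0)


def sqlite_is_recent_enough(
    version: str = sqlite3.sqlite_version,
    minimum: tuple = MIN_SQLITE,
) -> bool:
    """Return True if a dotted SQLite version string meets the minimum."""
    parts = tuple(int(p) for p in version.split("."))
    return parts >= minimum


print("SQLite library version:", sqlite3.sqlite_version)
print("Recent enough for Chroma:", sqlite_is_recent_enough())
```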

In [5]:
# %> curl -fsSL https://ollama.com/install.sh | sh
# %> ollama serve
# %> ollama pull gemma:7b-instruct
# %> ollama pull nomic-embed-text
# %> ollama pull mistral:instruct
# %> ollama pull mixtral:instruct

!ollama list

NAME                   	ID          	SIZE  	MODIFIED          
gemma:7b-instruct      	430ed3535049	5.2 GB	About an hour ago	
mistral:instruct       	61e88e884507	4.1 GB	34 minutes ago   	
mixtral:instruct       	7708c059a8bb	26 GB 	12 seconds ago   	
nomic-embed-text:latest	0a109f422b47	274 MB	About an hour ago	


## Experiment Configuration

In [6]:
# LLM_MODEL = "gemma:7b-instruct"
# LLM_MODEL = "mistral:instruct"
LLM_MODEL = "mixtral:instruct"
EMBEDDING_MODEL = "nomic-embed-text"
TEMPERATURE = 0.9
ENABLE_TRACING = False
# Chunk size per model family:
#   Gemma:           DOCUMENT_CHUNK_SIZE = 5000
#   Mistral/Mixtral: DOCUMENT_CHUNK_SIZE = 7500
DOCUMENT_CHUNK_SIZE = 7500
CHUNK_OVERLAP = 100

### Test LLM generation

In [7]:
from langchain_community.llms import Ollama
# from langchain.callbacks.manager import CallbackManager
# from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

llm = Ollama(
    model=LLM_MODEL,
    # callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    temperature=TEMPERATURE,
)

# invoke() replaces the deprecated llm("...") call style.
llm.invoke("Who are you?")


' I am a large language model trained by Mistral AI. I was designed to be able to generate human-like text based on the input I receive. My purpose is to provide high-quality and accurate information, while making sure that the content is safe, respectful and not harmful.'

In [8]:
# Optional: LangSmith API keys
import os
import getpass

# LangSmith expects a lowercase "true"/"false" string here.
os.environ["LANGCHAIN_TRACING_V2"] = "true" if ENABLE_TRACING else "false"
if ENABLE_TRACING:
    os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
    os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LANGCHAIN_API_KEY")

### Embedding

In [9]:
from langchain_community.embeddings import OllamaEmbeddings

embeddings_nomic = OllamaEmbeddings(model=EMBEDDING_MODEL)
text = "Embed this text"
embed = embeddings_nomic.embed_query(text)
len(embed)

768
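
The 768-dimensional vector above is what the vector store compares at query time. As a rough illustration of one common comparison, cosine similarity (the store may well be configured with a different metric), here is a dependency-free sketch. The function name and the toy 3-dimensional vectors are made up for the example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Similarity is undefined for a zero vector; treat it as "unrelated".
        return 0.0
    return dot / (norm_a * norm_b)


# Toy stand-ins for real embeddings.
query_vec = [1.0, 0.0, 1.0]
doc_vec_close = [2.0, 0.0, 2.0]  # same direction as the query
doc_vec_far = [0.0, 1.0, 0.0]    # orthogonal to the query

print(cosine_similarity(query_vec, doc_vec_close))
print(cosine_similarity(query_vec, doc_vec_far))
```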

In [10]:
from langchain_community.document_loaders import WebBaseLoader

urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
    "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
]

docs = [WebBaseLoader(url).load() for url in urls]
doc_list = [item for sublist in docs for item in sublist]

## Splitting

In [11]:
from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=DOCUMENT_CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP
)
doc_splits = text_splitter.split_documents(doc_list)
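
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window chunker. It is not how `CharacterTextSplitter` works internally (that splitter cuts on a separator and merges the pieces, so real chunks vary in size); it only shows how a window of `chunk_size` tokens advances while sharing `chunk_overlap` tokens with its predecessor. The function name is hypothetical, and whitespace-separated words stand in for tokens.

```python
from typing import List


def sliding_window_chunks(
    tokens: List[str], chunk_size: int, chunk_overlap: int
) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


words = "a b c d e f g h i j".split()
for chunk in sliding_window_chunks(words, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

Each printed window repeats the last token of the previous one, which is the overlap at work.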

In [12]:
import tiktoken

# gpt-3.5-turbo maps to the cl100k_base encoding, so one lookup is enough.
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
for d in doc_splits:
    print(f"The document is {len(encoding.encode(d.page_content))} tokens")

The document is 6562 tokens
The document is 3037 tokens
The document is 6092 tokens
The document is 1050 tokens
The document is 6933 tokens
The document is 5570 tokens


## Index

In [13]:
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

vector_store = Chroma.from_documents(
    documents=doc_splits,
    collection_name="rag-chroma",
    embedding=embeddings_nomic,
)
retriever = vector_store.as_retriever()

In [14]:
retriever.get_relevant_documents("What is task decomposition")

[Document(page_content='LLM Powered Autonomous Agents | Lil\'Log\n\nLil\'Log\n\n\nPosts\n\n\nArchive\n\n\nSearch\n\n\nTags\n\n\nFAQ\n\n\nemojisearch.app\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\n \n\n\nTable of Contents\n\nAgent System Overview\n\nComponent One: Planning\n\nTask Decomposition\n\nSelf-Reflection\n\n\nComponent Two: Memory\n\nTypes of Memory\n\nMaximum Inner Product Search (MIPS)\n\n\nComponent Three: Tool Use\n\nCase Studies\n\nScientific Discovery Agent\n\nGenerative Agents Simulation\n\nProof-of-Concept Examples\n\n\nChallenges\n\nCitation\n\nReferences\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful gene

### Build the RAG chain

In [15]:
from langchain_community.chat_models import ChatOllama
from langchain_core.prompts import ChatPromptTemplate

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

model_local = ChatOllama(model=LLM_MODEL)

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model_local
    | StrOutputParser()
)
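
In the chain above, the retriever output (a list of `Document` objects) is interpolated into `{context}` as-is, metadata and all. A common refinement is to join only the page text before it reaches the prompt. Here is a minimal sketch using a stand-in `FakeDocument` dataclass so it runs without LangChain; `format_docs` and `FakeDocument` are names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class FakeDocument:
    """Stand-in for a LangChain Document: text plus optional metadata."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def format_docs(docs: Iterable[FakeDocument]) -> str:
    """Join the text of each document, separated by blank lines."""
    return "\n\n".join(doc.page_content.strip() for doc in docs)


sample = [
    FakeDocument("  Task decomposition splits work into steps.  "),
    FakeDocument("Chain of thought prompts the model to reason step by step."),
]
print(format_docs(sample))
```

With real documents, the same function could presumably be piped after the retriever (`"context": retriever | format_docs`), since LCEL coerces plain callables into runnables.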


## Prompt Queries

In [18]:
chain.invoke("What is Task Decomposition?")

" Task decomposition is a technique used in natural language processing (NLP) to break down a complex task into smaller, more manageable subtasks. This approach allows a language model to solve a problem step-by-step, making it easier for the model to learn and generalize from the training data.\n\nIn NLP, task decomposition can be applied in various ways, such as:\n\n* Breaking down a complex question into simpler subquestions.\n* Decomposing a multi-step reasoning problem into a sequence of smaller reasoning steps.\n* Dividing a large text corpus into smaller chunks for more efficient processing.\n\nTask decomposition can help improve the performance and scalability of language models, making them more effective at solving complex tasks. It is often used in combination with other techniques, such as prompt engineering and transfer learning, to further enhance the model's abilities."