# LangChain RAG 201

Tentative techniques to explore:

1. MultiQueryRetriever
2. Contextual Compression
3. Ensemble Retrievers
4. Self-querying Retrievers
5. Time-weighted vector store retrievers

In [2]:
%pip install -q langchain-community
%pip install -q tiktoken
%pip install -q chromadb
%pip install -q langchain
%pip install -q bs4
#################################
# Required for PaperSpace Gradient
# %pip install --q pysqlite3-binary
# %pip install --q typing-inspect==0.8.0 typing_extensions==4.5.0
# %pip install --q pydantic==1.10.8

Note: you may need to restart the kernel to use updated packages.


In [3]:
!python --version
!nvidia-smi

Python 3.12.2
zsh:1: command not found: nvidia-smi


### Configure Sqlite

In [4]:
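# Chroma needs a recent SQLite. Where pysqlite3-binary is installed (e.g. PaperSpace
# Gradient), swap it in for the standard-library sqlite3; otherwise keep the default.
import sys

try:
    __import__("pysqlite3")
    sys.modules["sqlite3"] = sys.modules.pop("pysqlite3")
except ModuleNotFoundError:
    pass  # pysqlite3 is not installed on this machine; use the stdlib sqlite3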

In [5]:
# %> curl -fsSL https://ollama.com/install.sh | sh
# %> ollama serve
# %> ollama pull gemma:7b-instruct
# %> ollama pull nomic-embed-text
# %> ollama pull mistral:instruct
# %> ollama pull mixtral:instruct

!ollama list

NAME                   	ID          	SIZE  	MODIFIED     
mixtral:instruct       	7708c059a8bb	26 GB 	12 hours ago	
nomic-embed-text:latest	0a109f422b47	274 MB	12 hours ago	


## Experiment Configuration

In [6]:
# LLM_MODEL = "gemma:7b-instruct"
# LLM_MODEL = "mistral:instruct"
LLM_MODEL = "mixtral:instruct"
EMBEDDING_MODEL = "nomic-embed-text"
TEMPERATURE = 0.9
ENABLE_TRACING = False
### Gemma
# DOCUMENT_CHUNK_SIZE=5000
###
### Mistral/Mixtral
DOCUMENT_CHUNK_SIZE = 7500
###
CHUNK_OVERLAP = 100

### Test LLM generation

In [7]:
from langchain_community.llms import Ollama
# from langchain.callbacks.manager import CallbackManager
# from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

llm = Ollama(
    model=LLM_MODEL,
    # callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    temperature=TEMPERATURE,
)

# Calling the LLM object directly is deprecated; use invoke() instead.
llm.invoke("Who are you?")


" I am a large language model trained by Mistral AI. I was designed to be able to assist with a wide range of tasks, from generating text on a given topic to answering questions and providing explanations. I don't have the ability to access personal data or external information, so my responses are based on the data I was trained on. I am intended to be a helpful and respectful conversational partner."

In [8]:
# Optional: LangSmith API keys
import os
import getpass

# Lowercase the flag: str(True) is "True", while the tracing switch is the string "true".
os.environ["LANGCHAIN_TRACING_V2"] = str(ENABLE_TRACING).lower()
if ENABLE_TRACING:
    os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
    os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LANGCHAIN_API_KEY")

### Embedding

In [9]:
from langchain_community.embeddings import OllamaEmbeddings

embeddings_nomic = OllamaEmbeddings(model=EMBEDDING_MODEL)
text = "Embed this text"
embed = embeddings_nomic.embed_query(text)
len(embed)

768

In [10]:
from langchain_community.document_loaders import WebBaseLoader

urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
    "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
]

docs = [WebBaseLoader(url).load() for url in urls]
doc_list = [item for sublist in docs for item in sublist]

## Splitting

In [11]:
from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=DOCUMENT_CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP
)
doc_splits = text_splitter.split_documents(doc_list)
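
To see how `chunk_size` and `chunk_overlap` interact, here is a minimal sketch of a fixed-size sliding window over a token list. It is an illustration only: `CharacterTextSplitter` first splits on a separator and then merges the pieces up to the size limit, so its chunk boundaries will differ. The helper name `sliding_window_chunks` and the toy token list are hypothetical.

```python
def sliding_window_chunks(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of at most chunk_size tokens,
    where consecutive windows share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


toy_tokens = list(range(10))  # stand-in for token ids
toy_chunks = sliding_window_chunks(toy_tokens, chunk_size=4, chunk_overlap=1)
print(toy_chunks)
```

Each window starts `chunk_size - chunk_overlap` tokens after the previous one, so a larger overlap means more chunks and more repeated text in the index.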

In [12]:
import tiktoken

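# gpt-3.5-turbo resolves to the cl100k_base encoding, so one lookup is enough.
# These counts only approximate the tokenizer of the local Ollama model.
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
for d in doc_splits:
    print("The document is %s tokens" % len(encoding.encode(d.page_content)))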

The document is 6562 tokens
The document is 3037 tokens
The document is 6092 tokens
The document is 1050 tokens
The document is 6933 tokens
The document is 5570 tokens


## Index

In [13]:
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

vector_store = Chroma.from_documents(
    documents=doc_splits,
    collection_name="rag-chroma",
    embedding=embeddings_nomic,
)
retriever = vector_store.as_retriever(search_kwargs={"k":3})
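
Conceptually, a retriever with `k=3` embeds the query, scores it against every stored chunk embedding, and returns the three best matches. The sketch below shows that idea with cosine similarity over toy 3-dimensional vectors. It assumes nothing about Chroma's internals, whose index and distance metric may differ; `cosine_similarity`, `top_k`, and the toy data are hypothetical names used only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k):
    """Return the k (doc_id, score) pairs with the highest similarity."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in indexed.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings"; real nomic-embed-text vectors have 768 dimensions.
toy_index = {
    "agents": [1.0, 0.0, 0.0],
    "prompting": [0.0, 1.0, 0.0],
    "attacks": [0.0, 0.0, 1.0],
    "planning": [0.9, 0.1, 0.0],
}
toy_query = [1.0, 0.2, 0.0]
top_hits = top_k(toy_query, toy_index, k=3)
print([doc_id for doc_id, _ in top_hits])
```

With only six chunks in the store, `k=3` returns half of the collection, which is worth keeping in mind when judging retrieval quality in the cells that follow.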

In [14]:
retriever.invoke("What is task decomposition")

[Document(page_content='LLM Powered Autonomous Agents | Lil\'Log\n\nLil\'Log\n\n\nPosts\n\n\nArchive\n\n\nSearch\n\n\nTags\n\n\nFAQ\n\n\nemojisearch.app\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\n \n\n\nTable of Contents\n\nAgent System Overview\n\nComponent One: Planning\n\nTask Decomposition\n\nSelf-Reflection\n\n\nComponent Two: Memory\n\nTypes of Memory\n\nMaximum Inner Product Search (MIPS)\n\n\nComponent Three: Tool Use\n\nCase Studies\n\nScientific Discovery Agent\n\nGenerative Agents Simulation\n\nProof-of-Concept Examples\n\n\nChallenges\n\nCitation\n\nReferences\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful gene

### Build the RAG chain

In [15]:
from langchain_community.chat_models import ChatOllama
from langchain_core.prompts import ChatPromptTemplate

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

model_local = ChatOllama(model=LLM_MODEL)

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model_local
    | StrOutputParser()
)
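
The chain passes the retriever's output straight into `{context}`, so the prompt receives the string form of a list of `Document` objects. A common variant joins only the page contents first. The sketch below shows that step in plain Python, using a stand-in `Doc` dataclass (hypothetical, so the example runs without LangChain) and a hypothetical `format_docs` helper. In the chain it could be applied as `retriever | format_docs`, assuming the installed LangChain version coerces plain functions in a pipe.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for a LangChain Document; only page_content is modeled."""
    page_content: str


TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""


def format_docs(docs):
    """Join chunk texts with blank lines so the prompt holds text, not object reprs."""
    return "\n\n".join(doc.page_content for doc in docs)


retrieved = [Doc("Chunk about planning."), Doc("Chunk about memory.")]
filled_prompt = TEMPLATE.format(
    context=format_docs(retrieved),
    question="What is Task Decomposition?",
)
print(filled_prompt)
```

Keeping the context as plain text avoids spending prompt tokens on metadata and escape sequences, which matters with chunks as large as the ones configured here.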


## Prompt Queries

In [16]:
chain.invoke("What is Task Decomposition?")

" Task decomposition is the process of breaking down a complex task into smaller, more manageable subtasks. This approach can help simplify problem-solving and make it easier to understand and tackle large problems by dividing them into more manageable pieces. By decomposing tasks, you can also identify and address any dependencies or constraints between subtasks, ensuring that the overall task is completed efficiently and effectively.\n\nIn the context of LLM-powered autonomous agents, task decomposition involves breaking down high-level goals or objectives into a series of smaller, well-defined actions that the agent can execute to achieve its desired outcome. This process helps ensure that the agent's problem-solving capabilities are focused and effective, enabling it to navigate complex environments and complete tasks with greater autonomy and efficiency.\n\nTask decomposition is closely related to the concept of hierarchical task networks (HTNs), which involve organizing tasks int