# Privacy-friendly Retrieval Augmented Generation over code
The purpose of this notebook is to teach a brand-new Python programmer to customize this notebook's code to build and execute their very own RAG script.

Together we will:
1. answer questions about code using a large language model, and 
2. learn how to do so in a privacy-respecting way, so that everything runs on *your* machine. 

## What's RAG?
Retrieval Augmented Generation, or RAG, is a powerful tool in the field of artificial intelligence that combines the best of two worlds: information retrieval and text generation. 

In simple terms, imagine you're writing an essay on a topic you're not very familiar with. You would first search for relevant information on the internet (information retrieval), then you would use that information to write your essay (text generation). RAG does something similar, but in an automated way.

When asked a question, RAG first retrieves relevant documents from a large database (like how you would search the internet). This is the "Retrieval" part. Then, it uses the information from these documents to generate a detailed, coherent response, which is the "Augmented Generation" part.
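
The two steps can be sketched in a few lines of plain Python. This is a toy illustration, not how the rest of this notebook does it: the `toy_documents` list, the word-overlap scoring in `retrieve`, and the `build_prompt` helper are all hypothetical stand-ins for a vector database and an LLM call.

```python
import re

# Toy stand-in for a document database (hypothetical content)
toy_documents = [
    "Chroma is a vector database that stores embeddings.",
    "GitPython lets you clone a repository from Python.",
    "A text splitter breaks long files into smaller chunks.",
]


def words(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, documents, k=1):
    """Retrieval: return the k documents sharing the most words with the question."""
    question_words = words(question)
    return sorted(
        documents,
        key=lambda document: len(question_words & words(document)),
        reverse=True,
    )[:k]


def build_prompt(question, context_documents):
    """Augmentation: put the retrieved text in front of the question."""
    context = "\n".join(context_documents)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


question = "How do I clone a repository?"
context_documents = retrieve(question, toy_documents)
prompt = build_prompt(question, context_documents)
print(prompt)  # in a real system, this prompt is what gets sent to the LLM
```

Real systems replace the word-overlap score with embedding similarity, which is what the rest of this notebook sets up.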

### Why use RAG?
The beauty of RAG is that it doesn't just blindly copy the information it retrieves. Instead, the language model reads the retrieved text alongside your question and writes a response grounded in that text. This makes RAG useful in a variety of applications, such as answering complex questions, writing detailed summaries, or creating content on specific topics, including material the model never saw during training, like your own code.

RAG is a sophisticated AI tool that retrieves relevant information to accurately and helpfully generate content, much like a well-informed and articulate writer.

## Setup & Installation
Begin by installing the required libraries:

In [1]:
%pip install -qU langchain chromadb sentence-transformers langchainhub GitPython langchain-openai

Note: you may need to restart the kernel to use updated packages.


## LangSmith
If you have access to LangSmith, it becomes extremely helpful as projects grow more complex, because it lets you trace each step taken during execution.

This is especially true for us since we're building custom tools, creating an agent using Hugging Face's Model Hub, and tying everything together inside a runnable chain.

It becomes even more true as you add memory, vectorstore searching and so on.

In [2]:
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = ""
os.environ["LANGCHAIN_PROJECT"] = "rag-over-code"

## Clone a Repo
Hopefully you already have a GitHub repository you'd like to interrogate. 

Otherwise, you can use the LangChain repo as a default. Plus, once we've learned how to query our files, you can ask clarifying questions about LangChain code you used here today.

In [3]:
import os
from git import Repo

# Directory to clone the repository into.
# `./` creates a sub-directory named langchain-library in the current working directory.
repo_path = "./langchain-library"

# Clone the repo if the sub-directory above doesn't exist
if not os.path.exists(repo_path):
    repo = Repo.clone_from("https://github.com/langchain-ai/langchain", to_path=repo_path)

## Load Files

In [4]:
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import LanguageParser
from langchain.text_splitter import Language

# Use `GenericLoader` to load Python files from the cloned repository
load_langchain_api_docs = GenericLoader.from_filesystem(
    repo_path + "/libs/langchain/langchain",
    glob="**/*",
    suffixes=[".py"], # Specify a list of file types
    parser=LanguageParser(language=Language.PYTHON, parser_threshold=500),
)

# Conventionally this line reads `data = loader.load()`;
# here our `GenericLoader` has a more descriptive name than `loader`.
documents = load_langchain_api_docs.load()

# Note: If you're using PyPDFLoader then it will split by page for you already
print(f'\nYou have {len(documents)} document(s) in your data')
print(f'\nThere are {len(documents[0].page_content)} characters in your sample document')
print(f'\nHere is a sample: \n\n```\n\n{documents[0].page_content[:200]}\n\n```')


You have 1563 document(s) in your data

There are 217 characters in your sample document

Here is a sample: 

```

"""Deprecated module for BaseLanguageModel class, kept for backwards compatibility."""
from __future__ import annotations

from langchain_core.language_models import BaseLanguageModel

__all__ = ["Bas

```


## Split Documents into Chunks
To make effective use of our loaded documents (files) we need to split them into manageable chunks.

Generally speaking, smaller chunks tend to give more accurate results, but may take longer to process.

### Go Deeper

#### Accuracy with Smaller Chunks
* **Increased Focus**: Smaller chunks of text or queries allow the system to focus on a more specific set of information. This specificity can lead to more accurate and relevant results because the system is not overwhelmed by too much or too broad information.
* **Contextual Relevance**: With a narrower focus, the system is more likely to retrieve information that is contextually relevant to a specific query, which improves the accuracy of the response.

#### Processing Time
* **Multiple Queries**: Smaller chunks might require multiple queries to cover a topic comprehensively. Each query involves a separate retrieval process, which cumulatively can take more time.
* **Trade-off Between Depth and Breadth**: While smaller queries allow for a depth in a specific area, they might necessitate multiple rounds of retrieval to get a broad understanding, thus increasing overall processing time.

#### System Limitations and Efficiency
* **Computational Load**: Smaller chunks mean more frequent calls to the retrieval system. Depending on the efficiency of the system, this can either slow down the process due to computational load or, if the system is highly efficient, might not significantly impact the processing time.
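
To get a feel for the trade-off, the sketch below splits the same text at two chunk sizes and counts the resulting chunks. It uses a simplified fixed-width splitter (the hypothetical `split_fixed` helper), not `RecursiveCharacterTextSplitter`, which also tries to break on natural separators such as newlines.

```python
def split_fixed(text, chunk_size):
    """Cut text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


# 6,000 characters of stand-in "source code" (each repeated line is 6 characters)
sample_text = "x = 1\n" * 1000

large_chunks = split_fixed(sample_text, chunk_size=2048)
small_chunks = split_fixed(sample_text, chunk_size=256)

print(len(large_chunks), "chunks of up to 2048 characters")
print(len(small_chunks), "chunks of up to 256 characters")
```

Here the smaller chunk size yields 24 chunks instead of 3. Each chunk needs its own embedding, so smaller chunks mean more vectors to compute, store, and search.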

In [5]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Set `text_splitter` as `RecursiveCharacterTextSplitter`
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=0)

In [6]:
# Split the documents
texts = text_splitter.split_documents(documents)

In [7]:
# Take a look at your split texts
for text in texts:
    print(text)

page_content='"""Deprecated module for BaseLanguageModel class, kept for backwards compatibility."""\nfrom __future__ import annotations\n\nfrom langchain_core.language_models import BaseLanguageModel\n\n__all__ = ["BaseLanguageModel"]' metadata={'source': 'langchain-library\\libs\\langchain\\langchain\\base_language.py', 'language': <Language.PYTHON: 'python'>}
page_content='from langchain_community.cache import (\n    AstraDBCache,\n    AstraDBSemanticCache,\n    CassandraCache,\n    CassandraSemanticCache,\n    FullLLMCache,\n    FullMd5LLMCache,\n    GPTCache,\n    InMemoryCache,\n    MomentoCache,\n    RedisCache,\n    RedisSemanticCache,\n    SQLAlchemyCache,\n    SQLAlchemyMd5Cache,\n    SQLiteCache,\n    UpstashRedisCache,\n)\n\n__all__ = [\n    "InMemoryCache",\n    "FullLLMCache",\n    "SQLAlchemyCache",\n    "SQLiteCache",\n    "UpstashRedisCache",\n    "RedisCache",\n    "RedisSemanticCache",\n    "GPTCache",\n    "MomentoCache",\n    "CassandraCache",\n    "CassandraSemant

## `SentenceTransformerEmbeddings` & `Chroma` to power Retrieval

To keep our activity private, we'll use a free, open-source embeddings model hosted on [Hugging Face](https://huggingface.co/). The model is downloaded once and then runs on your machine.

`all-MiniLM-L6-v2` is a solid default choice due to its compact size, efficiency, and effectiveness in generating meaningful sentence embeddings. You may find more models [here](https://huggingface.co/models?pipeline_tag=sentence-similarity).
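
Retrieval works by comparing embedding vectors, so it may help to see one such comparison written out. The sketch below computes cosine similarity by hand on made-up three-dimensional vectors. Real `all-MiniLM-L6-v2` embeddings have 384 dimensions, the numbers here are purely illustrative, and cosine similarity is only one common measure; a vector store may be configured to use a different distance.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for the embeddings of three pieces of text
query_vector = [1.0, 0.0, 1.0]
similar_vector = [0.9, 0.1, 0.8]
unrelated_vector = [0.0, 1.0, 0.0]

print(cosine_similarity(query_vector, similar_vector))    # close to 1
print(cosine_similarity(query_vector, unrelated_vector))  # 0.0
```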

In [8]:
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings

embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")



### Create the Chroma Database
Now we need to create our database. A single call passes in our documents, sets the embeddings model, and chooses a directory where the database is saved to persistent storage.

In [9]:
from langchain.vectorstores import Chroma 

# load it into Chroma
db = Chroma.from_documents(
    texts, 
    embedding_function, 
    persist_directory="./chroma_db"
    )

### Create the Retriever
We'll turn our new `db` into a retriever with `as_retriever()` from the LangChain library.

I encourage you to read [this](https://api.python.langchain.com/en/stable/vectorstores/langchain_community.vectorstores.chroma.Chroma.html?highlight=chroma%20as_retriever#langchain_community.vectorstores.chroma.Chroma.as_retriever "Learn more about 'as_retriever'") Python API documentation from LangChain.

In [10]:
# Create the retriever
retriever = db.as_retriever(
    search_type="mmr", # or "similarity"
    search_kwargs={"k": 32,}
    )
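
`search_type="mmr"` stands for maximal marginal relevance, which balances relevance to the query against redundancy among the results already chosen. Below is a minimal sketch of that idea on toy two-dimensional vectors. The `mmr_select` function, its `lambda_mult` default, and the vectors are assumptions for illustration, not LangChain's or Chroma's actual implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query, candidates, k, lambda_mult=0.5):
    """Greedily pick k candidate indices, balancing relevance and diversity."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def mmr_score(i):
            relevance = cosine_similarity(query, candidates[i])
            # Redundancy: similarity to the closest already-selected candidate
            redundancy = max(
                (cosine_similarity(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.2]
candidates = [
    [1.0, 0.1],  # index 0: most relevant
    [1.0, 0.0],  # index 1: near-duplicate of index 0
    [0.6, 0.8],  # index 2: less relevant, but different
]
print(mmr_select(query, candidates, k=2))
```

This prints `[0, 2]`: after picking the most relevant vector, the near-duplicate at index 1 is penalized for redundancy and the more distinct vector at index 2 wins. With `lambda_mult=1.0` the redundancy term vanishes and the result is the plain top-2 by similarity, `[0, 1]`.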

Let's test our retriever.

In [None]:
# test the retriever
retrieved_docs = retriever.invoke(
    "Jina Embeddings"
)

# Print the first retrieved doc
print(retrieved_docs[0].page_content)

Great! We've successfully used the retriever to get relevant context from our database!

Now let's create a chain to generate answers based on what context we retrieve.

We'll start by calling another open-source model from the Hugging Face model hub. 

LangChain provides a wrapper, `HuggingFaceHub`, to quickly initialize models for text generation or text2text generation.
- Note: You will need a Hugging Face access token, which you can get [here](https://huggingface.co/settings/tokens).

In [None]:
from langchain_community.llms import HuggingFaceHub

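# Define the LLM that will generate answers
llm = HuggingFaceHub(
    repo_id="google/flan-t5-base",
    model_kwargs={"temperature": 0.7, "max_length": 500},
    huggingfacehub_api_token="",  # paste your Hugging Face access token here
    task="text2text-generation",  # FLAN-T5 is a text2text (encoder-decoder) model
)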

#### If the HuggingFace API isn't available, use the following cell to load GPT-3.5 as a drop-in replacement

In [None]:
# Fall back to OpenAI if the Hugging Face endpoint times out
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-3.5-turbo-1106", temperature=0.7, openai_api_key="")

In [12]:
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationSummaryMemory

# Initialize conversation memory
memory = ConversationSummaryMemory(
    llm=llm,
    memory_key="chat_history",
    return_messages=True
)

# Initialize the ConversationalRetrievalChain
qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=memory
    )

In [None]:
question = "How can I initialize a ReAct agent?"
result = qa(question)
result["answer"]

In [14]:
questions = [
    "What is the class hierarchy?",
    "What classes are derived from the Chain class?",
    "What one improvement do you propose in code in relation to the class hierarchy for the Chain class?",
]

# Iterate over the list of questions and print the answers
for question in questions:
    result = qa(question)
    print(f"-> **Question**: {question} \n")
    print(f"**Answer**: {result['answer']} \n")

-> **Question**: What is the class hierarchy? 

**Answer**: The class hierarchy includes the following structure:

- Chain --> <name>Chain  # Examples: LLMChain, MapReduceChain, RouterChain

This structure demonstrates that Chains are easily reusable components linked together, allowing for the encoding of a sequence of calls to components like models, document retrievers, and other Chains. 

-> **Question**: What classes are derived from the Chain class? 

**Answer**: The following classes are derived from the `Chain` class:
1. `RouterChain`
2. `MultiRouteChain`
3. `SequentialChain`
4. `TransformChain`
5. `LLMRouterChain`
6. `LLMMathChain`

These are all part of the `langchain.chains` module and are used for various purposes such as routing, transformation, and language model-based operations. 

-> **Question**: What one improvement do you propose in code in relation to the class hierarchy for the Chain class? 

**Answer**: The class hierarchy for the `Chain` class seems to be consist

## ChatPromptTemplate

Since our LLM is a chat model, we can make use of all 'Chat' related functionality.

In [15]:
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import LLMChain
from langchain.memory import ConversationSummaryMemory
from langchain.prompts import PromptTemplate

In [16]:
from langchain.chains.question_answering import load_qa_chain

# Create your Prompt Template by using `PromptTemplate`
template = """Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Use three sentences maximum and keep the answer as concise as possible. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)


# Alternatively, load a premade prompt from the LangChain Hub:
# uncomment the two lines below and comment out the template above.

#from langchain import hub

#QA_CHAIN_PROMPT = hub.pull("rlm/rag-prompt-default")

In [17]:
# Docs
question = "How can I initialize a ReAct agent?"
docs = retriever.get_relevant_documents(question)

# Chain
chain = load_qa_chain(llm, chain_type="stuff", prompt=QA_CHAIN_PROMPT)

# Run
chain({"input_documents": docs, "question": question}, return_only_outputs=True)

{'output_text': 'You can initialize a ReAct agent using the `create_react_agent` method, which takes a language model (LLM) and a sequence of tools as input and returns an agent executor. The agent executor is used to load an agent executor given tools and LLM.'}
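
Conceptually, `chain_type="stuff"` places all retrieved documents into the prompt's `{context}` slot and makes a single LLM call. The sketch below mimics that with plain string formatting. The two `toy_docs` strings and the blank-line separator are assumptions for illustration, the template is shortened, and no LLM is called.

```python
# Shortened version of the notebook's template, with the same two placeholders
template = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Helpful Answer:"""

# Hypothetical stand-ins for the page_content of retrieved documents
toy_docs = [
    "First retrieved chunk of code.",
    "Second retrieved chunk of code.",
]

# "Stuff" every document into the single {context} slot
context = "\n\n".join(toy_docs)
filled_prompt = template.format(
    context=context,
    question="How can I initialize a ReAct agent?",
)
print(filled_prompt)
```

Because everything goes into one prompt, the prompt grows with both the number of retrieved chunks and the chunk size. With `k=32` and chunks of up to 2048 characters, that is up to 65,536 characters of context, which may exceed the input limit of smaller models.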

## Wrapping Up
In this notebook, you've learned how to use the LangChain library to create a question-answering system.

You've seen how to load a question-answering chain and how to create a prompt template for the system. You've also learned why it matters to keep answers concise and to have the system say so when it doesn't know the answer rather than make one up. Furthermore, you've explored an alternative way of pulling a default prompt from the LangChain Hub.

While we didn't apply an output parser in this notebook, you've gained a solid foundation in setting up a question-answering system with LangChain.