# Langchain Retrieval Augmented Generation

This notebook introduces Retrieval Augmented Generation (RAG) with Langchain. 
Made by Csaba Hegedűs, BME-TMIT. 

## Chapter 0 Setup

Same as the Langchain intro. 

### Python packages 
Installing prerequisites: the Langchain libraries and the Chroma vector store integration.

In [4]:
%pip install --quiet langchain langchain-community langchain-openai langchain_chroma 

Note: you may need to restart the kernel to use updated packages.


### Configure LLM

Always run this before trying out anything else. 

You can use OpenAI or AzureOpenAI. 

In [18]:
AZURE_OPENAI_ENDPOINT = ""
AZURE_OPENAI_API_KEY = ""
AZURE_OPENAI_API_VERSION = "2024-05-01-preview"
AZURE_OPENAI_DEPLOYMENT_NAME = "gpt4o"
AZURE_OPENAI_EMBEDDING_MODEL = "text-embedding-3-large"

from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings

llm = AzureChatOpenAI(
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    api_key=AZURE_OPENAI_API_KEY,
    api_version=AZURE_OPENAI_API_VERSION,
    deployment_name=AZURE_OPENAI_DEPLOYMENT_NAME,
)

embedder = AzureOpenAIEmbeddings(
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    api_key=AZURE_OPENAI_API_KEY,
    api_version=AZURE_OPENAI_API_VERSION,
    model=AZURE_OPENAI_EMBEDDING_MODEL,
)

ALTERNATIVE: Using OpenAI as LLM

In [22]:
OPENAI_API_KEY = ""

from langchain_openai import ChatOpenAI, OpenAIEmbeddings

llm = ChatOpenAI(api_key=OPENAI_API_KEY, model="gpt-4o")
embedder = OpenAIEmbeddings(api_key=OPENAI_API_KEY)
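
The cells above leave the keys as empty strings to be filled in by hand. A minimal sketch of one alternative, assuming the credentials are exported as environment variables before the kernel starts. The helper `read_setting` is made up for this illustration and is not part of Langchain; the variable names simply mirror the constants used above.

```python
import os

def read_setting(name, default=""):
    """Return the environment variable `name`, or `default` if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    return value if value else default

# Hypothetical usage: environment variables named like the constants in the cells above.
AZURE_OPENAI_ENDPOINT = read_setting("AZURE_OPENAI_ENDPOINT")
AZURE_OPENAI_API_KEY = read_setting("AZURE_OPENAI_API_KEY")
AZURE_OPENAI_API_VERSION = read_setting("AZURE_OPENAI_API_VERSION", "2024-05-01-preview")
OPENAI_API_KEY = read_setting("OPENAI_API_KEY")
```

This keeps secrets out of the notebook file itself, which matters if the notebook is shared or committed.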

## Chapter 1 Processing documents 

Document load, split, store (embed). 

First, the document needs to be loaded into a Document object.
Follow this tutorial if you need additional help: https://python.langchain.com/v0.2/docs/tutorials/rag/

There are many types of loaders in Langchain: https://python.langchain.com/v0.2/docs/integrations/document_loaders/

How to load PDFs specifically: https://python.langchain.com/v0.2/docs/how_to/document_loader_pdf/

I have used the Unstructured library, because it has built-in OCR and supports multi-modality and many file types. It has a Langchain integration: https://python.langchain.com/v0.2/docs/integrations/providers/unstructured/

However, for an introductory demo, a simpler loader is sufficient, so we will use the PyPDF loader: https://python.langchain.com/v0.2/docs/how_to/document_loader_pdf/#using-pypdf

In [2]:
%pip install -q pypdf 

Note: you may need to restart the kernel to use updated packages.


In [3]:
from langchain_community.document_loaders import PyPDFLoader
from pprint import pprint
file_path = "./copilotRC.pdf"
loader = PyPDFLoader(file_path)
pages = loader.load_and_split()
pprint(pages)
print(len(pages))

[Document(page_content='Co-pilots for Arrowhead-based\nCyber-Physical System of Systems Engineering\nCsaba Heged ˝us, P ´al Varga\nDepartment of Telecommunications and Artificial Intelligence\nBudapest University of Technology and Economics\nM˝uegyetem rkp. 3., H-1111 Budapest, Hungary.\nEmail: {hegeduscs, pvarga }@tmit.bme.hu\nAbstract —One benefit of Large Language Model (LLM) based\napplications (e.g. chat assistants or co-pilots) is that they can\nbring humans closer to the loop in various IT and OT solutions.\nCo-pilots can achieve many things at once, i.e. provide a context-\naware natural language interface to knowledge bases, reach\nvarious systems (via APIs), or even help solving multi-step\nproblems with their planning and reasoning abilities. However,\nmaking production-grade chat assistants is a topical challenge,\nas fast-evolving LLMs expose new types of application design\nand security issues that need tackling. These especially rise\nto power when we try to apply these 

The next step is to split the large Documents into smaller chunks that can later be injected into prompts. 

It's worth noting that GPT-4o currently supports a context window of roughly 120K tokens. This will be filled with:

* system prompt
* chat history
* user query
* context injected by RAG pipeline

We usually inject a couple of relevant chunks, let's say 3, so chunks of around 10k tokens each fit comfortably. With earlier models (GPT-4-32K or GPT-3.5), the chunk size had to be much smaller. 
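
The budget arithmetic behind that estimate can be sketched as follows. All the reserve sizes below are assumptions picked for illustration (the notebook does not specify them); only the rough window size and the three chunks come from the text above.

```python
def max_chunk_tokens(context_window, system_prompt, chat_history, user_query,
                     answer_reserve, n_chunks):
    """Upper bound on tokens per retrieved chunk after reserving the other prompt parts."""
    remaining = context_window - (system_prompt + chat_history + user_query + answer_reserve)
    if remaining <= 0 or n_chunks <= 0:
        return 0
    return remaining // n_chunks

budget = max_chunk_tokens(
    context_window=120_000,  # rough figure quoted above
    system_prompt=1_000,     # assumed
    chat_history=20_000,     # assumed
    user_query=500,          # assumed
    answer_reserve=4_000,    # assumed room for the model's reply
    n_chunks=3,
)
print(budget)
```

With these assumed reserves the ceiling is 31,500 tokens per chunk, so chunks of around 10k tokens leave a wide margin for longer chat histories.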

In [14]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# chunk_size is measured in characters (not tokens); I am keeping it small (1K) so we can see what's happening.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(pages)

pprint(splits)
print(len(splits))

[Document(page_content='Co-pilots for Arrowhead-based\nCyber-Physical System of Systems Engineering\nCsaba Heged ˝us, P ´al Varga\nDepartment of Telecommunications and Artificial Intelligence\nBudapest University of Technology and Economics\nM˝uegyetem rkp. 3., H-1111 Budapest, Hungary.\nEmail: {hegeduscs, pvarga }@tmit.bme.hu\nAbstract —One benefit of Large Language Model (LLM) based\napplications (e.g. chat assistants or co-pilots) is that they can\nbring humans closer to the loop in various IT and OT solutions.\nCo-pilots can achieve many things at once, i.e. provide a context-\naware natural language interface to knowledge bases, reach\nvarious systems (via APIs), or even help solving multi-step\nproblems with their planning and reasoning abilities. However,\nmaking production-grade chat assistants is a topical challenge,\nas fast-evolving LLMs expose new types of application design\nand security issues that need tackling. These especially rise\nto power when we try to apply these 
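
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-window splitter in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that one prefers to break on separators such as paragraphs and newlines); the function name `split_with_overlap` is made up for this illustration.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
print(chunks)
```

Each chunk repeats the last 4 characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk when the overlap is large enough.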

Now we need to build a knowledge base using a vector database. We'll use a simple in-memory vector DB (Chroma). In other projects, we use Postgres as the vector DB with a plugin. 
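
Conceptually, a vector store keeps one embedding per chunk and returns the chunks whose embeddings are closest to the query embedding. A toy sketch of that idea with cosine similarity follows. The 3-dimensional vectors are invented for the example (real embeddings come from the `embedder` and have far more dimensions), and a real vector database would normally use an index instead of scanning every entry.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k):
    """Return the k texts in store (a list of (text, vector) pairs) most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Made-up chunks and embeddings, for illustration only.
toy_store = [
    ("chunk about design copilot", [0.9, 0.1, 0.0]),
    ("chunk about management copilot", [0.1, 0.9, 0.0]),
    ("chunk about references", [0.0, 0.1, 0.9]),
]
print(top_k([1.0, 0.2, 0.0], toy_store, k=2))
```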

Further reading: https://python.langchain.com/v0.2/docs/how_to/vectorstores/

## Chapter 2 Retrieval and generation

Read: https://python.langchain.com/v0.2/docs/tutorials/rag/

In [26]:
from langchain_chroma import Chroma

vectorstore = Chroma.from_documents(documents=splits, embedding=embedder)
retriever = vectorstore.as_retriever()

pprint(retriever.invoke("What is arrowhead design copilot?"))



[Document(page_content='Fig. 3. The graphical overview of the Arrowhead Engineering Process (AEP) [24] to be supported by Co-pilots\nthe ecosystem, answering design and integration-related\nquestions. This can be embedded in the Arrowhead\nFramework Wiki [9] as an inline chatbot. Intended users\nare anyone who visits the Wiki.\n2) The Arrowhead Management Copilot interacting with\nvarious Arrowhead Core Systems of a Local Cloud de-\nployment to analyze and understand, potentially manage\nthe CPSoS via the Arrowhead governing middleware.\nThis tool can be embedded as a widget to the Arrowhead\nManagement Tool GUI. Intended users are the authen-\nticated Local Cloud (SoS) operators.\n3)Arrowhead Design Copilot which can integrate with the\nengineering toolchain to design SoS deployment and\nunderlying industrial automation processes and infras-\ntructure (i.e. SysML modeling with Eclipse Papyrus).\nThis co-pilot can be integrated into the Arrowhead\nEngineering Toolchain, interacting wit

In [None]:
#IN CASE YOU NEED TO DELETE THE VECTORSTORE
#vectorstore.delete_collection()

Creating the system prompt for retrieval

In [32]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system","""Use the following pieces of context to answer the question at the end.
        If you don't know the answer, just say that you don't know, don't try to make up an answer.
        Use three sentences maximum and keep the answer as concise as possible.
        Always say "thanks for asking!" at the end of the answer.

        {context}

        Question: {question}

        Helpful Answer:"""
    )
])
prompt.pretty_print()



Use the following pieces of context to answer the question at the end.
        If you don't know the answer, just say that you don't know, don't try to make up an answer.
        Use three sentences maximum and keep the answer as concise as possible.
        Always say "thanks for asking!" at the end of the answer.

        {context}

        Question: {question}

        Helpful Answer:
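
What the template does at run time is essentially placeholder substitution. A rough equivalent using Python's `str.format`, with a shortened template and made-up context text; the actual `ChatPromptTemplate` additionally wraps the result in chat message objects.

```python
template = (
    "Use the following pieces of context to answer the question at the end.\n\n"
    "{context}\n\n"
    "Question: {question}\n\n"
    "Helpful Answer:"
)

filled = template.format(
    context="(retrieved chunk 1)\n\n(retrieved chunk 2)",  # placeholder text, not real retrieval output
    question="What is Arrowhead Design Copilot?",
)
print(filled)
```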


In [46]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    concatenated_text = "\n\n".join(doc.page_content for doc in docs)
    return concatenated_text

rag_chain = (
    # creates a dictionary where the context value is filled by the retriever and then formatted by format_docs,
    # and the question is passed through unchanged by RunnablePassthrough
    # the dictionary is coerced into a RunnableParallel: its entries run in parallel, the piped steps run in sequence
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    # prompt expects a dictionary with context and question
    | prompt
    | llm
    | StrOutputParser()
)

In [47]:
rag_chain.invoke("What is Arrowhead Design Copilot?")

'The Arrowhead Design Copilot is a tool that integrates with the engineering toolchain to assist in designing Systems of Systems (SoS) deployment and underlying industrial automation processes. It can interact with design tools like SysML modeling with Eclipse Papyrus. The intended users are SoS engineers. Thanks for asking!'
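
The data flow of the chain can also be written out as ordinary function calls, which may help when reading the pipe syntax. In this sketch `fake_retrieve` and `fake_llm` are stand-ins invented for the example (no vector store or model is called), the prompt is shortened, and documents are plain strings instead of `Document` objects.

```python
def format_docs(docs):
    """Join the retrieved chunks (plain strings here) with blank lines."""
    return "\n\n".join(docs)

def fake_retrieve(question):
    # Stand-in for the retriever: always returns the same two chunks.
    return ["Chunk A about the Design Copilot.", "Chunk B about the toolchain."]

def fake_llm(prompt_text):
    # Stand-in for the LLM: only reports how much prompt text it received.
    return f"(answer based on {len(prompt_text)} characters of prompt)"

def rag_answer(question, retrieve, generate):
    # Corresponds to: retriever | format_docs
    context = format_docs(retrieve(question))
    # Corresponds to: prompt (shortened template)
    prompt_text = f"{context}\n\nQuestion: {question}\n\nHelpful Answer:"
    # Corresponds to: llm | StrOutputParser()
    return generate(prompt_text)

print(rag_answer("What is Arrowhead Design Copilot?", fake_retrieve, fake_llm))
```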

Further reading:

https://python.langchain.com/v0.1/docs/use_cases/question_answering/chat_history/

More advanced RAG variants are better implemented using Langgraph. 

* Adaptive RAG
https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_adaptive_rag.ipynb 
* Corrective RAG
https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_crag.ipynb
* Self RAG
https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_self_rag.ipynb 
