# Exploring LangChain: Components & RAG Walkthrough

**Presenter:** Jack Tol  
**Website:** [jacktol.net](https://jacktol.net)

**Education:**  
- Bachelor of Artificial Intelligence  
  University of Technology Sydney  
  *2023 - Present*

**Overview:** Today, we'll explore LangChain and its key components, concluding with a walkthrough that uses LangChain's components and integrations to build a RAG prototype application.

**Date:** 2024-06-29 

**Agenda:**
- Introduction to LangChain & Its Ecosystem
- Reviewing a Few LangChain Components & A Brief Introduction to RAG
- Live Textbook RAG Demonstration

**Audience:** AI Enthusiasts, Developers, Researchers

**Contact:** For inquiries or more information, please email [contact@jacktol.net](mailto:contact@jacktol.net) or visit [jacktol.net](https://jacktol.net).

**Note:** This presentation will be recorded for future viewing.


## Introduction to LangChain

### The LangChain Ecosystem
- [LangChain.com](https://www.langchain.com/) is the website for LangChain, a **software company** that creates various products to aid in the development, testing, evaluation, management, and deployment of LLM-powered AI applications.

- The naming scheme adopted by LangChain can be a bit confusing. Let's start by differentiating between **LangChain the Company** and **LangChain the product**.

- One of the main products offered by LangChain is also called LangChain. This framework contains multiple packages and aims to be a flexible, yet abstract AI toolkit that provides various modular components which work together to create LLM-powered software/applications. The LangChain Product will be the primary focus of today's presentation.

#### The LangChain Product

**LangChain** is a framework consisting of several packages:

- **langchain-core**: Base abstractions for the different components, plus a way to compose them together. Lightweight, with no third-party integrations.
- **langchain-community**: Community-maintained third-party integrations for various components.
- **Partner packages**: Popular integrations (e.g., langchain-openai, langchain-anthropic) are split into their own packages for better support.
- **langchain**: Contains the chains, agents, and retrieval strategies that make up an application's architecture. These components are generic across all integrations rather than specific to any one of them.


![](src/langchain_components.jpg)

#### The LangSmith Product
- **LangSmith**: A developer platform for debugging, testing, evaluating, and monitoring LLM applications, including RAG pipelines, chain-of-thought (CoT) prompting, and agentic behavior.

#### The LangServe Product
- **LangServe**: A package built on FastAPI that makes it a bit easier to deploy your LangChain chains as REST APIs for production-ready applications.

### The Purpose of LangChain
- **LangChain is a software framework** designed to simplify the integration of LLMs into applications by providing modular components.
- **You can debug, test, evaluate, and deploy LLM applications** more efficiently using these various products and modular components.

LangChain's library offers a variety of "components" that facilitate the development of LLM-related products and software. These components include:

- LLMs
- Embedding Models
- Vector Stores
- Document Loaders
- Tools (Agentic Functions)

...and much more.

In this presentation, we'll explore some implementations of these components. We will conclude with a demonstration of a small-scale notebook-based RAG prototype, showcasing how these elements can be integrated and function cohesively.

## Covering a few Langchain Components & A Brief Review of RAG

There are far too many components within Langchain to share them all with you here, so I'll just be covering a few components and their implementations which I feel are the most important and intuitive to understand.

### Document Loaders
#### PDF Document Loader

In [1]:
from langchain.document_loaders import PyPDFLoader

In [2]:
# Raw string so the Windows-style backslash is not treated as an escape sequence
pdf_loader_example = PyPDFLoader(r"example_data\pdf_loader_example.pdf")
pages = pdf_loader_example.load()
pages[:5]

[Document(metadata={'source': 'example_data\\pdf_loader_example.pdf', 'page': 0}, page_content='Efficient World Models with Context-Aware Tokenization\nVincent Micheli* 1Eloi Alonso* 1Franc ¸ois Fleuret1\nAbstract\nScaling up deep Reinforcement Learning (RL)\nmethods presents a significant challenge. Follow-\ning developments in generative modelling, model-\nbased RL positions itself as a strong contender.\nRecent advances in sequence modelling have led\nto effective transformer-based world models, al-\nbeit at the price of heavy computations due to\nthe long sequences of tokens required to accu-\nrately simulate environments. In this work, we\npropose ∆-IRIS, a new agent with a world model\narchitecture composed of a discrete autoencoder\nthat encodes stochastic deltas between time steps\nand an autoregressive transformer that predicts\nfuture deltas by summarizing the current state\nof the world with continuous tokens. In the\nCrafter benchmark, ∆-IRIS sets a new state of\nthe art at

#### CSV Document Loader

In [3]:
from langchain.document_loaders import CSVLoader

In [4]:
# Raw string so the Windows-style backslash is not treated as an escape sequence
csv_loader_example = CSVLoader(r"example_data\metadata_parsed_arxiv.csv", encoding="utf-8")
csv_data = csv_loader_example.load()
csv_data[:5]

[Document(metadata={'source': 'example_data\\metadata_parsed_arxiv.csv', 'row': 0}, page_content="document_id: 0809.0182\nauthors: [{'keyname': 'Xie', 'forenames': 'Z. Y.'}, {'keyname': 'Jiang', 'forenames': 'H. C.'}, {'keyname': 'Chen', 'forenames': 'Q. N.'}, {'keyname': 'Weng', 'forenames': 'Z. Y.'}, {'keyname': 'Xiang', 'forenames': 'T.'}]\ntitle: Second Renormalization of Tensor-Network States"),
 Document(metadata={'source': 'example_data\\metadata_parsed_arxiv.csv', 'row': 1}, page_content="document_id: 0810.0725\nauthors: [{'keyname': 'Shadrin', 'forenames': 'S.'}]\ntitle: BCOV theory via Givental group action on cohomological field theories"),
 Document(metadata={'source': 'example_data\\metadata_parsed_arxiv.csv', 'row': 2}, page_content="document_id: 0909.0800\nauthors: [{'keyname': 'Shadrin', 'forenames': 'Sergey'}, {'keyname': 'Zvonkine', 'forenames': 'Dimitri'}]\ntitle: A group action on Losev-Manin cohomological field theories"),
 Document(metadata={'source': 'example_dat

### Chat Models

In [5]:
from langchain_anthropic.chat_models import ChatAnthropic 

In [6]:
claude_llm = ChatAnthropic(model="claude-3-5-sonnet-20240620",
                           temperature=0,
                           max_tokens=1024,
                           timeout=None,)

In [7]:
claude_response = claude_llm.invoke("What is the meaning of life?")
claude_response.content

"The meaning of life is a profound philosophical question that has been debated by thinkers, philosophers, and religious leaders throughout history. There is no single, universally accepted answer, as the meaning of life can be highly personal and subjective. Some common perspectives include:\n\n1. Happiness and well-being: Finding joy, contentment, and fulfillment in life.\n\n2. Love and relationships: Forming meaningful connections with others.\n\n3. Personal growth: Continuously learning, improving, and developing as an individual.\n\n4. Contribution to society: Making a positive impact on the world and helping others.\n\n5. Self-realization: Discovering and fulfilling one's potential and purpose.\n\n6. Religious or spiritual fulfillment: Following a particular faith or spiritual path.\n\n7. Experiencing and appreciating existence: Living in the present moment and finding beauty in the world.\n\n8. Creating meaning: Actively constructing purpose and significance in one's own life.\n

### Vector Stores

In [11]:
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings

In [12]:
embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")
arxiv_example_index = PineconeVectorStore.from_existing_index(embedding=embedding_model, index_name="arxiv-example-index")

In [13]:
search_results = arxiv_example_index.similarity_search(query="Uniform stability of the damped wave equation with a confining potential in the Euclidean space", k=5)
search_results

[Document(metadata={'document_id': '2406.17358'}, page_content='Uniform stability of the damped wave equation with a confining potential in the Euclidean space by Antoine Prouff'),
 Document(metadata={'document_id': '2406.17348'}, page_content='Exact controllability to eigensolutions of the heat equation via bilinear controls on two-dimensional domains by Rémi Buffe, Alessandro Duca'),
 Document(metadata={'document_id': '2406.17449'}, page_content='Pulsating string solution and stability in two parameter $\\chi$-deformed background by Rashmi R Nayak, Nibedita Padhi, Manoranjan Samal'),
 Document(metadata={'document_id': '2406.17293'}, page_content='Improved wave function for heavy-light mesons in QCD potential model approach and parameterization of the Cornell potential by Abdul Aziz, Sabyasachi Roy, Atri DEshamukhya'),
 Document(metadata={'document_id': '2406.17498'}, page_content='Multi-solitons of one-dimensional Boussinesq equation by Vicente Alvarez, Amin Esfahani')]

### A Brief Review of RAG

- RAG stands for Retrieval Augmented Generation.
- A RAG pipeline consists of a few different parts, which work together so that a language model can generate reliable and accurate answers based on retrieved context.
- We pass the user's query and the retrieved context to a language model dynamically by filling in a prompt template.
- The prompt template relies on an LLM's ability to follow system-level instructions, and on the fact that we can pass content to the model as variables.


#### Prompt Template Example
```
Instructions:
Your job is to answer the user's query using only the provided context.
Keep your answer grounded in the facts of the provided context.
If the context does not contain the facts needed to answer the user's query, return: "I do not have enough information available to accurately answer the question."

Context:
{context}

Query:
{user_query}
```
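As a minimal sketch of how such a template gets filled, the snippet below uses plain Python string formatting rather than any LangChain class. The function name `fill_prompt` and the sample context and query are illustrative assumptions, not part of the original walkthrough.

```python
# Plain-Python sketch: fill the prompt template with a context and a query.
PROMPT_TEMPLATE = """Instructions:
Your job is to answer the user's query using only the provided context.
Keep your answer grounded in the facts of the provided context.
If the context does not contain the facts needed to answer the user's query, return: "I do not have enough information available to accurately answer the question."

Context:
{context}

Query:
{user_query}"""


def fill_prompt(context: str, user_query: str) -> str:
    """Hypothetical helper: substitute the two variables into the template."""
    return PROMPT_TEMPLATE.format(context=context, user_query=user_query)


# Made-up sample values, purely for illustration.
prompt = fill_prompt(
    context="Paris is the capital of France.",
    user_query="What is the capital of France?",
)
print(prompt)
```

The instructions stay fixed while the two placeholders change on every request, which is what lets the same template serve any query.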

#### How does the context actually get retrieved?
- We take our external documents (PDFs, CSVs, TXTs, etc.) and use **some heuristic** to split the text (perhaps splitting after *n* characters, or on the syntax of a programming language if we are splitting code).
- If our splits are too small, say because we decided to split on sentences, we can combine a few sentences together to create a ***chunk*** of text.
- We take our chunks of text and pass them through a ***sentence embedding*** model, and we upload the embedding vectors to a vector database.
- Now, when a user enters a query, we encode that query using the same embedding model, perform a ***cosine similarity*** search between the user's query vector and all the vectors in the database, and retrieve the ***top-k*** most similar chunks of text.
- If we save the retrieved context to a `context` variable, we can easily send it through to the prompt template along with the user's query and let the model generate the response.
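The retrieval steps above can be sketched in a few lines of plain Python. This is a toy illustration only: `toy_embed` is a hypothetical stand-in that counts words, whereas a real pipeline would use a trained sentence-embedding model and a vector database. The sample chunks and query are made up.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, most similar first."""
    query_vec = toy_embed(query)
    scored = [(cosine_similarity(query_vec, toy_embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


chunks = [
    "neural networks learn from examples",
    "decision trees split on features",
    "gradient descent trains neural networks",
]
retrieved = top_k("how do neural networks learn", chunks, k=2)
context = "\n\n".join(retrieved)
print(context)
```

The key point is that the query and the chunks go through the same embedding function, so their vectors live in the same space and can be compared directly.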

### RAG Pipeline Diagram
![](src/rag_pipeline.png)

## Textbook RAG Demonstration Using Langchain
### Part 1: Loading, Processing, & Uploading Our Documents

Let's first quickly take a look at the Textbook PDF we are going to be using for this example, in addition to the Pinecone Vector Store where we will be uploading our chunks to.

Understanding any high-level framework always begins by looking at and understanding the imports you'll be using.

Looking at our imports, we first import our document loader and text splitter components, which in this case are `PyPDFLoader` and `RecursiveCharacterTextSplitter` respectively.

Next, we import our embedding model and vector store components, which in this case are `HuggingFaceEmbeddings` and `PineconeVectorStore` respectively.


In [14]:
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
    
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore

Let's set up our PDF Loader, passing in the path to the textbook, and our Text Splitter, passing in our desired text splitting settings.

In [15]:
pdf_loader = PyPDFLoader("Machine Learning - An Algorithmic Perspective, Second Edition, by Stephen Marsland.pdf")
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50, length_function=len)
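To build some intuition for what `chunk_size` and `chunk_overlap` mean, here is a deliberately simplified sliding-window splitter in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter tries to break on separators such as paragraphs and newlines before falling back to smaller units); `sliding_window_chunks` is a hypothetical helper used only to show the overlap idea.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each piece repeats the last three characters of the previous one, so a sentence that straddles a boundary is less likely to be lost to retrieval.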

Now we run our PDF Loader and Splitter together with the convenient `load_and_split` method.

In [16]:
chunks = pdf_loader.load_and_split(text_splitter=text_splitter)

Inspecting our chunks, we can see that they are stored in a list. Each chunk has a `page_content` component containing the actual text of the chunk, and a `metadata` component storing a dictionary of metadata for that chunk, including the name of the original source file and the page the chunk came from.

In [17]:
chunks[:5]

[Document(metadata={'source': 'Machine Learning - An Algorithmic Perspective, Second Edition, by Stephen Marsland.pdf', 'page': 0}, page_content='Chapman & Hall/CRC \nMachine Learning & Pattern Recognition SeriesChapman & Hall/CRC \nMachine Learning & Pattern Recognition Series\nMachine Learning MACHINE \nLEARNING\nAn Algorithmic Perspective\nSecond Edition\nMarsland\nStephen Marsland\n• Access online or download to your smartphone, tablet or PC/Mac\n• Search the full text of this and other titles you own\n• Make and share notes and highlights\n• Copy and paste text and figures for use in your own documents'),
 Document(metadata={'source': 'Machine Learning - An Algorithmic Perspective, Second Edition, by Stephen Marsland.pdf', 'page': 0}, page_content='• Customize your view by changing font size and layoutWITH VITALSOURCE®\nEBOOKsecond editionMachine Learning: An Algorithmic Perspective, Second Edition  helps you understand \nthe algorithms of machine learning. It puts you on a path t

As a sanity check, we can also inspect the length of the `chunks` list and see if it makes sense, and in this case it does, especially since we set our chunk size to 500 characters.

In [18]:
len(chunks)

2285

Now let's set up two variables, the `embedding_model` and our `textbook_vector_store`. These are generally set up as global variables, as many different parts of your codebase will use them.

We set the `embedding_model` to a `HuggingFaceEmbeddings` object, and our `textbook_vector_store` to a `PineconeVectorStore` object created with the `from_existing_index` method, to which we pass our embedding model and the name of the Pinecone index we have already set up.

In [20]:
embedding_model = HuggingFaceEmbeddings()

textbook_vector_store = PineconeVectorStore.from_existing_index(embedding=embedding_model, index_name="textbook-vector-store")

Let's now pull the text and metadata components out of each chunk into their own lists, taking care to keep each chunk's metadata as a dictionary, since that is the format Pinecone expects, as we'll see shortly. We also add 1 to the page number, because the loader counts pages from 0.

In [21]:
texts = [chunk.page_content for chunk in chunks]
metadatas = [{'page': chunk.metadata['page'] + 1, 'document': chunk.metadata['source']} for chunk in chunks]

As we would expect, when we look within the `texts` list, we see the text component of the chunks. Likewise for our `metadatas` list, where we see the metadata for each chunk, including the page number and source file name, formatted as a dictionary.

In [22]:
from IPython.display import display

display(texts[:5])
display(metadatas[:5])

['Chapman & Hall/CRC \nMachine Learning & Pattern Recognition SeriesChapman & Hall/CRC \nMachine Learning & Pattern Recognition Series\nMachine Learning MACHINE \nLEARNING\nAn Algorithmic Perspective\nSecond Edition\nMarsland\nStephen Marsland\n• Access online or download to your smartphone, tablet or PC/Mac\n• Search the full text of this and other titles you own\n• Make and share notes and highlights\n• Copy and paste text and figures for use in your own documents',
 '• Customize your view by changing font size and layoutWITH VITALSOURCE®\nEBOOKsecond editionMachine Learning: An Algorithmic Perspective, Second Edition  helps you understand \nthe algorithms of machine learning. It puts you on a path toward mastering the relevant \nmathematics and statistics as well as the necessary programming and experimentation.\nNew to the Second Edition\n•  Two new chapters on deep belief networks and Gaussian processes',
 '•  Reorganization of the chapters to make a more natural flow of content\n

[{'page': 1,
  'document': 'Machine Learning - An Algorithmic Perspective, Second Edition, by Stephen Marsland.pdf'},
 {'page': 1,
  'document': 'Machine Learning - An Algorithmic Perspective, Second Edition, by Stephen Marsland.pdf'},
 {'page': 1,
  'document': 'Machine Learning - An Algorithmic Perspective, Second Edition, by Stephen Marsland.pdf'},
 {'page': 1,
  'document': 'Machine Learning - An Algorithmic Perspective, Second Edition, by Stephen Marsland.pdf'},
 {'page': 1,
  'document': 'Machine Learning - An Algorithmic Perspective, Second Edition, by Stephen Marsland.pdf'}]

We are now finally at the point where we can embed and then upload our chunks. We call the `from_texts` method via our `textbook_vector_store`, passing through our `texts`, `embedding_model`, `metadatas`, and `index_name`. Note that `from_texts` is a class method, so it could equally be called on `PineconeVectorStore` itself, and it returns a new vector store object. Thanks to the abstraction LangChain gives us, this one call embeds and uploads all 2285 chunks of our textbook.

In [23]:
textbook_vector_store.from_texts(
    texts=texts,
    embedding=embedding_model,
    metadatas=metadatas,
    index_name="textbook-vector-store"
)

<langchain_pinecone.vectorstores.PineconeVectorStore at 0x1e80f5cdc90>
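One caveat worth keeping in mind: if no IDs are supplied, an upload like this generally stores each chunk under a freshly generated ID, so running the upload step more than once can leave several copies of every chunk in the index. A possible mitigation, sketched here under assumptions, is to derive a deterministic ID from each chunk's content and origin so that a repeated upload overwrites rather than duplicates. `stable_chunk_id` is a hypothetical helper, not part of LangChain or Pinecone, and the sample data is made up. Whether and how IDs can be passed (for example through an `ids` argument) should be checked against the installed version of the vector store integration.

```python
import hashlib


def stable_chunk_id(text: str, source: str, page: int) -> str:
    """Hypothetical helper: derive a repeatable ID from a chunk's content and origin."""
    key = f"{source}|{page}|{text}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:32]


# Toy stand-ins for the `texts` and `metadatas` lists built earlier.
sample_texts = ["first chunk of text", "second chunk of text"]
sample_metadatas = [
    {"page": 1, "document": "textbook.pdf"},
    {"page": 1, "document": "textbook.pdf"},
]

ids = [
    stable_chunk_id(text, meta["document"], meta["page"])
    for text, meta in zip(sample_texts, sample_metadatas)
]
print(ids)
```

Because the ID depends only on the chunk's text, source, and page, processing the same textbook again yields the same IDs.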

Let's quickly make a function to conduct a similarity search on the vector store with respect to a user query we enter, setting the number of items to retrieve (k) to 5, printing out the retrieved context chunk and the corresponding page number and source document metadata. We use the `similarity_search` method on our `textbook_vector_store`, passing through our query and k number as parameters, loop through the results, and print the relevant information.

In [24]:
def retrieve_and_display(query, k=5):
    results = textbook_vector_store.similarity_search(query=query, k=k)
    for result in results:
        print(f"Text: {result.page_content}\n")
        print(f"Document: {result.metadata['document']}\n")
        print(f"Page: {int(result.metadata['page'])}")
        print("\n")

Wow! Look at that, we see the retrieved context, ordered from most to least similar to the user query (by cosine similarity, assuming the index is configured with that metric), along with the name of the source document and the page number each chunk came from, so we can always cross-check the information.

In [25]:
retrieve_and_display("tell me about neural networks")

Text: neural networks so that they can do something useful.
The question we need to think about ﬁrst is how our neurons can learn. We are going to
look atsupervised learning for the next few chapters, which means that the algorithms will
learn by example: the dataset that we learn from has the correct output values associated
with each datapoint. At ﬁrst sight this might seem pointless, since if you already know the

Document: Machine Learning - An Algorithmic Perspective, Second Edition, by Stephen Marsland.pdf

Page: 64


Text: a well-known introduction to neural networks:
•D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning internal representations
by back-propagating errors. Nature, 323(99):533–536, 1986a.
•D.E. Rumelhart, J.L. McClelland, and the PDP Research Group, editors. Parallel
Distributed Processing . MIT Press, Cambridge, MA, 1986b.
•R. Lippmann. An introduction to computing with neural nets. IEEE ASSP Magazine ,
pages 4–22, 1987.

Document: Machine Learning - An Algo

And lastly, let's just put this into a function, to make it easy to load, process, & upload different documents a bit more efficiently.

#### First let's do our imports and define the two global variables we need

In [26]:
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
    
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore

embedding_model = HuggingFaceEmbeddings()
textbook_vector_store = PineconeVectorStore.from_existing_index(embedding=embedding_model, index_name="textbook-vector-store")

In [27]:
def process_textbook_to_vector_store():
    pdf_loader = PyPDFLoader("Machine Learning - An Algorithmic Perspective, Second Edition, by Stephen Marsland.pdf")
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50, length_function=len)
    
    chunks = pdf_loader.load_and_split(text_splitter=text_splitter)
    
    texts = [chunk.page_content for chunk in chunks]
    metadatas = [{'page': chunk.metadata['page'] + 1, 'document': chunk.metadata['source']} for chunk in chunks]
    
    textbook_vector_store.from_texts(
        texts=texts,
        embedding=embedding_model,
        metadatas=metadatas,
        index_name="textbook-vector-store"
    )
    print(f"Successfully Uploaded {len(texts)} Texts & {len(metadatas)} Metadatas to the Pinecone Textbook Vector Store.")
    return

In [28]:
process_textbook_to_vector_store()

Successfully Uploaded 2285 Texts & 2285 Metadatas to the Pinecone Textbook Vector Store.


##### Exercise: Try this on your own documents, notes or textbooks. Additionally, see if you can get it to work with Langchain's `PyPDFDirectoryLoader`.

#### Wonderful, we can see everything is working as expected. It's now time to move onto the next part.

### Part 2: Creating the RAG Component

Now that we have gone through the process of loading, splitting, embedding, and uploading our textbook to the vector database, and confirmed it's working as expected, it's time to connect this system to a language model so we can create our question-answering RAG assistant.

Like before, let's inspect our imports to make sure we understand the tools we'll be using. Our first import is from the `langchain_openai` integration package, from which we import the `ChatOpenAI` class. Our second import is from the `langchain_core` package, specifically its `prompts` sub-module, from which we import the `ChatPromptTemplate` class.

We covered the `HuggingFaceEmbeddings` and `PineconeVectorStore` imports previously, so we won't cover them again here.

In [29]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

from langchain_huggingface import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore

Next, we need to create our prompt template. The `ChatPromptTemplate` class provides several flexible ways to build prompt templates for different situations. When using the `ChatOpenAI` class, it's most common to use the `from_messages` method, which takes system, human, and AI messages as an easy-to-read list in the role-based format that OpenAI's chat models expect.

We set a fairly specific system message, instructing the LLM to answer the user's question using only the provided context and, if the context doesn't contain the answer, to say so instead of hallucinating a response. We also pass through a `context` and a `query` variable, which get filled in at run time.

In [30]:
template = ChatPromptTemplate.from_messages([
    ("system", "Using ONLY the provided context, answer the user's query. If the provided context doesn't contain the answer, then return 'I don't have enough information to accurately answer the question.'"),
    ("system", "Context: {context}"),
    ("human", "User Query: {query}"),
    ("ai", "Answer:")
])
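Conceptually, a message-based template is a list of (role, text) pairs whose placeholders are substituted at run time. The plain-Python sketch below mimics that idea without LangChain; `fill_messages` is a hypothetical helper, the system message is shortened, and the sample values are made up, so treat it as an illustration of the structure rather than of the library's internals.

```python
# Role-tagged message templates mirroring the structure used above (shortened).
MESSAGE_TEMPLATES = [
    ("system", "Using ONLY the provided context, answer the user's query."),
    ("system", "Context: {context}"),
    ("human", "User Query: {query}"),
]


def fill_messages(templates, **variables):
    """Hypothetical helper: substitute variables into each (role, text) pair."""
    return [
        {"role": role, "content": text.format(**variables)}
        for role, text in templates
    ]


messages = fill_messages(
    MESSAGE_TEMPLATES,
    context="Supervised learning uses labelled examples.",
    query="What is supervised learning?",
)
for message in messages:
    print(f"{message['role']}: {message['content']}")
```

Keeping the instructions, the context, and the user's query in separate messages makes it easy to inspect each part independently.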

Now we simply set up our inference model, in this case setting an `llm` variable to a `ChatOpenAI()` model, passing through our model name, temperature, max tokens, and timeout. There are many more parameters you can pass in here to customize your model, but this simple setup covers the essentials.

In [31]:
llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0.3,
    max_tokens=None,
    timeout=None,
)

Now we need a mechanism to capture a user query, saving it to a variable `user_query`.

In [32]:
user_query = input("Please Enter Your Query: ")

Even though we defined the embedding model and vector store variables earlier, we want this section to be independent of the last one, so we redefine them here for clarity. It's helpful to think of the earlier part as the "Document Processing & Uploading Pipeline" and this section as the "Query, Retrieve, & Generate Pipeline".

Each should work independently, as we may want to upload many more textbooks in the future, but the mechanism to query that vector store and generate a response will be the same.

In [34]:
embedding_model = HuggingFaceEmbeddings()
textbook_vector_store = PineconeVectorStore.from_existing_index(embedding=embedding_model, index_name="textbook-vector-store")

Now, just like before, we'll use the `similarity_search` method on our vector store to conduct a similarity search with respect to the user query, this time passing a k value of 10 instead of 5 so the model has more context to draw on. We store the results in a `retrieved_contexts` variable.

Inspecting the retrieved context, we see it's in the same format as earlier. We need to loop through all the retrieved documents and extract each `page_content` into a context list.

In [35]:
retrieved_contexts = textbook_vector_store.similarity_search(query=user_query, k=10)
retrieved_contexts

[Document(metadata={'document': 'Machine Learning - An Algorithmic Perspective, Second Edition, by Stephen Marsland.pdf', 'page': 64.0}, page_content='neural networks so that they can do something useful.\nThe question we need to think about ﬁrst is how our neurons can learn. We are going to\nlook atsupervised learning for the next few chapters, which means that the algorithms will\nlearn by example: the dataset that we learn from has the correct output values associated\nwith each datapoint. At ﬁrst sight this might seem pointless, since if you already know the'),
 Document(metadata={'document': 'Machine Learning - An Algorithmic Perspective, Second Edition, by Stephen Marsland.pdf', 'page': 64.0}, page_content='neural networks so that they can do something useful.\nThe question we need to think about ﬁrst is how our neurons can learn. We are going to\nlook atsupervised learning for the next few chapters, which means that the algorithms will\nlearn by example: the dataset that we lear

And that's exactly what we do here. We simply loop through each doc in `retrieved_contexts` and using a list comprehension, extract the parsed context to a new context list.

Looking at the context list, this looks much cleaner now.

In [36]:
context = [doc.page_content for doc in retrieved_contexts]
context

['neural networks so that they can do something useful.\nThe question we need to think about ﬁrst is how our neurons can learn. We are going to\nlook atsupervised learning for the next few chapters, which means that the algorithms will\nlearn by example: the dataset that we learn from has the correct output values associated\nwith each datapoint. At ﬁrst sight this might seem pointless, since if you already know the',
 'neural networks so that they can do something useful.\nThe question we need to think about ﬁrst is how our neurons can learn. We are going to\nlook atsupervised learning for the next few chapters, which means that the algorithms will\nlearn by example: the dataset that we learn from has the correct output values associated\nwith each datapoint. At ﬁrst sight this might seem pointless, since if you already know the',
 'a well-known introduction to neural networks:\n•D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning internal representations\nby back-propagating err
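Two things stand out in this list: the same chunk appears more than once, and passing a Python list straight into the template means the prompt will contain the list's printed form, brackets and escaped newlines included. A possible refinement, sketched here as an assumption rather than as part of the original pipeline, is to drop exact duplicates and join the chunks into a single string first. `format_context` and the sample chunks are hypothetical.

```python
def format_context(chunks: list[str], separator: str = "\n\n---\n\n") -> str:
    """Hypothetical helper: remove exact duplicates (keeping order) and join into one string."""
    seen = set()
    unique_chunks = []
    for chunk in chunks:
        if chunk not in seen:
            seen.add(chunk)
            unique_chunks.append(chunk)
    return separator.join(unique_chunks)


# Made-up stand-in for the retrieved chunk texts.
retrieved = [
    "neural networks learn by example.",
    "neural networks learn by example.",
    "a well-known introduction to neural networks.",
]
context_text = format_context(retrieved)
print(context_text)
```

The resulting string could then be supplied as the `context` value, so the model reads clean text and does not spend tokens on repeated chunks.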

Now that we have the user query and the context, we are ready to populate our prompt template with these values. Using the `invoke` method on our template, we pass our `user_query` and `context` variables to their respective keys.

We set this finalized prompt to the `prompt_value` variable.

In [37]:
prompt_value = template.invoke({
    "query": user_query,
    "context": context
})

Now that our prompt template has been populated, we are ready to send the prompt to the LLM by calling the `invoke` method on our `llm` with our `prompt_value`. We capture the result in the `response` variable and display it.

Wow! Look at that. We got the response from the LLM, we just need to parse the content of the response and then we have for all intents and purposes created a RAG Pipeline in Langchain.

In [38]:
response = llm.invoke(prompt_value)
response

AIMessage(content='Neural networks are computational models inspired by the human brain. They consist of interconnected neurons that process information by responding to external inputs. A single neuron isn\'t very interesting on its own, as it only fires or doesn\'t fire when given inputs and doesn\'t learn. To make neurons more useful, they need to be able to learn and be put together into networks.\n\nOne of the most common neural networks in use is the Multi-layer Perceptron (MLP), proposed by Rumelhart, Hinton, and McClelland. The MLP is often treated as a \'black box\' by users who don\'t understand how it works, which can lead to poor results. Understanding how neural networks learn is crucial, and supervised learning is a common method where algorithms learn by example from a dataset with correct output values associated with each datapoint.\n\nFor further reading, some well-known introductions to neural networks include:\n- D.E. Rumelhart, G.E. Hinton, and R.J. Williams\' "Lea

Extracting the content, we get our final response from the LLM: our question answered using only the chunks of retrieved context that we uploaded just a few minutes ago. Beautiful.

In [39]:
parsed_response = response.content
parsed_response

'Neural networks are computational models inspired by the human brain. They consist of interconnected neurons that process information by responding to external inputs. A single neuron isn\'t very interesting on its own, as it only fires or doesn\'t fire when given inputs and doesn\'t learn. To make neurons more useful, they need to be able to learn and be put together into networks.\n\nOne of the most common neural networks in use is the Multi-layer Perceptron (MLP), proposed by Rumelhart, Hinton, and McClelland. The MLP is often treated as a \'black box\' by users who don\'t understand how it works, which can lead to poor results. Understanding how neural networks learn is crucial, and supervised learning is a common method where algorithms learn by example from a dataset with correct output values associated with each datapoint.\n\nFor further reading, some well-known introductions to neural networks include:\n- D.E. Rumelhart, G.E. Hinton, and R.J. Williams\' "Learning internal rep

For a bit of fun, let's inspect the prompt actually sent to the model...

Oh, that's not very helpful

In [40]:
prompt_value

ChatPromptValue(messages=[SystemMessage(content="Using ONLY the provided context, answer the user's query. If the provided context doesn't contain the answer, then return 'I don't have enough information to accurately answer the question.'"), SystemMessage(content="Context: ['neural networks so that they can do something useful.\\nThe question we need to think about ﬁrst is how our neurons can learn. We are going to\\nlook atsupervised learning for the next few chapters, which means that the algorithms will\\nlearn by example: the dataset that we learn from has the correct output values associated\\nwith each datapoint. At ﬁrst sight this might seem pointless, since if you already know the', 'neural networks so that they can do something useful.\\nThe question we need to think about ﬁrst is how our neurons can learn. We are going to\\nlook atsupervised learning for the next few chapters, which means that the algorithms will\\nlearn by example: the dataset that we learn from has the cor

Let's parse this so it looks a bit nicer...

In [41]:
messages = prompt_value.messages

system_instructions = messages[0].content
context_message = messages[1].content
user_query_message = messages[2].content
llm_response = parsed_response

In [42]:
print(f"SYSTEM INSTRUCTIONS:\n{system_instructions}")
print(f"CONTEXT SENT TO MODEL:\n{context_message}")
print(f"ENTERED USER QUERY:\n{user_query_message}")
print(f"RESPONSE FROM LLM:\n{llm_response}")

SYSTEM INSTRUCTIONS:
Using ONLY the provided context, answer the user's query. If the provided context doesn't contain the answer, then return 'I don't have enough information to accurately answer the question.'
CONTEXT SENT TO MODEL:
Context: ['neural networks so that they can do something useful.\nThe question we need to think about ﬁrst is how our neurons can learn. We are going to\nlook atsupervised learning for the next few chapters, which means that the algorithms will\nlearn by example: the dataset that we learn from has the correct output values associated\nwith each datapoint. At ﬁrst sight this might seem pointless, since if you already know the', 'neural networks so that they can do something useful.\nThe question we need to think about ﬁrst is how our neurons can learn. We are going to\nlook atsupervised learning for the next few chapters, which means that the algorithms will\nlearn by example: the dataset that we learn from has the correct output values associated\nwith ea
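The context above is a Python list rendered straight into the prompt, and its first two chunks appear to be identical. Here is a minimal sketch of one way to tidy this up before filling the template, assuming the retrieved chunks are plain strings. `format_context` and its `separator` parameter are hypothetical names for illustration, not part of LangChain, and the sample chunks are made up.

```python
def format_context(chunks, separator="\n\n---\n\n"):
    """Drop exact-duplicate chunks (keeping first-seen order) and join the rest."""
    seen = set()
    unique_chunks = []
    for chunk in chunks:
        text = chunk.strip()
        # Skip empty chunks and any chunk whose text was already kept.
        if text and text not in seen:
            seen.add(text)
            unique_chunks.append(text)
    return separator.join(unique_chunks)


# Hypothetical retrieved chunks; the first two are duplicates on purpose.
chunks = [
    "neural networks so that they can do something useful.",
    "neural networks so that they can do something useful.",
    "supervised learning means the algorithms learn by example.",
]
context_text = format_context(chunks)
print(context_text)
```

The resulting string could be passed as the `context` value in `template.invoke(...)` in place of the list, so the model sees clean text rather than a list repr.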

### Time to put it in a function

#### First, let's do our imports and define the two global variables we need

In [44]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

embedding_model = HuggingFaceEmbeddings()
textbook_vector_store = PineconeVectorStore.from_existing_index(embedding=embedding_model, index_name="textbook-vector-store")

In [45]:
def textbook_rag():
    """Prompt for a query, retrieve textbook context, and print GPT-4o's answer."""

    user_query = input("Please Enter Your Query: ")

    # Prompt layout: instructions, retrieved context, the user's query, then an answer cue.
    template = ChatPromptTemplate.from_messages([
        ("system", "Using ONLY the provided context, answer the user's query. If the provided context doesn't contain the answer, then return 'I don't have enough information to accurately answer the question.'"),
        ("system", "Context: {context}"),
        ("human", "User Query: {query}"),
        ("ai", "Answer:")
    ])

    llm = ChatOpenAI(
        model="gpt-4o",
        temperature=0.3,
        max_tokens=None,
        timeout=None,
    )

    # Fetch the 10 most similar chunks and keep only their text.
    retrieved_contexts = textbook_vector_store.similarity_search(query=user_query, k=10)
    context = [doc.page_content for doc in retrieved_contexts]

    # Fill the template placeholders to build the final prompt.
    prompt_value = template.invoke({
        "query": user_query,
        "context": context
    })

    # Call the model and print the text of its reply.
    response = llm.invoke(prompt_value)
    parsed_response = response.content
    print(parsed_response)

In [47]:
textbook_rag()

Neural networks are systems of interconnected neurons that can learn and perform tasks. They consist of multiple neurons that can fire or not fire based on inputs. To make neurons more interesting and useful, they need to be able to learn and be organized into sets. One common type of neural network is the Multi-layer Perceptron (MLP), proposed by Rumelhart, Hinton, and McClelland, which is widely used in machine learning. Neural networks can learn from datasets with correct output values associated with each datapoint through supervised learning.
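The function above reads the query with `input()` and prints the answer, which makes it awkward to reuse or test. Here is a minimal sketch of an alternative shape, where the query is a parameter, the answer is returned, and the retrieval and generation steps are passed in as callables. `answer_query`, `fake_retrieve`, and `fake_generate` are hypothetical names, and the early return on empty context is an assumption rather than behaviour of the notebook's function.

```python
def answer_query(query, retrieve, generate, k=10):
    """Retrieve up to k context chunks for the query and return the generated answer."""
    context = retrieve(query, k)
    if not context:
        # Assumption: with no context, skip the model call and return the fallback message.
        return "I don't have enough information to accurately answer the question."
    return generate(query, context)


# Stand-in components so the sketch runs without API keys or a vector store.
def fake_retrieve(query, k):
    corpus = ["Neurons fire or do not fire.", "An MLP is a multi-layer perceptron."]
    return corpus[:k]


def fake_generate(query, context):
    return f"Answered '{query}' using {len(context)} chunk(s)."


print(answer_query("What is an MLP?", fake_retrieve, fake_generate, k=1))
```

With the objects defined in this notebook, `retrieve` could wrap `textbook_vector_store.similarity_search(query=..., k=...)` and `generate` could wrap `template.invoke(...)` followed by `llm.invoke(...)`. Those wrappers are left out here so the example stays runnable offline.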


### Concluding Remarks
Perfect! We have now created a complete, albeit simple, RAG implementation using only LangChain components and integrations. It starts with loading, splitting, embedding, and uploading an external document, then retrieves relevant chunks on the spot for a user query, passes them to a GPT-4o inference model, and prints out the result.

Please use this notebook as a base, and in your own time, experiment by adding components, swapping out components, and creating some cool applications!

Thank you for listening!