## What is LangChain?
- LangChain is a software library that simplifies integrating large language models (LLMs) into applications by providing modular components.
- It helps developers build, test, and deploy NLP and AI applications more efficiently by offering tools and templates for tasks like chatbots, document processing, and information retrieval.
- The library supports embedding models, document loaders, vector stores, and more, covering most of the pieces needed to build AI-driven solutions.

![](src/imageeee.png)


LangChain offers a variety of "components" that facilitate the development of NLP/LLM-related products and software. These components include:

- Chat Models
- Large Language Models (LLMs)
- Embedding Models
- Document Loaders
- Document Transformers
- Vector Stores
- Retrievers
- Tools

...and much more.

In this presentation, we'll explore some implementations of these components. We will conclude with a demonstration of a small-scale, notebook-based RAG (Retrieval-Augmented Generation) prototype, showcasing how these elements can be integrated and work together.

## Textbook RAG Demonstration Using LangChain
### Part 1: Loading, Processing, & Uploading Our Documents

A good way to start understanding any high-level framework is to look at the imports you'll be using.

Looking at our imports, we first import our document loader and document transformer components, which in this case are `PyPDFLoader` and `RecursiveCharacterTextSplitter` respectively.

Next, we import our embedding model and vector store components, which in this case are `HuggingFaceEmbeddings` and `PineconeVectorStore` respectively.


In [1]:
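# Document loader and text splitter (document transformer) components.
# Note: newer LangChain releases expose these from `langchain_community.document_loaders`
# and `langchain_text_splitters`; the paths below match the version used in this notebook.
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Embedding model and vector store components.
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore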

Let's set up our PDF loader, passing in the path to the textbook, and our text splitter, passing in our desired splitting settings.

In [2]:
pdf_loader = PyPDFLoader("Machine Learning - An Algorithmic Perspective, Second Edition, by Stephen Marsland.pdf")
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50, length_function=len)
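
To build some intuition for what `chunk_size` and `chunk_overlap` mean, here is a minimal pure-Python sketch of a fixed-window splitter. It is an illustration only: `split_with_overlap` is a hypothetical helper, and it does not reproduce how `RecursiveCharacterTextSplitter` works internally (that class tries to split on separators such as paragraph and line breaks first). The sample text and the small sizes are made up for readability.

```python
def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Naive sliding-window splitter (illustration only, not LangChain's algorithm)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window has reached the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters of stand-in text
pieces = split_with_overlap(sample, chunk_size=12, chunk_overlap=4)
for piece in pieces:
    print(repr(piece))
```

Each piece shares its last 4 characters with the start of the next one, which is the role the 50-character overlap plays in the splitter settings above: text cut at a chunk boundary still appears intact in at least one chunk.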

Now we run the PDF loader and splitter together with the convenient `load_and_split` method.

In [3]:
chunks = pdf_loader.load_and_split(text_splitter=text_splitter)

Inspecting our chunks, we can see that they are stored in a list of `Document` objects. Each one has a `page_content` attribute containing the actual text of the chunk, and a `metadata` attribute holding a dictionary of metadata for that chunk, including the name of the original source file and the page the chunk came from.

In [4]:
chunks[:5]

[Document(page_content='Chapman & Hall/CRC \nMachine Learning & Pattern Recognition SeriesChapman & Hall/CRC \nMachine Learning & Pattern Recognition Series\nMachine Learning MACHINE \nLEARNING\nAn Algorithmic Perspective\nSecond Edition\nMarsland\nStephen Marsland\n• Access online or download to your smartphone, tablet or PC/Mac\n• Search the full text of this and other titles you own\n• Make and share notes and highlights\n• Copy and paste text and figures for use in your own documents', metadata={'source': 'Machine Learning - An Algorithmic Perspective, Second Edition, by Stephen Marsland.pdf', 'page': 0}),
 Document(page_content='• Customize your view by changing font size and layoutWITH VITALSOURCE®\nEBOOKsecond editionMachine Learning: An Algorithmic Perspective, Second Edition  helps you understand \nthe algorithms of machine learning. It puts you on a path toward mastering the relevant \nmathematics and statistics as well as the necessary programming and experimentation.\nNew t

As a sanity check, we can also inspect the length of the `chunks` list and see whether it makes sense. In this case it does, given that we set our chunk size to 500 characters.

In [5]:
len(chunks)

2285
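
If you want a slightly stronger sanity check than the chunk count alone, one option is to summarize the chunk lengths. The sketch below uses a hypothetical helper, `summarize_chunk_lengths`, and made-up sample strings so that it runs on its own.

```python
def summarize_chunk_lengths(texts, chunk_size=500):
    """Return basic length statistics for a list of chunk strings."""
    lengths = [len(t) for t in texts]
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
        # Number of chunks longer than the configured chunk size.
        "over_limit": sum(1 for n in lengths if n > chunk_size),
    }


# Stand-in data; in the notebook this would be the text of each chunk.
sample_texts = ["a" * 480, "b" * 500, "c" * 120]
stats = summarize_chunk_lengths(sample_texts)
print(stats)
```

In the notebook itself, the same helper could be called as `summarize_chunk_lengths([chunk.page_content for chunk in chunks])`.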

Now let's set up two variables: the `embedding_model` and our `textbook_vector_store`. These are generally set up as global variables, since many different parts of your codebase will use them.

We set the `embedding_model` to a `HuggingFaceEmbeddings` object and our `textbook_vector_store` to a `PineconeVectorStore` object created with the `from_existing_index` method. We pass in our embedding model along with the index name, which is the name of the Pinecone index we have already set up.

In [6]:
embedding_model = HuggingFaceEmbeddings()
textbook_vector_store = PineconeVectorStore.from_existing_index(embedding=embedding_model, index_name="textbook-vector-store")

Let's now pull the text and metadata out of each chunk into their own lists. We take special care to format each chunk's metadata as a dictionary, since that is how it is passed to Pinecone, as we'll see shortly. We also add 1 to the page number, since the loader's page numbers start at 0.

In [8]:
texts = [chunk.page_content for chunk in chunks]
metadatas = [{'page': chunk.metadata['page'] + 1, 'document': chunk.metadata['source']} for chunk in chunks]
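
The same reshaping can be tried without a PDF or any LangChain install. The sketch below uses `FakeDocument`, a hypothetical stand-in that only mimics the two attributes used above (`page_content` and `metadata`), and the file name `book.pdf` is made up.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document (illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


fake_chunks = [
    FakeDocument("first chunk", {"source": "book.pdf", "page": 0}),
    FakeDocument("second chunk", {"source": "book.pdf", "page": 1}),
]

# Same comprehensions as in the cell above, applied to the stand-in objects.
demo_texts = [chunk.page_content for chunk in fake_chunks]
demo_metadatas = [
    {"page": chunk.metadata["page"] + 1, "document": chunk.metadata["source"]}
    for chunk in fake_chunks
]

print(demo_texts)
print(demo_metadatas)
```

Note how the zero-based `page` values become one-based page numbers and the `source` key is renamed to `document`.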

As we would expect, when we look at the `texts` list we see the text of each chunk. The same goes for our `metadatas` list, where we see the metadata for each chunk, including the page number and source file name, formatted as a dictionary.

In [12]:
from IPython.display import display

display(texts[:5])
display(metadatas[:5])

['Chapman & Hall/CRC \nMachine Learning & Pattern Recognition SeriesChapman & Hall/CRC \nMachine Learning & Pattern Recognition Series\nMachine Learning MACHINE \nLEARNING\nAn Algorithmic Perspective\nSecond Edition\nMarsland\nStephen Marsland\n• Access online or download to your smartphone, tablet or PC/Mac\n• Search the full text of this and other titles you own\n• Make and share notes and highlights\n• Copy and paste text and figures for use in your own documents',
 '• Customize your view by changing font size and layoutWITH VITALSOURCE®\nEBOOKsecond editionMachine Learning: An Algorithmic Perspective, Second Edition  helps you understand \nthe algorithms of machine learning. It puts you on a path toward mastering the relevant \nmathematics and statistics as well as the necessary programming and experimentation.\nNew to the Second Edition\n•  Two new chapters on deep belief networks and Gaussian processes',
 '•  Reorganization of the chapters to make a more natural flow of content\n

[{'page': 1,
  'document': 'Machine Learning - An Algorithmic Perspective, Second Edition, by Stephen Marsland.pdf'},
 {'page': 1,
  'document': 'Machine Learning - An Algorithmic Perspective, Second Edition, by Stephen Marsland.pdf'},
 {'page': 1,
  'document': 'Machine Learning - An Algorithmic Perspective, Second Edition, by Stephen Marsland.pdf'},
 {'page': 1,
  'document': 'Machine Learning - An Algorithmic Perspective, Second Edition, by Stephen Marsland.pdf'},
 {'page': 1,
  'document': 'Machine Learning - An Algorithmic Perspective, Second Edition, by Stephen Marsland.pdf'}]

We are now finally at the point where we can embed and then upload our chunks. We use the `from_texts` method on our `textbook_vector_store`, passing in our `texts`, `embedding_model`, `metadatas`, and `index_name`. Remarkably, thanks to the abstraction LangChain provides, this single call embeds and uploads all 2285 chunks of our textbook. (`from_texts` is a class method, so it returns a new `PineconeVectorStore` object, as the output below shows.)

In [13]:
textbook_vector_store.from_texts(
    texts=texts,
    embedding=embedding_model,
    metadatas=metadatas,
    index_name="textbook-vector-store"
)

<langchain_pinecone.vectorstores.PineconeVectorStore at 0x11a25f0d0>

Let's quickly write a function that runs a similarity search on the vector store for a user query, with the number of items to retrieve (`k`) defaulting to 5. It calls the `similarity_search` method on our `textbook_vector_store`, passing in the query and `k`, then loops through the results and prints each retrieved chunk along with its source document and page number.

In [14]:
def retrieve_and_display(query, k=5):
    results = textbook_vector_store.similarity_search(query=query, k=k)
    for result in results:
        print(f"Text: {result.page_content}\n")
        print(f"Document: {result.metadata['document']}\n")
        print(f"Page: {int(result.metadata['page'])}")
        print("\n")

Wow! Look at that. We see the retrieved context, which by default is ordered from most to least similar to the user query (by cosine similarity here), along with the name of the source document and the page number each chunk came from, so we can always cross-check the information.

In [15]:
retrieve_and_display("tell me about neural networks")

Text: neural networks so that they can do something useful.
The question we need to think about ﬁrst is how our neurons can learn. We are going to
look atsupervised learning for the next few chapters, which means that the algorithms will
learn by example: the dataset that we learn from has the correct output values associated
with each datapoint. At ﬁrst sight this might seem pointless, since if you already know the

Document: Machine Learning - An Algorithmic Perspective, Second Edition, by Stephen Marsland.pdf

Page: 64


Text: a well-known introduction to neural networks:
•D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning internal representations
by back-propagating errors. Nature, 323(99):533–536, 1986a.
•D.E. Rumelhart, J.L. McClelland, and the PDP Research Group, editors. Parallel
Distributed Processing . MIT Press, Cambridge, MA, 1986b.
•R. Lippmann. An introduction to computing with neural nets. IEEE ASSP Magazine ,
pages 4–22, 1987.

Document: Machine Learning - An Algo
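
To picture the ranking that a similarity search performs, here is a small brute-force sketch in plain Python. Everything in it is an assumption made for illustration: the three-dimensional vectors are made up (real embeddings have hundreds of dimensions), `cosine_similarity` and `top_k` are hypothetical helpers, and a vector database does not necessarily compare the query against every stored vector the way this loop does.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=5):
    """Score every (text, vector) pair and return the k most similar, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up toy "index": each chunk is paired with a tiny fake embedding.
toy_store = [
    ("chunk about neural networks", [0.9, 0.1, 0.0]),
    ("chunk about decision trees", [0.1, 0.9, 0.0]),
    ("chunk about perceptrons", [0.8, 0.2, 0.1]),
]
toy_query = [1.0, 0.0, 0.0]

ranked = top_k(toy_query, toy_store, k=2)
for score, text in ranked:
    print(f"{score:.3f}  {text}")
```

The chunk whose vector points in nearly the same direction as the query vector is ranked first, which is the idea behind the ordering of the results above.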

Lastly, let's put all of this into a function so that loading, processing, and uploading different documents is more convenient.

#### First let's do our imports and define the two global variables we need

In [None]:
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
    
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore

embedding_model = HuggingFaceEmbeddings()
textbook_vector_store = PineconeVectorStore.from_existing_index(embedding=embedding_model, index_name="textbook-vector-store")

In [None]:
def process_textbook_to_vector_store():
    pdf_loader = PyPDFLoader("Machine Learning - An Algorithmic Perspective, Second Edition, by Stephen Marsland.pdf")
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50, length_function=len)
    
    chunks = pdf_loader.load_and_split(text_splitter=text_splitter)
    
    texts = [chunk.page_content for chunk in chunks]
    metadatas = [{'page': chunk.metadata['page'] + 1, 'document': chunk.metadata['source']} for chunk in chunks]
    
    textbook_vector_store.from_texts(
        texts=texts,
        embedding=embedding_model,
        metadatas=metadatas,
        index_name="textbook-vector-store"
    )
    print(f"Successfully Uploaded {len(texts)} Texts & {len(metadatas)} Metadatas to the Pinecone Textbook Vector Store.")
    return

In [None]:
process_textbook_to_vector_store()

(Exercise: Try this on your own documents, notes, or textbooks. Additionally, see if you can get it to work with LangChain's `PyPDFDirectoryLoader`.)

#### Wonderful, we can see everything is working as expected. It's now time to move on to the next part.

### Part 2: Creating the RAG Component

Now that we have gone through the process of loading, splitting, embedding, and uploading our textbook to the vector database, and confirmed it's working as expected, it's time to connect this system to a language model so we can create our question-answering RAG assistant.

Like before, let's inspect our imports to make sure we understand the tools we'll be using. Our first import comes from the `langchain_openai` integration package, from which we import the `ChatOpenAI` class. Our second import comes from the `langchain_core` package, specifically its `prompts` sub-module, from which we import the `ChatPromptTemplate` class.

We covered the `HuggingFaceEmbeddings` and `PineconeVectorStore` imports previously, so we won't cover them again here.

In [1]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

from langchain_huggingface import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore

Next, we need to create our prompt template. The `ChatPromptTemplate` class provides several ways to build prompt templates for different situations. When using `ChatOpenAI`, it's most common to use the `from_messages` method, which takes system, human, and AI messages as an easy-to-read list of (role, text) pairs matching the chat format that OpenAI models expect.

We set a fairly specific system message, instructing the LLM to answer the user's question using only the provided context and, if the context doesn't contain the answer, to say so instead of hallucinating a response. We also include `context` and `query` placeholders, which get filled in at run time.

In [2]:
template = ChatPromptTemplate.from_messages([
    ("system", "Using ONLY the provided context, answer the user's query. If the provided context doesn't contain the answer, then return 'I don't have enough information to accurately answer the question.'"),
    ("system", "Context: {context}"),
    ("human", "User Query: {query}"),
    ("ai", "Answer:")
])
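
To see what "filled in at run time" means mechanically, here is a rough stand-in that uses plain `str.format` on (role, text) pairs. It is not LangChain's implementation: `fill_messages` is a hypothetical helper, and the sample context and query are made up.

```python
message_templates = [
    ("system", "Context: {context}"),
    ("human", "User Query: {query}"),
]


def fill_messages(templates, **values):
    """Substitute the named placeholders in every (role, text) pair."""
    return [(role, text.format(**values)) for role, text in templates]


filled = fill_messages(
    message_templates,
    context=["chunk one", "chunk two"],  # a list, as in this notebook
    query="What is an MLP?",
)
for role, text in filled:
    print(f"{role}: {text}")
```

Because `context` is passed as a list, it is inserted using the list's string representation, brackets and quotes included.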

Now we simply set up our inference model, in this case assigning a `ChatOpenAI` model to an `llm` variable and passing in our model name, temperature, max tokens, and timeout. There are many more parameters you can pass in here to customize your model, but this simple setup covers the essentials.

In [3]:
llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0.3,
    max_tokens=None,
    timeout=None,
)
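
The notebook does not show how API credentials are supplied. Assuming they come from the environment variables that these integrations read by default (`OPENAI_API_KEY` for `ChatOpenAI` and `PINECONE_API_KEY` for the Pinecone integration), a small pre-flight check like the sketch below can catch a missing key before any cell fails. `missing_env_vars` is a hypothetical helper.

```python
import os

# Assumed variable names: the defaults read by langchain_openai and langchain_pinecone.
REQUIRED_ENV_VARS = ("OPENAI_API_KEY", "PINECONE_API_KEY")


def missing_env_vars(names=REQUIRED_ENV_VARS, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print(f"Set these environment variables before running the cells: {missing}")
else:
    print("All required environment variables are set.")
```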

Now we need a mechanism to capture a user query, saving it to a variable `user_query`.

In [4]:
user_query = input("Please Enter Your Query: ")

Even though we defined the embedding model and vector store variables earlier, we want this section to be independent of the last one, so we redefine them here for clarity. It's helpful to think of the earlier part as the "Document Processing & Uploading Pipeline" and this section as the "Query, Retrieve, & Generate Pipeline".

Each should work independently, as we may want to upload many more textbooks in the future, but the mechanism to query that vector store and generate a response will stay the same.

In [5]:
embedding_model = HuggingFaceEmbeddings()
textbook_vector_store = PineconeVectorStore.from_existing_index(embedding=embedding_model, index_name="textbook-vector-store")



Now, just like before, we'll use the `similarity_search` method on our vector store to run a similarity search for the user query, this time passing a `k` value of 10 instead of 5 so the model has more context to draw on. We store the results in a `retrieved_contexts` variable.

Inspecting the retrieved context, we see it's in the same `Document` format as earlier. We need to loop through the retrieved documents and extract each `page_content` into a context list.

In [6]:
retrieved_contexts = textbook_vector_store.similarity_search(query=user_query, k=10)
retrieved_contexts

[Document(page_content='neural networks so that they can do something useful.\nThe question we need to think about ﬁrst is how our neurons can learn. We are going to\nlook atsupervised learning for the next few chapters, which means that the algorithms will\nlearn by example: the dataset that we learn from has the correct output values associated\nwith each datapoint. At ﬁrst sight this might seem pointless, since if you already know the', metadata={'document': 'Machine Learning - An Algorithmic Perspective, Second Edition, by Stephen Marsland.pdf', 'page': 64.0}),
 Document(page_content='a well-known introduction to neural networks:\n•D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning internal representations\nby back-propagating errors. Nature, 323(99):533–536, 1986a.\n•D.E. Rumelhart, J.L. McClelland, and the PDP Research Group, editors. Parallel\nDistributed Processing . MIT Press, Cambridge, MA, 1986b.\n•R. Lippmann. An introduction to computing with neural nets. IEEE ASSP M

And that's exactly what we do here. We simply loop through each doc in `retrieved_contexts` and, using a list comprehension, extract its `page_content` into a new `context` list.

Looking at the context list, this looks much cleaner now.

In [7]:
context = [doc.page_content for doc in retrieved_contexts]
context

['neural networks so that they can do something useful.\nThe question we need to think about ﬁrst is how our neurons can learn. We are going to\nlook atsupervised learning for the next few chapters, which means that the algorithms will\nlearn by example: the dataset that we learn from has the correct output values associated\nwith each datapoint. At ﬁrst sight this might seem pointless, since if you already know the',
 'a well-known introduction to neural networks:\n•D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning internal representations\nby back-propagating errors. Nature, 323(99):533–536, 1986a.\n•D.E. Rumelhart, J.L. McClelland, and the PDP Research Group, editors. Parallel\nDistributed Processing . MIT Press, Cambridge, MA, 1986b.\n•R. Lippmann. An introduction to computing with neural nets. IEEE ASSP Magazine ,\npages 4–22, 1987.',
 'to look at the neural network solution proposed by Rumelhart, Hinton, and McClelland,\nthe Multi-layer Perceptron (MLP), which is still one

Now that we have the user query and the context, we are ready to populate our prompt template with these values. Using the `invoke` method on our template, we pass in our `user_query` and `context` variables under their respective keys.

We set this finalized prompt to the `prompt_value` variable.

In [8]:
prompt_value = template.invoke({
    "query": user_query,
    "context": context
})

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
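
The warning above is harmless here, and it names its own fix: set the `TOKENIZERS_PARALLELISM` environment variable explicitly. A minimal way to do that from inside the notebook is sketched below. Choosing `"false"` is an assumption that tokenizer parallelism is not needed for this small demo.

```python
import os

# Run this before the embedding model (and its tokenizer) is first used,
# for example in the very first cell of the notebook.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print(os.environ["TOKENIZERS_PARALLELISM"])
```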


Now that our prompt template has been populated, we are ready to send the prompt to the LLM using the `invoke` method on our `llm`, passing in our `prompt_value`. We capture the result in the `response` variable and display it.

Wow! Look at that. We got a response from the LLM. We just need to extract the content of the response, and then we have, for all intents and purposes, created a RAG pipeline in LangChain.

In [9]:
response = llm.invoke(prompt_value)
response


AIMessage(content='Neural networks are computational models inspired by the human brain, consisting of interconnected nodes (neurons) that process information. They can learn from data, particularly through supervised learning, where the dataset includes correct output values for each data point. One common type of neural network is the Multi-layer Perceptron (MLP), which is widely used in machine learning. Neural networks can solve various problems, but understanding how they work is crucial to avoid poor results. Key references in the field include works by Rumelhart, Hinton, McClelland, and others.', response_metadata={'token_usage': {'completion_tokens': 115, 'prompt_tokens': 1209, 'total_tokens': 1324}, 'model_name': 'gpt-4o', 'system_fingerprint': 'fp_ce0793330f', 'finish_reason': 'stop', 'logprobs': None}, id='run-33481a60-d27e-4cb7-8c6d-e1dc0651a448-0', usage_metadata={'input_tokens': 1209, 'output_tokens': 115, 'total_tokens': 1324})

Extracting the content, we get our final response from the LLM: our question answered using only the retrieved chunks of context that we uploaded a few minutes earlier. Beautiful.

In [10]:
parsed_response = response.content
parsed_response

'Neural networks are computational models inspired by the human brain, consisting of interconnected nodes (neurons) that process information. They can learn from data, particularly through supervised learning, where the dataset includes correct output values for each data point. One common type of neural network is the Multi-layer Perceptron (MLP), which is widely used in machine learning. Neural networks can solve various problems, but understanding how they work is crucial to avoid poor results. Key references in the field include works by Rumelhart, Hinton, McClelland, and others.'

For a bit of fun, let's inspect the prompt actually sent to the model...

Oh, that's not very helpful

In [11]:
prompt_value

ChatPromptValue(messages=[SystemMessage(content="Using ONLY the provided context, answer the user's query. If the provided context doesn't contain the answer, then return 'I don't have enough information to accurately answer the question.'"), SystemMessage(content="Context: ['neural networks so that they can do something useful.\\nThe question we need to think about ﬁrst is how our neurons can learn. We are going to\\nlook atsupervised learning for the next few chapters, which means that the algorithms will\\nlearn by example: the dataset that we learn from has the correct output values associated\\nwith each datapoint. At ﬁrst sight this might seem pointless, since if you already know the', 'a well-known introduction to neural networks:\\n•D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning internal representations\\nby back-propagating errors. Nature, 323(99):533–536, 1986a.\\n•D.E. Rumelhart, J.L. McClelland, and the PDP Research Group, editors. Parallel\\nDistributed Processin

Let's parse this so it looks a bit nicer...

In [12]:
messages = prompt_value.messages

system_instructions = messages[0].content
context_message = messages[1].content
user_query_message = messages[2].content
llm_response = parsed_response

In [13]:
print(f"SYSTEM INSTRUCTIONS:\n{system_instructions}")
print(f"CONTEXT SENT TO MODEL:\n{context_message}")
print(f"ENTERED USER QUERY:\n{user_query_message}")
print(f"RESPONSE FROM LLM:\n{llm_response}")

SYSTEM INSTRUCTIONS:
Using ONLY the provided context, answer the user's query. If the provided context doesn't contain the answer, then return 'I don't have enough information to accurately answer the question.'
CONTEXT SENT TO MODEL:
Context: ['neural networks so that they can do something useful.\nThe question we need to think about ﬁrst is how our neurons can learn. We are going to\nlook atsupervised learning for the next few chapters, which means that the algorithms will\nlearn by example: the dataset that we learn from has the correct output values associated\nwith each datapoint. At ﬁrst sight this might seem pointless, since if you already know the', 'a well-known introduction to neural networks:\n•D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning internal representations\nby back-propagating errors. Nature, 323(99):533–536, 1986a.\n•D.E. Rumelhart, J.L. McClelland, and the PDP Research Group, editors. Parallel\nDistributed Processing . MIT Press, Cambridge, MA, 1986b.\n•
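
As the output shows, the context reaches the model as the printed form of a Python list, with brackets, quotes, and escaped newlines. If you would rather send plain text, one option is to join the chunks into a single string before filling the template. The sketch below is only a suggestion: `format_context` and the separator are hypothetical choices, and the two sample strings are shortened copies of chunks shown above.

```python
def format_context(chunks, separator="\n\n---\n\n"):
    """Join retrieved chunk texts into one readable block of text."""
    return separator.join(chunk.strip() for chunk in chunks)


demo_context = [
    "neural networks so that they can do something useful.",
    "a well-known introduction to neural networks:",
]
context_text = format_context(demo_context)
print(f"Context:\n{context_text}")
```

In the notebook, this would mean passing `format_context(context)` instead of `context` under the `"context"` key when invoking the template.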

### Time to put it in a function

#### First let's do our imports and define the two global variables we need

In [None]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

embedding_model = HuggingFaceEmbeddings()
textbook_vector_store = PineconeVectorStore.from_existing_index(embedding=embedding_model, index_name="textbook-vector-store")

In [None]:
def textbook_rag():

    user_query = input("Please Enter Your Query: ")

    template = ChatPromptTemplate.from_messages([
        ("system", "Using ONLY the provided context, answer the user's query. If the provided context doesn't contain the answer, then return 'I don't have enough information to accurately answer the question'"),
        ("system", "Context: {context}"),
        ("human", "User Query: {query}"),
        ("ai", "Answer:")
    ])

    llm = ChatOpenAI(
        model="gpt-4o",
        temperature=0.3,
        max_tokens=None,
        timeout=None,
    )

    retrieved_contexts = textbook_vector_store.similarity_search(query=user_query, k=10)
    context = [doc.page_content for doc in retrieved_contexts]

    prompt_value = template.invoke({
        "query": user_query,
        "context": context
    })

    response = llm.invoke(prompt_value)
    parsed_response = response.content
    print(parsed_response)
    return

In [None]:
textbook_rag()

Perfect, we have now created a complete, albeit simple, RAG implementation using only LangChain components and integrations. It starts with loading, splitting, embedding, and uploading an external document, then performs on-the-spot retrieval based on a user query, passes the retrieved context to a GPT-4o inference model, and prints out the result.

Please use this notebook as a base, and in your own time, experiment by adding components, swapping out components, and creating some cool applications!
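
As one possible starting point for that experimentation, the sketch below separates the retrieve-then-generate flow from the concrete components by passing them in as functions. All names here (`answer_query`, `fake_retrieve`, `fake_generate`) are hypothetical, and the stubs stand in for the vector store and the LLM so that the flow can be run without any API keys.

```python
def answer_query(query, retrieve, generate, k=10):
    """Retrieve k chunks for the query, build a prompt, and return the generated answer."""
    chunks = retrieve(query, k)
    context = "\n\n".join(chunks)
    prompt = f"Context: {context}\n\nUser Query: {query}"
    return generate(prompt)


def fake_retrieve(query, k):
    """Stub retriever: returns the first k entries of a tiny made-up corpus."""
    corpus = [
        "MLPs are multi-layer perceptrons.",
        "Decision trees split on features.",
    ]
    return corpus[:k]


def fake_generate(prompt):
    """Stub generator: reports the prompt size instead of calling a model."""
    return f"(stub answer based on {len(prompt)} prompt characters)"


answer = answer_query("What is an MLP?", fake_retrieve, fake_generate, k=1)
print(answer)
```

In the notebook, `retrieve` could wrap `textbook_vector_store.similarity_search` and `generate` could wrap `llm.invoke`, which makes it easy to swap either component for another one.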