## Install Requirements
Uncomment and run the cell below to install the libraries we'll be using.

In [None]:
# !pip install -qU langchain langchain-core langchain_community langchain_text_splitters langgraph 
# !pip install -qU langchain-google-genai 
# !pip install -qU bs4 
# !pip install -qU python-dotenv typing_extensions 

## Load the API key into the environment
The code below loads the API key and stores it where the LangChain libraries (and likely the Google libraries they rely on) expect to find it.

**If you're running this code in Google Colab**, this code assumes you've already stored your API key as a *secret*:

1. Open your Google Colab notebook and click on the 🔑 Secrets tab in the left panel.
2. Create a new secret with the name `GOOGLE_API_KEY`.
3. Copy/paste your API key into the Value input box of `GOOGLE_API_KEY`.
4. Toggle the button on the left to allow notebook access to the secret.

Otherwise, the code assumes that you have a `.env` file that includes `GOOGLE_API_KEY=<your api key here>`. 

In [None]:
import os
import sys

API_KEY = 'GOOGLE_API_KEY'

if 'google.colab' in sys.modules:
    from google.colab import userdata
    os.environ[API_KEY] = userdata.get(API_KEY)
else:
    from dotenv import load_dotenv
    load_dotenv()  # Load environment variables from .env file; should include GOOGLE_API_KEY

You can verify that your API key is where it ought to be by uncommenting and running the code cell, below.

In [None]:
# os.getenv(API_KEY)
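
Printing the key itself leaves it in the notebook's output, which is easy to share by accident. Here's a minimal sketch of a safer check, assuming you only need to confirm that the variable is set. `mask_secret` is a hypothetical helper written for this notebook, not part of any library.

```python
import os


def mask_secret(value, visible=4):
    """Describe a secret without revealing it: show only its last few characters."""
    if not value:
        return "<not set>"
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Reports whether GOOGLE_API_KEY is set without echoing the full key.
print(mask_secret(os.getenv("GOOGLE_API_KEY")))
```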

## Components
Import and instantiate a:
  1. chat model
  2. embedding model
  3. in-memory vector store

Note that we're using the `langchain_google_genai` library instead of the Google Vertex (or OpenAI, or Anthropic, etc.) library. That means you can't simply copy code from the LangChain tutorial. Documentation for the Google GenAI library can be found [here](https://python.langchain.com/api_reference/google_genai/index.html).

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash-lite")

In [None]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

In [None]:
from langchain_core.vectorstores import InMemoryVectorStore
vector_store = InMemoryVectorStore(embeddings)

## RAG Pipeline

### Scrape a Web Page

We'll use the `WebBaseLoader` class to scrape a web page we'd like to ask an LLM about. It uses [Beautiful Soup](https://beautiful-soup-4.readthedocs.io/en/latest/) -- another popular library -- to parse the web page (extract its text content). 

Notice how, instead of writing their own HTML parser, the LangChain developers make use of another well-established library. The named parameter `bs_kwargs` is short for "Beautiful Soup keyword arguments." We're passing to `WebBaseLoader` a set of arguments that will be passed to Beautiful Soup functions. A decision to use another library like this comes with trade-offs:
  - To use LangChain, I don't have to write much or any code to control Beautiful Soup. LangChain handles (almost) all of it for me.
  - But now this LangChain class is dependent on (tied to) Beautiful Soup. If Beautiful Soup changes interfaces, `WebBaseLoader` might break.
  - And `WebBaseLoader` is also somewhat less flexible. What if Beautiful Soup isn't my preferred library or doesn't do what I need? So you'll sometimes see a library give you the ability to pass in whatever HTML parser you choose. It could be Beautiful Soup, another open-source library, or the HTML parser you wrote for fun.

Notice also that we've decided to give Beautiful Soup some more specific instructions, taking content from HTML tags that have a class of `post-content`, `post-title`, or `post-header`. (You could navigate to the web page and open the developer tools to see just what that includes.) Doing so gives us cleaner text to use for our RAG application but at the cost of making our code less general. If I want to query a different web page, there's no reason to think it will use the same class names to identify the important bits. If we add a web page loader to KnotebookLM, we'll need to think about how best to generalize our approach.

In [None]:
import os

# Set USER_AGENT before importing WebBaseLoader: the warning about an unset
# USER_AGENT comes from LangChain's loader (not Beautiful Soup), and it appears
# to be triggered when the loader module is imported.
# A User-Agent header tells a web server what kind of client is making the request.
os.environ['USER_AGENT'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'

import bs4
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        ),
    ),
)

docs = loader.load()

`docs` is a list of `Document` objects. We only loaded one document, so the length of `docs` is 1.

In [None]:
len(docs)

We can ask Python to tell us the type of that sole document.

In [None]:
type(docs[0])

That's `langchain-core`'s base `Document` class. Consulting the [documentation](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html#langchain_core.documents.base.Document) you can see it is instantiated with two notable properties: `page_content` and `metadata`. (You can also find in the documentation a link to the source code if you want to dig further.)

We'll need to talk about `metadata` later. For now, let's look at the first bit of the `page_content`.

In [None]:
docs[0].page_content[:300]

Compare it to the web page we scraped. Beautiful Soup did a pretty good job, no?

### Split the Text
As a final pre-processing step, we'll split the text into smaller chunks. Read [why](https://python.langchain.com/docs/concepts/text_splitters/#why-split-documents).

Following the tutorial, we'll use the `RecursiveCharacterTextSplitter` class. It implements a [text-structure based](https://python.langchain.com/docs/concepts/text_splitters/#text-structured-based) approach. To better understand how this splitter works and how to control it, read this [guide](https://python.langchain.com/docs/how_to/recursive_text_splitter/) and consult the [documentation](https://python.langchain.com/api_reference/text_splitters/character/langchain_text_splitters.character.RecursiveCharacterTextSplitter.html).

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

Let's see how many chunks we've split our document into.

In [None]:
len(all_splits)

They're not all equal length. Based on the guides and documentation you've read, can you explain why?

In [None]:
for idx, split in enumerate(all_splits[:5]):
    print(f"Split {idx} length: {len(split.page_content)}")

Based on what we read, we'd expect to see some overlap between the end of one split and the beginning of the next.

In [None]:
for prev, curr in zip(all_splits[20:25], all_splits[21:26]):
    print('previous: \n', prev.page_content[-50:], '\n')
    print('current: \n', curr.page_content[:50], '\n')
    print('\n------------------\n')

But in this slice of splits, I don't see any overlap. (I tried a few different slices and likewise didn't see any overlaps.) Does that mean it's not working? 

In [None]:
for i in range(len(all_splits) - 1):
    last = all_splits[i].page_content.strip().split()[-1]
    first = all_splits[i+1].page_content.strip().split()[0]
    if last == first:
        print('\n-------- index', i, '----------\n')
        print('previous: \n', last, '\n')
        print('current: \n', first, '\n')

By this check, there are only two cases where one split overlaps with the previous, and in both cases it looks like a heading. It's not perfectly clear -- at least not to me -- why `RecursiveCharacterTextSplitter` works this way. It could be the nature of the web page (lots of headings, lots of figures). It could be the relation between our `chunk_size` and `chunk_overlap`. The documentation isn't super helpful. If we want to know more, we'll likely have to dive into the code and experiment.
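
One caveat about the two checks above: with an overlap of up to 200 characters, the repeated text would be the closing stretch of one chunk reappearing as the opening stretch of the next. The final 50 characters of one chunk therefore wouldn't line up with the first 50 of the next, and the last word would only rarely equal the first word. A more direct test looks for the longest suffix of one chunk that is also a prefix of the next. This is a sketch, and `shared_overlap` is a hypothetical helper, not a LangChain function.

```python
def shared_overlap(prev: str, curr: str, max_len: int = 200) -> str:
    """Return the longest suffix of `prev` that is also a prefix of `curr`."""
    limit = min(len(prev), len(curr), max_len)
    # Try the longest candidate first so the first match is the longest one.
    for size in range(limit, 0, -1):
        if prev[-size:] == curr[:size]:
            return prev[-size:]
    return ""


# Synthetic example: the second string starts with the tail of the first.
example_overlap = shared_overlap("alpha beta gamma delta", "gamma delta epsilon")
print(repr(example_overlap))
```

To try it on the notebook's data, call `shared_overlap(a.page_content, b.page_content)` for each consecutive pair in `all_splits` and count how many results are non-empty.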

When it comes time to write code for KnotebookLM, we'll likely want to play around with chunk and overlap sizes and see what makes most sense for our application.
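
One possible explanation for uneven overlap, offered as an assumption rather than a reading of the library's source: if the splitter packs whole pieces (paragraphs, lines) into chunks and carries forward only the trailing pieces that fit inside `chunk_overlap`, then a chunk that ends with a long paragraph shares nothing with the next one. The toy function below, `merge_pieces`, is a simplified sketch of that idea. It is not `RecursiveCharacterTextSplitter` itself.

```python
def merge_pieces(pieces, chunk_size, chunk_overlap, sep="\n\n"):
    """Greedily pack whole pieces into chunks, carrying small trailing pieces forward."""

    def joined_len(parts):
        return len(sep.join(parts))

    chunks = []
    current = []  # pieces that make up the chunk being built
    for piece in pieces:
        if current and joined_len(current + [piece]) > chunk_size:
            chunks.append(sep.join(current))
            # Drop pieces from the front until what remains fits in the overlap
            # budget and leaves room for the incoming piece.
            while current and (
                joined_len(current) > chunk_overlap
                or joined_len(current + [piece]) > chunk_size
            ):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(sep.join(current))
    return chunks


# A short trailing piece ("ccccc") fits in the overlap budget and is repeated.
with_overlap = merge_pieces(["a" * 30, "b" * 30, "c" * 5, "d" * 30], chunk_size=70, chunk_overlap=10)
# Every piece is longer than the overlap budget, so nothing is repeated.
without_overlap = merge_pieces(["a" * 30, "b" * 30, "d" * 30], chunk_size=70, chunk_overlap=10)
```

If this assumption holds for the real splitter, overlap would show up mostly after short pieces such as headings, which would be consistent with what we observed.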

### Index Splits

If you were implementing this next step without LangChain, you'd likely think of it as two steps:
1. For each split, generate an embedding (a vector that represents the "meaning" of the text in the split)
2. Write the resulting vector and the original text to a database.

LangChain handles both with a single call to the `add_documents` method on the `vector_store` instance we created. (And now you understand why we needed to pass the `embeddings` instance as an argument when we created `vector_store`.)
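
To make the two steps concrete, here's a rough sketch of what doing them by hand might look like. Everything in it is hypothetical: `toy_embed` is a stand-in that counts vowels (a real embedding model returns far richer vectors), and `add_texts` and `toy_store` are illustrative names, not LangChain's implementation.

```python
import uuid


def toy_embed(text: str) -> list:
    """Hypothetical stand-in for an embedding model: counts of each vowel."""
    lowered = text.lower()
    return [float(lowered.count(ch)) for ch in "aeiou"]


def add_texts(store: dict, texts: list) -> list:
    """Step 1: embed each text. Step 2: write the vector and the text to the store."""
    ids = []
    for text in texts:
        doc_id = str(uuid.uuid4())
        store[doc_id] = {
            "id": doc_id,
            "vector": toy_embed(text),
            "text": text,
            "metadata": {},
        }
        ids.append(doc_id)
    return ids


toy_store = {}
new_ids = add_texts(toy_store, ["Agents plan tasks.", "Memory stores context."])
```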

In [None]:
_ = vector_store.add_documents(documents=all_splits)

Let's take a quick peek at what's in `vector_store`. Each document has a unique identifier (the `id`), the `text` and `metadata` we saw when exploring the splits, and a `vector` -- the document's embedding. Maybe it's a little hard to tell, but those embeddings were returned by making calls to the Google embeddings model.

In [None]:
for index, (doc_id, doc) in enumerate(vector_store.store.items()):
    if index < 3:
        # docs have keys 'id', 'vector', 'text', 'metadata'
        print(f"{doc_id}: {doc['text'][:75]}")
        print(doc['metadata'])
        print(doc['vector'][:5])
        print("\n\n---------------\n\n")
    else:
        break

That's all the pre-processing we need. We're ready to move on to retrieval tasks.

## Retrieve Relevant Chunks, Ask Questions

We've indexed the web page and can now ask questions.

### Prompt

LangChain has a library of task-specific prompts. Let's grab the "RAG" prompt.

In [None]:
from langchain import hub

prompt = hub.pull('rlm/rag-prompt')
prompt

Notice it didn't return a simple string, but rather an instance of `ChatPromptTemplate`. Following the tutorial's walk-through, we can explore it a bit:

In [None]:
example_message, = prompt.invoke(
    { "context": "Here's where we'll put relevant chunks from the web page.", "question": "Here's where our question goes." }
).to_messages()

Notice the comma after `example_message`? That wasn't a mistake. As the name `to_messages` implies, we get back a list that could hold more than one message. The trailing comma *unpacks* (destructures) that list into a single variable. It works here because the list has exactly one item; if the list held more, Python would raise a `ValueError`.
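
A quick illustration of that trailing-comma unpacking with plain Python lists (the list contents here are made up):

```python
# Unpacking a one-item list into a single name.
only_message, = ["the only message"]
print(only_message)

# The same syntax fails when the list holds more than one item.
unpack_failed = False
try:
    first, = ["one", "two"]
except ValueError:
    unpack_failed = True
print(unpack_failed)
```

If you only want the first item and don't care how many there are, indexing with `[0]` is the more forgiving choice.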

Let's see the `content` of that message...

In [None]:
print(example_message.content)

Pretty cool. We pass the prompt a dictionary with `context` and `question` keys and it'll insert their values into our prompt.
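
The substitution itself resembles ordinary Python string formatting. Here's a rough sketch using plain `str.format` rather than `ChatPromptTemplate`. `TOY_TEMPLATE` is a hypothetical template written for illustration (it is not the text of `rlm/rag-prompt`), and `fill_prompt` is a made-up helper.

```python
TOY_TEMPLATE = (
    "Answer the question using only the context.\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer:"
)


def fill_prompt(values: dict) -> str:
    """Insert the 'question' and 'context' values into the template."""
    return TOY_TEMPLATE.format(**values)


filled = fill_prompt(
    {"context": "Relevant chunks go here.", "question": "Our question goes here."}
)
print(filled)
```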

### Using LangGraph to Stitch Together the Parts
Here's how the docs describe LangGraph:
> LangGraph is a library for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows.

Translation: LangGraph is a tool for building an application that can coordinate interactions between a user and one or more AI tools -- in this case, our chat and embedding models and the vector store where we stash our document's embeddings. To do so, it needs to "remember" (that's what "stateful" means).

Basically, we'll create the structure of its memory (the `State` class) and define its two operations, `retrieve` and `generate`. Then we define a workflow and "compile" it.

We'll start by importing some classes and types.

In [None]:
from langchain_core.documents import Document
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

#### State
Next, we'll create the application `State`. It's a class that will inherit from the `TypedDict` class. 

We need to remember the question, the context (the relevant chunks of the web page), and the answer. The *types* (e.g., `str`, `List`) may look new. We're declaring what kind of data each key will hold. Type information helps us (and our tools) catch errors and can make tools like autocompletion more effective. (Python doesn't enforce these annotations at runtime, but some languages are *statically typed* -- they **require** type information.)

In [None]:
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str
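
A small experiment shows what those type annotations do and don't do. In this sketch, `ToyState` is a hypothetical stand-in that mirrors the shape of `State`, using strings for the context so it runs without LangChain.

```python
from typing import List, TypedDict


class ToyState(TypedDict):
    question: str
    context: List[str]
    answer: str


toy_state = ToyState(question="Who wrote the blog post?", context=[], answer="")

# At runtime, a TypedDict instance is just a regular dict.
print(isinstance(toy_state, dict))

# Python itself won't stop us from storing the wrong type;
# a static type checker is what would flag this line.
toy_state["answer"] = 42
print(toy_state["answer"])
```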

#### Retrieval Task
Let's now define the `retrieve` task. This is how we get the most relevant chunks from our document. All we have to do is pass our question to the `similarity_search` method exposed by our `vector_store` instance. The rest is abstracted away. `similarity_search` handles:
  1. generating an embedding for our question
  2. using that embedding to search the vector store for the chunks closest to it in "meaning space"
  3. returning the most similar chunks

In [None]:
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}
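
What `similarity_search` abstracts away can be sketched in a few lines. This toy version assumes cosine similarity as the measure of closeness and uses hand-made three-number vectors in place of real embeddings. `toy_similarity_search` and `indexed_chunks` are hypothetical names for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_similarity_search(query_vector, indexed, k=2):
    """Rank (text, vector) pairs by similarity to the query; return the top k texts."""
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


indexed_chunks = [
    ("chunk about planning", [0.9, 0.1, 0.0]),
    ("chunk about memory", [0.1, 0.9, 0.0]),
    ("chunk about tools", [0.0, 0.2, 0.9]),
]
# Pretend this vector is the embedding of our question.
top_chunks = toy_similarity_search([1.0, 0.2, 0.0], indexed_chunks, k=2)
print(top_chunks)
```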

We don't have to wait to put together the whole "graph" (application). We can test drive `retrieve` ourselves. (Note: to help us later, I'm going to add the result of `retrieve` to `test_state`.)

In [None]:
test_state = State({ "question": "Who wrote the blog post?" })
test_state["context"] = retrieve(test_state)["context"]
test_state

I got back four chunks, probably ordered by similarity. None of them seems to contain the relevant information. (The second chunk mentions an author -- and it happens to be the author of the blog post -- but here it's from a citation of another paper.) The author is listed on the page, but our similarity search didn't find that chunk. So I don't have a lot of hope that the model will answer our question, but it will nevertheless be instructive to see how it goes.

Just to confirm, here's the split that includes the author's name.

In [None]:
all_splits[0].page_content

#### Generation (Question Answering) Task
We also need to define a function to construct our prompt, send it to the chat model, and handle its response: 
  1. Knowing from our exploration that the documents returned from the `retrieve` task are objects with different properties, we first need to extract the `page_content` from each. We'll join them into a single string with a couple of line breaks (`\n\n`) separating them.
  2. Next, we can use the `prompt` template we created earlier, passing to it the question and the chunks we retrieved and processed.
  3. Then we `invoke` our chat model, passing to it the `messages` (which we recall is a prompt that contains a kind of system message, the source texts, and our question).
  4. The `response` we get back from the chat model has a `content` property. That's what we'll return as the `answer`. (If you're curious, you can print out the full response to see what other details you get.)

In [None]:
def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    print(response)
    return {"answer": response.content}

Let's give it a try using our example question and the documents we retrieved based on that question.

In [None]:
generate(test_state)

As an experiment, let's see what happens if I add what I know is the relevant information to the context.

In [None]:
test_state["context"].append(all_splits[0])
generate(test_state)

Sweet! That time, we got it. It would be worth experimenting to see if we can tweak our parameters or text-splitting strategy to improve our results. The tool is no good if we have to find the answer and pass it to the LLM.

#### String the Tasks Together
Finally, let's add our task definitions to an instance of `StateGraph` and "compile" it.

In [None]:
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
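
To build some intuition for what the compiled graph does with our two functions, here's a heavily simplified sketch in plain Python: run each step in order and merge the dictionary it returns into the shared state. `run_sequence`, `toy_retrieve`, and `toy_generate` are hypothetical stand-ins, and LangGraph itself does far more than this.

```python
def run_sequence(nodes, initial_state):
    """Run each node in order, merging its partial update into the state."""
    state = dict(initial_state)
    for node in nodes:
        state.update(node(state))
    return state


def toy_retrieve(state):
    # Stand-in for a vector store lookup.
    return {"context": [f"chunk relevant to: {state['question']}"]}


def toy_generate(state):
    # Stand-in for a chat model call.
    return {"answer": f"answer built from {len(state['context'])} chunk(s)"}


final_state = run_sequence(
    [toy_retrieve, toy_generate], {"question": "What is Task Decomposition?"}
)
print(final_state["answer"])
```

This is also why `retrieve` and `generate` each return only the keys they change: the rest of the state carries over from one step to the next.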

Now, instead of running each task ourselves, we should be able to `invoke` the graph with a question and get an answer. Let's try it with a new question:

In [None]:
response = graph.invoke({"question": "What is Task Decomposition?"})
print(response["answer"])

In [None]:
response2 = graph.invoke({"question": "You said decomposition breaks large tasks into manageable ones. What makes a task more manageable?" })
print(response2["answer"])

## Evaluation, Next Steps
What do you think of LangChain? Hard to say since you don't have much experience with it and (probably) even less with other libraries that try to do the same. That's the position you often find yourself in: do I commit to spending more time learning this library, or move on to another?

We can also ask how we'd evaluate our application. A couple of things to consider:
  - The application has many parts. How can we judge (and possibly improve) each separately? Once we've chained everything together with `LangGraph`, it's really hard to tell where things are failing or not working well. For example, if we'd just asked about the author without looking separately at the retrieval and generation tasks, we wouldn't know that it fails because we just don't retrieve the right information.
     - We also have to consider that later steps depend on earlier steps. For example, if retrieval isn't working, maybe we need to tweak how we're chunking the text.
  - We relied on a particular embedding and chat model. Would others do better? Worse? What counts as better?
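
One way to start judging retrieval on its own is to check whether the chunks that come back contain text we know should be there. This is a minimal sketch under that assumption. `retrieval_hit` and `toy_retrieve_fn` are hypothetical helpers, the chunk texts are made up, and "AUTHOR NAME" is a placeholder for the string you expect to find.

```python
def retrieval_hit(retrieve_fn, question, expected_text):
    """Return True if any retrieved chunk contains expected_text (case-insensitive)."""
    chunks = retrieve_fn(question)
    return any(expected_text.lower() in chunk.lower() for chunk in chunks)


def toy_retrieve_fn(question):
    # Stand-in retriever that always returns the same two chunks.
    return [
        "Task decomposition splits a big task into steps.",
        "Memory can be short-term or long-term.",
    ]


cases = [
    ("What is Task Decomposition?", "task decomposition"),
    ("Who wrote the blog post?", "AUTHOR NAME"),
]
results = {q: retrieval_hit(toy_retrieve_fn, q, expected) for q, expected in cases}
hit_rate = sum(results.values()) / len(results)
print(results, hit_rate)
```

With the notebook's objects, `retrieve_fn` could be something like `lambda q: [d.page_content for d in vector_store.similarity_search(q)]`.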

We should also take stock of what this application **doesn't** do. For example, each question we ask is handled in isolation. `response2` shows that we haven't built an application capable of holding a conversation where previous conversational turns influence the next response.