RAG consists of 1) chunking, 2) embedding those chunks, 3) retrieving those chunks. We will start with 1).


In [27]:
%pip install pypdf pymilvus tqdm openai voyageai cohere llama-parse nest-asyncio semantic-router langchain-experimental pydantic langchain_voyageai llama-index-embeddings-voyageai python-dotenv ipywidgets

Collecting ipywidgets
  Using cached ipywidgets-8.1.5-py3-none-any.whl.metadata (2.3 kB)
Collecting widgetsnbextension~=4.0.12 (from ipywidgets)
  Using cached widgetsnbextension-4.0.13-py3-none-any.whl.metadata (1.6 kB)
Collecting jupyterlab-widgets~=3.0.12 (from ipywidgets)
  Using cached jupyterlab_widgets-3.0.13-py3-none-any.whl.metadata (4.1 kB)
Using cached ipywidgets-8.1.5-py3-none-any.whl (139 kB)
Using cached jupyterlab_widgets-3.0.13-py3-none-any.whl (214 kB)
Using cached widgetsnbextension-4.0.13-py3-none-any.whl (2.3 MB)
Installing collected packages: widgetsnbextension, jupyterlab-widgets, ipywidgets
Successfully installed ipywidgets-8.1.5 jupyterlab-widgets-3.0.13 widgetsnbextension-4.0.13
Note: you may need to restart the kernel to use updated packages.


We will use a few hosted APIs for this demo: OpenAI for the base LLM, Llama Cloud for PDF parsing, Voyage AI for embeddings and re-ranking, and Milvus as our vector database.

The environment variables we need are listed in the .example_env file. To follow along, create an account with each of these providers and get an API key. Then, copy the .example_env file to a new file called .env and fill in the values.


In [11]:
# load the values from the .env file
import os
from dotenv import load_dotenv

load_dotenv()

True
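
Before going further, it can help to confirm that every key was actually loaded. The check below is a minimal sketch: the variable names are the ones read through `os.environ` in the cells of this notebook, and the helper `missing_env_vars` is a hypothetical name introduced only for illustration.

```python
import os

# Names taken from the os.environ lookups used throughout this notebook.
REQUIRED_ENV_VARS = [
    "OPENAI_API_KEY",
    "LLAMA_CLOUD_API_KEY",
    "VOYAGE_API_KEY",
    "MILVUS_URI",
    "MILVUS_API_KEY",
]


def missing_env_vars(names: list[str]) -> list[str]:
    """Return the names that are unset or empty in the current environment."""
    return [name for name in names if not os.environ.get(name)]


missing = missing_env_vars(REQUIRED_ENV_VARS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set.")
```

Running this right after `load_dotenv()` surfaces a missing key immediately instead of as a `KeyError` several cells later.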

For this template, we are reading in cyber security regulation PDFs. We need to extract the text from the PDFs, chunk that text, and feed it into our vector database as embeddings.

There are many ways to extract text from PDFs. I will go through a free way that does the job and a paid way that does an even better job, which you might want for production use cases.


We want each PDF returned as a list of strings, one string per page. This will be helpful for chunking, as we'll see later. The first approach is to use `PdfReader` from the free, open source pypdf library. This approach is fine. However, because PDFs have artifacts and strange formatting, the parsed text may be slightly off or contain artifacts that could confuse the LLM.


In [6]:
from pypdf import PdfReader


def extract_pages_of_text_from_pdf(pdf_path: str) -> list[str]:
    reader = PdfReader(pdf_path)
    return [page.extract_text() for page in reader.pages]

A more sophisticated approach is to use a service like Llama Parse, which uses ML to parse PDFs into markdown. Markdown can be easier for an LLM to ingest and can better represent the structure of a PDF that isn't just text. You can also pass Llama Parse instructions describing how the PDF should be parsed. For our use case, those instructions are pretty straightforward.

This approach takes significantly longer (5-10x longer). It may not be worth it for your use-case.


In [8]:
from llama_parse import LlamaParse
import nest_asyncio

# we need to call this in the colab notebook because llama parsing uses an async api under the hood
nest_asyncio.apply()


def extraxt_pages_of_text_from_pdf_llama_parse(
    pdf_path: str, parsing_instructions: str
) -> list[str]:
    parser = LlamaParse(
        api_key=os.environ["LLAMA_CLOUD_API_KEY"],
        result_type="markdown",
        # LlamaParse's keyword is singular (`parsing_instruction`)
        parsing_instruction=parsing_instructions,
    )
    documents = parser.load_data(pdf_path)
    return [d.text for d in documents]

Let's create a function to take in a directory, loop through all of the PDFs in the directory, and parse them.


In [9]:
from pathlib import Path

parsing_instructions = "The provided document is a regulatory document written by the Department of Financial Services. It only contains legal text."


def read_pdfs_in_directory(directory: Path) -> dict[str, list[str]]:
    all_pdfs: dict[str, list[str]] = {}

    for pdf_file in directory.glob("*.pdf"):
        parsed_pdf_pages = extraxt_pages_of_text_from_pdf_llama_parse(
            pdf_path=str(pdf_file), parsing_instructions=parsing_instructions
        )
        # if we wanted to use PdfReader instead
        # parsed_pdf_pages = extract_pages_of_text_from_pdf(pdf_path=str(pdf_file))
        all_pdfs[pdf_file.name] = parsed_pdf_pages

    return all_pdfs

In [10]:
dir_path = Path("files")

# Read all PDFs in the directory
pdf_contents = read_pdfs_in_directory(dir_path)

# Print the first page of each PDF (if available)
for filename, pages in pdf_contents.items():
    if pages:
        print(f"First page of {filename}:")
        print(pages[0][:500])  # Print first 500 characters
        print("\n" + "=" * 50 + "\n")  # Separator
    else:
        print(f"No content found in {filename}")

# Print total number of PDFs processed
print(f"Total PDFs processed: {len(pdf_contents)}")

Started parsing the file under job_id c3168486-67fe-4e49-bf67-21cc128930a7
Started parsing the file under job_id 426ffc3d-adb5-44e3-b446-8741107d3719
Started parsing the file under job_id 9520e113-4144-4931-9f93-d7967d083063
First page of 23.11.1 Regulations - Financial Services_ Final Adoption of the 2nd Amendment to Regulation 23 NYCRR 500_ Assessment of Public Comments.pdf:
# Assessment of Public Comments on the Revised Proposed Second Amendment to 23 NYCRR Part 500

The New York State Department of Financial Services (“Department” or “DFS”) received comments from banking, insurance, and other industry groups, regulated organizations, unregulated businesses, and members of a law school law society regarding the Second Amendment to 23 NYCRR Part 500 (the “Cybersecurity Regulation”).

Commenters made comments that were previously submitted and that DFS addressed in i


First page of 23.11.21 Clean Second Amendment Part 500 FINAL.pdf:
# NEW YORK STATE

# DEPARTMENT OF FINANCIAL SERVICE

Great, now `pdf_contents` is a dictionary with the filename as the key and a list of strings as the value. The list represents the text of the file, and each string is one page.


Now that we have the parsed text for our knowledge base, we need to 1) turn it into a format that we can query for semantic matches, and 2) store that format in a database we can query.

We will break the text up into chunks. Each chunk will be converted into an embedding, which is a list of numbers (a vector) that stores the semantic meaning of that chunk.

There are many models that turn text chunks into embeddings. Some are open source and some are closed source. It can be helpful to use an embedding model that is trained specifically for your use case. In this case, Voyage AI offers `voyage-law-2`, an embedding model tuned for legal text.

Since our knowledge base consists of legal documents, we will use it.


In [14]:
import voyageai

voyage_client = voyageai.Client(api_key=os.environ["VOYAGE_API_KEY"])


def embed_text(text: str) -> list[float]:
    return voyage_client.embed([text], model="voyage-law-2").embeddings[0]

In [16]:
test_embedding = embed_text("This is a test to get the embedding dim")
print(f"this model creates embeddings of {len(test_embedding)} dimensions")
print(test_embedding[:10])

this model creates embeddings of 1024 dimensions
[-0.030661506578326225, 0.005531651899218559, 0.030176803469657898, 0.011611496098339558, -0.0042946781031787395, -0.10320452600717545, 0.0400349460542202, -0.015299078077077866, 0.007095373701304197, 0.06134941056370735]


As you can see, this model returns a vector of length 1024 for this test string. This vector is what we'll store in our vector database for querying.


Now that we have a way to turn our text chunks into embeddings with semantic meaning, we need a place to store these chunks and query them.

There are many open and closed source vector databases, as well as Postgres extensions, for storing and querying vectors.

We will use Milvus as our Vector Database. It is open source, fast and efficient, highly scalable, and easy to use.


In [17]:
from pymilvus import MilvusClient

milvus_client = MilvusClient(
    uri=os.environ["MILVUS_URI"], token=os.environ["MILVUS_API_KEY"]
)

collection_name = "legal_tutorial"

# to get the embedding dimension (how long the vectors the embedding model generates are), we can either look it up online or just see by testing
# in this case, voyage-law-2 produces embeddings of length 1024, but let's still do this test in case we swap it out for a different model
test_embedding = embed_text("Test to see how long the embedding vectors are")
embedding_dim = len(test_embedding)
print(f"{embedding_dim=}")

# if the collection already exists, delete it so we can create it again fresh
if milvus_client.has_collection(collection_name):
    milvus_client.drop_collection(collection_name)

milvus_client.create_collection(
    collection_name=collection_name,
    dimension=embedding_dim,
    metric_type="IP",  # Inner product distance
    consistency_level="Strong",  # Strong consistency level
)

embedding_dim=1024
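
The collection above is created with `metric_type="IP"` (inner product). For vectors of length 1, the inner product equals cosine similarity, so both metrics rank results identically. Whether a given embedding model returns unit-length vectors is worth confirming in its documentation. The sketch below is not part of the pipeline; it shows the relationship on toy 2-dimensional vectors, and `normalize` is a hypothetical helper you could apply if your vectors were not already normalized.

```python
import math


def inner_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to length 1 (hypothetical helper, for illustration)."""
    norm = math.sqrt(inner_product(v, v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


a, b = [3.0, 4.0], [1.0, 2.0]
# After normalization, the inner product matches cosine similarity.
print(cosine_similarity(a, b))
print(inner_product(normalize(a), normalize(b)))
```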


In [20]:
# define the db model we are going to insert into the vectorDB
from pydantic import BaseModel


class ChunkForDB(BaseModel):
    id: int
    vector: list[float]
    text: str
    filename: str

You might be wondering how vector databases work conceptually. Each vector produced by our embedding model has a length of 1024, so imagine a 1024-dimensional space in which every chunk's vector is a point. To find "matches" for a query, we embed the query into a 1024-dimensional vector, place it in the same space, and measure how close it is to the stored vectors (here, by inner product). We then return the nearest k vectors.
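
To make that picture concrete, here is a toy brute-force version of the search in 3 dimensions instead of 1024. It is only a sketch of the concept: `top_k` is a hypothetical helper, and real vector databases typically rely on approximate indexes rather than scoring every stored vector.

```python
def inner_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k(
    query: list[float], stored: list[list[float]], k: int
) -> list[tuple[int, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(idx, inner_product(query, vec)) for idx, vec in enumerate(stored)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Four toy "chunk" vectors; index 0 and 1 point roughly the same way as the query.
stored_vectors = [
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query_vector = [0.9, 0.1, 0.0]

for idx, score in top_k(query_vector, stored_vectors, k=2):
    print(f"vector {idx} scored {score:.2f}")
```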


Now let's add the PDF chunks into the vector DB.

You might be wondering, how do you decide how to chunk the text? What if you chunk the text right in the middle of some semantic meaning?

Chunking is an art and the most effective chunk size is use-case dependent. Some strategies involve an overlapping window for each chunk. More advanced strategies involve Semantic Chunking, where you actually pre-embed the text to see which parts are not semantically similar and _should not_ be chunked together.

For example, we wouldn't want a single chunk containing many different subjects or meanings. Ideally, each chunk represents one idea that can fit clearly into the embeddings space.

Remember, we only have 1024 dimensions to encode the semantic meaning for each chunk. If we have many meanings in a single chunk, we will end up encoding weaker signals for each meaning.

We will go through two approaches. The first is the naive approach of creating one chunk per page. Then, we will implement Semantic Chunking. If you are looking for something more sophisticated than the naive approach but more efficient than Semantic Chunking (which is resource intensive and takes a relatively long time), there are many other approaches worth researching.
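
One of the simpler in-between strategies is the overlapping window mentioned above. The sketch below is not used in the rest of this notebook; it splits on whitespace and counts words, which is an assumption made for brevity (a real pipeline might count tokens or characters instead). The function name and parameters are hypothetical.

```python
def chunk_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into word windows of `chunk_size` that share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + chunk_size]))
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


print(chunk_with_overlap("a b c d e f g", chunk_size=4, overlap=2))
```

Because neighboring windows share words, a sentence cut at one chunk boundary still appears whole in the adjacent chunk, at the cost of storing some text twice.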


Here is the naive approach: go page by page and embed each page as a single chunk.


In [21]:
from tqdm import tqdm


def naive_chunking_per_page() -> list[ChunkForDB]:
    # one chunk (and one embedding) per page, with a running integer id
    embeddings: list[ChunkForDB] = []
    index = 0
    for filename, pages in pdf_contents.items():
        for page in tqdm(pages, desc=f"Creating embeddings for file {filename}."):
            embeddings.append(
                ChunkForDB(
                    id=index, vector=embed_text(page), text=page, filename=filename
                )
            )
            index = index + 1
    return embeddings

Next, we will implement Semantic Chunking. To get more in depth on how it works, check out this video: `https://youtu.be/TcRRfcbsApw?si=DdflUrsZHsxpdd82`.

This takes significantly longer to run, since it has to embed the text piece by piece just to decide where to split it.
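
As a rough illustration of the idea (not how LangChain or LlamaIndex implement it internally), the sketch below measures the distance between neighboring sentences and starts a new chunk wherever that distance reaches a percentile threshold. `toy_embed` is a bag-of-words stand-in for a real embedding model, and the nearest-rank percentile is an assumption chosen for simplicity.

```python
import math
import re
from collections import Counter


def toy_embed(sentence: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


def semantic_chunks(sentences: list[str], percentile: float = 95.0) -> list[str]:
    if len(sentences) < 2:
        return list(sentences)
    vectors = [toy_embed(s) for s in sentences]
    distances = [
        cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)
    ]
    # Nearest-rank percentile of the distances between neighboring sentences.
    ranked = sorted(distances)
    rank = max(1, math.ceil(percentile / 100 * len(ranked)))
    threshold = ranked[rank - 1]

    chunks: list[str] = []
    current = [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance >= threshold:  # large semantic jump: start a new chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


example_sentences = [
    "Covered entities must encrypt nonpublic information.",
    "Encryption must protect nonpublic information in transit.",
    "The annual penetration testing report is due in April.",
    "The penetration testing report is reviewed by the board.",
]
for chunk in semantic_chunks(example_sentences):
    print(chunk)
```

With these four sentences, the largest jump falls between the encryption sentences and the penetration testing sentences, so the toy splitter produces two chunks.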

I have two examples of Semantic Chunking. The first uses LangChain; its call is commented out below. The second, which is the one we actually run, uses LlamaIndex.

LangChain and LlamaIndex are two open source frameworks that provide helper functions for many LLM use cases. They have great documentation. For something complicated like Semantic Chunking, I am outsourcing the logic to them. Consult their documentation for more in depth information.


In [33]:
# using langchain
from langchain_experimental.text_splitter import SemanticChunker
from langchain_voyageai import VoyageAIEmbeddings


def semantic_chunks_from_files_langchain(
    pages_by_filename: dict[str, list[str]],
) -> list[ChunkForDB]:
    embed_model = VoyageAIEmbeddings(
        voyage_api_key=os.environ["VOYAGE_API_KEY"], model="voyage-law-2"
    )
    # you can find more information about how LangChain's SemanticChunker works in their docs
    semantic_chunker = SemanticChunker(
        embed_model, breakpoint_threshold_type="percentile"
    )

    chunks_for_db: list[ChunkForDB] = []
    index = 0
    for filename, pages in pages_by_filename.items():
        print(f"Creating semantic embeddings for file {filename}.")
        semantic_chunks = semantic_chunker.create_documents(texts=pages)
        for chunk in semantic_chunks:
            chunks_for_db.append(
                ChunkForDB(
                    id=index,
                    vector=embed_text(chunk.page_content),
                    text=chunk.page_content,
                    filename=filename,
                )
            )
            index = index + 1
    return chunks_for_db

In [34]:
# using llama_index
from llama_index.core import Document
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings.voyageai import VoyageEmbedding


def semantic_chunks_from_files_llama_index(
    pages_by_filename: dict[str, list[str]],
) -> list[ChunkForDB]:
    embed_model = VoyageEmbedding(
        voyage_api_key=os.environ["VOYAGE_API_KEY"], model_name="voyage-law-2"
    )
    # you can find more information about how Llama Index's SemanticSplitterNodeParser works in their docs
    semantic_chunker = SemanticSplitterNodeParser(
        buffer_size=1, breakpoint_percentile_threshold=95, embed_model=embed_model
    )

    chunks_for_db: list[ChunkForDB] = []
    index = 0
    for filename, pages in pages_by_filename.items():
        print(f"Creating semantic embeddings for file {filename}.")
        semantic_chunks = semantic_chunker.get_nodes_from_documents(
            [Document(text=p) for p in pages], show_progress=True
        )
        for chunk in semantic_chunks:
            chunks_for_db.append(
                ChunkForDB(
                    id=index,
                    vector=embed_text(chunk.text),
                    text=chunk.text,
                    filename=filename,
                )
            )
            index = index + 1
    return chunks_for_db

In [35]:
# we'll use the LlamaIndex version for now
# chunks_for_db = semantic_chunks_from_files_langchain(pages_by_filename=pdf_contents)
chunks_for_db = semantic_chunks_from_files_llama_index(pages_by_filename=pdf_contents)

Creating semantic embeddings for file 23.11.1 Regulations - Financial Services_ Final Adoption of the 2nd Amendment to Regulation 23 NYCRR 500_ Assessment of Public Comments.pdf.


Generating embeddings: 100%|██████████| 22/22 [00:01<00:00, 11.88it/s]
Generating embeddings: 100%|██████████| 9/9 [00:00<00:00, 10.55it/s]
Generating embeddings: 100%|██████████| 11/11 [00:01<00:00,  9.42it/s]
Generating embeddings: 100%|██████████| 6/6 [00:00<00:00,  9.90it/s]
Generating embeddings: 100%|██████████| 8/8 [00:00<00:00, 11.15it/s]
Generating embeddings: 100%|██████████| 10/10 [00:00<00:00, 12.35it/s]
Generating embeddings: 100%|██████████| 12/12 [00:01<00:00, 10.94it/s]
Generating embeddings: 100%|██████████| 10/10 [00:01<00:00,  9.91it/s]
Generating embeddings: 100%|██████████| 10/10 [00:00<00:00, 11.89it/s]
Generating embeddings: 100%|██████████| 8/8 [00:00<00:00,  9.80it/s]
Generating embeddings: 100%|██████████| 12/12 [00:01<00:00, 10.70it/s]
Generating embeddings: 100%|██████████| 9/9 [00:00<00:00,  9.87it/s]
Generating embeddings: 100%|██████████| 9/9 [00:00<00:00,  9.77it/s]
Generating embeddings: 100%|██████████| 13/13 [00:01<00:00, 11.76it/s]
Generating embeddi

Creating semantic embeddings for file 23.11.21 Clean Second Amendment Part 500 FINAL.pdf.


Generating embeddings: 100%|██████████| 13/13 [00:01<00:00, 12.49it/s]
Generating embeddings: 100%|██████████| 16/16 [00:01<00:00, 11.79it/s]
Generating embeddings: 100%|██████████| 4/4 [00:00<00:00,  7.93it/s]
Generating embeddings: 100%|██████████| 9/9 [00:00<00:00, 10.40it/s]
Generating embeddings: 100%|██████████| 3/3 [00:00<00:00,  6.45it/s]
Generating embeddings: 100%|██████████| 4/4 [00:00<00:00,  8.15it/s]
Generating embeddings: 100%|██████████| 9/9 [00:00<00:00, 11.61it/s]
Generating embeddings: 100%|██████████| 5/5 [00:00<00:00,  9.07it/s]
Generating embeddings: 100%|██████████| 5/5 [00:00<00:00,  9.26it/s]
Generating embeddings: 100%|██████████| 11/11 [00:01<00:00,  9.86it/s]
Generating embeddings: 100%|██████████| 4/4 [00:00<00:00,  5.58it/s]
Generating embeddings: 100%|██████████| 9/9 [00:00<00:00,  9.02it/s]
Generating embeddings: 100%|██████████| 9/9 [00:01<00:00,  8.70it/s]
Generating embeddings: 100%|██████████| 17/17 [00:01<00:00, 13.41it/s]
Generating embeddings: 100

Creating semantic embeddings for file 23.5.17 APC.pdf.


Generating embeddings: 100%|██████████| 6/6 [00:00<00:00,  6.72it/s]
Generating embeddings: 100%|██████████| 14/14 [00:00<00:00, 14.87it/s]
Generating embeddings: 100%|██████████| 14/14 [00:01<00:00, 10.02it/s]
Generating embeddings: 100%|██████████| 10/10 [00:01<00:00,  9.16it/s]
Generating embeddings: 100%|██████████| 11/11 [00:01<00:00, 10.51it/s]
Generating embeddings: 100%|██████████| 10/10 [00:01<00:00,  8.54it/s]
Generating embeddings: 100%|██████████| 10/10 [00:01<00:00,  9.23it/s]
Generating embeddings: 100%|██████████| 11/11 [00:01<00:00,  9.97it/s]
Generating embeddings: 100%|██████████| 12/12 [00:01<00:00, 10.74it/s]
Generating embeddings: 100%|██████████| 13/13 [00:01<00:00, 10.62it/s]
Generating embeddings: 100%|██████████| 8/8 [00:00<00:00,  9.90it/s]
Generating embeddings: 100%|██████████| 11/11 [00:00<00:00, 11.99it/s]
Generating embeddings: 100%|██████████| 12/12 [00:01<00:00, 11.43it/s]
Generating embeddings: 100%|██████████| 10/10 [00:01<00:00,  7.74it/s]
Generating

In [39]:
# see what the first chunk of text and its embedding look like
print(chunks_for_db[0].text)
print(chunks_for_db[0].vector)

# Assessment of Public Comments on the Revised Proposed Second Amendment to 23 NYCRR Part 500

The New York State Department of Financial Services (“Department” or “DFS”) received comments from banking, insurance, and other industry groups, regulated organizations, unregulated businesses, and members of a law school law society regarding the Second Amendment to 23 NYCRR Part 500 (the “Cybersecurity Regulation”).

Commenters made comments that were previously submitted and that DFS addressed in its initial assessment of public comments, which were published in the State Register on June 28, 2023 (the “Assessment”). See the Assessment for detailed responses to those comments. DFS received the following new comments.

Commenters stated their support for the following changes in the Cybersecurity Regulation:

1. Providing clarification that only those affiliates sharing information systems, cybersecurity resources, or all or any part of a cybersecurity program with a covered entity should 

In [43]:
# insert the embeddings into the vector database
# milvus takes in a list of dictionaries for data, so dump the pydantic model into a dictionary
milvus_client.insert(
    collection_name=collection_name, data=[e.model_dump() for e in chunks_for_db]
)

{'insert_count': 302, 'ids': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 

Now it's time to put it all together to get some results from queries!

First, we will take in our user's query and embed it. Then we compare this embedding to the embeddings of the chunks in our database. This is the `milvus_client.search` step.

Milvus will return the top `n_chunks_from_db` chunks that match the query. We could then return those chunks to our LLM for context. But we can do even better.

There are specific models that are trained to take in queries, chunks of context, and rank those chunks based on relevance to the query. This allows us to give the LLM the most relevant results, sorted by relevance from a model specifically trained for this task.

LLMs work best when they have high quality, focused information without noise. That is what this step is helping to achieve.

We will use Voyage AI's reranker model to do this. The `voyage_client.rerank` step will return `n_chunks_from_reranking` chunks. This should filter out the less relevant chunks and return the relevant ones in order of relevance. From there, we can send those chunks to our LLM for context.


In [44]:
def return_scored_chunks(
    query: str, n_chunks_from_db: int = 20, n_chunks_from_reranking: int = 10
) -> list[dict]:
    # returns a list of { chunk: str, relevance_score: float }

    # first, get the raw documents from the vector db
    search_res = milvus_client.search(
        collection_name=collection_name,
        # Use the `embed_text` function to convert the question to an embedding vector
        data=[embed_text(query)],
        limit=n_chunks_from_db,  # Return the top `n_chunks_from_db` results
        search_params={"metric_type": "IP", "params": {}},  # Inner product distance
        output_fields=["text"],  # Return the text field
    )

    # milvus already gives a distance but we want to use a better re-ranking strategy.
    # We will run the query and the chunks through the voyage reranker
    text_with_distances: list[dict] = [
        {"text": res["entity"]["text"], "distance": res["distance"]}
        for res in search_res[0]
    ]

    reranking = voyage_client.rerank(
        query,
        [i["text"] for i in text_with_distances],
        model="rerank-2",
        top_k=n_chunks_from_reranking,
    )
    # for r in reranking.results:
    # print(f"Document: {r.document}")
    # print(f"Relevance Score: {r.relevance_score}")

    return sorted(
        [
            {"chunk": r.document, "relevance_score": r.relevance_score}
            for r in reranking.results
        ],
        key=lambda x: x["relevance_score"],
        reverse=True,
    )

Finally, we can create our system and user prompts to send to our LLM. In this case, we are using gpt-4o.


In [45]:
from openai import OpenAI

SYSTEM_PROMPT = """
You are a helpful AI assistant that provides accurate answers to user questions using only the information provided in the context below. Do not introduce any information not included in the context. If you cannot answer the question based on the context, politely inform the user that you don't have enough information to answer. Do not mention the context or question in your response.
"""

USER_PROMPT = """
Context:
<<CONTEXT>>

Question:
<<QUESTION>>
"""


def get_response(query: str) -> None:  # prints the answer rather than returning it
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    chunk_results = return_scored_chunks(
        query=query, n_chunks_from_db=20, n_chunks_from_reranking=10
    )
    chunks = [i["chunk"] for i in chunk_results]

    user_prompt = USER_PROMPT.replace("<<QUESTION>>", query).replace(
        "<<CONTEXT>>", "\n".join(chunks)
    )

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
    )
    print(response.choices[0].message.content)

Now we can test our code.


In [46]:
query = "what are the main things banks have to do if they are hacked?"
get_response(query=query)

Banks, as well as other regulated entities, must do the following if they are hacked:

1. **Conduct Incident Response:** They must have adequate written incident response plans in place designed to detect, respond to, and recover from unauthorized access to or use of customer information.

2. **Perform a Scope Assessment:** Conduct a scope assessment to determine the impact and extent of the incident, even though an explicit reference in the regulation is considered unnecessary.

3. **Notify Authorities:** Timely notifications are required to be given to the Department of Financial Services (DFS) to help identify techniques used by attackers and enable a quick response to new threats. The timeframe for these notifications should not be delayed for any reason.

4. **Recover Information:** Covered entities should have plans to recover information from backups, despite concerns about the potential compromise of backups.

5. **Annual Testing:** Test their incident response plans at least a

In [47]:
query = "describe the cybersecurity program"
get_response(query=query)

The cybersecurity program described must protect the confidentiality, integrity, and availability of the covered entity’s information systems and nonpublic information stored on those systems. It must be based on the entity’s risk assessment and designed to:

1. Identify and assess internal and external cybersecurity risks.
2. Use defensive infrastructure and policies to protect information systems and nonpublic information.
3. Detect cybersecurity events.
4. Respond to detected cybersecurity events to mitigate negative effects.
5. Recover from cybersecurity events and restore normal operations.
6. Fulfill applicable regulatory reporting obligations.

The program must ensure the safety and soundness of the institution and the protection of its customers’ information. Senior management must take the issue seriously and confirm compliance through an annual certification. The Chief Information Security Officer (CISO) must have adequate authority and resources to implement and maintain the

Looks like it works!


Other improvements to make in the future:

- BM25 / hybrid search, so keywords can be used in searching
- [Add context to chunks](https://www.anthropic.com/news/contextual-retrieval)
- Add monitoring to the RAG pipeline with a product like `LangSmith`
- Give the previous and next chunk in addition to the matched chunk to the LLM for more context.
- Add the ability for the LLM to cite the exact sources
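
As one example from this list, the previous/next chunk idea can lean on the fact that chunk ids in this notebook are assigned sequentially within each file. The sketch below works on an in-memory list of dictionaries shaped like `ChunkForDB`; in the real pipeline the lookup would go to the vector database instead. `with_neighbors` is a hypothetical helper, not part of the code above.

```python
def with_neighbors(chunks: list[dict], matched_id: int) -> list[dict]:
    """Return the matched chunk plus its previous and next chunk from the same file."""
    by_id = {chunk["id"]: chunk for chunk in chunks}
    matched = by_id[matched_id]
    result = []
    for candidate_id in (matched_id - 1, matched_id, matched_id + 1):
        candidate = by_id.get(candidate_id)
        # Skip ids that do not exist or that belong to a different file.
        if candidate is not None and candidate["filename"] == matched["filename"]:
            result.append(candidate)
    return result


toy_chunks = [
    {"id": 0, "filename": "a.pdf", "text": "A0"},
    {"id": 1, "filename": "a.pdf", "text": "A1"},
    {"id": 2, "filename": "b.pdf", "text": "B0"},
]
print([c["text"] for c in with_neighbors(toy_chunks, matched_id=1)])
```

The filename check keeps the last chunk of one document from pulling in the first chunk of the next.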
