# Hands On with Advanced AI & Emerging Applications

## Requirements

* Definition of a Large Language Model (LLM)
* Definition of a Prompt

## Retrieval Augmented Generation

Retrieval-Augmented Generation (RAG) is a natural language processing technique that enhances the capabilities of Large Language Models (LLMs) by integrating information retrieval. It improves the responses of LLMs by incorporating up-to-date or user-defined data that the LLM was not originally trained on. A RAG system is typically composed of two processes: Indexing, and Retrieval + Generation.

This module uses [LangChain](https://www.langchain.com/) to interact with LLMs. LangChain is a framework designed to facilitate the development of applications powered by LLMs. It provides an abstraction layer by decomposing applications into a set of _chains_ made of prompts, models and data sources. LangChain offers libraries to interact with most of the foundation models available on the market.

In this module we will use [Amazon Bedrock](https://aws.amazon.com/bedrock/), a fully managed service from AWS that offers access to several foundation models. We have chosen [Claude Sonnet](https://www.anthropic.com/claude/sonnet) from Anthropic as the LLM to use.

### Setting up connection to Amazon Bedrock

In [None]:
!pip install -q langchain-community langchain_aws boto3

**Note**: Use this if running from Google Colab

In [None]:
import boto3
from langchain_aws import ChatBedrock

session = boto3.Session(
    aws_access_key_id="your-key-id",
    aws_secret_access_key="your-key",
    region_name="us-east-1"
)

llm = ChatBedrock(
    model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    model_kwargs=dict(temperature=0),
    client=session.client("bedrock-runtime")
)
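
Hardcoding keys in a notebook makes them easy to leak. As a minimal sketch, the session arguments could instead be read from the standard AWS environment variables. The helper name `bedrock_session_kwargs` is hypothetical and is not part of boto3 or LangChain.

```python
import os


def bedrock_session_kwargs(env=None):
    """Build keyword arguments for boto3.Session from environment variables.

    Hypothetical helper for illustration; not part of boto3 or LangChain.
    """
    env = os.environ if env is None else env
    kwargs = {"region_name": env.get("AWS_DEFAULT_REGION", "us-east-1")}
    key_id = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    # Pass explicit keys only when both are present; otherwise boto3
    # falls back to its own default credential lookup.
    if key_id and secret:
        kwargs["aws_access_key_id"] = key_id
        kwargs["aws_secret_access_key"] = secret
    return kwargs


# Placeholder values, used only to show the shape of the result
example_env = {"AWS_ACCESS_KEY_ID": "your-key-id", "AWS_SECRET_ACCESS_KEY": "your-key"}
print(sorted(bedrock_session_kwargs(example_env)))
```

The result could then be passed along as `boto3.Session(**bedrock_session_kwargs())`.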

**Note**: Use this if running from AWS directly

In [None]:
from langchain_aws import ChatBedrock

llm = ChatBedrock(
    model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    model_kwargs=dict(temperature=0),
    region="us-east-1"
)

In [None]:
ai_msg = llm.invoke("Who is the president of the USA?")
print(ai_msg.content)

**Note**: See how Claude 3.7 Sonnet, an LLM released in February 2025, answers from training data with a cutoff date, so its response may no longer be accurate.

### Indexing

The indexing phase is responsible for organizing data so that it can be efficiently retrieved by applications powered by large language models (LLMs). To scale Retrieval-Augmented Generation (RAG) systems and handle large datasets, vector databases are commonly used. These databases store information and facilitate efficient data indexing and querying.

For the purpose of this section we are going to load the content of the markdown file `acme_guidelines.md`. This document contains the guidelines of the fictional company `ACME` about the IP address schemes to be used in device configurations.


LangChain offers a vast number of classes to load documents, ranging from text files (`.txt`) to HTML and PDF documents. To load the `acme_guidelines.md` file we are going to use the `TextLoader` class from the `langchain_community.document_loaders` module.


In [None]:
from langchain_community.document_loaders import TextLoader

loader = TextLoader("acme_guidelines.md")
documents = loader.load()
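
To make the result of loading more concrete, here is a minimal sketch of what a text loader conceptually produces: a list with one document object holding the file text plus some metadata. `SimpleDocument` and `load_text` are hypothetical stand-ins written for this example, and the sample file content is invented.

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile


@dataclass
class SimpleDocument:
    """Stand-in for a document object: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path):
    """Read one text file and wrap it in a single-element list."""
    text = Path(path).read_text(encoding="utf-8")
    return [SimpleDocument(page_content=text, metadata={"source": str(path)})]


# Demo with a temporary file whose content is invented for this example
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.md"
    sample_path.write_text("# Title\n\nPlaceholder text.\n", encoding="utf-8")
    sample_docs = load_text(sample_path)

print(len(sample_docs), sample_docs[0].metadata["source"])
```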

#### Splitting

As a best practice, data to be stored in the Vector DB is usually split into smaller _chunks_. This makes the _querying_ stage more efficient, as only the relevant pieces of the data are retrieved. Additionally, large chunks may later not fit in the LLM context window.

There are many strategies to split a text, for example based on special characters (such as HTML tags) or punctuation marks. In our example we are using yet another LangChain class to split the text. The `MarkdownHeaderTextSplitter` allows us to split markdown files based on _headers_ or _sections_.



In [None]:
from langchain_text_splitters import MarkdownHeaderTextSplitter

headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
    ("###", "Header 3"),
]
md_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)

md_splits = md_splitter.split_text(documents[0].page_content)
md_splits[3]
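
To illustrate the idea behind header-based splitting, the following is a simplified pure-Python sketch. It is not the library's implementation: `split_by_headers` is a hypothetical function, and the sample text is invented. Each chunk records the headers that were active when its text appeared.

```python
import re


def split_by_headers(markdown_text, max_level=3):
    """Split Markdown into chunks, each tagged with its active headers."""
    header_re = re.compile(r"^(#{1,%d})\s+(.*)$" % max_level)
    chunks, active, body = [], {}, []

    def flush():
        # Emit the text collected so far, if any, with the current headers
        text = "\n".join(body).strip()
        if text:
            metadata = {f"Header {lvl}": active[lvl] for lvl in sorted(active)}
            chunks.append({"metadata": metadata, "page_content": text})
        body.clear()

    for line in markdown_text.splitlines():
        match = header_re.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # A new header replaces headers of the same or deeper level
            for stale in [lvl for lvl in active if lvl >= level]:
                del active[stale]
            active[level] = match.group(2).strip()
        else:
            body.append(line)
    flush()
    return chunks


sample_md = """# Guide
Intro placeholder.
## Section A
Placeholder A.
### Subsection A1
Placeholder A1.
## Section B
Placeholder B.
"""

demo_chunks = split_by_headers(sample_md)
for chunk in demo_chunks:
    print(chunk["metadata"], "->", chunk["page_content"])
```

Keeping the headers as metadata means every chunk still carries the section it came from, even though the header lines are no longer part of its text.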

Another way to split text files is to use the class `RecursiveCharacterTextSplitter`, which allows us to define the separators between chunks, the size of each chunk and the number of characters that overlap between consecutive chunks.

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n"], 
    chunk_size=200,
    chunk_overlap=0,
    add_start_index=True,
)
all_splits = text_splitter.split_documents(documents)

At this point we can inspect the content of a chunk by displaying the attribute `page_content`.

In [None]:
all_splits[3].page_content
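
The roles of the chunk size and the chunk overlap can be seen in a deliberately simplified sliding-window sketch. This is not the recursive algorithm used by the library, and `sliding_chunks` is a hypothetical function: it only shows how the two parameters interact on a plain string.

```python
def sliding_chunks(text, chunk_size, chunk_overlap=0):
    """Cut text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


demo_text = "abcdefghij"
print(sliding_chunks(demo_text, chunk_size=4, chunk_overlap=0))
print(sliding_chunks(demo_text, chunk_size=4, chunk_overlap=2))
```

With an overlap of 2, each chunk repeats the last two characters of the previous one, which helps keep context that would otherwise be cut at a chunk boundary.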

#### Embedding

The next and last stage in the _Indexing_ process is to actually store the chunks in a Vector DB.

As the name implies, a Vector DB saves data as **vectors**, not as raw text. Therefore an embedding function must transform the chunks into vectors. In this task we are using an _In-Memory_ database and [Amazon Titan Embeddings](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html).

In [None]:
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_aws import BedrockEmbeddings

**Note**: Use this if running from Google Colab

In [None]:
embeddings = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v2:0", 
    client=session.client("bedrock-runtime")
)

**Note**: Use this if running from AWS directly

In [None]:
embeddings = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v2:0",
    region_name='us-east-1'
)

In [None]:
vector_store = InMemoryVectorStore(embeddings)
vector_store.add_documents(documents=md_splits)

Note that we have used the variable `md_splits`, which holds the chunks generated by the `MarkdownHeaderTextSplitter`.

Let us examine the first two entries in the Vector DB. Note that the dimension of the embeddings is 1024. Only the first three items of each vector are displayed.

In [None]:
for index, (doc_id, doc) in enumerate(vector_store.store.items()):
    if index < 2:
        # Show the first three components and the total dimension of each vector
        print(f"{doc_id}: {doc['vector'][0:3]} ({len(doc['vector'])})")
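
To show what "storing vectors instead of raw text" means, here is a toy sketch with a stand-in embedding function. `toy_embed` is hypothetical and has nothing to do with a real embedding model: it simply hashes words into a small fixed-size vector. The dictionary layout only loosely mirrors an in-memory store.

```python
import math
import zlib


def toy_embed(text, dim=8):
    """Map text to a fixed-size unit vector by hashing its words (toy only)."""
    vector = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike the built-in hash()
        vector[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


toy_store = {}
for position, chunk_text in enumerate(["first placeholder chunk", "second placeholder chunk"]):
    toy_store[f"doc-{position}"] = {"vector": toy_embed(chunk_text), "text": chunk_text}

for toy_id, entry in toy_store.items():
    print(toy_id, len(entry["vector"]), entry["text"])
```

A real embedding model produces vectors whose geometry reflects meaning; this toy version only reflects which words appear.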


#### Similarity Search

Before actually implementing the retrieval stage with the LLM, let us test the _querying_ of the Vector DB with the method `similarity_search`.

In [None]:
query = "approved security mechanisms"
results = vector_store.similarity_search(query, k=2)
for result in results:
    print(result.page_content)

**Note:** See how the similarity search takes into account the semantic meaning of the text rather than exact words or grammar.
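
A similarity search boils down to comparing the query vector with every stored vector and keeping the closest ones. The sketch below uses cosine similarity, a common choice for this comparison. The three-dimensional vectors and their labels are invented for illustration, and `top_k` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vector, vectors_by_text, k=2):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in vectors_by_text.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hand-made vectors: the first two point in a similar direction
toy_vectors = {
    "password policy": [0.9, 0.1, 0.0],
    "encryption rules": [0.8, 0.3, 0.1],
    "cafeteria menu": [0.0, 0.2, 0.9],
}
toy_query_vector = [1.0, 0.2, 0.0]

for score, text in top_k(toy_query_vector, toy_vectors, k=2):
    print(f"{score:.3f}  {text}")
```

Because the ranking depends only on vector direction, two texts with no words in common can still score as similar if the embedding model places them close together.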

### Retrieval Using LangChain LCEL

In this task, we are going to use LangChain constructs, namely _chains_, to test the capabilities of the RAG system. As a first step we will invoke the LLM without using RAG, just to see the difference.

In [None]:
from langchain import hub
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import PromptTemplate

query = "What IPs are allowed in Ethernet Interfaces?"

#### Query without RAG

In [None]:
chain = PromptTemplate.from_template(template=query) | llm
result = chain.invoke(input={})
result.content

See how, when using a basic prompt and invoking the LLM directly, the generated response is very generic. The LLM claims that any IP address can be used on an Ethernet interface.

#### Querying with RAG

The idea behind RAG is to provide more context to the LLM, so that it can use it to generate a response. In our current example the context is the set of policies listed in the `acme_guidelines.md` document. We will use a prompt that, apart from specifying the question to the LLM, also gives it context:
```
HUMAN

You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.

Question: {question} 

Context: {context} 

Answer:
```

In [None]:
retrieval_qa_chat_prompt = hub.pull("rlm/rag-prompt")
retrieval_qa_chat_prompt.messages[0]
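
The `{question}` and `{context}` placeholders are filled in before the prompt reaches the LLM. As a minimal sketch of that substitution with plain string formatting (the template text is abbreviated, `build_rag_prompt` is a hypothetical helper, and the chunks are placeholders):

```python
DEMO_RAG_TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question.\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer:"
)


def build_rag_prompt(question, context_chunks):
    """Join the retrieved chunks and substitute both placeholders."""
    context = "\n\n".join(context_chunks)
    return DEMO_RAG_TEMPLATE.format(question=question, context=context)


demo_prompt = build_rag_prompt(
    "What IPs are allowed on Ethernet Interfaces?",
    ["placeholder chunk 1", "placeholder chunk 2"],
)
print(demo_prompt)
```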

Note how the `rag_chain` is made of three components:

1. Two parallel branches to get (a) the `context` and (b) the `question`
2. The RAG prompt
3. The LLM

In [None]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
    
rag_chain = (
    {'context': vector_store.as_retriever() | format_docs, 'question' : RunnablePassthrough()} |
    retrieval_qa_chat_prompt |
    llm
)
result = rag_chain.invoke("What IPs are allowed on Ethernet Interfaces")
result.content

See how in this execution the LLM considered the content of the `acme_guidelines.md` markdown file.
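
The way the `|` operator composes steps, and how the dictionary fans one input out to two branches, can be mimicked with a few lines of plain Python. This is only a conceptual sketch, not how LangChain is implemented: `Step` and `parallel` are hypothetical, and the retriever and LLM are fakes that return canned values.

```python
class Step:
    """Wrap a function so that steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # The output of this step becomes the input of the next one
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


def parallel(**branches):
    """Feed the same input to every branch and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in branches.items()})


# Fakes standing in for the retriever, the prompt and the LLM
fake_retriever = Step(lambda question: ["chunk A", "chunk B"])
join_chunks = Step(lambda chunks: "\n\n".join(chunks))
passthrough = Step(lambda value: value)
fake_prompt = Step(lambda d: f"Question: {d['question']}\nContext: {d['context']}\nAnswer:")
fake_llm = Step(lambda text: f"[fake answer based on {len(text)} prompt characters]")

demo_inputs = parallel(context=fake_retriever | join_chunks, question=passthrough)
demo_chain = demo_inputs | fake_prompt | fake_llm

print((demo_inputs | fake_prompt).invoke("Which IPs?"))
print(demo_chain.invoke("Which IPs?"))
```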

### Retrieval Using LangGraph

In this task, we are going to use [LangGraph](https://www.langchain.com/langgraph) constructs, namely Nodes and Edges, to test the capabilities of the RAG system.

**Note**: This task assumes that the in-memory Vector DB has already been indexed.

In [None]:
!pip install -q langgraph graphviz

We start by defining the _State_ of the graph. The _State_ is the data exchanged between the Nodes.

In [None]:
from langchain_core.documents import Document
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict
from IPython.display import Image, display

class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

In our use case we define two nodes:

*  Retrieve: It queries the Vector DB using similarity search on the input question
*  Generate: It leverages the RAG prompt to give context to the LLM and expand its scope

In [None]:
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"],k=2)
    return {"context": retrieved_docs}

def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = retrieval_qa_chat_prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}

Then we build our graph by defining nodes and edges.

In [None]:
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

In [None]:
display(Image(graph.get_graph().draw_mermaid_png()))

Finally, we invoke our graph.

In [None]:
result = graph.invoke({"question": "What IPs are allowed on Ethernet Interfaces"})
result["answer"]

In [None]:
result["context"]
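
Conceptually, a two-node sequence passes a shared state from one node to the next, and each node returns only the keys it wants to update. The sketch below imitates that flow with plain dictionaries. It is an illustration under simplifying assumptions, not LangGraph itself: `run_sequence`, `toy_retrieve` and `toy_generate` are hypothetical, the corpus is a placeholder, and retrieval is a naive word-overlap ranking.

```python
def run_sequence(nodes, initial_state):
    """Call each node in order and merge its partial update into the state."""
    state = dict(initial_state)
    for node in nodes:
        state.update(node(state))
    return state


TOY_CORPUS = [
    "placeholder text about interfaces",
    "placeholder text about passwords",
]


def toy_retrieve(state):
    """Rank the toy corpus by the number of words shared with the question."""
    words = set(state["question"].lower().split())
    ranked = sorted(
        TOY_CORPUS,
        key=lambda text: len(words & set(text.lower().split())),
        reverse=True,
    )
    return {"context": ranked[:1]}


def toy_generate(state):
    """Stand-in for the LLM call: report what it would have been given."""
    return {"answer": f"Answer drafted from {len(state['context'])} retrieved chunk(s)."}


final_state = run_sequence(
    [toy_retrieve, toy_generate],
    {"question": "which interfaces are covered"},
)
print(final_state)
```

The final state holds the question, the retrieved context and the answer, which is why both `result["answer"]` and `result["context"]` can be inspected after a run.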