![title.png](https://i.ibb.co/2KmT38V/title.png)

![section-breakpoint.png](https://i.ibb.co/344JqH3/section-breakpoint.png)

![objectives.png](https://i.ibb.co/fxbWnNQ/objectives.png)

![image.png](https://i.ibb.co/rHVSp3w/visual-breakpoint.png)

## Notebook setup and dependency installation

In [None]:
!pip install -qU \
    openai==1.30 \
    pinecone-client==4.1.0 \
    datasets==2.19 \
    tqdm

In [None]:
from IPython.display import HTML, display, Markdown
from typing import Dict


def chunk_display_html(chunk: Dict[str, str]) -> None:
    html_template = """
<html>
<head>
<style>
    table {{
        font-family: arial, sans-serif;
        border-collapse: collapse;
        width: 100%;
    }}
    td, th {{
        border: 1px solid #dddddd;
        text-align: left;
        padding: 8px;
    }}
</style>
</head>
<body>
    <table>
        <tr>
            <th>Key</th>
            <th>Value</th>
        </tr>
        <tr>
            <td>Title</td>
            <td>{title}</td>
        </tr>
        <tr>
            <td>doi</td>
            <td>{doi}</td>
        </tr>
        <tr>
            <td>Chunk ID</td>
            <td>{chunk_id}</td>
        </tr>
        <tr>
            <td>chunk</td>
            <td>{chunk}</td>
        </tr>
        <tr>
            <td>id</td>
            <td>{id}</td>
        </tr>
        <tr>
            <td>Summary</td>
            <td>{summary}</td>
        </tr>
        <tr>
            <td>Source</td>
            <td>{source}</td>
        </tr>
        <tr>
            <td>Authors</td>
            <td>{authors}</td>
        </tr>
        <tr>
            <td>Categories</td>
            <td>{categories}</td>
        </tr>
        <tr>
            <td>Comment</td>
            <td>{comment}</td>
        </tr>
        <tr>
            <td>Journal Reference</td>
            <td>{journal_ref}</td>
        </tr>
        <tr>
            <td>Primary Category</td>
            <td>{primary_category}</td>
        </tr>
        <tr>
            <td>Published</td>
            <td>{published}</td>
        </tr>
        <tr>
            <td>Updated</td>
            <td>{updated}</td>
        </tr>
        <tr>
            <td>References</td>
            <td>{references}</td>
        </tr>
    </table>
</body>
</html>
"""

    # Format the HTML with the generated rows
    html_output = html_template.format(
        doi=chunk.get("doi", "N/A"),
        chunk_id=chunk.get("chunk-id", "N/A"),
        chunk=chunk.get("chunk", "N/A"),
        id=chunk.get("id", "N/A"),
        title=chunk.get("title", "N/A"),
        summary=chunk.get("summary", "N/A"),
        source=chunk.get("source", "N/A"),
        authors=chunk.get("authors", "N/A"),
        categories=chunk.get("categories", "N/A"),
        comment=chunk.get("comment", "N/A"),
        journal_ref=chunk.get("journal_ref", "N/A"),
        primary_category=chunk.get("primary_category", "N/A"),
        published=chunk.get("published", "N/A"),
        updated=chunk.get("updated", "N/A"),
        references=chunk.get("references", "N/A"),
    )

    # Display the HTML in an IPython notebook
    display(HTML(html_output))


def display_retrieved_context(context_response):
    # HTML template for the main container and individual tables
    html_template = """
    <html>
    <head>
    <style>
        .container {{
            display: flex;
            flex-wrap: wrap;
        }}
        .table-container {{
            margin: 10px;
            padding: 10px;
            border: 1px solid #dddddd;
        }}
        table {{
            font-family: arial, sans-serif;
            border-collapse: collapse;
            width: 100%;
        }}
        td, th {{
            border: 1px solid #dddddd;
            text-align: left;
            padding: 8px;
        }}
    </style>
    </head>
    <body>
        <div class="container">
            {tables}
        </div>
    </body>
    </html>
    """

    # Function to generate HTML table for a single dictionary
    def generate_table_for_dict(data):
        rows = "\n".join(
            "<tr><td>{key}</td><td>{value}</td></tr>".format(
                key=key, value=value if value is not None else "N/A"
            )
            for key, value in data.items()
        )
        table_html = """
        <div class="table-container">
            <table>
                <tr>
                    <th>Key</th>
                    <th>Value</th>
                </tr>
                {rows}
            </table>
        </div>
        """.format(
            rows=rows
        )
        return table_html

    # Generate HTML tables for all dictionaries in the list
    tables = "\n".join(
        generate_table_for_dict(data["metadata"]) for data in context_response
    )

    # Format the main HTML with the generated tables
    html_output = html_template.format(tables=tables)

    # Display the HTML in an IPython notebook
    display(HTML(html_output))


def display_markdown(content: str) -> None:
    display(Markdown(content))

![image.png](https://i.ibb.co/rHVSp3w/visual-breakpoint.png)

![step-away-rag.png](https://i.ibb.co/Y288TWR/step-away-rag.png)

## Setup OpenAI

Enter the OpenAI API key and instantiate the OpenAI client.

Copy the OpenAI API key `sk-XXXXXXXXXX` into the prompt.

In [None]:
import getpass
from openai import OpenAI

OPENAI_API_KEY = getpass.getpass("Enter your OpenAI API key: ")
openai = OpenAI(api_key=OPENAI_API_KEY)

![section-breakpoint.png](https://i.ibb.co/344JqH3/section-breakpoint.png)

## Implement request to OpenAI GPT Models


Implement a function that sends a prompt to an LLM and returns the answer.  
The relevant OpenAI client method has the following signature:  
```python
openai_client.chat.completions.create(model: str, messages: List[Dict[str, str]])
```

API reference: https://platform.openai.com/docs/api-reference/chat/create 

In [None]:
def llm_completion(prompt: str, openai_client: OpenAI, model: str = "gpt-3.5-turbo") -> str:

    response = openai_client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant. Output is markdown"},
            {"role": "user", "content": prompt},
        ],
        temperature=0.0,
    )

    return response.choices[0].message.content


![section-breakpoint.png](https://i.ibb.co/344JqH3/section-breakpoint.png)

In [None]:
display_markdown(llm_completion("What is the capital of Germany?", openai))

In [None]:
display_markdown(llm_completion("Who is 25th person that landed on the moon?", openai, model="gpt-3.5-turbo"))

![hallucinations-problem.png](https://i.ibb.co/gMvNZC6/hallucinations-problem.png)

![section-breakpoint.png](https://i.ibb.co/344JqH3/section-breakpoint.png)

In [None]:
display_markdown(llm_completion("What are key benefits of mistral 7B?", openai, model="gpt-3.5-turbo"))

![knowledge-cutoff.png](https://i.ibb.co/ccccpxZ/knowledge-cutoff.png)

![section-breakpoint.png](https://i.ibb.co/344JqH3/section-breakpoint.png)

In [None]:
context_example = """
Answer the question based on the following context. If you can't find the answer, say "I don't know".

Context:
We introduce Mistral 7B, a 7–billion-parameter language model engineered for
superior performance and efficiency. Mistral 7B outperforms the best open 13B
model (Llama 2) across all evaluated benchmarks, and the best released 34B
model (Llama 1) in reasoning, mathematics, and code generation. Our model
leverages grouped-query attention (GQA) for faster inference, coupled with sliding
window attention (SWA) to effectively handle sequences of arbitrary length with a
reduced inference cost. We also provide a model fine-tuned to follow instructions,
Mistral 7B – Instruct, that surpasses Llama 2 13B – chat model both on human and
automated benchmarks. Our models are released under the Apache 2.0 license.

Question: What are key benefits of Mistral 7B?
"""
display_markdown(llm_completion(context_example, openai))

![visual-breakpoint.png](https://i.ibb.co/rHVSp3w/visual-breakpoint.png)

![why-rag.png](https://i.ibb.co/KF3xr64/why-rag.png)

![section-breakpoint.png](https://i.ibb.co/344JqH3/section-breakpoint.png)

![how-to-build-rag.png](https://i.ibb.co/wgXqfFv/how-to-build-rag.png)

![visual-breakpoint.png](https://i.ibb.co/rHVSp3w/visual-breakpoint.png)

## 1. Build a knowledge base

![section-breakpoint.png](https://i.ibb.co/344JqH3/section-breakpoint.png)

![build-knowledge-base.png](https://i.ibb.co/dGnjrCk/build-knowledge-base.png)


![section-breakpoint.png](https://i.ibb.co/344JqH3/section-breakpoint.png)

### Setup Pinecone 

Enter the Pinecone API key inside the prompt and create a Pinecone client.

In [None]:
PINECONE_REGION = "eu-west-1"
PINECONE_CLOUD = "aws"
INDEX_NAME = "pinecone-workshop-1"
VECTOR_DIMENSIONS = 1536
PINECONE_API_KEY = getpass.getpass("Enter your Pinecone API key: ")

In [None]:
from pinecone import Pinecone

pinecone = Pinecone(api_key=PINECONE_API_KEY)

pinecone.list_indexes()

![section-breakpoint.png](https://i.ibb.co/344JqH3/section-breakpoint.png)

### Create a Pinecone Index

Create a serverless index whose dimension matches the OpenAI embedding size (`VECTOR_DIMENSIONS`) and that uses the cosine similarity metric.  
Index creation requires the index name, the embedding dimension, the similarity metric, and a `ServerlessSpec` for the serverless setup.  

Pseudo code:  
```python
pinecone.create_index(
    name=<add-index-name>,
    dimension=<add-vector-dimension>,
    metric=<add-similarity-metric>,
    spec=ServerlessSpec(cloud=<add-cloud-name>, region=<add-region-name>)
)
```

More info on serverless: https://docs.pinecone.io/reference/architecture/serverless-architecture#overview  
Ref API: https://docs.pinecone.io/reference/api/control-plane/create_index 
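
As a rough sketch of the "delete if present, then create" flow, the helper below works against any client object passed to it. The name `recreate_index` and the duck-typed `client` parameter are assumptions made for illustration; the client is assumed to expose `list_indexes`, `delete_index` and `create_index` as the Pinecone client in this notebook does.

```python
from typing import Any


def recreate_index(client: Any, name: str, dimension: int, metric: str, spec: Any) -> None:
    """Drop the index if it already exists, then create it from scratch.

    `client` is a hypothetical stand-in for the Pinecone client; `spec` is
    whatever spec object the client expects (a ServerlessSpec in this notebook).
    """
    existing = [idx.name for idx in client.list_indexes()]
    if name in existing:
        client.delete_index(name)
    client.create_index(name=name, dimension=dimension, metric=metric, spec=spec)
```

In this notebook the call could look like `recreate_index(pinecone, INDEX_NAME, VECTOR_DIMENSIONS, "cosine", ServerlessSpec(cloud=PINECONE_CLOUD, region=PINECONE_REGION))`.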

In [None]:
from pinecone import ServerlessSpec

# Check if the index already exists and delete it
if INDEX_NAME in [index.name for index in pinecone.list_indexes()]:
    pinecone.delete_index(INDEX_NAME)

# TODO: Create a new index with the specified name, dimension, metric, and spec
# Use `pinecone.create_index(name=, dimension=, metric=, spec=ServerlessSpec(cloud=, region=))`

# Create a new Index reference object with the specified name and pool_threads
index = pinecone.Index(INDEX_NAME, pool_threads=20)
print(index.describe_index_stats())

![section-breakpoint.png](https://i.ibb.co/344JqH3/section-breakpoint.png)

### Dataset

We are going to use a sample of 1000 AI papers that can be found here: https://huggingface.co/datasets/smartcat/ai-arxiv2-chunks-embedded 

The dataset is already chunked and embedded using `text-embedding-3-small`, so we can just load it and upsert it into Pinecone.

If you want to play with chunking strategies and embeddings, you can find the full data set here: https://huggingface.co/datasets/jamescalam/ai-arxiv2

Dataset API reference: https://huggingface.co/docs/datasets/en/index  
Slicing and indexing: https://huggingface.co/docs/datasets/en/access 

In [None]:
import datasets

dataset = datasets.load_dataset("smartcat/ai-arxiv2-chunks-embedded", split="train")
dataset

In [None]:
chunk_display_html(dataset[0])

In [None]:
print(len(dataset[0]["embeddings"]))
print(dataset[0]["embeddings"][:10])

In [None]:
list(dataset[0]["metadata"].keys())

![section-breakpoint.png](https://i.ibb.co/344JqH3/section-breakpoint.png)

### Data upsert to Pinecone

Let's insert the data into Pinecone in batches. 

From our dataset we need 3 columns:
1. `id` - the ID of the chunk we want to insert
2. `embeddings` - the vector embedding of the chunk, created with `text-embedding-3-small`
3. `metadata` - a dictionary with additional data about the chunk. 

The code for upserting:
```python
index.upsert(vectors=[ 
    (id1, vector1, metadata1),
    (id2, vector2, metadata2),
    ....
 ])
```

### Note on optimization

To improve throughput, we are going to add the `async_req=True` parameter. 
The upsert then returns a future that we need to wait on.

Optimization tips:
1. Deploy the application in the same region as the index
2. Upsert in batches
3. Parallelize upserts
4. Use GRPCIndex, but make sure to add backoff for throttling
5. Use namespaces and metadata filtering
6. Avoid hitting quotas and limits: https://docs.pinecone.io/reference/quotas-and-limits

Optimized upsert code:
```python
future = index.upsert(vectors=[ 
    (id1, vector1, metadata1),
    (id2, vector2, metadata2),
    ....
    ],
    async_req=True
)
....
future.get() # wait for upsert to complete
```

For scale-up and optimizations, make sure to read: https://docs.pinecone.io/guides/operations/performance-tuning#increasing-throughput  
Metadata filtering: https://docs.pinecone.io/guides/data/filter-with-metadata 
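
One possible way to combine batching with `async_req=True` is sketched below. The helper names (`to_vectors`, `batched`, `upsert_all`) are hypothetical, and `rows` is assumed to be a plain list of row dictionaries with `id`, `embeddings` and `metadata` keys rather than a `datasets.Dataset`. The `index` object only needs an `upsert(vectors=..., async_req=True)` method that returns something with `.get()`.

```python
from typing import Any, Dict, Iterator, List, Sequence, Tuple

Vector = Tuple[str, List[float], Dict[str, Any]]


def to_vectors(rows: Sequence[Dict[str, Any]]) -> List[Vector]:
    """Turn row dictionaries into (id, embedding, metadata) tuples."""
    return [(row["id"], row["embeddings"], row["metadata"]) for row in rows]


def batched(
    rows: Sequence[Dict[str, Any]], batch_size: int
) -> Iterator[Sequence[Dict[str, Any]]]:
    """Yield consecutive slices of at most `batch_size` rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start : start + batch_size]


def upsert_all(rows: Sequence[Dict[str, Any]], index: Any, batch_size: int = 100) -> int:
    """Submit every batch asynchronously, then wait for all of them.

    Returns the number of batches that were sent.
    """
    futures = [
        index.upsert(vectors=to_vectors(batch), async_req=True)
        for batch in batched(rows, batch_size)
    ]
    for future in futures:
        future.get()  # blocks until this batch has been written
    return len(futures)
```

Submitting all batches before calling `.get()` lets the requests overlap; waiting inside the submission loop would make the upserts sequential again.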

In [None]:
from tqdm import tqdm
from pinecone import Index

def upsert_batch(ds: datasets.Dataset, index: Index, batch_size: int = 100) -> None:
    futures = []
    for i in range(0, len(ds), batch_size):
        i_end = min(i + batch_size, len(ds))
        batch = ds.select(range(i, i_end))

        # The upsert requires the vectors to be a list of tuples (id, vector, metadata)
        # Example: [(id1, vector1, metadata1), (id2, vector2, metadata2), ...]
        # TODO: Select the id, embeddings, and metadata from the batch
        ...

        # TODO: Upsert the vectors to the Pinecone index asynchronously
        # Use `index.upsert(vectors=[(id1, vector1, metadata1), (id2, vector2, metadata2)], async_req=True)`
        futures.append(
           ...
        )

    # Wait for all the upserts to complete
    for future in tqdm(futures, total=len(futures), desc="Upsert to Pinecone"):
        future.get()

In [None]:
upsert_batch(dataset.select(range(500)), index, batch_size=20)

In [None]:
print(index.describe_index_stats())

In [None]:
upsert_batch(dataset.select(range(40_000)), index)

In [None]:
print(index.describe_index_stats())

![section-breakpoint.png](https://i.ibb.co/344JqH3/section-breakpoint.png)

### What has been done with the dataset?

![chunking-dataset.png](https://i.ibb.co/9wV70Q7/chunking-dataset.png)

![section-breakpoint.png](https://i.ibb.co/344JqH3/section-breakpoint.png)

#### Chunking Strategies
1. Character split (with overlapping)
2. Recursive character split
3. Document specific splitting
4. Semantic Chunking
5. Agentic?
6. More?

Introduction to chunking: https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb
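
To make the first strategy concrete, here is a minimal sketch of a character split with overlap. The function name `split_with_overlap` and its parameters are assumptions for illustration; this is not necessarily how the workshop dataset was chunked.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split `text` into chunks of `chunk_size` characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at
    a chunk boundary still appears (partly) in the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

For example, `split_with_overlap("abcdefghij", chunk_size=4, overlap=1)` returns `["abcd", "defg", "ghij"]`.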


![visual-breakpoint.png](https://i.ibb.co/rHVSp3w/visual-breakpoint.png)

## Revisit Agenda


![done-next.png](https://i.ibb.co/QYC9nrb/done-next.png)


![visual-breakpoint.png](https://i.ibb.co/rHVSp3w/visual-breakpoint.png)

## 2. Retrieve against the query



![section-breakpoint.png](https://i.ibb.co/344JqH3/section-breakpoint.png)

![semantic-search.png](https://i.ibb.co/2dWGRHn/semantic-search.png)

![section-breakpoint.png](https://i.ibb.co/344JqH3/section-breakpoint.png)

Now that everything is inserted into Pinecone, let's query it. The input is the query text and the output is the most similar chunks.  
Steps: 
1. Encode the input text to generate embeddings 
2. Call Pinecone's query function to retrieve top K similar results
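
To build intuition for what "top K similar" means, the sketch below scores every stored vector against the query with cosine similarity and keeps the best `k`. It is only a conceptual, brute-force illustration with hypothetical names (`cosine_similarity`, `top_k`); it does not describe how Pinecone implements search internally.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the `k` (id, score) pairs with the highest cosine similarity."""
    scored = [(vec_id, cosine_similarity(query, vec)) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A full scan like this costs time proportional to the number of stored vectors for every query, which is one reason to delegate the search to a vector database.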

### Encode (create embeddings)

Create embeddings:
```python
openai_client.embeddings.create(model="embedding-model-name", input="Text to create embeddings")
```

Embedding API Ref: https://platform.openai.com/docs/api-reference/embeddings  
Query API Ref: https://docs.pinecone.io/reference/api/data-plane/query 
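
A minimal sketch of the embedding call, assuming a client object shaped like the OpenAI client used in this notebook (`client.embeddings.create(...)` returning an object with `data[0].embedding`). The function name `embed_text` is hypothetical.

```python
from typing import Any, List


def embed_text(text: str, client: Any, model: str = "text-embedding-3-small") -> List[float]:
    """Request one embedding for `text` and return it as a list of floats."""
    response = client.embeddings.create(model=model, input=text)
    # A single input string yields a single item in `response.data`.
    return response.data[0].embedding
```

The query must be embedded with the same model that produced the stored vectors; otherwise the similarity scores are not meaningful.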

In [None]:
from typing import List


def encode(
    text: str, openai_client: OpenAI, model: str = "text-embedding-3-small"
) -> List[float]:
    # TODO: create embeddings using OpenAI API
    # 1. Call openai_client.embeddings.create with the model and text
    response = ...

    return response.data[0].embedding

In [None]:
res = encode("What are key benefits of mistral 7B?", openai)
print(res[:10])
print(len(res))

![section-breakpoint.png](https://i.ibb.co/344JqH3/section-breakpoint.png)

### Query Pinecone

Query Pinecone for the most similar chunks. We retrieve the metadata, but not the vectors themselves:
```python
index.query(
    vector=QUERY_EMBEDDINGS, top_k=TOP_K, include_metadata=True, include_values=False
)
```
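
Putting the two steps together could look like the sketch below. The function name `search` and the injected `embed` callable are assumptions for illustration; `index` only needs a `query` method that accepts the keyword arguments shown above.

```python
from typing import Any, Callable, List


def search(
    query: str, index: Any, embed: Callable[[str], List[float]], top_k: int = 10
) -> Any:
    """Embed the query text, then ask the index for the `top_k` nearest chunks."""
    query_embedding = embed(query)
    return index.query(
        vector=query_embedding,
        top_k=top_k,
        include_metadata=True,  # we need the chunk text and paper info
        include_values=False,  # the raw vectors are not needed downstream
    )
```

Passing the embedding function in as a parameter keeps the sketch independent of any particular client.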

In [None]:
from pinecone import QueryResponse


def semantic_search(
    query: str, index: Index, openai_client: OpenAI, top_k: int = 10
) -> QueryResponse:
    # TODO: Implement semantic search using the OpenAI API and Pinecone index
    # 1. Reuse `encode` function to create embeddings for the query
    # 2. Use `index.query` to query the index with the vector and top_k and set include_metadata=True
    query_embedding = ...
    ret = index.query( ... )
    return ret

In [None]:
res = semantic_search("What is Mistral 7B?", index, openai)
print(res.matches[0].metadata['text'])

![visual-breakpoint.png](https://i.ibb.co/rHVSp3w/visual-breakpoint.png)

## 3. Augment and generate

![section-breakpoint.png](https://i.ibb.co/344JqH3/section-breakpoint.png)

![workflow-rag-simple.png](https://i.ibb.co/p0gwY23/workflow-rag-simple.png)

![section-breakpoint.png](https://i.ibb.co/344JqH3/section-breakpoint.png)

### Generate a final response

Combine all the pieces:
1. Perform a semantic search on the input query
2. Build a context (prompt) for an LLM
3. Call the LLM to generate a final response
4. Return the final response and the retrieved context

The relevant context can be found in the metadata. You can use:
1. `title` - the paper title
2. `published` - the publication date
3. `primary_category` - the paper's primary category
4. `summary` - the paper summary
5. `text` - the chunk text; this is the most useful field for building the LLM context
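
The four steps can be sketched as a small function that receives each stage as a callable. The name `answer_with_context` and its parameters are hypothetical; the point is only to show the retrieve, augment, generate order.

```python
from typing import Any, Callable, Tuple


def answer_with_context(
    query: str,
    retrieve: Callable[[str], Any],
    build_prompt: Callable[[str, Any], str],
    generate: Callable[[str], str],
) -> Tuple[str, Any]:
    """Run retrieve -> augment -> generate and return (answer, retrieved context)."""
    results = retrieve(query)  # 1. semantic search
    prompt = build_prompt(query, results)  # 2. context + question
    answer = generate(prompt)  # 3. LLM call
    return answer, results  # 4. answer plus the evidence behind it
```

In this notebook, `retrieve` could wrap `semantic_search`, `build_prompt` could be `get_prompt`, and `generate` could wrap `llm_completion`. Returning the retrieved results alongside the answer makes it possible to inspect which chunks the model was given.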

In [None]:
from typing import Tuple

def from_metadata(metadata: Dict) -> str:
    return f"""
***
    Title: {metadata['title']}
    Authors: {metadata['authors'][:5]}
    Published: {metadata['published']}
    Primary category: {metadata['primary_category']}
    Paper summary: {metadata['summary']}
    Text: {metadata['text']}
"""


def get_prompt(query: str, query_results: QueryResponse) -> str:
    context = "\n".join(
        [from_metadata(result.metadata) for result in query_results.matches]
    )
    return f"""
Answer the question based on the following context. If you can't find the answer, say "I don't know".
The answers should be clear, easy to understand, complete and comprehensive.

Context:
{context}

Question: {query}
"""


def rag(query: str, index: Index, openai_client: OpenAI, top_k: int = 5) -> Tuple[str, QueryResponse]:
    # TODO: Wire all RAG pieces together to generate a response (retrieve, augment, generate)
    # 1. [RETRIEVE]: Reuse `semantic_search` function to get the top_k results
    # 2. [AUGMENT]: Use `get_prompt` function to generate the prompt (context + question)
    # 3. [GENERATE]: Use `llm_completion` function to generate the response
    # 4. Return the response and query_results
    query_results = ...
    prompt = ...
    response = ...
    return response, query_results

In [None]:
answer, context = rag("What are key benefits of Mistral 7B?", index, openai)
display_markdown(answer)

In [None]:
display_retrieved_context(context.matches[:2])

![section-breakpoint.png](https://i.ibb.co/344JqH3/section-breakpoint.png)

In [None]:
answer, _ = rag("The difference between Mistral and Kosmos?", index, openai, top_k=20)
display_markdown(answer)

![visual-breakpoint.png](https://i.ibb.co/rHVSp3w/visual-breakpoint.png)

## Bonus Topics

![section-breakpoint.png](https://i.ibb.co/344JqH3/section-breakpoint.png)

TBA - Add mini-agenda

![section-breakpoint.png](https://i.ibb.co/344JqH3/section-breakpoint.png)

### LangChain

Now that we have explored the basic components of RAG, we can use existing frameworks to build them.  

LangChain: https://www.langchain.com/langchain 

In [None]:
!pip install -qU \
    langchain-pinecone \
    langchain-openai \
    langchain

In [None]:
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQAWithSourcesChain

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    openai_api_key=OPENAI_API_KEY,
)

vectorstore = PineconeVectorStore(
    index=index,
    embedding=embeddings,
    text_key="text",
    pinecone_api_key=PINECONE_API_KEY,
    index_name=INDEX_NAME,
)

llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model="gpt-3.5-turbo", temperature=0.0)
langchain_rag = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm, chain_type="stuff", retriever=vectorstore.as_retriever()
)

In [None]:
langchain_rag.invoke("What are all benefits of using Mistral 7B?")

![visual-breakpoint.png](https://i.ibb.co/rHVSp3w/visual-breakpoint.png)

### Fine-Tune vs RAG vs Prompt Engineering

![rag-vs-prompt-vs-finetune.png](https://i.ibb.co/C8rvnTY/Screenshot-2024-05-30-at-21-17-45.png)

![visual-breakpoint.png](https://i.ibb.co/rHVSp3w/visual-breakpoint.png)

### Advanced RAG

In [None]:
answer, context = rag(
    "The key difference between Mistral 7B and Palm?", index, openai, top_k=20
)
display_markdown(answer)

#### Indexing Time

#### Query Time

#### Evaluations

#### Agentic Approach

![visual-breakpoint.png](https://i.ibb.co/rHVSp3w/visual-breakpoint.png)

## Resources

In [None]:
# Notes
# Better embeddings
# Better chunking
# Better indexing
# Better query understanding
# History management
# Evaluation
# Ranking (Context Limitation)
# Guardrails
# Frameworks (langchain / canopy)

![visual-breakpoint.png](https://i.ibb.co/rHVSp3w/visual-breakpoint.png)