# Gai/Gen: Retrieval-Augmented-Generation (RAG)

## 1. Note

The following examples have been tested in this environment:

-   NVIDIA GeForce RTX 2060 6GB
-   Windows 11 + WSL2
-   Ubuntu 22.04
-   Python 3.10
-   CUDA Toolkit 11.8

## 2. Create Virtual Environment and Install Dependencies

We will create a separate virtual environment to avoid conflicts between the dependencies that each underlying model requires.

```sh
sudo apt update -y && sudo apt install ffmpeg git git-lfs -y
conda create -n RAG python=3.10.10 -y
conda activate RAG
pip install -e ".[RAG]"
```

## 3. Install Model

In [None]:
%%bash
huggingface-cli download hkunlp/instructor-large \
        --local-dir ~/gai/models/instructor-large \
        --local-dir-use-symlinks False

## 4. Example

The following examples show two models handling indexing and retrieval: the `Instructor` model running locally and the `OpenAI Embedding` model.
The two models use different embedding dimensions (768 and 1536 respectively), so the `demo` collection must be reset before running each demo.
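
To see why the reset is needed, consider this minimal sketch. It is a toy in-memory store, not the vector store that `gai` uses; `ToyCollection` and its methods are hypothetical names. It fixes its dimension from the first embedding it receives and rejects any vector of a different size, which is how vector stores typically react to mismatched embeddings.

```python
import math


class ToyCollection:
    """Hypothetical in-memory collection; illustrative only, not the gai API."""

    def __init__(self, name):
        self.name = name
        self.dimension = None  # fixed by the first vector added
        self.vectors = {}

    def add(self, item_id, vector):
        if self.dimension is None:
            self.dimension = len(vector)
        elif len(vector) != self.dimension:
            raise ValueError(
                f"collection '{self.name}' expects {self.dimension}-d vectors, "
                f"got {len(vector)}-d"
            )
        self.vectors[item_id] = list(vector)

    def cosine_distance(self, item_id, query):
        # Cosine distance is only defined for vectors of equal length.
        if len(query) != self.dimension:
            raise ValueError("query dimension does not match the collection")
        stored = self.vectors[item_id]
        dot = sum(a * b for a, b in zip(stored, query))
        norm_stored = math.sqrt(sum(a * a for a in stored))
        norm_query = math.sqrt(sum(b * b for b in query))
        return 1.0 - dot / (norm_stored * norm_query)


demo = ToyCollection("demo")
demo.add("chunk-1", [0.1] * 768)       # Instructor-sized embedding
try:
    demo.add("chunk-2", [0.1] * 1536)  # OpenAI-sized embedding
except ValueError as err:
    print(err)
```

Deleting and recreating the collection, as the second example does with `rag.delete_collection("demo")`, sidesteps the mismatch.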

### 1. Index and Retrieve Text File using Instructor Model

In [1]:
from gai.gen.rag import RAG
rag = RAG(generator_name="rag")
rag.unload()
rag.load()

# Index
doc_id = await rag.index_async(
    collection_name='demo',
    file_path="./pm_long_speech_2023.txt",
    file_type='txt',
    source="https://www.pmo.gov.sg/Newsroom/2023-National-Day-Rally-Speech",
    title="2023 National Day Rally Speech",
    )

# Retrieve
rag.retrieve(collection_name="demo",query_texts="Who are the young seniors?")

2024-05-31 16:24:48 INFO gai.gen.rag.dalc.RAGVSRepository:RAGVSRepository: in_memory


load INSTRUCTOR_Transformer


2024-05-31 16:25:01 INFO gai.gen.rag.RAG:rag.index_document_header_async: request started. collection_name=demo file_path=./pm_long_speech_2023.txt title=2023 National Day Rally Speech source=https://www.pmo.gov.sg/Newsroom/2023-National-Day-Rally-Speech abstract=None authors=None publisher=None published_date=None comments=None keywords=None
2024-05-31 16:25:01 DEBUG gai.gen.rag.RAG:rag.index_document_header_async: creating doc header with id=PwR6VmXqAfwjn84ZM6dePsLWTldPv8cNS5dESYlsY2U.
2024-05-31 16:25:01 DEBUG gai.gen.rag.RAG:rag.index_document_header_async: document_header created. id=PwR6VmXqAfwjn84ZM6dePsLWTldPv8cNS5dESYlsY2U
2024-05-31 16:25:01 INFO gai.gen.rag.RAG:rag.index_document_split_async: splitting chunks
2024-05-31 16:25:01 INFO gai.gen.rag.RAG:rag.index_document_split_async: chunkgroup created. chunkgroup_id=ffdc954e-0bc1-4a08-8898-cd7798675b9c


max_seq_length  512


100%|██████████| 66/66 [00:00<00:00, 688.12it/s]
2024-05-31 16:25:01 INFO gai.gen.rag.RAG:rag.index_document_split_async: chunks created. count=66
2024-05-31 16:25:01 INFO gai.gen.rag.RAG:RAG.index_document_index_async: Start indexing...
0it [00:00, ?it/s]2024-05-31 16:25:03 DEBUG gai.gen.rag.RAG:RAG.index_document_index_async: Indexed 1/66 chunk 3fa2e943-0178-443c-a028-2777a40abbed into collection demo
1it [00:01,  1.74s/it]2024-05-31 16:25:03 DEBUG gai.gen.rag.RAG:RAG.index_document_index_async: Indexed 2/66 chunk 7aee27c2-4aa4-4c92-b4de-934cb6eb1529 into collection demo
2024-05-31 16:25:03 DEBUG gai.gen.rag.RAG:RAG.index_document_index_async: Indexed 3/66 chunk 28f3790a-5774-4e3e-900d-ed737f7892c4 into collection demo
2024-05-31 16:25:03 DEBUG gai.gen.rag.RAG:RAG.index_document_index_async: Indexed 4/66 chunk f03a384f-a161-4d9a-9b3a-cb0ee8a11718 into collection demo
4it [00:01,  2.70it/s]2024-05-31 16:25:03 DEBUG gai.gen.rag.RAG:

[{'documents': 'Especially for those in their 50s and early 60s. Let us call them the “Young Seniors”. "Young”, because you are younger than the Pioneer Generation and the Merdeka Generation; “Seniors”, because you will soon retire, or maybe you have recently retired.',
  'metadatas': {'Abstract': '',
   'ChunkGroupId': 'ffdc954e-0bc1-4a08-8898-cd7798675b9c',
   'DocumentId': 'PwR6VmXqAfwjn84ZM6dePsLWTldPv8cNS5dESYlsY2U',
   'Keywords': '',
   'PublishedDate': '',
   'Source': 'https://www.pmo.gov.sg/Newsroom/2023-National-Day-Rally-Speech',
   'Title': '2023 National Day Rally Speech'},
  'distances': 0.09020859003067017,
  'ids': '93a92b31-5641-4aec-9613-03c16cf30e0a'},
 {'documents': 'Young Seniors are in a unique position today. Compared to the Pioneer and Merdeka Generations, you have benefited more from Singapore’s growth, and generally done better in life. But compared to workers younger than you, in their 30s and 40s today, you have generally earned less over your lifetimes. Yo
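
Each hit returned by `rag.retrieve` in the output above is a dictionary with `documents`, `metadatas`, `distances` and `ids` keys. The sketch below shows one possible way to post-process such a list: keep only the hits under a distance cut-off and print a one-line summary of each. The helper names are hypothetical, the `hits` sample is abbreviated from the output above (the second entry's distance and id are invented), and `max_distance=0.2` is an arbitrary assumption, not a recommended value.

```python
def select_hits(hits, max_distance=0.2):
    """Return hits with distance <= max_distance, closest first."""
    kept = [h for h in hits if h["distances"] <= max_distance]
    return sorted(kept, key=lambda h: h["distances"])


def format_hit(hit, width=60):
    """One-line summary: distance, title, and the start of the chunk text."""
    text = hit["documents"][:width]
    title = hit["metadatas"].get("Title", "")
    return f"{hit['distances']:.3f}  {title}: {text}"


# Abbreviated sample in the shape printed above; the first entry's
# distance and id are invented for illustration.
hits = [
    {"documents": "Young Seniors are in a unique position today.",
     "metadatas": {"Title": "2023 National Day Rally Speech"},
     "distances": 0.31,
     "ids": "hypothetical-id"},
    {"documents": "Especially for those in their 50s and early 60s.",
     "metadatas": {"Title": "2023 National Day Rally Speech"},
     "distances": 0.09020859003067017,
     "ids": "93a92b31-5641-4aec-9613-03c16cf30e0a"},
]

for hit in select_hits(hits):
    print(format_hit(hit))
```

A suitable cut-off depends on the embedding model and the distance metric, so it is worth inspecting real distances before choosing one.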

### 2. Index and Retrieve PDF using OpenAI Embedding

This example uses the OpenAI Embedding model to index a PDF file and retrieve from it in the `demo` collection. Unlike the Instructor model, the embeddings are computed remotely.
OpenAI embeddings have 1536 dimensions.

In [3]:
from gai.gen.rag import RAG
rag = RAG(generator_name="openai-ada-rag")

# Reset the collection because OpenAI embeddings have more dimensions than instructor-large
rag.delete_collection("demo")
path = "./attention-is-all-you-need.pdf"
rag.unload()
rag.load()
# Index
doc_id = await rag.index_async(
    collection_name='demo',
    file_path=path,
    file_type='pdf',
    source="arxiv.org",
    title="Attention is All You Need",
    )
# Retrieve
rag.retrieve(collection_name="demo",query_texts="How is the transformer different from RNN?")

2024-05-31 14:50:13 INFO gai.gen.rag.dalc.RAGVSRepository:RAGVSRepository: in_memory
2024-05-31 14:50:13 INFO gai.gen.rag.RAG:Deleting demo...
2024-05-31 14:50:14 INFO gai.gen.rag.RAG:rag.index_document_header_async: request started. collection_name=demo file_path=./attention-is-all-you-need.pdf title=Attention is All You Need source=arxiv.org abstract=None authors=None publisher=None published_date=None comments=None keywords=None
2024-05-31 14:50:16 DEBUG gai.gen.rag.RAG:rag.index_document_header_async: creating doc header with id=-Sc9eXzUiSlaFV3qEDaKam33Boamkvv4tea8YPsjpy0.
2024-05-31 14:50:19 DEBUG gai.gen.rag.RAG:rag.index_document_header_async: document_header created. id=-Sc9eXzUiSlaFV3qEDaKam33Boamkvv4tea8YPsjpy0
2024-05-31 14:50:19 INFO gai.gen.rag.RAG:rag.index_document_split_async: splitting chunks
2024-05-31 14:50:21 INFO gai.gen.rag.RAG:rag.index_document_split_async: chunkgroup created. chunkgroup_id=74b0f4f7-2ff7

[{'documents': 'however, the Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using sequence- aligned RNNs or convolution. In the following sections, we will describe the Transformer, motivate self-attention and discuss its advantages over models such as [17, 18] and [9]. 3 Model Architecture Most competitive neural sequence transduction models have an encoder-decoder structure [5, 2, 35]. Here, the encoder maps an input sequence of symbol representations (x1, ..., xn) to a sequence of continuous representations z = (z1, ..., zn). Given z, the decoder then generates an output sequence (y1, ..., ym) of symbols one element at a time. At each step the model is auto-regressive [10], consuming the previously generated symbols as additional input when generating the next. 2 Figure 1: The Transformer - model architecture. The Transformer follows this overall architecture using stacked self-attention and p

### 3. List Collections

In [9]:
from gai.gen.rag import RAG
rag = RAG()
rag.list_collections()

2024-05-31 14:47:40 INFO gai.gen.rag.dalc.RAGVSRepository:RAGVSRepository: in_memory


[Collection(name=demo)]

### 4. List Document Headers

In [10]:
rag.list_document_headers(collection_name="demo")

[]

---
## 5. Running as a Service

In this example, we will start two services: the RAG API and the RAG listener.
We will then index a document using curl and watch the progress through the listener.

### Step 1: Start the API service

#### Option A: Run in a Docker container (Recommended)

```bash
docker run -d \
    --name gai-rag \
    -p 12031:12031 \
    --gpus all \
    -v ~/gai/models:/app/models \
    kakkoii1337/gai-rag:latest
```

Wait for the model to load. You can follow the progress in the container logs:

```bash
docker logs gai-rag
```

When the loading is completed, the logs should show this:

```bash
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:12031 (Press CTRL+C to quit)
```
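
Instead of re-reading the logs by hand, a script can wait until the port accepts TCP connections. This is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper, not part of `gai`. Treat an open port as a rough signal only: it shows that something is listening on the port, not that the model has finished loading, and with Docker port publishing the port may accept connections before the application is ready.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the container started above, a call such as `wait_for_port("localhost", 12031, timeout=120)` would block for up to two minutes and report whether the port opened.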

#### Option B: Run from Terminal

```bash
cd /gai-gen/gai/api/
python rag_api.py
```

### Step 2: Start the Listener Service

The listener is a useful companion to the API: it monitors indexing progress over a WebSocket connection.
This is especially useful when indexing large files.

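```python
import asyncio
import logging

import websockets

# The original snippet used `logger` without defining it; a standard
# library logger is set up here so the script runs on its own.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

async def listen():
    # WebSocket endpoint on which the RAG API reports indexing progress
    ws_uri = "ws://localhost:12031/api/v1/rag/index-file/ws"
    async with websockets.connect(ws_uri) as websocket:
        # Log every message until the connection is closed
        while True:
            message = await websocket.recv()
            logger.info(f"Received: {message}")

asyncio.run(listen())
```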

The above code is saved as `tests/integration_tests/rag/rag_listener.py`.

```bash
cd tests/integration_tests/rag
python rag_listener.py
```

If the listener started successfully, you should see the following message in the API server logs:

![rag-listener-connected](./imgs/rag-listener-connected.png)


### Step 3: Test RAG

**Send Request**

```bash
cd tests/integration_tests/rag
```

The following example uses the curl script `tests/integration_tests/rag/3_curl_index.sh` to index a file.

```bash
curl -X POST 'http://localhost:12031/gen/v1/rag/index-file' \
    -H 'accept: application/json' \
    -H 'Content-Type: multipart/form-data' \
    -s \
    -F 'collection_name=demo' \
    -F 'file=@./pm_long_speech_2023.txt' \
    -F 'metadata={"source": "https://www.pmo.gov.sg/Newsroom/National-Day-Rally-2023#:~:text=COVID%2D19%20was%20the%20most,indomitable%20spirit%20of%20our%20nation."}'
```
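
The same request can be sent from Python. The sketch below uses only the standard library and assumes the endpoint accepts exactly the form fields that the curl command sends (`collection_name`, `file`, `metadata`). `build_multipart` and `index_file` are hypothetical helpers, and the response is returned as raw text because its format is not documented here.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    crlf = b"\r\n"
    body = bytearray()
    for name, value in fields.items():
        body += f"--{boundary}".encode() + crlf
        body += f'Content-Disposition: form-data; name="{name}"'.encode() + crlf + crlf
        body += str(value).encode("utf-8") + crlf
    body += f"--{boundary}".encode() + crlf
    body += (
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'
    ).encode() + crlf
    body += b"Content-Type: application/octet-stream" + crlf + crlf
    body += file_bytes + crlf
    body += f"--{boundary}--".encode() + crlf
    return bytes(body), f"multipart/form-data; boundary={boundary}"


def index_file(path, collection_name, metadata,
               url="http://localhost:12031/gen/v1/rag/index-file"):
    """POST a file to the index-file endpoint used by the curl example."""
    with open(path, "rb") as f:
        file_bytes = f.read()
    # json.dumps handles the quoting that is done by hand in the curl call.
    fields = {"collection_name": collection_name, "metadata": json.dumps(metadata)}
    body, content_type = build_multipart(
        fields, "file", os.path.basename(path), file_bytes
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the API service running, `index_file("./pm_long_speech_2023.txt", "demo", {"source": "https://www.pmo.gov.sg/Newsroom/2023-National-Day-Rally-Speech"})` would submit the same file to the `demo` collection.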

**NOTE**: Indexing may fail if the file has already been indexed. To re-index, delete the `demo` collection first.

```bash
curl -X DELETE 'http://localhost:12031/gen/v1/rag/collection/demo'
```



### Video

![gai-gen-rag](../doc/docs/gai-gen/imgs/gai-gen-rag.gif)