# Retrieval Augmented Generation (RAG) Server

## 1. Setup

**Instructions:**

a) Download model

```bash
huggingface-cli download hkunlp/instructor-large \
    --revision 54e5ffb8d484de506e59443b07dc819fb15c7233 \
    --local-dir ~/.gai/models/instructor-large \
    --local-dir-use-symlinks False
```


---

## 2. Load test configuration

In [1]:
from gai.lib.server.singleton_host import SingletonHost
from gai.lib.common.utils import free_mem
from rich.console import Console
console=Console()

config = {
    "type": "rag",
    "generator_name": "instructor-sentencepiece",
    "chromadb": {
        "path": "rag/chromadb",
        "n_results": 3
    },
    "sqlite": {
        "path": "rag/gai-rag.db"
    },
    "model_path": "models/instructor-large",
    "device": "cuda",
    "chunks": {
        "size": 1000,
        "overlap": 100,
        "path": "chunks"
    },
    "module_name": "gai.rag.server.gai_rag",
    "class_name": "RAG",
    "init_args": [],
    "init_kwargs": {}
}
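
To make the `chunks` settings concrete, here is a minimal sketch of fixed-size chunking with overlap. It only illustrates what `size` and `overlap` mean. The server reports `recursive_split` as its split algorithm, which is not reproduced here, and `chunk_text` is a hypothetical helper, not part of `gai`.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into windows of `size` characters; neighbours share `overlap` characters.

    Hypothetical helper for illustration only (not the gai splitter).
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


chunks = chunk_text("abcdefghij" * 250, size=1000, overlap=100)  # 2500 characters
print(len(chunks), [len(c) for c in chunks])
```

With these settings each window starts 900 characters after the previous one, so the last 100 characters of one chunk reappear at the start of the next.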


## 3. Load Model Test

In [2]:
# before loading
free_mem()
try:
    with SingletonHost.GetInstanceFromConfig(config) as host:

        # after loading
        free_mem()
except Exception as e:
    raise e
finally:
    # after disposal
    free_mem()
    

[42m[30mINFO    [0m [32mRAG: device=cuda[0m
[42m[30mINFO    [0m [32mRAG: sqlite=sqlite:///:memory:[0m


load INSTRUCTOR_Transformer




max_seq_length  512


## Indexing

In [2]:
from gai.rag.server.gai_rag import RAG
from gai.rag.server.dtos.create_doc_header_request import CreateDocHeaderRequestPydantic
try:
    with SingletonHost.GetInstanceFromConfig(config) as host:
        rag = host.generator

        req = CreateDocHeaderRequestPydantic(
            CollectionName='demo',
            FileType='txt',
            Source='https://www.pmo.gov.sg/Newsroom/2023-National-Day-Rally-Speech',
            Title='2023 National Day Rally Speech',
            FilePath="./pm_long_speech_2023.txt"
        )


        # Index
        chunkids = await rag.index_async(
            req=req
            # collection_name='demo',
            # file_type='txt',
            # source="https://www.pmo.gov.sg/Newsroom/2023-National-Day-Rally-Speech",
            # title="2023 National Day Rally Speech"
            )

except Exception as e:
    raise e
finally:
    # after disposal
    free_mem()


[42m[30mINFO    [0m [32mRAG: device=cuda[0m
[42m[30mINFO    [0m [32mRAG: sqlite=sqlite:///:memory:[0m


load INSTRUCTOR_Transformer
max_seq_length  512




[42m[30mINFO    [0m [32mrag.index_document_header_async: request started. collection_name=demo file_path=./pm_long_speech_2023.txt title=2023 National Day Rally Speech source=https://www.pmo.gov.sg/Newsroom/2023-National-Day-Rally-Speech abstract=None authors=None publisher=None published_date=None comments=None keywords=None[0m
[45m[30mDEBUG   [0m [35mrag.index_document_header_async: creating doc header with id=PwR6VmXqAfwjn84ZM6dePsLWTldPv8cNS5dESYlsY2U.[0m
[45m[30mDEBUG   [0m [35mrag.index_document_header_async: document_header created. id=PwR6VmXqAfwjn84ZM6dePsLWTldPv8cNS5dESYlsY2U[0m
[42m[30mINFO    [0m [32mrag.index_document_split_async: splitting chunks[0m
[42m[30mINFO    [0m [32mrag.index_document_split_async: chunkgroup created. chunkgroup_id=a09d97b0-7246-4c9a-b564-af39428049da[0m
100%|██████████| 66/66 [00:00<00:00, 533.84it/s]
[42m[30mINFO    [0m [32mrag.index_document_split_async: chunks created. count=66[0m
[42m[30mINFO    [0m [32mRAG.ind

In [3]:
try:
    with SingletonHost.GetInstanceFromConfig(config) as host:
        rag=host.generator
        # Retrieve
        result = rag.retrieve(collection_name="demo",query_texts="Who are the young seniors?")
        console.print(result)

except Exception as e:
    raise e
finally:
    # after disposal
    free_mem()


[42m[30mINFO    [0m [32mRAG: device=cuda[0m
[42m[30mINFO    [0m [32mRAG: sqlite=sqlite:///:memory:[0m


load INSTRUCTOR_Transformer
max_seq_length  512


[42m[30mINFO    [0m [32mRAG.retrieve: Retrieving by query Who are the young seniors?...[0m


## API (Press F5 to start API server)

Wait for it to finish loading before running the next cell.
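
If you would rather not wait by hand, a small polling loop can check when the server starts answering. This is a sketch under assumptions: it treats an HTTP 200 from the root URL (the one used by the smoke test in the Docker section) as "ready", and `http_probe` and `wait_until_ready` are hypothetical names, not part of `gai`.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str = "http://localhost:12036") -> bool:
    """Return True if the URL answers with HTTP 200, False on a connection or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(probe: Callable[[], bool] = http_probe,
                     timeout: float = 120.0,
                     interval: float = 2.0) -> bool:
    """Call `probe` every `interval` seconds until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage (blocks until the server answers or the timeout expires):
# wait_until_ready()
```

The probe is passed in as a function so the waiting logic can be reused with a different readiness check, for example one that calls the collections endpoint instead.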

#### a) List Collections

In [1]:
%%bash
curl -s http://localhost:12036/gen/v1/rag/collections

{"collections":[]}

#### b) Delete collection

In [2]:
%%bash
curl -s -X DELETE http://localhost:12036/gen/v1/rag/collection/demo

{"detail":{"code":"collections_not_found","message":"Collection demo not found"}}
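
The error above arrives as a JSON body with a `detail` object holding `code` and `message`. A client could unpack it as sketched below. `parse_error` is a hypothetical helper, and the assumption that other errors use the same envelope rests only on the single response shown here.

```python
import json
from typing import Optional, Tuple


def parse_error(body: str) -> Optional[Tuple[str, str]]:
    """Return (code, message) if the body looks like the error envelope above, else None."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    detail = payload.get("detail") if isinstance(payload, dict) else None
    if not isinstance(detail, dict):
        return None
    code, message = detail.get("code"), detail.get("message")
    if code is None or message is None:
        return None
    return code, message


body = '{"detail":{"code":"collections_not_found","message":"Collection demo not found"}}'
print(parse_error(body))
```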

#### c) Index

In [3]:
%%bash
curl -X POST 'http://localhost:12036/gen/v1/rag/index-file' \
    -H 'accept: application/json' \
    -H 'Content-Type: multipart/form-data' \
    -s \
    -F 'file=@./pm_long_speech_2023.txt' \
    -F 'req={"CollectionName":"demo","Source": "https://www.pmo.gov.sg/Newsroom/National-Day-Rally-2023","FilePath":"./pm_long_speech_2023.txt"}'

{"DocumentId":"PwR6VmXqAfwjn84ZM6dePsLWTldPv8cNS5dESYlsY2U","ChunkgroupId":"dd1aade5-2edb-438e-ac1d-35f6f71a63e7","ChunkIds":["6ad7580d-04d7-4f25-9886-e6231ec5e2d1","19fac0e8-3169-4f1d-8c5b-5a0714eeff92","1a7af241-6208-4f55-8df4-24e3d4f435de","cc6f6f39-e301-4199-aaef-f9db54956b83","7ef9f81a-b976-49ee-a813-9f3bc14ee735","103e1d2e-6da1-4f16-8c13-a9f2a142217b","409951a3-c262-4acd-973a-4dcbce101f56","3d18c399-3568-4979-ab12-7246b4eac60f","66bcf493-46a9-4c37-b3d2-2b42def7a852","b60045fa-2f43-4a80-861d-d06cb28c7758","d395fe19-f21d-4759-af03-1bb288c27678","fbb0ff42-d7bd-42d8-acd4-a5dfe0d68015","2214b261-0876-496e-b64f-c3c692a1ab61","9c5be1f2-66b3-4fbc-9e78-46b84fc756df","5002ee26-c75f-456a-a69a-d950d9df615d","a5bfc36b-5a48-43b4-9553-117d997484e9","15b7801c-10a4-446c-bdcd-5ec7c60c9535","16ac4971-70a0-469e-adbb-e2b2622bde20","39e906b6-e95a-455c-87fa-500fac9c74af","17f1898d-318e-4a5f-98ce-309932e428a5","76048a0c-55c9-46ac-bffd-1543c7274f39","f8368248-8f96-4fda-a26b-525b70b5ff91","2f9ffe14-3a77-4

#### d) Verify document exists

In [4]:
%%bash
curl -X POST 'http://localhost:12036/gen/v1/rag/collection/demo/document/exists' \
    -H 'accept: application/json' \
    -H 'Content-Type: multipart/form-data' \
    -s \
    -F 'file=@./pm_long_speech_2023.txt' 

{"document":{"Id":"PwR6VmXqAfwjn84ZM6dePsLWTldPv8cNS5dESYlsY2U","CollectionName":"demo","ByteSize":43352,"FileName":"pm_long_speech_2023.txt","FileType":".txt","File":null,"Source":"https://www.pmo.gov.sg/Newsroom/National-Day-Rally-2023","Abstract":null,"Authors":null,"Title":null,"Publisher":null,"PublishedDate":null,"Comments":null,"Keywords":null,"CreatedAt":"2024-09-06T12:22:54.245021","UpdatedAt":"2024-09-06T12:22:54.245027","IsActive":true,"ChunkGroups":[{"Id":"dd1aade5-2edb-438e-ac1d-35f6f71a63e7","DocumentId":"PwR6VmXqAfwjn84ZM6dePsLWTldPv8cNS5dESYlsY2U","SplitAlgo":"recursive_split","ChunkCount":66,"ChunkSize":1000,"Overlap":100,"IsActive":true,"ChunksDir":"/tmp/chunks/b2d0cc2ebe2641038c7edf7cec52d0ed"}]}}

#### e) Retrieve

In [5]:
%%bash
curl -X POST 'http://localhost:12036/gen/v1/rag/retrieve' \
    -s \
    -H "Content-Type: application/json" \
    -d '{"collection_name":"demo","query_texts":"Who are the young seniors?","n_results":4}'


{"retrieved":[{"documents":"Especially for those in their 50s and early 60s. Let us call them the “Young Seniors”. \"Young”, because you are younger than the Pioneer Generation and the Merdeka Generation; “Seniors”, because you will soon retire, or maybe you have recently retired.","metadatas":{"Abstract":"","ChunkGroupId":"dd1aade5-2edb-438e-ac1d-35f6f71a63e7","DocumentId":"PwR6VmXqAfwjn84ZM6dePsLWTldPv8cNS5dESYlsY2U","Keywords":"","PublishedDate":"","Source":"https://www.pmo.gov.sg/Newsroom/National-Day-Rally-2023","Title":""},"distances":0.09020859003067017,"ids":"0df94782-6019-4272-ad8b-0f7727e1c584"},{"documents":"Young Seniors are in a unique position today. Compared to the Pioneer and Merdeka Generations, you have benefited more from Singapore’s growth, and generally done better in life. But compared to workers younger than you, in their 30s and 40s today, you have generally earned less over your lifetimes. You have also had less time to benefit from improvements to the CPF syst
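
The same request can be built from Python. The sketch below uses only the standard library and mirrors the curl call above (URL, header, and JSON body). `build_retrieve_request` and `best_match` are hypothetical helper names, and the response handling assumes the `retrieved` list shape seen in the output above, with a smaller distance taken to mean a closer match.

```python
import json
import urllib.request

BASE_URL = "http://localhost:12036/gen/v1/rag"  # same host and port as the curl examples


def build_retrieve_request(collection_name: str, query: str,
                           n_results: int = 4) -> urllib.request.Request:
    """Build (but do not send) the POST request for the retrieve endpoint."""
    payload = {"collection_name": collection_name,
               "query_texts": query,
               "n_results": n_results}
    return urllib.request.Request(
        f"{BASE_URL}/retrieve",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def best_match(response_body: str) -> dict:
    """Return the retrieved item with the smallest distance."""
    retrieved = json.loads(response_body)["retrieved"]
    return min(retrieved, key=lambda item: item["distances"])


req = build_retrieve_request("demo", "Who are the young seniors?")
# To send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(best_match(resp.read().decode("utf-8"))["documents"])
```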

#### g) List documents

In [6]:
%%bash
curl -s 'http://localhost:12036/gen/v1/rag/collection/demo/documents'



#### h) Get document

In [7]:
%%bash
curl -s 'http://localhost:12036/gen/v1/rag/collection/demo/document/PwR6VmXqAfwjn84ZM6dePsLWTldPv8cNS5dESYlsY2U'

{"document":{"Id":"PwR6VmXqAfwjn84ZM6dePsLWTldPv8cNS5dESYlsY2U","CollectionName":"demo","ByteSize":43352,"FileName":"pm_long_speech_2023.txt","FileType":".txt","File":null,"Source":"https://www.pmo.gov.sg/Newsroom/National-Day-Rally-2023","Abstract":null,"Authors":null,"Title":null,"Publisher":null,"PublishedDate":null,"Comments":null,"Keywords":null,"CreatedAt":"2024-09-06T12:22:54.245021","UpdatedAt":"2024-09-06T12:22:54.245027","IsActive":true,"ChunkGroups":[{"Id":"dd1aade5-2edb-438e-ac1d-35f6f71a63e7","DocumentId":"PwR6VmXqAfwjn84ZM6dePsLWTldPv8cNS5dESYlsY2U","SplitAlgo":"recursive_split","ChunkCount":66,"ChunkSize":1000,"Overlap":100,"IsActive":true,"ChunksDir":"/tmp/chunks/b2d0cc2ebe2641038c7edf7cec52d0ed"}]}}

#### i) Update document

In [8]:
%%bash
curl -X PUT \
    http://localhost:12036/gen/v1/rag/collection/demo/document/PwR6VmXqAfwjn84ZM6dePsLWTldPv8cNS5dESYlsY2U \
    -H 'Content-Type: application/json' \
    -s \
    -d '{
            "Publisher": "ABC"
        }'


{"message":"Document updated successfully","document":"PwR6VmXqAfwjn84ZM6dePsLWTldPv8cNS5dESYlsY2U"}

#### j) Delete document

In [9]:
%%bash
curl -s -X DELETE http://localhost:12036/gen/v1/rag/collection/demo/document/PwR6VmXqAfwjn84ZM6dePsLWTldPv8cNS5dESYlsY2U

{"message":"Document with id PwR6VmXqAfwjn84ZM6dePsLWTldPv8cNS5dESYlsY2U deleted successfully"}

---
#### Example: Index PDF using Multi-Step Indexing

Multi-step indexing exists to support interactive status updates: the client can use the websocket manager to follow the indexing process one step at a time.

NOTE: The same result can be achieved with single-step indexing, as before:

```bash
curl -X POST 'http://localhost:12036/gen/v1/rag/index-file' \
    -H 'accept: application/json' \
    -H 'Content-Type: multipart/form-data' \
    -s \
    -F 'collection_name=demo' \
    -F 'file=@./attention-is-all-you-need.pdf' \
    -F 'req={"CollectionName":"demo","Source": "https://arxiv.org/abs/1706.03762","FilePath":"attention-is-all-you-need.pdf"}'
```


##### Step 1: Index document header

In [10]:
%%bash
curl -X POST 'http://localhost:12036/gen/v1/rag/step/header' \
    -H 'accept: application/json' \
    -H 'Content-Type: multipart/form-data' \
    -s \
    -F 'file=@./attention-is-all-you-need.pdf' \
    -F 'req={"CollectionName":"demo","Source": "https://arxiv.org/abs/1706.03762","FilePath":"attention-is-all-you-need.pdf"}'

{"Id":"-Sc9eXzUiSlaFV3qEDaKam33Boamkvv4tea8YPsjpy0","CollectionName":"demo","ByteSize":2215244,"FileName":"attention-is-all-you-need.pdf","FileType":".pdf","File":null,"Source":"https://arxiv.org/abs/1706.03762","Abstract":null,"Authors":null,"Title":null,"Publisher":null,"PublishedDate":null,"Comments":null,"Keywords":null,"CreatedAt":"2024-09-06T12:23:29.007320","UpdatedAt":"2024-09-06T12:23:29.007324","IsActive":true,"ChunkGroups":[]}

##### Step 2: Split document into a group of chunks

In [11]:
import json
import os

# Execute the curl command and capture the response
response = !curl -X POST 'http://localhost:12036/gen/v1/rag/step/split' \
    -H 'Content-Type: application/json' \
    -s \
    -d '{\
            "collection_name": "demo",\
            "document_id": "-Sc9eXzUiSlaFV3qEDaKam33Boamkvv4tea8YPsjpy0",\
            "chunk_size": 1000,\
            "chunk_overlap": 100\
        }'

# Convert response to string, then load it as JSON
response_json = json.loads(''.join(response))

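# Extract the Id returned by the split step.
# Note: despite the variable name, this is the chunk group id, not the document id.
document_id = response_json["Id"]
print(document_id)

# Export it so the next %%bash cell can pass it as "chunkgroup_id"
os.environ['DOCUMENT_ID'] = document_id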


3e988daa-3521-4884-af09-64b6482477a3


##### Step 3: Index each chunk in the database

In [12]:
%%bash
echo $DOCUMENT_ID
curl -X POST 'http://localhost:12036/gen/v1/rag/step/index' \
    -H 'Content-Type: application/json' \
    -s \
    -d '{
            "collection_name": "demo",
            "document_id": "-Sc9eXzUiSlaFV3qEDaKam33Boamkvv4tea8YPsjpy0",
            "chunkgroup_id": "'$DOCUMENT_ID'"
        }'

3e988daa-3521-4884-af09-64b6482477a3
{"DocumentId":"-Sc9eXzUiSlaFV3qEDaKam33Boamkvv4tea8YPsjpy0","ChunkgroupId":"3e988daa-3521-4884-af09-64b6482477a3","ChunkIds":["d3521baa-875e-4a46-b0e0-68d694308516","331fc322-fe74-49ab-ad84-44170824e9a2","6688f6b5-6599-4c03-8f27-a6f8f8faeddf","b31dfb3d-7a6a-4ff0-ae5f-8ad366ad15b6","7846a80f-fb85-470d-9799-3036f49b2803","656352f4-3b09-4804-be3d-a6e5dc959205","ee8ba4e7-4f34-45bd-9069-44af8c42e710","28945c03-2534-4c20-ab09-5c400ef87680","ad9d23f6-39f2-44b0-a15b-1941e4f60d48","457fa979-d8c3-49dc-bd7f-f0f2ee90f1bd","2d77e4ce-aeb0-4fe0-b481-e9fdd776ca16","1707599b-5ee8-4ad1-992c-96b68e1144f0","7cc485c1-7e6e-4618-84c4-6d4b63ff88a1","3c0e3a26-d42a-48cb-9009-d6b368fca35e","33b29449-8b86-4d03-a716-d438b9116453","1d55d097-fa7f-4a33-ae80-80f07daf2b78","e46b6618-dfb0-40a8-ba3c-1f2249f76cb6","00719cda-5080-4ada-b1e8-781b930661e1","41a42692-61f6-45c5-ad08-49d89ba895ab","f222aef7-ef4c-4bc4-8a4d-239bf73ed083","3a72aad1-c91a-438c-b9c1-74144203b552","77131a65-d578-447
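
Taken together, the three steps chain as follows: the header step returns the document `Id`, the split step returns what appears to be the chunk group `Id` (it matches `ChunkgroupId` in the final output), and the index step takes both. The sketch below captures that flow with the HTTP calls injected as plain functions so the chaining stays visible. `index_in_steps`, `post_file`, `post_json`, and `on_status` are hypothetical names; the payload fields are copied from the cells above.

```python
from typing import Callable, Optional

PostFile = Callable[[str, str, dict], dict]  # (endpoint path, file path, req fields) -> parsed JSON
PostJson = Callable[[str, dict], dict]       # (endpoint path, JSON payload) -> parsed JSON


def index_in_steps(post_file: PostFile, post_json: PostJson, file_path: str,
                   collection_name: str, source: str,
                   chunk_size: int = 1000, chunk_overlap: int = 100,
                   on_status: Optional[Callable[[str], None]] = None) -> dict:
    """Run header -> split -> index, reporting progress after each step."""
    notify = on_status or (lambda message: None)

    # Step 1: create the document header; the response "Id" is the document id.
    header = post_file("/step/header", file_path,
                       {"CollectionName": collection_name,
                        "Source": source,
                        "FilePath": file_path})
    document_id = header["Id"]
    notify("header created")

    # Step 2: split the document; the response "Id" is used as the chunk group id.
    chunkgroup = post_json("/step/split", {"collection_name": collection_name,
                                           "document_id": document_id,
                                           "chunk_size": chunk_size,
                                           "chunk_overlap": chunk_overlap})
    chunkgroup_id = chunkgroup["Id"]
    notify("chunks created")

    # Step 3: index every chunk of that chunk group.
    result = post_json("/step/index", {"collection_name": collection_name,
                                       "document_id": document_id,
                                       "chunkgroup_id": chunkgroup_id})
    notify("chunks indexed")
    return result
```

Injecting the transport keeps the example independent of any particular HTTP client, and the `on_status` callback marks the points where a client could surface per-step progress.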

---

## 5. Docker

**Instructions:** 

- Press **CTRL+SHIFT+P** > **Tasks: Run Task** > **docker: build**

- Get the updated version number from pyproject.toml.

- Update the gai-rag-svr image tag in docker-compose.yml with the new version number.

- Press **CTRL+SHIFT+P** > **Tasks: Run Task** > **docker-compose: up**

#### Smoke Test

In [16]:
%%bash
curl http://localhost:12036


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    25  100    25    0     0   2012      0 --:--:-- --:--:-- --:--:--  2083


{"message":"gai-rag-svr"}

**Tests:**

Repeat the API tests from the API section above.

**Tear Down:**

- Press **CTRL+SHIFT+P** > **Tasks: Run Task** > **docker-compose: down**

### Debugging

a) The container must be started with `python -m debugpy --listen 0.0.0.0:5678 main.py`.

b) Port 5678 must be open.

c) Click "Debug" in the toolbar.

d) Select "Attach" > "Run and Debug".

e) Add a breakpoint in the code.

f) Run the API test to see if it triggers the breakpoint.
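
Before attaching, it can help to confirm that the debug port is reachable from the host. The following is a minimal sketch using only the standard library; `is_port_open` is a hypothetical helper, and `localhost` assumes port 5678 is published to the host machine.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("debugpy port reachable:", is_port_open("localhost", 5678))
```

A successful connection only shows that something is listening on the port; it does not confirm that the listener is debugpy.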