# Gai/Gen: Retrieval-Augmented Generation (RAG)

## 1. Note

The following examples have been tested in this environment:

-   NVIDIA GeForce RTX 2060 6GB
-   Windows 11 + WSL2
-   Ubuntu 22.04
-   Python 3.10
-   CUDA Toolkit 11.8

## 2. Create Virtual Environment and Install Dependencies

We will create a separate virtual environment to avoid conflicts between the dependencies that each underlying model requires.

```sh
sudo apt update -y && sudo apt install ffmpeg git git-lfs -y
conda create -n RAG python=3.10.10 -y
conda activate RAG
pip install -e ".[RAG]"
```
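
Before continuing, it may help to confirm that the tools installed above are visible from the active environment. The following is a minimal sketch using only the Python standard library; the helper name `missing_tools` is hypothetical and not part of `gai`.

```python
import shutil
import sys

# Command-line tools installed by the apt step above
REQUIRED_TOOLS = ["ffmpeg", "git", "git-lfs"]

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
missing = missing_tools()
print("Missing tools:", ", ".join(missing) if missing else "none")
```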

## 3. Install Model

```sh
huggingface-cli download hkunlp/instructor-large \
        --local-dir ~/gai/models/instructor-large \
        --local-dir-use-symlinks False
```
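
The same download can probably be scripted from Python. This is a sketch, assuming the `huggingface_hub` package (which provides `huggingface-cli`) is installed; `model_dir` and `download_model` are hypothetical helper names, and the sketch leaves out the symlink flag, whose handling depends on the installed `huggingface_hub` version.

```python
from pathlib import Path

REPO_ID = "hkunlp/instructor-large"

def model_dir(base="~/gai/models"):
    """Return the local directory used by the CLI command above."""
    return Path(base).expanduser() / "instructor-large"

def download_model():
    """Download the model repository into model_dir() and return its path."""
    # Imported lazily so this file can be loaded without huggingface_hub installed
    from huggingface_hub import snapshot_download

    target = model_dir()
    target.mkdir(parents=True, exist_ok=True)
    # Fetches every file of the repository into `target`
    return snapshot_download(repo_id=REPO_ID, local_dir=str(target))
```

Calling `download_model()` starts the download; nothing is fetched when the functions are only defined.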

## 4. Example

```python
# Reset the 'demo' collection
from gai.gen.rag import RAG
RAG.delete_collection("demo")
RAG.list_collections()
```

```python
# Index a long speech
from gai.gen.Gaigen import Gaigen

gen = Gaigen.GetInstance().load('rag')
with open("../tests/gen/rag/pm_long_speech_2023.txt") as f:
    text = f.read()

async def run_indexing(text):
    # Index the text into the 'demo' collection and return the new document id
    return await gen.index_async(
        collection_name="demo",
        text=text,
        path_or_url="2023 National Day Speech",
        metadata={
            "source": "https://www.pmo.gov.sg/Newsroom/2023-National-Day-Rally-Speech",
            "title": "2023 National Day Rally Speech",
        },
    )

# Notebook cells support top-level await; in a plain script, use
# asyncio.run(run_indexing(text)) instead.
doc_id = await run_indexing(text)
print(f"Indexed document with id {doc_id}")
```
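
Because indexing is a coroutine, its return value (the document id) only exists once the coroutine has finished. The stand-alone sketch below illustrates this with `fake_index`, a hypothetical stand-in that has nothing to do with `gai`: a task created with `asyncio.create_task` is merely scheduled, and awaiting it is what yields the result.

```python
import asyncio

async def fake_index(text):
    """Hypothetical stand-in for an indexing coroutine; returns a made-up id."""
    await asyncio.sleep(0.01)  # simulate I/O-bound work
    return f"doc-{len(text)}"

async def main():
    # create_task only schedules the coroutine; it has not run yet
    task = asyncio.create_task(fake_index("hello"))
    done_right_after_scheduling = task.done()
    # Awaiting the task waits for completion and yields its return value
    doc_id = await task
    return done_right_after_scheduling, doc_id

result = asyncio.run(main())
print(result)  # (False, 'doc-5')
```

Any later cell that reads the document id should therefore run only after the indexing coroutine has been awaited.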



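```python
# List the ids of all documents in the 'demo' collection
from gai.gen.rag import RAG
rag = RAG()
docs = rag.list_documents("demo")
for doc in docs:
    print(doc.Id)
```

```python
# Fetch the document that was indexed above
doc = rag.get_document(doc_id)
print(doc.__dict__)
```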

```python
# Retrieve answers
from gai.gen.Gaigen import Gaigen
gen = Gaigen.GetInstance().load('rag')
result = gen.retrieve(collection_name="demo", query_texts="Who are the young seniors?")
print(result)
```
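
Conceptually, retrieval ranks the indexed chunks by how similar their embeddings are to the embedding of the query. The sketch below shows that idea with a toy word-count embedding and cosine similarity. It is not how `gai` is implemented: a real setup uses dense vectors from the embedding model and a vector store, and the names `embed`, `cosine`, `retrieve` and `n_results` are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (a real model returns dense vectors)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(chunks, query, n_results=2):
    """Return the n_results chunks most similar to the query, best first."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(embed(chunk), query_vec), reverse=True)
    return ranked[:n_results]

chunks = [
    "attention lets the model weigh every token",
    "this chunk talks about young seniors",
    "recurrent networks read tokens one at a time",
]
print(retrieve(chunks, "who are the young seniors", n_results=1))
```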


```python
# Index and retrieve a PDF
from gai.common.PDFConvert import PDFConvert
from gai.gen.Gaigen import Gaigen

src = "../tests/unit_tests/common/attention-is-all-you-need.pdf"
text = PDFConvert.pdf_to_text(src, False)
gen = Gaigen.GetInstance().load('rag')

async def index_and_retrieve():
    # Index the extracted text, then query the same collection
    await gen.index_async(collection_name="demo", text=text, path_or_url=src, metadata={"source": src})
    result = gen.retrieve(collection_name="demo", query_texts="How is the transformer different from RNN?")
    print(result)

# Notebook cells support top-level await; in a plain script, use
# asyncio.run(index_and_retrieve()) instead.
await index_and_retrieve()
```


## 5. Running as a Service

#### Step 1: Start Docker container

```bash
docker run -d \
    --name gai-rag \
    -p 12031:12031 \
    --gpus all \
    -v ~/gai/models:/app/models \
    kakkoii1337/gai-rag:latest
```

#### Step 2: Wait for model to load

```bash
docker logs gai-rag
```

When loading completes, the logs should show:

```bash
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:12031 (Press CTRL+C to quit)
```
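
Instead of watching the logs by hand, a script can poll the service until it answers. The sketch below is an assumption-laden helper, not part of `gai`: `wait_for_http` is a hypothetical name, and it assumes the service returns some HTTP response (even an error status) on the root path of port 12031. It waits for an HTTP response rather than a bare TCP connect, because a port published through Docker can accept connections before the application is ready.

```python
import http.client
import time
import urllib.error
import urllib.request

def wait_for_http(url="http://localhost:12031/", timeout=60.0, interval=1.0):
    """Return True once the server sends any HTTP response, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered with an error status (for example 404), so it is up
            return True
        except (OSError, http.client.HTTPException):
            # Connection refused, reset, or timed out: not ready yet
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A call such as `wait_for_http(timeout=300)` could then gate the test step that follows.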

#### Step 3: Test

A WebSocket listener can be used to monitor indexing progress. This is especially useful when indexing large files.

**Start Listening**

```bash
cd tests/gen/rag
python function_test_websocket_listener.py
```

**Send Request**

```bash
cd tests/gen/rag
./curl_index.sh
```


## 6. Static Methods

```python
# List all collections (api)
from gai.gen.rag import RAG
[collection.name for collection in RAG.list_collections()]
```

Output:

```
['demo']
```

```python
# List all documents in collection (api)
from gai.gen.rag import RAG
docs = RAG.list_documents("demo")
last_doc_id = docs[-1].Id
[
    {
        "id": doc.Id,
        "title": doc.Title,
        "size": doc.ByteSize,
        "chunk_count": doc.ChunkCount,
        "chunk_size": doc.ChunkSize,
        "overlap_size": doc.Overlap,
        "source": doc.Source,
    }
    for doc in docs
]
```

Output:

```
[{'id': '07ab5070-a065-4748-87e3-c97b0e17f5d9',
  'title': '2023 National Day Rally Speech',
  'size': 43153,
  'chunk_count': 29,
  'chunk_size': 2000,
  'overlap_size': 200,
  'source': 'https://www.pmo.gov.sg/Newsroom/2023-National-Day-Rally-Speech'}]
```
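
The `chunk_size` and `overlap_size` fields describe how a document is cut into overlapping pieces before embedding. As an illustration only, here is a plain sliding-window splitter; `split_with_overlap` is a hypothetical name and this is not the splitter `gai` uses. A plain window like this would yield roughly 24 chunks for a 43,153-character text, whereas the recorded output shows 29, so the actual splitter presumably also breaks on text boundaries.

```python
def split_with_overlap(text, chunk_size=2000, overlap=200):
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    stride = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), stride):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

sample = "".join(str(i % 10) for i in range(5000))
chunks = split_with_overlap(sample)
print(len(chunks), [len(c) for c in chunks])  # 3 [2000, 2000, 1400]
```

Consecutive chunks share their boundary text, so a sentence that straddles a cut still appears whole in at least one chunk when it is shorter than the overlap.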

```python
# List all chunks in collection
from gai.gen.rag import RAG
RAG.list_chunks("demo")
```

```python
# Get one document
from gai.gen.rag import RAG
doc = RAG.get_document(last_doc_id)
doc.__dict__
```

Output (truncated):

```
{'_sa_instance_state': <sqlalchemy.orm.state.InstanceState at 0x7f9417f28ac0>,
 'Id': '07ab5070-a065-4748-87e3-c97b0e17f5d9',
 'Abstract': '',
 'UpdatedAt': datetime.datetime(2024, 2, 10, 1, 15, 24, 484191),
 'ChunkCount': 29,
 'Authors': '',
 'ByteSize': 43153,
 'Title': '2023 National Day Rally Speech',
 'ChunkSize': 2000,
 'Publisher': None,
 'Overlap': 200,
 'PublishedDate': None,
 'SplitAlgo': None,
 'Comments': '',
 'FileName': '',
 'IsActive': True,
 'CollectionName': 'demo',
 'Source': 'https://www.pmo.gov.sg/Newsroom/2023-National-Day-Rally-Speech',
 'CreatedAt': datetime.datetime(2024, 2, 10, 1, 15, 24, 484187),
 'chunks': [<gai.gen.rag.models.IndexedDocumentChunk.IndexedDocumentChunk at 0x7f941c478310>,
  <gai.gen.rag.models.IndexedDocumentChunk.IndexedDocumentChunk at 0x7f941c478370>,
  <gai.gen.rag.models.IndexedDocumentChunk.IndexedDocumentChunk at 0x7f941c4783d0>,
  <gai.gen.rag.models.IndexedDocumentChunk.IndexedDocumentChunk at 0x7f941c478430>,
  <gai.gen.rag.models.In
```

```python
# Update document
from gai.gen.rag import RAG
doc.Title = "Attention is all you need"
doc.Source = "https://arxiv.org/abs/1706.03762"
doc.Authors = "Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin"
doc.Abstract = "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data."
doc.PublishedDate = "2017-Jun-12"

RAG.update_document(doc)
RAG.get_document(last_doc_id).__dict__
```

Output (truncated):

```
{'_sa_instance_state': <sqlalchemy.orm.state.InstanceState at 0x7f9417f4ebc0>,
 'Id': '07ab5070-a065-4748-87e3-c97b0e17f5d9',
 'Abstract': '',
 'UpdatedAt': datetime.datetime(2024, 2, 10, 1, 44, 55, 589924),
 'ChunkCount': 29,
 'Authors': '',
 'ByteSize': 43153,
 'Title': 'Attention is all you need',
 'ChunkSize': 2000,
 'Publisher': None,
 'Overlap': 200,
 'PublishedDate': None,
 'SplitAlgo': None,
 'Comments': '',
 'FileName': '',
 'IsActive': True,
 'CollectionName': 'demo',
 'Source': 'https://www.pmo.gov.sg/Newsroom/2023-National-Day-Rally-Speech',
 'CreatedAt': datetime.datetime(2024, 2, 10, 1, 15, 24, 484187),
 'chunks': [<gai.gen.rag.models.IndexedDocumentChunk.IndexedDocumentChunk at 0x7f941c4877c0>,
  <gai.gen.rag.models.IndexedDocumentChunk.IndexedDocumentChunk at 0x7f941c487820>,
  <gai.gen.rag.models.IndexedDocumentChunk.IndexedDocumentChunk at 0x7f941c487880>,
  <gai.gen.rag.models.IndexedDocumentChunk.IndexedDocumentChunk at 0x7f941c4878e0>,
  <gai.gen.rag.models.Indexed
```
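
Comparing the record before and after the update shows which fields were actually persisted. In the outputs recorded here, only `Title` and `UpdatedAt` differ, while `Authors`, `Abstract`, `Source` and `PublishedDate` keep their earlier values. The sketch below makes that comparison explicit on a subset of the recorded values; `changed_fields` is a hypothetical helper, not part of `gai`.

```python
from datetime import datetime

def changed_fields(before, after):
    """Return {field: (old, new)} for every field whose value differs."""
    keys = sorted(set(before) | set(after))
    return {k: (before.get(k), after.get(k)) for k in keys if before.get(k) != after.get(k)}

# Values copied from the two recorded outputs (subset of fields)
before = {
    "Title": "2023 National Day Rally Speech",
    "Authors": "",
    "Abstract": "",
    "PublishedDate": None,
    "Source": "https://www.pmo.gov.sg/Newsroom/2023-National-Day-Rally-Speech",
    "UpdatedAt": datetime(2024, 2, 10, 1, 15, 24, 484191),
}
after = {
    "Title": "Attention is all you need",
    "Authors": "",
    "Abstract": "",
    "PublishedDate": None,
    "Source": "https://www.pmo.gov.sg/Newsroom/2023-National-Day-Rally-Speech",
    "UpdatedAt": datetime(2024, 2, 10, 1, 44, 55, 589924),
}
print(list(changed_fields(before, after)))  # ['Title', 'UpdatedAt']
```

When comparing real records via `doc.__dict__`, entries such as `_sa_instance_state` and `chunks` hold object references that differ on every load and would need to be excluded first.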

```python
# Get all chunks of a document
from gai.gen.rag import RAG
chunks = RAG.get_document(last_doc_id).chunks
last_chunk_id = None
for chunk in chunks:
    last_chunk_id = chunk.ChunkId
    print(chunk.ChunkId)
```

Output (truncated):

```
0b7c01468ca7505ef751e06647b17d0475d51813c3a1925590df81a7a4b5a3a7
53bfb1d80c93baf656446d5c5257e79163a771f414351902244e6d2f2e6c88c9
8aa8cc17c40a9a090458c6725cd16cdae0cdf5d311e7b935affb461f1877a287
d96cc56503db04cfa4f6c10fd39aa3250f6e1a3c0ff9b85a119585b8a7d92df4
eafa20be8f5a92207f25cfd85774fbb436f53d5851c16d8bfac8cab47e0b2559
b9210565cbd269027dd0b19e91c3ab7db415670c088c0cada970c970a165f784
5d61c4ead1186f6de6909eca57f6675c789e43b09cfec6b738cb802b7b4eea69
f5b54668d2357185abda3d81ceda8d1218b230100a483e58e45c94d8765432ef
55e2cfde42f2e10abd85d1c5a0135cdf89138472bf18e390d768cc5eca919389
57d3f5af7dd4bf93488fe5e837e22591b53513ab7faa9199ef91a066bae13f31
5f400f3e27f615c6663d4f355745b58ecc9730380d74726e13e8731251c67e16
2487f6e4cc193ca3b9c7f0b961b0d47f5d28dd0ca7df5cfb6e8f276f452ed834
73ffba02a444eb4ded0bb5f1e6258660ede0897b8aff611bec1b5e2c269c9b68
0c494f2a001a0666487734f76141fd75fb0d570c4fcc9e1463cc31c8bf2eeaca
b14b2128be165ff7195c08914cc11f9203888392776b80867ff7594e4b49df1f
d040a9e16a818fd6598483721
```
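
The complete chunk ids are 64 hexadecimal characters long, which matches the length of a SHA-256 digest. Whether `gai` really derives them by hashing the chunk content is an assumption; the sketch below only shows how such a content-based id could be produced, using a hypothetical `content_id` helper.

```python
import hashlib

def content_id(text):
    """Return the 64-character hex SHA-256 digest of the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

chunk_id = content_id("example chunk text")
print(len(chunk_id))  # 64
```

An id built this way is deterministic: hashing the same text always gives the same id, which makes it straightforward to detect duplicate chunks.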

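```python
# Get a chunk by id
from gai.gen.rag import RAG
RAG.get_chunk("demo", last_chunk_id)
```

In the recorded notebook run, this cell was executed after the delete-document cell, which likely explains the empty result:

```
{'ids': [],
 'embeddings': None,
 'metadatas': [],
 'documents': [],
 'uris': None,
 'data': None}
```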

```python
# Delete document
from gai.gen.rag import RAG
RAG.delete_document(last_doc_id)
```

```python
# Delete collection
from gai.gen.rag import RAG
RAG.delete_collection('demo')
```