## Example usage of BasicRAG from rag-colls

BasicRAG is the most simplest version of RAG for anyone to begin learning RAG.

### Install libraries

In [2]:
!pip install rag-colls

Collecting rag-colls
  Downloading rag_colls-0.1.0-py3-none-any.whl.metadata (448 bytes)
Collecting chromadb>=0.6.3 (from rag-colls)
  Downloading chromadb-0.6.3-py3-none-any.whl.metadata (6.8 kB)
Collecting litellm>=1.65.0 (from rag-colls)
  Downloading litellm-1.65.0-py3-none-any.whl.metadata (36 kB)
Collecting llama-index-embeddings-openai>=0.3.1 (from rag-colls)
  Downloading llama_index_embeddings_openai-0.3.1-py3-none-any.whl.metadata (684 bytes)
Collecting loguru>=0.7.3 (from rag-colls)
  Downloading loguru-0.7.3-py3-none-any.whl.metadata (22 kB)
Collecting pymupdf>=1.25.4 (from rag-colls)
  Downloading pymupdf-1.25.4-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (3.4 kB)
Collecting build>=1.0.3 (from chromadb>=0.6.3->rag-colls)
  Downloading build-1.2.2.post1-py3-none-any.whl.metadata (6.5 kB)
Collecting chroma-hnswlib==0.7.6 (from chromadb>=0.6.3->rag-colls)
  Downloading chroma_hnswlib-0.7.6-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metada

### Download sample data (some papers)

In [3]:
!gdown --fuzzy https://drive.google.com/file/d/1Hl5EU4WzQ7C9-NAEZlliYjkMdHnDOWWl/view?usp=sharing

!unzip papers.zip -d samples

!rm papers.zip

Downloading...
From: https://drive.google.com/uc?id=1Hl5EU4WzQ7C9-NAEZlliYjkMdHnDOWWl
To: /content/papers.zip
100% 21.7M/21.7M [00:00<00:00, 36.4MB/s]
Archive:  papers.zip
  inflating: samples/papers/2308.11432v5.pdf  
  inflating: samples/papers/2312.00752v2.pdf  
  inflating: samples/papers/2312.10997v5.pdf  
  inflating: samples/papers/2409.13571v1.pdf  
  inflating: samples/papers/2409.13576v1.pdf  
  inflating: samples/papers/2409.13588v1.pdf  
  inflating: samples/papers/Li_Density_Map_Guided_Object_Detection_in_Aerial_Images_CVPRW_2020_paper.pdf  


### Prepare OPEN_AI_API key

In [8]:
# In this example, we will use openai models.
import os
from google.colab import userdata

os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")

# If you run on your local machine, uncomment, fill your api key here and run this
# os.environ['OPENAI_API_KEY'] = "YOUR_OPENAI_API_KEY"

### Import core components of rag-colls

In [6]:
from rag_colls.rags.basic_rag import BasicRAG
from rag_colls.llms.litellm_llm import LiteLLM
from rag_colls.embeddings.openai_embedding import OpenAIEmbedding
from rag_colls.processors.chunkers.semantic_chunker import SemanticChunker
from rag_colls.databases.vector_databases.chromadb import ChromaVectorDatabase

### Initialize BasicRAG

In [9]:
rag = BasicRAG(
    vector_database=ChromaVectorDatabase(
        persistent_directory="./chroma_db", collection_name="test"
    ),
    chunker=SemanticChunker(embed_model_name="text-embedding-ada-002"),
    llm=LiteLLM(model_name="openai/gpt-4o-mini"),
    embed_model=OpenAIEmbedding(model_name="text-embedding-ada-002"),
)

[32m2025-03-30 15:53:45.828[0m | [32m[1mSUCCESS [0m | [36mrag_colls.loggers.loguru[0m:[36msuccess[0m:[36m70[0m - [32m[1mChromaVectorDatabase initialized successfully !!![0m
[32m2025-03-30 15:53:45.882[0m | [32m[1mSUCCESS [0m | [36mrag_colls.loggers.loguru[0m:[36msuccess[0m:[36m70[0m - [32m[1mCollection test created successfully !!![0m
[32m2025-03-30 15:53:45.884[0m | [1mINFO    [0m | [36mrag_colls.loggers.loguru[0m:[36minfo[0m:[36m26[0m - [1mNo processors provided. Using default processors ...[0m
[32m2025-03-30 15:53:45.885[0m | [1mINFO    [0m | [36mrag_colls.loggers.loguru[0m:[36minfo[0m:[36m26[0m - [1mInitializing default file processors ...[0m


### Run ingest in a single file

You could run more files but in this example, we only run once.

In [10]:
rag.ingest_db(file_paths=["samples/papers/2409.13588v1.pdf"], batch_embedding=100)

[32m2025-03-30 15:54:32.182[0m | [1mINFO    [0m | [36mrag_colls.loggers.loguru[0m:[36minfo[0m:[36m26[0m - [1mProcessing 1 files ...[0m
[32m2025-03-30 15:54:33.210[0m | [1mINFO    [0m | [36mrag_colls.loggers.loguru[0m:[36minfo[0m:[36m26[0m - [1mGet 12 documents.[0m


Parsing nodes:   0%|          | 0/12 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/22 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/37 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/44 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/28 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/33 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/41 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/24 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/52 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/49 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/44 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/191 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/43 [00:00<?, ?it/s]

Embedding ...: 100%|██████████| 1/1 [00:00<00:00,  1.19it/s]
[32m2025-03-30 15:54:50.055[0m | [32m[1mSUCCESS [0m | [36mrag_colls.loggers.loguru[0m:[36msuccess[0m:[36m70[0m - [32m[1mAdded: 48 documents.[0m


### Perform search

In [11]:
response, usage = rag.search(query="Chain Buddy là gì ?", top_k=5)

print(response)
print("===========")
print(usage)

ChainBuddy là một hệ thống AI trợ lý được thiết kế để tự động hóa việc tạo ra các pipel (đường ống) cho các mô hình ngôn ngữ lớn (LLM). Nó hoạt động trong nền tảng open-source ChainForge và cung cấp một giao diện người dùng thân thiện, tương tác giống như chatbot, giúp người dùng dễ dàng lập kế hoạch và đánh giá hành vi của các LLM. ChainBuddy nhằm giải quyết vấn đề khó khăn mà người dùng thường gặp khi bắt đầu từ con số không (vấn đề trang trắng) và hỗ trợ họ trong quá trình sản xuất các pipel cho nhiều tác vụ khác nhau, như xử lý dữ liệu và tối ưu hóa lời nhắc (prompt).
LLMUsage(prompt_tokens=1495, completion_tokens=150, total_tokens=1645)
