# RAG System Implementation

This notebook builds a small retrieval-augmented generation (RAG) pipeline over `cn.md`, a Chinese-language crossover story, following the steps below.

- Step 1: Document Loading and Preprocessing
- Step 2: Text Splitting and Chunking
- Step 3: Embedding Generation (Indexing)
- Step 4: Vector Database Storage
- Step 5: Query Processing and Retrieval: Perform similarity search (Dense Retrieval) and retrieve top-K candidates
- Step 6: Reranking: Initialize cross-encoder model, rerank retrieved candidates and filter to top-N results
- Step 7: Context Preparation for LLM: Format retrieved chunks
- Step 8: LLM Generation (Optional): Initialize LLM and generate response with context
- _Evaluation and Testing_: Test queries, Evaluation metrics and Performance analysis

# 1. Splitting and Chunking

In [4]:
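from typing import List  # used for return type annotations

# Split the document into chunks, one per paragraph (paragraphs are separated by a blank line)
def split_into_chunks(doc_file: str) -> List[str]:
    # Assumes the file is UTF-8 encoded; being explicit avoids platform-dependent decoding
    with open(doc_file, 'r', encoding='utf-8') as file:
        content = file.read()
    return content.split("\n\n")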

chunks = split_into_chunks("cn.md")

def print_all_chunks(chunks):
    for i, chunk in enumerate(chunks):
        print(f"[{i}] {chunk}\n")

# print_all_chunks(chunks)

print(f"[0] {chunks[0]}\n")
print(f"[{len(chunks)-1}] {chunks[-1]}\n")

[0] # 魔戒与魔杖：两个世界的交汇

[17] 索伦发出震耳欲聋的咆哮，但还没有被完全击败。他将所有的黑暗力量集中到一点，准备做最后的反击。就在这个关键时刻，佛罗多举起至尊魔戒，大喊："In the name of the Shire and Hogwarts！"赫敏立即用变形咒将魔戒临时转化为一个巨大的魔法放大器。哈利抓住这个机会，施展了他从未尝试过的超强魔咒——他将阿瓦达索命咒（Avada Kedavra）的原始能量逆转，创造出了"Vita Restauro"（生命复原）咒语，这是一种纯粹的光明能量，专门克制索伦这样的黑暗存在。当这道白金色的光芒击中索伦时，黑暗魔君终于崩溃了。火焰之眼熄灭，半兽人军队瞬间化为灰烬，天空恢复了宁静。
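
Splitting on `"\n\n"` can leave empty or whitespace-padded chunks when paragraphs are separated by more than one blank line. A minimal sketch of a stricter variant, assuming paragraphs are separated by one or more blank lines; `split_paragraphs` is a hypothetical helper, not part of the notebook:

```python
import re
from typing import List

def split_paragraphs(content: str) -> List[str]:
    """Split text on runs of blank lines and drop empty chunks."""
    # A newline, optional whitespace, then another newline marks a paragraph break.
    parts = re.split(r"\n\s*\n", content)
    # Trim surrounding whitespace and discard chunks that end up empty.
    return [part.strip() for part in parts if part.strip()]

sample = "# Title\n\nFirst paragraph.\n\n\n\nSecond paragraph.\n"
print(split_paragraphs(sample))
```

Dropping empty chunks this way would also keep them out of the embedding and storage steps.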



# 2. Indexing

- We import the `SentenceTransformer` class to load an embedding model, `shibing624/text2vec-base-chinese`.
- We define a function that turns each chunk into an embedding vector.

In [2]:
from sentence_transformers import SentenceTransformer

embedding_model = SentenceTransformer("shibing624/text2vec-base-chinese")

def embed_chunk(chunk: str) -> List[float]:
    # encode() returns a NumPy array; convert it to a plain list of floats
    embedding = embedding_model.encode(chunk)
    return embedding.tolist()

test_embedding = embed_chunk("测试内容")
dimensions = len(test_embedding)
print(dimensions)
print(test_embedding[:5])

BertModel LOAD REPORT from: shibing624/text2vec-base-chinese
Key                          | Status     |  | 
-----------------------------+------------+--+-
bert.embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


768
[0.5059826374053955, 0.158220112323761, 0.006481220480054617, 0.13777674734592438, 1.0255974531173706]


In [3]:
embeddings = [embed_chunk(chunk) for chunk in chunks]
assert len(embeddings) == len(chunks)
assert len(embeddings[0]) == dimensions
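
The list comprehension above calls `encode` once per chunk. `SentenceTransformer.encode` also accepts a list of strings and batches them internally, which is usually faster on larger corpora. A minimal sketch of that variant; `embed_chunks` is a hypothetical helper, and `FakeModel` is a stand-in used only so the example runs without downloading a model:

```python
from typing import List

def embed_chunks(model, chunks: List[str]) -> List[List[float]]:
    """Embed all chunks with a single encode() call."""
    vectors = model.encode(chunks)  # one row per chunk
    # Convert each row to plain Python floats (works for NumPy rows and lists alike).
    return [[float(x) for x in row] for row in vectors]

class FakeModel:
    """Stand-in with the same encode(list) shape as a real embedding model."""
    def encode(self, sentences):
        # Toy 2-dimensional "embedding": text length and a constant.
        return [[float(len(s)), 1.0] for s in sentences]

print(embed_chunks(FakeModel(), ["ab", "abcd"]))
```

In the notebook, passing `embedding_model` in place of `FakeModel()` should produce the same kind of list that the loop builds.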

## Vector DB Initialization
A vector database stores, manages and indexes high-dimensional vector data. We use `chromadb` here.
- `chromadb.EphemeralClient()` keeps data in memory only; nothing is written to disk, and the data is lost when the current process terminates.
- `chromadb.PersistentClient("./file_name.db")` writes the data to disk at the given path.
- A `collection` in chromadb is analogous to a table in a traditional database.
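
To make the table analogy concrete, here is a toy in-memory stand-in for a collection. It is not chromadb's implementation or API: `ToyCollection` and its methods are hypothetical, and it ranks by brute-force squared Euclidean distance instead of using an index.

```python
from typing import Dict, List, Tuple

class ToyCollection:
    """Maps each record ID to its (document, embedding) pair, like rows in a table."""

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[str, List[float]]] = {}

    def add(self, ids: List[str], documents: List[str], embeddings: List[List[float]]) -> None:
        for id_, doc, emb in zip(ids, documents, embeddings):
            if id_ in self._rows:
                raise ValueError(f"duplicate id: {id_}")
            self._rows[id_] = (doc, emb)

    def query(self, query_embedding: List[float], n_results: int) -> List[str]:
        def sq_dist(vec: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(vec, query_embedding))
        # Smallest distance first, then keep the n_results closest documents.
        ranked = sorted(self._rows.values(), key=lambda row: sq_dist(row[1]))
        return [doc for doc, _ in ranked[:n_results]]

collection = ToyCollection()
collection.add(
    ids=["0", "1", "2"],
    documents=["north", "east", "far"],
    embeddings=[[0.0, 1.0], [1.0, 0.0], [5.0, 5.0]],
)
print(collection.query([0.9, 0.1], n_results=2))
```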

In [4]:
import chromadb

chromadb_client = chromadb.EphemeralClient()
chromadb_collection = chromadb_client.get_or_create_collection(name="default")

def save_embeddings(chunks: List[str], embeddings: List[List[float]]) -> None:
    """
    Save all records into the database.

    chromadb requires an ID for each record, so each chunk is assigned one.
    Each record holds an ID, the original chunk text, and its corresponding embedding vector.
    """
    ids = [str(i) for i in range(len(chunks))]
    chromadb_collection.add(
        documents = chunks,
        embeddings = embeddings,
        ids = ids
    )

save_embeddings(chunks, embeddings)

# 3. Retrieval
- Embed the query into a vector and pass that vector to the database.
- The database scores the query vector against every stored chunk vector and returns the top-K most similar chunks.
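
The scoring step can be sketched in plain Python. This is an illustration only: chromadb performs the comparison internally, with whichever distance metric the collection is configured for, and `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math
from typing import List, Tuple

def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)

def top_k(query_vec: List[float], chunk_vecs: List[List[float]], k: int) -> List[Tuple[int, float]]:
    """Return (chunk index, score) pairs for the k most similar chunks."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
results = top_k([1.0, 0.0], chunk_vecs, k=2)
print(results)  # chunk 0 (same direction) ranks first, then chunk 2 (45 degrees away)
```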

---

Step 1: **Bi-encoder** Retrieval (Fast but less accurate)
- Encode query and docs separately → Compare vectors with cosine similarity
- Get top-K candidates
- Time: 0.03s

Step 2: **Cross-encoder** Reranking (Slow but very accurate)  
- Process (query + doc) together with attention → Get relevance score
- Rerank top-K → top 3 results
- Time: 2.5s

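Q: Why not use the cross-encoder only?
A: Too expensive. A cross-encoder must score every (query, document) pair at query
   time (10K docs = 500s per query). A bi-encoder embeds the documents once, ahead of
   time, so at query time it only needs to encode the query (0.03s). Result: the
   two-stage pipeline takes about 0.03s + 2.5s ≈ 2.5s per query, roughly 200× faster than 500s.

Key insight: the bi-encoder casts a wide net efficiently, and the cross-encoder
narrows it to the highest-quality results.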
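
A quick calculation with the timings quoted above, treated as illustrative figures. The speedup depends on what is compared: here the cross-encoder-only cost is set against the full two-stage pipeline (bi-encoder retrieval plus reranking). The implied candidate count assumes cross-encoder cost scales linearly with the number of pairs, which ignores model loading and batching.

```python
n_docs = 10_000
cross_encoder_only_s = 500.0  # quoted: cross-encoder over all 10K docs
bi_encoder_s = 0.03           # quoted: bi-encoder retrieval
rerank_s = 2.5                # quoted: cross-encoder reranking of the candidates

per_pair_s = cross_encoder_only_s / n_docs    # cost of one (query, doc) pair
implied_candidates = rerank_s / per_pair_s    # candidates a 2.5s rerank could cover
two_stage_s = bi_encoder_s + rerank_s         # total two-stage latency
speedup = cross_encoder_only_s / two_stage_s  # versus cross-encoder only

print(f"cost per (query, doc) pair: {per_pair_s:.2f}s")
print(f"candidates implied by a {rerank_s}s rerank: {implied_candidates:.0f}")
print(f"two-stage latency: {two_stage_s:.2f}s")
print(f"speedup over cross-encoder only: {speedup:.0f}x")
```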

In [5]:
def retrieve(query: str, top_k: int) -> List[str]:
    query_embedding = embed_chunk(query)
    results = chromadb_collection.query(
        query_embeddings = [query_embedding],
        n_results = top_k
    )
    return results['documents'][0]

query = "哈利波特用了什么魔法打败了索伦？"
retrieved_chunks = retrieve(query, 5)

for i, chunk in enumerate(retrieved_chunks):
    print(f"[{i}] {chunk}\n")

[0] 在甘道夫的指导下，哈利学会了如何将他的守护神咒与中土世界的光明魔法结合。他们制定了一个大胆的计划：哈利将使用"Expecto Patronum"（呼神护卫）咒语，但注入甘道夫的白光力量，创造出一个超级守护神来对抗索伦。同时，赫敏和佛罗多合作研究如何利用魔戒的力量作为诱饵，将索伦引入一个魔法陷阱。罗恩和山姆则负责协调霍格沃茨学生和霍比特人的防御工作。

[1] 索伦发出震耳欲聋的咆哮，但还没有被完全击败。他将所有的黑暗力量集中到一点，准备做最后的反击。就在这个关键时刻，佛罗多举起至尊魔戒，大喊："In the name of the Shire and Hogwarts！"赫敏立即用变形咒将魔戒临时转化为一个巨大的魔法放大器。哈利抓住这个机会，施展了他从未尝试过的超强魔咒——他将阿瓦达索命咒（Avada Kedavra）的原始能量逆转，创造出了"Vita Restauro"（生命复原）咒语，这是一种纯粹的光明能量，专门克制索伦这样的黑暗存在。当这道白金色的光芒击中索伦时，黑暗魔君终于崩溃了。火焰之眼熄灭，半兽人军队瞬间化为灰烬，天空恢复了宁静。

[2] 黄昏时分，索伦的火焰之眼终于突破了霍格沃茨的外层防护。邓布利多和甘道夫联手施展了"普罗特戈盾阵"（Protego Maxima），暂时挡住了半兽人的进攻。哈利站在城堡的最高塔上，深吸一口气。他举起魔杖，大声喊道："Expecto Patronum Illuminatus！"（光明守护神降临！）这是他与甘道夫共同创造的全新咒语。一只巨大的银色凤凰从他的魔杖中飞出，身上闪耀着甘道夫赋予的白色圣光。这只光明凤凰直冲向索伦的火焰之眼。

[3] 邓布利多立即召集了凤凰社成员和霍格沃茨的教授们。赫敏和罗恩也赶来与哈利会合。甘道夫则召唤了他在中土世界的盟友——精灵王子莱戈拉斯和矮人战士金雳也神奇地穿越了时空裂缝赶来支援。"我们需要一个计划，"邓布利多说，"索伦的力量在这个世界可能会变得更强，因为他能吸收我们的魔法能量。" 赫敏翻阅着《高级魔法理论》说："如果我们能结合两个世界的魔法，也许能创造出前所未有的强大咒语！"

[4] 但索伦的力量比预想的更强大。火焰之眼释放出黑暗射线，与光明凤凰僵持不下。这时，邓布利多施展了他最强大的魔法——"Fianto Duri"（钢铁守卫）和"Repello Inimicum"（驱逐敌人

# 4. Reranking
- Use the [Cross-Encoder for multilingual MS Marco](https://huggingface.co/cross-encoder/mmarco-mMiniLMv2-L12-H384-v1) to rerank the retrieved candidates and keep only the top results.

In [6]:
from sentence_transformers import CrossEncoder

def rerank(query: str, retrieved_chunks: List[str], top_k: int) -> List[str]:
    # Note: the model is loaded on every call; load it once outside if rerank is called repeatedly
    cross_encoder = CrossEncoder('cross-encoder/mmarco-mMiniLMv2-L12-H384-v1')
    # Score each (query, chunk) pair jointly
    pairs = [(query, chunk) for chunk in retrieved_chunks]
    scores = cross_encoder.predict(pairs)

    # Sort chunks by relevance score, highest first
    scored_chunks = list(zip(retrieved_chunks, scores))
    scored_chunks.sort(key=lambda pair: pair[1], reverse=True)

    # Return only the chunk text, dropping the scores
    return [chunk for chunk, _ in scored_chunks][:top_k]

reranked_chunks = rerank(query, retrieved_chunks, 3)

# Print the top-K reranked results
for i, chunk in enumerate(reranked_chunks):
    print(f"[{i}] {chunk}\n")

XLMRobertaForSequenceClassification LOAD REPORT from: cross-encoder/mmarco-mMiniLMv2-L12-H384-v1
Key                             | Status     |  | 
--------------------------------+------------+--+-
roberta.embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


[0] 索伦发出震耳欲聋的咆哮，但还没有被完全击败。他将所有的黑暗力量集中到一点，准备做最后的反击。就在这个关键时刻，佛罗多举起至尊魔戒，大喊："In the name of the Shire and Hogwarts！"赫敏立即用变形咒将魔戒临时转化为一个巨大的魔法放大器。哈利抓住这个机会，施展了他从未尝试过的超强魔咒——他将阿瓦达索命咒（Avada Kedavra）的原始能量逆转，创造出了"Vita Restauro"（生命复原）咒语，这是一种纯粹的光明能量，专门克制索伦这样的黑暗存在。当这道白金色的光芒击中索伦时，黑暗魔君终于崩溃了。火焰之眼熄灭，半兽人军队瞬间化为灰烬，天空恢复了宁静。

[1] 在甘道夫的指导下，哈利学会了如何将他的守护神咒与中土世界的光明魔法结合。他们制定了一个大胆的计划：哈利将使用"Expecto Patronum"（呼神护卫）咒语，但注入甘道夫的白光力量，创造出一个超级守护神来对抗索伦。同时，赫敏和佛罗多合作研究如何利用魔戒的力量作为诱饵，将索伦引入一个魔法陷阱。罗恩和山姆则负责协调霍格沃茨学生和霍比特人的防御工作。

[2] 但索伦的力量比预想的更强大。火焰之眼释放出黑暗射线，与光明凤凰僵持不下。这时，邓布利多施展了他最强大的魔法——"Fianto Duri"（钢铁守卫）和"Repello Inimicum"（驱逐敌人）的组合咒，从地面发射出金色的魔法光柱，击中了索伦的侧翼。甘道夫则举起他的法杖，召唤出中土世界的太阳之火："你不能通过！我是秘火的仆人，执掌安诺之焰！暗影之火对你毫无用处，魔苟斯的走狗！回归虚无吧！" 
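
The `rerank` function constructs the `CrossEncoder` on every call, which repeats the weight loading each time. One possible way to avoid that is to cache the model object. A minimal sketch, assuming the same model name as above; `get_cross_encoder` and `rerank_cached` are hypothetical names:

```python
from functools import lru_cache
from typing import List

@lru_cache(maxsize=1)
def get_cross_encoder(model_name: str = "cross-encoder/mmarco-mMiniLMv2-L12-H384-v1"):
    # Imported lazily so defining this function does not load the library.
    from sentence_transformers import CrossEncoder
    return CrossEncoder(model_name)

def rerank_cached(query: str, retrieved_chunks: List[str], top_k: int) -> List[str]:
    # The first call loads the model; later calls reuse the cached instance.
    scores = get_cross_encoder().predict([(query, chunk) for chunk in retrieved_chunks])
    ranked = sorted(zip(retrieved_chunks, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]
```

`rerank_cached(query, retrieved_chunks, 3)` would then be a drop-in replacement for the `rerank` call above.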



# 5. LLM Generation
- Use Gemini 2.5 Flash to generate the answer from the reranked chunks.
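
A minimal sketch of how this step might look. The notebook does not show this code, so everything here is an assumption: the prompt wording, the helper names `build_prompt` and `generate_answer`, the use of the `google-genai` SDK, and an API key being available in the environment.

```python
from typing import List

def build_prompt(query: str, chunks: List[str]) -> str:
    """Format the reranked chunks and the question into a single prompt."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )

def generate_answer(prompt: str, model: str = "gemini-2.5-flash") -> str:
    # Assumption: the google-genai SDK is installed and an API key is set in the environment.
    from google import genai
    client = genai.Client()
    response = client.models.generate_content(model=model, contents=prompt)
    return response.text

prompt = build_prompt("Who defeated Sauron?", ["chunk A", "chunk B"])
print(prompt)
```

In the notebook this would be called as `generate_answer(build_prompt(query, reranked_chunks))`.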