# Notebook for RAG on OpenROAD
- This notebook uses GitHub data sources only.
- MTEB leaderboard (for comparing embedding models): https://huggingface.co/spaces/mteb/leaderboard

## Structure
- Get text
- Chunk text
- Embed text
- Retrieve text

## TODO
- Strip `\xa0` (non-breaking space) characters from the text. See https://stackoverflow.com/questions/10993612/how-to-remove-xa0-from-string-in-python
- Improve the embedding model. The current embedding model (all-MiniLM-L6-v2) is ranked 42nd on the MTEB leaderboard.
- Improve the vector store. Qdrant?
- Write a benchmarking function to measure how long each embedding model takes.
- Improve the prompt. Refer to https://github.com/tinygrad/tinygrad/blob/master/examples/llama.py
- Use a better model, e.g. Mistral-7B-Instruct? Refer to its instruction format: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1
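
The benchmarking item above could start from something like the sketch below. `benchmark_embedders` and `toy_embedder` are hypothetical names used only for illustration; in this notebook the callables would presumably wrap something like `embedding_model.embed_documents`.

```python
import time
from typing import Callable, Dict, List, Sequence


def benchmark_embedders(
    embedders: Dict[str, Callable[[Sequence[str]], List[List[float]]]],
    texts: Sequence[str],
    repeats: int = 3,
) -> Dict[str, float]:
    """Return the best wall-clock time (seconds) per embedder over `repeats` runs."""
    timings = {}
    for name, embed in embedders.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            embed(texts)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return timings


def toy_embedder(texts):
    # Hypothetical stand-in for a real model: one 2-d vector per text.
    return [[float(len(t)), float(t.count(" "))] for t in texts]


timings = benchmark_embedders({"toy": toy_embedder}, ["hello world", "OpenROAD"])
```

Taking the best of several runs reduces noise from one-off costs such as model warm-up. Whether best or mean time is the right metric here is a judgment call.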

In [None]:
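import glob

# Load every markdown file under ./data into memory as a string
documents = []
for name in glob.glob('./data/*.md'):
    print(name)
    # Use a context manager so each file handle is closed after reading
    with open(name, 'r', encoding='utf-8') as f:
        documents.append(f.read())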

len(documents)
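
For the `\xa0` item in the TODO list, a minimal cleanup sketch follows. `clean_text` is a hypothetical helper, and it assumes Unicode NFKC normalization is acceptable for these documents. NFKC also rewrites other compatibility characters such as ligatures, so `text.replace("\xa0", " ")` is the narrower alternative.

```python
import unicodedata


def clean_text(text: str) -> str:
    """Normalize compatibility characters, including non-breaking spaces."""
    # NFKC maps U+00A0 (no-break space) to an ordinary space.
    return unicodedata.normalize("NFKC", text)


cleaned = clean_text("global\xa0placement")
```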

In [None]:
from langchain.text_splitter import CharacterTextSplitter, MarkdownHeaderTextSplitter
from langchain.text_splitter import RecursiveCharacterTextSplitter

# from langchain.vectorstores import Qdrant
# from langchain.document_loaders import TextLoader

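# Markdown header levels to split on, and the metadata key each is stored under
headers_to_split_on = [("#", "Header 1"), ("##", "Header 2"), ("###", "Header 3")]
text = documents[0]  # split the first loaded document as an example

markdown_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
md_header_splits = markdown_splitter.split_text(text)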

# Optional: We can do recursive splitting within each document
# # Char-level splits

# chunk_size = 250
# chunk_overlap = 30
# text_splitter = RecursiveCharacterTextSplitter(
#     chunk_size=chunk_size, chunk_overlap=chunk_overlap
# )

# # Split
# splits = text_splitter.split_documents(md_header_splits)
# splits

In [None]:
# Putting it together as a function
# TODO: Document loading seems to remove the markdown headers by default. Is there any way to keep the headers?
from langchain.text_splitter import CharacterTextSplitter, MarkdownHeaderTextSplitter
from langchain.text_splitter import RecursiveCharacterTextSplitter
import re

def chunk(doc):
    """
    Args:
    doc - a Document, or a raw markdown string

    Output:
    headers_to_split_on - list of (header line, label) tuples found in the text.
    The markdown splitting itself is still commented out below.
    """
    # The documents loaded above are plain strings; Document objects carry page_content
    text = doc.page_content if hasattr(doc, "page_content") else doc
    header_pattern = r'^#+ .+'
    headers = re.findall(header_pattern, text, re.MULTILINE)
    headers_to_split_on = []
    for i, header in enumerate(headers):
        headers_to_split_on.append((header, f'Header {i}'))
    return headers_to_split_on
    # markdown_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
    # md_header_splits = markdown_splitter.split_text(text)

    # # Optional char-level splits within each document 
    # chunk_size = 250
    # chunk_overlap = 30
    # text_splitter = RecursiveCharacterTextSplitter(
    #     chunk_size=chunk_size, chunk_overlap=chunk_overlap
    # )
    # splits = text_splitter.split_documents(md_header_splits)
    # return splits


chunk(documents[0])
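
On the TODO above about keeping headers: one option, sketched here in plain Python rather than with `MarkdownHeaderTextSplitter`, is to split on header lines and leave each header at the top of its own chunk. `split_keep_headers` is a hypothetical helper, not part of LangChain. It reuses the header regex from `chunk`, so it would also treat a `# comment` line inside a fenced code block as a header.

```python
import re
from typing import List

HEADER_RE = re.compile(r"^#+ .+", re.MULTILINE)


def split_keep_headers(text: str) -> List[str]:
    """Split markdown into chunks that each start with their own header line."""
    starts = [m.start() for m in HEADER_RE.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first header
    bounds = starts + [len(text)]
    chunks = [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [c for c in chunks if c]


sample = "# Title\nintro\n## Usage\nrun it\n"
chunks = split_keep_headers(sample)
```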

In [None]:
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel

def average_pool(last_hidden_states: Tensor,
                 attention_mask: Tensor) -> Tensor:
    last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]

input_texts = [
    "what is the capital of China?",
    "how to implement quick sort in python?",
    "Beijing",
    "sorting algorithms"
]

tokenizer = AutoTokenizer.from_pretrained("thenlper/gte-base")
model = AutoModel.from_pretrained("thenlper/gte-base")

# Tokenize the input texts
batch_dict = tokenizer(input_texts, max_length=512, padding=True, truncation=True, return_tensors='pt')

outputs = model(**batch_dict)
embeddings = average_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# (Optionally) normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:1] @ embeddings[1:].T) * 100
print(scores.tolist())


In [None]:
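from langchain.embeddings import SentenceTransformerEmbeddings
from langchain.schema import Document
from langchain.vectorstores import Qdrant

# from_documents expects Document objects and an embedding model, not a tensor
docs = [Document(page_content=text) for text in documents]
embedding_model = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
qdrant = Qdrant.from_documents(
    docs, embedding_model,
    location=":memory:",  # Local mode with in-memory storage only
    collection_name="my_documents",
)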


In [None]:
from langchain.retrievers import TFIDFRetriever
retriever = TFIDFRetriever.from_texts(["foo", "bar", "world", "hello", "foo bar"])
result = retriever.get_relevant_documents("foo")
result

## Main

In [11]:
from langchain.document_loaders import TextLoader, DirectoryLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain.document_loaders import GitLoader
from langchain.schema import Document
import json
from typing import Iterable

In [12]:
def load_docs_from_jsonl(file_path)->Iterable[Document]:
    array = []
    with open(file_path, 'r') as jsonl_file:
        for line in jsonl_file:
            data = json.loads(line)
            obj = Document(**data)
            array.append(obj)
    return array
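
The loader above implies that each line of the JSONL file is a JSON object whose keys match the `Document` fields. A sketch of the matching writer, using plain dicts so it runs without LangChain, is below. `save_records_to_jsonl` and `load_records_from_jsonl` are hypothetical names, and the `page_content` / `metadata` keys are an assumption based on `Document(**data)`.

```python
import json
import os
import tempfile
from typing import Dict, Iterable, List


def save_records_to_jsonl(records: Iterable[Dict], file_path: str) -> None:
    """Write one JSON object per line."""
    with open(file_path, "w", encoding="utf-8") as jsonl_file:
        for record in records:
            jsonl_file.write(json.dumps(record) + "\n")


def load_records_from_jsonl(file_path: str) -> List[Dict]:
    """Read the file back, skipping blank lines."""
    with open(file_path, "r", encoding="utf-8") as jsonl_file:
        return [json.loads(line) for line in jsonl_file if line.strip()]


records = [{"page_content": "example chunk", "metadata": {"source": "example.md"}}]
path = os.path.join(tempfile.mkdtemp(), "data.jsonl")
save_records_to_jsonl(records, path)
round_trip = load_records_from_jsonl(path)
```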

In [14]:
%%time
# Step 1: Load the document(s) and split it into chunks
# There is a memory issue with loading all documents. How to solve? 
chunks = load_docs_from_jsonl('tempdata/data.jsonl')

# Step 2: Create embeddings
embedding_model = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
# chunks2 = [x.page_content for x in chunks] # HFEmbeddings only accepts str
# embeddings = embedding_model.embed_documents(chunks2)

# Step 3: Store embeddings in ChromaDB and save locally.
db = Chroma.from_documents(chunks, embedding_model, persist_directory="./chroma_db")
# db = Chroma(persist_directory="./chroma_db", embedding_function=embedding_model)

# Step 4: Create a retriever
retriever = db.as_retriever()

CPU times: user 14min 38s, sys: 3min 34s, total: 18min 13s
Wall time: 2min 25s
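
For the memory question in the cell above, one option is to index the chunks in batches rather than in a single call. The `batched` helper is hypothetical. The commented usage assumes the vector store exposes `add_documents` and has not been run against this notebook's data.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Assumed usage with the notebook's objects (not executed here):
# db = Chroma(persist_directory="./chroma_db", embedding_function=embedding_model)
# for batch in batched(chunks, 500):
#     db.add_documents(batch)

batches = list(batched(list(range(7)), 3))
```

The batch size of 500 is an arbitrary placeholder and would need tuning against available memory.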


In [15]:
from llama_cpp import Llama

def create_completion(question):
    # Step 5: Define the prompt template
    # source: https://chat.openai.com/share/c85e64f6-4dd2-4920-b82e-78f128898cbb
    template = """<s>[INST]You are now an expert in OpenROAD EDA software. 

    
    Here is an excerpt of code or documentation you may refer to:
    {context}
    
    {question}[/INST]"""
    prompt = ChatPromptTemplate.from_template(template)
    
    # Step 6: Generate a query and search for relevant chunks
    context = db.similarity_search(question)[0].page_content
    final_prompt = prompt.format_messages(context = context, question = question)[0].content
    
    # Step 7: Use llama-cpp-python as a prototype. 
    llm = Llama(model_path="../llama.cpp/models/mistral-instruct/ggml-model-q4_0.gguf", n_ctx=8000)
    output = llm.create_completion(final_prompt,
                                   suffix=None,
                                   max_tokens=0, # set this to 0 for no limit on tokens (still bounded by n_ctx)
                                   temperature=0.8, # higher temperature gives more random output; lower is more deterministic
                                   top_p=0.95,
                                   logprobs=None,
                                   echo=True,
                                   stop=[],
                                   frequency_penalty=0.0,
                                   presence_penalty=0.0,
                                   repeat_penalty=1.1,
                                   top_k=40,
                                   stream=False,
                                   tfs_z=1.0,
                                   mirostat_mode=0,
                                   mirostat_tau=5.0,
                                   mirostat_eta=0.1,
                                   model=None,
                                   stopping_criteria=None,
                                   logits_processor=None)
    return output
# question = "What does the PDN stand for?"
# create_completion(question)
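
The function above passes only the top retrieved chunk to the model. A sketch of a prompt builder that joins several chunks is below. `build_prompt` is a hypothetical helper, and its wording simply mirrors the template above. The commented usage assumes `similarity_search` accepts a `k` argument.

```python
from typing import Sequence


def build_prompt(question: str, contexts: Sequence[str]) -> str:
    """Assemble an instruction prompt from a question and retrieved chunks."""
    context_block = "\n\n".join(c.strip() for c in contexts if c.strip())
    return (
        "<s>[INST]You are now an expert in OpenROAD EDA software.\n\n"
        "Here is an excerpt of code or documentation you may refer to:\n"
        f"{context_block}\n\n"
        f"{question}[/INST]"
    )


# Assumed usage with the notebook's objects (not executed here):
# hits = db.similarity_search(question, k=3)
# final_prompt = build_prompt(question, [h.page_content for h in hits])

prompt = build_prompt("What does PDN stand for?", ["chunk one", " chunk two "])
```

Separately, constructing `Llama(...)` once outside the function and reusing it would avoid reloading the model for every question.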

## List of benchmark questions
1. What does the short form [INSERT TOOL NAME] stand for?
2. What are the goals of OpenROAD?
3. What is the purpose of the [INSERT TOOL NAME] in OpenROAD?
4. How do you create a...

In [29]:
tools = ['par', 'rmp', 'ifp', 'ppl', 'pad']
final = []
for tool in tools:
    question = f'What does the module name "{tool}" stand for?'
    final.append(create_completion(question)["choices"][0]["text"])

llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from ../llama.cpp/models/mistral-instruct/ggml-model-q4_0.gguf (version GGUF V3 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q4_0     [  4096, 32000,     1,     1 ]
llama_model_loader: - tensor    1:              blk.0.attn_q.weight q4_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    2:              blk.0.attn_k.weight q4_0     [  4096,  1024,     1,     1 ]
llama_model_loader: - tensor    3:              blk.0.attn_v.weight q4_0     [  4096,  1024,     1,     1 ]
llama_model_loader: - tensor    4:         blk.0.attn_output.weight q4_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    5:            blk.0.ffn_gate.weight q4_0     [  4096, 14336,     1,     1 ]
llama_model_loader: - tensor    6:              blk.0.ffn_up.weight q4_0     [  4096, 14336,     1,     1 ]
llama_model_loader: - tensor    7:            blk.0.ffn_down.weight q4_0     

In [30]:
for i in final:
    print(i)

<s>[INST]You are now an expert in OpenROAD EDA software. 

    
    Here is an excerpt of code or documentation you may refer to:
    EMS_NONE,
     "The GROUP REGION pt pt statement is obsolete in version 5.5 and "
     "later.\nThe DEF parser will ignore this statement.",
     -1,
     0},
    {7028,
     EMS_NONE,
     "The GROUP SOFT MAXX statement is obsolete in version 5.5 and later.\nThe "
     "DEF parser will ignore this statement.",
     -1,
     0},
    {7029,
     EMS_NONE,
     "The GROUP SOFT MAXY statement is obsolete in version 5.5 and later.\nThe "
     "DEF parser will ignore this statement.",
     -1,
     0},
    {7030,
     EMS_NONE,
     "The GROUP SOFT MAXHALFPERIMETER statement is obsolete in version 5.5 and "
     "later.\nThe DEF parser will ignore this statement.",
     -1,
     0},
    {7031,
     EMS_NONE,
     "The ASSERTIONS statement is obsolete in version 5.4 and later.\nThe DEF "
     "parser will ignore this statement.",
     -1,
     0},
    {7032,
 

## Baseline: No RAG

In [25]:
from llama_cpp import Llama

def create_no_rag_completion(question):
    # Step 5: Define the prompt template
    # source: https://chat.openai.com/share/c85e64f6-4dd2-4920-b82e-78f128898cbb
    template = """<s>[INST]You are now an expert in OpenROAD EDA software. 
    
    {question}[/INST]"""
    prompt = ChatPromptTemplate.from_template(template)
    final_prompt = prompt.format_messages(question = question)[0].content
    
    # Step 7: Use llama-cpp-python as a prototype. 
    llm = Llama(model_path="../llama.cpp/models/mistral-instruct/ggml-model-q4_0.gguf", n_ctx=8000)
    output = llm.create_completion(final_prompt,
                                   suffix=None,
                                   max_tokens=0, # set this to 0 for no limit on tokens (still bounded by n_ctx)
                                   temperature=0.8, # higher temperature gives more random output; lower is more deterministic
                                   top_p=0.95,
                                   logprobs=None,
                                   echo=True,
                                   stop=[],
                                   frequency_penalty=0.0,
                                   presence_penalty=0.0,
                                   repeat_penalty=1.1,
                                   top_k=40,
                                   stream=False,
                                   tfs_z=1.0,
                                   mirostat_mode=0,
                                   mirostat_tau=5.0,
                                   mirostat_eta=0.1,
                                   model=None,
                                   stopping_criteria=None,
                                   logits_processor=None)
    return output
# create_no_rag_completion(question)

## Run benchmark

In [31]:
tools = ['par', 'rmp', 'ifp', 'ppl', 'pad']
final2 = []
for tool in tools:
    question = f'What does the short form "{tool.upper()}" stand for?'
    final2.append(create_no_rag_completion(question)["choices"][0]["text"])

llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from ../llama.cpp/models/mistral-instruct/ggml-model-q4_0.gguf (version GGUF V3 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q4_0     [  4096, 32000,     1,     1 ]
llama_model_loader: - tensor    1:              blk.0.attn_q.weight q4_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    2:              blk.0.attn_k.weight q4_0     [  4096,  1024,     1,     1 ]
llama_model_loader: - tensor    3:              blk.0.attn_v.weight q4_0     [  4096,  1024,     1,     1 ]
llama_model_loader: - tensor    4:         blk.0.attn_output.weight q4_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    5:            blk.0.ffn_gate.weight q4_0     [  4096, 14336,     1,     1 ]
llama_model_loader: - tensor    6:              blk.0.ffn_up.weight q4_0     [  4096, 14336,     1,     1 ]
llama_model_loader: - tensor    7:            blk.0.ffn_down.weight q4_0     

In [32]:
for i in final2:
    print(i)

<s>[INST]You are now an expert in OpenROAD EDA software. 
    
    What does the short form "PAR" stand for?[/INST]
I am sorry, but I do not have any knowledge about "OpenROAD EDA software". Can you please provide more context or clarification about what you are asking?
<s>[INST]You are now an expert in OpenROAD EDA software. 
    
    What does the short form "RMP" stand for?[/INST]
RMP stands for Routing, Mapping, and Planning. It is a shortened version of the OpenROAD suite of tools. The "RMP" is used as a catch-all term to refer to all the functionality of OpenROAD that deals with routing, mapping, and planning of digital circuits on physical substrates such as silicon wafers or printed circuit boards.
<s>[INST]You are now an expert in OpenROAD EDA software. 
    
    What does the short form "IFP" stand for?[/INST]
  IFP stands for Integrated Flowpath Planning in OpenROAD EDA software. This module allows users to plan and optimize flowpaths through a design, which is particularly
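
Because `echo=True` makes each completion repeat its prompt, comparing the RAG and no-RAG answers is easier once the prompt is stripped. A minimal sketch follows. `extract_answer` is a hypothetical helper that assumes the prompt ends with the `[/INST]` marker used in the templates.

```python
def extract_answer(completion_text: str, marker: str = "[/INST]") -> str:
    """Return the text after the last instruction marker, or the whole text if absent."""
    _, found, tail = completion_text.rpartition(marker)
    return tail.strip() if found else completion_text.strip()


example = "<s>[INST]You are an expert.\n\nWhat is X?[/INST]\nX is a placeholder."
answer = extract_answer(example)
```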