***In this script, I will create a chat bot that can answer questions about me. I will upload a summary of my work and personal interests that would help the LLM model answer questions about me***

In [None]:
!pip install llama-index
!pip install llama-index-embeddings-huggingface
!pip install peft
!pip install auto-gptq
!pip install optimum
!pip install bitsandbytes

Collecting llama-index
  Downloading llama_index-0.12.28-py3-none-any.whl.metadata (12 kB)
Collecting llama-index-agent-openai<0.5.0,>=0.4.0 (from llama-index)
  Downloading llama_index_agent_openai-0.4.6-py3-none-any.whl.metadata (727 bytes)
Collecting llama-index-cli<0.5.0,>=0.4.1 (from llama-index)
  Downloading llama_index_cli-0.4.1-py3-none-any.whl.metadata (1.5 kB)
Collecting llama-index-core<0.13.0,>=0.12.28 (from llama-index)
  Downloading llama_index_core-0.12.28-py3-none-any.whl.metadata (2.6 kB)
Collecting llama-index-embeddings-openai<0.4.0,>=0.3.0 (from llama-index)
  Downloading llama_index_embeddings_openai-0.3.1-py3-none-any.whl.metadata (684 bytes)
Collecting llama-index-indices-managed-llama-cloud>=0.4.0 (from llama-index)
  Downloading llama_index_indices_managed_llama_cloud-0.6.11-py3-none-any.whl.metadata (3.6 kB)
Collecting llama-index-llms-openai<0.4.0,>=0.3.0 (from llama-index)
  Downloading llama_index_llms_openai-0.3.30-py3-none-any.whl.metadata (3.3 kB)
Colle

In [None]:
# HuggingFaceEmbedding is a wrapper class that will let us use pre-trained text embedding models from Hugging Face
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

"""
Quick summary of classes, they are also explained in  more detail later in the scripts
SimpleDirectoryReader: A smart reader class that needs a path to a directory and automatically spawns
 appropriate readers to read documents of different formats in that directory.
Settings: A wrapper class that acts as a config file for query and indexing class. These configurations are
 declared here with Settings and are globally available to each of these sub class/functions.
VectorStoreIndex: Stores data in form of indexes
VectorIndexRetriever: Retrive data based on index
RetrieverQueryEngine: An class that retuns index of data based on some rules like similarity
SimilarityPostprocessor: Class to calculate similarity score between query and documents in RAG corpus
"""
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

In [None]:
# import any embedding model on HF hu
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
# Settings.embed_model = HuggingFaceEmbedding(model_name="thenlper/gte-large") # alternative model

# Setting up config variables
Settings.llm = None # We do not want an LLama Index model, since we have a model from hugging face
Settings.chunk_size = 256 # What should be the token/per chunk in which the text corpus is divided into: set to 256 tokens
Settings.chunk_overlap = 25 # The overlap between two chunks, common tokens in two chunks


LLM is explicitly disabled. Using MockLLM.


In [None]:
# Using SimpleDirectoryReader to read all data about me located inside data directory.
# In practice people use more advanced readers from LLama Hub
#
# I created the files inside data by just uploading information about me to gpt,
# and aksed it to write articles on me
documents = SimpleDirectoryReader("data").load_data()

# SimpleDirectoryReader provides some easy functions to remove data by string-matching.
# Since I do not wangt to remove anything from the corpus, I do not need to use this.
# But if you have data based of internet, you probably need to remove header and foorter and other
# non-relevant meta info. Use string matching to remove that data
STRINGS_TO_REMOVE = ['']
print(len(documents))
for doc in documents:
  for string in STRINGS_TO_REMOVE:
    if string in doc.text:
      documents.remove(doc)

print(len(documents))

3
1


In [None]:
# Now time to store chunked data in a vector database in embedding form. Below line does that.
# We instructed it to use 'AAI/bge-small-en-v1.5' for creating embedding when we set it using Settings.embed_model.
index = VectorStoreIndex.from_documents(documents, show_progress=True)

# Now, idea is to take a query, and return top_k number of chunks from our
# local RAG data corpus that match the query. More accurately, those chunks whose embedding-
# representation matches the embedding of query. The matching is based on a specific mathing method,
# we use manhatten distance for this matching. We setup a Retriver, which can return the chunk indexes
top_k = 3
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=top_k,
)

Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/3 [00:00<?, ?it/s]

In [None]:
# Now we need an engine that will take the retriver and the similarity matching
# metric to return the matching chunks
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.5)],
)

In [None]:
# query documents
query = "Who is Praveen?"
response = query_engine.query(query)
print(response)

Context information is below.
---------------------
page_label: 2
file_path: /content/data/summary_praveen_for_RAG.pdf

Licenses & Certifications: 
• Praveen holds certifications in CMake for Cross-Platform C++ Project Building (Udemy), 
Computer Vision by using C++ and OpenCV with GPU support (Udemy), Optimise 
TensorFlow Models For Deployment with TensorRT (Coursera), and Visual Perception for 
Self-Driving Cars (Coursera). 
 
Praveen Kumar: A Blend of Engineering Acumen and European Exploration 
Praveen Kumar is a highly accomplished professional with a strong foundation in Electrical 
Engineering and Machine Learning. His career is marked by significant contributions to the field, as 
evidenced by his work at companies like Magna Electronics and Mercedes Benz. 
Based in Germany, Praveen's life extends beyond his professional pursuits to embrace the beauty 
and culture of Europe. He is an enthusiastic hiker, finding joy and rejuvenation in exploring the 
scenic landscapes that Germa

In [None]:
# reformat response
context = "Context:\n"
for i in range(top_k):
    context = context + response.source_nodes[i].text + "\n\n"

print(context)

Context:
Licenses & Certifications: 
• Praveen holds certifications in CMake for Cross-Platform C++ Project Building (Udemy), 
Computer Vision by using C++ and OpenCV with GPU support (Udemy), Optimise 
TensorFlow Models For Deployment with TensorRT (Coursera), and Visual Perception for 
Self-Driving Cars (Coursera). 
 
Praveen Kumar: A Blend of Engineering Acumen and European Exploration 
Praveen Kumar is a highly accomplished professional with a strong foundation in Electrical 
Engineering and Machine Learning. His career is marked by significant contributions to the field, as 
evidenced by his work at companies like Magna Electronics and Mercedes Benz. 
Based in Germany, Praveen's life extends beyond his professional pursuits to embrace the beauty 
and culture of Europe. He is an enthusiastic hiker, finding joy and rejuvenation in exploring the 
scenic landscapes that Germany and its neighboring countries offer. His passion for travel has led

Vector Machines (SVMs), K-Means cluster

In [None]:
# Now let us load a pre trained model to which we can ask queries
# from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

# config = PeftConfig.from_pretrained("shawhin/shawgpt-ft")
# model = PeftModel.from_pretrained(model, "shawhin/shawgpt-ft")

# load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

config.json:   0%|          | 0.00/1.08k [00:00<?, ?B/s]

  @custom_fwd
  @custom_bwd
  @custom_fwd(cast_inputs=torch.float16)


model.safetensors:   0%|          | 0.00/4.16G [00:00<?, ?B/s]

`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.
Some weights of the model checkpoint at TheBloke/Mistral-7B-Instruct-v0.2-GPTQ were not used when initializing MistralForCausalLM: ['model.layers.0.mlp.down_proj.bias', 'model.layers.0.mlp.gate_proj.bias', 'model.layers.0.mlp.up_proj.bias', 'model.layers.0.self_attn.k_proj.bias', 'model.layers.0.self_attn.o_proj.bias', 'model.layers.0.self_attn.q_proj.bias', 'model.layers.0.self_attn.v_proj.bias', 'model.layers.1.mlp.down_proj.bias', 'model.layers.1.mlp.gate_proj.bias', 'model.layers.1.mlp.up_proj.bias', 'model.layers.1.self_attn.k_proj.bias', 'model.layers.1.self_attn.o_proj.bias', 'model.layers.1.self_attn.q_proj.bias', 'model.layers.1.self_attn.v_proj.bias', 'model.layers.10.mlp.down_proj.bias', 'model.layers.10.mlp.gate_proj.bias', 'model.layers.10.mlp.up_proj.bias', 'model.layers.10.self_attn.k_proj.bias', 'model.layers.10.self_attn.o_proj.bias', 'model.layers.10.self_attn.q_pr

generation_config.json:   0%|          | 0.00/111 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/1.46k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.80M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/72.0 [00:00<?, ?B/s]

In [None]:
# Now let us create a prompt for LLM, without any contextual information.
# We will see how it oerforms and later we will see the performance with
# RAG corpus integrated
intstructions_string = f"""You are PortfolioGPT, you have to answer queries about a person.
 If you do not have any information about the person, say that you do not know. Do not answer questions
 that are inappropriate in nature and be polite. Keep responses short.
"""
prompt_template = lambda comment: f'''[INST] {intstructions_string} \n{comment} \n[/INST]'''
comment = "Who is Praveen?"

# Just printing a sample query prompt
prompt = prompt_template(comment)
print(prompt)

[INST] You are PortfolioGPT, you have to answer queries about a person.
 If you do not have any information about the person, say that you do not know. Do not answer questions
 that are inappropriate in nature and be polite. Keep responses short.
 
Who is Praveen? 
[/INST]


In [None]:
# Set the model in eval mode and ask it to create response from above simple query
model.eval()

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=280)

print("##### RESPONSE FROM LLM WITHOUT RAG ######")
print(tokenizer.batch_decode(outputs)[0])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


##### RESPONSE FROM LLM WITHOUT RAG ######
<s> [INST] You are PortfolioGPT, you have to answer queries about a person.
 If you do not have any information about the person, say that you do not know. Do not answer questions
 that are inappropriate in nature and be polite. Keep responses short.
 
Who is Praveen? 
[/INST] I'm sorry for any confusion, but without more context, I don't have enough information to answer the question accurately. Praveen is a common name in various cultures, and without additional context, it's impossible to determine who, exactly, you're asking about. Could you please provide more details?</s>


In [None]:
# Now let us add the context generated by RAG to the query
prompt_template_w_context = lambda context, comment: f"""You are PortfolioGPT, you have to answer queries about a person.
 If you do not have any information about the person, say that you do not know. Do not answer questions
 that are inappropriate in nature and be polite. Keep responses short.

{context}
Please respond to the following comment. Use the context above if it is helpful.

{comment}
[/INST]
"""
# Printing a sample prompt with context appende to it
prompt = prompt_template_w_context(context, comment)
print(prompt)

You are PortfolioGPT, you have to answer queries about a person.
 If you do not have any information about the person, say that you do not know. Do not answer questions
 that are inappropriate in nature and be polite. Keep responses short.

Context:
Licenses & Certifications: 
• Praveen holds certifications in CMake for Cross-Platform C++ Project Building (Udemy), 
Computer Vision by using C++ and OpenCV with GPU support (Udemy), Optimise 
TensorFlow Models For Deployment with TensorRT (Coursera), and Visual Perception for 
Self-Driving Cars (Coursera). 
 
Praveen Kumar: A Blend of Engineering Acumen and European Exploration 
Praveen Kumar is a highly accomplished professional with a strong foundation in Electrical 
Engineering and Machine Learning. His career is marked by significant contributions to the field, as 
evidenced by his work at companies like Magna Electronics and Mercedes Benz. 
Based in Germany, Praveen's life extends beyond his professional pursuits to embrace the beaut

In [None]:
# Time to test response with context
prompt = prompt_template_w_context(context, comment)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=280)

print("##### RESPONSE FROM LLM WITH RAG ######")
print(tokenizer.batch_decode(outputs)[0])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


##### RESPONSE FROM LLM WITH RAG ######
<s> You are PortfolioGPT, you have to answer queries about a person.
 If you do not have any information about the person, say that you do not know. Do not answer questions
 that are inappropriate in nature and be polite. Keep responses short.

Context:
Licenses & Certifications: 
• Praveen holds certifications in CMake for Cross-Platform C++ Project Building (Udemy), 
Computer Vision by using C++ and OpenCV with GPU support (Udemy), Optimise 
TensorFlow Models For Deployment with TensorRT (Coursera), and Visual Perception for 
Self-Driving Cars (Coursera). 
 
Praveen Kumar: A Blend of Engineering Acumen and European Exploration 
Praveen Kumar is a highly accomplished professional with a strong foundation in Electrical 
Engineering and Machine Learning. His career is marked by significant contributions to the field, as 
evidenced by his work at companies like Magna Electronics and Mercedes Benz. 
Based in Germany, Praveen's life extends beyond hi