# Retrieval Augmented Generation (RAG) for Resume and Work Portfolio Analysis

This project builds a Retrieval Augmented Generation (RAG) pipeline that lets users ask questions about my resume and work portfolio.

In this notebook, we implement Retrieval Augmented Generation (RAG) in three ways:
1. Basic RAG Pipeline:
    - Indexing the documents with LlamaIndex.
    - Querying the resulting index.
2. Advanced RAG Pipeline - Sentence Window Retrieval:
    - Creating a sentence-window index, which breaks the documents into small chunks such as single sentences.
    - Retrieving the most relevant chunks together with their surrounding context.
3. Advanced RAG Pipeline - Auto-Merging Retrieval:
    - Creating an auto-merging retrieval index.
    - Querying with auto-merging retrieval, which merges retrieved segments of text to produce a more comprehensive and contextually relevant response to a query.

## Basic RAG Pipeline

In [23]:
# Setting environment and importing required dependencies.
import os
import pandas as pd
from llama_index import Document
from llama_index.llms import OpenAI
from utils import build_sentence_window_index
from llama_index import SimpleDirectoryReader
from llama_index import VectorStoreIndex, ServiceContext
from local_variables import OPENAI_API_KEY, HUGGINGFACE_API_KEY

os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY
os.environ['HUGGINGFACE_API_KEY'] = HUGGINGFACE_API_KEY

In [24]:
# Importing Resume and Work Portfolio files.
documents = SimpleDirectoryReader(
    input_files=["./FAIZAN KHAN Resume NLP.pdf","./Faizan Work Corpus.pdf"]
).load_data()

In [25]:
# Combining the text of all loaded documents into a single Document.
document = Document(text="\n\n".join([doc.text for doc in documents]))

In [26]:
# Creating a Vector Store from the input documents.
llm_openai = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
service_context = ServiceContext.from_defaults(llm=llm_openai, embed_model="local:BAAI/bge-small-en-v1.5")
index = VectorStoreIndex.from_documents([document], service_context=service_context)

In [27]:
# Querying from the Vector Store created.
query_engine = index.as_query_engine()

In [28]:
response = query_engine.query(
    "Where did Faizan Khan study?"
)

In [29]:
print(str(response))

Faizan Khan studied at Carnegie Mellon University (CMU) in Pittsburgh, PA and Vellore Institute of Technology (VIT) in Vellore, India.
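
The query above relies on a retrieval step: the chunks most similar to the question are selected and passed to the LLM as context. Here is a minimal sketch of that step, assuming a toy bag-of-words cosine similarity in place of the `bge-small-en-v1.5` embeddings. The helper names (`tokenize`, `cosine`, `retrieve`) and the sample chunks are hypothetical and are not part of LlamaIndex or of the actual resume.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks ranked by similarity to the query."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


# Hypothetical chunks, used only for illustration.
chunks = [
    "Education: the candidate chose to study machine learning at a university.",
    "Skills: Python, deep learning, search ranking.",
    "Hobbies: chess and hiking.",
]
question = "Where did the candidate study?"
top_chunks = retrieve(question, chunks, top_k=1)

# The retrieved context is placed in front of the question before calling the LLM.
prompt = "Context:\n" + "\n".join(top_chunks) + "\n\nQuestion: " + question
```

A real pipeline swaps the term counts for dense embeddings, but the ranking logic has the same shape.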


## Evaluation setup

In [30]:
# Importing custom questions from a questions file.
eval_questions = []
with open('eval_questions.txt', 'r') as file:
    for line in file:
        item = line.strip()
        print(item)
        eval_questions.append(item)

What work did Faizan do at Finetune Learning?
What were the important courses Faizan did at Carnegie Mellon University(CMU)?
What are Faizan's top 5 skills?
Rate Faizan's Natural Language Processing experience on a scale of 1-10.
Does Faizan have any experience coding Deep Learning architectures from scratch?
What is Faizan's most impressive project?
Does Faizan have any experience in Search? If so, summarize his experience.
What is the right AI job for Faizan?


In [31]:
all_responses = []
for question in eval_questions:
    response = query_engine.query(question)
    all_responses.append(response)

In [32]:
# Creating an output dataframe.
responses_df = pd.DataFrame({"Sample_Questions": eval_questions, "Model_Responses_Basic": all_responses})

In [33]:
responses_df

| | Sample_Questions | Model_Responses_Basic |
|---|---|---|
| 0 | What work did Faizan do at Finetune Learning? | Faizan Khan worked as an AI & ML Applied Scien... |
| 1 | What were the important courses Faizan did at ... | Faizan Khan completed the following important ... |
| 2 | What are Faizan's top 5 skills? | Faizan's top 5 skills are Natural Language Pro... |
| 3 | Rate Faizan's Natural Language Processing expe... | Faizan's Natural Language Processing experienc... |
| 4 | Does Faizan have any experience coding Deep Le... | Yes, Faizan has experience coding Deep Learnin... |
| 5 | What is Faizan's most impressive project? | Faizan's most impressive project is the develo... |
| 6 | Does Faizan have any experience in Search? If ... | Faizan Khan has experience in search. He has i... |
| 7 | What is the right AI job for Faizan? | Based on the context information provided, the... |


## Advanced RAG Pipeline

### Sentence Window Retrieval

In [34]:
# Building a Sentence Window Based Index.
sentence_index = build_sentence_window_index(
    document,
    llm_openai,
    embed_model="local:BAAI/bge-small-en-v1.5",
    save_dir="sentence_index"
)

In [35]:
from utils import get_sentence_window_query_engine

sentence_window_engine = get_sentence_window_query_engine(sentence_index)

In [36]:
# Querying the sentence window based engine.
window_response = sentence_window_engine.query(
    "How do I contact Faizan?"
)
print(str(window_response))

You can contact Faizan Khan through his email address at faizankh29@gmail.com or by phone at (412) -419-4560. Additionally, you can connect with him on LinkedIn at www.linkedin.com/in/faizan-khan-nlp.
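
The idea behind sentence window retrieval can be illustrated without any library: each sentence is indexed on its own, but once it is matched, a window of neighbouring sentences is handed to the LLM. This is a minimal sketch under those assumptions. The functions `split_sentences` and `build_sentence_windows` are hypothetical, and the internals of `build_sentence_window_index` in `utils` are not shown in this notebook and may differ.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_sentence_windows(text, window_size=1):
    """Pair each sentence with a window of its neighbours.

    The sentence is what would be embedded and matched against the query;
    the window is what would be passed to the LLM once the sentence is retrieved.
    """
    sentences = split_sentences(text)
    nodes = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        nodes.append({"sentence": sentence, "window": " ".join(sentences[start:end])})
    return nodes


# Hypothetical text, used only for illustration.
text = (
    "Alice built a search engine. It ranked results with a neural model. "
    "She later worked on speech. Chess is her hobby."
)
nodes = build_sentence_windows(text, window_size=1)
```

Matching on single sentences keeps the similarity search precise, while the wider window gives the LLM enough context to answer.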


In [37]:
# Recording responses for all questions.
all_responses_sentence_window = []
for question in eval_questions:
    response = sentence_window_engine.query(question)
    all_responses_sentence_window.append(response)
    print(question)
    print(str(response))

What work did Faizan do at Finetune Learning?
Faizan Khan worked as an AI & ML Applied Scientist at Finetune Learning. In this role, he implemented machine-learned models for understanding queries and ranking search results. He also innovated in prompt engineering by leveraging GPT-3.5 and GPT-4 models. Additionally, Faizan led the innovation and implementation of a Chain-of-Thought (CoT) strategy, enhancing the system's capability to tag and score questions efficiently. He worked closely with Subject Matter Experts and technical teams to achieve an EM of 78% and F1 of 85%.
What were the important courses Faizan did at Carnegie Mellon University(CMU)?
Faizan Khan took the following important courses at Carnegie Mellon University (CMU): Introduction to Deep Learning, Question Answering, Advanced Natural Language Processing, and Data Mining.
What are Faizan's top 5 skills?
Faizan's top 5 skills are Natural Language Processing, Machine Learning, Deep Learning, Image Processing, and Audio 

In [38]:
responses_df = pd.DataFrame({"Sample_Questions": eval_questions,
                             "Model_Responses_Basic": all_responses,
                             "Model_Responses_Sentence_Window": all_responses_sentence_window})

### Auto-Merging Retrieval

In [39]:
# Building an Auto-Merging Index
from utils import build_automerging_index

automerging_index = build_automerging_index(
    documents,
    llm_openai,
    embed_model="local:BAAI/bge-small-en-v1.5",
    save_dir="merging_index"
)

In [40]:
from utils import get_automerging_query_engine

automerging_query_engine = get_automerging_query_engine(
    automerging_index,
)

In [41]:
# Querying based on auto-merging retrieval.
auto_merging_response = automerging_query_engine.query(
    "Summarize Faizan's experience in Video processing."
)
print(str(auto_merging_response))

> Merging 3 nodes into parent node.
> Parent node id: 97765485-74d9-4142-87ff-ad00a0c603f2.
> Parent node text: ● Implemented (from scratch) a topic modeling algorithm for gaining personality insights from wri...

Faizan has experience in video processing. He has developed and deployed an application that extracts prosodic features from audio and combines them with a recurrent neural network-based text summation model to produce videos that summarize the results.
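
The "Merging N nodes into parent node" messages above reflect the core idea of auto-merging retrieval: when enough small chunks under the same parent are retrieved, they are replaced by the larger parent chunk. This is a toy sketch of that rule, not LlamaIndex's implementation. The function `auto_merge`, the `threshold` parameter, and the node ids are assumptions made for illustration.

```python
from collections import defaultdict


def auto_merge(retrieved_ids, parent_of, children_of, threshold=0.5):
    """Replace retrieved leaf chunks with their parent when enough siblings were retrieved.

    retrieved_ids: leaf ids returned by the retriever, in ranked order.
    parent_of:     maps each leaf id to its parent id.
    children_of:   maps each parent id to the list of all its leaf ids.
    threshold:     fraction of a parent's children that must be exceeded to merge.
    """
    hits = defaultdict(list)
    for leaf in retrieved_ids:
        hits[parent_of[leaf]].append(leaf)

    merged, seen = [], set()
    for leaf in retrieved_ids:
        parent = parent_of[leaf]
        ratio = len(hits[parent]) / len(children_of[parent])
        # Emit the parent once if enough of its children were hit, else keep the leaf.
        node = parent if ratio > threshold else leaf
        if node not in seen:
            seen.add(node)
            merged.append(node)
    return merged


# Hypothetical two-level hierarchy: each parent chunk has three leaf chunks.
children_of = {"projects": ["p1", "p2", "p3"], "skills": ["s1", "s2", "s3"]}
parent_of = {leaf: parent for parent, leaves in children_of.items() for leaf in leaves}

# Two of three "projects" leaves were retrieved, so they merge into the parent;
# only one "skills" leaf was retrieved, so it stays as a leaf.
result = auto_merge(["p1", "s2", "p3"], parent_of, children_of)
```

Here `result` is `["projects", "s2"]`: the answer is synthesized from one larger, coherent chunk plus one isolated leaf instead of three fragments.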


In [42]:
# Recording responses for all questions.
all_responses_automerging = []
for question in eval_questions:
    response = automerging_query_engine.query(question)
    all_responses_automerging.append(response)

> Merging 2 nodes into parent node.
> Parent node id: 405ee8b1-3cc3-4d84-ab0e-c5ee4a831df7.
> Parent node text: Projects Before 2020: ● Successfully completed an Emotionally Aware personal assistant using a Re...

> Merging 2 nodes into parent node.
> Parent node id: 2bd59080-0ff0-40be-8cf6-028e69652819.
> Parent node text: ● State level Chess player in the U-16 category. Skills Technical Skills : Natural Language Proce...

> Merging 2 nodes into parent node.
> Parent node id: 2bd59080-0ff0-40be-8cf6-028e69652819.
> Parent node text: ● State level Chess player in the U-16 category. Skills Technical Skills : Natural Language Proce...

> Merging 2 nodes into parent node.
> Parent node id: 27ad8850-8f3d-4ab5-babd-337b5ec7fd25.
> Parent node text: Surpassed the State -of-the-Art, with 97% recall.  
SKILLS   
Natural Language Processing:  Deep ...

> Merging 2 nodes into parent node.
> Parent node id: d805904d-8c8e-4e53-9cfa-0989486cbc63.
> Parent node text: ● Contract and Invoice Automatio

In [45]:
# Taking a look at results from all the approaches.
responses_df = pd.DataFrame({"Sample_Questions": eval_questions,
                             "Model_Responses_Basic": all_responses,
                             "Model_Responses_Sentence_Window": all_responses_sentence_window,
                             "Model_Responses_Auto_Merging": all_responses_automerging})

In [46]:
responses_df

| | Sample_Questions | Model_Responses_Basic | Model_Responses_Sentence_Window | Model_Responses_Auto_Merging |
|---|---|---|---|---|
| 0 | What work did Faizan do at Finetune Learning? | Faizan Khan worked as an AI & ML Applied Scien... | Faizan Khan worked as an AI & ML Applied Scien... | Faizan implemented machine-learned models for ... |
| 1 | What were the important courses Faizan did at ... | Faizan Khan completed the following important ... | Faizan Khan took the following important cours... | Faizan did the following important courses at ... |
| 2 | What are Faizan's top 5 skills? | Faizan's top 5 skills are Natural Language Pro... | Faizan's top 5 skills are Natural Language Pro... | Faizan's top 5 skills are Natural Language Pro... |
| 3 | Rate Faizan's Natural Language Processing expe... | Faizan's Natural Language Processing experienc... | Faizan's Natural Language Processing experienc... | Faizan's Natural Language Processing experienc... |
| 4 | Does Faizan have any experience coding Deep Le... | Yes, Faizan has experience coding Deep Learnin... | Yes, Faizan has experience coding Deep Learnin... | Yes, Faizan has experience coding Deep Learnin... |
| 5 | What is Faizan's most impressive project? | Faizan's most impressive project is the develo... | Faizan's most impressive project is the implem... | Faizan's most impressive project is the develo... |
| 6 | Does Faizan have any experience in Search? If ... | Faizan Khan has experience in search. He has i... | Yes, Faizan Khan has experience in search. He ... | Yes, Faizan has experience in Search. His expe... |
| 7 | What is the right AI job for Faizan? | Based on the context information provided, the... | The right AI job for Faizan would be a positio... | The right AI job for Faizan would be a positio... |
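
The three evaluation loops in this notebook follow the same pattern, so they could be folded into one helper. This is a hedged sketch: `collect_responses` and `EchoEngine` are hypothetical names, and `EchoEngine` is only a stand-in so the snippet runs without an index or an API key. Any object exposing a `.query(str)` method, such as the query engines built above, could be passed instead.

```python
def collect_responses(engines, questions):
    """Query every engine with every question and return one row per question.

    engines:   maps a column name to any object exposing a .query(str) method.
    questions: list of question strings.
    """
    rows = []
    for question in questions:
        row = {"Sample_Questions": question}
        for column, engine in engines.items():
            # str() stores the answer text rather than the response object.
            row[column] = str(engine.query(question))
        rows.append(row)
    return rows


class EchoEngine:
    """Hypothetical stand-in for a query engine, used only for illustration."""

    def __init__(self, label):
        self.label = label

    def query(self, question):
        return f"[{self.label}] {question}"


rows = collect_responses(
    {
        "Model_Responses_Basic": EchoEngine("basic"),
        "Model_Responses_Sentence_Window": EchoEngine("window"),
    },
    ["What are Faizan's top 5 skills?"],
)
```

Passing `rows` to `pd.DataFrame(rows)` would give a comparison table with one column per engine, like the one shown above.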
