## Installations

In [1]:
# %%capture
# !pip install llama-parse

## Imports

In [2]:
import nest_asyncio

nest_asyncio.apply()


In [5]:

import os
import json
import shutup
from llama_parse import LlamaParse
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_groq import ChatGroq
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

from langchain_community.embeddings import HuggingFaceEmbeddings


shutup.please()

Create an account at [LlamaParse](https://cloud.llamaindex.ai/login) and generate an API key.

In [6]:

# API access to llama-cloud
os.environ["LLAMA_CLOUD_API_KEY"] = "llx-your-own-key"
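
Pasting a real key into a notebook cell makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the key was exported in the shell before Jupyter was started; `require_env` is a hypothetical helper written for this example, not part of llama-parse:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this notebook.")
    return value


# Usage (assumes LLAMA_CLOUD_API_KEY was exported before starting Jupyter):
# llama_key = require_env("LLAMA_CLOUD_API_KEY")
```

Failing early with a clear message is usually easier to debug than an authentication error raised later by the parsing call.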

In [7]:
documents = LlamaParse(result_type="markdown").load_data("sahib-cv_can-flowcv.pdf")


Started parsing the file under job_id de49e08f-2087-4168-87da-1a5f5eece1da


## Sample output

In [8]:
print(documents[0].text[:1000] + "...")

## Sahibpreet Singh

Data Scientist

ss9334931@gmail.com
4 Larkberry Road, Ontario

https://www.linkedin.com/in/sahibpreetsinghh/

### PROFILE

Accomplished Data Scientist specializing in NLP and Conversational AI. Top Kaggle mentor, ranked within the top 0.01%. Expert in cutting-edge models, delivering innovative solutions, and optimizing decision-making with data-driven insights. Strong analytical abilities, exceptional problem-solving skills, and a passion for impactful results.

### PROFESSIONAL EXPERIENCE

|Company|Date|Position|Location|
|---|---|---|---|
|Tatras Data|11/2022 – 12/2023|Data Scientist|Chandigarh, India|
|- Successfully delivered the product for Text-to-SQL problem using LLM with deployment using Streamlit and with Langchain as Framework.
- Led the development of a centralized system aimed at streamlining the creation of contextual chatbots for diverse counties across the United States.
- Transformed the chatbot development process by introducing a centralized serv

## Saving the parsed PDF to a text file

In [9]:
# The parsed markdown is already available on the document's `text` attribute
text_from_pdf = documents[0].text
with open("resume.txt", "w") as text_file:
    text_file.write(text_from_pdf)


In [10]:

# Load data
loader = TextLoader("resume.txt")
docs = loader.load()

# Split the text into chunks using the splitter's default settings
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
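
The splitter above runs with its defaults (to my knowledge, 4000 characters with a 200-character overlap in recent LangChain releases, which is worth checking against the installed version), so a short resume may end up in only one or two chunks. To make the roles of chunk size and overlap concrete, here is a simplified sketch of fixed-window splitting. It is a stand-in for illustration only: `split_with_overlap` is a hypothetical function, and it does not reproduce the separator-aware logic of `RecursiveCharacterTextSplitter`.

```python
def split_with_overlap(text: str, chunk_size: int = 200, chunk_overlap: int = 50) -> list[str]:
    """Cut `text` into windows of `chunk_size` characters that share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
```

Smaller chunks give the retriever finer-grained pieces to choose from, while the overlap reduces the chance that a sentence is cut off at a chunk boundary.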

In [11]:

# Define the embedding model
# Uses mixedbread's mxbai-embed-large-v1 with binary-precision embeddings

embeddings = HuggingFaceEmbeddings(model_name="mixedbread-ai/mxbai-embed-large-v1",
                                   encode_kwargs = {'precision': 'binary'})
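
Binary precision keeps one bit per embedding dimension instead of a 32-bit float, which shrinks the vectors considerably at some cost in retrieval accuracy. The sketch below illustrates the general idea in plain Python, assuming a simple threshold at zero; `binarize` and `hamming_distance` are hypothetical names, and the embedding library's real implementation packs the bits into integer arrays rather than a single Python int.

```python
def binarize(vector: list[float]) -> int:
    """Map each component to one bit (1 if positive, else 0) and pack the bits into an int."""
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two packed vectors differ."""
    return bin(a ^ b).count("1")


first = binarize([0.3, -1.2, 0.0, 2.5])    # bits 1, 0, 0, 1
second = binarize([0.1, 0.2, -0.3, 0.4])   # bits 1, 1, 0, 1
print(hamming_distance(first, second))
```

With binarized vectors, similarity is typically measured by Hamming distance, where a smaller distance means the two embeddings agree on more dimensions.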


In [12]:
groq_key = 'gsk_your-own-key'

In [13]:


# Build the FAISS vector store from the chunks
vector = FAISS.from_documents(documents, embeddings)

# Define a retriever interface
retriever = vector.as_retriever()

# Define LLM
model = ChatGroq(temperature=0, groq_api_key=groq_key, model_name="llama3-8b-8192")

# Define prompt template
prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")

# Create a retrieval chain to answer questions
document_chain = create_stuff_documents_chain(model, prompt)
retrieval_chain = create_retrieval_chain(retriever, document_chain)
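
Conceptually, the "stuff" step places every retrieved chunk into the `{context}` slot of the prompt before the model is called. The following is a rough plain-Python approximation of that step, not LangChain's implementation; `stuff_prompt` is a hypothetical helper, and joining chunks with a blank line is an assumption about the separator.

```python
PROMPT_TEMPLATE = """Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}"""


def stuff_prompt(chunks: list[str], question: str) -> str:
    """Join the retrieved chunks and substitute them, with the question, into the template."""
    context = "\n\n".join(chunks)  # assumed separator between chunks
    return PROMPT_TEMPLATE.format(context=context, input=question)


print(stuff_prompt(["Chunk one.", "Chunk two."], "what is the name of the candidate"))
```

Because all retrieved chunks go into a single prompt, this approach only works while the combined text fits in the model's context window.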

In [15]:
%%time
response = retrieval_chain.invoke({"input": "what is the name of the candidate"})
print(response["answer"])

The name of the candidate is Sahibpreet Singh.
CPU times: user 118 ms, sys: 352 ms, total: 470 ms
Wall time: 2.05 s


In [16]:
%%time
response = retrieval_chain.invoke({"input": "How many number of jobs in total he did?"})
print(response["answer"])

Based on the provided context, Sahibpreet Singh has had the following jobs:

1. Tatras Data (11/2022 - 12/2023) - Data Scientist
2. ZS Associates (11/2021 - 11/2022) - Data Science Associate
3. Tatras Data (07/2020 - 10/2021) - Junior Data Scientist

So, in total, Sahibpreet Singh has had 3 jobs.
CPU times: user 125 ms, sys: 428 ms, total: 553 ms
Wall time: 2.06 s


In [17]:
%%time
response = retrieval_chain.invoke({"input": "what was the first job of candidate?"})
print(response["answer"])

According to the provided context, the first job of the candidate was as a Junior Data Scientist at Tatras Data from July 2020 to October 2021.
CPU times: user 94.6 ms, sys: 290 ms, total: 384 ms
Wall time: 1.64 s
