#### 1. Loading Libraries

In [1]:
import os
from pathlib import Path
from dotenv import load_dotenv
from PyPDF2 import PdfReader

from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough

#### 2. Load Environment Variables

In [2]:
load_dotenv()
groq_api = os.getenv("GROQ_API_KEY")

if not groq_api:
    raise ValueError("❌ GROQ_API_KEY not found in .env file.")
print("✅ GROQ_API_KEY loaded successfully")

✅ GROQ_API_KEY loaded successfully


#### 3. Initialise LLM

In [3]:
from langchain_groq import ChatGroq

# Initialize Groq model
llm = ChatGroq(
    model="qwen/qwen3-32b", 
    groq_api_key=groq_api,          
    temperature=0.3
)

# Quick test
try:
    response = llm.invoke("Say 'Groq connected' if you're working.")
    print("✅ LLM Test:", response.content)
except Exception as e:
    # Chain the original exception so the full traceback is preserved
    raise RuntimeError(f"❌ Groq connection failed: {e}") from e

✅ LLM Test: <think>
Okay, the user wants me to say "Groq connected" if I'm working. Let me check if I can do that. First, I need to understand what they mean by "Groq connected." Groq is a company that provides AI inference chips, so maybe they want confirmation that the model is using Groq's technology. However, as Qwen, I don't have access to Groq's hardware. I run on Alibaba's infrastructure. So I can't actually be "Groq connected." I should inform them that I can't connect to Groq and explain my current setup. Also, I should offer help with whatever they need. Let me phrase that clearly and politely.
</think>

I don't have access to Groq's hardware or capabilities. I'm Qwen, a large language model developed by Alibaba Cloud, running on Alibaba's infrastructure. How can I assist you today?
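
The test output above shows that this model returns its reasoning inside a `<think>...</think>` block ahead of the actual answer. A minimal sketch of one way to remove that block before using the text is given here. The helper name `strip_think` is hypothetical and not part of LangChain or Groq, and the sketch assumes the reasoning is always wrapped in literal `<think>` tags as seen in the output.

```python
import re

# Matches a complete <think>...</think> block, including newlines inside it.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_think(text: str) -> str:
    """Return `text` without <think>...</think> blocks (hypothetical helper)."""
    cleaned = _THINK_BLOCK.sub("", text)
    # If a block was never closed (e.g. a truncated response),
    # drop everything from the opening tag onward.
    cleaned = cleaned.split("<think>", 1)[0]
    return cleaned.strip()


raw = "<think>\nThe user wants a confirmation.\n</think>\n\nGroq connected"
print(strip_think(raw))
```

In this notebook it could be applied as `strip_think(response.content)` wherever the model's reply is printed or stored.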


#### 4. Read Files

In [4]:
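def read_pdf(file_path):
    """Return the text of every page in a PDF, concatenated into one string."""
    text = ""
    with open(file_path, "rb") as f:
        reader = PdfReader(f)
        for page in reader.pages:
            # Guard against pages that yield no extractable text
            text += page.extract_text() or ""
    return text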

# Paths to the resume folder and the job description file
resumes_path = Path("../data/resumes")
jd_path = Path("../data/job_descriptions/jd1.txt")

# Load the job description (JD)
with open(jd_path, "r", encoding="utf-8") as f:
    jd_text = f.read()

# Read each PDF resume and keep its file name as metadata
resumes_texts = []
metadata = []

for pdf_path in resumes_path.glob("*.pdf"):
    resumes_texts.append(read_pdf(pdf_path))
    metadata.append({"file_name": pdf_path.name})

print(f"✅ Loaded {len(resumes_texts)} resumes")

✅ Loaded 4 resumes


In [9]:
resumes_texts

['Chartered Accountant (CA)\nName: Priya Sharma\nEmail: priya.sharma@example.com\nPhone: +91-XXXXXXXXXX\nLinkedIn: linkedin.com/in/priyasharma\nProfessional Summary:\nChartered Accountant with 6 years of experience in auditing, taxation, and financial advisory\nservices.\nSkills:\nAccounting, Taxation, Auditing, Financial Reporting, SAP, Excel, IFRS, GST\nExperience:\n- Conducted statutory audits for multiple companies in manufacturing and IT sectors.\n- Managed tax planning and compliance for clients, reducing liabilities by 15%.\n- Provided financial analysis and advisory to improve operational efficiency.\nEducation:\nCA Certification, Institute of Chartered Accountants of India, 2017\nB.Com, GHI University, 2014',
 'Business Analyst (BA)\nName: Rahul Kumar\nEmail: rahul.kumar@example.com\nPhone: +91-XXXXXXXXXX\nLinkedIn: linkedin.com/in/rahulkumar\nProfessional Summary:\nBusiness Analyst with 5 years of experience in data-driven decision making, requirement gathering,\nand reportin

#### 5. Create Embedding

In [5]:
embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_texts(resumes_texts, embedding, metadatas=metadata)
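
To make the retrieval step less opaque, here is a rough sketch of what a nearest-neighbour search over embeddings does conceptually. It assumes the store ranks by Euclidean (L2) distance, which is the default for LangChain's FAISS wrapper as far as I know. The vectors are made-up three-dimensional toys (real `all-MiniLM-L6-v2` embeddings have 384 dimensions), and `l2_distance` and `top_k` are illustrative names, not FAISS or LangChain functions.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, doc_vecs, k=3):
    """Return (index, distance) pairs for the k closest vectors, closest first."""
    scored = [(i, l2_distance(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy stand-ins for four resume embeddings and one job-description embedding
toy_resume_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
toy_jd_vec = [1.0, 0.0, 0.0]

for index, distance in top_k(toy_jd_vec, toy_resume_vecs, k=3):
    print(index, round(distance, 3))
```

A smaller distance means a closer match, so the first pair printed corresponds to the resume that would be returned first.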



#### 6. Retrieve and Extract Name

In [6]:
results = vectorstore.similarity_search(jd_text, k=3)

# Extract candidate name using Groq
prompt = PromptTemplate(
    input_variables=["resume_text"],
    template="""
Extract ONLY the candidate's full name from the resume text below.
If not found, say "Unknown".

Resume:
{resume_text}
"""
)

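# RunnablePassthrough() forwards the model's message unchanged, so the chain
# returns the raw LLM response object.
extract_name_chain = prompt | llm | RunnablePassthrough()

def extract_candidate_name(resume_text):
    # Send only the first 1,000 characters to keep the prompt short
    response = extract_name_chain.invoke({"resume_text": resume_text[:1000]})
    return response.content.strip() if hasattr(response, "content") else str(response)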
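
The sample resumes shown earlier carry an explicit `Name: ...` line, so a cheap deterministic check could run before the LLM is called at all. The following is a minimal sketch under that assumption. `find_labelled_name` is a hypothetical helper, and it would return `None` for resumes that do not use this exact label.

```python
import re

# A line of the form "Name: <value>", matched case-insensitively at line start.
_NAME_LINE = re.compile(
    r"^[ \t]*Name[ \t]*:[ \t]*(.+?)[ \t]*$",
    flags=re.IGNORECASE | re.MULTILINE,
)


def find_labelled_name(resume_text: str):
    """Return the value of the first 'Name:' line, or None if there is none."""
    match = _NAME_LINE.search(resume_text)
    return match.group(1) if match else None


sample = "Business Analyst (BA)\nName: Rahul Kumar\nEmail: rahul.kumar@example.com"
print(find_labelled_name(sample))
```

One possible use is `find_labelled_name(text) or extract_candidate_name(text)`, so the model is only consulted when no labelled line is found.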

#### 7. Final Output

In [7]:
for res in results:
    file_name = res.metadata.get("file_name", "Unknown")
    name = extract_candidate_name(res.page_content)
    print(f"📄 File: {file_name}")
    print(f"👤 Candidate: {name}")
    print("-" * 50)

📄 File: Resume2.pdf
👤 Candidate: <think>
Okay, let's see. The user wants me to extract only the candidate's full name from the resume text provided. If the name isn't found, I should say "Unknown". 

Looking at the resume, the first thing I notice is the line that starts with "Name: Rahul Kumar". That seems straightforward. The name is clearly mentioned there. But wait, I need to make sure there's no other name in the document that might be the candidate's full name. Let me scan through the rest of the resume.

The email address is "rahul.kumar@example.com", which uses the same name. The LinkedIn profile is linkedin.com/in/rahulkumar, which again matches the name given. The professional summary and experience sections don't mention any other names. The education section lists the university as DEF University, but that's not the candidate's name. 

So, the only name present is Rahul Kumar. There's no indication of a middle name or additional initials, but the user asked for the fu