# Resume Scoring with LLMs & NLP

This project scores resumes (PDF files) against a job description using LLMs and NLP. The step-by-step pipeline follows.

## Step 1: Problem Understanding & Requirements
Input:
- Resumes (PDF files)
- Job description (Text input)

Output:
- Score (0 - 100) based on relevance
- Feedback on missing/extra skills
- Recommendations for improvements



## Step 2: Data Preprocessing
- Extract text from PDFs
  - Use PyMuPDF (fitz), pdfminer, or PyPDF2
- Preprocess text
  - Remove stopwords, punctuation, and special characters
  - Convert to lowercase
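
The preprocessing steps listed above could be sketched as follows. This is a minimal illustration using only the standard library; the `STOPWORDS` set and the `preprocess_text` name are assumptions made for this example, and a fuller stopword list from NLTK or spaCy would normally replace the tiny one shown here.

```python
import re

# Tiny illustrative stopword list (an assumption); NLTK or spaCy ship fuller ones.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with", "is", "are"}

def preprocess_text(text):
    """Lowercase the text, strip punctuation/special characters, and drop stopwords."""
    text = text.lower()
    # Keep letters, digits, whitespace, and the characters + and # so that
    # tokens such as "c++" and "c#" survive the cleanup.
    text = re.sub(r"[^a-z0-9+#\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)

print(preprocess_text("Developed and optimized camera software for Samsung smartphones."))
```

Keeping `+` and `#` is a design choice for resumes specifically, where language names would otherwise collapse to a bare `c`.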

In [102]:
import fitz  # PyMuPDF

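def extract_text_from_pdf(pdf_path):
    """Return the plain text of every page in the PDF, concatenated in page order."""
    # The context manager closes the document even if a page fails to parse
    with fitz.open(pdf_path) as doc:
        return "".join(page.get_text() for page in doc)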

resume_text = extract_text_from_pdf("./data/neeraj_resume.pdf")

print(resume_text)


Neeraj Nagar
# nnagar@uwaterloo.ca |  (+1) 437-808-2696 |  Portfolio | ï LinkedIn | § GitHub
Education
University of Waterloo, Canada
Sep 2023 - Aug 2025
MASc (thesis, computer software) in Electrical and Computer Engineering
Waterloo, Ontario
Scholarship: Graduate research studentship, International Master’s Award of Excellence (IMAE)
Indian Institute of Technology (IIT) BHU, India
Jul 2016 - May 2020
Bachelor of Technology in Electronics Engineering
Varanasi, India
Experience
Software Engineer, Camera Team, Samsung R&D – India
Feb 2021 – Jul 2023
• Developed and optimized camera software for Samsung smartphones, focusing on image stabilization, deblurring, and macro
photography. This improved user experience and elevated image quality across multiple models.
• Optimized the hyper-lapse video stabilization feature on flagship Samsung devices, delivering smoother, more stable video
capture, strengthening the product’s competitive edge in the market.
• Enhanced low-light photography p

## Step 3: Resume Parsing & Feature Extraction
Extract key features from resume
- Skills (NER - Named Entity Recognition)
- Experience (Years of experience)
- Education (Degree, university)
- Certifications & Projects

Libraries to Use
- spaCy (for Named Entity Recognition - NER)
- NLTK (for keyword extraction)
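
The "years of experience" feature is not computed anywhere in this notebook. One possible sketch, assuming the resume writes employment periods as month-year ranges such as `Feb 2021 – Jul 2023`, is shown below. The `years_of_experience` name and the regular expression are illustrative assumptions, and the function should be given only the Experience section, because education date ranges would otherwise be counted too.

```python
import re

MONTHS = {m: i for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

# Matches ranges such as "Feb 2021 – Jul 2023" or "Sep 2023 - Aug 2025".
DATE_RANGE = re.compile(
    r"\b([A-Za-z]{3})[a-z]*\.?\s+(\d{4})\s*[-–]\s*([A-Za-z]{3})[a-z]*\.?\s+(\d{4})"
)

def years_of_experience(section_text):
    """Sum the lengths of all month-year ranges found in the text, in years."""
    total_months = 0
    for m1, y1, m2, y2 in DATE_RANGE.findall(section_text):
        start, end = MONTHS.get(m1.lower()), MONTHS.get(m2.lower())
        if start is None or end is None:
            continue  # not a recognizable month abbreviation
        months = (int(y2) - int(y1)) * 12 + (end - start)
        total_months += max(months, 0)  # ignore ranges that run backwards
    return round(total_months / 12, 1)

print(years_of_experience("Software Engineer, Camera Team\nFeb 2021 – Jul 2023"))
```

The result is an approximation: overlapping roles are double-counted and open-ended ranges such as "Jan 2024 – Present" are skipped.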



In [103]:
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_skills(text):
    doc = nlp(text)
    # Named Entity Recognition (NER)
    # for ent in doc.ents:
    #  print(f"{ent.text} - {ent.label_}")

    skills = [ent.text for ent in doc.ents if ent.label_ in ["ORG", "PRODUCT", "WORK_OF_ART"]]
    return skills

resume_skills = extract_skills(resume_text)
print(resume_skills)


['Neeraj Nagar', 'GitHub\nEducation\nUniversity', 'Electrical', 'Computer Engineering', 'International Master’s Award of Excellence', 'Indian Institute of Technology', 'IIT', 'BHU', 'Bachelor of Technology in Electronics Engineering', 'Camera Team', 'Samsung', 'Samsung', '• Enhanced', 'Android 12 & 13', 'Galaxy S21', 'S22', 'Software', 'Samsung R&D', 'TensorFlow', 'MIT ADE20K', 'First Examination Report', 'FER', 'Enhanced Adaptive Equalization', 'User Behavior', 'IJACEN', 'Method', 'MiniZinc', 'Dijkstra', 'Arduino', 'Technical Skills\nProgramming Languages: C', 'PyTorch', 'GitHub', 'Visual Studio, MiniZinc', 'Gurobi', 'Data Structures', 'Neural Networks & Deep Learning\nCoursework:', 'Algorithm Design & Analysis', 'Operating Systems']
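
The NER output above mixes actual skills with names, institutions, and section headers, because the `ORG`, `PRODUCT`, and `WORK_OF_ART` labels are not skill labels. A different and simpler technique, shown here only as a sketch, is to match the text against a known skill vocabulary. The `SKILL_VOCAB` list and the `match_skills` name are hypothetical; in practice the vocabulary might be derived from the job description.

```python
import re

# Hypothetical skill vocabulary, chosen for illustration only.
SKILL_VOCAB = ["python", "c++", "tensorflow", "pytorch", "aws", "react", "minizinc"]

def match_skills(text, vocab=SKILL_VOCAB):
    """Return the vocabulary entries that occur in the text as whole terms."""
    lowered = text.lower()
    found = []
    for skill in vocab:
        # Lookarounds are used instead of \b so that terms ending in symbols
        # ("c++") still match, while "react" does not match inside "reactive".
        pattern = r"(?<![a-z0-9])" + re.escape(skill.lower()) + r"(?![a-z0-9+#])"
        if re.search(pattern, lowered):
            found.append(skill)
    return found

print(match_skills("Programming Languages: C, C++, Python. Frameworks: TensorFlow, PyTorch"))
```

A vocabulary match trades recall for precision: it never returns a person's name as a skill, but it also cannot find skills that are missing from the list.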


## Step 4: Resume Matching Using NLP & Embeddings
✅ Convert Resume & Job Description to Embeddings
- Use OpenAI Embeddings, Sentence Transformers (BERT), or TF-IDF
- Compute the similarity score using cosine similarity
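
For reference, the cosine similarity between a resume vector and a job-description vector can be written as follows. The symbols $\mathbf{r}$ and $\mathbf{j}$ are notation chosen here for illustration and do not appear in the original notebook.

```latex
\[
\operatorname{sim}(\mathbf{r}, \mathbf{j})
  = \frac{\mathbf{r} \cdot \mathbf{j}}{\lVert \mathbf{r} \rVert \, \lVert \mathbf{j} \rVert}
  = \frac{\sum_i r_i j_i}{\sqrt{\sum_i r_i^2}\,\sqrt{\sum_i j_i^2}}
\]
```

TF-IDF weights are non-negative, so for TF-IDF vectors this value lies between 0 and 1, and multiplying by 100 expresses it as a percentage.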

In [104]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def compute_similarity(resume_text, job_desc):
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform([resume_text, job_desc])
    similarity_score = cosine_similarity(tfidf_matrix[0], tfidf_matrix[1])
    return similarity_score[0][0] * 100  # Convert to percentage

# Read job description from a text file
with open("./data/amazon_SDE1.txt", "r", encoding="utf-8") as file:
    job_description = file.read().strip()


resume_score = compute_similarity(resume_text, job_description)
print(f"Resume Match Score: {resume_score:.2f}%")


Resume Match Score: 42.16%


## Step 5: Scoring Mechanism
✅ Score Based on Multiple Factors
- Skill Match (40%) → Extracted skills vs. job skills
- Experience Match (30%) → Years of experience vs. required
- Education Match (20%) → Degree vs. required
- Certifications/Projects Match (10%)

In [105]:
def weighted_score(skill_score, exp_score, edu_score, proj_score):
    return (skill_score * 0.4) + (exp_score * 0.3) + (edu_score * 0.2) + (proj_score * 0.1)

final_score = weighted_score(80, 75, 90, 60)  # Placeholder sub-scores, not computed from the resume above
print(f"Final Resume Score: {final_score:.2f}%")


Final Resume Score: 78.50%
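
The sub-scores passed to `weighted_score` are hard-coded in this notebook. As a rough sketch of how two of them might be derived, the functions below compute a skill score as the share of required skills found in the resume and an experience score as a capped ratio of years. Both function names and both formulas are assumptions for illustration, not the notebook's method.

```python
def skill_match_score(resume_skills, job_skills):
    """Percentage of required job skills that appear in the resume (case-insensitive)."""
    required = {s.lower() for s in job_skills}
    if not required:
        return 100.0  # nothing is required, so nothing is missing
    present = {s.lower() for s in resume_skills}
    return 100.0 * len(required & present) / len(required)

def experience_match_score(candidate_years, required_years):
    """Ratio of candidate years to required years, capped at 100."""
    if required_years <= 0:
        return 100.0
    return min(candidate_years / required_years, 1.0) * 100.0

skill = skill_match_score(["Python", "C++", "PyTorch"], ["python", "aws", "c++", "react"])
exp = experience_match_score(2.4, 1)
print(skill, exp)
```

The two results could then replace the first two arguments of `weighted_score`; the education and project scores would still need their own rules.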


## Step 6: Generate Feedback Using LLM
**Use GPT-4 or Llama to Provide Feedback**
- What’s missing?
- Recommendations to improve the resume
- Additional certifications to take

## Prompt Section

In [106]:
prompt_low = f"""
Compare the following resume text with the job description and provide:
1. Missing skills
2. Areas for improvement
3. Additional suggestions

Resume: {resume_text}
Job Description: {job_description}
"""

prompt_mid = f"""
You are an expert career coach and hiring manager. Your task is to analyze the candidate's resume against the given job description and provide **constructive** and **actionable** feedback.

### **Instructions:**
Compare the resume with the job description and provide **detailed** feedback under the following categories:

1. **Missing Skills & Qualifications:**
- Identify the key skills, technologies, and qualifications that are required in the job description but missing in the resume.  
- If applicable, suggest how the candidate can gain these skills (e.g., courses, certifications, projects).  

2. **Areas for Improvement:**
- Highlight weak areas in the resume (e.g., lack of experience, unclear descriptions, missing keywords).  
- Suggest improvements to make the resume better aligned with the job description.  

3. **Strengths & Competitive Edge:**
- Identify the candidate's strong points and how they match the job requirements.  
- Suggest how they can further highlight these strengths in the resume.  

4. **Additional Recommendations:**
- Resume formatting, structure, and clarity tips.  
- Any other advice that would improve the chances of getting the job.  

### **Candidate Resume:**  
{resume_text}

### **Job Description:**  
{job_description}

Provide feedback in a **clear, structured, and professional** manner.
"""


prompt_high = f"""
You are an advanced AI trained in resume evaluation and job matching. Your task is to analyze the candidate's resume based on the provided job description and **explain their resume score**, while offering **clear, structured, and actionable feedback**.

### **Candidate's Resume Score: {final_score}%**  

### **Score Interpretation:**  
- Explain what this score means in terms of job suitability.  
- If the score is **above 80%**, highlight why it's strong but suggest improvements.  
- If the score is **between 50-80%**, indicate which areas need moderate improvements.  
- If the score is **below 50%**, explain why it needs significant improvement.  

### **Detailed Feedback & Action Plan:**  

1. **Missing Skills & Gaps:**
   - Identify the **critical** skills, qualifications, or experience missing from the resume based on the job description.  
   - Suggest **specific** ways to acquire these skills (e.g., courses, certifications, side projects).  

2. **Resume Strengths:**
   - Highlight **strong aspects** of the resume that align well with the job description.  
   - Advise on how the candidate can further **emphasize these strengths** to increase their hiring chances.  

3. **Areas for Improvement:**
   - Point out **weak or unclear sections** in the resume that could be **better aligned** with the job description.  
   - Suggest how to rewrite or restructure sections for **better impact**.  

4. **Keyword Optimization:**
   - List **missing or underused keywords** that are important in the job description.  
   - Recommend how to naturally integrate them into the resume to **pass ATS (Applicant Tracking Systems).**  

5. **Final Recommendations:**
   - Provide final tips on **formatting, clarity, and presentation**.  
   - Suggest if the candidate should add a portfolio, GitHub, LinkedIn updates, or other relevant materials.  

### **Candidate's Resume:**  
{resume_text}

### **Job Description:**  
{job_description}

Your response should be **detailed, structured, and professional**, offering **clear action steps** to help the candidate improve their resume and increase their hiring chances.
"""

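The three prompts above are f-strings, so they capture whatever `resume_text`, `job_description`, and `final_score` held when the cell ran. If any of those change later, the prompts go stale until the cell is re-run. One way around this, sketched here with the hypothetical names `PROMPT_TEMPLATE` and `build_prompt`, is to keep a plain template and fill it at call time with `str.format`.

```python
# A plain (non-f) string: the placeholders are filled only when build_prompt runs.
PROMPT_TEMPLATE = """Compare the following resume text with the job description and provide:
1. Missing skills
2. Areas for improvement
3. Additional suggestions

Resume score: {score:.1f}%
Resume: {resume}
Job Description: {job}
"""

def build_prompt(resume, job, score, template=PROMPT_TEMPLATE):
    """Fill the template at call time so the prompt reflects the current inputs."""
    return template.format(resume=resume, job=job, score=score)

print(build_prompt("Python developer with ML experience.", "Looking for a Python developer.", 78.5))
```

Braces inside the resume or job text are safe, because `str.format` does not re-parse substituted values. Literal braces in the template itself would need to be doubled.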
## Mistral LLM model

In [111]:
import requests
import os
from dotenv import load_dotenv
load_dotenv()

# Mistral API Key
MISTRAL_API_KEY = os.getenv("MISTRAL_API_KEY")

def generate_feedback(resume_text, job_description, prompt):
    if not resume_text or not job_description:
        return "Please provide both resume text and job description."
    
    url = "https://api.mistral.ai/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {MISTRAL_API_KEY}",
        "Content-Type": "application/json"
    }

    
    # Define prompt for resume and job description comparison
    if not prompt:
        prompt = f"""
        You are an expert career coach and hiring manager. Your task is to analyze the candidate's resume against the given job description and provide **constructive** and **actionable** feedback.
        Please provide feedback in the following format:
        1. Missing skills
        2. Areas for improvement
        3. Additional suggestions

        Resume: {resume_text}
        Job Description: {job_description}
        """
    
    # Choose a Mistral model. Available model IDs change over time, so check
    # Mistral's current model list before running.
    data = {
        "model": "mistral-medium",
        # Alternatives the author considered: "mixtral" (for complex tasks) and
        # "mistral-small" (for simple tasks); exact model IDs may differ in the current API.
        # The whole prompt is sent as a single user turn.
        "messages": [{"role": "user", "content": prompt}]
    }

    # Make the API request; the timeout keeps a stalled connection from hanging the notebook
    response = requests.post(url, headers=headers, json=data, timeout=120)

    # Handle the response
    if response.status_code == 200:
        return response.json()["choices"][0]["message"]["content"]
    # Error bodies are not guaranteed to be JSON, so report the raw text
    return f"Error {response.status_code}: {response.text}"

# resume_text = "I am a Python Developer with experience in Machine Learning and AI."
# job_description = "Looking for a Python Developer with experience in ML and AI."
prompt = prompt_high
feedback = generate_feedback(resume_text, job_description, prompt)
print("Resume Feedback:\n", feedback)

Resume Feedback:
 **Resume Analysis and Feedback**

**Resume Score Interpretation:**

With a score of 78.5%, Neeraj's resume is strong and shows a good match for the Amazon Hiring Software Delivery (HSD) Team position. However, there are still areas for improvement to increase the score and make the resume stand out even more.

**Detailed Feedback & Action Plan:**

1. **Missing Skills & Gaps:**

   - Amazon specifically mentions the need for experience with AWS technologies like EC2, RDS/DynamoDB/RedShift. While Neeraj has a solid background in various programming languages and machine learning frameworks, there is no explicit mention of AWS experience. To address this gap, Neeraj should consider taking relevant AWS courses or obtaining certifications to demonstrate their proficiency in these technologies.
   
   - Familiarity with modern front-end technologies such as JavaScript, TypeScript, Module Federation, React, and React Native is essential for this role. Although Neeraj has exp

## Local model

In [108]:
import ollama

def generate_feedback(resume_text, job_description, prompt):
    if not resume_text or not job_description:
        return "Please provide both resume text and job description."
    if not prompt:
        prompt = f"""
        Compare the following resume text with the job description and provide:
        1. Missing skills
        2. Areas for improvement
        3. Additional suggestions

        Resume: {resume_text}
        Job Description: {job_description}
        """
    
    response = ollama.chat(model="mistral", messages=[{"role": "user", "content": prompt}])
    # response = ollama.chat(model="llama3.2", messages=[{"role": "user", "content": prompt}])

    return response["message"]["content"]

prompt = prompt_high
feedback = generate_feedback(resume_text, job_description, prompt)
print("Resume Feedback:\n", feedback)

Resume Feedback:
  Dear Candidate,

Thank you for submitting your application to Amazon's Hiring Software Delivery (HSD) Team. We appreciate your interest in joining our innovative team that shapes and influences Amazon's future growth. Here are some suggestions to enhance your resume and improve your chances of getting selected:

1. **Highlight your relevant experience**: Clearly mention your 1+ years of professional software development experience in the summary or objective section of your resume. Highlight any projects or accomplishments that demonstrate your skills in building scalable backend web services, APIs, and interactive interfaces using modern technologies such as JavaScript, TypeScript, Module Federation, React & React Native.

2. **Tailor your resume to the job description**: Ensure your resume is tailored to the job requirements of the HSD Team. Emphasize how you meet the key job responsibilities mentioned in the job posting, such as working through all phases of the p