# AI-Powered Resume Analyzer and Job Matcher
Kaggle x Google GenAI Capstone Project <br/> Heather Anderson <br/> April 2025

## Overview  
This project explores how generative AI can help job seekers understand how well their resume aligns with job descriptions. It takes a resume and a few job listings and compares them using embedding-based vector similarity. It also uses large language model prompts to structure the text and draft feedback. The system returns the jobs ranked by fit and suggests ways to improve the resume for the top match. In this prototype, the language model responses are simulated with hand-written outputs, while the embedding model runs for real.

## What this project demonstrates  
- Document understanding: extracting structured information from text
- Structured output (JSON): converting unstructured resumes and job posts into clean, usable data
- Embeddings + vector search: comparing text similarity in a meaningful way
- Few-shot prompting: generating tailored suggestions to improve a resume


```python
# Resume (as a string)
resume_text = """
Name: Alex Taylor
Email: alex.taylor@example.com
Phone: (555) 123-4567

Summary:
Data science professional with 3+ years of experience applying machine learning and data analysis to solve business problems. Passionate about natural language processing and deploying models in production.

Skills:
- Python, SQL, R
- Pandas, NumPy, Scikit-learn, TensorFlow, PyTorch
- Data visualization (Matplotlib, Seaborn, Plotly)
- NLP, Generative AI, Prompt Engineering
- Docker, Git, Flask, REST APIs

Experience:
Data Scientist | Insight Tech | Jan 2022 – Present
- Built customer churn prediction model with 85% accuracy
- Designed dashboards to track performance metrics in real time
- Collaborated with engineering to deploy models via REST APIs

Data Analyst | MarketPulse | Jul 2020 – Dec 2021
- Automated weekly reporting pipelines using Python and SQL
- Conducted A/B tests and presented findings to stakeholders

Education:
M.S. in Data Science – University of Florida (2020)
B.S. in Statistics – University of Georgia (2018)
"""

# Job descriptions
job_descriptions = [
    """Machine Learning Engineer – HealthAI
We are looking for a Machine Learning Engineer to build and deploy models in the healthcare domain. Responsibilities include:
- Developing scalable ML pipelines using Python
- Applying NLP to analyze patient records
- Collaborating with software engineers to integrate models into APIs
- Requirements: Python, TensorFlow/PyTorch, Docker, cloud experience
""",
    """Data Analyst – FinServe
Join our data team to support financial analytics. You will:
- Build dashboards for real-time metrics
- Perform exploratory data analysis
- Write efficient SQL queries and present findings
- Requirements: SQL, Python, BI tools, business acumen
""",
    """AI Research Assistant – DeepThink Labs
Help us push the frontiers of AI by assisting with experiments and research documentation. Tasks include:
- Running LLM experiments with different prompts
- Conducting literature reviews and summarizing papers
- Documenting experiment results in Jupyter notebooks
- Requirements: Python, Transformers, attention to detail, academic writing
"""
]
```


## Step 1: Extract structured information from a resume

The first step is to turn the raw text of a resume into structured data: skills, job titles, experience summaries, and education history. We write the extraction prompt that a language model would receive. For now, its response is simulated with a hand-written JSON object showing what the model should return.


```python
import json

resume_prompt = f"""
You are an AI assistant that extracts structured information from resumes.

Parse the following resume and return only a JSON object with these keys:
- name
- skills (a list of strings)
- work_experience (a list of objects with keys: title, company, start_date, end_date, summary)
- education (a list of objects with keys: degree, field, university, year)

Resume:
{resume_text}
"""

# Simulated model response for now (no API call is made in this notebook):
parsed_resume = {
    "name": "Alex Taylor",
    "skills": [
        "Python", "SQL", "R", "Pandas", "NumPy", "Scikit-learn",
        "TensorFlow", "PyTorch", "Matplotlib", "Seaborn", "Plotly",
        "NLP", "Generative AI", "Prompt Engineering", "Docker",
        "Git", "Flask", "REST APIs"
    ],
    "work_experience": [
        {
            "title": "Data Scientist",
            "company": "Insight Tech",
            "start_date": "Jan 2022",
            "end_date": "Present",
            "summary": "Built customer churn model, created dashboards, deployed models via APIs"
        },
        {
            "title": "Data Analyst",
            "company": "MarketPulse",
            "start_date": "Jul 2020",
            "end_date": "Dec 2021",
            "summary": "Automated reports, performed A/B testing"
        }
    ],
    "education": [
        {
            "degree": "M.S.",
            "field": "Data Science",
            "university": "University of Florida",
            "year": "2020"
        },
        {
            "degree": "B.S.",
            "field": "Statistics",
            "university": "University of Georgia",
            "year": "2018"
        }
    ]
}

print(json.dumps(parsed_resume, indent=2))
```


```
{
  "name": "Alex Taylor",
  "skills": [
    "Python",
    "SQL",
    "R",
    "Pandas",
    "NumPy",
    "Scikit-learn",
    "TensorFlow",
    "PyTorch",
    "Matplotlib",
    "Seaborn",
    "Plotly",
    "NLP",
    "Generative AI",
    "Prompt Engineering",
    "Docker",
    "Git",
    "Flask",
    "REST APIs"
  ],
  "work_experience": [
    {
      "title": "Data Scientist",
      "company": "Insight Tech",
      "start_date": "Jan 2022",
      "end_date": "Present",
      "summary": "Built customer churn model, created dashboards, deployed models via APIs"
    },
    {
      "title": "Data Analyst",
      "company": "MarketPulse",
      "start_date": "Jul 2020",
      "end_date": "Dec 2021",
      "summary": "Automated reports, performed A/B testing"
    }
  ],
  "education": [
    {
      "degree": "M.S.",
      "field": "Data Science",
      "university": "University of Florida",
      "year": "2020"
    },
    {
      "degree": "B.S.",
      "field": "Statistics",
      "university": "University of Georgia",
      "year": "2018"
    }
  ]
}
```
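Once a live model replaces the simulated response, its reply arrives as a string that still has to be turned into a dict and checked. The sketch below is one way to do that. It assumes the model may wrap its JSON in a Markdown code fence (a common habit, not a guarantee). `parse_model_json`, `REQUIRED_RESUME_KEYS`, and `FENCE_PATTERN` are hypothetical names, with the keys taken from the simulated `parsed_resume` above.

```python
import json
import re

# Hypothetical: the top-level keys used by the simulated parsed_resume in this notebook.
REQUIRED_RESUME_KEYS = {"name", "skills", "work_experience", "education"}

# Matches a reply wrapped in a Markdown code fence (three backticks, optional "json" tag).
FENCE_PATTERN = re.compile(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", re.DOTALL)


def parse_model_json(response_text, required_keys):
    """Parse an LLM reply into a dict and check that the expected keys are present."""
    text = response_text.strip()
    fenced = FENCE_PATTERN.match(text)
    if fenced:
        text = fenced.group(1)  # keep only the JSON between the fences
    data = json.loads(text)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"Model output is missing keys: {sorted(missing)}")
    return data


# Usage with a made-up fenced reply ("`" * 3 avoids writing a literal fence here).
fence = "`" * 3
sample = {"name": "Alex Taylor", "skills": ["Python"], "work_experience": [], "education": []}
reply = fence + "json\n" + json.dumps(sample) + "\n" + fence
checked = parse_model_json(reply, REQUIRED_RESUME_KEYS)
print(checked["name"])  # Alex Taylor
```

Raising an error on missing keys, rather than filling defaults, makes a malformed model reply visible before it reaches the matching step.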

## Step 2: Extract structured information from job descriptions

Just like with the resume, we want to extract key details from each job posting using a language model. This includes the job title, company (if listed), required skills, and a summary of responsibilities. This helps us compare each role more effectively to the candidate’s background.

```python
job_parsing_prompt_template = """
You are an AI assistant that extracts structured information from job descriptions.

Given the job description below, return only a JSON object with these keys:
- title
- company (null if not given)
- required_skills (a list of strings)
- responsibilities (a list of strings)

Job Description:
{job_text}
"""

# Simulated function to parse each job
def parse_job_description(job_text):
    # The prompt is built but not sent yet; a live version would pass it to an LLM.
    prompt = job_parsing_prompt_template.format(job_text=job_text)

    # Simulated responses for now
    if "HealthAI" in job_text:
        return {
            "title": "Machine Learning Engineer",
            "company": "HealthAI",
            "required_skills": ["Python", "TensorFlow", "PyTorch", "Docker", "Cloud"],
            "responsibilities": [
                "Develop scalable ML pipelines",
                "Apply NLP to patient records",
                "Integrate models into APIs"
            ]
        }
    elif "FinServe" in job_text:
        return {
            "title": "Data Analyst",
            "company": "FinServe",
            "required_skills": ["SQL", "Python", "BI Tools", "Business Acumen"],
            "responsibilities": [
                "Build dashboards for metrics",
                "Perform data analysis",
                "Write SQL queries"
            ]
        }
    elif "DeepThink" in job_text:
        return {
            "title": "AI Research Assistant",
            "company": "DeepThink Labs",
            "required_skills": ["Python", "Transformers", "Academic Writing"],
            "responsibilities": [
                "Run LLM experiments",
                "Summarize papers",
                "Document results"
            ]
        }

    # Fail loudly instead of silently returning None for an unrecognized posting
    raise ValueError("No simulated response for this job description")

# Apply parsing to all jobs
structured_jobs = [parse_job_description(jd) for jd in job_descriptions]
```
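`parse_job_description` only has responses for the three postings above. Whether the parsed fields come from a model or are hand-written, a cheap deterministic check can catch obvious omissions. The hypothetical `extract_requirements` helper below assumes postings keep the `Requirements:` line format used in this notebook. It splits that line into individual skills, which can then be compared against each parsed job's `required_skills`.

```python
import re


def extract_requirements(job_text):
    """Split a 'Requirements: a, b/c, d' line into ['a', 'b', 'c', 'd'].

    Hypothetical helper: assumes the posting format used in this notebook and
    returns an empty list if no 'Requirements:' line is found.
    """
    match = re.search(r"Requirements:\s*(.+)", job_text)  # '.' stops at the end of the line
    if match is None:
        return []
    skills = []
    for chunk in match.group(1).split(","):
        for item in chunk.split("/"):  # 'TensorFlow/PyTorch' -> two skills
            item = item.strip()
            if item:
                skills.append(item)
    return skills


posting = """Machine Learning Engineer - HealthAI
- Developing scalable ML pipelines using Python
- Requirements: Python, TensorFlow/PyTorch, Docker, cloud experience
"""
print(extract_requirements(posting))
# ['Python', 'TensorFlow', 'PyTorch', 'Docker', 'cloud experience']
```

On the HealthAI posting this yields `cloud experience` where the simulated parse has `Cloud`, so any comparison between the two lists needs some normalization.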


## Step 3: Match the resume with job descriptions using embeddings

To understand how well a resume aligns with a given job, we use text embeddings: numerical vector representations of text whose closeness can be measured with cosine similarity. By embedding both the resume and each job post, we can score how closely they match, with higher scores meaning a closer match.
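For a resume embedding $\mathbf{r}$ and a job embedding $\mathbf{j}$, each with $d$ dimensions ($d = 384$ for all-MiniLM-L6-v2), the score computed by `util.cos_sim` is the cosine of the angle between the two vectors:

$$
\operatorname{sim}(\mathbf{r}, \mathbf{j}) = \frac{\mathbf{r} \cdot \mathbf{j}}{\lVert \mathbf{r} \rVert \, \lVert \mathbf{j} \rVert} = \frac{\sum_{k=1}^{d} r_k \, j_k}{\sqrt{\sum_{k=1}^{d} r_k^2} \, \sqrt{\sum_{k=1}^{d} j_k^2}}
$$

Scores range from $-1$ to $1$, and higher means more similar. Cosine distance is $1 - \operatorname{sim}(\mathbf{r}, \mathbf{j})$, so ranking by highest similarity is the same as ranking by lowest distance.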


```python
from sentence_transformers import SentenceTransformer, util

# Load embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')
```



```python
# Embed the resume
resume_embed = model.encode(resume_text, convert_to_tensor=True)

# Prepare job text (combine title, company, required skills, and responsibilities into one string per job)
job_texts_for_embedding = []
for job in structured_jobs:
    skills = ", ".join(job["required_skills"])
    responsibilities = ". ".join(job["responsibilities"])
    job_text = f"{job['title']} at {job['company']}. Skills: {skills}. Responsibilities: {responsibilities}."
    job_texts_for_embedding.append(job_text)

# Embed all job descriptions
job_embeds = model.encode(job_texts_for_embedding, convert_to_tensor=True)
```



```python
# Compute cosine similarities between resume and each job
similarities = util.cos_sim(resume_embed, job_embeds)[0]

# Pair each score with its job
job_matches = []
for i, score in enumerate(similarities):
    job = structured_jobs[i]
    job_matches.append({
        "title": job["title"],
        "company": job["company"],
        "score": round(score.item(), 3)
    })

# Sort jobs by match score
job_matches = sorted(job_matches, key=lambda x: x["score"], reverse=True)

# Display
print("Top Job Matches for This Resume:\n")
for match in job_matches:
    print(f"{match['title']} at {match['company']} — Match Score: {match['score']}")
```


```
Top Job Matches for This Resume:

Machine Learning Engineer at HealthAI — Match Score: 0.59
Data Analyst at FinServe — Match Score: 0.586
AI Research Assistant at DeepThink Labs — Match Score: 0.448
```
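To make the scoring and ranking step concrete without downloading a model, the sketch below computes the same cosine scores with NumPy on toy vectors. The three-dimensional vectors are invented for illustration, and `cosine_scores` and `rank` are hypothetical helper names. With the real embeddings, `cosine_scores(resume_embed.cpu().numpy(), job_embeds.cpu().numpy())` should agree with `util.cos_sim` up to floating-point error.

```python
import numpy as np


def cosine_scores(query, candidates):
    """Cosine similarity between one query vector and each row of `candidates`."""
    query_unit = query / np.linalg.norm(query)  # scale query to length 1
    cand_units = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)  # each row to length 1
    return cand_units @ query_unit  # dot products of unit vectors are cosines


def rank(scores):
    """Candidate indices ordered from best to worst match."""
    return np.argsort(-scores).tolist()


# Toy vectors, invented for illustration (real all-MiniLM-L6-v2 embeddings have 384 dimensions).
resume_vec = np.array([1.0, 1.0, 0.0])
job_vecs = np.array([
    [2.0, 2.0, 0.0],  # same direction, different length: cosine 1.0
    [1.0, 0.0, 0.0],  # 45 degrees away: cosine about 0.707
    [0.0, 0.0, 3.0],  # orthogonal: cosine 0.0
])
scores = cosine_scores(resume_vec, job_vecs)
print(np.round(scores, 3))
print(rank(scores))  # [0, 1, 2]
```

The first toy job vector is twice as long as the resume vector but points the same way, and it still scores 1.0: cosine similarity ignores vector magnitude and compares only direction.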


## Step 4: Suggest ways to improve the resume

Based on the top-matching job, we ask a language model to review the resume and suggest three ways the candidate could improve it for that role. As in Steps 1 and 2, the model's reply is simulated here. This step is meant to demonstrate few-shot prompting; note that the prompt below contains no worked examples, so as written it is zero-shot. Prepending one or two example sets of job, resume, and suggestions would make it few-shot.
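Embedding scores say how close a job is overall, but not what is missing. One way to ground the suggestions, sketched here as an assumption rather than part of the original pipeline, is to list the top job's required skills that do not appear in the resume's skill list and mention them in the prompt. `skill_gaps` is a hypothetical helper, and it uses exact, case-insensitive matching.

```python
def skill_gaps(resume_skills, required_skills):
    """Return the required skills that are absent from the resume's skill list.

    Matching is exact after lowercasing and trimming, so related terms
    (for example 'Cloud' vs 'AWS') are not treated as matches.
    """
    have = {skill.strip().lower() for skill in resume_skills}
    return [skill for skill in required_skills if skill.strip().lower() not in have]


# Skills copied from parsed_resume and the simulated HealthAI parse (subset for brevity).
resume_skills = ["Python", "SQL", "TensorFlow", "PyTorch", "Docker", "Git", "Flask"]
healthai_required = ["Python", "TensorFlow", "PyTorch", "Docker", "Cloud"]
print(skill_gaps(resume_skills, healthai_required))  # ['Cloud']
```

In the notebook, the equivalent call would be `skill_gaps(parsed_resume["skills"], structured_jobs[top_match_index]["required_skills"])` once `top_match_index` is set. For HealthAI it returns only `Cloud`, which lines up with the simulated suggestion about cloud platforms.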


```python
# Get top job match
top_match_index = similarities.argmax().item()
top_job = job_texts_for_embedding[top_match_index]

# Prompt template
suggestion_prompt = f"""
You are a career coach assistant. A candidate is applying for the following job:

JOB DESCRIPTION:
{top_job}

Here is their current resume:
{resume_text}

Suggest 3 ways the candidate can improve their resume to be a better fit for this job.
Return the result in bullet points.
"""

# Simulated response
suggestions = [
    "Add experience or coursework related to NLP in healthcare if applicable.",
    "Mention any experience with cloud platforms (AWS, GCP, Azure) explicitly.",
    "Include examples of collaboration with cross-functional teams, especially software engineers."
]

# Display the suggestions
print("Resume Improvement Suggestions:\n")
for s in suggestions:
    print(f"- {s}")
```


```
Resume Improvement Suggestions:

- Add experience or coursework related to NLP in healthcare if applicable.
- Mention any experience with cloud platforms (AWS, GCP, Azure) explicitly.
- Include examples of collaboration with cross-functional teams, especially software engineers.
```
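The Step 4 prompt contains no worked examples. A few-shot version would place one or more example sets of job, resume, and suggestions before the real task, so the model can imitate their format and specificity. The sketch below assembles such a prompt. `build_few_shot_prompt`, `EXAMPLE_SETS`, and the example content (including the company "ExampleCo") are hypothetical, written only to illustrate the structure, and the resulting prompt would still need to be sent to a model.

```python
# Hypothetical worked example showing the model the expected style of answer.
EXAMPLE_SETS = [
    {
        "job": "Data Engineer at ExampleCo. Skills: Python, Airflow, Spark.",
        "resume": "Built ETL scripts in Python and scheduled them with cron.",
        "suggestions": [
            "Name the scheduling or orchestration tools you used, and add Airflow if you have used it.",
            "Quantify pipeline scale, such as rows processed per day or runtime saved.",
            "List Spark explicitly if you have used it, even in coursework.",
        ],
    },
]


def build_few_shot_prompt(job_text, resume_text, examples=None):
    """Assemble a career-coach prompt with worked examples before the real request."""
    if examples is None:
        examples = EXAMPLE_SETS
    lines = ["You are a career coach assistant. Suggest 3 ways to improve a resume for a job.", ""]
    for number, example in enumerate(examples, start=1):
        lines.append(f"EXAMPLE {number}")
        lines.append(f"JOB: {example['job']}")
        lines.append(f"RESUME: {example['resume']}")
        lines.append("SUGGESTIONS:")
        lines.extend(f"- {tip}" for tip in example["suggestions"])
        lines.append("")  # blank line between examples
    # The real request mirrors the example layout and ends where the model should continue.
    lines.append("NOW THE REAL CANDIDATE")
    lines.append(f"JOB: {job_text}")
    lines.append(f"RESUME: {resume_text}")
    lines.append("SUGGESTIONS:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Machine Learning Engineer at HealthAI. Skills: Python, Docker, Cloud.",
    "Data Scientist with Python, Docker, and REST API deployment experience.",
)
print(prompt)
```

In the notebook, calling `build_few_shot_prompt(top_job, resume_text)` would produce a few-shot replacement for `suggestion_prompt`. Ending the prompt on `SUGGESTIONS:` invites the model to continue in the same bulleted format as the example.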


## Summary

In this project, we built a simple prototype of an AI-powered assistant that helps job seekers see how well their resume matches a set of job descriptions and what they might improve.

Using a combination of text understanding, structured generation, and vector similarity, we were able to:
- Parse resumes and job descriptions into structured fields (document understanding)
- Design prompts that ask an LLM for clean, usable JSON, with simulated responses standing in for live model calls (structured output)
- Semantically match a resume with job roles (embeddings + vector search)
- Generate tailored, actionable resume improvement suggestions (few-shot prompting)

This prototype sketches how ML systems can support recruiting and personal branding, and shows how generative AI can help build smarter career tools.
