An intelligent resume parsing and candidate scoring system that extracts structured data from PDF/DOCX resumes, scores candidates against configurable criteria, and presents results through an interactive dashboard.
| Verified Candidate | Rejected Candidate |
|---|---|
| ![]() | ![]() |
- Select a job category from the sidebar (Data Science, Data Engineering, or Data Visualization) - or upload your own custom data dictionary (.xlsx)
- Adjust scoring weights using the sidebar sliders - control how much each dimension (skills, degree, experience, certifications) contributes to the overall score
- Upload one or more resumes (PDF or DOCX) - they are parsed concurrently with a progress bar
- View scored results in the Table View tab - candidates are ranked by overall score, with "Verified" or "Rejected" status based on exclusion lists
- Explore breakdowns across sub-tabs: Overall Score, Skill Score, Experience Score, Degree Score, Certification Score
- Compare top candidates side-by-side using the Radar Chart tab - a polar overlay showing each candidate's strengths across all four dimensions
- Save to database for future reference, or download as CSV for offline analysis
- Review past results in the "View Existing Resumes" tab, filtered by job category
- Multi-format support - parse PDF and DOCX resumes
- Automated extraction - contact info, education, skills, experience, certifications
- Configurable scoring - weighted scoring across skill match, degree level, experience, and certifications
- Data dictionaries - customizable skill taxonomies per job category (Data Science, Data Engineering, Data Visualization)
- Candidate comparison - radar chart overlay comparing top candidates across all dimensions
- Batch processing - concurrent parsing of multiple resumes with progress tracking
- Persistent storage - SQLite backend for saving and reviewing past results
- CSV export - download scored results for offline analysis
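The batch-processing feature above can be sketched with a thread pool. This is a minimal illustration only; `parse_resume` here is a hypothetical stand-in for the real parser in `src/parser.py`, whose internals differ:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def parse_resume(path: str) -> dict:
    """Hypothetical stand-in for the real PDF/DOCX parser."""
    return {"file": path, "skills": ["python"]}

def parse_batch(paths: list[str], max_workers: int = 4) -> list[dict]:
    """Parse resumes concurrently, collecting results as each one finishes."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(parse_resume, p): p for p in paths}
        for future in as_completed(futures):
            results.append(future.result())  # a progress bar would update here
    return results

print(len(parse_batch(["a.pdf", "b.docx"])))  # → 2
```

Collecting with `as_completed` lets the UI advance the progress bar per file rather than waiting for the whole batch.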
```mermaid
flowchart LR
    A[PDF / DOCX Upload] --> B[Text Extraction]
    B --> C[Field Parsing]
    C --> D[Skill Matching]
    C --> E[Degree Detection]
    C --> F[Experience Extraction]
    C --> G[Certification Count]
    D & E & F & G --> H[Weighted Scoring]
    H --> I[Dashboard + Charts]
    H --> J[SQLite Storage]
    H --> K[CSV Export]
```
```mermaid
sequenceDiagram
    participant U as User
    participant S as Streamlit UI
    participant P as Parser
    participant SC as Scorer
    participant DB as SQLite
    U->>S: Upload resumes + select category
    S->>P: Parse each file (concurrent)
    P->>P: Extract text (PDF/DOCX)
    P->>P: Extract email, phone, degrees, skills, experience
    P-->>S: ResumeData objects
    S->>SC: Build results DataFrame
    SC->>SC: Rank-based scoring with weights
    SC-->>S: Scored DataFrame
    S->>U: Display table, charts, radar comparison
    U->>S: Click Save
    S->>DB: Store results (parameterized queries)
```
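The final storage step in the diagram uses parameterized queries, which in Python's stdlib `sqlite3` looks roughly like the sketch below. The table and column names are assumptions for illustration, not the real schema in `src/database.py`:

```python
import sqlite3

def save_results(db_path: str, category: str, rows: list[tuple]) -> int:
    """Insert scored candidates with parameterized queries; returns rows written."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            """CREATE TABLE IF NOT EXISTS results (
                   name TEXT, category TEXT, overall_score REAL)"""
        )
        cur = conn.executemany(
            "INSERT INTO results (name, category, overall_score) VALUES (?, ?, ?)",
            [(name, category, score) for name, score in rows],
        )
        return cur.rowcount

print(save_results(":memory:", "Data Science", [("Alice", 0.91)]))  # → 1
```

Binding values with `?` placeholders instead of string formatting is what keeps resume-derived text from being interpreted as SQL.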
Each candidate is scored across four dimensions, normalized by rank within the batch:
```
Overall Score = (Skill Score × W1) + (Degree Score × W2) + (Experience Score × W3) + (Certification Score × W4)
```
| Dimension | What it measures | How |
|---|---|---|
| Skill Score | Match against data dictionary skill taxonomy | NLP n-gram tokenization + fuzzy matching |
| Degree Score | Education level | Fuzzy match against degree classification lists |
| Experience Score | Years of experience | Regex patterns for "X years", date ranges |
| Certification Score | Number of certifications | Keyword frequency count |
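The experience row in the table relies on simple regex patterns. A minimal sketch of the "X years" case follows; the real patterns in `src/config.py` are likely broader (date ranges, month spans, etc.):

```python
import re

# Matches phrases like "5 years", "3+ years of experience", "2.5 yrs"
YEARS_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*\+?\s*(?:years?|yrs?)", re.IGNORECASE)

def extract_years(text: str) -> float:
    """Return the largest 'X years' figure mentioned in the resume text."""
    matches = [float(m.group(1)) for m in YEARS_PATTERN.finditer(text)]
    return max(matches, default=0.0)

print(extract_years("6 years of Python, 2 yrs of SQL"))  # → 6.0
```

Taking the maximum is one plausible policy; summing per-role ranges is another, and the real parser may do either.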
Weights are adjustable via sidebar sliders (must sum to 100%).
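Rank normalization plus the weighted sum can be sketched with pandas (already a dependency). The column names and weights below are illustrative, not the app's defaults:

```python
import pandas as pd

WEIGHTS = {"skill": 0.4, "degree": 0.2, "experience": 0.3, "certification": 0.1}

def score_batch(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize each dimension by rank within the batch, then combine with weights."""
    out = df.copy()
    for dim in WEIGHTS:
        out[f"{dim}_score"] = out[dim].rank(pct=True)  # rank-normalized to (0, 1]
    out["overall"] = sum(out[f"{d}_score"] * w for d, w in WEIGHTS.items())
    return out.sort_values("overall", ascending=False)

df = pd.DataFrame({
    "skill": [8, 5], "degree": [2, 3],
    "experience": [6.0, 2.0], "certification": [1, 0],
})
print(score_batch(df)["overall"].tolist())  # → [0.9, 0.6]
```

Because scores are percentile ranks *within the batch*, a candidate's score is relative to the other uploads, not an absolute measure.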
```bash
# Clone
git clone https://github.com/samitmohan/resume-parser.git
cd resume-parser

# Install dependencies
uv sync

# Run
uv run streamlit run app.py
```

Open http://localhost:8501 in your browser.
```text
resume-parser/
├── app.py                      # Streamlit UI - layout, charts, user interaction
├── src/
│   ├── __init__.py
│   ├── config.py               # Constants, regex patterns, degree lists, paths
│   ├── parser.py               # Text extraction + field parsing (PDF/DOCX)
│   ├── scorer.py               # Rank-based scoring and DataFrame construction
│   └── database.py             # SQLite operations with parameterized queries
├── data_dictionary/            # Excel files defining skill taxonomies per category
│   ├── Data Engineering.xlsx
│   ├── Data Science.xlsx
│   └── Data Visualization.xlsx
├── assets/
│   └── icon_g.png
├── temp/                       # Temporary upload directory (gitignored)
└── pyproject.toml
```
- Python 3.10+
- Streamlit - interactive web dashboard
- PyMuPDF (fitz) - PDF text extraction
- python-docx - DOCX text extraction
- scikit-learn - CountVectorizer for n-gram tokenization
- rapidfuzz - fuzzy string matching for degree/skill detection
- Plotly - bar charts, scatter plots, radar charts
- pandas - data manipulation and scoring
- SQLite - persistent result storage
Each .xlsx file in data_dictionary/ contains three sheets:
| Sheet | Purpose |
|---|---|
| Skills | Skill segments with inclusion keywords for matching |
| Exclusion Skills | Keywords that trigger automatic rejection |
| Exclusion Company | Company names that trigger rejection |
Upload a custom dictionary via the sidebar to define your own scoring criteria.

