ramlasyaa/ResumeRank

📄 ResumeRank — Automated Resume Screening System

Ranks candidate resumes against a job description using TF-IDF and BERT semantic embeddings. Outputs a ranked table with match scores.


📌 What Does It Do?

Manually reading hundreds of resumes is slow and inconsistent. ResumeRank automates this by:

  1. Reading your job description from a text file
  2. Extracting text from all candidate resumes (PDF / DOCX / TXT)
  3. Computing a similarity score between the JD and each resume
  4. Printing a ranked table of best-matched candidates
  5. Exporting results to a CSV file
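
Step 2 above could be sketched as a dispatch on file extension. This is a minimal sketch, not the contents of extract_text.py: the function name `extract_text` and the error handling are assumptions, and the PDF branch assumes a PyPDF2 release that provides `PdfReader`. PyPDF2 and python-docx are imported lazily so plain-text resumes work without them.

```python
from pathlib import Path


def extract_text(path):
    """Return the plain text of a .pdf, .docx, or .txt resume (hypothetical helper)."""
    path = Path(path)
    suffix = path.suffix.lower()

    if suffix == ".txt":
        return path.read_text(encoding="utf-8", errors="ignore")

    if suffix == ".pdf":
        from PyPDF2 import PdfReader  # lazy import: only needed for PDFs
        reader = PdfReader(str(path))
        # Guard against pages that yield no text (e.g. scanned images)
        return "\n".join(page.extract_text() or "" for page in reader.pages)

    if suffix == ".docx":
        from docx import Document  # provided by the python-docx package
        return "\n".join(p.text for p in Document(str(path)).paragraphs)

    raise ValueError(f"Unsupported resume format: {suffix}")
```

Scanned PDFs that contain only images yield little or no text this way, so such resumes would score near zero unless OCR is added.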

🧠 How It Works

Method 1 — TF-IDF + Cosine Similarity (Default, Fast)

Job Description → TF-IDF Vector (keyword frequencies)
Each Resume     → TF-IDF Vector

Cosine Similarity = cosine of the angle between the two vectors
                  = 1.0 when both texts have the same relative term weights,
                    0.0 when the JD and the resume share no terms

Best for: pure keyword matching — when skills listed in the JD must appear in the resume.
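
To make the idea concrete, here is a from-scratch sketch of TF-IDF weighting plus cosine similarity using only the standard library. It is an illustration, not the code in ranker.py, which uses scikit-learn; the helper names (`tokenize`, `tfidf_vectors`, `cosine`, `rank`) are hypothetical, and the scores will not match scikit-learn's exactly because the tokenisation differs.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into tokens; '+' and '#' are kept so 'c++' and 'c#' survive."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, so a term present in every document keeps a weight above 0
        vectors.append({t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(jd, resumes):
    """resumes maps file name to text; returns (name, score) pairs, best first."""
    names = list(resumes)
    vectors = tfidf_vectors([jd] + [resumes[n] for n in names])
    scores = [(n, cosine(vectors[0], vec)) for n, vec in zip(names, vectors[1:])]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Because every weight is non-negative, scores from this method always fall between 0.0 and 1.0, and a resume that shares no terms with the JD scores exactly 0.0.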


Method 2 — BERT Sentence Embeddings (Accurate)

Job Description → 384-dimensional meaning vector (via the all-MiniLM-L6-v2 model)
Each Resume     → 384-dimensional meaning vector

Cosine Similarity between the two meaning vectors

Best for: semantic matching — understands that "Python Engineer" ≈ "Software Developer".
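
A rough sketch of the embedding path, assuming the sentence-transformers package and the all-MiniLM-L6-v2 model named in config.py. The function names (`cosine_dense`, `embed`, `bert_rank`) are hypothetical and this is not the code in ranker.py. The library is imported inside `embed` so the model is only loaded when the BERT method is actually used.

```python
import math


def cosine_dense(u, v):
    """Cosine similarity of two equal-length dense vectors."""
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def embed(texts, model_name="all-MiniLM-L6-v2"):
    """Encode a list of strings into dense vectors, one row per text."""
    # Lazy import: the model download and load only happen on the BERT path.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer(model_name)
    return model.encode(texts)


def bert_rank(jd, resumes):
    """resumes maps file name to text; returns (name, score) pairs, best first."""
    names = list(resumes)
    vectors = embed([jd] + [resumes[n] for n in names])
    scores = [(n, cosine_dense(vectors[0], vec)) for n, vec in zip(names, vectors[1:])]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Unlike TF-IDF vectors, embeddings can have negative components, so a score can in principle fall below 0. The model also truncates inputs beyond its maximum sequence length, so very long resumes may need to be split into chunks and scored per chunk.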


📁 Project Structure

ResumeRank/
├── config.py              # All settings (paths, method, thresholds)
├── extract_text.py        # Extract text from PDF / DOCX / TXT files
├── ranker.py              # TF-IDF and BERT ranking logic
├── main.py                # CLI entry point  ← Run this
├── job_description.txt    # Paste your job description here
├── requirements.txt
├── resumes/
│   ├── README.md          # Drop candidate resumes here
│   └── (your .pdf/.docx files)
└── results/
    └── ranked_resumes.csv # Auto-generated output

🚀 Getting Started

1. Clone the repo

git clone https://github.com/ramlasyaa/ResumeRank.git
cd ResumeRank

2. Set up environment

python3 -m venv venv
source venv/bin/activate    # macOS/Linux
venv\Scripts\activate       # Windows
pip install -r requirements.txt

3. Add your job description

Edit job_description.txt with the role you're hiring for.

4. Add resumes

Drop .pdf, .docx, or .txt resumes into the resumes/ folder.

5. Run

# Default (TF-IDF, fast)
python main.py

# BERT-based semantic matching (slower, matches meaning rather than exact keywords)
python main.py --method bert

# Custom JD + custom folder
python main.py --jd path/to/jd.txt --resumes path/to/folder/

# Show top 5 only
python main.py --top 5
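
For reference, the flags above could be parsed with `argparse` roughly as follows. This is a sketch, not the actual main.py: the flag names come from the commands above, while the defaults (job_description.txt, resumes/, tfidf, 10) are assumptions based on the project layout and config.py.

```python
import argparse


def build_parser():
    """Build a parser for the four flags shown in the usage examples."""
    parser = argparse.ArgumentParser(
        description="Rank resumes against a job description.")
    parser.add_argument("--method", choices=["tfidf", "bert"], default="tfidf",
                        help="similarity method")
    parser.add_argument("--jd", default="job_description.txt",
                        help="path to the job description text file")
    parser.add_argument("--resumes", default="resumes/",
                        help="folder containing candidate resumes")
    parser.add_argument("--top", type=int, default=10,
                        help="number of top-ranked resumes to show")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.method, args.jd, args.resumes, args.top)
```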

📊 Sample Output

╭──────┬──────────────────────┬─────────────┬────────────────╮
│ Rank │ Resume File          │ Match Score │ Match Level    │
├──────┼──────────────────────┼─────────────┼────────────────┤
│    1 │ alice_cv.pdf         │ 72.4%       │ 🟢 Strong Match │
│    2 │ bob_resume.docx      │ 61.3%       │ 🟢 Strong Match │
│    3 │ carol_profile.pdf    │ 48.7%       │ 🟡 Moderate     │
│    4 │ dave_resume.pdf      │ 31.2%       │ 🔴 Weak Match   │
╰──────┴──────────────────────┴─────────────┴────────────────╯

💾 Results saved to: results/ranked_resumes.csv
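
The Match Level column can be derived from the score with two cut-offs. The sketch below uses 60% and 40%, which are hypothetical values chosen only because they are consistent with the sample table; the real thresholds live in config.py. It also writes the CSV with the standard `csv` module instead of pandas to stay dependency-free, and the function names are assumptions.

```python
import csv

# Hypothetical cut-offs, consistent with the sample table but not taken from config.py
STRONG_THRESHOLD = 60.0
MODERATE_THRESHOLD = 40.0


def match_level(score_pct):
    """Map a percentage score to a label."""
    if score_pct >= STRONG_THRESHOLD:
        return "Strong Match"
    if score_pct >= MODERATE_THRESHOLD:
        return "Moderate"
    return "Weak Match"


def write_results(ranked, csv_path):
    """ranked: list of (file name, score in [0, 1]) pairs, best first."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Rank", "Resume File", "Match Score", "Match Level"])
        for position, (name, score) in enumerate(ranked, start=1):
            pct = round(score * 100, 1)
            writer.writerow([position, name, f"{pct:.1f}%", match_level(pct)])
```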

⚙️ Configuration

All settings live in config.py:

RANKING_METHOD      = "tfidf"       # "tfidf" or "bert"
TFIDF_MAX_FEATURES  = 5000
TFIDF_NGRAM_RANGE   = (1, 2)        # unigrams + bigrams
BERT_MODEL          = "all-MiniLM-L6-v2"
TOP_N               = 10
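
As a sketch of how the TF-IDF settings might be consumed, the two values map directly onto scikit-learn's `TfidfVectorizer` parameters `max_features` and `ngram_range`. The helper names `build_vectorizer` and `tfidf_scores` are hypothetical, and the actual wiring in ranker.py may differ.

```python
TFIDF_MAX_FEATURES = 5000
TFIDF_NGRAM_RANGE = (1, 2)


def build_vectorizer():
    """Create a TfidfVectorizer configured from the settings above."""
    # Lazy import so the settings can be read without scikit-learn installed
    from sklearn.feature_extraction.text import TfidfVectorizer
    return TfidfVectorizer(max_features=TFIDF_MAX_FEATURES,
                           ngram_range=TFIDF_NGRAM_RANGE)


def tfidf_scores(jd, resume_texts):
    """Return one cosine similarity score per resume, in input order."""
    from sklearn.metrics.pairwise import cosine_similarity
    # Fit on the JD and all resumes together so they share one vocabulary
    matrix = build_vectorizer().fit_transform([jd] + list(resume_texts))
    return cosine_similarity(matrix[0], matrix[1:]).ravel().tolist()
```

`max_features` keeps only the most frequent terms in the corpus, and an `ngram_range` of (1, 2) lets two-word phrases such as "machine learning" count as features alongside single words.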

🛠️ Tech Stack

Component         Tool / Library
Language          Python 3.9+
TF-IDF Ranking    Scikit-learn
BERT Ranking      sentence-transformers (SBERT)
PDF Extraction    PyPDF2
DOCX Extraction   python-docx
Data Output       Pandas (CSV)
CLI Table         tabulate

💡 Real-World Uses

  • HR / Recruiting teams — automate first-round resume filtering
  • Job portals — rank applicants by JD fit automatically
  • University placement cells — shortlist students for company drives
  • Freelancers — filter projects that match your own skill set

Built by Ram Lasya · Amrita Vishwa Vidyapeetham
