### JOB MATCHER FOR RESUME

Have you ever been unsure whether your resume matches the job you're applying for? Does it contain the keywords needed to pass the initial resume screening round? If you're not sure, this program is for you!

With the current job market being so competitive, your resume needs to closely match every job you apply for to even get a chance to interview. In this notebook, I created a program that takes your resume file and the text of a job description, then reports how well the resume matches the description and which words from the description are missing from your resume.

In [None]:
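# import packages
import pdfplumber
import docx  # provided by the python-docx package
import re
from sentence_transformers import SentenceTransformer, util
import textstat
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import string

# Tokenizer data for word_tokenize and the stop-word list used by remove_stopwords
nltk.download('punkt_tab')
nltk.download('stopwords')

# Pretrained sentence-embedding model used for the semantic similarity score
model = SentenceTransformer('all-MiniLM-L6-v2')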

[nltk_data] Downloading package punkt_tab to
[nltk_data]     /Users/riyamhatre/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


In [None]:
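def extract_text(filepath):
    """Return the plain text of a .pdf, .docx, or .txt file."""
    path = filepath.lower()  # accept upper-case extensions such as .PDF
    if path.endswith(".pdf"):
        with pdfplumber.open(filepath) as pdf:
            # Extract each page once; pages without extractable text are skipped
            pages = [page.extract_text() for page in pdf.pages]
        return "\n".join(text for text in pages if text)
    elif path.endswith(".docx"):
        doc = docx.Document(filepath)
        return "\n".join(p.text for p in doc.paragraphs)
    elif path.endswith(".txt"):
        with open(filepath, "r", encoding="utf-8") as f:
            return f.read()
    else:
        raise ValueError(f"Unsupported file type: {filepath}")


def extract_keywords(text):
    """Return the set of lower-cased alphabetic words with at least three letters."""
    return set(re.findall(r'\b[a-zA-Z]{3,}\b', text.lower()))


def remove_stopwords(text):
    """Drop English stop words and single punctuation marks, keeping the original casing."""
    stop_words = set(stopwords.words('english'))
    word_tokens = word_tokenize(text)
    return ' '.join(w for w in word_tokens if w.lower() not in stop_words and w not in string.punctuation)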
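
To make the behavior of `extract_keywords` concrete, here is a minimal sketch that runs the same regular expression on two short snippets. The snippets are invented for illustration and are not taken from a real resume or job posting. Note what the three-letter minimum implies: a one-letter skill name such as "R" never becomes a keyword, so it can never show up as matched or missing.

```python
import re

def extract_keywords(text):
    # Same pattern as the notebook: alphabetic words of 3+ letters, lower-cased
    return set(re.findall(r'\b[a-zA-Z]{3,}\b', text.lower()))

# Hypothetical snippets, made up for this example
resume_snippet = "Built Tableau dashboards and wrote SQL queries in Python."
jd_snippet = "Experience with SQL, Tableau, and R."

resume_kw = extract_keywords(resume_snippet)
jd_kw = extract_keywords(jd_snippet)

print(sorted(jd_kw))              # ['and', 'experience', 'sql', 'tableau', 'with']
print(sorted(jd_kw & resume_kw))  # ['and', 'sql', 'tableau']
print(sorted(jd_kw - resume_kw))  # ['experience', 'with']
```

Words like "and" and "with" appear here only because this sketch skips stop-word removal; in the notebook, `remove_stopwords` is applied before the keywords are extracted.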


In [None]:
def analyze_resume(resume_path, jd):
    """Compare a resume file against job-description text and return match metrics."""
    # Flatten line breaks, then strip stop words and punctuation from both texts
    resume_text = remove_stopwords(extract_text(resume_path).replace('\n', ' '))
    jd_text = remove_stopwords(jd)

    # Semantic similarity: cosine similarity between the two sentence embeddings
    resume_vec = model.encode(resume_text)
    jd_vec = model.encode(jd_text)
    similarity = util.cos_sim(resume_vec, jd_vec).item()

    # Keyword comparison
    resume_keywords = extract_keywords(resume_text)
    jd_keywords = extract_keywords(jd_text)

    common = resume_keywords & jd_keywords
    missing = jd_keywords - resume_keywords
    # Percentage of job-description keywords that also appear in the resume
    match_score = 100 * len(common) / len(jd_keywords) if jd_keywords else 0

    # Readability (Flesch Reading Ease). Note: this is computed on the text after
    # stop words and punctuation were removed, which tends to push the score down.
    readability = textstat.flesch_reading_ease(resume_text)

    # Results
    return {
        "semantic_similarity": round(similarity * 100, 2),
        "keyword_match_score": round(match_score, 2),
        "common_keywords": sorted(common),
        "missing_keywords": sorted(missing),
        "readability_score": round(readability, 2)
    }
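
The two scores computed in `analyze_resume` can be reproduced on toy inputs. The sketch below is a plain-Python illustration, not the sentence-transformers implementation: `cosine_similarity` and `keyword_match_score` are hypothetical helper names, the three-element vectors stand in for the much longer embeddings the model produces, and the keyword sets are made up.

```python
import math

def cosine_similarity(u, v):
    # Dot product divided by the product of the vector lengths
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def keyword_match_score(resume_keywords, jd_keywords):
    # Percentage of job-description keywords that also appear in the resume
    if not jd_keywords:
        return 0
    return 100 * len(resume_keywords & jd_keywords) / len(jd_keywords)

# Toy stand-ins for the embedding vectors (assumption: real ones come from model.encode)
resume_vec = [1.0, 2.0, 0.0]
jd_vec = [2.0, 1.0, 0.0]
similarity = cosine_similarity(resume_vec, jd_vec)
print(round(similarity * 100, 2))  # 80.0

# Made-up keyword sets: 3 of the 5 job-description keywords are in the resume
score = keyword_match_score(
    {"sql", "python", "tableau"},
    {"sql", "python", "tableau", "azure", "looker"},
)
print(round(score, 2))  # 60.0
```

The two numbers measure different things: the similarity score compares overall meaning, while the match score only counts exact word overlap, so a resume that says "analyzed" gets no keyword credit for a description that says "analyze".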

In [3]:
job_description = '''Direct experience writing complex queries in SQL from scratch
Strong data visualization skills using Power BI, Tableau, Looker, or equivalent
Exposure to Azure Data Warehouse
Preferrably experienced with statistical programming languages such as R, Python, etc.
 Experience in analytics, advanced analytics/statistics, predictive modeling
 Strong analytic skills with the ability to extract, collect, organize, analyze and interpret trends or patterns in complex data sets
 Demonstrated project management skills
 Effective interpersonal, verbal and written communication skills'''

result = analyze_resume('/Users/riyamhatre/Desktop/mhatre_riya_resume.pdf', job_description)

print(result)

{'semantic_similarity': 52.52, 'keyword_match_score': 38.78, 'common_keywords': ['analytics', 'data', 'effective', 'experience', 'languages', 'management', 'modeling', 'power', 'predictive', 'project', 'python', 'skills', 'sql', 'statistical', 'tableau', 'trends', 'using', 'visualization', 'writing'], 'missing_keywords': ['ability', 'advanced', 'analytic', 'analyze', 'azure', 'collect', 'communication', 'complex', 'demonstrated', 'direct', 'equivalent', 'etc', 'experienced', 'exposure', 'extract', 'interpersonal', 'interpret', 'looker', 'organize', 'patterns', 'preferrably', 'programming', 'queries', 'scratch', 'sets', 'statistics', 'strong', 'verbal', 'warehouse', 'written'], 'readability_score': -46.24}
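
The negative `readability_score` is easier to interpret with the Flesch Reading Ease formula in view. The sketch below applies the standard published formula to made-up counts. It is not textstat's implementation, which does its own sentence and syllable counting, and every count here is hypothetical. It shows one plausible reason for a low score: text with few sentence breaks and a high share of long words is penalized on both terms.

```python
def flesch_reading_ease(words, sentences, syllables):
    # Standard Flesch Reading Ease formula applied to raw counts
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

# Hypothetical counts for ordinary prose: 300 words across 20 sentences
prose = flesch_reading_ease(words=300, sentences=20, syllables=500)

# Hypothetical counts for text stripped of stop words and punctuation:
# fewer but longer words, and almost no sentence boundaries left
stripped = flesch_reading_ease(words=200, sentences=2, syllables=450)

print(round(prose, 2))
print(round(stripped, 2))
```

Higher values mean easier reading, so a score well below zero says more about the preprocessing than about the resume. Scoring the original resume text instead of the stripped text would likely give a more meaningful number.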
