<a href="https://colab.research.google.com/github/priyavarshini04/-PROJECTS-/blob/main/Resume_to_Job_Description_Matcher.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Resume to Job Description Matcher**

This notebook ranks resumes against a job description using only TF-IDF vectors and cosine similarity.
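
To make the scoring concrete, the sketch below recomputes TF-IDF weights and cosine similarity in plain Python for three toy documents. It assumes scikit-learn's default weighting (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, L2 normalization) and, as a simplification, splits on whitespace rather than using the vectorizer's own tokenizer. The function names and sample texts are illustrative and not part of the notebook.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, L2-normalized."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF, assumed to match scikit-learn's default.
        weights = {
            term: tf * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, tf in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Dot product of two unit-length sparse vectors."""
    return sum(w * v.get(term, 0.0) for term, w in u.items())


job = "python backend developer"
resumes = ["python backend engineer", "graphic designer"]  # toy data
job_vec, *resume_vecs = tfidf_vectors([job] + resumes)
scores = [cosine(job_vec, r) for r in resume_vecs]
print(scores)  # the first resume shares two terms; the second shares none
```

A resume with no terms in common with the job description scores exactly zero, and terms that appear in fewer documents carry more weight than terms that appear in many.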

**Install Required Libraries**

In [2]:
!pip install -q docx2txt PyPDF2 nltk scikit-learn



**Import modules and download NLTK stopwords**

In [3]:
import os
import re
import docx2txt
import PyPDF2
import nltk
import numpy as np
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from google.colab import files

nltk.download('stopwords')


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

**Helper functions**

In [4]:
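def extract_text_from_pdf(file_path):
    """Return the concatenated text of every page in a PDF."""
    text = ""
    with open(file_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        for page in reader.pages:
            # extract_text() may yield nothing for image-only pages;
            # the newline keeps words from fusing across page breaks.
            text += (page.extract_text() or "") + "\n"
    return text

def extract_text_from_docx(file_path):
    """Return the plain text of a DOCX file."""
    return docx2txt.process(file_path)

def extract_text_from_txt(file_path):
    """Return the contents of a UTF-8 text file."""
    with open(file_path, 'r', encoding='utf-8') as file:
        return file.read()

def extract_text(file_path):
    """Dispatch on the file extension; unsupported types yield an empty string."""
    # Compare in lowercase so '.PDF' and '.Docx' are also recognized.
    ext = os.path.splitext(file_path)[1].lower()
    if ext == '.pdf':
        return extract_text_from_pdf(file_path)
    elif ext == '.docx':
        return extract_text_from_docx(file_path)
    elif ext == '.txt':
        return extract_text_from_txt(file_path)
    else:
        return ""

def preprocess_text(text):
    """Strip punctuation, lowercase, and drop English stopwords."""
    # Keep only letters, digits, and whitespace.
    text = re.sub(r'[^a-zA-Z0-9\s]', '', text).lower()
    stop_words = set(stopwords.words('english'))
    words = text.split()
    return ' '.join([word for word in words if word not in stop_words])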
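
The effect of the cleaning step is easiest to see on a sample sentence. The sketch below mirrors the regex and stopword filter of `preprocess_text`, but uses a tiny stand-in stopword set so it runs without the NLTK download. `preprocess_demo`, `DEMO_STOP_WORDS`, and the `replace_with_space` option are hypothetical names introduced for illustration only.

```python
import re

# Tiny stand-in for NLTK's English stopword list (assumption, for illustration).
DEMO_STOP_WORDS = {"and", "in", "with", "the", "a"}


def preprocess_demo(text, replace_with_space=False):
    """Strip punctuation, lowercase, and drop stopwords.

    replace_with_space is a hypothetical option: when True, punctuation
    becomes a space instead of being deleted.
    """
    filler = " " if replace_with_space else ""
    cleaned = re.sub(r"[^a-zA-Z0-9\s]", filler, text).lower()
    return " ".join(w for w in cleaned.split() if w not in DEMO_STOP_WORDS)


sample = "Experienced in Python/Java and front-end work."
print(preprocess_demo(sample))                           # punctuation deleted
print(preprocess_demo(sample, replace_with_space=True))  # punctuation -> space
```

Deleting punctuation outright joins `Python/Java` into the single token `pythonjava`, which would not match `python` or `java` in a job description. Replacing punctuation with a space is one possible alternative, at the cost of splitting hyphenated words.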


**Upload the job description and resumes**

In [10]:
print("Upload Job Description file (TXT, PDF, or DOCX)")
# files.upload() saves the files to the working directory and
# returns a dict keyed by file name.
job_file = files.upload()
job_file_path = next(iter(job_file))  # use the first uploaded file
job_text = preprocess_text(extract_text(job_file_path))

print("Upload Resume Files (PDF, DOCX, or TXT)")
resume_files = files.upload()

resume_texts = []
resume_names = []

# Extract and clean each resume, keeping names and texts in the same order.
for name in resume_files:
    text = extract_text(name)
    preprocessed = preprocess_text(text)
    resume_texts.append(preprocessed)
    resume_names.append(name)


Upload Job Description file (TXT, PDF, or DOCX)


Saving job des.txt.txt to job des.txt (1).txt
Upload Resume Files (PDF, DOCX, or TXT)


Saving sophia-designer.pdf to sophia-designer (1).pdf
Saving software developer.pdf to software developer (1).pdf
Saving john-resume.pdf to john-resume (1).pdf
Saving info resume.pdf to info resume (1).pdf
Saving graphic (1).docx to graphic (1) (1).docx
Saving designer.pdf to designer (1).pdf
Saving david-software developer.pdf to david-software developer (1).pdf
Saving bob-backend developer.pdf to bob-backend developer (1).pdf
Saving banking.txt to banking (1).txt
Saving Victoria-Teacher.pdf to Victoria-Teacher (1).pdf
Saving Teacher.pdf to Teacher (1).pdf
Saving Julia-graphic.docx to Julia-graphic (1).docx
Saving Charlie-banking.txt to Charlie-banking (1).txt


**Compute the similarity between the job description and each resume**

In [11]:
# Fit one shared vocabulary over the job description and all resumes.
vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform([job_text] + resume_texts)
# Row 0 is the job description; the remaining rows are the resumes.
similarity_scores = cosine_similarity(vectors[0:1], vectors[1:])[0]

# Sort results from highest to lowest similarity
sorted_indices = np.argsort(similarity_scores)[::-1]
top_resumes = [resume_names[i] for i in sorted_indices]
top_scores = [round(similarity_scores[i] * 100, 2) for i in sorted_indices]

# Display all matches, best first
print("\nTop Matching Resumes:")
for name, score in zip(top_resumes, top_scores):
    print(f"{name}: {score}% match")



Top Matching Resumes:
john-resume (1).pdf: 24.34% match
info resume (1).pdf: 24.34% match
bob-backend developer (1).pdf: 11.17% match
david-software developer (1).pdf: 10.11% match
software developer (1).pdf: 10.11% match
sophia-designer (1).pdf: 8.75% match
designer (1).pdf: 8.75% match
Teacher (1).pdf: 5.98% match
Victoria-Teacher (1).pdf: 5.98% match
Charlie-banking (1).txt: 5.27% match
banking (1).txt: 5.27% match
Julia-graphic (1).docx: 3.75% match
graphic (1) (1).docx: 3.75% match
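
A percentage alone says little about why a resume ranked where it did. One way to inspect a score, sketched here under the assumption that both texts have already been through `preprocess_text`, is to list the terms a resume shares with the job description. `shared_terms` and the sample strings are hypothetical, and the sketch ranks by raw counts rather than TF-IDF weights.

```python
from collections import Counter


def shared_terms(job_text, resume_text, top_n=5):
    """Return up to top_n terms found in both texts, most frequent in the resume first."""
    job_vocab = set(job_text.split())
    resume_counts = Counter(resume_text.split())
    common = {t: c for t, c in resume_counts.items() if t in job_vocab}
    # Sort by count (descending), then alphabetically for stable output.
    return sorted(common, key=lambda t: (-common[t], t))[:top_n]


job_demo = "python backend developer rest api sql"
resume_demo = "backend developer python python django sql"
print(shared_terms(job_demo, resume_demo))
```

In the notebook, the same function could be called with `job_text` and any entry of `resume_texts`.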


In [12]:
# Display top 3 matches
print("\nTop 3 Matching Resumes:")
for i in range(min(3, len(top_resumes))):
    print(f"{top_resumes[i]}: {top_scores[i]}% match")


Top 3 Matching Resumes:
john-resume (1).pdf: 24.34% match
info resume (1).pdf: 24.34% match
bob-backend developer (1).pdf: 11.17% match
