## **NLP Framework for Intelligent Talent Acquisition**

Author: Nitish Kumar Pandey

**1. Introduction to the Business Context**

**Client Overview**

The client is a large recruitment platform (similar to Indeed, LinkedIn, StepStone, etc.) that receives thousands of resumes daily. They want to automate the candidate screening process to improve efficiency and reduce manual workload for recruiters.

**Business Problem**

Recruiters struggle to manually analyze resumes, compare them across roles, and shortlist suitable candidates. This leads to:

*   Slow hiring cycles
*   Inconsistent evaluation
*   High recruiter workload
*   Missed potential candidates

The client wants an **NLP-driven system** that:

*   Understands resume content
*   Extracts key information (skills, experience, education)
*   Compares it with job descriptions
*   Predicts how well a candidate matches a job
*   Supports recruiters with AI-driven shortlisting

**2. Business Benefits**

**For the Business / Recruitment Platform**

*   Automated candidate shortlisting
*   Consistent and data-driven evaluation
*   Faster identification of top candidates
*   Improved applicant experience (reduced waiting time)
*   Increased recruiter productivity
*   Better job-candidate matching → higher hiring success

**For Recruiters / Clients**

*   Easy comparison of candidates
*   Transparent similarity/match scoring
*   Identification of missing or incorrect skills
*   More time for high-value interviews instead of resume reading

**For Job Seekers**

*   Better visibility for candidates whose skills match a role
*   More accurate recommendations

**3. Dataset Information & Collection**

**Dataset Source**

The dataset is taken from: https://www.kaggle.com/datasets/saugataroyarghya/resume-dataset

**Dataset Description**

The dataset contains:

**Resume information:**

*   Skills
*   Experience
*   Projects
*   Career objective
*   Certifications

**Job description information:**

*   Job position
*   Responsibilities
*   Required skills
*   Experience requirements

**Labels:**

*   matched_score: numeric rating of how well a candidate matches a job posting

**Why this Dataset?**

It supports both:

*   **NER task**: extract structured entities from resumes
*   **Matching task**: predict similarity score between candidate & job

**4. Formulating as an NLP Task**

This project consists of three NLP components:

**i. Resume Parsing (NER Task)**

*   Extract entities such as:
    *   Skills
    *   Experience years
    *   Degree / Education
    *   Tools / Technologies
    *   Companies
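
As a rough illustration of the kind of structured output resume parsing aims for, the sketch below pulls skills, degrees, and years of experience out of free text. It is a rule-based stand-in built on regular expressions, not a trained NER model; the skill vocabulary, the degree pattern, and the `extract_entities` helper are hypothetical and exist only for this example. A statistical NER pipeline (for example one built with spaCy) would replace these rules in practice.

```python
import re

# Hypothetical skill vocabulary, for illustration only.
SKILL_VOCAB = ["python", "sql", "machine learning", "docker"]

# Assumed patterns: a few common degree abbreviations and "N years" phrases.
DEGREE_PATTERN = re.compile(
    r"\b(B\.?Sc|M\.?Sc|B\.?Tech|M\.?Tech|MBA|PhD)\b", re.IGNORECASE
)
YEARS_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s*\+?\s*(?:years?|yrs?)", re.IGNORECASE
)


def extract_entities(text):
    """Return skills, degrees, and experience years found in the text."""
    lowered = text.lower()
    # Keep vocabulary terms that appear as whole words or phrases.
    skills = [
        skill
        for skill in SKILL_VOCAB
        if re.search(r"\b" + re.escape(skill) + r"\b", lowered)
    ]
    degrees = [match.group(0) for match in DEGREE_PATTERN.finditer(text)]
    years = [float(match.group(1)) for match in YEARS_PATTERN.finditer(text)]
    return {"skills": skills, "degrees": degrees, "experience_years": years}


sample = (
    "M.Sc in Data Science with 3 years of experience in Python, "
    "SQL and machine learning."
)
entities = extract_entities(sample)
print(entities)
```

Rules like these are brittle against spelling variants and unseen skills, which is the main reason to move to a learned entity recognizer once labeled examples are available.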

**ii. Text Representation & Vectorization**

*   Convert resume & job text into numerical representations using:
    *   TF-IDF
    *   Sentence embeddings (optional)
    *   Skill overlap features
    *   Similarity measures
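
The two simplest of these representations can be sketched as follows: TF-IDF vectors compared with cosine similarity, and a Jaccard-style skill overlap feature. The example texts and the `skill_overlap` helper are made up for illustration and are not taken from the dataset; Jaccard is one reasonable choice of overlap measure, assumed here rather than prescribed by the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy texts, invented for this sketch (not rows from resume_data.csv).
resume_text = "Python developer with experience in machine learning and SQL"
job_texts = [
    "Looking for a machine learning engineer skilled in Python and SQL",
    "Seeking a civil engineer for bridge construction projects",
]

# Fit one vocabulary over the resume and all job descriptions so that
# every document lives in the same vector space.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform([resume_text] + job_texts)

# Row 0 is the resume; the remaining rows are the job descriptions.
tfidf_scores = cosine_similarity(matrix[0], matrix[1:]).ravel()


def skill_overlap(resume_skills, required_skills):
    """Jaccard overlap between two skill lists, ignoring case and spacing."""
    resume_set = {skill.strip().lower() for skill in resume_skills}
    required_set = {skill.strip().lower() for skill in required_skills}
    union = resume_set | required_set
    return len(resume_set & required_set) / len(union) if union else 0.0


overlap = skill_overlap(["Python", "SQL", "Docker"], ["python", "sql", "Spark"])

print(tfidf_scores)  # one similarity per job description
print(overlap)
```

The TF-IDF score rewards shared vocabulary across the whole text, while the overlap feature looks only at the declared skill lists, so the two can be fed to the matching model as separate features.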

**iii. Candidate–Job Matching (Regression/Classification)**

*   Use the constructed features to predict either:
    *   Match score (regression), or
    *   High/Medium/Low suitability (classification)
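
The link between the two formulations can be shown by binning a numeric score into suitability bands. This is a minimal sketch that assumes the score lies in the range 0 to 1; the cut-offs of 0.4 and 0.7 and the `score_to_band` helper are hypothetical and would need to be chosen from the actual distribution of `matched_score`.

```python
def score_to_band(score, low_cut=0.4, high_cut=0.7):
    """Map a match score in [0, 1] to a Low/Medium/High suitability band.

    The thresholds are illustrative assumptions, not values from the dataset.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score is expected to lie between 0 and 1")
    if score >= high_cut:
        return "High"
    if score >= low_cut:
        return "Medium"
    return "Low"


bands = [score_to_band(score) for score in (0.15, 0.55, 0.9)]
print(bands)
```

Regression keeps the full ordering of candidates, while the banded version is easier for recruiters to read; the same features can serve both.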

**5. Importing Required Libraries**

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import spacy
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics.pairwise import cosine_similarity
import warnings
warnings.filterwarnings("ignore")

**7. Fetching & Viewing the Data**

In [3]:
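# Load the dataset (expects resume_data.csv in the working directory)
df = pd.read_csv("resume_data.csv")
print(df.shape)  # (rows, columns)
df.head()
df.isnull().sum()
# A notebook cell displays only its last expression, so the output below
# shows the printed shape followed by the duplicate-row count
df.duplicated().sum()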

(9544, 35)


np.int64(0)