<a href="https://colab.research.google.com/github/pushparani7/NLP_project/blob/main/Resume_Parser.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Setting up the Environment**

In [61]:
!pip install pdfminer.six
!pip install spacy
!python -m spacy download en_core_web_sm


**Extracting Text from PDF Using pdfminer.six**

In [62]:
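from pdfminer.high_level import extract_text
import os  # used later when processing a whole folder of resumes

def extract_text_from_pdf(pdf_path):
    # pdfminer.six's high-level helper returns the text of all pages as one string
    return extract_text(pdf_path)

sample_text = extract_text_from_pdf("data_analysis_resume.pdf")
print(sample_text[:1000])  # preview the first 1,000 characters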

Resume

Name

Priya Sharma

Email

priya.sharma@example.com

Phone

+91 9876543210

Skills

Python, SQL, Data Visualization, Pandas, NumPy, Machine Learning, Excel, Tableau

Education

Bachelor of Technology in Artificial Intelligence and Data Science, Anna University, 2021 - 2025

Experience

Data Analyst Intern, ABC Analytics, June 2024 - Aug 2024

- Performed data cleaning, transformation, and visualization using Python and Tableau.

- Assisted in building predictive models for sales forecasting.

- Collaborated with the team to automate reporting workflows.




**Cleaning & Preprocessing Text**

In [63]:
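import re

def clean_text(text):
  text = re.sub(r'\n+', '\n', text)  # collapse runs of newlines into a single newline
  text = re.sub(r'\+', ' ', text)    # replaces every '+' with a space, so '+91' becomes ' 91'
  return text.strip()

cleaned = clean_text(sample_text)
print(cleaned)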

Resume
Name
Priya Sharma
Email
priya.sharma@example.com
Phone
 91 9876543210
Skills
Python, SQL, Data Visualization, Pandas, NumPy, Machine Learning, Excel, Tableau
Education
Bachelor of Technology in Artificial Intelligence and Data Science, Anna University, 2021 - 2025
Experience
Data Analyst Intern, ABC Analytics, June 2024 - Aug 2024
- Performed data cleaning, transformation, and visualization using Python and Tableau.
- Assisted in building predictive models for sales forecasting.
- Collaborated with the team to automate reporting workflows.
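The `\+` substitution in `clean_text` removes the plus sign from the international prefix, which is why the phone line above reads ` 91 9876543210`. If the intent was only to tidy whitespace, a minimal sketch of an alternative (the name `clean_text_keep_plus` is hypothetical) collapses runs of spaces and tabs, strips each line, and drops empty lines while leaving `+` untouched:

```python
import re

def clean_text_keep_plus(text):
    """Hypothetical variant of clean_text that preserves '+' signs."""
    text = re.sub(r'[ \t]+', ' ', text)                 # collapse runs of spaces/tabs
    lines = [line.strip() for line in text.split('\n')]  # trim each line
    return '\n'.join(line for line in lines if line)     # drop blank lines

sample = "Phone\n\n+91   9876543210\n\n\nSkills"
print(clean_text_keep_plus(sample))
```

Stripping lines before discarding empty ones also removes lines that contain only spaces, which a plain `\n+` substitution leaves behind.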


**Extracting Name, Email, and Phone**

In [64]:
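def extract_email(text):
  # first whitespace-delimited token that contains '@'
  match = re.search(r'\S+@\S+', text)
  return match.group(0) if match else None

def extract_phone(text):
  # optional '+', a digit, then 8 to 15 digits, whitespace, hyphens or parentheses;
  # \s also matches newlines, so the match can end with a trailing '\n'
  match = re.search(r'\+?\d[\d\s\-()]{8,15}', text)
  return match.group(0) if match else None

email = extract_email(cleaned)
phone = extract_phone(cleaned)
print("Email:", email, " Phone:", phone)  # the name is extracted in the next cell

Email: priya.sharma@example.com  Phone: 91 9876543210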
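Because `\s` in the phone pattern also matches line breaks, the match can spill onto the next line, and `\S+@\S+` accepts any token containing `@`. A stricter sketch, assuming phone numbers stay on one line and use only spaces, hyphens, dots, or parentheses as separators (the patterns and function names are hypothetical, not from the original notebook):

```python
import re

# Assumed formats:
# - email: local part, '@', then a domain with at least one dot
# - phone: optional '+', digits separated by spaces, hyphens, dots or
#   parentheses on a single line, starting and ending with a digit
EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')
PHONE_RE = re.compile(r'\+?\d[\d \-.()]{7,15}\d')

def extract_email_strict(text):
    match = EMAIL_RE.search(text)
    return match.group(0) if match else None

def extract_phone_strict(text):
    # Caveat: a year range such as '2021 - 2025' also fits this pattern,
    # so only the first match in the text is returned.
    match = PHONE_RE.search(text)
    return match.group(0) if match else None

sample = "Email\npriya.sharma@example.com\nPhone\n+91 9876543210\nSkills"
print(extract_email_strict(sample), extract_phone_strict(sample))
```

Requiring the match to end in a digit is what keeps the trailing newline out of the result.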


In [65]:
import spacy
nlp=spacy.load('en_core_web_sm')

def extract_name(text):
  doc=nlp(text)
  for ent in doc.ents:
    if ent.label_=='PERSON':
      return ent.text
  return None
name = extract_name(cleaned)
print("Extracted Name:", name)

Extracted Name: Priya Sharma
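spaCy's small English model can miss names that it does not tag as `PERSON`, in which case `extract_name` returns `None`. Since the sample resume puts each value on the line after its label, a hedged fallback is to read the line following `Name`. The helpers below are hypothetical and assume that label-then-value layout:

```python
def extract_labeled_field(text, label):
    """Return the first non-empty line after a line equal to `label`.

    Hypothetical helper: assumes the label (e.g. 'Name') sits alone on
    its own line and the value follows on a later line.
    """
    lines = [line.strip() for line in text.split('\n')]
    for i, line in enumerate(lines):
        if line.lower() == label.lower():
            for value in lines[i + 1:]:
                if value:
                    return value
    return None

def extract_name_with_fallback(text, ner_name=None):
    # Prefer the name found by spaCy NER; otherwise fall back to the label.
    return ner_name or extract_labeled_field(text, 'Name')

sample = "Resume\nName\nPriya Sharma\nEmail\npriya.sharma@example.com"
print(extract_name_with_fallback(sample))
```

In the notebook this could be called as `extract_name_with_fallback(cleaned, extract_name(cleaned))`, so the NER result still wins whenever it exists.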


**Custom Skill Matching from Resume**

In [66]:
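SKILL_SET = ['python', 'Excel', 'Machine learning', 'SQL', 'power bi', 'data Analysis']

def extract_skills(text, skills=SKILL_SET):
  # case-insensitive substring match, so 'excel' would also match 'excellent'
  found = [skill for skill in skills if skill.lower() in text.lower()]
  return list(set(found))  # set() removes duplicates but does not keep the list order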
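Substring matching produces false positives such as `SQL` inside `MySQL`, and `list(set(...))` returns skills in an arbitrary order. A sketch of whole-word matching with regex lookarounds, assuming a hypothetical canonical skill list whose spelling is used for display:

```python
import re

# Hypothetical canonical skill names (display spelling kept as written).
SKILLS = ['Python', 'Excel', 'Machine Learning', 'SQL', 'Power BI', 'Data Analysis']

def extract_skills_bounded(text, skills=SKILLS):
    """Return the skills that appear as whole words, in list order."""
    found = []
    for skill in skills:
        # (?<!\w) and (?!\w) stop 'SQL' matching inside 'MySQL'
        # and 'Excel' matching inside 'excellent'.
        pattern = r'(?<!\w)' + re.escape(skill) + r'(?!\w)'
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(skill)
    return found

print(extract_skills_bounded(
    "Python, SQL, Data Visualization, Pandas, NumPy, Machine Learning, Excel, Tableau"))
```

`re.escape` keeps skills containing regex metacharacters (for example `C++`) from breaking the pattern, and iterating over the list instead of a set makes the output order deterministic.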

**Extracting Education & Degree**

In [67]:
EDU_KEYWORDS = ['Bachelor', 'B.tech', 'M.tech', 'B.E', 'M.E', 'PhD']

def extract_education(text):
  lines = text.split('\n')
  education = []
  for line in lines:
    for word in EDU_KEYWORDS:
      if word.lower() in line.lower():
        education.append(line.strip())
        break  # add each matching line once, even if several keywords match
  return education
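Keyword matching picks up any line that mentions a keyword, wherever it appears. Since the sample resume puts each section heading (Name, Email, Phone, Skills, Education, Experience) on a line of its own, another option is to group lines under the most recent heading and read the `Education` section directly. A minimal sketch, assuming that heading layout; the heading list and `split_sections` are hypothetical:

```python
# Hypothetical heading list, taken from the sample resume's layout.
SECTION_HEADINGS = ['Name', 'Email', 'Phone', 'Skills', 'Education', 'Experience']

def split_sections(text, headings=SECTION_HEADINGS):
    """Map each heading to the non-empty lines that follow it.

    Assumes every heading sits alone on its own line; lines before the
    first heading (such as a 'Resume' title) are ignored.
    """
    lookup = {h.lower(): h for h in headings}
    sections = {}
    current = None
    for line in text.split('\n'):
        line = line.strip()
        if not line:
            continue
        if line.lower() in lookup:
            current = lookup[line.lower()]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections

sample = ("Resume\nName\nPriya Sharma\nEducation\n"
          "Bachelor of Technology, Anna University, 2021 - 2025\n"
          "Experience\nData Analyst Intern, ABC Analytics, June 2024 - Aug 2024\n"
          "- Assisted in building predictive models for sales forecasting.")
print(split_sections(sample)['Education'])
```

The same mapping also yields the full `Experience` block, including the job-title line that a keyword filter can skip.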

**Extracting Work Experience Snippets**

In [68]:
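def extract_experience(text):
  # crude keyword filter: 'work' also matches 'workflows', and 'Intern' does not match 'internship'
  experience_keywords = ['experience', 'work', 'internship', 'employment']
  exp_lines = []

  lines = text.split('\n')
  for line in lines:
    for keyword in experience_keywords:
      if keyword in line.lower():
        exp_lines.append(line.strip())
        break  # add each matching line once, even if several keywords match
  return exp_lines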
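A job entry such as `Data Analyst Intern, ABC Analytics, June 2024 - Aug 2024` carries its dates inline, and those can be turned into a duration. A sketch, assuming ranges are written as `<Month> <YYYY> - <Month> <YYYY>` with full or three-letter month names; the regex and function names are hypothetical, and open-ended ranges like "Present" are not handled:

```python
import re

MONTHS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
          'jul', 'aug', 'sep', 'oct', 'nov', 'dec']

# Assumed format: '<Month> <YYYY> - <Month> <YYYY>'
RANGE_RE = re.compile(r'([A-Za-z]+)\s+(\d{4})\s*-\s*([A-Za-z]+)\s+(\d{4})')

def month_index(name):
    """Map 'June' or 'Jun' to 1..12 via its first three letters; None if unknown."""
    key = name[:3].lower()
    return MONTHS.index(key) + 1 if key in MONTHS else None

def experience_ranges(lines):
    """Return (line, months) pairs for lines containing a month-year range.

    Both endpoints count, so 'June 2024 - Aug 2024' gives 3 months.
    """
    results = []
    for line in lines:
        for m in RANGE_RE.finditer(line):
            start_m, end_m = month_index(m.group(1)), month_index(m.group(3))
            if start_m is None or end_m is None:
                continue  # the word before the year was not a month name
            start = int(m.group(2)) * 12 + start_m
            end = int(m.group(4)) * 12 + end_m
            results.append((line, end - start + 1))
    return results

print(experience_ranges(["Data Analyst Intern, ABC Analytics, June 2024 - Aug 2024"]))
```

Converting each date to a month count (`year * 12 + month`) makes ranges that cross a year boundary, such as `Dec 2023 - Feb 2024`, come out right without special cases.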

**Final Output Dictionary**

In [69]:
parsed_resume={
    'name':extract_name(cleaned),
    'email':extract_email(cleaned),
    'phone':extract_phone(cleaned),
    'skills':extract_skills(cleaned),
    'education':extract_education(cleaned),
    'experience':extract_experience(cleaned)
}
print(parsed_resume)

{'name': 'Priya Sharma', 'email': 'priya.sharma@example.com', 'phone': '91 9876543210\n', 'skills': ['SQL', 'Machine learning', 'python', 'Excel'], 'education': ['Bachelor of Technology in Artificial Intelligence and Data Science, Anna University, 2021 - 2025'], 'experience': ['Experience', '- Collaborated with the team to automate reporting workflows.']}
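The `phone` value in this dictionary still ends in `'\n'`. A small post-processing step can strip such whitespace before the record is stored. A sketch using the standard library's `json` module; `tidy_record` is a hypothetical helper:

```python
import json

def tidy_record(record):
    """Return a copy of a parsed-resume dict with whitespace cleaned up.

    Hypothetical helper: string values are stripped; lists keep only their
    non-empty string items, each stripped; other values pass through.
    """
    tidy = {}
    for key, value in record.items():
        if isinstance(value, str):
            tidy[key] = value.strip()
        elif isinstance(value, list):
            tidy[key] = [item.strip() for item in value
                         if isinstance(item, str) and item.strip()]
        else:
            tidy[key] = value  # e.g. None when a field was not found
    return tidy

record = {'name': 'Priya Sharma', 'phone': '91 9876543210\n', 'skills': ['SQL', ' python ']}
print(json.dumps(tidy_record(record), indent=2, ensure_ascii=False))
```

`ensure_ascii=False` keeps non-ASCII characters in names readable in the JSON output instead of escaping them.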


**Automating Resume Folder Parsing**

In [None]:
import os
import pandas as pd

def process_folder(folder_path):
  results = []
  for filename in sorted(os.listdir(folder_path)):  # sorted for a stable row order
    if filename.lower().endswith('.pdf'):           # also accept '.PDF'
      path = os.path.join(folder_path, filename)
      text = clean_text(extract_text_from_pdf(path))  # read the PDF found in the folder
      parsed = {
          'filename': filename,
          'name': extract_name(text),
          'email': extract_email(text),
          'phone': extract_phone(text),
          'skills': extract_skills(text),
          'education': extract_education(text),
          'experience': extract_experience(text)
      }
      results.append(parsed)
  return pd.DataFrame(results)

df = process_folder('resumes')
df.to_csv('parsed_resumes.csv', index=False)
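List-valued fields such as `skills` are written to the CSV as Python list literals (for example `['SQL', 'python']`), which are awkward to read back. A sketch that joins list values with a separator before writing, using only the standard library's `csv.DictWriter`; `write_records_csv` and the `'; '` separator are assumptions:

```python
import csv
import io

def write_records_csv(records, out, list_sep='; '):
    """Write parsed-resume dicts to an open text stream as CSV.

    Hypothetical helper: list values are joined with `list_sep`, and the
    header is the union of all keys in first-seen order.
    """
    fieldnames = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames)  # missing keys become ''
    writer.writeheader()
    for rec in records:
        row = {k: list_sep.join(map(str, v)) if isinstance(v, list) else v
               for k, v in rec.items()}
        writer.writerow(row)

records = [{'filename': 'a.pdf', 'name': 'Priya Sharma', 'skills': ['SQL', 'Python']}]
buf = io.StringIO()
write_records_csv(records, buf)
print(buf.getvalue())
```

For a real file, open it with `open('parsed_resumes.csv', 'w', newline='', encoding='utf-8')`; the `csv` module documentation recommends `newline=''` so the writer controls line endings.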

