# Introduction:
Resume parsing is a valuable tool used in various real-life scenarios to simplify and streamline the hiring process. Imagine you're a busy hiring manager or a human resources professional responsible for reviewing countless resumes. It can be quite overwhelming and time-consuming to manually read through each document and extract the relevant information. This is where a resume parser comes in.

A resume parser is like a smart assistant that helps automate the initial screening of resumes. It uses advanced algorithms and natural language processing techniques to analyze the content of a resume and extract key details such as contact information, education history, work experience, skills, and more. This information is then organized into a structured format, making it easier for recruiters to evaluate candidates efficiently.
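
As a rough illustration of what a "structured format" can look like, the sketch below models a parsed resume as a Python dataclass and serializes it to JSON. The `ParsedResume` class, its field names, and the sample values are assumptions made for this example; they do not come from any particular library.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ParsedResume:
    """Hypothetical container for the fields a parser might extract."""
    name: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None
    skills: List[str] = field(default_factory=list)
    education: List[str] = field(default_factory=list)


# Sample values are made up for illustration.
record = ParsedResume(
    name="Jane Doe",
    email="jane.doe@example.com",
    skills=["Python", "SQL"],
)

# asdict() converts the record to a plain dict, which json can serialize.
record_json = json.dumps(asdict(record), indent=2)
print(record_json)
```

Fields the parser could not fill stay `None` or empty, so downstream code can tell "not found" apart from an empty string.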

With a resume parser, you can quickly scan a large pool of resumes and identify the most suitable candidates based on specific criteria. It lets you search for particular skills, experience levels, educational backgrounds, or any other qualifications the job requires. A parser also reduces manual transcription errors and applies the same extraction rules to every resume, although its accuracy depends on how well those rules fit each document.

Furthermore, resume parsing can be integrated with applicant tracking systems (ATS) or other recruitment software. This integration enables seamless data transfer and eliminates the need for manual data entry, saving a significant amount of time and reducing administrative burdens. The parsed data can be easily sorted, filtered, and compared, enabling recruiters to shortlist candidates efficiently and make informed decisions.

In summary, resume parsing technology acts as a valuable assistant for hiring professionals, making the resume screening process more efficient, accurate, and manageable. It simplifies the initial stages of recruitment, allowing recruiters to focus their time and energy on evaluating the most promising candidates and conducting more meaningful interactions during interviews and assessments.

In [15]:
!pip install pdfminer.six spacy
!python -m spacy download en_core_web_sm
from pdfminer.high_level import extract_text
 
def extract_text_from_pdf(pdf_path):
    return extract_text(pdf_path)
 
if __name__ == '__main__':
    print(extract_text_from_pdf(r"F:\interview\GauriDesaiResume18-11-2023.pdf"))

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
[+] Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
Gauri Desai 

LinkedIn | https://www.linkedin.com/in/gauri-desai-19a423206 

Email  | gaurid380@gmail.com 

Github 

| https://github.com/gaurid380/AIMLProjects 

Mobile | +91 9970161397 

E-Portfolio Link | https://eportfolio.mygreatlearning.com/gauri-desai  Location: Pune 

Portfolio Link | https://sites.google.com/view/gauridesaiportfolio/home 

Summary 
16 years of software development experience. My area of expertise is in analysing customer requirements, 
planning, designing, development, deployment, testing, performance tuning and delivery of various projects with 
leading Java/J2EE/Android/IOS/Cloud technologies. 
I completed PG program in AIML from the University of Texas and now eager to a

# Extracting Name from Resume:

The code snippet below extracts text from a PDF file using the pdfminer library, then applies a regular expression to find a potential name. The pattern matches the first two consecutive capitalized words (for example, "Gauri Desai"), so it assumes the name appears before any other capitalized word pair. If a name is found, it is printed; otherwise, a "Name not found" message is displayed. This code can be used as a starting point for extracting names from resumes.

In [16]:
from pdfminer.high_level import extract_text
import re

def extract_text_from_pdf(pdf_path):
    return extract_text(pdf_path)

def extract_name_from_resume(text):
    name = None

    # Use regex pattern to find a potential name
    pattern = r"(\b[A-Z][a-z]+\b)\s(\b[A-Z][a-z]+\b)"
    match = re.search(pattern, text)
    if match:
        name = match.group()

    return name

if __name__ == '__main__':
    text = extract_text_from_pdf(r"F:\interview\GauriDesaiResume18-11-2023.pdf")
    name = extract_name_from_resume(text)

    if name:
        print("Name:", name)
    else:
        print("Name not found")


Name: Gauri Desai
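
Because the pattern returns the first pair of capitalized words anywhere in the text, a resume that opens with a title such as "Curriculum Vitae" would yield that title instead of the name. One possible refinement, sketched below, looks only at the first few non-empty lines and skips known titles. The `HEADINGS` set, the `max_lines` parameter, the function name `extract_name_from_top`, and the sample text are all assumptions for illustration.

```python
import re

# Hypothetical document titles that may precede the candidate's name.
HEADINGS = {"curriculum vitae", "resume", "cv"}

# Two or more consecutive capitalized words.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)+\b")


def extract_name_from_top(text, max_lines=5):
    """Return the first name-like match in the first few non-empty lines."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    for line in lines[:max_lines]:
        if line.lower() in HEADINGS:
            continue  # skip document titles
        match = NAME_PATTERN.search(line)
        if match:
            return match.group()
    return None


sample = "Curriculum Vitae\n\nGauri Desai\nEmail | someone@example.com\n"
print(extract_name_from_top(sample))
```

This is still a heuristic: names with initials, hyphens, or non-ASCII letters would need a broader pattern.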


# Extract Contact Number:

The code snippet below defines a function that extracts text from a PDF file using pdfminer and a second function that finds a potential contact number with a regular expression. The pattern accepts an optional country code of one to three digits followed by a ten-digit number, with optional spaces, dots, hyphens, or parentheses between groups. The code then calls these functions on a specific resume file. If a contact number is found, it is printed; otherwise, a "Contact Number not found" message is displayed. This code can be used as a starting point for extracting contact numbers from resumes.

In [17]:
def extract_text_from_pdf(pdf_path):
    return extract_text(pdf_path)

def extract_contact_number_from_resume(text):
    contact_number = None

    # Use regex pattern to find a potential contact number
    pattern = r"\b(?:\+?\d{1,3}[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"
    match = re.search(pattern, text)
    if match:
        contact_number = match.group()

    return contact_number

if __name__ == '__main__':
    text = extract_text_from_pdf(r"F:\interview\GauriDesaiResume18-11-2023.pdf")
    contact_number = extract_contact_number_from_resume(text)

    if contact_number:
        print("Contact Number:", contact_number)
    else:
        print("Contact Number not found")

Contact Number: 91 9970161397
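
Note that the output drops the leading "+" of the country code. `\b` only matches between a word character and a non-word character, so the pattern cannot begin at a "+" that follows a space. A minimal sketch of one way around this, assuming a negative lookbehind `(?<!\w)` is an acceptable replacement for the leading `\b` (the names `PHONE_PATTERN` and `extract_phone_with_plus` are invented for this example):

```python
import re

# Same pattern as above, but the leading \b is replaced by a lookbehind
# that only requires "no word character directly before the match".
PHONE_PATTERN = re.compile(
    r"(?<!\w)(?:\+?\d{1,3}[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"
)


def extract_phone_with_plus(text):
    """Return the first phone-like match, keeping a leading '+' if present."""
    match = PHONE_PATTERN.search(text)
    return match.group() if match else None


print(extract_phone_with_plus("Mobile | +91 9970161397 "))
```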


# Extract Email ID:

The provided code snippet defines a function extract_text_from_pdf() to extract text from a PDF file using pdfminer. It also includes another function extract_email_from_resume() to extract a potential email address from the extracted text using a regular expression pattern.

The code then calls these functions to extract the email address from a specific resume file. If an email address is found, it is printed as "Email: [email address]"; otherwise, an "Email not found" message is displayed.

This code can be used as a starting point for extracting email addresses from resumes.

In [4]:

def extract_text_from_pdf(pdf_path):
    return extract_text(pdf_path)

def extract_email_from_resume(text):
    email = None

    # Use regex pattern to find a potential email address
    pattern = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
    match = re.search(pattern, text)
    if match:
        email = match.group()

    return email

if __name__ == '__main__':
    text = extract_text_from_pdf(r"F:\interview\GauriDesaiResume18-11-2023.pdf")
    email = extract_email_from_resume(text)

    if email:
        print("Email:", email)
    else:
        print("Email not found")


Email: gaurid380@gmail.com
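
`re.search` returns only the first address. If a resume lists more than one, `re.findall` with the same pattern collects them all. The sketch below also removes exact duplicates while keeping the original order. The function name `extract_all_emails` and the sample text are assumptions for illustration.

```python
import re

EMAIL_PATTERN = re.compile(
    r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
)


def extract_all_emails(text):
    """Return every distinct email-like match, in order of first appearance."""
    matches = EMAIL_PATTERN.findall(text)
    # dict.fromkeys() drops repeats while preserving insertion order.
    return list(dict.fromkeys(matches))


sample = "Work: a.b@example.com, personal: ab@example.org, again a.b@example.com"
print(extract_all_emails(sample))
```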


# Extracting Skills:

The provided code snippet includes a function extract_text_from_pdf() that extracts text from a PDF file using pdfminer. Additionally, it defines a function extract_skills_from_resume() that takes the extracted text and a list of predefined skills as input.

The extract_skills_from_resume() function searches for each skill in the provided list within the resume text using regular expressions. If a skill is found, it is added to the skills list. Finally, the function returns the list of extracted skills.

In the code's main section, the extract_text_from_pdf() function is called to extract the text from a specific resume file. A predefined list of skills is defined, and the extract_skills_from_resume() function is invoked with the extracted text and skills list as arguments. If any skills are found, they are printed as "Skills: [extracted skills]"; otherwise, a "No skills found" message is displayed.

This code can be utilized to extract skills from resumes by providing a list of predefined skills and the resume text. It serves as a basic framework for skill extraction and can be extended or customized to meet specific requirements.

In [18]:
def extract_text_from_pdf(pdf_path):
    return extract_text(pdf_path)

def extract_skills_from_resume(text, skills_list):
    skills = []

    # Search for skills in the resume text
    for skill in skills_list:
        pattern = r"\b{}\b".format(re.escape(skill))
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            skills.append(skill)

    return skills

if __name__ == '__main__':
    text = extract_text_from_pdf(r"F:\interview\GauriDesaiResume18-11-2023.pdf")

    # List of predefined skills
    skills_list = ['Python', 'Data Analysis', 'Machine Learning', 'Communication', 'Project Management', 'Deep Learning', 'SQL', 'Tableau','AI']

    extracted_skills = extract_skills_from_resume(text, skills_list)

    if extracted_skills:
        print("Skills:", extracted_skills)
    else:
        print("No skills found")


Skills: ['Python', 'Machine Learning', 'Communication', 'Project Management', 'Deep Learning', 'SQL', 'Tableau', 'AI']
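
One caveat with `\b`: it only matches next to a word character, so a skill name that ends in punctuation, such as "C++", can never satisfy `\bC\+\+\b` when it is followed by a comma or a space. A possible variant, sketched here with lookarounds in place of `\b`, handles such names. The function name `extract_skills_lookaround` and the sample strings are assumptions for this example.

```python
import re


def extract_skills_lookaround(text, skills_list):
    """Find skills using lookarounds so names like 'C++' can match."""
    found = []
    for skill in skills_list:
        # (?<!\w) and (?!\w): no word character directly before or after.
        pattern = r"(?<!\w){}(?!\w)".format(re.escape(skill))
        if re.search(pattern, text, re.IGNORECASE):
            found.append(skill)
    return found


sample = "Skills: C++, Python, SQL and some JavaScript."
print(extract_skills_lookaround(sample, ["C++", "Python", "Java", "SQL"]))
```

"Java" is not reported here because it only occurs inside "JavaScript", which is the same whole-word behavior the `\b` version gives for ordinary names.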


# Extracting Education:

The provided code snippet consists of a function extract_text_from_pdf() that extracts text from a PDF file using the pdfminer library. Additionally, it includes a function extract_education_from_resume() that takes the extracted text as input.

The extract_education_from_resume() function uses a regular expression to search the resume text for degree names followed by a field of study. The first pattern looks for Bachelor, Bachelor's, B.S., B.A., Master, M.S., M.A., or Ph.D.; the second broadens this to BSc, Master's, and any abbreviation beginning with "B." or "M.", such as B.Tech or M.Tech. For the sample resume, neither pattern finds a match.

Within the code's main section, the extract_text_from_pdf() function is invoked to extract the text from a specific resume file. Then, the extract_education_from_resume() function is called with the extracted text as an argument. If any education information is found, it is appended to the education list. Finally, the list of extracted education details is printed as "Education: [extracted_education]" if education information is found. Otherwise, a "No education information found" message is displayed.

This code provides a basic framework for extracting education information from resumes using regular expressions. It can be further customized or expanded to handle additional patterns or extract more specific details related to education.

In [30]:
def extract_text_from_pdf(pdf_path):
    return extract_text(pdf_path)

def extract_education_from_resume(text):
    education = []

    # Use regex pattern to find education information
    pattern = r"(?i)(?:(?:Bachelor(?:'s)?|B\.S\.|B\.A\.|Master|M\.S\.|M\.A\.|Ph\.D\.)\s(?:[A-Za-z]+\s)*[A-Za-z]+)"
    matches = re.findall(pattern, text)
    for match in matches:
        education.append(match.strip())

    return education

if __name__ == '__main__':
    text = extract_text_from_pdf(r"F:\interview\GauriDesaiResume18-11-2023.pdf")

    extracted_education = extract_education_from_resume(text)
    if extracted_education:
        print("Education:", extracted_education)
    else:
        print("No education information found")


No education information found


In [31]:
def extract_text_from_pdf(pdf_path):
    return extract_text(pdf_path)

def extract_education_from_resume(text):
    education = []

    # Use regex pattern to find education information
    pattern = r"(?i)(?:Bsc|\bB\.\w+|\bM\.\w+|\bPh\.D\.\w+|\bBachelor(?:'s)?|\bMaster(?:'s)?|\bPh\.D)\s(?:\w+\s)*\w+"
    
    matches = re.findall(pattern, text)
    for match in matches:
        education.append(match.strip())

    return education

if __name__ == '__main__':
    text = extract_text_from_pdf(r"F:\interview\GauriDesaiResume18-11-2023.pdf")

    extracted_education = extract_education_from_resume(text)
    if extracted_education:
        print("Education:", extracted_education)
    else:
        print("No education information found")


No education information found
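
Both patterns come back empty for this resume, which suggests its qualifications are not written in a form they anticipate (the summary mentions a "PG program in AIML", which neither pattern covers). A different approach, sketched below under the assumption that the resume contains a line reading "Education" followed by the relevant entries, is to collect the lines under that heading until the next section heading. The names in `SECTION_HEADINGS`, the function name `extract_education_section`, and the sample text are assumptions, not taken from the actual resume.

```python
# Hypothetical section titles; adjust to the resumes being parsed.
SECTION_HEADINGS = {"summary", "experience", "education", "skills", "projects"}


def extract_education_section(text):
    """Return the non-empty lines between 'Education' and the next heading."""
    collected = []
    in_section = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        lowered = line.lower().rstrip(":")
        if lowered == "education":
            in_section = True
            continue
        if in_section and lowered in SECTION_HEADINGS:
            break  # reached the next section
        if in_section and line:
            collected.append(line)
    return collected


# Made-up sample text for illustration.
sample = """Summary
Software developer.
Education
PG Program in AIML, University of Texas
B.E. Computer Science
Skills
Python, SQL
"""
print(extract_education_section(sample))
```

This trades pattern coverage for a layout assumption: it works only when section headings sit on their own lines.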


In [16]:

import spacy

nlp = spacy.load('en_core_web_sm')

def extract_data_science_education(text):
    doc = nlp(text)

    education = []

    for ent in doc.ents:
        if ent.label_ == 'ORG' and 'AIML' in ent.text:
            education.append(ent.text)

    return education

if __name__ == '__main__':
    text = extract_text_from_pdf(r"F:\interview\GauriDesaiResume18-11-2023.pdf")

    extracted_education = extract_data_science_education(text)
    if extracted_education:
        print("Data Science Education:", extracted_education)
    else:
        print("No data science education found")


Data Science Education: ['AIML', 'AIML PG Program']


In [21]:

def extract_university_name(text):
    lines = text.split('\n')
    college_pattern = r"(?i).*University.*"
    for line in lines:
        if re.match(college_pattern, line):
            return line.strip()
    return None

# Example usage:
text = extract_text_from_pdf(r"F:\interview\GauriDesaiResume13-09-2023.pdf")


university_name = extract_university_name(text)
if university_name:
    print("University:", university_name)
else:
    print("University name not found.")


University: I completed PG program in AIML from the University of Texas and now eager to apply AI ML technology to real
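
The function above returns the whole line that contains "University". To keep only the institution name, one option is a narrower pattern. The sketch below assumes names of the form "University of X" or "X University"; `UNIVERSITY_PATTERN` and `extract_university_phrase` are names invented for this example.

```python
import re

# Either "University of <Capitalized words>" or "<Capitalized words> University".
UNIVERSITY_PATTERN = re.compile(
    r"\b(?:University of(?: [A-Z][A-Za-z]+)+|(?:[A-Z][A-Za-z]+ )+University)\b"
)


def extract_university_phrase(text):
    """Return the first institution-like phrase, or None if there is none."""
    match = UNIVERSITY_PATTERN.search(text)
    return match.group() if match else None


line = "I completed PG program in AIML from the University of Texas and now eager"
print(extract_university_phrase(line))
```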


# Conclusion:

In conclusion, the provided code snippets demonstrate a basic implementation of a resume parser. Each code snippet focuses on extracting specific information from a resume, such as name, contact number, email, skills, and education.

The parser combines three techniques: text extraction from PDF files with pdfminer, regular expressions for names, contact numbers, email addresses, skills, and degrees, and spaCy's named-entity recognition for the education example. Together they show how the extraction of important details from resumes can be automated.
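
To see how the pieces might fit together, the sketch below runs several of the patterns from this notebook over one block of text and returns a dictionary. The function name `parse_resume_text`, the dictionary keys, and the sample text are assumptions; PDF text extraction is left out so the example runs without a file.

```python
import re

NAME_RE = re.compile(r"\b[A-Z][a-z]+\s[A-Z][a-z]+\b")
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def parse_resume_text(text, skills_list):
    """Apply the name, email, and skill patterns to already-extracted text."""
    name = NAME_RE.search(text)
    email = EMAIL_RE.search(text)
    skills = [
        skill
        for skill in skills_list
        if re.search(r"\b{}\b".format(re.escape(skill)), text, re.IGNORECASE)
    ]
    return {
        "name": name.group() if name else None,
        "email": email.group() if email else None,
        "skills": skills,
    }


# Made-up sample text for illustration.
sample = "Jane Doe\nEmail | jane.doe@example.com\nSkills: Python, SQL\n"
parsed = parse_resume_text(sample, ["Python", "SQL", "Tableau"])
print(parsed)
```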

These snippets are a starting point and can be enhanced and customized for specific requirements. The education patterns, for example, found no match in the sample resume, so additional patterns or a different approach would be needed to improve the accuracy of information extraction.

Resume parsing plays a vital role in automating the initial screening process for job applications. By extracting key details from resumes, it saves time and effort for recruiters and allows for efficient filtering of candidates.

As technology advances, resume parsing algorithms can be further refined to handle more complex resume formats, languages, and diverse information extraction requirements. This will help in building more sophisticated and accurate resume parsing systems.

Overall, the provided code snippets serve as a foundation for developing a resume parser and demonstrate the potential of automating the extraction of essential information from resumes, streamlining the recruitment process, and improving efficiency in candidate evaluation.