# Introduction:
Resume parsing is a valuable tool that streamlines the hiring process by automating the initial resume screening. It uses advanced algorithms and natural language processing to extract key details like contact information, education, work experience, and skills from resumes. This information is organized into a structured format, making it easier for recruiters to evaluate candidates quickly and accurately.

A resume parser helps recruiters filter resumes based on specific criteria, reducing human error and making data extraction more consistent. It can be integrated with applicant tracking systems (ATS) to save time, reduce manual data entry, and improve efficiency in candidate evaluation. Overall, resume parsing simplifies recruitment, allowing recruiters to focus on the most promising candidates.

In [97]:
from pdfminer.high_level import extract_text
 
def extract_text_from_pdf(pdf_path):
    return extract_text(pdf_path)
 
if __name__ == '__main__':
    print(extract_text_from_pdf("Sample_Resume1.pdf"))

Gelina Jorg 

9812345678 

gelinajxx@yahoo.com 

Seattle 

SUMMARY 

Experienced Machine Learning Engineer skilled in developing predictive models and deploying AI solutions, 

with a focus on driving business value through ML. Proficient in Python, SQL, and simplifying technical 
insights for non-technical audiences. 

Technical SKILLS 

Python, SQL, Regression, Clustering, Tableau, LLM, OpenAI, Modeling, CNN, Excel, GitHub 

Key SKILLS 

Algorithms, Data Analysis & Manipulation, Data Visualization, Critical Thinking, Leadership, Adaptability, Curiosity 

EDUCATION 

IIT Mumbai |  Mumbai, IN 

Post-Graduation in Data Science 

•  Course Modules: 

Jul '20  -  Sep '22 

○  Data Analysis in Excel | Analytics Problem Solving | Data Analysis using SQL | Introduction to Python 
○  Programming in Python | Python for Data Science | Inferential Statistics | Hypothesis Testing 
○  Introduction to Machine Learning and Linear Regression | Logistic Regression | Tree Models 

KEY PROJECTS 

Lead S
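
The raw output above carries trailing spaces and a blank line after almost every line. The following is a minimal sketch of a cleanup step that could run before any of the extractors. It uses only the standard library, and `clean_extracted_text` is a hypothetical helper name, not part of pdfminer.

```python
def clean_extracted_text(text):
    """Strip surrounding whitespace from each line and drop empty lines.

    Assumption: blank lines and trailing spaces carry no meaning for the
    regex-based extractors, so removing them is safe.
    """
    stripped = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in stripped if line)


if __name__ == "__main__":
    raw = "Gelina Jorg \n\n9812345678 \n\ngelinajxx@yahoo.com \n"
    print(clean_extracted_text(raw))
```

Working on cleaned text makes line-based checks simpler, since each remaining line holds actual content.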

# Extracting Name from Resume:

The code snippet below extracts text from a PDF file using the pdfminer library, then applies a regular expression to find a potential name. The pattern matches the first pair of adjacent capitalized words in the text, so it works when the name is the first such pair in the resume. If a name is found, it is printed; otherwise, a "Name not found" message is displayed. This code can be used as a starting point for extracting names from resumes.

In [99]:
from pdfminer.high_level import extract_text
import re

def extract_text_from_pdf(pdf_path):
    return extract_text(pdf_path)

def extract_name_from_resume(text):
    name = None

    # Use regex pattern to find a potential name
    pattern = r"(\b[A-Z][a-z]+\b)\s(\b[A-Z][a-z]+\b)"
    match = re.search(pattern, text)
    if match:
        name = match.group()

    return name

if __name__ == '__main__':
    text = extract_text_from_pdf("Sample_Resume1.pdf")
    name = extract_name_from_resume(text)

    if name:
        print("Name:", name)
    else:
        print("Name not found")


Name: Gelina Jorg
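
The name pattern above returns the first two adjacent capitalized words anywhere in the text, so a heading such as "Curriculum Vitae" placed above the name would be returned instead. The following is a sketch of one possible refinement, assuming the name sits alone on one of the first few non-empty lines. The function name, the `max_lines` parameter, and the `HEADER_WORDS` list are illustrative assumptions, not part of the original code.

```python
import re

# Assumption: headings that sometimes appear above the name; extend as needed.
HEADER_WORDS = {"curriculum vitae", "resume", "cv"}

# A line made up of two to four capitalized words separated by single spaces.
NAME_LINE = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+){1,3}")


def extract_name_from_first_lines(text, max_lines=5):
    """Return the first early line that looks like a name, or None."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    for line in lines[:max_lines]:
        if line.lower() in HEADER_WORDS:
            continue  # skip known document headings
        if NAME_LINE.fullmatch(line):
            return line
    return None


if __name__ == "__main__":
    sample = "Curriculum Vitae\n\nGelina Jorg \n\n9812345678 \n"
    print("Name:", extract_name_from_first_lines(sample))
```

This is still a heuristic: names with initials, hyphens, or non-ASCII letters would need a broader pattern.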


# Extracting Contact Number:

The provided code snippet defines a function to extract text from a PDF file using pdfminer. It also includes another function to extract a potential contact number from the extracted text using a regular expression pattern. The code then calls these functions to extract the contact number from a specific resume file. If a contact number is found, it is printed; otherwise, a "Contact Number not found" message is displayed. This code can be used as a starting point for extracting contact numbers from resumes.

In [105]:
def extract_text_from_pdf(pdf_path):
    return extract_text(pdf_path)

def extract_contact_number_from_resume(text):
    contact_number = None

    # Use regex pattern to find a potential contact number
    #pattern = r"\b(?:\+?\d{1,3}[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"
    pattern = r"\b(?:\+?\d{1,3}\s?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"
    match = re.search(pattern, text)
    if match:
        contact_number = match.group()

    return contact_number

if __name__ == '__main__':
    text = extract_text_from_pdf("Sample_Resume1.pdf")
    contact_number = extract_contact_number_from_resume(text)

    if contact_number:
        print("Contact Number:", contact_number)
    else:
        print("Contact Number not found")

Contact Number: 9812345678
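
Resumes write phone numbers in many layouts, so it can help to collect every match and normalize it to plain digits. The sketch below reuses the pattern from the cell above. `normalize_phone` and `extract_normalized_numbers` are hypothetical helper names introduced for illustration.

```python
import re

# Same pattern as in the cell above.
PHONE_PATTERN = re.compile(
    r"\b(?:\+?\d{1,3}\s?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"
)


def normalize_phone(raw):
    """Keep digits only, preserving a leading '+' if the input starts with one."""
    digits = re.sub(r"\D", "", raw)
    return "+" + digits if raw.strip().startswith("+") else digits


def extract_normalized_numbers(text):
    """Return every phone-like match in the text, reduced to digits."""
    # Note: the pattern begins with \b, so a leading '+' is not part of the match.
    return [normalize_phone(m.group()) for m in PHONE_PATTERN.finditer(text)]


if __name__ == "__main__":
    print(extract_normalized_numbers("Call 206.555.0147 or 9812345678"))
```

Normalizing makes it easier to compare or deduplicate numbers that were typed with different separators.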


# Extracting Email ID:

The provided code snippet defines a function extract_text_from_pdf() to extract text from a PDF file using pdfminer. It also includes another function extract_email_from_resume() to extract a potential email address from the extracted text using a regular expression pattern.

The code then calls these functions to extract the email address from a specific resume file. If an email address is found, it is printed as "Email: [email address]"; otherwise, an "Email not found" message is displayed.

This code can be used as a starting point for extracting email addresses from resumes.

In [107]:

def extract_text_from_pdf(pdf_path):
    return extract_text(pdf_path)

def extract_email_from_resume(text):
    email = None

    # Use regex pattern to find a potential email address
    pattern = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
    match = re.search(pattern, text)
    if match:
        email = match.group()

    return email

if __name__ == '__main__':
    text = extract_text_from_pdf("Sample_Resume1.pdf")
    email = extract_email_from_resume(text)

    if email:
        print("Email:", email)
    else:
        print("Email not found")


Email: gelinajxx@yahoo.com
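
`re.search` stops at the first match, but a resume may list more than one address. The following sketch, using the same pattern, collects all addresses and removes case-insensitive duplicates. `extract_all_emails` is a hypothetical name used only for this illustration.

```python
import re

# Same pattern as in the cell above.
EMAIL_PATTERN = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def extract_all_emails(text):
    """Return all distinct email addresses in order of first appearance."""
    seen = set()
    emails = []
    for match in EMAIL_PATTERN.finditer(text):
        email = match.group().lower()  # compare addresses case-insensitively
        if email not in seen:
            seen.add(email)
            emails.append(email)
    return emails


if __name__ == "__main__":
    print(extract_all_emails("gelinajxx@yahoo.com, GelinaJxx@Yahoo.com"))
```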


# Extracting Skills:

The provided code snippet includes a function extract_text_from_pdf() that extracts text from a PDF file using pdfminer. Additionally, it defines a function extract_skills_from_resume() that takes the extracted text and a list of predefined skills as input.

The extract_skills_from_resume() function searches for each skill in the provided list within the resume text using regular expressions. If a skill is found, it is added to the skills list. Finally, the function returns the list of extracted skills.

In the code's main section, the extract_text_from_pdf() function is called to extract the text from a specific resume file. A predefined list of skills is defined, and the extract_skills_from_resume() function is invoked with the extracted text and skills list as arguments. If any skills are found, they are printed as "Skills: [extracted skills]"; otherwise, a "No skills found" message is displayed.

This code can be utilized to extract skills from resumes by providing a list of predefined skills and the resume text. It serves as a basic framework for skill extraction and can be extended or customized to meet specific requirements.

In [109]:
def extract_text_from_pdf(pdf_path):
    return extract_text(pdf_path)

def extract_skills_from_resume(text, skills_list):
    skills = []

    # Search for skills in the resume text
    for skill in skills_list:
        pattern = r"\b{}\b".format(re.escape(skill))
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            skills.append(skill)

    return skills

if __name__ == '__main__':
    text = extract_text_from_pdf("Sample_Resume1.pdf")

    # List of predefined skills
    skills_list = ['Python', 'Data Analysis', 'Machine Learning', 'Communication', 'Project Management', 'Deep Learning', 'SQL', 'Tableau']

    extracted_skills = extract_skills_from_resume(text, skills_list)

    if extracted_skills:
        print("Skills:", extracted_skills)
    else:
        print("No skills found")


Skills: ['Python', 'Data Analysis', 'Machine Learning', 'SQL', 'Tableau']
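
One limitation of the `\b` word boundary is that it does not work for skills ending in a symbol, such as "C++": there is no word boundary between "+" and a following comma or space. The following is a sketch of a variant that uses lookarounds instead. `find_skills` is a hypothetical name, and the skill list in the example is illustrative.

```python
import re


def find_skills(text, skills_list):
    """Return the skills from skills_list that appear as whole terms in text."""
    found = []
    for skill in skills_list:
        # (?<!\w) and (?!\w) require that no word character touches the skill,
        # which also works when the skill itself ends in a symbol.
        pattern = r"(?<!\w)" + re.escape(skill) + r"(?!\w)"
        if re.search(pattern, text, re.IGNORECASE):
            found.append(skill)
    return found


if __name__ == "__main__":
    print(find_skills("Skilled in C++, Python and SQL.", ["C++", "Python", "Java"]))
```

The lookarounds still prevent partial matches, so "Java" is not reported for a resume that only mentions "JavaScript".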


# Extracting Education:

The provided code snippet consists of a function extract_text_from_pdf() that extracts text from a PDF file using the pdfminer library. Additionally, it includes a function extract_education_section() that takes the extracted text as input.

The extract_education_section() function uses a case-insensitive regular expression to search for education information in the resume text. The pattern matches the keyword Post-Graduation on its own, or a degree name such as Bachelor, Master, B.S., B.A., M.S., M.A., B.Sc., M.Sc., or Ph.D. followed by the corresponding field of study.

Within the code's main section, the extract_text_from_pdf() function is invoked to extract the text from a specific resume file. Then, the extract_education_section() function is called with the extracted text as an argument, and re.findall() returns every match as a list. If the list is not empty, each entry is printed under "Education Details Found:"; otherwise, an "Education details not found in the resume." message is displayed. For the sample resume only "Post-Graduation" is printed, because the field-of-study part of the pattern belongs to the degree-name alternative and not to the Post-Graduation one.

This code provides a basic framework for extracting education information from resumes using regular expressions. It can be further customized or expanded to handle additional patterns or extract more specific details related to education.

In [111]:
import re
from pdfminer.high_level import extract_text

def extract_text_from_pdf(pdf_path):
    """Extracts text from a PDF file."""
    return extract_text(pdf_path)

def extract_education_section(text):
    """
    Extracts education-related information using a regex pattern.
    
    The regex (case-insensitive) matches education formats like:
    - Post-Graduation (keyword only; a following field of study is not captured)
    - B.Sc. in Information Technology
    - Bachelor of Arts
    """
    # Updated regex to match various education formats
    pattern = r"(?i)(?:Post[-\s]?Graduation|(?:Bachelor|B\.S\.|B\.A\.|Master|M\.S\.|M\.A\.|Ph\.D\.|B\.Sc\.|M\.Sc\.)\s(?:in\s)?(?:[A-Za-z]+\s)*[A-Za-z]+)"
    
    # Find all matches in the text
    matches = re.findall(pattern, text)
    
    # Return the matched education details
    return matches

if __name__ == '__main__':
    # Extract text from the PDF
    text = extract_text_from_pdf("Sample_Resume1.pdf")

    # Extract education information
    education_details = extract_education_section(text)

    if education_details:
        print("Education Details Found:")
        for detail in education_details:
            print("-", detail)
    else:
        print("Education details not found in the resume.")

Education Details Found:
- Post-Graduation
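
The output stops at "Post-Graduation" because the field of study is attached only to the second alternative in the pattern. The following sketch shows one way to group the degree names so that the field of study follows every alternative. `DEGREE`, `EDUCATION_PATTERN`, and `extract_degrees` are hypothetical names, and the match is limited to a single line as an assumption about resume layout.

```python
import re

# Degree keywords grouped together so the field of study applies to all of them.
DEGREE = r"(?:Post[-\s]?Graduation|Bachelor|Master|Ph\.D\.|[BM]\.(?:Sc|S|A)\.)"

# Degree keyword followed by one or more words on the same line.
EDUCATION_PATTERN = re.compile(
    r"\b" + DEGREE + r"(?:[ \t]+[A-Za-z]+)+",
    re.IGNORECASE,
)


def extract_degrees(text):
    """Return each degree together with the words that follow it on the line."""
    return EDUCATION_PATTERN.findall(text)


if __name__ == "__main__":
    sample = "Post-Graduation in Data Science \n\n\u2022  Course Modules: \n"
    for degree in extract_degrees(sample):
        print("-", degree)
```

Because the trailing words are matched greedily, a line such as "Bachelor of Arts from ..." would also capture the words after the field of study, so further trimming may be needed.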


In [113]:
import re
from pdfminer.high_level import extract_text

def extract_text_from_pdf(pdf_path):
    """Extracts text from a PDF file."""
    return extract_text(pdf_path)

def extract_college_name(text):
    """Returns the first line that mentions 'college', 'university', or 'institute'."""
    lines = text.split('\n')
    
    # Case-insensitive pattern for lines containing one of the keywords as a whole word
    college_pattern = r"(?i).*\b(college|university|institute)\b.*"
    
    for line in lines:
        if re.search(college_pattern, line):
            return line.strip()
    return None

# Example usage
if __name__ == '__main__':
    # Extract text from the PDF
    text = extract_text_from_pdf("Sample_Resume1.pdf")

    # Extract college name
    college_name = extract_college_name(text)

    if college_name:
        print("College:", college_name)
    else:
        print("College name not found.")

College name not found.
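
The lookup fails for the sample resume because the line "IIT Mumbai |  Mumbai, IN" contains none of the three keywords. The following sketch shows one possible extension that also checks a short list of abbreviations. Both lists, the function name, and the choice to split on "|" are assumptions made for illustration.

```python
import re

# Assumption: illustrative keyword and abbreviation lists; extend as needed.
INSTITUTION_KEYWORDS = ["college", "university", "institute", "school", "academy"]
INSTITUTION_ABBREVIATIONS = ["IIT", "NIT", "IIM", "MIT"]

KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(INSTITUTION_KEYWORDS) + r")\b", re.IGNORECASE
)
# Abbreviations are matched case-sensitively to avoid hits inside ordinary words.
ABBREVIATION_RE = re.compile(r"\b(?:" + "|".join(INSTITUTION_ABBREVIATIONS) + r")\b")


def extract_institution(text):
    """Return the institution part of the first matching line, or None."""
    for line in text.splitlines():
        if KEYWORD_RE.search(line) or ABBREVIATION_RE.search(line):
            # Assumption: a "|" separates the institution from its location.
            return line.split("|")[0].strip()
    return None


if __name__ == "__main__":
    print("College:", extract_institution("EDUCATION \n\nIIT Mumbai |  Mumbai, IN \n"))
```

A fixed list will never cover every institution, so a lookup table or a named-entity model may be a better fit for production use.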


# Conclusion:

To sum up, the code snippets provided illustrate a basic approach to building a resume parser. Each snippet focuses on extracting essential information such as names, contact details, email addresses, skills, and education from resumes.

The parser leverages methods like regular expressions and text extraction from PDF documents, showcasing how automation can simplify the extraction of critical details from resumes.
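
To illustrate how the individual extractors might be combined into the structured format mentioned in the introduction, here is a minimal sketch that works on already extracted text. `ResumeRecord` and `parse_resume_text` are hypothetical names; the patterns are the ones used in the cells above.

```python
import re
from dataclasses import dataclass, field, asdict
from typing import List, Optional

NAME_RE = re.compile(r"(\b[A-Z][a-z]+\b)\s(\b[A-Z][a-z]+\b)")
PHONE_RE = re.compile(r"\b(?:\+?\d{1,3}\s?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b")
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


@dataclass
class ResumeRecord:
    """Hypothetical container for the fields extracted in this notebook."""
    name: Optional[str] = None
    contact_number: Optional[str] = None
    email: Optional[str] = None
    skills: List[str] = field(default_factory=list)


def first_match(pattern, text):
    """Return the first match of a compiled pattern, or None."""
    match = pattern.search(text)
    return match.group() if match else None


def parse_resume_text(text, skills_list):
    """Run each extractor once and collect the results in one record."""
    skills = [
        skill for skill in skills_list
        if re.search(r"\b{}\b".format(re.escape(skill)), text, re.IGNORECASE)
    ]
    return ResumeRecord(
        name=first_match(NAME_RE, text),
        contact_number=first_match(PHONE_RE, text),
        email=first_match(EMAIL_RE, text),
        skills=skills,
    )


if __name__ == "__main__":
    sample = "Gelina Jorg \n\n9812345678 \n\ngelinajxx@yahoo.com \n\nPython, SQL"
    print(asdict(parse_resume_text(sample, ["Python", "SQL", "Tableau"])))
```

Calling `asdict()` turns the record into a plain dictionary, which is straightforward to serialize as JSON or pass on to another system.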

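However, these snippets are only a foundational framework that can be expanded and tailored to specific needs. The sample outputs show the limits: the education pattern returned only "Post-Graduation" without the field of study, and the college lookup found nothing for "IIT Mumbai". More sophisticated patterns or algorithms could improve the accuracy of data extraction.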

Resume parsing helps automate the initial stages of recruitment, saving recruiters time during candidate screening. As parsing techniques improve, parsers should be able to process more complex formats, handle multiple languages, and extract a wider range of information. The snippets in this notebook offer a starting point for building such a parser.