# Introduction:
Building a resume parser using SpaCy can greatly streamline the process of extracting relevant information from resumes, enabling efficient candidate evaluation. In this guide, we will explore step-by-step instructions to develop a resume parser using the powerful natural language processing library, SpaCy.

1. Understanding the Problem:
Define the objective of the resume parser and the specific information to be extracted: name, contact details, skills, and education, with work experience left as a possible extension.

2. Preparing the Environment:
Install SpaCy and pdfminer.six, the PDF text extraction library used in this guide.
Download the English language model `en_core_web_sm` and load it with `spacy.load`.
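
A quick way to confirm the environment is ready is to check that the modules can be imported. The sketch below assumes the two libraries imported later in this guide (`spacy` and `pdfminer`); `missing_packages` is a hypothetical helper name, not part of either library.

```python
import importlib.util

def missing_packages(required):
    """Return the module names in `required` that are not importable."""
    return [name for name in required if importlib.util.find_spec(name) is None]

# Top-level module names, not pip package names (pdfminer.six installs `pdfminer`).
REQUIRED_MODULES = ["spacy", "pdfminer"]

missing = missing_packages(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
    print("Install with: pip install spacy pdfminer.six")
    print("Then fetch the model: python -m spacy download en_core_web_sm")
else:
    print("All required modules are importable.")
```

The check only confirms that the modules exist; it does not verify that the language model itself has been downloaded.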

3. Extracting Text from Resumes:
Utilize PDF parsing libraries like pdfminer to extract text content from resume files.
Implement a function to extract text from PDF files using the chosen library.
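
Text extracted from PDFs often carries layout residue such as form feed characters at page breaks, ligatures, and irregular spacing. As a minimal sketch, the text could be normalized before any extraction step. The helper name `clean_resume_text` is hypothetical and this cleanup is an optional addition, not part of the parser shown later.

```python
import re
import unicodedata

def clean_resume_text(raw_text):
    """Normalize text extracted from a PDF before running the extractors."""
    # NFKC folds compatibility characters (e.g. the U+FB01 "fi" ligature) into plain letters.
    text = unicodedata.normalize("NFKC", raw_text)
    # Form feeds typically mark page breaks; treat them as newlines.
    text = text.replace("\x0c", "\n")
    # Collapse runs of spaces and tabs, then trim each line.
    text = re.sub(r"[ \t]+", " ", text)
    lines = [line.strip() for line in text.splitlines()]
    # Squeeze three or more consecutive newlines down to a single blank line.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()

# Made-up sample containing a ligature, a form feed, and uneven spacing.
print(clean_resume_text("Pro\ufb01le\x0c  Data \t Science\n\n\n\nSQL"))
```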

4. Extracting Name from Resumes:
Use SpaCy's part-of-speech tags to locate names: the first run of consecutive proper nouns (PROPN) is taken as the candidate's name.
Define the patterns with SpaCy's Matcher module to cover names of two, three, or four tokens.

5. Extracting Contact Details:
Employ regular expressions to extract contact numbers from resume text.
Define patterns to capture various phone number formats.
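
To see how one pattern behaves across formats, it helps to run it over a few sample strings. The sketch below is an illustration with made-up numbers (apart from the one in this guide's sample output); `find_phone_numbers` is a hypothetical helper. It uses lookarounds rather than `\b` at the edges, because `\b` cannot sit in front of a leading `+` or `(` that follows a space.

```python
import re

# (?<!\w) and (?!\w) stand in for \b so that a leading "+" or "(" is kept.
PHONE_PATTERN = re.compile(
    r"(?<!\w)(?:\+?\d{1,3}[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}(?!\w)"
)

def find_phone_numbers(text):
    """Return every substring of `text` that looks like a phone number."""
    return PHONE_PATTERN.findall(text)

samples = [
    "Mobile: 7798248452",
    "Phone: +1 (555) 123-4567",
    "Tel: 555.123.4567",
    "Call 555-123-4567 or 555 987 6543",
]
for sample in samples:
    print(sample, "->", find_phone_numbers(sample))
```

A single pattern like this will not cover every national format, so it is worth extending the sample list with the formats you actually encounter.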

6. Extracting Email Addresses:
Utilize regular expressions to identify and extract email addresses from resume text.
Define email patterns to ensure accurate extraction.
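
A resume may list more than one address, or repeat the same one. As a small sketch, the pattern can be applied with `findall` and the results deduplicated. The helper name `find_emails` and the sample addresses are hypothetical, and lowercasing assumes addresses are treated case-insensitively, which is common in practice but not guaranteed for the part before the `@`.

```python
import re

EMAIL_PATTERN = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

def find_emails(text):
    """Return the distinct addresses in `text`, lowercased, in order of appearance."""
    seen = []
    for address in EMAIL_PATTERN.findall(text):
        address = address.lower()
        if address not in seen:
            seen.append(address)
    return seen

sample = "Email: Jane.Doe@example.com. Backup: jane.doe@example.com, j_doe+jobs@mail.example.org"
print(find_emails(sample))
```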

7. Extracting Skills:
Create a predefined list of skills relevant to the desired job requirements.
Match each skill in the list against the resume text. The code in this guide uses case-insensitive, whole-word regular expressions for this step rather than a SpaCy component.
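
One pitfall with whole-word matching is that `\b` does not behave as expected next to symbols, so a skill such as "C++" is never matched when it is followed by a space or comma. A possible workaround, sketched below with a hypothetical helper `find_skills` and a made-up sample, is to use lookarounds that also treat `+` and `#` as part of a term.

```python
import re

def find_skills(text, skills):
    """Return the skills from `skills` that occur in `text` as whole terms."""
    found = []
    for skill in skills:
        # Lookarounds instead of \b, so that "C" does not match inside "C++" or "C#".
        pattern = r"(?<![\w+#]){}(?![\w+#])".format(re.escape(skill))
        if re.search(pattern, text, re.IGNORECASE):
            found.append(skill)
    return found

sample = "Skills: Python, C++, SQL and machine learning. Hobby: javascripting."
print(find_skills(sample, ["Python", "C++", "C", "Java", "Machine Learning", "SQL"]))
```

Here "C" and "Java" are not reported, because they only appear as parts of longer terms.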

8. Extracting Education:
Define a set of education keywords or patterns to identify educational information.
Utilize regular expressions to extract education details from the resume text.
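
Because `\s` in a regular expression also matches newlines, a single pattern applied to the whole text can run past the end of a line. One alternative, sketched here under the assumption that each degree sits on its own line, is to scan line by line for degree keywords. The keyword list, the helper name `find_education_lines`, and the sample text are all illustrative.

```python
import re

# Hypothetical keyword list; extend it with the degrees you expect to see.
DEGREE_PATTERN = re.compile(
    r"\b(?:B\.?\s?Sc|M\.?\s?Sc|B\.?\s?Tech|M\.?\s?Tech|MBA|Ph\.?\s?D|"
    r"Bachelor(?:'s)?|Master(?:'s)?)\b",
    re.IGNORECASE,
)

def find_education_lines(text):
    """Return the non-empty lines of `text` that mention a degree keyword."""
    results = []
    for line in text.splitlines():
        line = line.strip()
        if line and DEGREE_PATTERN.search(line):
            results.append(line)
    return results

sample = """Education
Bsc Microbiology, 2019
Master's in Data Science
Skills
Python, SQL"""
print(find_education_lines(sample))
```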

9. Putting it All Together:
Combine the individual extraction functions to create a comprehensive resume parser.
Process the resume text, extract the desired information, and store it in a structured format.
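
For the structured format, one option is a small dataclass serialized to JSON. This is a sketch rather than the guide's own implementation: the class name `ParsedResume`, its field names, and the placeholder values are assumptions chosen to mirror the fields extracted above.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional

@dataclass
class ParsedResume:
    """Container for the extracted fields (names are illustrative)."""
    name: Optional[str] = None
    contact_number: Optional[str] = None
    email: Optional[str] = None
    skills: List[str] = field(default_factory=list)
    education: List[str] = field(default_factory=list)

def to_json(resume):
    """Serialize a ParsedResume to a JSON string."""
    return json.dumps(asdict(resume), indent=2)

# Placeholder values; in the parser these would come from the extraction functions.
record = ParsedResume(
    name="Jane Doe",
    email="jane.doe@example.com",
    skills=["Python", "SQL"],
)
print(to_json(record))
```

Fields that were not found stay `None` or empty, so downstream code can tell a missing value apart from an extracted one.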

10. Enhancements and Customizations:
Explore advanced techniques to improve extraction accuracy, such as named entity recognition and entity linking.
Consider handling different resume formats and languages for broader compatibility.
Implement additional features like extracting work experience, certifications, or personal projects based on specific requirements.
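
As a starting point for features such as work experience or certifications, the resume text can first be split into sections by heading. The sketch below assumes headings appear on their own line and match a small, hypothetical list of names; `split_into_sections` and the sample resume are made up for illustration.

```python
# Hypothetical heading names; real resumes use many more variants.
SECTION_HEADINGS = ("experience", "work experience", "education",
                    "skills", "certifications", "projects")

def split_into_sections(text):
    """Group resume lines under the most recent recognized heading."""
    sections = {}
    current = "header"  # lines before the first heading (name, contact details)
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        # A heading is a line equal to a known name, ignoring case and a trailing colon.
        key = stripped.rstrip(":").strip().lower()
        if key in SECTION_HEADINGS:
            current = key
            sections.setdefault(current, [])
        else:
            sections.setdefault(current, []).append(stripped)
    return sections

sample = """Jane Doe
jane.doe@example.com

Work Experience:
Data Analyst, Example Corp (2020-2023)

CERTIFICATIONS
Example Cloud Practitioner
"""
sections = split_into_sections(sample)
print(sections.get("work experience"))
print(sections.get("certifications"))
```

Each section's lines can then be passed to a dedicated extractor, which keeps, for example, degree keywords in a job title from being read as education.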

Note to Readers:

I am thrilled to share with you a comprehensive guide on building a resume parser using SpaCy, which is now available on Analytics Vidhya. This guide aims to empower you with the knowledge and tools to create a powerful resume parser from scratch.

Throughout the guide, I cover the essential aspects of resume parsing, including extracting text from PDFs and pulling out key fields such as contact details, skills, and education. I also demonstrate how to use SpaCy, a popular natural language processing library, to perform these tasks.

By following the step-by-step instructions and code examples provided in the guide, you will gain a solid understanding of the resume parsing process and be equipped with the skills to build your own resume parser. Whether you are a data scientist, developer, or HR professional, this guide will help you streamline and automate the resume screening process, saving you valuable time and effort.

I encourage you to dive into the guide, experiment with the code, and adapt it to your specific requirements. Remember that building a resume parser is an iterative process, and you may need to fine-tune and customize it based on the unique characteristics of the resumes you encounter.

I would like to express my gratitude to the Analytics Vidhya platform for providing the opportunity to share this guide with the community. Their dedication to promoting knowledge sharing and empowering data professionals is truly commendable.

I hope this guide serves as a valuable resource for you on your journey of building a resume parser using SpaCy. Feel free to reach out with any questions, feedback, or success stories you may have. Happy parsing!

In [1]:
import re
from pdfminer.high_level import extract_text
import spacy
from spacy.matcher import Matcher

def extract_text_from_pdf(pdf_path):
    return extract_text(pdf_path)

def extract_contact_number_from_resume(text):
    contact_number = None

    # Use regex pattern to find a potential contact number.
    # (?<!\w) and (?!\w) replace \b so that a leading "+" or "(" is kept in the match.
    pattern = r"(?<!\w)(?:\+?\d{1,3}[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}(?!\w)"
    match = re.search(pattern, text)
    if match:
        contact_number = match.group()

    return contact_number

def extract_email_from_resume(text):
    email = None

    # Use regex pattern to find a potential email address
    pattern = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
    match = re.search(pattern, text)
    if match:
        email = match.group()

    return email

def extract_skills_from_resume(text, skills_list):
    skills = []

    for skill in skills_list:
        pattern = r"\b{}\b".format(re.escape(skill))
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            skills.append(skill)

    return skills

def extract_education_from_resume(text):
    education = []

    # Use regex pattern to find education information
    pattern = r"(?i)(?:Bsc|\bB\.\w+|\bM\.\w+|\bPh\.D\.\w+|\bBachelor(?:'s)?|\bMaster(?:'s)?|\bPh\.D)\s(?:\w+\s)*\w+"
    matches = re.findall(pattern, text)
    for match in matches:
        education.append(match.strip())

    return education

def extract_name(resume_text):
    nlp = spacy.load('en_core_web_sm')
    matcher = Matcher(nlp.vocab)

    # Define name patterns
    patterns = [
        [{'POS': 'PROPN'}, {'POS': 'PROPN'}],  # First name and Last name
        [{'POS': 'PROPN'}, {'POS': 'PROPN'}, {'POS': 'PROPN'}],  # First name, Middle name, and Last name
        [{'POS': 'PROPN'}, {'POS': 'PROPN'}, {'POS': 'PROPN'}, {'POS': 'PROPN'}]  # First name, Middle name, Middle name, and Last name
        # Add more patterns as needed
    ]

    # Register all patterns under a single rule name
    matcher.add('NAME', patterns)

    doc = nlp(resume_text)
    matches = matcher(doc)

    # Return the text of the first match, if any
    if matches:
        _, start, end = matches[0]
        return doc[start:end].text

    return None

if __name__ == '__main__':
    resume_paths = [r"C:\Users\SANKET\Downloads\Untitled-resume.pdf"]

    for resume_path in resume_paths:
        text = extract_text_from_pdf(resume_path)

        print("Resume:", resume_path)

        name = extract_name(text)
        if name:
            print("Name:", name)
        else:
            print("Name not found")

        contact_number = extract_contact_number_from_resume(text)
        if contact_number:
            print("Contact Number:", contact_number)
        else:
            print("Contact Number not found")

        email = extract_email_from_resume(text)
        if email:
            print("Email:", email)
        else:
            print("Email not found")

        skills_list = ['Python', 'Data Analysis', 'Machine Learning', 'Communication', 'Project Management', 'Deep Learning', 'SQL', 'Tableau']
        extracted_skills = extract_skills_from_resume(text, skills_list)
        if extracted_skills:
            print("Skills:", extracted_skills)
        else:
            print("No skills found")

        extracted_education = extract_education_from_resume(text)
        if extracted_education:
            print("Education:", extracted_education)
        else:
            print("No education information found")

        print()


Resume: C:\Users\SANKET\Downloads\Untitled-resume.pdf
Name: Sanket Sarwade
Contact Number: 7798248452
Email: sanketsarwade111@gmail.com
Skills: ['Python', 'Data Analysis', 'Machine Learning', 'Communication', 'Deep Learning', 'SQL', 'Tableau']
Education: ['Bsc Microbiology']



# Conclusion:
Building a resume parser using SpaCy empowers recruiters and hiring professionals to automate the extraction of crucial information from resumes, saving time and effort in candidate evaluation. This guide has provided a comprehensive walkthrough of the key steps involved in creating a resume parser using SpaCy, enabling you to harness the power of natural language processing for efficient resume analysis.

By leveraging the flexibility and extensibility of SpaCy, you can customize and enhance the resume parser to suit your unique requirements. Continual refinement and adaptation will result in a robust and accurate resume parsing solution, streamlining the recruitment process and improving decision-making.

Remember, the resume parser presented here serves as a starting point, and you can further innovate and expand its capabilities to meet evolving needs and technological advancements. Embrace the power of SpaCy and embark on the journey of building a cutting-edge resume parser to revolutionize your recruitment process.