<a href="https://colab.research.google.com/github/hasnain112e/ai-resume-parser-gradio/blob/main/AI_Powered_Resume_Parser.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
Build an end-to-end resume parser web app that automates resume analysis and extracts essential candidate information using Natural Language Processing (NLP). Users upload a resume in PDF format, and the app extracts and displays key fields such as:

*   Full Name
*   Email Address
*   Phone Number
*   LinkedIn (optional)
*   Education History
*   Technical & Soft Skills
*   Work Experience
*   Certifications (optional)



## Set up gradio environment

### Subtask:
Install the Gradio library and any other necessary libraries for handling file uploads and displaying results within Gradio.


**Reasoning**:
I need to install the required libraries for Gradio, PDF handling, and NLP, and then download the necessary spaCy model as specified in the instructions for this subtask.



In [10]:
%pip install PyMuPDF



In [11]:
%pip install gradio PyMuPDF spacy
%run -m spacy download en_core_web_sm

✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.
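
One thing worth checking at this point: the `fitz` module imported later is provided by the `PyMuPDF` distribution, while the separate PyPI package named `fitz` is, to my knowledge, a different project that can shadow it. A minimal sketch, using only the standard library, that reports which of the two distributions are installed (the helper name `installed_pdf_distributions` is hypothetical and not part of the notebook):

```python
from importlib import metadata


def installed_pdf_distributions(names=("PyMuPDF", "fitz")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_pdf_distributions()
print(report)
if report["fitz"] is not None:
    # Assumption: the standalone `fitz` distribution is what breaks `import fitz`.
    print("Consider uninstalling `fitz` and reinstalling PyMuPDF.")
```

If the report shows a version for `fitz`, removing that package and reinstalling `PyMuPDF` is a reasonable first thing to try before restarting the runtime.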


## Adapt the resume parsing logic for gradio

### Subtask:
Adapt the existing parsing functions to work within the Gradio request-response cycle and handle the uploaded file data.


**Reasoning**:
Define the main parsing function `parse_resume_gradio` that takes a file path, reads the PDF content, calls the existing parsing functions, formats the output, and returns it, as required by the instructions. This function will adapt the existing parsing logic to the Gradio request-response cycle.
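
Gradio assigns returned values to output components by position, so every return path of the parsing function has to yield one value per component (nine here: a status message plus eight fields). A small sketch of one way to keep the early-exit paths consistent; `OUTPUT_FIELDS` and `status_only` are hypothetical names used for illustration, not part of the notebook's code:

```python
# Field order assumed to mirror the output components: status first, then data.
OUTPUT_FIELDS = (
    "status", "full_name", "email", "phone", "linkedin",
    "education", "skills", "work_experience", "certifications",
)


def status_only(message, placeholder="N/A"):
    """Build a full-width result row that carries only a status message."""
    return (message,) + (placeholder,) * (len(OUTPUT_FIELDS) - 1)


row = status_only("Please upload a PDF file.")
print(len(row))  # 9
```

Building the placeholder row from the field list means that adding or removing an output component only requires a change in one place.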



In [14]:
import fitz # PyMuPDF
import re
import spacy
import io
import os

# Load the spaCy model outside of the request handling
# This assumes the spaCy model "en_core_web_sm" has been downloaded previously
try:
    nlp = spacy.load("en_core_web_sm")
    print("spaCy model loaded successfully.") # Print to console for verification
except Exception as e:
    nlp = None
    print(f"Error loading spaCy model: {e}") # Print error to console

# Redefine the parsing functions to ensure they are available in this scope
def extract_text_from_pdf(pdf_stream):
    """Extracts text from a PDF file stream."""
    text = ""
    try:
        # fitz.open takes the raw PDF bytes via `stream`; the with-block closes the document
        with fitz.open(stream=pdf_stream.read(), filetype="pdf") as doc:
            for page in doc:
                text += page.get_text()
    except Exception as e:
        print(f"Error extracting text from PDF: {e}") # Print error to console
        return None
    return text

def extract_contact_info(text):
    """Extracts email, phone numbers, and LinkedIn profiles using regex."""
    email = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text)
    phone = re.findall(r'\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}', text)
    # Regex for LinkedIn profile URLs: optional scheme and subdomain,
    # followed by /pub/, /in/ or /profile/ and the profile ID
    linkedin_pattern = r'(?:https?://)?(?:\w+\.)?linkedin\.com/(?:pub|in|profile)/[-a-zA-Z0-9]+/?'
    # Keep each URL as it appears in the text, minus any trailing slash
    linkedin_urls = [url.rstrip('/') for url in re.findall(linkedin_pattern, text)]


    return {"email": email, "phone": phone, "linkedin": linkedin_urls}


def parse_certifications(certifications_text):
    """Parses the text identified as 'Certifications' to extract individual certifications."""
    certifications_list = []
    if certifications_text:
        # Simple parsing: treat each non-empty line as one certification
        # This is a basic approach and can be improved with more sophisticated pattern matching
        lines = certifications_text.split('\n')
        for line in lines:
            line = line.strip()
            if line:
                # Further refine parsing based on potential patterns within lines
                # For now, just add non-empty lines as individual certifications
                certifications_list.append(line)
    return certifications_list


def extract_sections(text):
    """Extracts sections like Education, Skills, Work Experience, and Certifications."""
    sections = {}
    # Simple keyword-based extraction (can be improved with more sophisticated NLP)
    keywords = {
        "education": ["education", "academic"],
        "skills": ["skills", "proficiencies"],
        "experience": ["experience", "work history", "employment"],
        # Updated keywords for certifications for potentially better matching
        "certifications": ["certifications", "licenses", "professional development", "training", "awards"]
    }

    text_lower = text.lower()

    # Sort keywords by their appearance in the text to better identify section boundaries
    found_keywords = sorted([
        (text_lower.find(kw), section, kw) for section, kws in keywords.items() for kw in kws if kw in text_lower
    ])

    # Extract sections based on the order of keywords
    for i, (start_index, section, kw) in enumerate(found_keywords):
        if start_index != -1:
            # Find the end index for the current section
            end_index = len(text)
            if i + 1 < len(found_keywords):
                next_start_index, _, _ = found_keywords[i+1]
                end_index = next_start_index

            section_text = text[start_index:end_index].strip()

            # Remove the keyword itself from the start of the section text
            # Find the exact match case-insensitively and remove it
            keyword_match = re.search(r'\b' + re.escape(kw) + r'\b', section_text, re.IGNORECASE)
            if keyword_match:
                section_text = section_text[keyword_match.end():].strip()


            sections[section] = section_text.strip()

    # If a section wasn't found by keyword but other sections were,
    # a simple keyword match might still be useful for a fallback
    for section, kws in keywords.items():
        if section not in sections:
             for kw in kws:
                start_index = text_lower.find(kw)
                if start_index != -1:
                    # Simple approach: take text from keyword until the next potential section header or end of text
                    remaining_text = text_lower[start_index:]
                    end_index = len(remaining_text)
                    for other_section, other_kws in keywords.items():
                        if other_section != section:
                            for other_kw in other_kws:
                                other_kw_index = remaining_text.find(other_kw)
                                if other_kw_index != -1 and other_kw_index < end_index:
                                    end_index = other_kw_index
                    section_text = text[start_index:start_index + end_index]

                    # Remove the keyword itself from the start of the section text
                    keyword_match = re.search(r'\b' + re.escape(kw) + r'\b', section_text, re.IGNORECASE)
                    if keyword_match:
                        section_text = section_text[keyword_match.end():].strip()

                    sections[section] = section_text.strip()
                    break # Found a keyword for this section


    # Parse the certifications text specifically
    certifications_text = sections.get("certifications", "")
    parsed_certifications = parse_certifications(certifications_text)
    sections["certifications"] = parsed_certifications # Store as a list of strings

    return sections

# Need to define extract_name function for parse_resume to work
def extract_name(text, nlp):
    """Extracts a potential name from the text using spaCy."""
    if nlp:
        doc = nlp(text)
        names = []
        for ent in doc.ents:
            # Assuming PERSON entities are names
            if ent.label_ == "PERSON":
                names.append(ent.text)
        # Return the longest name found, or None if no names found
        if names:
            return max(names, key=len)
    return None # Return None if nlp model is not loaded or no name is found


def parse_resume_gradio(file_path):
    """
    Main parsing function adapted for Gradio.
    Takes a file path, extracts text, and parses resume information.
    """
    if file_path is None:
        return "Please upload a PDF file.", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A" # Return empty fields for Gradio output components

    try:
        with open(file_path, 'rb') as f:
            pdf_stream = io.BytesIO(f.read())

        text = extract_text_from_pdf(pdf_stream)

        if not text:
            return "Could not extract text from PDF.", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A" # Return empty fields

        contact_info = extract_contact_info(text)
        name = extract_name(text, nlp)
        sections = extract_sections(text)

        # Format data for Gradio output
        full_name = name if name else "N/A"
        email = ", ".join(contact_info.get("email", [])) if contact_info.get("email") else "N/A"
        phone = ", ".join(contact_info.get("phone", [])) if contact_info.get("phone") else "N/A"
        linkedin = ", ".join(contact_info.get("linkedin", [])) if contact_info.get("linkedin") else "N/A"
        education = sections.get("education", "N/A")
        skills = sections.get("skills", "N/A")
        work_experience = sections.get("experience", "N/A")  # extract_sections stores this under "experience"
        certifications = ", ".join(sections.get("certifications", [])) if sections.get("certifications") else "N/A"


        return "Parsing successful!", full_name, email, phone, linkedin, education, skills, work_experience, certifications

    except Exception as e:
        print(f"An error occurred during Gradio parsing: {e}") # Print error to console
        return f"An error occurred during parsing: {e}", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A" # Return error message and empty fields

print("Gradio parsing function defined.")

spaCy model loaded successfully.
Gradio parsing function defined.
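
To see what the contact regexes accept, here is a self-contained check on a made-up line of text. The patterns follow `extract_contact_info` (with the email TLD class written as `[A-Za-z]`); the sample string and the names `EMAIL_RE` and `PHONE_RE` are illustrative only.

```python
import re

EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
PHONE_RE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")

# Made-up contact line with one US-style and one UK-style phone number.
sample = "Jane Doe | jane.doe@example.com | (555) 123-4567 | +44 20 7946 0958"

emails = EMAIL_RE.findall(sample)
phones = PHONE_RE.findall(sample)
print(emails)  # ['jane.doe@example.com']
print(phones)  # ['(555) 123-4567']
```

The phone pattern is shaped around 10-digit North American numbers (3-3-4 grouping), so the UK-style number in the sample is not matched. Resumes with international numbers would need a broader pattern or a dedicated phone-number library.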


## Build the gradio interface

### Subtask:
Design the user interface using Gradio components, including a file upload component and various components to display the extracted resume information (e.g., text boxes, lists).


**Reasoning**:
Define the Gradio input and output components as specified in the instructions.



In [15]:
import gradio as gr

# Define the input component (File Upload)
resume_file_input = gr.File(
    label="Upload your Resume (PDF only)",
    file_count="single",
    file_types=[".pdf"]
)

# Define the output components for extracted information
output_full_name = gr.Textbox(label="Full Name")
output_email = gr.Textbox(label="Email")
output_phone = gr.Textbox(label="Phone Number")
output_linkedin = gr.Textbox(label="LinkedIn")
output_education = gr.Textbox(label="Education History")
output_skills = gr.Textbox(label="Technical & Soft Skills")
output_work_experience = gr.Textbox(label="Work Experience")
output_certifications = gr.Textbox(label="Certifications")
output_status = gr.Textbox(label="Parsing Status") # Add a status message output

# Create a list of the output components
output_components = [
    output_status,
    output_full_name,
    output_email,
    output_phone,
    output_linkedin,
    output_education,
    output_skills,
    output_work_experience,
    output_certifications,
]

print("Gradio input and output components defined.")

Gradio input and output components defined.


## Integrate parsing logic with the gradio interface

### Subtask:
Connect the file upload component to the parsing functions and link the output of the parsing functions to the display components in the Gradio interface.


**Reasoning**:
I need to define the `parse_resume_gradio` function, the input and output Gradio components, create the Gradio interface using `gr.Interface`, and launch the interface, as per the subtask instructions. I will combine these steps into a single code block since they are sequentially dependent on each other and directly contribute to building and launching the Gradio app. I will ensure all necessary parsing functions are included or accessible in this scope.



In [16]:
import gradio as gr
import fitz # PyMuPDF
import re
import spacy
import io
import os

# Load the spaCy model outside of the request handling
# This assumes the spaCy model "en_core_web_sm" has been downloaded previously
try:
    nlp = spacy.load("en_core_web_sm")
    print("spaCy model loaded successfully.") # Print to console for verification
except Exception as e:
    nlp = None
    print(f"Error loading spaCy model: {e}") # Print error to console

# Redefine the parsing functions to ensure they are available in this scope
def extract_text_from_pdf(pdf_stream):
    """Extracts text from a PDF file stream."""
    text = ""
    try:
        # fitz.open takes the raw PDF bytes via `stream`; the with-block closes the document
        with fitz.open(stream=pdf_stream.read(), filetype="pdf") as doc:
            for page in doc:
                text += page.get_text()
    except Exception as e:
        print(f"Error extracting text from PDF: {e}") # Print error to console
        return None
    return text

def extract_contact_info(text):
    """Extracts email, phone numbers, and LinkedIn profiles using regex."""
    email = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text)
    phone = re.findall(r'\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}', text)
    # Regex for LinkedIn profile URLs: optional scheme and subdomain,
    # followed by /pub/, /in/ or /profile/ and the profile ID
    linkedin_pattern = r'(?:https?://)?(?:\w+\.)?linkedin\.com/(?:pub|in|profile)/[-a-zA-Z0-9]+/?'
    # Keep each URL as it appears in the text, minus any trailing slash
    linkedin_urls = [url.rstrip('/') for url in re.findall(linkedin_pattern, text)]


    return {"email": email, "phone": phone, "linkedin": linkedin_urls}


def parse_certifications(certifications_text):
    """Parses the text identified as 'Certifications' to extract individual certifications."""
    certifications_list = []
    if certifications_text:
        # Simple parsing: treat each non-empty line as one certification
        # This is a basic approach and can be improved with more sophisticated pattern matching
        lines = certifications_text.split('\n')
        for line in lines:
            line = line.strip()
            if line:
                # Further refine parsing based on potential patterns within lines
                # For now, just add non-empty lines as individual certifications
                certifications_list.append(line)
    return certifications_list


def extract_sections(text):
    """Extracts sections like Education, Skills, Work Experience, and Certifications."""
    sections = {}
    # Simple keyword-based extraction (can be improved with more sophisticated NLP)
    keywords = {
        "education": ["education", "academic"],
        "skills": ["skills", "proficiencies"],
        "experience": ["experience", "work history", "employment"],
        # Updated keywords for certifications for potentially better matching
        "certifications": ["certifications", "licenses", "professional development", "training", "awards"]
    }

    text_lower = text.lower()

    # Sort keywords by their appearance in the text to better identify section boundaries
    found_keywords = sorted([
        (text_lower.find(kw), section, kw) for section, kws in keywords.items() for kw in kws if kw in text_lower
    ])

    # Extract sections based on the order of keywords
    for i, (start_index, section, kw) in enumerate(found_keywords):
        if start_index != -1:
            # Find the end index for the current section
            end_index = len(text)
            if i + 1 < len(found_keywords):
                next_start_index, _, _ = found_keywords[i+1]
                end_index = next_start_index

            section_text = text[start_index:end_index].strip()

            # Remove the keyword itself from the start of the section text
            # Find the exact match case-insensitively and remove it
            keyword_match = re.search(r'\b' + re.escape(kw) + r'\b', section_text, re.IGNORECASE)
            if keyword_match:
                section_text = section_text[keyword_match.end():].strip()


            sections[section] = section_text.strip()

    # If a section wasn't found by keyword but other sections were,
    # a simple keyword match might still be useful for a fallback
    for section, kws in keywords.items():
        if section not in sections:
             for kw in kws:
                start_index = text_lower.find(kw)
                if start_index != -1:
                    # Simple approach: take text from keyword until the next potential section header or end of text
                    remaining_text = text_lower[start_index:]
                    end_index = len(remaining_text)
                    for other_section, other_kws in keywords.items():
                        if other_section != section:
                            for other_kw in other_kws:
                                other_kw_index = remaining_text.find(other_kw)
                                if other_kw_index != -1 and other_kw_index < end_index:
                                    end_index = other_kw_index
                    section_text = text[start_index:start_index + end_index]

                    # Remove the keyword itself from the start of the section text
                    keyword_match = re.search(r'\b' + re.escape(kw) + r'\b', section_text, re.IGNORECASE)
                    if keyword_match:
                        section_text = section_text[keyword_match.end():].strip()

                    sections[section] = section_text.strip()
                    break # Found a keyword for this section


    # Parse the certifications text specifically
    certifications_text = sections.get("certifications", "")
    parsed_certifications = parse_certifications(certifications_text)
    sections["certifications"] = parsed_certifications # Store as a list of strings

    return sections

# Need to define extract_name function for parse_resume to work
def extract_name(text, nlp):
    """Extracts a potential name from the text using spaCy."""
    if nlp:
        doc = nlp(text)
        names = []
        for ent in doc.ents:
            # Assuming PERSON entities are names
            if ent.label_ == "PERSON":
                names.append(ent.text)
        # Return the longest name found, or None if no names found
        if names:
            return max(names, key=len)
    return None # Return None if nlp model is not loaded or no name is found


def parse_resume_gradio(file_path):
    """
    Main parsing function adapted for Gradio.
    Takes a file path, extracts text, and parses resume information.
    Returns a tuple of parsed data for Gradio output components.
    """
    # Initialize all output fields to N/A or empty strings
    default_output = ("Please upload a PDF file.", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A")

    if file_path is None:
        return default_output

    try:
        with open(file_path, 'rb') as f:
            pdf_stream = io.BytesIO(f.read())

        text = extract_text_from_pdf(pdf_stream)

        if not text:
            return "Could not extract text from PDF.", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A"

        # Check if nlp model is loaded before proceeding with name extraction
        if nlp is None:
             return "NLP model not loaded. Cannot extract name and potentially other info accurately.", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A"


        contact_info = extract_contact_info(text)
        name = extract_name(text, nlp) # Use the nlp model here
        sections = extract_sections(text)

        # Format data for Gradio output
        full_name = name if name else "N/A"
        email = ", ".join(contact_info.get("email", [])) if contact_info.get("email") else "N/A"
        phone = ", ".join(contact_info.get("phone", [])) if contact_info.get("phone") else "N/A"
        linkedin = ", ".join(contact_info.get("linkedin", [])) if contact_info.get("linkedin") else "N/A"
        education = sections.get("education", "N/A")
        skills = sections.get("skills", "N/A")
        work_experience = sections.get("experience", "N/A")  # extract_sections stores this under "experience"
        certifications = ", ".join(sections.get("certifications", [])) if sections.get("certifications") else "N/A"


        return "Parsing successful!", full_name, email, phone, linkedin, education, skills, work_experience, certifications

    except Exception as e:
        print(f"An error occurred during Gradio parsing: {e}") # Print error to console
        return f"An error occurred during parsing: {e}", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A" # Return error message and empty fields


# Define the input component (File Upload)
resume_file_input = gr.File(
    label="Upload your Resume (PDF only)",
    file_count="single",
    file_types=[".pdf"]
)

# Define the output components for extracted information
output_status = gr.Textbox(label="Parsing Status") # Status message output
output_full_name = gr.Textbox(label="Full Name")
output_email = gr.Textbox(label="Email")
output_phone = gr.Textbox(label="Phone Number")
output_linkedin = gr.Textbox(label="LinkedIn")
output_education = gr.Textbox(label="Education History")
output_skills = gr.Textbox(label="Technical & Soft Skills")
output_work_experience = gr.Textbox(label="Work Experience")
output_certifications = gr.Textbox(label="Certifications")


# Create a list of the output components in the order returned by parse_resume_gradio
output_components = [
    output_status,
    output_full_name,
    output_email,
    output_phone,
    output_linkedin,
    output_education,
    output_skills,
    output_work_experience,
    output_certifications,
]


# Create the Gradio interface
iface = gr.Interface(
    fn=parse_resume_gradio,
    inputs=resume_file_input,
    outputs=output_components,
    title="AI-Powered Resume Parser",
    description="Upload a PDF resume to extract key information."
)

print("Launching Gradio interface...")

# Launch the Gradio interface
# Setting share=True will create a public shareable link (useful for testing/demo)
# Setting share=False will only run locally
iface.launch(share=False)


spaCy model loaded successfully.
Launching Gradio interface...
Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Note: opening Chrome Inspector may crash demo inside Colab notebooks.
* To create a public link, set `share=True` in `launch()`.





## Summary:

### Data Analysis Key Findings

*   The development process successfully set up the necessary environment by installing `gradio`, `PyMuPDF` (imported as `fitz`), and `spacy`, including downloading the `en_core_web_sm` NLP model.
*   A critical `ModuleNotFoundError` related to the `fitz` library was encountered during the attempt to adapt the parsing logic for Gradio, preventing PDF text extraction.
*   The `ModuleNotFoundError` was resolved by forcefully reinstalling `PyMuPDF`.
*   The parsing logic, including functions for extracting text, contact info, sections, and name, was successfully defined and adapted to work with file paths provided by Gradio.
*   Gradio components for file upload and displaying parsed data (name, email, phone, LinkedIn, education, skills, experience, certifications, and status) were correctly defined.
*   The Gradio interface was successfully created and launched locally, integrating the file input, the parsing function, and the output components.
*   The application is ready for manual testing to verify the parsing accuracy across various resume formats.

### Insights or Next Steps

*   Thorough manual testing with a diverse set of resume formats is crucial to evaluate the accuracy and robustness of the keyword-based section extraction and regex patterns.
*   Further refinement of the parsing logic, potentially using more advanced NLP techniques (beyond simple spaCy entity recognition and keyword matching) or machine learning models, could significantly improve the accuracy of extracted information, especially for unstructured sections like skills and work experience.
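
As one possible refinement of the keyword-based section extraction, section headers can be recognized only when they stand alone on a line, so that words like "experience" or "training" inside a sentence no longer act as section boundaries. This is a sketch of that alternative, not the notebook's method; `SECTION_HEADERS`, `split_sections_by_header`, and the sample resume text are all hypothetical.

```python
# Hypothetical header vocabulary; extend to match the resumes being parsed.
SECTION_HEADERS = {
    "education": ("education", "academic background"),
    "skills": ("skills", "technical skills"),
    "experience": ("experience", "work experience", "work history", "employment"),
    "certifications": ("certifications", "licenses"),
}


def split_sections_by_header(text):
    """Split text into sections, treating only standalone lines as headers."""
    lookup = {alias: name for name, aliases in SECTION_HEADERS.items() for alias in aliases}
    sections, current = {}, None
    for line in text.splitlines():
        # Normalize a candidate header: trim, drop a trailing colon, lowercase.
        key = line.strip().rstrip(":").strip().lower()
        if key in lookup:
            current = lookup[key]
            sections.setdefault(current, [])
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return {name: "\n".join(lines) for name, lines in sections.items()}


resume = """Jane Doe
Experience
Built training pipelines; 5 years of experience with Python.
Skills:
Python, SQL
Education
BSc Computer Science
"""
result = split_sections_by_header(resume)
print(result["experience"])
print(result["skills"])
```

In this sample the sentence under "Experience" mentions both "training" and "experience", yet it stays in one section because neither word appears as a standalone line. Resumes with decorated or multi-column headers would still need additional handling.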
