# **🚀 Introduction**
With the growing volume of job applications, HR teams and recruiters need automated tools that extract the relevant information from resumes efficiently. PDF query tools that combine Generative AI (GenAI), NLP, and vector databases make it possible to turn resumes into structured data and then query that data.

# **📌 1️⃣ Understanding Resume Extraction**
**🔹 What is Resume Extraction?**

Resume extraction is the process of automatically identifying, parsing, and structuring key details from resumes. This includes:

✅ Personal Information (Name, Email, Phone)

✅ Education (Degrees, Universities, Graduation Years)

✅ Work Experience (Companies, Job Titles, Responsibilities)

✅ Skills (Programming, Soft Skills, Certifications)

✅ Projects & Achievements
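Fields such as names, job titles, and responsibilities usually need NER or an LLM. Contact details, however, follow predictable patterns. The following is a minimal sketch that assumes plain text has already been extracted from the PDF. The regexes and the `extract_contact_info` helper are hypothetical illustrations and are not part of the notebook below.

```python
import re

# Illustrative patterns only (hypothetical); real resumes need more robust handling
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d ().-]{7,}\d")  # digits, spaces, brackets, dots, dashes


def extract_contact_info(text: str) -> dict:
    """Return the first email address and phone number found in raw resume text."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0).strip() if phone else None,
    }


sample = "Jane Doe\njane.doe@example.com | +1 (555) 123-4567\nPython, SQL"
print(extract_contact_info(sample))
```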

**🔹 Why Automate Resume Parsing?**

✅ Speeds up recruitment (Extracts relevant details in seconds)

✅ Standardizes resume formats (Handles different resume templates)

✅ Improves searchability (Finds candidates based on skills, experience, etc.)

✅ Integrates with ATS (Applicant Tracking Systems)
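Standardizing and searching both come down to parsing every resume into the same record shape and then filtering on it. The sketch below assumes a hypothetical `Candidate` schema and a `search_by_skills` helper. Neither is a real ATS format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Candidate:
    """Hypothetical standardized record produced by a resume parser."""
    name: str
    email: str
    skills: List[str] = field(default_factory=list)
    years_experience: float = 0.0


def search_by_skills(candidates: List[Candidate], required: List[str], min_years: float = 0.0) -> List[str]:
    """Return names of candidates who list every required skill (case-insensitive)."""
    wanted = {s.lower() for s in required}
    return [
        c.name
        for c in candidates
        if wanted <= {s.lower() for s in c.skills} and c.years_experience >= min_years
    ]


pool = [
    Candidate("Ana", "ana@example.com", ["Python", "PyTorch", "SQL"], 4),
    Candidate("Ben", "ben@example.com", ["Java", "SQL"], 6),
    Candidate("Cho", "cho@example.com", ["python", "FastAPI", "Git"], 2),
]
print(search_by_skills(pool, ["python"]))               # ['Ana', 'Cho']
print(search_by_skills(pool, ["Python"], min_years=3))  # ['Ana']
```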

# **📌 2️⃣ Applications of Generative AI in Resume Processing**
GenAI enhances resume extraction with:

📝 Resume Summarization: Generates a summary of key points

🔍 Intelligent Search (Semantic Queries): Finds resumes based on skill/experience

📊 Candidate Ranking: Scores resumes based on job fit

🔄 Resume Formatting & Structuring: Converts unstructured resumes into structured data
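Search and ranking both score each resume against a query or job description by vector similarity. The sketch below uses plain bag-of-words counts with cosine similarity. This is a deliberately simple *lexical* stand-in: a semantic search system would replace `bag_of_words` with embedding vectors from a model, typically stored in a vector database. The scoring and sorting logic would stay the same. All names and data here are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lower-case token counts; a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9+#]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_resumes(job_description: str, resumes: Dict[str, str]) -> List[Tuple[str, float]]:
    """Score each resume against the job description, best match first."""
    job_vec = bag_of_words(job_description)
    scores = [(name, cosine(job_vec, bag_of_words(text))) for name, text in resumes.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


job = "Machine learning engineer: Python, PyTorch, NLP, SQL"
resumes = {
    "ana.pdf": "Python developer with PyTorch and NLP projects, SQL databases",
    "ben.pdf": "Java backend engineer, Spring, SQL",
    "cho.pdf": "Graphic designer, Photoshop, Illustrator",
}
for name, score in rank_resumes(job, resumes):
    print(f"{name}: {score:.2f}")
```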

# **Real-World Applications**

🔹 **1. AI-Powered Recruitment**
* Quickly find the best resumes for a job opening.
* Automate screening and ranking.

🔹 **2. HR Resume Management**

* Organize candidate resumes in a searchable database.
* Query based on specific skills or experience.

🔹 **3. Freelance/Job Portals**

* Help recruiters find freelancers based on project history & expertise.

🔹 **4. Education & Scholarship Screening**

* Extract information from academic resumes for scholarships.
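Once each resume has been scored, automated screening can be as simple as applying a cutoff and sorting. The sketch below assumes each resume has already been scored into a dict with an `overall_match_percentage` field, which is the format the matching prompt in the Implementation section requests. The `shortlist` helper, the filenames, and the 60% cutoff are hypothetical. LLM-produced percentages are not calibrated probabilities, so a fixed cutoff like this is only a rough filter.

```python
from typing import Dict, List, Tuple


def shortlist(results: Dict[str, dict], threshold: float = 60.0) -> List[Tuple[str, float]]:
    """Keep candidates scoring at or above the threshold, highest match first."""
    scored = [(name, float(r.get("overall_match_percentage", 0))) for name, r in results.items()]
    return sorted((item for item in scored if item[1] >= threshold), key=lambda item: item[1], reverse=True)


# Hypothetical per-candidate match results
results = {
    "candidate_a.pdf": {"overall_match_percentage": 70.0},
    "candidate_b.pdf": {"overall_match_percentage": 45.0},
    "candidate_c.pdf": {"overall_match_percentage": 88.0},
}
print(shortlist(results))  # [('candidate_c.pdf', 88.0), ('candidate_a.pdf', 70.0)]
```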

# **Implementation**

```python
!pip install langchain-openai langchain-community pypdf
```


```python
from google.colab import userdata

# Read the key stored in Colab's Secrets panel
openai_api = userdata.get("OPENAI_API_KEY")
```
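`google.colab` only exists inside Colab. To run the same notebook locally, one option is to fall back to an environment variable. The following is a minimal sketch; the `get_openai_key` helper is hypothetical.

```python
import os
from typing import Optional


def get_openai_key(name: str = "OPENAI_API_KEY") -> Optional[str]:
    """Read the key from Colab secrets when running in Colab, else from the environment."""
    try:
        from google.colab import userdata  # importable only inside Colab
        return userdata.get(name)
    except ImportError:
        return os.environ.get(name)
```

In either environment, `openai_api = get_openai_key()` then replaces the direct `userdata.get` call.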

```python
import json
import os
import re
from typing import Dict

import spacy
from langchain_community.document_loaders import Docx2txtLoader, PyPDFLoader, TextLoader
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI


class ResumeParser:
    def __init__(self, openai_api_key: str):
        """Initialize the resume parser with an OpenAI API key."""
        self.llm = ChatOpenAI(
            temperature=0,
            model="gpt-3.5-turbo",
            api_key=openai_api_key,  # use the argument, not a notebook global
        )

        # spaCy model for optional NER-based extraction (not used by the matching step below)
        try:
            self.nlp = spacy.load("en_core_web_sm")
        except OSError:
            # Model not installed yet: download it once, then load
            from spacy.cli import download
            download("en_core_web_sm")
            self.nlp = spacy.load("en_core_web_sm")

        self.matching_template = PromptTemplate(
            input_variables=["resume_text", "job_description"],
            template="""
            Analyze the following resume against the job description and provide a detailed matching analysis.

            Resume:
            {resume_text}

            Job Description:
            {job_description}

            Provide your response in the following JSON format:
            {{
                "overall_match_percentage": <number between 0 and 100>,
                "matching_requirements": [<list of specific matching skills and requirements>],
                "missing_requirements": [<list of specific missing or partial skills and requirements>],
                "recommendation": "<detailed recommendation based on the analysis>"
            }}

            Ensure all fields have valid values and the response is in proper JSON format.
            Be specific about technical skills, experience, and qualifications in your analysis.
            Consider both exact matches and related/transferable skills.
            """
        )

        # Prompt -> chat model pipeline (replaces the deprecated LLMChain)
        self.matching_chain = self.matching_template | self.llm

    def load_resume(self, file_path: str) -> str:
        """Load resume text from a PDF, DOCX, or plain-text file."""
        file_extension = os.path.splitext(file_path)[1].lower()

        if file_extension == ".pdf":
            pages = PyPDFLoader(file_path).load()  # one Document per page (requires pypdf)
        elif file_extension == ".docx":
            pages = Docx2txtLoader(file_path).load()  # requires the docx2txt package
        elif file_extension == ".txt":
            pages = TextLoader(file_path).load()
        else:
            raise ValueError("Unsupported file format. Please provide a PDF, DOCX, or TXT file.")

        return " ".join(page.page_content for page in pages)

    def match_resume_with_job(self, resume_text: str, job_description: str) -> Dict:
        """Match a resume against a job description; always returns the four expected fields."""
        try:
            # Get match analysis from the LLM (chat models return a message object)
            response = self.matching_chain.invoke({
                "resume_text": resume_text,
                "job_description": job_description,
            })
            result = response.content

            # Parse JSON response
            try:
                match_result = json.loads(result)
            except json.JSONDecodeError:
                # Strip markdown fences or surrounding prose and keep the outermost {...}
                clean_result = re.search(r"\{.*\}", result, re.DOTALL)
                if clean_result:
                    match_result = json.loads(clean_result.group(0))
                else:
                    raise ValueError("Could not parse LLM response as JSON")

            # Fill in any missing or null fields with defaults
            required_fields = {
                "overall_match_percentage": 0,
                "matching_requirements": [],
                "missing_requirements": [],
                "recommendation": "No specific recommendation provided.",
            }

            for field, default_value in required_fields.items():
                if field not in match_result or match_result[field] is None:
                    match_result[field] = default_value

            # Ensure the percentage is a number between 0 and 100
            try:
                match_result["overall_match_percentage"] = float(match_result["overall_match_percentage"])
                match_result["overall_match_percentage"] = max(0, min(100, match_result["overall_match_percentage"]))
            except (ValueError, TypeError):
                match_result["overall_match_percentage"] = 0

            # Ensure the requirement fields are lists
            if not isinstance(match_result["matching_requirements"], list):
                match_result["matching_requirements"] = []
            if not isinstance(match_result["missing_requirements"], list):
                match_result["missing_requirements"] = []

            # Ensure the recommendation is a string
            if not isinstance(match_result["recommendation"], str):
                match_result["recommendation"] = str(match_result["recommendation"])

            return match_result

        except Exception as e:
            # Return a valid response structure even if the call or parsing fails
            return {
                "overall_match_percentage": 0,
                "matching_requirements": [],
                "missing_requirements": ["Could not analyze requirements due to error"],
                "recommendation": f"Error during analysis: {str(e)}. Please try again.",
            }


def main():
    # Example usage
    OPENAI_API_KEY = openai_api
    parser = ResumeParser(OPENAI_API_KEY)

    # Example job description
    job_description = """
    We are seeking a Machine Learning Engineer with:
    - Strong Python programming skills
    - Experience with ML frameworks (TensorFlow, PyTorch)
    - Knowledge of NLP and Computer Vision
    - Experience with REST APIs and Flask/FastAPI
    - Database experience (SQL, NoSQL)
    - Version control with Git
    """

    # Extract the resume text from the PDF before sending it to the LLM
    resume_text = parser.load_resume("/content/HamzaJafri-Resume.pdf")

    # Get match analysis
    match_result = parser.match_resume_with_job(resume_text, job_description)

    # Print results
    print("\n=== Match Analysis ===")
    print(f"Overall Match: {match_result['overall_match_percentage']}%")

    print("\nMatching Requirements:")
    for req in match_result["matching_requirements"]:
        print(f"- {req}")

    print("\nMissing Requirements:")
    for req in match_result["missing_requirements"]:
        print(f"- {req}")

    print(f"\nRecommendation: {match_result['recommendation']}")


if __name__ == "__main__":
    main()
```

Example output (generated by the LLM; exact scores and wording depend on the model and may differ between runs):

```
=== Match Analysis ===
Overall Match: 70.0%

Matching Requirements:
- Strong Python programming skills
- Experience with ML frameworks (TensorFlow, PyTorch)
- Knowledge of NLP and Computer Vision
- Database experience (SQL, NoSQL)

Missing Requirements:
- Experience with REST APIs and Flask/FastAPI
- Version control with Git

Recommendation: The candidate's resume shows a strong match with the key requirements for the Machine Learning Engineer position. They have demonstrated strong Python programming skills and experience with ML frameworks like TensorFlow and PyTorch. Their knowledge of NLP and Computer Vision is also a good fit for the role. However, they are missing experience with REST APIs and Flask/FastAPI, as well as version control with Git. It is recommended that the candidate gains experience in these areas to further enhance their qualifications for the position.
```
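The LLM call is the only non-deterministic step. The JSON clean-up inside `match_resume_with_job` can therefore be pulled out into a pure function and tested without an API key. The following sketch mirrors that logic: parse, fall back to the outermost `{...}`, fill defaults, and clamp the percentage. The name `normalize_match_result` is hypothetical and not part of the notebook.

```python
import json
import re
from typing import Any, Dict


def normalize_match_result(raw: str) -> Dict[str, Any]:
    """Parse an LLM reply into the four-field match dict, filling safe defaults."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span, e.g. when the reply wraps JSON in prose
        found = re.search(r"\{.*\}", raw, re.DOTALL)
        if not found:
            raise ValueError("Could not parse LLM response as JSON")
        data = json.loads(found.group(0))
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")

    try:
        pct = float(data.get("overall_match_percentage", 0))
    except (TypeError, ValueError):  # e.g. null or non-numeric text
        pct = 0.0
    matching = data.get("matching_requirements")
    missing = data.get("missing_requirements")
    rec = data.get("recommendation")
    return {
        "overall_match_percentage": max(0.0, min(100.0, pct)),
        "matching_requirements": matching if isinstance(matching, list) else [],
        "missing_requirements": missing if isinstance(missing, list) else [],
        "recommendation": str(rec) if rec is not None else "No specific recommendation provided.",
    }


reply = 'Here is the analysis:\n{"overall_match_percentage": "85", "matching_requirements": ["Python"], "recommendation": null}\nLet me know if you need more.'
print(normalize_match_result(reply))
```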


```python
!pip install pypdf
```

