# IntelliApply: Automated Job Matching & Application Preparation Agent

This notebook implements an AI agent system that:
- Ingests a user's resume (PDF, DOCX, or TXT)
- Extracts their profile and job preferences using Gemini
- Fetches relevant remote jobs from the Remotive API
- Ranks and filters jobs by fit
- Tailors resume content and generates cover letters
- Exports a tailored resume and cover letter as PDFs for each job

Built for the Kaggle + Google Agents Intensive Capstone (Enterprise Agents track).

## Install Dependencies

In [1]:
!pip install -q -U google-generativeai pymupdf docx2txt fpdf2

## Imports & Gemini Configuration

In [2]:
import os
import json
import logging
import textwrap
import re
from dataclasses import dataclass, asdict
from typing import List, Optional
from datetime import datetime, timezone

import getpass
import requests
import fitz
import docx2txt
from fpdf import FPDF

import google.generativeai as genai
from google.colab import files


In [3]:
# -------------------------------------------------------------------
# Gemini client configuration
# -------------------------------------------------------------------

def configure_gemini() -> None:
    """Configure the Gemini client using an API key stored in the environment."""
    if not os.environ.get("GEMINI_API_KEY"):
        os.environ["GEMINI_API_KEY"] = getpass.getpass("Gemini API key: ")

    api_key = os.environ["GEMINI_API_KEY"]
    if not api_key:
        raise ValueError("GEMINI_API_KEY is empty.")

    genai.configure(api_key=api_key)


MODEL_NAME = "models/gemini-2.5-flash"

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("intelliapply")

# simple in-memory session state
USER_PROFILE_STORE = None
APPLIED_JOBS_HISTORY: List[dict] = []

configure_gemini()

# check
_model = genai.GenerativeModel(MODEL_NAME)
print(_model.generate_content("IntelliApply system check: OK").text.strip())

Gemini API key: ··········
Understood. IntelliApply system status is confirmed as OK.


## Data models

In [4]:
# -------------------------------------------------------------------
# Data models
# -------------------------------------------------------------------

@dataclass
class Preferences:
    """Job search preferences for a candidate."""
    target_titles: List[str]
    locations: List[str]
    remote_only: bool = False
    min_salary: Optional[int] = None
    preferred_companies: Optional[List[str]] = None


@dataclass
class ExperienceItem:
    title: str
    company: str
    start_date: Optional[str]  # "YYYY-MM" or None
    end_date: Optional[str]
    bullets: List[str]


@dataclass
class ProjectItem:
    name: str
    description: str
    bullets: List[str]


@dataclass
class UserProfile:
    name: str
    summary: str
    skills: List[str]
    experience: List[ExperienceItem]
    projects: List[ProjectItem]
    education: List[str]


@dataclass
class JobDescription:
    job_id: str
    title: str
    company: str
    location: Optional[str]
    salary_estimate: Optional[str]
    description: str
    requirements: Optional[str]


@dataclass
class MatchResult:
    job: JobDescription
    score: float
    reason: str
    selected: bool


@dataclass
class TailoredResult:
    job_id: str
    tailored_summary: str
    ordered_skills: List[str]


@dataclass
class ApplicationPackage:
    job: JobDescription
    match_score: float
    match_reason: str
    tailored_summary: str
    ordered_skills: List[str]
    cover_letter: str
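
The tailoring and cover letter steps later serialize the whole profile with `json.dumps(asdict(profile))`. A minimal sketch of that round trip follows, using cut-down stand-ins for the dataclasses above; `MiniExperience` and `MiniProfile` are hypothetical names introduced only for this illustration.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class MiniExperience:
    # Hypothetical, reduced version of ExperienceItem
    title: str
    company: str
    end_date: Optional[str]
    bullets: List[str]


@dataclass
class MiniProfile:
    # Hypothetical, reduced version of UserProfile
    name: str
    skills: List[str]
    experience: List[MiniExperience]


profile = MiniProfile(
    name="Jane Example",
    skills=["Python", "SQL"],
    experience=[MiniExperience("Data Engineer", "ExampleCo", None, ["Built ETL jobs"])],
)

# asdict() recurses into nested dataclasses and lists, so the result is
# plain dicts/lists that json.dumps can serialize directly.
as_json = json.dumps(asdict(profile), indent=2)

# Going back is not automatic: nested items must be rebuilt explicitly.
raw = json.loads(as_json)
restored = MiniProfile(
    name=raw["name"],
    skills=raw["skills"],
    experience=[MiniExperience(**e) for e in raw["experience"]],
)
```

Note that `None` fields such as `end_date` become JSON `null` and come back as `None`, which is why the optional date fields survive the trip unchanged.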

## Shared utilities

In [5]:
# -------------------------------------------------------------------
# Utility functions
# -------------------------------------------------------------------

def call_gemini_json(prompt: str, system_instruction: str) -> dict:
    """
    Call Gemini with a system instruction and prompt.
    Expect a JSON-only response and return the parsed object.
    """
    model = genai.GenerativeModel(
        MODEL_NAME,
        system_instruction=system_instruction,
    )
    response = model.generate_content(prompt)
    text = response.text.strip()

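    if text.startswith("```"):
        # Handle fenced blocks such as ```json ... ```: remove only the opening
        # fence (with its optional language tag) and the closing fence, so a
        # "json" substring inside the payload is never touched.
        text = re.sub(r"^```[A-Za-z0-9_-]*\s*", "", text)
        text = re.sub(r"\s*```$", "", text).strip()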

    return json.loads(text)


def log_match(job_id: str, score: float, selected: bool, reason: str) -> None:
    """Log a single job match result."""
    logger.info(
        "match job_id=%s score=%.2f selected=%s reason=%s",
        job_id,
        score,
        selected,
        reason,
    )


def remember_application(job: JobDescription, decision: str, score: float) -> None:
    """Record the user's decision on a job into the in-memory history."""
    APPLIED_JOBS_HISTORY.append(
        {
            "job_id": job.job_id,
            "title": job.title,
            "company": job.company,
            "decision": decision,
            "score": score,
            # timezone-aware UTC timestamp (fixes utcnow deprecation)
            "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        }
    )


def strip_html(html: str) -> str:
    """Remove HTML tags and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", html or "")
    return " ".join(text.split())


def sanitize_pdf_text(text: str) -> str:
    """
    Normalize text for FPDF core fonts:
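    - Replace em/en dashes and bullet characters with a simple hyphen
    - Drop characters outside Latin-1
    """
    if text is None:
        return ""
    # "•" (U+2022) is outside Latin-1 and would otherwise be dropped silently
    text = text.replace("—", "-").replace("–", "-").replace("•", "-")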
    # Encode/decode through latin-1 to strip unsupported characters
    return text.encode("latin-1", "ignore").decode("latin-1")
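
Models often wrap JSON in a Markdown fence even when asked not to, which is why `call_gemini_json` strips fences before parsing. The sketch below isolates that step so it can be tried offline, with no API call. `extract_json` is a hypothetical helper name, and the pattern assumes the reply contains at most one fenced block.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way so the snippet stays fence-safe

# Optional opening fence with language tag, the payload, then the closing fence.
_FENCED = re.compile(
    r"^\s*" + FENCE + r"[A-Za-z0-9_-]*\s*\n(.*?)\n?\s*" + FENCE + r"\s*$",
    re.DOTALL,
)


def extract_json(reply: str) -> dict:
    """Parse a reply that is either bare JSON or JSON inside one fenced block."""
    match = _FENCED.match(reply)
    payload = match.group(1) if match else reply
    return json.loads(payload.strip())


expected = {"name": "Jane", "format": "json"}
body = json.dumps(expected)

parsed_tagged = extract_json(FENCE + "json\n" + body + "\n" + FENCE)
parsed_untagged = extract_json(FENCE + "\n" + body + "\n" + FENCE)
parsed_bare = extract_json(body)
```

The untagged case is the one to watch: with no language tag on the fence, a blanket `replace("json", "")` would hit the first `json` inside the payload instead.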


## Resume ingestion (PDF/DOCX/TXT --> text)

In [6]:
# -------------------------------------------------------------------
# Resume ingestion
# -------------------------------------------------------------------

def load_resume_text_from_upload() -> str:
    """
    Upload a resume file (PDF, DOCX, or TXT) via Colab and return its text.
    """
    print("Upload your resume file (PDF, DOCX, or TXT).")
    uploaded = files.upload()
    if not uploaded:
        raise ValueError("No file uploaded.")

    filename = next(iter(uploaded.keys()))
    content = uploaded[filename]
    print(f"Loaded file: {filename}")

    if filename.lower().endswith(".pdf"):
        with fitz.open(stream=content, filetype="pdf") as doc:
            text = "".join(page.get_text() for page in doc)
        return text

    if filename.lower().endswith(".docx"):
        tmp_path = "/tmp/resume.docx"
        with open(tmp_path, "wb") as f:
            f.write(content)
        return docx2txt.process(tmp_path)

    # Fallback: treat as plain text
    return content.decode("utf-8", errors="ignore")
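
`files.upload()` only exists inside Colab. Outside Colab, the same extension-based dispatch can read from a local path instead. The sketch below is one possible adaptation, not part of the notebook: `load_resume_text_from_path` is a hypothetical name, and the PDF and DOCX branches import their libraries lazily so the plain-text branch runs without them.

```python
import tempfile
from pathlib import Path


def load_resume_text_from_path(path: str) -> str:
    """Read a resume from a local PDF, DOCX, or plain-text file (hypothetical helper)."""
    file_path = Path(path)
    suffix = file_path.suffix.lower()

    if suffix == ".pdf":
        import fitz  # PyMuPDF, imported lazily

        with fitz.open(str(file_path)) as doc:
            return "".join(page.get_text() for page in doc)

    if suffix == ".docx":
        import docx2txt  # imported lazily

        return docx2txt.process(str(file_path))

    # Fallback: treat as plain text, ignoring undecodable bytes
    return file_path.read_bytes().decode("utf-8", errors="ignore")


# Usage with a throwaway text file
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "resume.txt"
    sample.write_text("Jane Example\nSkills: Python, SQL", encoding="utf-8")
    loaded = load_resume_text_from_path(str(sample))
```

Reading DOCX directly from the path also avoids the fixed `/tmp/resume.docx` scratch file that the upload variant needs.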

## Profile parsing agent (resume --> UserProfile)

In [7]:
# -------------------------------------------------------------------
# Profile parsing
# -------------------------------------------------------------------

def parse_resume_to_profile(resume_text: str) -> UserProfile:
    """
    Convert free-form resume text into a structured UserProfile.
    The model is instructed not to fabricate experience or skills.
    """
    system_instruction = (
        "Extract a concise, structured profile from a resume. "
        "Do not invent skills, companies, dates, or roles."
    )

    prompt = f"""
Return ONLY valid JSON with this structure:

{{
  "name": "...",
  "summary": "...",
  "skills": ["skill1", "skill2"],
  "experience": [
    {{
      "title": "...",
      "company": "...",
      "start_date": "YYYY-MM" or null,
      "end_date": "YYYY-MM" or null,
      "bullets": ["..."]
    }}
  ],
  "projects": [
    {{
      "name": "...",
      "description": "...",
      "bullets": ["..."]
    }}
  ],
  "education": ["..."]
}}

RESUME:
{resume_text}
"""

    data = call_gemini_json(prompt, system_instruction)

    experience = [
        ExperienceItem(
            title=e["title"],
            company=e["company"],
            start_date=e.get("start_date"),
            end_date=e.get("end_date"),
            bullets=e.get("bullets", []),
        )
        for e in data.get("experience", [])
    ]

    projects = [
        ProjectItem(
            name=p["name"],
            description=p.get("description", ""),
            bullets=p.get("bullets", []),
        )
        for p in data.get("projects", [])
    ]

    profile = UserProfile(
        name=data.get("name", "Candidate"),
        summary=data.get("summary", ""),
        skills=data.get("skills", []),
        experience=experience,
        projects=projects,
        education=data.get("education", []),
    )
    return profile


def setup_profile_and_prefs(resume_text: str, prefs: Preferences) -> UserProfile:
    """Create a UserProfile from resume text and store it in session state."""
    global USER_PROFILE_STORE
    profile = parse_resume_to_profile(resume_text)
    USER_PROFILE_STORE = profile
    logger.info("profile name=%s skills=%d experience_items=%d",
                profile.name, len(profile.skills), len(profile.experience))
    logger.info("preferences=%s", prefs)
    return profile
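
`parse_resume_to_profile` indexes `e["title"]` and `e["company"]` directly, so a reply that omits either key raises `KeyError`. If that turns out to be a problem in practice, entries could be normalized first. The sketch below shows one tolerant approach; `coerce_experience_item` is a hypothetical helper, and the rule "drop entries with neither title nor company" is an assumption about what counts as usable.

```python
from typing import Any, Dict, Optional


def coerce_experience_item(raw: Any) -> Optional[Dict[str, Any]]:
    """Return a normalized experience dict, or None if the entry is unusable."""
    if not isinstance(raw, dict):
        return None
    title = str(raw.get("title") or "").strip()
    company = str(raw.get("company") or "").strip()
    if not title and not company:
        return None  # nothing meaningful to render for this entry
    bullets = raw.get("bullets") or []
    if isinstance(bullets, str):
        bullets = [bullets]  # model returned a single string instead of a list
    return {
        "title": title,
        "company": company,
        "start_date": raw.get("start_date"),
        "end_date": raw.get("end_date"),
        "bullets": [str(b) for b in bullets],
    }


# Made-up examples of imperfect model output
model_output = [
    {"title": "Data Engineer", "company": "ExampleCo", "bullets": "Built pipelines"},
    {"company": "OtherCo"},  # missing title
    "not a dict",            # malformed entry
    {},                      # empty entry
]
cleaned = [item for item in map(coerce_experience_item, model_output) if item]
```

Each cleaned dict has exactly the fields `ExperienceItem` expects, so it could be passed on as `ExperienceItem(**item)`.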

## Preference extraction agent (from resume)

In [8]:
# -------------------------------------------------------------------
# Preference extraction
# -------------------------------------------------------------------

def extract_preferences_from_resume_text(resume_text: str) -> Preferences:
    """
    Infer job search preferences (titles, locations, remote, salary, companies)
    from the resume text.
    """
    system_instruction = (
        "Read a resume and infer realistic job search preferences for the candidate. "
        "Return only JSON, no explanations."
    )

    prompt = f"""
Return ONLY JSON with this structure:

{{
  "target_titles": ["Software Engineer Intern", "Machine Learning Engineer Intern"],
  "locations": ["United States", "Remote"],
  "remote_only": false,
  "min_salary": null,
  "preferred_companies": ["Google", "Microsoft"]
}}

Rules:
- target_titles: likely roles the candidate is pursuing.
- locations: countries / regions / 'Remote' they are open to.
- remote_only: true only if resume strongly prefers remote work.
- min_salary: integer or null if unclear.
- preferred_companies: names if the resume mentions targets; otherwise [].

RESUME:
{resume_text}
"""

    data = call_gemini_json(prompt, system_instruction)

    prefs = Preferences(
        target_titles=data.get("target_titles", []),
        locations=data.get("locations", []),
        remote_only=data.get("remote_only", False),
        min_salary=data.get("min_salary"),
        preferred_companies=data.get("preferred_companies"),
    )
    logger.info("inferred preferences=%s", prefs)
    return prefs
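
The prompt asks for `min_salary` as an integer or `null`, but nothing enforces that on the reply. A hedged sketch of a coercion step that could sit between `call_gemini_json` and the `Preferences` constructor is shown below. `coerce_min_salary` is a hypothetical helper, and treating ambiguous strings such as `"80k"` as unknown is a deliberate assumption rather than the notebook's behavior.

```python
import math
import re
from typing import Any, Optional

# Accepts "80000", "80,000", "$80,000", with optional decimals; nothing else.
_SALARY = re.compile(r"^\s*\$?\s*(\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?\s*$")


def coerce_min_salary(value: Any) -> Optional[int]:
    """Best-effort conversion of a model-supplied salary to an int."""
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return int(value) if math.isfinite(value) else None
    if isinstance(value, str):
        match = _SALARY.match(value)
        if match:
            return int(match.group(1).replace(",", ""))
    return None


values = ["$80,000", "95000.50", "80k", None, 120000]
coerced = [coerce_min_salary(v) for v in values]
```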


## Job fetcher agent (Remotive API)

In [9]:
# -------------------------------------------------------------------
# Job fetcher
# -------------------------------------------------------------------

def fetch_jobs_from_remotive(search_term: str, limit: int = 10) -> List[JobDescription]:
    """
    Fetch remote jobs matching a search term from the Remotive API.
    """
    url = "https://remotive.com/api/remote-jobs"
    params = {"search": search_term, "limit": limit}
    response = requests.get(url, params=params, timeout=20)
    response.raise_for_status()
    data = response.json()
    jobs_data = data.get("jobs", [])

    jobs: List[JobDescription] = []
    for j in jobs_data:
        description = strip_html(j.get("description", ""))
        jobs.append(
            JobDescription(
                job_id=str(j.get("id")),
                title=j.get("title", ""),
                company=j.get("company_name", ""),
                location=j.get("candidate_required_location") or "Remote",
                salary_estimate=j.get("salary") or None,
                description=description,
                requirements=None,
            )
        )
    return jobs


def fetch_jobs_for_preferences(prefs: Preferences, per_title: int = 10) -> List[JobDescription]:
    """
    Fetch jobs for each target title in preferences and de-duplicate by job_id.
    """
    all_jobs: List[JobDescription] = []
    seen_ids = set()

    for title in prefs.target_titles:
        for job in fetch_jobs_from_remotive(title, limit=per_title):
            if job.job_id in seen_ids:
                continue
            seen_ids.add(job.job_id)
            all_jobs.append(job)

    logger.info("fetched_jobs=%d titles=%s", len(all_jobs), prefs.target_titles)
    return all_jobs
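
Because the same posting can match several target titles, `fetch_jobs_for_preferences` keeps only the first occurrence of each `job_id`. The sketch below reproduces that de-duplication with the HTTP call replaced by a stub, so it runs offline. `collect_unique_jobs`, `FAKE_RESULTS`, and the ids in it are made up for illustration.

```python
from typing import Callable, Dict, Iterable, List


def collect_unique_jobs(
    titles: Iterable[str],
    fetch: Callable[[str], List[Dict[str, str]]],
) -> List[Dict[str, str]]:
    """Call fetch(title) for each title and keep the first job seen per id."""
    seen_ids = set()
    unique: List[Dict[str, str]] = []
    for title in titles:
        for job in fetch(title):
            if job["id"] in seen_ids:
                continue
            seen_ids.add(job["id"])
            unique.append(job)
    return unique


# Stub standing in for the Remotive request; ids and titles are invented.
FAKE_RESULTS = {
    "Data Engineer": [
        {"id": "1", "title": "Data Engineer"},
        {"id": "2", "title": "Senior Data Engineer"},
    ],
    "Analytics Engineer": [
        {"id": "2", "title": "Senior Data Engineer"},
        {"id": "3", "title": "Analytics Engineer"},
    ],
}

unique_jobs = collect_unique_jobs(FAKE_RESULTS, lambda title: FAKE_RESULTS[title])
unique_ids = [job["id"] for job in unique_jobs]
```

Passing the fetcher in as a parameter is a design choice for testability. The notebook calls `fetch_jobs_from_remotive` directly.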

## Matching and ranking agent

In [10]:
# -------------------------------------------------------------------
# Matching and ranking
# -------------------------------------------------------------------

def compute_match_score(profile: UserProfile, job: JobDescription, prefs: Preferences) -> MatchResult:
    """
    Compute a simple fit score based on skill overlap, title match and remote preference.
    """
    jd_text = (job.description + " " + (job.requirements or "")).lower()

    skill_hits = sum(1 for s in profile.skills if s.lower() in jd_text)
    base_score = skill_hits / max(len(profile.skills), 1)

    title_match = any(t.lower() in job.title.lower() for t in prefs.target_titles)
    if title_match:
        base_score += 0.2

    if prefs.remote_only and job.location and "remote" not in job.location.lower():
        base_score -= 0.3

    score = max(0.0, min(1.0, base_score))
    selected = score >= 0.3

    reason = (
        f"skills_matched={skill_hits}, "
        f"title_match={title_match}, "
        f"remote_only={prefs.remote_only}, "
        f"score={score:.2f}"
    )

    log_match(job.job_id, score, selected, reason)
    return MatchResult(job=job, score=score, reason=reason, selected=selected)


def rank_jobs(profile: UserProfile, prefs: Preferences, jobs: List[JobDescription]) -> List[MatchResult]:
    """Rank jobs in descending order of match score."""
    results = [compute_match_score(profile, job, prefs) for job in jobs]
    results.sort(key=lambda r: r.score, reverse=True)
    return results
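
`compute_match_score` counts a skill as matched when it appears anywhere in the job text as a substring. That is simple, but short skill names can match inside unrelated words. The sketch below contrasts it with a token-level variant; `matched_skills` is a hypothetical alternative, not the notebook's scoring, and the job text is invented.

```python
import re
from typing import Iterable, List


def matched_skills(skills: Iterable[str], text: str) -> List[str]:
    """Return skills that appear in text as whole tokens, case-insensitively."""
    hits = []
    for skill in skills:
        if not skill.strip():
            continue  # an empty skill would match everywhere
        # (?<!\w) and (?!\w) behave like word boundaries but also work for
        # skills that start or end with punctuation, such as "C++".
        pattern = r"(?<!\w)" + re.escape(skill) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(skill)
    return hits


jd = "We run MySQL and Golang daily; C++ is a plus."
skills = ["SQL", "Go", "C++", "R"]

# Substring test, as in compute_match_score
substring_hits = [s for s in skills if s.lower() in jd.lower()]
# Token-level test
token_hits = matched_skills(skills, jd)
```

Here the substring test reports all four skills ("SQL" inside "MySQL", "Go" inside "Golang", "R" inside "run"), while the token-level test keeps only "C++". Which behavior is preferable depends on how skills are phrased in the profile.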


## Tailoring and cover letter agents

In [11]:
# -------------------------------------------------------------------
# Tailoring and cover letters
# -------------------------------------------------------------------

def tailor_resume_for_job(profile: UserProfile, job: JobDescription) -> TailoredResult:
    """
    Produce a role-specific summary and a reordered skill list for a job.
    Skills must come from the candidate profile.
    """
    system_instruction = (
        "Tailor resume content for a specific job. "
        "Use only the candidate's existing skills and experience."
    )

    prompt = f"""
Return ONLY valid JSON:

{{
  "tailored_summary": "2-3 sentences targeting this job.",
  "ordered_skills": ["skill1", "skill2", "..."]  // subset + reorder of candidate skills
}}

JOB:
- Title: {job.title}
- Company: {job.company}
- Description: {job.description}
- Requirements: {job.requirements}

CANDIDATE_PROFILE:
{json.dumps(asdict(profile), indent=2)}
"""

    data = call_gemini_json(prompt, system_instruction)

    real_skills = set(profile.skills)
    ordered = [s for s in data.get("ordered_skills", []) if s in real_skills]
    for s in profile.skills:
        if s not in ordered:
            ordered.append(s)

    return TailoredResult(
        job_id=job.job_id,
        tailored_summary=data.get("tailored_summary", ""),
        ordered_skills=ordered,
    )


def generate_cover_letter(profile: UserProfile, job: JobDescription, tailored_summary: str) -> str:
    """
    Generate a short cover letter for a specific job using the candidate profile.
    """
    system_instruction = (
        "Write a concise, honest cover letter based only on the candidate profile."
    )

    prompt = f"""
JOB:
- Title: {job.title}
- Company: {job.company}
- Description: {job.description}

CANDIDATE_PROFILE:
{json.dumps(asdict(profile), indent=2)}

TAILORED_SUMMARY:
{tailored_summary}

Write the cover letter. Do not introduce skills or employers that are not in the profile.
"""

    model = genai.GenerativeModel(MODEL_NAME, system_instruction=system_instruction)
    response = model.generate_content(prompt)
    return response.text.strip()
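
The guard in `tailor_resume_for_job` that keeps the model from inventing skills is plain list logic: keep the model's ordering only for skills that exist in the profile, then append whatever it left out. A standalone sketch with made-up inputs follows. `merge_skill_order` is a hypothetical name, and unlike the notebook code it also drops duplicates in the model's list.

```python
from typing import List


def merge_skill_order(profile_skills: List[str], model_order: List[str]) -> List[str]:
    """Keep only real skills from the model's ordering, then append the rest."""
    real = set(profile_skills)
    ordered: List[str] = []
    for skill in model_order:
        # Skip invented skills and repeated entries
        if skill in real and skill not in ordered:
            ordered.append(skill)
    for skill in profile_skills:
        if skill not in ordered:
            ordered.append(skill)
    return ordered


profile_skills = ["Python", "SQL", "Airflow", "Docker"]
# "Kubernetes" is not in the profile; "Airflow" is repeated
model_order = ["Airflow", "Kubernetes", "Python", "Airflow"]
result = merge_skill_order(profile_skills, model_order)
```

The comparison is an exact string match, so a model reply of "python" would not match "Python" and the skill would simply fall back to its original position at the end.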


## PDF generation

In [12]:
# -------------------------------------------------------------------
# PDF generation
# -------------------------------------------------------------------

def build_tailored_resume_pdf(
    profile: UserProfile,
    tailoring: TailoredResult,
    pdf_path: str,
) -> None:
    """Render a compact, ATS-friendly resume PDF."""
    pdf = FPDF()
    pdf.set_auto_page_break(auto=True, margin=15)
    pdf.add_page()

    # Name
    pdf.set_font("Helvetica", "B", 16)
    pdf.cell(0, 10, sanitize_pdf_text(profile.name), ln=1)

    # Summary
    pdf.set_font("Helvetica", "B", 12)
    pdf.cell(0, 8, sanitize_pdf_text("Summary"), ln=1)
    pdf.set_font("Helvetica", "", 11)
    for line in textwrap.wrap(tailoring.tailored_summary, width=90):
        pdf.cell(0, 6, sanitize_pdf_text(line), ln=1)
    pdf.ln(3)

    # Skills
    pdf.set_font("Helvetica", "B", 12)
    pdf.cell(0, 8, sanitize_pdf_text("Skills"), ln=1)
    pdf.set_font("Helvetica", "", 11)
    skills_line = ", ".join(tailoring.ordered_skills)
    for line in textwrap.wrap(skills_line, width=90):
        pdf.cell(0, 6, sanitize_pdf_text(line), ln=1)
    pdf.ln(3)

    # Experience
    if profile.experience:
        pdf.set_font("Helvetica", "B", 12)
        pdf.cell(0, 8, sanitize_pdf_text("Experience"), ln=1)
        pdf.set_font("Helvetica", "", 11)
        for exp in profile.experience:
            title_line = f"{exp.title} - {exp.company}"
            pdf.cell(0, 6, sanitize_pdf_text(title_line), ln=1)
            if exp.start_date or exp.end_date:
                date_line = f"{exp.start_date or ''} - {exp.end_date or 'Present'}"
                pdf.cell(0, 6, sanitize_pdf_text(date_line), ln=1)
            for b in exp.bullets:
                for line in textwrap.wrap("• " + b, width=90):
                    pdf.cell(0, 6, sanitize_pdf_text(line), ln=1)
            pdf.ln(2)

    # Projects
    if profile.projects:
        pdf.set_font("Helvetica", "B", 12)
        pdf.cell(0, 8, sanitize_pdf_text("Projects"), ln=1)
        pdf.set_font("Helvetica", "", 11)
        for proj in profile.projects:
            pdf.cell(0, 6, sanitize_pdf_text(proj.name), ln=1)
            if proj.description:
                for line in textwrap.wrap(proj.description, width=90):
                    pdf.cell(0, 6, sanitize_pdf_text(line), ln=1)
            for b in proj.bullets:
                for line in textwrap.wrap("• " + b, width=90):
                    pdf.cell(0, 6, sanitize_pdf_text(line), ln=1)
            pdf.ln(2)

    # Education
    if profile.education:
        pdf.set_font("Helvetica", "B", 12)
        pdf.cell(0, 8, sanitize_pdf_text("Education"), ln=1)
        pdf.set_font("Helvetica", "", 11)
        for edu in profile.education:
            for line in textwrap.wrap(edu, width=90):
                pdf.cell(0, 6, sanitize_pdf_text(line), ln=1)

    pdf.output(pdf_path)


def build_cover_letter_pdf(
    cover_letter_text: str,
    profile: UserProfile,
    job: JobDescription,
    pdf_path: str,
) -> None:
    """Render a cover letter PDF from plain text."""
    pdf = FPDF()
    pdf.set_auto_page_break(auto=True, margin=15)
    pdf.add_page()

    # Name
    pdf.set_font("Helvetica", "B", 14)
    pdf.cell(0, 10, sanitize_pdf_text(profile.name), ln=1)

    pdf.set_font("Helvetica", "", 11)
    pdf.ln(4)

    # Header line
    header = f"Cover Letter - {job.title} at {job.company}"
    for line in textwrap.wrap(header, width=90):
        pdf.cell(0, 6, sanitize_pdf_text(line), ln=1)

    pdf.ln(4)
    for line in cover_letter_text.splitlines():
        if not line.strip():
            pdf.ln(3)
            continue
        for wrapped in textwrap.wrap(line, width=90):
            pdf.cell(0, 6, sanitize_pdf_text(wrapped), ln=1)

    pdf.output(pdf_path)
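
Both builders wrap text to 90 characters and emit one cell per line, and in the cover letter a blank input line becomes a vertical gap. The line preparation can be sketched without any PDF library, as below. `layout_lines` is a hypothetical helper, and it strips non-Latin-1 characters before wrapping (the builders do it after) so that the width applies to what is actually printed.

```python
import textwrap
from typing import List


def layout_lines(text: str, width: int = 90) -> List[str]:
    """Split text into printable lines; an empty string marks a paragraph gap."""
    lines: List[str] = []
    for raw in text.splitlines():
        if not raw.strip():
            lines.append("")  # blank input line -> vertical gap
            continue
        # Drop characters the core PDF fonts cannot encode
        safe = raw.encode("latin-1", "ignore").decode("latin-1")
        lines.extend(textwrap.wrap(safe, width=width))
    return lines


letter = "Dear Hiring Team,\n\n" + "I build data pipelines. " * 6
lines = layout_lines(letter)
```

A renderer would then loop over `lines`, adding vertical space for each empty string and one cell for every other entry. The 90-character width is a character count, not a measured text width, so it is only an approximation for a proportional font.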


## Orchestrator

In [13]:
# -------------------------------------------------------------------
# Orchestrator
# -------------------------------------------------------------------

def run_intelliapply_pipeline(
    resume_text: str,
    preferences: Preferences,
    jobs: List[JobDescription],
    top_k: int = 5,
) -> List[ApplicationPackage]:
    """
    Execute the end-to-end flow:
    - build profile
    - rank jobs
    - for top_k jobs above a threshold: tailor content, generate cover letter
    - capture user decision and export PDFs when requested
    """
    profile = setup_profile_and_prefs(resume_text, preferences)
    ranked = rank_jobs(profile, preferences, jobs)
    candidates = [r for r in ranked if r.selected][:top_k]

    packages: List[ApplicationPackage] = []

    for match in candidates:
        job = match.job

        if any(h["job_id"] == job.job_id for h in APPLIED_JOBS_HISTORY):
            logger.info("job %s already handled in this session", job.job_id)
            continue

        tailoring = tailor_resume_for_job(profile, job)
        cover_letter = generate_cover_letter(profile, job, tailoring.tailored_summary)

        pkg = ApplicationPackage(
            job=job,
            match_score=match.score,
            match_reason=match.reason,
            tailored_summary=tailoring.tailored_summary,
            ordered_skills=tailoring.ordered_skills,
            cover_letter=cover_letter,
        )
        packages.append(pkg)

        print(f"\n=== {job.title} @ {job.company} ===")
        print(f"Location: {job.location}")
        print(f"Match score: {match.score:.2f}")
        print("Reason:", match.reason)
        print("\nTailored summary:\n", tailoring.tailored_summary)
        print("\nOrdered skills:", tailoring.ordered_skills)
        print("\nCover letter:\n", cover_letter)

        decision = input("\nApply to this job? (y/n/save): ").strip().lower()
        if decision not in ("y", "n", "save"):
            decision = "n"

        remember_application(job, decision, match.score)
        print("Decision recorded:", decision)

        if decision in ("y", "save"):
            safe_title = "".join(
                c for c in job.title if c.isalnum() or c in (" ", "_")
            ).strip().replace(" ", "_")
            base_name = f"{safe_title}_{job.job_id}"
            resume_pdf = f"{base_name}_resume.pdf"
            cover_pdf = f"{base_name}_cover_letter.pdf"

            build_tailored_resume_pdf(profile, tailoring, resume_pdf)
            build_cover_letter_pdf(cover_letter, profile, job, cover_pdf)

            print("Generated files:")
            print("  ", resume_pdf)
            print("  ", cover_pdf)

    return packages
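
The orchestrator derives file names from the job title by keeping alphanumerics, spaces, and underscores. Titles with punctuation then produce doubled underscores. A regex-based variant is sketched below; `safe_filename` is a hypothetical helper, the job id `12345` is invented, and the pattern is ASCII-only, so it is stricter than `str.isalnum()`.

```python
import re


def safe_filename(title: str, job_id: str, suffix: str) -> str:
    """Build a file name such as '<slug>_<job_id>_<suffix>.pdf'."""
    # Collapse every run of non-alphanumeric characters into one underscore
    slug = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_")
    return f"{slug or 'job'}_{job_id}_{suffix}.pdf"


resume_pdf = safe_filename("Senior Data Engineer (AWS & Python)", "12345", "resume")
cover_pdf = safe_filename("Senior Data Engineer (AWS & Python)", "12345", "cover_letter")
```

The `or 'job'` fallback covers titles that contain no ASCII letters or digits at all, which would otherwise yield a name starting with an underscore.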


## Main execution cell

In [14]:
# -------------------------------------------------------------------
# Main execution
# -------------------------------------------------------------------

resume_text = load_resume_text_from_upload()
print("\nExcerpt from extracted resume text:\n")
print(resume_text[:500])

preferences = extract_preferences_from_resume_text(resume_text)
print("\nInferred preferences:\n", preferences)

jobs = fetch_jobs_for_preferences(preferences, per_title=10)
print(f"\nFetched {len(jobs)} jobs for titles {preferences.target_titles}")
for j in jobs[:5]:
    print(f"- {j.title} @ {j.company} ({j.location})")

packages = run_intelliapply_pipeline(
    resume_text=resume_text,
    preferences=preferences,
    jobs=jobs,
    top_k=5,
)

print("\nApplication history:")
for entry in APPLIED_JOBS_HISTORY:
    print(entry)


Upload your resume file (PDF, DOCX, or TXT).


Saving test_resume.txt to test_resume (2).txt
Loaded file: test_resume (2).txt

Excerpt from extracted resume text:

John Doe  
(555) 987-6543 | john.doe.data@testmail.com  
LinkedIn: linkedin.com/in/johndoe-data  
GitHub: github.com/johndoe  

PROFESSIONAL SUMMARY  
Results-driven Data Engineer with 4+ years of experience building scalable data pipelines, cloud data solutions, and analytics platforms. Strong background in distributed computing, ETL orchestration, and enterprise data modeling. Skilled in Python, SQL, Spark, Azure, AWS, and Databricks. Experienced in delivering end-to-end data systems acr

Inferred preferences:
 Preferences(target_titles=['Data Engineer', 'Data Scientist', 'Analytics Engineer'], locations=['United States', 'Remote'], remote_only=False, min_salary=None, preferred_companies=[])

Fetched 4 jobs for titles ['Data Engineer', 'Data Scientist', 'Analytics Engineer']
- Tech Lead Databricks Data Engineer @ Mitre Media (USA, Canada, USA timezones)
- Senior 


=== Senior Data Engineer (AWS & Python) @ Proxify ===
Location: CET +/- 3 HOURS
Match score: 0.36
Reason: skills_matched=7, title_match=True, remote_only=False, score=0.36

Tailored summary:
 Results-driven Data Engineer with 4+ years of professional experience specializing in modern, cloud-native data platforms, with a strong focus on Amazon Web Services (AWS) and Python. Possessing deep expertise in designing, building, and optimizing highly scalable ETL/ELT pipelines and data warehouses using AWS S3, Glue, EMR, Redshift, and PySpark to power analytics and business intelligence. Adept at advanced SQL for complex query writing and optimization, coupled with a solid understanding of containerization and orchestration using Docker and Kubernetes.

Ordered skills: ['Python', 'AWS S3', 'AWS Glue', 'AWS EMR', 'AWS Redshift', 'SQL', 'PySpark', 'Apache Spark', 'Airflow', 'Docker', 'Kubernetes', 'AWS Lambda', 'AWS CloudWatch', 'SparkSQL', 'PostgreSQL', 'MySQL', 'DBT', 'Azure Databricks', 'Gi

## Simple evaluation (agent score vs human labels)

In [16]:
# -------------------------------------------------------------------
# Simple evaluation: agent score vs human labels
# -------------------------------------------------------------------

eval_jobs = [
    JobDescription(
        job_id="e1",
        title="ML Intern",
        company="EvalCo",
        location="Remote",
        salary_estimate=None,
        description="Data or ML intern with Python and Power BI experience.",
        requirements="Python, deep learning, Power BI",
    ),
    JobDescription(
        job_id="e2",
        title="Sales Intern",
        company="NonTechCo",
        location="Onsite",
        salary_estimate=None,
        description="Sales role with no programming or data responsibilities.",
        requirements="communication, sales",
    ),
]

# Ground-truth labels for evaluation
human_labels = {
    "e1": "good_fit",
    "e2": "poor_fit",
}

# Use the existing profile and preferences from the main run
profile_for_eval = USER_PROFILE_STORE or parse_resume_to_profile(resume_text)
results_eval = rank_jobs(profile_for_eval, preferences, eval_jobs)

print("\nEvaluation (agent score vs human label):")
for r in results_eval:
    print(
        f"Job {r.job.job_id} ({r.job.title}): "
        f"score={r.score:.2f}, label={human_labels[r.job.job_id]}"
    )


Evaluation (agent score vs human label):
Job e1 (ML Intern): score=0.07, label=good_fit
Job e2 (Sales Intern): score=0.02, label=poor_fit
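
The printed scores rank the two jobs in the expected order, but the pipeline selects jobs with `score >= 0.3`, and both scores fall below that. The sketch below turns the scores into predicted labels at that threshold and compares them with the human labels. The numbers are copied from the output above; mapping `selected` to `good_fit` is an assumption made for this comparison.

```python
THRESHOLD = 0.3  # same cut-off compute_match_score uses for `selected`

# Scores and labels copied from the evaluation output above
scores = {"e1": 0.07, "e2": 0.02}
human_labels = {"e1": "good_fit", "e2": "poor_fit"}

# Assumption: a selected job counts as "good_fit", anything else as "poor_fit"
predicted = {
    job_id: "good_fit" if score >= THRESHOLD else "poor_fit"
    for job_id, score in scores.items()
}
correct = sum(predicted[job_id] == human_labels[job_id] for job_id in scores)
accuracy = correct / len(scores)

print(predicted, f"accuracy={accuracy:.2f}")
```

One likely contributor to the low absolute scores is that the skill term divides matched skills by the total number of skills in the profile, so a long skill list pushes every score down. With only two labeled jobs, this is a sanity check rather than a meaningful accuracy estimate.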
