# Chapter 1: Designing the AI Agent's Tools

## Learning Objectives

By the end of this chapter, you will:
- Understand the concept of AI agents and tool-based problem solving
- Build specialized functions that an AI agent can use
- Create a resume skill extraction tool using PydanticAI
- Develop a job matching algorithm with scoring
- Learn best practices for designing AI agent tools

## Introduction to AI Agents and Tools

> **Instructor Cue:** Start with the big picture: "We're building the final piece of our workshop - an AI agent that can read a resume and find the best matching job. Ask the audience: What would such a system need to do?"

An AI agent is a system that can autonomously perform tasks by using available tools. Think of it like a smart assistant that can:

- **Understand** what you're asking for
- **Choose** the right tools for the job
- **Execute** those tools in the correct sequence
- **Combine** results to give you a final answer

### Our Goal: Resume-to-Job Matching Agent

Today we're building an agent that solves this problem:
1. **Input**: A resume (as text)
2. **Process**: Extract skills and match against job database
3. **Output**: The best matching job opportunity

> **Instructor Cue:** Draw this on a whiteboard or screen: Resume → Extract Skills → Score Jobs → Best Match. This visual helps students understand the workflow.

### Breaking Down the Problem

To solve this complex problem, we'll create two specialized tools:

1. **Skill Extraction Tool**: Reads resume text and identifies key technical skills
2. **Job Matching Tool**: Compares skills against job postings and calculates match scores

> **Instructor Cue:** Emphasize that breaking complex problems into smaller, manageable functions is a fundamental programming principle. Each tool has a single, clear responsibility.
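
Before writing the real tools, it can help to see the two-step shape with throwaway stand-ins. The sketch below is only an illustration: `extract_skills_stub`, `match_job_stub`, `run_pipeline`, and the sample jobs are hypothetical names and data, not the functions we build later in this chapter.

```python
from typing import Optional


def extract_skills_stub(resume_text: str) -> list[str]:
    """Stand-in for the skill extraction tool: look for a few known keywords."""
    known = ["python", "sql", "docker"]
    text = resume_text.lower()
    return [skill for skill in known if skill in text]


def match_job_stub(skills: list[str], jobs: list[dict[str, str]]) -> Optional[dict[str, str]]:
    """Stand-in for the job matching tool: pick the job mentioning the most skills."""

    def hits(job: dict[str, str]) -> int:
        return sum(skill in job["description"].lower() for skill in skills)

    best = max(jobs, key=hits, default=None)
    # Report "no match" when nothing overlaps at all
    return best if best is not None and hits(best) > 0 else None


def run_pipeline(resume_text: str, jobs: list[dict[str, str]]) -> Optional[dict[str, str]]:
    """Resume -> Extract Skills -> Score Jobs -> Best Match."""
    skills = extract_skills_stub(resume_text)
    return match_job_stub(skills, jobs)


# Made-up sample data for the demonstration
jobs = [
    {"title": "Data Engineer", "description": "Python, SQL and Docker pipelines"},
    {"title": "Office Manager", "description": "Scheduling and vendor management"},
]
best = run_pipeline("Built ETL jobs in Python and SQL.", jobs)
print(best["title"])  # Data Engineer
```

Each function can be tested and replaced on its own, which is exactly why we keep the two responsibilities separate.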

## Setting Up Our Tools Framework

### Installing Required Dependencies

Before we start coding, let's make sure we have all the necessary packages installed:

> **Instructor Cue:** Have everyone run these installation commands first. Explain that we're using Google's latest Gemini 2.5 Flash model which is free and has generous usage limits. We use `uv` for fast package management.

In [None]:
%load_ext dotenv
%dotenv

In [None]:
%%writefile app/tools.py

import os
from typing import Any

import pandas as pd
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google import GoogleProvider

# Set up the AI model for our tools
# Note: You'll need to set your Google API key as an environment variable
# Using gemini-2.5-flash - Google's latest, fastest model with excellent reasoning
provider = GoogleProvider(api_key=os.getenv("GOOGLE_API_KEY"))
model = GoogleModel('gemini-2.5-flash', provider=provider)

# Create our agent instance
agent = Agent(model=model)

> **Instructor Cue:** Explain environment variables and API key setup. Show how to set GOOGLE_API_KEY in the terminal or use a .env file. You can get your API key from https://aistudio.google.com/app/apikey.

**Why Google Gemini 2.5 Flash?**
- **Free tier**: 15 requests per minute, 1500 requests per day at the time of writing (check Google's current rate limits) - generous for learning
- **Latest model**: Gemini 2.5 Flash is Google's newest, fastest model with improved reasoning
- **Excellent text analysis**: Specifically strong at extracting structured information from unstructured text
- **Enhanced performance**: Better accuracy and speed compared to previous versions
- **Easy integration**: Works seamlessly with PydanticAI framework
- **No credit card required**: Perfect for workshops and learning environments

> **Instructor Cue:** Emphasize security - never hardcode API keys! Always use environment variables or secure secret management.
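
A missing key is easier to diagnose when the code fails early with a clear message instead of failing later inside an API call. The helper below is a minimal sketch; `require_api_key` is a hypothetical name and is not part of `app/tools.py`.

```python
import os


def require_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Return the key from the environment, or raise with a helpful message."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell or add it to a .env file."
        )
    return key


# Demonstration with a throwaway variable so no real key is involved
os.environ["DEMO_API_KEY"] = "abc123"
print(require_api_key("DEMO_API_KEY"))  # abc123
```

If you adopt something like this, you could pass `require_api_key()` to `GoogleProvider(api_key=...)` in place of `os.getenv("GOOGLE_API_KEY")`.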

## Tool 1: Skill Extraction from Resume

Our first tool will use AI to extract technical skills from resume text. This is perfect for AI because it requires understanding context and identifying relevant information.

In [None]:
%%writefile -a app/tools.py

# <START> EXTRACT SKILLS

def extract_skills_from_resume(resume_text: str, top_n: int = 10) -> list[str]:
    """
    Extract the top technical skills from a resume using AI.

    Args:
        resume_text (str): The full text content of a resume
        top_n (int): Maximum number of skills to return

    Returns:
        list[str]: A list of the most important technical skills found
    """

    # Create a focused prompt for skill extraction
    prompt = f"""
    Please analyze this resume and extract the top {top_n} most important technical skills.
    Focus on:
    - Programming languages (Python, JavaScript, etc.)
    - Frameworks and libraries (React, Django, etc.)
    - Tools and technologies (Git, Docker, AWS, etc.)
    - Data analysis tools (Pandas, SQL, etc.)
    - Any other technical competencies

    Return only the skill names, one per line, without explanations.
    Be specific (e.g., "Python" not "programming languages").

    Resume text:
    {resume_text}
    """

    # Use the agent to process the prompt
    result = agent.run_sync(prompt)
    skills_text = result.output

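    # Split by lines, strip bullet characters, then drop empty or one-character entries
    cleaned = (
        line.strip().strip('-•*').strip()
        for line in skills_text.split('\n')
    )
    skills = [skill for skill in cleaned if len(skill) > 1]

    # Remove duplicates (case-insensitively) while preserving order.
    # A set would lose the model's ranking; dict keys keep insertion order.
    unique_skills = list(dict.fromkeys(skill.lower() for skill in skills))

    return unique_skills[:top_n]  # Return top N skills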

# <END> EXTRACT SKILLS

> **Instructor Cue:** Walk through this function step by step. Explain the prompt engineering - how we give specific instructions to get better results. Point out that the function has no error handling of its own: a failed API call raises an exception, which we wrap later in `safe_extract_skills()`. This is real-world code!
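
The parsing step can be tried without calling the model at all. Below is a minimal sketch of line-based parsing with order-preserving de-duplication; `parse_skill_lines` and the sample text are hypothetical and only mimic what a model reply might look like.

```python
def parse_skill_lines(raw_output: str, top_n: int = 10) -> list[str]:
    """Turn a one-skill-per-line reply into a clean, de-duplicated list."""
    cleaned = (
        line.strip().strip("-•*").strip().lower()
        for line in raw_output.splitlines()
    )
    # dict.fromkeys keeps the first occurrence of each skill, in order
    unique = dict.fromkeys(skill for skill in cleaned if len(skill) > 1)
    return list(unique)[:top_n]


sample_reply = "- Python\n* SQL\n• Docker\n\npython\n- Git"
print(parse_skill_lines(sample_reply, top_n=3))  # ['python', 'sql', 'docker']
```

Note that the length filter also drops one-letter skills such as "C" or "R", so you may want to relax it if those matter for your resumes.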

## Testing the Skill Extraction Tool

Let's test our skill extraction with the sample resume:

In [None]:
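from pathlib import Path

import nest_asyncio

# %%writefile only saves the cell to disk, so import the function before using it
from app.tools import extract_skills_from_resume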

nest_asyncio.apply()


def test_skill_extraction():
    """Test function to verify our skill extraction works"""

    sample_resume = Path("data/sample_resume.txt").read_text()

    print("Sample Resume Text:")
    print("=" * 50)
    print(sample_resume[:300] + "..." if len(sample_resume) > 300 else sample_resume)
    print("\n" + "=" * 50)

    # Extract skills
    print("Extracting skills...")
    skills = extract_skills_from_resume(sample_resume, 10)

    print(f"\nExtracted Skills ({len(skills)}):")
    for i, skill in enumerate(skills, 1):
        print(f"{i}. {skill}")


# test_skill_extraction()

> **Instructor Cue:** Run this test function live to demonstrate the skill extraction. If API calls fail, use this as a teaching moment about error handling and fallback mechanisms.

## Tool 2: Job Matching Algorithm

Our second tool will score job postings against extracted skills to find the best matches.

In [None]:
%%writefile -a app/tools.py

# <START> JOB MATCHING

def find_best_job_match(skills: list[str], jobs_df: pd.DataFrame) -> dict[str, Any]:
    """
    Find the best matching job based on extracted skills.

    Args:
        skills (list[str]): List of skills extracted from resume
        jobs_df (pd.DataFrame): DataFrame containing job postings

    Returns:
        dict[str, Any]: Information about the best matching job
    """

    if jobs_df.empty or not skills:
        return {
            "error": "No jobs available or no skills provided",
            "job_title": "No match found",
            "company_name": "N/A",
            "location": "N/A",
            "match_score": 0,
            "matched_skills": []
        }

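    # Normalize to lowercase so matching also works for skills that did not come
    # from extract_skills_from_resume (which already lowercases its output)
    skills = [skill.lower() for skill in skills]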

    # Calculate match scores for each job
    job_scores = []

    for index, job in jobs_df.iterrows():
        score_info = calculate_job_score(job, skills)
        job_scores.append({
            'index': index,
            'score': score_info['score'],
            'matched_skills': score_info['matched_skills'],
            'job_data': job
        })

    # Sort by score (highest first)
    job_scores.sort(key=lambda x: x['score'], reverse=True)

    if not job_scores or job_scores[0]['score'] == 0:
        return {
            "error": "No matching jobs found",
            "job_title": "No suitable matches",
            "company_name": "N/A",
            "location": "N/A",
            "match_score": 0,
            "matched_skills": []
        }

    # Get the best match
    best_match = job_scores[0]
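    job_data = best_match['job_data']

    # A description can be missing (NaN) in the CSV, so guard before slicing
    description = job_data.get('job_description', '')
    if not isinstance(description, str):
        description = ''
    preview = description[:300] + ("..." if len(description) > 300 else "")

    return {
        "job_title": job_data['job_title'],
        "company_name": job_data['company_name'],
        "location": job_data['location'],
        "salary": job_data.get('salary', 'Not specified'),
        "job_description": preview,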
        "match_score": best_match['score'],
        "matched_skills": best_match['matched_skills'],
        "total_jobs_analyzed": len(job_scores)
    }

def calculate_job_score(job: pd.Series, skills: list[str]) -> dict[str, Any]:
    """
    Calculate how well a job matches the given skills.

    Args:
        job (pd.Series): A single job posting
        skills (list[str]): Skills (already in lowercase)

    Returns:
        dict[str, Any]: Score and matched skills information
    """

    # Get job text for analysis (combine title and description)
    job_text = ""
    if pd.notna(job.get('job_title')):
        job_text += job['job_title'].lower() + " "

    if pd.notna(job.get('job_description')):
        job_text += job['job_description'].lower() + " "

    if not job_text.strip():
        return {"score": 0, "matched_skills": []}

    matched_skills = []
    score = 0

    # Check each skill against job text
    for skill in skills:
        if skill in job_text:
            matched_skills.append(skill)

            # Weight skills differently based on importance
            if len(skill) <= 3:  # Short skills like "SQL", "AWS"
                score += 2
            elif skill in ['python', 'javascript', 'java', 'react', 'django']:
                score += 3  # High-value skills
            else:
                score += 1  # Standard match

    # Bonus for multiple skill matches (bonuses stack: 5+ matches earn +2 and +3)
    if len(matched_skills) >= 3:
        score += 2
    if len(matched_skills) >= 5:
        score += 3

    return {
        "score": score,
        "matched_skills": matched_skills
    }

# <END> JOB MATCHING

> **Instructor Cue:** Explain the scoring algorithm. Why do we weight some skills more heavily? How does the bonus system work? This teaches algorithmic thinking and business logic implementation.
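
To trace the arithmetic by hand, here is a small standalone sketch that applies the same weights and bonuses to a plain string instead of a DataFrame row. It also adds an optional whole-word mode, which is an assumption of this sketch and not something `calculate_job_score` does: plain substring matching counts "java" as present whenever "javascript" appears. The names `skill_weight` and `score_text` and the sample posting are hypothetical.

```python
import re

HIGH_VALUE = {"python", "javascript", "java", "react", "django"}


def skill_weight(skill: str) -> int:
    """Weights as in calculate_job_score: short = 2, high-value = 3, otherwise 1."""
    if len(skill) <= 3:
        return 2
    if skill in HIGH_VALUE:
        return 3
    return 1


def score_text(job_text: str, skills: list[str], whole_word: bool = False) -> tuple[int, list[str]]:
    """Score lowercase skills against a job text, optionally matching whole words only."""
    text = job_text.lower()
    matched = []
    for skill in skills:
        if whole_word:
            # Lookarounds instead of \b so skills like "c++" still work
            hit = re.search(rf"(?<!\w){re.escape(skill)}(?!\w)", text) is not None
        else:
            hit = skill in text
        if hit:
            matched.append(skill)

    score = sum(skill_weight(skill) for skill in matched)
    if len(matched) >= 3:
        score += 2
    if len(matched) >= 5:
        score += 3
    return score, matched


job = "Senior JavaScript developer: React, SQL, and Git required."
skills = ["java", "javascript", "react", "sql", "docker"]

print(score_text(job, skills))                   # (13, ['java', 'javascript', 'react', 'sql'])
print(score_text(job, skills, whole_word=True))  # (10, ['javascript', 'react', 'sql'])
```

In the substring run, "java" scores 3 points only because it sits inside "javascript": 3 + 3 + 3 + 2 = 11, plus the 3-match bonus of 2 gives 13. With whole-word matching the false hit disappears: 3 + 3 + 2 = 8, plus 2 gives 10.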

## Testing the Complete Tool Chain

Let's create a comprehensive test that shows both tools working together:

In [None]:
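import importlib

import pandas as pd

import app.tools

# Reload so functions appended to app/tools.py since the first import are picked up
importlib.reload(app.tools)
from app.tools import extract_skills_from_resume, find_best_job_match


def test_complete_workflow():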
    """Test the complete resume-to-job matching workflow"""

    print("=== TESTING COMPLETE AI AGENT WORKFLOW ===\n")

    # Step 1: Load sample resume
    try:
        with open("data/sample_resume.txt", "r") as file:
            resume_text = file.read()
        print("✅ Sample resume loaded successfully")
    except FileNotFoundError:
        print("❌ Sample resume file not found")
        return

    # Step 2: Load job data
    try:
        jobs_df = pd.read_csv("data/indeed_jobs_combined.csv")
        print(f"✅ Job database loaded: {len(jobs_df)} jobs available")
    except FileNotFoundError:
        print("❌ Job data file not found")
        return

    # Step 3: Extract skills from resume
    print("\n--- STEP 1: SKILL EXTRACTION ---")
    skills = extract_skills_from_resume(resume_text)
    print(f"Extracted {len(skills)} skills:")
    for i, skill in enumerate(skills, 1):
        print(f"  {i}. {skill}")

    # Step 4: Find best job match
    print("\n--- STEP 2: JOB MATCHING ---")
    best_match = find_best_job_match(skills, jobs_df)

    if "error" in best_match:
        print(f"❌ {best_match['error']}")
        return

    # Step 5: Display results
    print("\n--- FINAL RESULTS ---")
    print("🎯 Best Job Match Found!")
    print(f"📋 Position: {best_match['job_title']}")
    print(f"🏢 Company: {best_match['company_name']}")
    print(f"📍 Location: {best_match['location']}")
    print(f"💰 Salary: {best_match['salary']}")
    print(f"⭐ Match Score: {best_match['match_score']}")
    print(f"🔍 Jobs Analyzed: {best_match['total_jobs_analyzed']}")

    print(f"\n🎯 Matched Skills ({len(best_match['matched_skills'])}):")
    for skill in best_match["matched_skills"]:
        print(f"  ✓ {skill}")

    print("\n📝 Job Description Preview:")
    print(best_match["job_description"])

    return best_match


# Uncomment to test the complete workflow:
# test_complete_workflow()

> **Instructor Cue:** Run this test live if possible. If not, walk through what each step would produce. This shows the complete workflow from input to output.

## Enhancing Our Tools

Let's add some improvements to make our tools more robust and useful:

In [None]:
%%writefile -a app/tools.py

# <START> ADDITIONAL TOOLS

def get_skill_statistics(skills: list[str], jobs_df: pd.DataFrame) -> dict[str, Any]:
    """
    Analyze how common each skill is in the job market.

    Args:
        skills (list[str]): List of skills to analyze
        jobs_df (pd.DataFrame): Job postings database

    Returns:
        dict[str, Any]: Statistics about skill demand
    """

    # Build the lowercase searchable text for each job once, then reuse it per skill
    job_texts = []
    for _, job in jobs_df.iterrows():
        job_text = ""
        if pd.notna(job.get('job_description')):
            job_text += job['job_description'].lower() + " "
        if pd.notna(job.get('job_title')):
            job_text += job['job_title'].lower() + " "
        job_texts.append(job_text)

    skill_stats = {}
    total_jobs = len(jobs_df)

    for skill in skills:
        skill_lower = skill.lower()

        # Count jobs mentioning this skill
        job_count = sum(1 for job_text in job_texts if skill_lower in job_text)

        skill_stats[skill] = {
            'jobs_mentioning': job_count,
            'percentage': round((job_count / total_jobs) * 100, 1) if total_jobs > 0 else 0,
            'demand_level': get_demand_level(job_count, total_jobs)
        }

    return skill_stats

def get_demand_level(job_count: int, total_jobs: int) -> str:
    """Categorize skill demand level"""
    if total_jobs == 0:
        return "Unknown"

    percentage = (job_count / total_jobs) * 100

    if percentage >= 50:
        return "Very High"
    elif percentage >= 25:
        return "High"
    elif percentage >= 10:
        return "Medium"
    elif percentage >= 5:
        return "Low"
    else:
        return "Very Low"

def find_alternative_matches(skills: list[str], jobs_df: pd.DataFrame, top_n: int = 5) -> list[dict[str, Any]]:
    """
    Find multiple good job matches, not just the best one.

    Args:
        skills (list[str]): List of skills from resume
        jobs_df (pd.DataFrame): Job postings database
        top_n (int): Number of top matches to return

    Returns:
        list[dict[str, Any]]: Top N matching jobs
    """

    if jobs_df.empty or not skills:
        return []

    skills_lower = [skill.lower() for skill in skills]
    job_scores = []

    # Calculate scores for all jobs
    for index, job in jobs_df.iterrows():
        score_info = calculate_job_score(job, skills_lower)

        if score_info['score'] > 0:  # Only include jobs with some match
            job_scores.append({
                'job_title': job['job_title'],
                'company_name': job['company_name'],
                'location': job['location'],
                'salary': job.get('salary', 'Not specified'),
                'match_score': score_info['score'],
                'matched_skills': score_info['matched_skills']
            })

    # Sort and return top N
    job_scores.sort(key=lambda x: x['match_score'], reverse=True)
    return job_scores[:top_n]

# <END> ADDITIONAL TOOLS

> **Instructor Cue:** Explain how these enhancement functions provide more insight. The skill statistics help understand market demand, and alternative matches give users more options.
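
Once you have the statistics, a natural next step is to rank skills by demand. The sketch below assumes a dictionary shaped like the return value of `get_skill_statistics()`; the numbers are made up for illustration (as if 80 jobs had been analyzed), and `rank_by_demand` is a hypothetical helper.

```python
# Made-up sample in the shape returned by get_skill_statistics()
skill_stats = {
    "python": {"jobs_mentioning": 48, "percentage": 60.0, "demand_level": "Very High"},
    "docker": {"jobs_mentioning": 10, "percentage": 12.5, "demand_level": "Medium"},
    "sql": {"jobs_mentioning": 24, "percentage": 30.0, "demand_level": "High"},
}


def rank_by_demand(stats: dict[str, dict]) -> list[tuple[str, dict]]:
    """Return (skill, info) pairs sorted from most to least requested."""
    return sorted(stats.items(), key=lambda item: item[1]["percentage"], reverse=True)


for skill, info in rank_by_demand(skill_stats):
    print(f"{skill:<8} {info['percentage']:>5.1f}%  {info['demand_level']}")
```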

## Error Handling and Edge Cases

Good tools must handle various edge cases gracefully:

In [None]:
%%writefile -a app/tools.py

# <START> Error Handling

def validate_resume_text(resume_text: str) -> dict[str, Any]:
    """
    Validate that resume text is suitable for processing.

    Args:
        resume_text (str): Resume text to validate

    Returns:
        dict[str, Any]: Validation results and suggestions
    """

    issues = []
    warnings = []

    # Check basic requirements
    if not resume_text or not resume_text.strip():
        issues.append("Resume text is empty")
        return {"valid": False, "issues": issues, "warnings": warnings}

    # Check length
    word_count = len(resume_text.split())
    if word_count < 50:
        warnings.append("Resume seems quite short - consider adding more detail")
    elif word_count > 2000:
        warnings.append("Resume is very long - extraction might focus on early sections")

    # Check for common sections
    resume_lower = resume_text.lower()
    expected_sections = ['experience', 'skills', 'education', 'work']
    found_sections = [section for section in expected_sections if section in resume_lower]

    if len(found_sections) < 2:
        warnings.append("Resume might be missing common sections (experience, skills, education)")

    # Check for technical content
    tech_indicators = ['python', 'javascript', 'programming', 'software', 'development', 'technical']
    tech_mentions = sum(1 for indicator in tech_indicators if indicator in resume_lower)

    if tech_mentions == 0:
        warnings.append("No technical skills detected - results may be limited")

    return {
        "valid": True,
        "word_count": word_count,
        "sections_found": found_sections,
        "tech_indicators": tech_mentions,
        "issues": issues,
        "warnings": warnings
    }

def safe_extract_skills(resume_text: str) -> dict[str, Any]:
    """
    Safely extract skills with comprehensive error handling.

    Args:
        resume_text (str): Resume text to process

    Returns:
        dict[str, Any]: Results including skills and any issues
    """

    # Validate input
    validation = validate_resume_text(resume_text)

    if not validation["valid"]:
        return {
            "success": False,
            "skills": [],
            "issues": validation["issues"],
            "warnings": validation["warnings"]
        }

    try:
        # Attempt skill extraction
        skills = extract_skills_from_resume(resume_text)

        return {
            "success": True,
            "skills": skills,
            "skill_count": len(skills),
            "word_count": validation["word_count"],
            "issues": validation["issues"],
            "warnings": validation["warnings"]
        }

    except Exception as e:
        return {
            "success": False,
            "skills": [],
            "error": str(e),
            "issues": validation["issues"] + [f"Processing error: {str(e)}"],
            "warnings": validation["warnings"]
        }

# <END> Error Handling

> **Instructor Cue:** Emphasize that production code needs robust error handling. These functions make our tools more reliable and provide helpful feedback to users.
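
Catching an exception is one option; for temporary failures such as rate limits, trying again after a pause is often enough. The following is a minimal retry sketch with exponential backoff. `call_with_retries` is a hypothetical helper, not part of PydanticAI or `app/tools.py`, and the flaky function only simulates an overloaded API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Call func(); after a failure wait base_delay * 2**attempt, then try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of attempts: let the caller see the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Simulated API that fails twice and then succeeds
calls = {"count": 0}


def flaky_extract() -> list[str]:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated overload")
    return ["python", "sql"]


delays: list[float] = []
result = call_with_retries(flaky_extract, attempts=3, base_delay=0.5, sleep=delays.append)
print(result, delays)  # ['python', 'sql'] [0.5, 1.0]
```

In the notebook you could wrap the real call as `call_with_retries(lambda: extract_skills_from_resume(resume_text))`. The `sleep` parameter exists only so the demonstration runs without actually waiting.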

## Exercise: Test and Customize Your Tools

> **Instructor Cue:** Give participants 15 minutes to test and experiment with the tools:

Try these exercises to explore your tools:

1. **Test with Different Resume Content:**
   - Modify the sample resume
   - Try a resume focused on a different field
   - Test with very short or very long text

2. **Experiment with Scoring:**
   - Adjust the scoring weights in `calculate_job_score()`
   - Add bonus points for specific skills
   - Try different matching strategies

3. **Analyze Skill Demand:**
   - Use `get_skill_statistics()` to see which skills are most in demand
   - Compare different skill sets

4. **Test Edge Cases:**
   - Empty resume text
   - Resume with no technical skills
   - Very technical resume

## Key Concepts Learned

> **Instructor Cue:** Summarize the important concepts from this chapter:

1. **Tool-Based Problem Solving**: Breaking complex tasks into specialized functions
2. **AI Integration**: Using language models for text analysis and extraction
3. **Scoring Algorithms**: Quantifying matches between different data sets
4. **Error Handling**: Building robust tools that handle edge cases gracefully
5. **Validation**: Ensuring input data meets quality requirements

## Best Practices for AI Agent Tools

> **Instructor Cue:** Share these guidelines for building effective AI tools:

1. **Single Responsibility**: Each tool should do one thing well
2. **Clear Interfaces**: Well-defined inputs and outputs
3. **Error Resilience**: Graceful handling of failures with fallback options
4. **Testability**: Easy to test with different inputs
5. **Documentation**: Clear explanations of what each tool does
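
To make "fallback options" concrete, here is one possible shape for it: try the AI extractor first and drop to a cheap keyword scan if it raises or returns nothing. This is a sketch under stated assumptions, not code from `app/tools.py`; `KNOWN_SKILLS`, `keyword_fallback`, `extract_with_fallback`, and `failing_extractor` are all hypothetical.

```python
import re
from typing import Callable

# Hypothetical keyword list; extend it for your own job market
KNOWN_SKILLS = ("python", "sql", "docker", "git", "aws")


def keyword_fallback(resume_text: str) -> list[str]:
    """Offline extraction: known skills that appear as whole tokens in the text."""
    tokens = set(re.findall(r"[a-z0-9+#]+", resume_text.lower()))
    return [skill for skill in KNOWN_SKILLS if skill in tokens]


def extract_with_fallback(
    resume_text: str, primary: Callable[[str], list[str]]
) -> list[str]:
    """Use the primary extractor; fall back to keywords if it raises or finds nothing."""
    try:
        skills = primary(resume_text)
    except Exception:
        skills = []
    return skills or keyword_fallback(resume_text)


def failing_extractor(resume_text: str) -> list[str]:
    """Simulates an API call that times out."""
    raise TimeoutError("simulated API failure")


print(extract_with_fallback("Python and SQL, deployed with Docker.", failing_extractor))
# ['python', 'sql', 'docker']
```

The fallback is far less capable than the model, but it keeps the workflow usable during an outage and is easy to test.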

## Troubleshooting Common Issues

> **Instructor Cue:** Keep this section handy for addressing problems:

**API Connection Issues:**
- Check Google API key is set correctly
- Verify internet connection
- Call `safe_extract_skills()` instead of `extract_skills_from_resume()` to get an error message rather than an exception

**Skill Extraction Problems:**
- Resume text might be too short or unstructured
- AI model might be overloaded (wait a few seconds and retry)
- Check for special characters or encoding issues

**Job Matching Issues:**
- Verify job data is loaded correctly
- Check that skills list is not empty
- Ensure job descriptions contain meaningful text

**Performance Issues:**
- Large datasets might be slow to process
- Consider sampling jobs for faster testing
- Cache results when possible
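
Caching pays off most for the skill extraction step, because every call costs an API request. Below is a minimal sketch using `functools.lru_cache` from the standard library. `slow_extract` is a hypothetical stand-in for the real AI call, and the counter exists only to show that the second call never reaches it.

```python
from functools import lru_cache

calls = {"count": 0}


def slow_extract(resume_text: str) -> list[str]:
    """Stand-in for an expensive API call; counts how often it actually runs."""
    calls["count"] += 1
    text = resume_text.lower()
    return [skill for skill in ("python", "sql") if skill in text]


@lru_cache(maxsize=128)
def cached_extract(resume_text: str) -> tuple[str, ...]:
    """Cache by resume text; return a tuple so cached results cannot be mutated."""
    return tuple(slow_extract(resume_text))


first = cached_extract("Python and SQL")
second = cached_extract("Python and SQL")  # Served from the cache
print(first, calls["count"])  # ('python', 'sql') 1
```

The cache lives in memory for the current session only, which is usually what you want while experimenting in a notebook.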

## Next Steps

> **Instructor Cue:** Build excitement for the final chapter:

In the next chapter, we'll integrate these tools into a beautiful Streamlit interface that brings everything together:

- **Interactive Resume Input**: Users can paste their resume directly
- **Real-time Skill Extraction**: See skills extracted instantly
- **Dynamic Job Matching**: Find matches with visual feedback
- **Professional Results Display**: Beautiful presentation of results
- **Enhanced Features**: Skill demand analysis and alternative matches

Our tools are the engine - next we'll build the user interface that makes them accessible to everyone!

> **Instructor Cue:** Remind students to save their complete tools.py file - they'll need it for the next chapter!