# 🚀 HR Resume Search using Assistant API File Search

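This notebook demonstrates how to use the File Search capability of the Azure OpenAI
Assistants API, through Semantic Kernel's `AzureAssistantAgent`, to answer questions about
a set of resumes. It writes two sample resumes to disk, attaches them to the assistant's
vector store, streams the assistant's answers, and prints the file citations (annotations)
returned with each answer.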

## Prerequisites

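- Azure OpenAI resource with Assistants API access and a deployed chat model
- Semantic Kernel 1.16.0 (the version pinned in the install cell below)
- Python 3.10+ (required by Semantic Kernel 1.x)
- Environment variables (or a `.env` file) defining `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`, and `AZURE_OPENAI_CHAT_COMPLETION_DEPLOYED_MODEL_NAME`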

## Setup

First, let's install the required packages:


In [None]:
!pip install semantic-kernel==1.16.0 python-dotenv aiofiles nest_asyncio azure-search-documents


## Import Dependencies and Configure Environment

Let's import all necessary libraries and set up our environment.

In [25]:
import asyncio
import os
from pathlib import Path
from typing import List
from dataclasses import dataclass

from semantic_kernel.kernel import Kernel
from semantic_kernel.agents.open_ai.azure_assistant_agent import AzureAssistantAgent
from semantic_kernel.contents.chat_message_content import ChatMessageContent
from semantic_kernel.contents.streaming_annotation_content import StreamingAnnotationContent
from semantic_kernel.contents.utils.author_role import AuthorRole
from dotenv import load_dotenv

# Enable notebook async support
import nest_asyncio
nest_asyncio.apply()

## Configuration Management

We'll create a configuration class to manage our Azure OpenAI settings.

In [34]:
import os
from dataclasses import dataclass
from dotenv import load_dotenv


@dataclass
class AzureConfig:
    """Configuration for Azure OpenAI Assistant."""

    api_key: str
    endpoint: str
    deployment_name: str
    api_version: str = "2024-10-01-preview"

    @classmethod
    def from_env(cls) -> "AzureConfig":
        """Load configuration from environment variables."""
        load_dotenv()

        # Get environment variables
        api_key = os.getenv("AZURE_OPENAI_API_KEY")
        endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
        deployment_name = os.getenv("AZURE_OPENAI_CHAT_COMPLETION_DEPLOYED_MODEL_NAME")
        api_version = os.getenv("AZURE_OPENAI_API_VERSION", "2024-10-01-preview")

        # Validate required variables
        if not all([api_key, endpoint, deployment_name]):
            missing = [
                var
                for var, val in {
                    "AZURE_OPENAI_API_KEY": api_key,
                    "AZURE_OPENAI_ENDPOINT": endpoint,
                    "AZURE_OPENAI_CHAT_COMPLETION_DEPLOYED_MODEL_NAME": deployment_name,
                }.items()
                if not val
            ]
            raise ValueError(
                f"Missing required environment variables: {', '.join(missing)}"
            )

        return cls(
            api_key=api_key,
            endpoint=endpoint,
            deployment_name=deployment_name,
            api_version=api_version,
        )
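
Before creating the assistant, it can help to see which settings are missing without raising an exception. The sketch below mirrors the validation in `from_env`. `missing_env_vars` is a hypothetical helper, not part of the notebook or of Semantic Kernel, and it accepts any mapping so it can be tried without touching the real environment.

```python
import os
from typing import List, Mapping, Optional, Sequence

# Variable names taken from AzureConfig.from_env above
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_CHAT_COMPLETION_DEPLOYED_MODEL_NAME",
)


def missing_env_vars(
    env: Optional[Mapping[str, str]] = None,
    required: Sequence[str] = REQUIRED_VARS,
) -> List[str]:
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the process environment
print(missing_env_vars({"AZURE_OPENAI_API_KEY": "placeholder-key"}))
```

Calling `missing_env_vars()` with no arguments checks the process environment, so it should be run after `load_dotenv()` if the values live in a `.env` file.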

## File Management

Set up utilities to manage resume files and their paths.

In [35]:
def get_filepath_for_filename(filename: str) -> str:
    """Get the full path for a given filename."""
    base_directory = Path.cwd() / "resumes"
    base_directory.mkdir(exist_ok=True)
    return str(base_directory / filename)


# Sample resume data
sample_resumes = {
    "john_doe.txt": """
    John Doe
    Senior Software Engineer

    Experience:
    - Lead Developer at TechCorp (2019-Present)
      * Led team of 5 developers on cloud migration project
      * Implemented MLOps pipeline reducing deployment time by 60%
      * Mentored junior developers and conducted code reviews
    
    - Senior Software Engineer at InnovSoft (2016-2019)
      * Developed machine learning models for predictive maintenance
      * Architected microservices infrastructure using Kubernetes
      * Improved system performance by 40%

    Skills:
    - Programming: Python, Java, Go
    - Cloud & DevOps: Kubernetes, Docker, AWS
    - Machine Learning: TensorFlow, PyTorch, MLOps
    - Leadership: Team Management, Mentoring
    
    Education:
    - M.S. Computer Science, Tech University (2016)
    - B.S. Computer Science, State University (2014)
    """,
    
    "jane_smith.txt": """
    Jane Smith
    AI Research Engineer

    Experience:
    - AI Research Lead at DataMinds (2020-Present)
      * Published 3 papers on NLP architectures
      * Developed novel attention mechanism improving accuracy by 25%
      * Led research team of 3 PhD candidates
    
    - ML Engineer at AITech (2018-2020)
      * Implemented computer vision models for autonomous systems
      * Reduced model inference time by 35%
      * Collaborated with cross-functional teams

    Skills:
    - Deep Learning: PyTorch, TensorFlow
    - NLP: Transformers, BERT, GPT
    - Research: Paper Writing, Experimentation
    - Languages: Python, C++
    
    Education:
    - PhD in Machine Learning, Tech Institute (2020)
    - M.S. AI, Data University (2018)
    """
}


# Save resumes to files
resume_files = []
for filename, content in sample_resumes.items():
    filepath = get_filepath_for_filename(filename)
    Path(filepath).write_text(content, encoding="utf-8")
    resume_files.append(filename)
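
The sample resumes are written as indented triple-quoted strings, so each saved file begins with a blank line and carries the leading indentation on its text lines. If cleaner text is preferred, one option is to dedent the content before writing it. This is a minimal sketch using only the standard library; `normalize_resume` is a hypothetical helper name.

```python
import textwrap


def normalize_resume(text: str) -> str:
    """Remove the common leading indentation and surrounding blank lines."""
    return textwrap.dedent(text).strip() + "\n"


sample = """
    John Doe
    Senior Software Engineer

    Skills:
    - Programming: Python, Java, Go
    """
print(normalize_resume(sample))
```

In the save loop above, this would amount to writing `normalize_resume(content)` instead of `content`.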

## Initialize Assistant

Create an Azure Assistant Agent with file search capability enabled.

In [36]:
async def create_hr_assistant() -> AzureAssistantAgent:
    """Create and configure the HR Assistant."""
    try:
        config = AzureConfig.from_env()
        kernel = Kernel()

        print("Initializing Assistant with:")
        print(f"- Deployment: {config.deployment_name}")
        print(f"- Endpoint: {config.endpoint}")
        print(f"- API Version: {config.api_version}")

        # Get file paths for resumes
        resume_paths = [get_filepath_for_filename(f) for f in resume_files]
        print(f"- Resume files: {resume_paths}")

        # Create the assistant with required configuration
        agent = await AzureAssistantAgent.create(
            kernel=kernel,
            deployment_name=config.deployment_name,
            endpoint=config.endpoint,
            api_key=config.api_key,
            api_version=config.api_version,
            name="HR_Resume_Analyzer",
            instructions="""
            You are an expert HR assistant specialized in analyzing resumes and providing 
            detailed candidate evaluations.
            
            Guidelines:
            1. Always analyze the resumes in the document store for your answers
            2. Provide specific evidence and quotes from the resumes
            3. Format responses using markdown for better readability
            4. Compare candidates objectively based on their documented experience
            5. Highlight quantifiable achievements and metrics
            6. Include relevant education and certification details
            """,
            enable_file_search=True,
            vector_store_filenames=resume_paths,
            ai_model_id=config.deployment_name,  # Required parameter
            metadata={
                "type": "hr_assistant",
                "version": "1.0",
                "capabilities": "resume_analysis,candidate_comparison",
            },
            temperature=0.7,
            top_p=0.95,
        )

        return agent

    except Exception as e:
        # Report the failure once, including the exception type, then re-raise
        print(f"Error creating assistant: {type(e).__name__}: {e}")
        raise
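
The resume paths are handed to the assistant at creation time, so a wrong path would only surface as an error from that call. A small pre-flight check can report missing or empty files first. This is a sketch with a hypothetical helper, `check_upload_paths`, that uses only `pathlib` and makes no Semantic Kernel calls.

```python
from pathlib import Path
from typing import Iterable, List


def check_upload_paths(paths: Iterable[str]) -> List[str]:
    """Return a problem description for each path that is missing or empty."""
    problems = []
    for raw in paths:
        path = Path(raw)
        if not path.is_file():
            problems.append(f"{raw}: not found")
        elif path.stat().st_size == 0:
            problems.append(f"{raw}: file is empty")
    return problems
```

An empty result means every path points to a non-empty file; otherwise the returned messages can be printed before attempting to create the assistant.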

## Query Interface

Create an interface to interact with the assistant and handle responses with citations.

In [37]:
async def analyze_resumes():
    """Main function to interact with the HR Assistant."""
    print("Initializing HR Assistant...")
    agent = await create_hr_assistant()
    
    print("Creating conversation thread...")
    thread_id = await agent.create_thread()
    
    try:
        while True:
            user_input = input("\nEnter your question (or 'exit' to quit): ")
            if not user_input or user_input.lower() == "exit":
                break
            
            await agent.add_chat_message(
                thread_id=thread_id,
                message=ChatMessageContent(
                    role=AuthorRole.USER,
                    content=user_input
                )
            )
            
            print("\nAnalyzing resumes...\n")
            footnotes: List[StreamingAnnotationContent] = []
            
            async for response in agent.invoke_stream(thread_id=thread_id):
                footnotes.extend([
                    item for item in response.items 
                    if isinstance(item, StreamingAnnotationContent)
                ])
                print(response.content, end="", flush=True)
            
            if footnotes:
                print("\n\nCitations:")
                for note in footnotes:
                    print(f"\n• From {note.file_id}:")
                    print(f'  "{note.quote}"')
    
    finally:
        print("\nCleaning up resources...")
        if agent:
            for file_id in agent.file_search_file_ids:
                await agent.delete_file(file_id)
            await agent.delete_thread(thread_id)
            await agent.delete()
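
If an answer cites the same passage more than once, the citation list printed by the loop above becomes repetitive. One way to tidy it is to deduplicate on the `(file_id, quote)` pair before printing. The sketch below uses a hypothetical `Citation` dataclass as a stand-in for `StreamingAnnotationContent`, carrying only the two attributes the loop reads; the helper names are also assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Citation:
    """Stand-in for an annotation item; only file_id and quote are assumed."""

    file_id: Optional[str]
    quote: Optional[str]


def unique_citations(notes: Iterable[Citation]) -> List[Citation]:
    """Drop repeated (file_id, quote) pairs while keeping first-seen order."""
    seen = set()
    unique = []
    for note in notes:
        key = (note.file_id, note.quote)
        if key not in seen:
            seen.add(key)
            unique.append(note)
    return unique


def format_citations(notes: Iterable[Citation]) -> str:
    """Render citations in the same bullet style as the loop above."""
    lines = []
    for note in unique_citations(notes):
        lines.append(f"• From {note.file_id}:")
        lines.append(f'  "{note.quote}"')
    return "\n".join(lines)
```

Because the functions only read `file_id` and `quote`, the same logic should apply to the collected `footnotes` list, provided those attributes are populated on the streamed items.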

## Run the Analysis

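Run the interactive loop below to query the assistant about the candidates. Type `exit` (or submit an empty line) to end the session. The cleanup step then deletes the uploaded files, the thread, and the assistant, so rerunning the cell starts a fresh analysis session.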

### Key Demo Questions:
1. "Create a technical competency matrix for both candidates focusing on: AI/ML expertise, cloud infrastructure, and leadership. Rate as Basic/Intermediate/Advanced with evidence."
2. "What are the most recent and relevant projects each candidate has led? Include team sizes and outcomes."
3. "Compare both candidates' experience with ML systems deployment and provide evidence of successful implementations."
4. "Create a final recommendation table with: top 3 strengths, growth areas, and risk factors for each candidate."
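
For a repeatable demo, the questions above can be sent in sequence instead of being typed at the prompt. The sketch below is independent of Semantic Kernel: `ask` is a hypothetical async callable that takes a question and returns the answer text. In this notebook it would wrap the add-message and streaming steps from the query interface cell.

```python
from typing import Awaitable, Callable, List, Sequence, Tuple


async def run_questions(
    questions: Sequence[str],
    ask: Callable[[str], Awaitable[str]],
) -> List[Tuple[str, str]]:
    """Send each question in order and collect (question, answer) pairs."""
    results = []
    for question in questions:
        # Await one question at a time so answers stay in conversation order
        answer = await ask(question)
        results.append((question, answer))
    return results
```

In a notebook cell this could be called as `await run_questions(demo_questions, ask)`, where both argument names are placeholders.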

### Additional Questions by Category (for reference):

Initial Screening:
- "Summarize each candidate's minimum requirements compliance and flag any gaps."
- "Extract all quantifiable achievements (percentages, team sizes, metrics) from both resumes."
- "Compare their educational backgrounds and relevant certifications."

Technical Deep-Dive:
- "What evidence exists of performance optimization? Include specific metrics."
- "Detail their experience with modern AI/ML tools and frameworks."
- "Compare their cloud and infrastructure experience with specific examples."

Leadership Assessment:
- "Analyze their progression into leadership roles and project complexities."
- "What examples show stakeholder management and cross-functional collaboration?"
- "Compare their mentoring experience and team development outcomes."

Final Evaluation:
- "Calculate a job fit score (1-10) across technical skills, leadership, and innovation."
- "Which candidate shows stronger evidence of scaling AI systems in production?"
- "Based on the role requirements, who would you shortlist? Provide supporting evidence."

In [40]:
await analyze_resumes()

Initializing HR Assistant...
Initializing Assistant with:
- Deployment: gpt-4o-mini
- Endpoint: https://fsunavala-openai-eus.openai.azure.com/
- API Version: 2024-10-01-preview
- Resume files: ['c:\\Dev\\azure-ai-search-python-playground\\resumes\\john_doe.txt', 'c:\\Dev\\azure-ai-search-python-playground\\resumes\\jane_smith.txt']
Creating conversation thread...

Analyzing resumes...

Here's a technical competency matrix for both candidates focusing on their expertise in AI/ML, cloud infrastructure, and leadership. Each competency is rated as Basic, Intermediate, or Advanced based on the evidence from their resumes.

### Technical Competency Matrix

| Competency          | Jane Smith                | John Doe                 |
|---------------------|---------------------------|--------------------------|
| **AI/ML Expertise** | **Advanced**              | **Intermediate**         |
|                     | - Lead AI Research at DataMinds, published 3 papers on NLP architectures.  | - D