# Lab: AI-Powered Resume Matcher (BASIC VERSION)

### **Objective**
This lab demonstrates the "shallow" approach to resume matching using only **Keywords**. 

In recruitment tech, this is how many older Applicant Tracking Systems (ATS) work: they look for exact word matches between a job description and a resume.

### **Educational Goal**
See how a high-quality resume can receive a **low score** simply because it uses different terminology than the job posting.
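
To make the failure mode concrete, here is a minimal sketch of exact keyword matching in plain Python. The function name `exact_match_score`, the sample keywords, and the resume sentence are all made up for illustration; they are not taken from the lab's files or from any real ATS.

```python
def exact_match_score(job_keywords, resume_text):
    """Return the percentage of job keywords found verbatim in the resume text."""
    if not job_keywords:
        return 0.0
    resume_lower = resume_text.lower()
    hits = [kw for kw in job_keywords if kw.lower() in resume_lower]
    return round(len(hits) / len(job_keywords) * 100, 2)


# Hypothetical data: the resume describes the same skills in different words.
sample_job_keywords = ["team leader", "python", "budget planning"]
sample_resume = (
    "Managed a group of engineers, wrote Python services, "
    "and handled financial forecasting."
)

sample_score = exact_match_score(sample_job_keywords, sample_resume)
print(sample_score)
```

Only "python" appears verbatim, so one of the three keywords matches even though the resume arguably covers all three skills.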

## Step 0: Setup

We import our libraries and load our IBM Watson NLU credentials (`IAM_KEY` and `SERVICE_URL`) from a `.env` file.
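
If a credential is missing, the setup cell fails when it builds the authenticator. One possible way to check first is sketched below; the helper name `missing_env_vars` and the demo values are assumptions made for this example, not part of the lab.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Hypothetical environment used only to demonstrate the check.
demo_env = {"IAM_KEY": "example-key", "SERVICE_URL": ""}
print(missing_env_vars(["IAM_KEY", "SERVICE_URL"], env=demo_env))
```

Calling `missing_env_vars(["IAM_KEY", "SERVICE_URL"])` after `load_dotenv()` would run the same check against the real environment.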

In [None]:
import os
import pandas as pd
import matplotlib.pyplot as plt
from dotenv import load_dotenv
from ibm_watson import NaturalLanguageUnderstandingV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson.natural_language_understanding_v1 import Features, KeywordsOptions
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import fitz  # PyMuPDF

# Load credentials
load_dotenv()
iam_key = os.getenv("IAM_KEY")
service_url = os.getenv("SERVICE_URL")

authenticator = IAMAuthenticator(iam_key)
nlu = NaturalLanguageUnderstandingV1(version='2022-04-07', authenticator=authenticator)
nlu.set_service_url(service_url)

print("Setup complete!")

## Step 1: Industry Keywords (Job Analysis)

We will extract technical keywords from the job description file.
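
For orientation, the keywords feature returns a list of dictionaries, each carrying `text`, `relevance`, and `count` fields. The sketch below works on a hand-written response (the keywords and numbers are invented, not output from the job file) and shows one optional refinement: keeping only keywords above a relevance threshold. The helper name `keywords_above` and the 0.5 cutoff are assumptions.

```python
# Hand-written stand-in for an NLU result; values are illustrative only.
sample_response = {
    "keywords": [
        {"text": "machine learning", "relevance": 0.92, "count": 3},
        {"text": "project management", "relevance": 0.71, "count": 2},
        {"text": "weekly meetings", "relevance": 0.34, "count": 1},
    ]
}


def keywords_above(response, min_relevance=0.5):
    """Return keyword texts whose relevance is at least min_relevance."""
    return [
        kw["text"]
        for kw in response.get("keywords", [])
        if kw.get("relevance", 0.0) >= min_relevance
    ]


print(keywords_above(sample_response))
```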

In [None]:
JOB_FILE = "docs/job-5653.txt"

with open(JOB_FILE, 'r', encoding='utf-8') as f:
    job_text = f.read()

job_response = nlu.analyze(
    text=job_text,
    features=Features(keywords=KeywordsOptions(limit=20))
).get_result()

job_keywords = [kw['text'] for kw in job_response.get('keywords', [])]
print(f"Extracted {len(job_keywords)} keywords from the Job Description.")
print("Top Job Keywords:", job_keywords[:10])

## Step 2: Resume Keywords

Now we extract keywords from the resume PDF.

In [None]:
RESUME_PATH = "docs/PortillaCV-08-2023.pdf"

def extract_text(path):
    """Return the concatenated text of every page in the PDF."""
    with fitz.open(path) as doc:
        return "".join(page.get_text() for page in doc)

resume_text = extract_text(RESUME_PATH)
resume_response = nlu.analyze(
    text=resume_text,
    features=Features(keywords=KeywordsOptions(limit=20))
).get_result()

resume_keywords = [kw['text'] for kw in resume_response.get('keywords', [])]
print(f"Extracted {len(resume_keywords)} keywords from the Resume.")
print("Top Resume Keywords:", resume_keywords[:10])

## Step 3: The Shallow Match Result

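We use TF-IDF to turn each keyword list into a weighted word vector, then take the cosine similarity of the two vectors. Only words that appear in both lists contribute, so the score reflects exact word overlap rather than meaning.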
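
As a rough illustration of the cosine-similarity step, the sketch below compares two keyword lists using raw word counts in plain Python. It deliberately leaves out the IDF weighting that `TfidfVectorizer` applies, so its numbers will differ from the lab's `get_score`; the function name `count_cosine` and the sample keywords are invented for the example.

```python
import math
from collections import Counter


def count_cosine(kw1, kw2):
    """Cosine similarity (0 to 1) between word-count vectors of two keyword lists."""
    counts1 = Counter(" ".join(kw1).lower().split())
    counts2 = Counter(" ".join(kw2).lower().split())
    dot = sum(counts1[word] * counts2[word] for word in counts1)
    norm1 = math.sqrt(sum(c * c for c in counts1.values()))
    norm2 = math.sqrt(sum(c * c for c in counts2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


# Synonyms share no words, so the similarity is zero.
print(count_cosine(["project manager"], ["team leader"]))
# One shared word out of three on each side gives a partial score.
print(round(count_cosine(["python", "data analysis"], ["python", "machine learning"]), 2))
```

The first pair scores zero because "manager" and "leader" are different strings, which is exactly the limitation this lab sets out to expose.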

In [None]:
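def get_score(kw1, kw2):
    # With no keywords there is nothing to compare, and TfidfVectorizer
    # raises ValueError when it finds no terms at all.
    if not kw1 or not kw2:
        return 0.0
    # Each keyword list is joined into one document and vectorized.
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform([' '.join(kw1), ' '.join(kw2)])
    # Cosine similarity of the two TF-IDF vectors, scaled to a percentage.
    return round(cosine_similarity(matrix[0], matrix[1])[0][0] * 100, 2)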

match_score = get_score(job_keywords, resume_keywords)

print(f"\nFINAL BASIC MATCH SCORE: {match_score}%")

plt.figure(figsize=(8, 2))
plt.barh(['Keyword Overlap'], [match_score], color='salmon')
plt.xlim(0, 100)
plt.title(f'Basic Keyword Match: {match_score}%')
plt.show()

### **Discussion Questions**
1. Is this score lower than you expected? Why?
2. Look at the Keyword lists. Are there synonyms (e.g., "Leader" vs "Manager") that the computer missed because they aren't *exact* matches?
3. **What's Missing?** Now, open the **Advanced Lab** to see how Semantic Intelligence solves this problem.