# AI Builders Week 3 Homework
## Resume matcher

In [128]:
import pymupdf
import requests

response = requests.get('https://miguel-vila.github.io/resume_EN.pdf')
response.raise_for_status()
pdf_bytes = response.content
resume = pymupdf.open(stream=pdf_bytes, filetype='pdf')

In [129]:
print("Number of pages:", len(resume))
resume_str = '\n'.join([page.get_text() for page in resume])
resume_str

Number of pages: 3


'MIGUEL VIL´A GONZ´ALEZ\nmiguel-vila @ github\nPROFESSIONAL EXPERIENCE\nSenior Software Engineer\nRemote - US East Coast Timezone\nSiriusXM\nNovember 2022 - Present\n· Member of the API tooling team, part of the platform services enablement organization.\n· Develop tooling using Smithy, an AWS DDL language, empowering teams to describe, implement, and consume\ntheir services.\n· Key projects include:\n· Developed a system for describing which Smithy services are implemented and consumed by applications,\nenabling package management and dependency tracing across the platform.\n· Implemented compatibility checks to verify service changes against potential data processing breakages.\n· Contributed to a team implementing semantic search using vector embeddings stored in OpenSearch.\n· Set up infrastructure to ingest catalog data and generate embeddings using different models.\n· Set up a client to query the different indexes using the embeddings.\nLead Software Engineer ←Senior Software En

In [130]:
import os
from openai import OpenAI

openAI = OpenAI()

job_posting_format="""
# Title

{{job title}}

## Description

{{general job description}}

## Responsibilities

{{responsibilities}}

## Qualifications

{{qualifications}}

## Bonus qualifications

{{bonus qualifications}}
"""

fake_job_posting_response = openAI.responses.create(
    model="gpt-4o",
    input=
        f"""The following is a resume for a specific person.
        From that resume create a job listing that would fit that person.
        Don't "overfit": use the described experience but don't reproduce the CV.
        You don't have to mention specific companies.
        Only mention the most commonly mentioned technologies or the ones that appear first in lists.
        Output in the following format:
        {job_posting_format}
        Resume:
        {resume_str}""",
)
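
The template is a plain string, not an f-string, so its doubled braces are not escapes. When the f-string interpolates the template, placeholders such as `{{job title}}` reach the model literally. A minimal sketch of that behavior, using a shortened template and a made-up resume string (both are hypothetical stand-ins, not the notebook's values):

```python
# A plain (non-f) string keeps doubled braces exactly as written.
template = "# Title\n\n{{job title}}\n"

# Hypothetical stand-in for resume_str.
resume_text = "Example resume text"

# Interpolating the template into an f-string inserts its value verbatim;
# the braces inside the inserted value are not processed again.
prompt = f"Output in the following format:\n{template}\nResume:\n{resume_text}"
print(prompt)
```

Printing the prompt before sending it is a cheap way to confirm that the model sees the placeholders and the resume in the intended order.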

In [131]:
from IPython.display import display, Markdown

# output_text joins the text output, so it does not depend on output[0] being a message.
fake_job_posting = fake_job_posting_response.output_text
display(Markdown(fake_job_posting))

# Title

Senior Software Engineer - API Tooling and Cloud Services

## Description

Seeking a dynamic Senior Software Engineer to lead and innovate in API tooling and platform services. This role focuses on developing cutting-edge tooling using AWS DDL languages to enhance service implementation and consumption across the organization. You will play a critical role in system architecture, performance optimization, and cross-functional collaboration.

## Responsibilities

- Develop and maintain API tooling, ensuring seamless integration and service consumption.
- Lead projects involving semantic search and catalog data management using OpenSearch.
- Implement compatibility checks to ensure data processing stability.
- Collaborate with engineering teams to design and enhance subscription management systems.
- Mentor team members and participate in technical decision-making.
- Contribute to the development of user authentication and account management services.
- Engage in open-source projects, driving improvements and feature development.

## Qualifications

- Expertise in Scala, Java, and AWS services.
- Proven experience with cloud infrastructure, particularly in AWS.
- Strong background in developing and maintaining enterprise-level services.
- Excellent communication and collaboration skills for cross-functional teamwork.
- Ability to decompose projects into actionable tasks and deliverables.
- Experience in designing API components and improving data quality.

## Bonus Qualifications

- Experience with Smithy, semantic search, and vector embeddings.
- Exposure to financial and subscription service platforms.
- Open-source software contributions and experience with technical proposals.
- Knowledge of funding transactions and corporate finance principles.

In [132]:
import pandas as pd

jobs = pd.read_csv('jobs.csv')

jobs.head()

Unnamed: 0,id,site,job_url,job_url_direct,title,company,location,date_posted,job_type,salary_source,...,company_addresses,company_num_employees,company_revenue,company_description,skills,experience_range,company_rating,company_reviews_count,vacancy_count,work_from_home_type
0,in-764d02c34833d113,indeed,https://www.indeed.com/viewjob?jk=764d02c34833...,https://jobs.ashbyhq.com/yumaai/cdc8c768-e82b-...,AI Product Focused - Senior Fullstack / Rails ...,Yuma AI,"Boston, MA, US",2025-09-02,fulltime,direct_data,...,,,,,,,,,,
1,in-82e200b3b54e6886,indeed,https://www.indeed.com/viewjob?jk=82e200b3b54e...,https://jobs.ashbyhq.com/gallatin/b7ed9bbb-496...,DevOps Engineer,Gallatin,"El Segundo, CA, US",2025-09-02,fulltime,direct_data,...,,,,,,,,,,
2,in-c9747635cab5e167,indeed,https://www.indeed.com/viewjob?jk=c9747635cab5...,https://grnh.se/mwr1jefn2us,"Staff Software Engineer, Growth Products",Lyft,"San Francisco, CA, US",2025-09-02,,direct_data,...,,,,Multiply your earnings when you drive with Lyf...,,,,,,
3,in-07678b0399d77ed6,indeed,https://www.indeed.com/viewjob?jk=07678b0399d7...,https://grnh.se/e8e4gmqg2us,"Staff Software Engineer, Growth Products",Lyft,"New York, NY, US",2025-09-02,,direct_data,...,,,,Multiply your earnings when you drive with Lyf...,,,,,,
4,in-f89ce04ae801a7c3,indeed,https://www.indeed.com/viewjob?jk=f89ce04ae801...,https://grnh.se/nxoofj1z2us,"Staff Software Engineer, Growth Products",Lyft,"Seattle, WA, US",2025-09-02,,direct_data,...,,,,Multiply your earnings when you drive with Lyf...,,,,,,


In [133]:
from sentence_transformers import SentenceTransformer
import torch
import numpy as np
import csv  # needed for csv.QUOTE_NONNUMERIC when saving the normalized jobs
import json
from math import ceil
import tiktoken
import os

def normalize_description_prompt(description: str) -> str:
    return f"""
        Take the following job posting description and format it in the following way:
        {job_posting_format}
        Apply it to the following job description:
        {description}
    """
    
gpt_model = "gpt-4o-mini"

def normalize_description(description: str) -> str:
    response = openAI.responses.create(
        model=gpt_model,
        input=normalize_description_prompt(description)
    )
    # output_text joins the text output, so it does not depend on output[0] being a message.
    return response.output_text

# Normalizing makes one API call per row and is slow, so cache the result on disk.
if os.path.exists('jobs_normalized.csv'):
    jobs = pd.read_csv('jobs_normalized.csv')
else:
    jobs['normalized_description'] = jobs['description'].apply(normalize_description)
    jobs.to_csv("jobs_normalized.csv", index=False, quoting=csv.QUOTE_NONNUMERIC)
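
The cache above is all-or-nothing: if the run is interrupted, every completed API call is lost. One possible variant, which is not what this notebook does, caches each result as it arrives. The sketch below uses only the standard library; `cached_map`, `fake_normalize`, and the cache file name are hypothetical names chosen for illustration.

```python
import json
import os
import tempfile

def cached_map(items, func, cache_path):
    """Apply func to each item, reusing results stored in a JSON file."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            cache = json.load(f)
    results = []
    for item in items:
        if item not in cache:
            cache[item] = func(item)
            # Persist after every new result so an interrupted run can resume.
            with open(cache_path, "w", encoding="utf-8") as f:
                json.dump(cache, f)
        results.append(cache[item])
    return results

calls = []

def fake_normalize(text):
    """Hypothetical stand-in for the slow LLM call."""
    calls.append(text)
    return text.upper()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cache.json")
    first = cached_map(["a", "b", "a"], fake_normalize, path)
    second = cached_map(["a", "b"], fake_normalize, path)

print(first, second, calls)
```

Rewriting the whole file after each item is wasteful for large inputs, but it keeps the sketch short. Keying on the description text also means duplicate postings are only normalized once.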

In [137]:
display(Markdown(jobs['normalized_description'].iloc[0]))

# Title

Senior Backend Engineer

## Description

Yuma AI is developing a cutting-edge orchestration platform that deploys autonomous AI agents for customer support in e-commerce. As a leader in this space, we provide an advanced solution that automates over 80% of support tickets for our top merchants. Our team, led by experienced founder Guillaume Luccisano, is dedicated to delivering exceptional value through innovative AI technology.

## Responsibilities

- Build and maintain large-scale Rails applications.
- Work on product iterations, A/B testing, and LLM research.
- Develop new features and optimize performance.
- Collaborate with a small team of engineers to deliver high-quality product outcomes.
- Ingest and process millions of support tickets and e-commerce data points.
- Embrace a fast-paced startup environment and foster a culture of excellence.

## Qualifications

- Proven experience in building and leading large Rails applications.
- Strong understanding of backend technologies including Ruby on Rails, PostgreSQL, Redis, and Sidekiq.
- Familiarity with frontend technologies such as Next.js and React.
- Recent hands-on experience with LLMs and AI integration in daily tasks.
- Ability to balance speed of iteration with high product quality.

## Bonus qualifications

- Experience managing non-deterministic output when working with LLMs.
- A passion for continuous improvement and delivering value to users.
- Willingness to take ownership of projects and embrace new challenges.

In [138]:
MODEL_NAME = "sentence-transformers/all-mpnet-base-v2"

device = "cuda" if torch.cuda.is_available() else "cpu"
model = SentenceTransformer(MODEL_NAME, device=device)

embeddings = model.encode(
    jobs['normalized_description'].tolist(),  # encode expects a list of strings
    batch_size=32,
    convert_to_tensor=True,
    normalize_embeddings=True,
    show_progress_bar=False
)

jobs['embedding'] = [emb.detach().cpu().numpy() for emb in embeddings]
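
Because `normalize_embeddings=True` scales every vector to unit length, the cosine similarity between two embeddings equals their dot product. A minimal sketch with toy 2-D vectors (made-up numbers, not real model output; `l2_normalize` is a hypothetical helper):

```python
import numpy as np

def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit Euclidean length."""
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

# Toy stand-ins for the job embeddings and one query embedding.
job_vectors = l2_normalize(np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]))
query = l2_normalize(np.array([[2.0, 0.0]]))[0]

# For unit vectors, the dot product is the cosine similarity.
cos_sims = job_vectors @ query

# Indices ordered from most to least similar.
ranking = np.argsort(-cos_sims)
print(cos_sims, ranking)
```

With all vectors stacked in one matrix, a single matrix-vector product scores every job at once instead of looping over rows.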

In [139]:
import torch.nn.functional as F

resume_embedding = model.encode(
    resume_str,
    batch_size=32,
    convert_to_tensor=True,
    normalize_embeddings=True,
    show_progress_bar=False
)

fake_job_posting_embedding = model.encode(
    fake_job_posting,
    batch_size=32,
    convert_to_tensor=True,
    normalize_embeddings=True,
    show_progress_bar=False
)

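def get_cos_sims(target_embedding):
    # Stack the stored embeddings into one (n_jobs, dim) tensor on the target's
    # device; mixing CPU and CUDA tensors would raise an error on a GPU machine.
    job_embeddings = torch.tensor(
        np.stack(jobs['embedding'].tolist()),
        dtype=torch.float32,
        device=target_embedding.device
    )
    cos_sims = F.cosine_similarity(job_embeddings, target_embedding.unsqueeze(0))
    return cos_sims.detach().cpu().numpy()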

jobs['cos_sim_resume'] = get_cos_sims(resume_embedding)
jobs['cos_sim_fake_job_posting'] = get_cos_sims(fake_job_posting_embedding)

def print_jobs(rows):
    for job in rows.itertuples(index=False):
        print(f'Title: {job.title}')
        print(f'Url: {job.job_url}')

def sort_by_similarity_and_log(column_name):
    sorted_jobs = jobs.sort_values(column_name, ascending=False)
    # Five most similar postings first, then the five least similar.
    print_jobs(sorted_jobs.head())
    print('_'*15)
    print_jobs(sorted_jobs.tail())

sort_by_similarity_and_log('cos_sim_resume')

print('*'*35)

sort_by_similarity_and_log('cos_sim_fake_job_posting')


Title: Sr. Software Engineer II - StreamingTV
Url: https://www.indeed.com/viewjob?jk=4d149be11348a3ef
Title: Developer Advocate, Developer Productivity, DevEx
Url: https://www.indeed.com/viewjob?jk=f0857bc1b48b4ead
Title: Sr. Software Engineer, Apple TV Service
Url: https://www.indeed.com/viewjob?jk=06f534caa62bac27
Title: Senior, Software Engineer
Url: https://www.indeed.com/viewjob?jk=d51fa646d094737d
Title: Developer Advocate, Developer Productivity, DevEx
Url: https://www.indeed.com/viewjob?jk=da15865a44121cbd
_______________
Title: Sr Software Development Engineer
Url: https://www.indeed.com/viewjob?jk=6432203d0189962c
Title: Sr Software Development Engineer
Url: https://www.indeed.com/viewjob?jk=937d2fe307ce55e3
Title: Senior Backend Engineer (Blockchain)
Url: https://www.indeed.com/viewjob?jk=aabd8b03b8b760c7
Title: Staff Software Development Engineer
Url: https://www.indeed.com/viewjob?jk=dcf3ae00b92a9ed5
Title: Campus Graduate Summer Internship Program - 2026 Software Engineer