# AI Builders Week 3 Homework
## Resume matcher

In [1]:
import pymupdf
import requests

# A timeout keeps the cell from hanging forever if the host does not respond.
response = requests.get('https://miguel-vila.github.io/resume_EN.pdf', timeout=30)
response.raise_for_status()
pdf_bytes = response.content
resume = pymupdf.open(stream=pdf_bytes, filetype='pdf')
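`raise_for_status()` only catches HTTP error codes. A misconfigured URL can still return an HTML page with status 200, and PyMuPDF would then fail with a less obvious error. A minimal sketch of an early sanity check, assuming the usual convention that a PDF begins with a `%PDF-` header (some readers tolerate a few stray leading bytes, so it scans the first 1024). `looks_like_pdf` is a hypothetical helper, not part of `requests` or PyMuPDF:

```python
def looks_like_pdf(data: bytes, window: int = 1024) -> bool:
    """Return True if a %PDF- header appears near the start of `data`."""
    # Hypothetical helper: PDF files start with a "%PDF-1.x" header line.
    # Scanning a small window tolerates a few junk bytes before it.
    return b"%PDF-" in data[:window]


# Usage with the notebook's download (assumes `pdf_bytes` from the cell above):
# if not looks_like_pdf(pdf_bytes):
#     raise ValueError("Downloaded content does not look like a PDF")
```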

In [2]:
print("Number of pages:", len(resume))
resume_str = '\n'.join([page.get_text() for page in resume])
resume_str

Number of pages: 3


'MIGUEL VIL´A GONZ´ALEZ\nmiguel-vila @ github\nPROFESSIONAL EXPERIENCE\nSenior Software Engineer\nRemote - US East Coast Timezone\nSiriusXM\nNovember 2022 - Present\n· Member of the API tooling team, part of the platform services enablement organization.\n· Develop tooling using Smithy, an AWS DDL language, empowering teams to describe, implement, and consume\ntheir services.\n· Key projects include:\n· Developed a system for describing which Smithy services are implemented and consumed by applications,\nenabling package management and dependency tracing across the platform.\n· Implemented compatibility checks to verify service changes against potential data processing breakages.\n· Contributed to a team implementing semantic search using vector embeddings stored in OpenSearch.\n· Set up infrastructure to ingest catalog data and generate embeddings using different models.\n· Set up a client to query the different indexes using the embeddings.\nLead Software Engineer ←Senior Software En
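The extracted text shows `VIL´A GONZ´ALEZ`: the PDF stores the accent as a separate spacing character placed before the letter, which is common in LaTeX-generated documents. That splits words like "González" into odd tokens before embedding. A minimal cleanup sketch, assuming the stray character is U+00B4 (acute), U+00A8 (diaeresis) or U+02DC (tilde) immediately followed by the ASCII letter it modifies; the mapping and `fix_spacing_accents` are illustrative, not part of PyMuPDF:

```python
import re
import unicodedata

# Assumed mapping from standalone (spacing) accents to combining marks.
SPACING_TO_COMBINING = {
    "\u00b4": "\u0301",  # acute accent
    "\u00a8": "\u0308",  # diaeresis
    "\u02dc": "\u0303",  # small tilde
}

_ACCENT_BEFORE_LETTER = re.compile(
    "([" + "".join(SPACING_TO_COMBINING) + r"])([A-Za-z])"
)


def fix_spacing_accents(text: str) -> str:
    """Merge 'accent followed by letter' pairs into single accented letters."""
    def merge(match: re.Match) -> str:
        accent, letter = match.groups()
        # Letter + combining mark, then NFC-compose into one code point.
        return unicodedata.normalize("NFC", letter + SPACING_TO_COMBINING[accent])
    return _ACCENT_BEFORE_LETTER.sub(merge, text)


print(fix_spacing_accents("MIGUEL VIL\u00b4A GONZ\u00b4ALEZ"))  # → MIGUEL VILÁ GONZÁLEZ
```

In the notebook this would run once, right after extraction: `resume_str = fix_spacing_accents(resume_str)`.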

In [3]:
from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
openAI = OpenAI()

# Ask the model to invent a job posting tailored to the resume. The posting is
# embedded later and compared against the real postings, alongside the resume itself.
fake_job_posting_response = openAI.responses.create(
    model="gpt-4o",
    input=f"""The following is a resume for a specific person.
        From that resume create a job listing that would fit that person.
        Don't "overfit": use the described experience but don't reproduce the CV.
        You don't have to mention specific companies.
        Only mention the most commonly mentioned technologies or the ones that appear first in lists.
        Include sections like 'About the role'/'Job description', 'Qualifications', 'Preferred Qualifications':
        {resume_str}""",
)
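The triple-quoted f-string above keeps the cell's indentation, so each instruction line reaches the model with leading spaces. Calling `textwrap.dedent` on the finished prompt would not help: the interpolated resume lines have no indentation, so there is no common prefix left to strip. A sketch that dedents the fixed instructions first and only then appends the resume; the `<resume>` delimiters and `build_job_posting_prompt` are illustrative choices, not something the API requires:

```python
import textwrap

# Dedent the instructions *before* interpolating the resume: once unindented
# resume lines are mixed in, dedent() finds no common prefix to remove.
JOB_POSTING_INSTRUCTIONS = textwrap.dedent("""\
    The following is a resume for a specific person.
    From that resume create a job listing that would fit that person.
    Don't "overfit": use the described experience but don't reproduce the CV.
    You don't have to mention specific companies.
    Only mention the most commonly mentioned technologies or the ones that appear first in lists.
    Include sections like 'About the role'/'Job description', 'Qualifications', 'Preferred Qualifications'.
    """).strip()


def build_job_posting_prompt(resume_text: str) -> str:
    """Combine the fixed instructions with the resume, clearly delimited."""
    # Hypothetical delimiter convention so the model can tell instructions from data.
    return f"{JOB_POSTING_INSTRUCTIONS}\n\n<resume>\n{resume_text.strip()}\n</resume>"


# Usage (assumes `openAI` and `resume_str` from the cells above):
# openAI.responses.create(model="gpt-4o", input=build_job_posting_prompt(resume_str))
```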

In [4]:
from IPython.display import display, Markdown

fake_job_posting = fake_job_posting_response.output[0].content[0].text
display(Markdown(fake_job_posting))

**Job Title:** Senior Software Engineer - Cloud & API Tooling

**Location:** Remote - US East Coast Timezone

---

**About the Role:**

We are seeking a highly skilled Senior Software Engineer to join our innovative team. You will play a crucial role in developing and optimizing tooling for cloud-based services using cutting-edge technologies like Scala, Python, and AWS. The ideal candidate will leverage their extensive experience in API development and cloud infrastructure to enhance our platform services and contribute to impactful projects.

**Key Responsibilities:**

- Develop and maintain tooling using AWS and Smithy to support service description, implementation, and consumption.
- Lead projects focused on semantic search and vector-based data processing using OpenSearch.
- Create and optimize infrastructure for data integration and dependency management.
- Guide cross-functional teams in feature delivery and system migration.
- Craft technical proposals and drive decision-making for system evolution.

**Qualifications:**

- Proven experience in software development with a focus on Scala, Python, and AWS services.
- Strong background in API development and management.
- Experience with service compatibility and data processing validation.
- Knowledge in semantic search implementation and vector embeddings.
- Excellent communication and collaboration skills for working with cross-functional teams.

**Preferred Qualifications:**

- Experience in subscription lifecycle management and e-commerce systems.
- Background in streaming platforms and media services engineering.
- Contributions to open-source projects and experience with functional programming.
- Familiarity with teaching or mentoring team members, improving internal documentation, and onboarding new members efficiently.
- Participation in competitive programming contests or other technical competitions.

Join us in shaping the future of platform services and empowering developers to reach new heights with innovative cloud solutions. Apply today and be part of a team that values growth, collaboration, and excellence.
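`fake_job_posting_response.output[0].content[0].text` assumes the first output item is a message whose first part is text. That holds for `gpt-4o` here, but with reasoning models the first item can be a reasoning item, and a message can carry several text parts. Recent versions of the `openai` Python SDK expose a `response.output_text` property that aggregates the text for you. A sketch of the same aggregation written out by hand; `collect_output_text` is a hypothetical helper, and the `SimpleNamespace` objects only mimic the response shape for the example:

```python
from types import SimpleNamespace


def collect_output_text(response) -> str:
    """Concatenate every text part from message items in a Responses API result."""
    parts = []
    for item in response.output:
        # Skip non-message items (e.g. reasoning or tool-call items).
        if getattr(item, "type", None) != "message":
            continue
        for part in item.content:
            if getattr(part, "type", None) == "output_text":
                parts.append(part.text)
    return "".join(parts)


# Stand-in objects shaped like a response whose first item is not a message.
fake_response = SimpleNamespace(output=[
    SimpleNamespace(type="reasoning", summary=[]),
    SimpleNamespace(type="message", content=[
        SimpleNamespace(type="output_text", text="**Job Title:** ..."),
    ]),
])
print(collect_output_text(fake_response))  # → **Job Title:** ...
```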

In [5]:
import pandas as pd

jobs = pd.read_csv('jobs.csv')

jobs.head()

| Unnamed: 0 | id | site | job_url | job_url_direct | title | company | location | date_posted | job_type | salary_source | ... | company_addresses | company_num_employees | company_revenue | company_description | skills | experience_range | company_rating | company_reviews_count | vacancy_count | work_from_home_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | in-764d02c34833d113 | indeed | https://www.indeed.com/viewjob?jk=764d02c34833... | https://jobs.ashbyhq.com/yumaai/cdc8c768-e82b-... | AI Product Focused - Senior Fullstack / Rails ... | Yuma AI | Boston, MA, US | 2025-09-02 | fulltime | direct_data | ... | | | | | | | | | | |
| 1 | in-82e200b3b54e6886 | indeed | https://www.indeed.com/viewjob?jk=82e200b3b54e... | https://jobs.ashbyhq.com/gallatin/b7ed9bbb-496... | DevOps Engineer | Gallatin | El Segundo, CA, US | 2025-09-02 | fulltime | direct_data | ... | | | | | | | | | | |
| 2 | in-c9747635cab5e167 | indeed | https://www.indeed.com/viewjob?jk=c9747635cab5... | https://grnh.se/mwr1jefn2us | Staff Software Engineer, Growth Products | Lyft | San Francisco, CA, US | 2025-09-02 | | direct_data | ... | | | | Multiply your earnings when you drive with Lyf... | | | | | | |
| 3 | in-07678b0399d77ed6 | indeed | https://www.indeed.com/viewjob?jk=07678b0399d7... | https://grnh.se/e8e4gmqg2us | Staff Software Engineer, Growth Products | Lyft | New York, NY, US | 2025-09-02 | | direct_data | ... | | | | Multiply your earnings when you drive with Lyf... | | | | | | |
| 4 | in-f89ce04ae801a7c3 | indeed | https://www.indeed.com/viewjob?jk=f89ce04ae801... | https://grnh.se/nxoofj1z2us | Staff Software Engineer, Growth Products | Lyft | Seattle, WA, US | 2025-09-02 | | direct_data | ... | | | | Multiply your earnings when you drive with Lyf... | | | | | | |
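Two things stand out in this preview. The `Unnamed: 0` column is what pandas produces when a CSV was written together with its index, and rows 2 to 4 are the same Lyft role posted in three cities, which likely share one description. Identical descriptions embed identically, so such duplicates could fill several of the top-5 slots with one posting. A minimal cleanup sketch, assuming that title, company and description together identify a duplicate; `prepare_jobs` and the sample rows are made up for illustration:

```python
import pandas as pd


def prepare_jobs(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the saved index column, empty descriptions, and duplicate postings."""
    # 'Unnamed: 0' typically comes from a CSV written with its index.
    df = df.drop(columns=["Unnamed: 0"], errors="ignore")
    # Rows without a description cannot be embedded meaningfully.
    df = df[df["description"].fillna("").str.strip() != ""]
    # Same title + company + description, e.g. one role posted in several cities.
    df = df.drop_duplicates(subset=["title", "company", "description"])
    return df.reset_index(drop=True)


# Made-up sample rows shaped like jobs.csv.
sample = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3],
    "title": ["Staff SWE", "Staff SWE", "DevOps Engineer", "Intern"],
    "company": ["Lyft", "Lyft", "Gallatin", "Acme"],
    "location": ["San Francisco, CA, US", "New York, NY, US", "El Segundo, CA, US", ""],
    "description": ["Growth products...", "Growth products...", "Infra...", None],
})
print(prepare_jobs(sample)[["title", "company", "location"]])
```

In the notebook this would wrap the load: `jobs = prepare_jobs(pd.read_csv('jobs.csv'))`. Keeping the first duplicate means the first listed city stands in for the others.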


In [6]:
from sentence_transformers import SentenceTransformer
import torch
import numpy as np

MODEL_NAME = "sentence-transformers/all-mpnet-base-v2"

device = "cuda" if torch.cuda.is_available() else "cpu"
model = SentenceTransformer(MODEL_NAME, device=device)

# Missing descriptions become empty strings so encode() only receives text.
descriptions = jobs['description'].fillna('').astype(str).tolist()

# Unit-length NumPy embeddings: cosine similarity then reduces to a dot product.
embeddings = model.encode(
    descriptions,
    batch_size=32,
    convert_to_numpy=True,
    normalize_embeddings=True,
    show_progress_bar=False
)

# One 1-D embedding vector per job row.
jobs['embedding'] = list(embeddings)
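According to its model card, `all-mpnet-base-v2` truncates input longer than 384 word pieces. A three-page resume, and many job descriptions, are well past that, so only their opening part shapes the embedding. One common workaround, which the notebook does not use, is to split the text into overlapping windows, embed each window, and average the normalized vectors. A sketch under two assumptions: word windows stand in for token windows, and about 200 words stays under the 384-token limit for typical English prose. `chunk_words` and `pool_embeddings` are hypothetical helpers:

```python
import numpy as np


def chunk_words(text: str, max_words: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows of at most `max_words` words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    if not words:
        return [""]
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once a window reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


def pool_embeddings(chunk_embeddings: np.ndarray) -> np.ndarray:
    """Average chunk embeddings (rows) and re-normalize to unit length."""
    mean = chunk_embeddings.mean(axis=0)
    norm = np.linalg.norm(mean)
    return mean / norm if norm > 0 else mean


# Usage with the notebook's model (assumes `model` and `resume_str` exist):
# chunk_embs = model.encode(chunk_words(resume_str), normalize_embeddings=True)
# resume_embedding = pool_embeddings(chunk_embs)
```

Re-normalizing after averaging keeps the pooled vector comparable with the other unit-length embeddings, so dot products remain cosine similarities.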

In [8]:
import numpy as np

def encode_normalized(text):
    # Same model and normalization as the job descriptions, returned as NumPy.
    return model.encode(
        text,
        convert_to_numpy=True,
        normalize_embeddings=True,
        show_progress_bar=False
    )

resume_embedding = encode_normalized(resume_str)
fake_job_posting_embedding = encode_normalized(fake_job_posting)

def get_cos_sims(target_embedding):
    # All embeddings are unit length, so cosine similarity is a plain dot product.
    # Staying in NumPy also avoids mixing CPU and CUDA tensors in one operation.
    job_matrix = np.stack(jobs['embedding'].to_list())
    return job_matrix @ np.asarray(target_embedding, dtype=job_matrix.dtype)

jobs['cos_sim_resume'] = get_cos_sims(resume_embedding)
jobs['cos_sim_fake_job_posting'] = get_cos_sims(fake_job_posting_embedding)

def print_jobs(rows):
    for row in rows.itertuples(index=False):
        print(f'Title: {row.title}')
        print(f'Url: {row.job_url}')

def sort_by_similarity_and_log(column_name, n=5):
    """Print the n most similar postings, a separator, then the n least similar."""
    sorted_jobs = jobs.sort_values(column_name, ascending=False)
    print_jobs(sorted_jobs.head(n))
    print('_'*15)
    print_jobs(sorted_jobs.tail(n))

sort_by_similarity_and_log('cos_sim_resume')

print('*'*35)

sort_by_similarity_and_log('cos_sim_fake_job_posting')


Title: Senior Software Engineer, Applied AI
Url: https://www.indeed.com/viewjob?jk=a53c64fadb77ee9b
Title: Software Engineer - Analytics Platforms & Experiences (APX)
Url: https://www.indeed.com/viewjob?jk=5693ba7970271f3f
Title: Sr. Software Engineer II - StreamingTV
Url: https://www.indeed.com/viewjob?jk=4d149be11348a3ef
Title: Software Engineer - Fullstack
Url: https://www.indeed.com/viewjob?jk=f75d0e7489b9c668
Title: Developer Advocate, Developer Productivity, DevEx
Url: https://www.indeed.com/viewjob?jk=da15865a44121cbd
_______________
Title: Sr Software Development Engineer
Url: https://www.indeed.com/viewjob?jk=50ebc522b8d09ed9
Title: Software Engineer (7300U) - Berkeley Seismological Lab
Url: https://www.indeed.com/viewjob?jk=535907521ac63569
Title: Senior Backend Engineer (Blockchain)
Url: https://www.indeed.com/viewjob?jk=aabd8b03b8b760c7
Title: Internship, Software Engineer, Factory Software (Winter/Spring 2026)
Url: https://www.indeed.com/viewjob?jk=691aab2a0abc6c5c
Title: 