# DataSlush Assignment: Prototype Recommender

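This notebook prototypes a candidate recommendation system for the DataSlush AI Data Engineer assessment. For each job in `jobs.json`, it ranks the candidates in `talent_samples.csv` by skill overlap and location match.

Steps:
1. Install dependencies
2. Upload the dataset (`talent_samples.csv` and `jobs.json`)
3. Explore the dataset
4. Prepare a simplified candidate table
5. Implement a baseline, rule-based recommender (`recommend`)
6. Test the results (top 10 candidates per job)
7. Export the combined top-10 lists to CSV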


In [None]:
!pip install -q scikit-learn pandas

In [None]:
from google.colab import files

# Opens a file picker in Colab; select talent_samples.csv and jobs.json
uploaded = files.upload()


Saving jobs.json to jobs.json
Saving talent_samples.csv to talent_samples.csv


## Loading Data

In [None]:
import pandas as pd
import json

# Load the candidate profiles (CSV)
talents = pd.read_csv("talent_samples.csv")

# Load the job postings (a JSON list of job objects); the context manager closes the file
with open("jobs.json", "r") as f:
    jobs = json.load(f)

print("Number of Candidates:", len(talents))
print("Number of Jobs:", len(jobs))

# Show the first 5 candidates
talents.head()

Number of Candidates: 500
Number of Jobs: 3


,First Name,Last Name,Profile Description,Gender,City,Country,Job Types,Skills,Software,Content Verticals,Creative Styles,Platforms,Past Creators,Monthly Rate,Hourly Rate,# of Views by Creators
0,Sarah,Moore,"Hello! Sarah, the Podcast Editor from Seattle,...",Female,Seattle,United States,"Podcast Editor, Social Media Manager, Finance","Reconciliation, Scheduling posts, Project mana...","Slack, Quickbooks, Xero, TaxWise, Wave, Adobe ...","Food & Cooking, Automotive & Cars, IRL, Kids &...",,"Facebook, TikTok, Apple, Twitter, Instagram","Trek Trendy, Institute of Human Anatomy, Hey N...",8079,54,373
1,Michael,Miller,Greetings from Phoenix! Michael at your servic...,Female,Phoenix,United States,"In-House Creator, Legal Counsel, Voiceover Artist","Filming, Voice Acting, Contract review & redli...","Final Cut Pro, Frame.io, Adobe Audition, Excel","Finance & Business, Scripted & Skits",,"Twitter, Spotify, Instagram, Facebook, TikTok","Hobo Ahle, Living the Van Life, Ted-Ed, Michae...",9578,58,305
2,William,Taylor,"William here, a Ideation Strategist rocking it...",Male,Austin,United States,"Ideation Strategist, Researcher","Copywriting, Sourcing Stock Footage, Research,...","G-suite, Figma, Asana, Trello, Notion","Travel, Food & Cooking, Lifestyle & Vlogs, How...","Laid Back, Peaceful, High Production","Apple Podcasts, Snapchat, YouTube, Apple, Face...","Nomadic Matt, Hobo Ahle, PBS Eons, SmarterEver...",7132,93,166
3,Mary,Taylor,"Mary here, a Digital Products rocking it in To...",Male,Tokyo,India,Digital Products,Selling digital products,Figma,"Automotive & Cars, Sports & Fitness, Food & Co...",Peaceful,"Spotify, Facebook","High On Life, Hey Nadine, Wolters World, The B...",6561,98,127
4,William,Lopez,Greetings from Washington! William at your ser...,Male,Washington,United States,"Photographer, HR & People","Employee training, Health insurance, On/off-bo...","Workday, Adobe Lightroom, Adobe Photoshop, Opt...","Travel, Kids & Family, Beauty & Fashion, Sport...","Energetic, Aesthetic, Helpful, Peaceful, Talki...","Snapchat, YouTube, Facebook, Apple, Twitter, S...","Bill Nye, Living the Van Life, Veritasium, Tre...",7645,63,174


In [None]:
# Let's look at the first job (index 0)
job = jobs[0]

print("Job ID:", job["id"])
print("Title:", job["title"])
print("Creator:", job["creator"])
print("Description:", job["description"])
print("Required Skills:", job["top_required_skills"])
print("Preferred Location:", job["preferred_location"])
print("Budget:", job.get("monthly_budget", job.get("hourly_budget")))


Job ID: 0
Title: Video Editor
Creator: https://www.youtube.com/channel/UCi2qHfRMVEI_yHH90gZBevQ
Description: Looking for a talented Video Editor with experience in Adobe Premiere Pro. Categories: Entertainment/Lifestyle & Vlogs. Content form: short-form and long-form.
Required Skills: ['splice & dice', 'rough cut & sequencing', '2d animation']
Preferred Location: asia
Budget: 2500
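The budget line prints whichever of `monthly_budget` or `hourly_budget` is present, so `Budget: 2500` does not say whether the amount is monthly or hourly. A minimal sketch of a helper that returns the unit alongside the amount; `budget_of` is a hypothetical name, and it assumes a job stores its budget under one of those two keys.

```python
from typing import Optional, Tuple


def budget_of(job: dict) -> Tuple[Optional[str], Optional[float]]:
    """Return (unit, amount) for a job dict.

    Hypothetical helper: assumes the budget lives under "monthly_budget"
    or "hourly_budget", checked in that order. A key that is present but
    set to None is skipped.
    """
    for unit in ("monthly", "hourly"):
        amount = job.get(f"{unit}_budget")
        if amount is not None:
            return unit, amount
    return None, None  # no budget field present


print(budget_of({"monthly_budget": 2500}))  # ('monthly', 2500)
print(budget_of({"hourly_budget": 40}))     # ('hourly', 40)
print(budget_of({}))                        # (None, None)
```

Keeping the unit makes it possible to compare a job's budget against the matching `Monthly Rate` or `Hourly Rate` column rather than mixing the two.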


In [None]:
# Show column names
print("Columns in dataset:", list(talents.columns))


Columns in dataset: ['First Name', 'Last Name', 'Profile Description', 'Gender', 'City', 'Country', 'Job Types', 'Skills', 'Software', 'Content Verticals', 'Creative Styles', 'Platforms', 'Past Creators', 'Monthly Rate', 'Hourly Rate', '# of Views by Creators']


In [None]:
# Create a simpler DataFrame with just the columns we need
talents_clean = pd.DataFrame({
    "name": talents["First Name"] + " " + talents["Last Name"],
    "location": talents["Country"],  # country only; the City column is not kept
    "skills": talents["Skills"],
    "job_types": talents["Job Types"],
    "bio": talents["Profile Description"]
})

# Show first 5 rows
talents_clean.head()


,name,location,skills,job_types,bio
0,Sarah Moore,United States,"Reconciliation, Scheduling posts, Project mana...","Podcast Editor, Social Media Manager, Finance","Hello! Sarah, the Podcast Editor from Seattle,..."
1,Michael Miller,United States,"Filming, Voice Acting, Contract review & redli...","In-House Creator, Legal Counsel, Voiceover Artist",Greetings from Phoenix! Michael at your servic...
2,William Taylor,United States,"Copywriting, Sourcing Stock Footage, Research,...","Ideation Strategist, Researcher","William here, a Ideation Strategist rocking it..."
3,Mary Taylor,India,Selling digital products,Digital Products,"Mary here, a Digital Products rocking it in To..."
4,William Lopez,United States,"Employee training, Health insurance, On/off-bo...","Photographer, HR & People",Greetings from Washington! William at your ser...
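`talents_clean` keeps `skills` and `job_types` as single comma-separated strings, and the `recommend` function that follows searches those strings for substrings. A minimal sketch of splitting such a field into a normalized set, assuming items are always separated by commas (as in the rows shown above); `parse_multi` is a hypothetical helper, and missing values (NaN in pandas) are treated as an empty set.

```python
import math


def parse_multi(value) -> frozenset:
    """Split a comma-separated field into a set of lowercase, trimmed items.

    Hypothetical helper: None and float NaN (pandas' missing value)
    become an empty set; empty items between commas are dropped.
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return frozenset()
    return frozenset(
        item.strip().lower() for item in str(value).split(",") if item.strip()
    )


print(sorted(parse_multi("Filming, Storyboarding, Sound Designing")))
# ['filming', 'sound designing', 'storyboarding']
print(parse_multi(float("nan")))  # frozenset()
```

With pandas, the helper can be applied column-wise, e.g. `talents_clean["skills"].apply(parse_multi)`.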


In [None]:
def recommend(job_id, top_n=10):
    """Rank all candidates for jobs[job_id] and return the top_n rows.

    final_score = number of required skills found (as substrings) in the
    candidate's skills string + 1 if the candidate's country equals the
    job's preferred location (case-insensitive).
    """
    job = jobs[job_id]
    required_skills = [skill.lower().strip() for skill in job["top_required_skills"]]
    preferred_location = job["preferred_location"].lower()

    results = []

    for _, row in talents_clean.iterrows():
        candidate_name = row["name"]
        candidate_location = str(row["location"]).lower()

        # Candidate skills as one lowercase string
        candidate_skills = str(row["skills"]).lower()

        # Count a match if a required skill appears as a substring of the candidate's skills
        matches = sum(1 for skill in required_skills if skill in candidate_skills)

        # Location score: 1 only for an exact country match
        location_score = 1 if preferred_location and candidate_location == preferred_location else 0

        # Final score
        final_score = matches + location_score

        results.append({
            "name": candidate_name,
            "location": row["location"],
            "skills": row["skills"],
            "matches": matches,
            "location_score": location_score,
            "final_score": final_score
        })

    # One row per candidate; the default index lines up with talents_clean's row index
    results_df = pd.DataFrame(results)
    top_candidates = results_df.sort_values(by="final_score", ascending=False).head(top_n)

    print("Job:", job["title"])
    print("Required Skills:", job["top_required_skills"])
    print("Preferred Location:", job["preferred_location"])
    print("---- TOP CANDIDATES ----")

    return top_candidates
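The skill score in `recommend` treats a required skill as matched whenever it appears anywhere inside the candidate's skills string. That is lenient: the required skill `filming` (job 2) is also found inside `Run & Gun Filming`, so a candidate who lists only the latter still earns the point. A minimal sketch contrasting that substring test with an exact comparison against whole comma-separated items; `skill_matches` is a hypothetical helper, and which behaviour is preferable depends on how strictly a job's skills should be read.

```python
def skill_matches(required, candidate_skills: str, exact: bool = True) -> int:
    """Count how many required skills a candidate has.

    exact=False reproduces the notebook's substring test;
    exact=True compares against whole comma-separated items instead.
    """
    required = [s.strip().lower() for s in required]
    text = candidate_skills.lower()
    if not exact:
        return sum(1 for s in required if s in text)
    items = {item.strip() for item in text.split(",")}
    return sum(1 for s in required if s in items)


candidate = "Run & Gun Filming, Storyboarding"
required = ["storyboarding", "filming"]
print(skill_matches(required, candidate, exact=False))  # 2
print(skill_matches(required, candidate, exact=True))   # 1
```

The exact variant is stricter but also brittle to spelling differences, so it may need a synonym table in practice.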



In [None]:
# JOB 1: top 10 candidates
recommend(0)

Job: Video Editor
Required Skills: ['splice & dice', 'rough cut & sequencing', '2d animation']
Preferred Location: asia
---- TOP CANDIDATES ----


,name,location,skills,matches,location_score,final_score
88,James Williams,United States,"2D Animation, Animation, Sound Designing, 3D A...",3,0,3
304,Jennifer Jackson,United States,"Color Grading, 2D Animation, Project managemen...",3,0,3
47,Patricia Davis,United States,"Color Grading, Copywriting, 2D Animation, Mana...",3,0,3
343,Charles Martin,India,"2D Animation, Animation, Sound Designing, 3D A...",3,0,3
40,Robert Brown,United States,"Color Grading, 2D Animation, 3D Animation, Rou...",3,0,3
344,David Martinez,United States,"Color Grading, 2D Animation, Employee training...",3,0,3
112,Jessica Brown,Mexico,"Color Grading, 2D Animation, Backend systems, ...",3,0,3
345,Elizabeth Jones,United States,"Color Grading, Copywriting, 2D Animation, Run ...",3,0,3
117,Richard Taylor,United States,"Color Grading, 2D Animation, Selling digital p...",3,0,3
407,Susan Anderson,United States,"Color Grading, 2D Animation, Investigation, SE...",3,0,3
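All ten Job 1 candidates above have `final_score` 3, so their order within the list (and, if more than ten candidates tie, which of them are included at all) comes from the sort rather than the score. `DataFrame.sort_values` defaults to quicksort, which is not stable, so ties have no guaranteed order. A minimal sketch of deterministic ranking with an explicit tie-breaker, using plain Python `sorted` on hypothetical records; breaking ties alphabetically by name is an assumption, and a domain-relevant key could replace it. In pandas, the same ordering comes from `sort_values(by=["final_score", "name"], ascending=[False, True])`.

```python
def rank(candidates, top_n=10):
    """Return the top_n candidates by final_score (high to low).

    Ties are broken alphabetically by name (an assumption), so the
    result is identical on every run.
    """
    return sorted(candidates, key=lambda c: (-c["final_score"], c["name"]))[:top_n]


# Hypothetical records, not taken from the dataset
pool = [
    {"name": "Candidate B", "final_score": 3},
    {"name": "Candidate C", "final_score": 1},
    {"name": "Candidate A", "final_score": 3},
]
print([c["name"] for c in rank(pool, top_n=2)])
# ['Candidate A', 'Candidate B']
```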


In [None]:
# JOB 2: top 10 candidates
recommend(1)

Job: Producer/Video Editor
Required Skills: ['storyboarding', 'sound designing', 'rough cut & sequencing', 'filming']
Preferred Location: new york
---- TOP CANDIDATES ----


,name,location,skills,matches,location_score,final_score
381,Sarah Miller,Brazil,"Filming, 2D Animation, Color Grading, Run & Gu...",4,0,4
482,Robert Jones,United States,"Color Grading, Filming, Project management, St...",4,0,4
34,Linda Smith,United States,"Filming, Copywriting, Subtitling, Storyboardin...",3,0,3
6,Linda Martinez,United States,"Filming, Project management, Run & Gun Filming...",3,0,3
32,Elizabeth Jones,Thailand,"Filming, Scheduling posts, Project management,...",3,0,3
442,Elizabeth Jackson,United States,"Filming, Storyboarding, Backend systems, Sound...",3,0,3
21,John Williams,United Kingdom,"Filming, Management agency negotiations, Run &...",3,0,3
45,Barbara Wilson,United States,"Filming, Storyboarding, CTR Optimization, Soun...",3,0,3
450,David Wilson,United States,"Filming, Storyboarding, Sound Designing, CTR O...",3,0,3
418,Karen Smith,United States,"Filming, Storyboarding, Sound Designing, CTR O...",3,0,3


In [None]:
# JOB 3: top 10 candidates
recommend(2)

Job: Chief Operating Officer
Required Skills: ['strategy', 'consulting', 'business operations', 'development']
Preferred Location: global
---- TOP CANDIDATES ----


,name,location,skills,matches,location_score,final_score
317,Elizabeth Davis,Russia,"Operations, Selling digital products, Audience...",1,0,1
203,Linda Garcia,India,"Project management, Selling digital products, ...",1,0,1
113,Joseph Jackson,Thailand,"Filming, Contract review & redline, CTR Optimi...",1,0,1
390,Richard Martin,China,"Filming, Community Management, Project managem...",1,0,1
313,Elizabeth Moore,Japan,"Project management, Run & Gun Filming, Storybo...",1,0,1
312,Richard Smith,Singapore,"Project management, Operations, Budgeting, Neg...",1,0,1
311,Barbara Johnson,United States,"Community Management, Run & Gun Filming, Audie...",1,0,1
419,James Taylor,United States,"Filming, Selling digital products, Storyboardi...",1,0,1
229,Charles Jones,Argentina,"Selling digital products, Storyboarding, Voice...",1,0,1
354,Joseph Jones,United States,"Community Management, Storyboarding, CTR Optim...",1,0,1
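In all three result tables, `location_score` is 0 for every listed candidate. `talents_clean["location"]` holds a country, while the jobs ask for a region (`asia`), a city (`new york`) or `global`, so for these three jobs an exact country comparison can never succeed. A minimal sketch of a more forgiving check, assuming a hand-maintained region table (`REGIONS` below is hypothetical and deliberately incomplete), that the candidate's city is passed in from the raw `City` column (which `talents_clean` does not keep), and that `global` means no location constraint.

```python
# Hypothetical, deliberately incomplete region table; extend as needed.
REGIONS = {
    "asia": {"india", "china", "japan", "singapore", "thailand"},
}


def location_score(preferred: str, country: str, city: str = "") -> int:
    """Return 1 if a candidate satisfies the job's preferred location, else 0.

    Assumptions: "global" accepts everyone, a key of REGIONS is matched
    against the country, and anything else is compared with the country
    or the city (case-insensitive). An empty preference scores 0, as in
    the notebook.
    """
    preferred = (preferred or "").strip().lower()
    country = (country or "").strip().lower()
    city = (city or "").strip().lower()
    if not preferred:
        return 0
    if preferred == "global":
        return 1
    if preferred in REGIONS:
        return int(country in REGIONS[preferred])
    return int(preferred in (country, city))


print(location_score("asia", "India"))                          # 1
print(location_score("new york", "United States", "New York"))  # 1
print(location_score("new york", "United States", "Austin"))    # 0
```

Because `global` adds the same point to every candidate, it leaves the ranking unchanged and only shifts the scores.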


In [None]:
# Keep each job's top-10 DataFrame for export (recommend already limits to top_n rows)
job1_top10 = recommend(job_id=0, top_n=10)
job2_top10 = recommend(job_id=1, top_n=10)
job3_top10 = recommend(job_id=2, top_n=10)


Job: Video Editor
Required Skills: ['splice & dice', 'rough cut & sequencing', '2d animation']
Preferred Location: asia
---- TOP CANDIDATES ----
Job: Producer/Video Editor
Required Skills: ['storyboarding', 'sound designing', 'rough cut & sequencing', 'filming']
Preferred Location: new york
---- TOP CANDIDATES ----
Job: Chief Operating Officer
Required Skills: ['strategy', 'consulting', 'business operations', 'development']
Preferred Location: global
---- TOP CANDIDATES ----


In [None]:
# Label each result set with its job title and combine them into one table
job1_top10["Job"] = jobs[0]["title"]
job2_top10["Job"] = jobs[1]["title"]
job3_top10["Job"] = jobs[2]["title"]

all_jobs_top10 = pd.concat([job1_top10, job2_top10, job3_top10], ignore_index=True)
all_jobs_top10.to_csv("all_jobs_top10.csv", index=False)

# Colab only: download the CSV to the local machine
files.download("all_jobs_top10.csv")

