# Job descriptions

Given a particular company's career page:
- Download and cache all open job descriptions
- Feed all job descriptions through an LLM extractor
- Standardize the job description information
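The download-and-cache step can be sketched as a small on-disk cache keyed by URL. This is a minimal sketch under assumptions, not the notebook's actual implementation: `cached_fetch`, the `fetch` callable, and the cache directory layout are hypothetical names introduced only for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_fetch(url: str, fetch: Callable[[str], str], cache_dir: Path) -> str:
    """Return the page body for `url`, calling `fetch` only on a cache miss.

    `fetch` is any callable that takes a URL and returns the page as text;
    it stands in for whatever scraper is actually used (an assumption).
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so the cache file name is filesystem-safe.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.html"
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch(url)
    path.write_text(html, encoding="utf-8")
    return html
```

Keying on a hash of the URL keeps the cache independent of any one ATS's URL scheme, at the cost of file names that are not human-readable.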

Experiment ideas:
- Try using a LangSmith dataset and experiment
- Try building out tools for the popular applicant tracking systems (ATS) to best handle their data formats
- Extend the scraping code to detect a career page's ATS dynamically and dispatch to the matching ATS-specific parser
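One way the last idea might look is a hostname-based lookup that decides which ATS a career page belongs to. This is only a sketch: `detect_ats` and `ATS_HOST_SUFFIXES` are hypothetical names, and the hostnames are simply the ones that appear in the URLs scraped in this notebook. A real career page hosted on a company's own domain would need a different signal.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical mapping from hostname suffix to ATS name, based only on the
# hosts seen in this notebook's URLs.
ATS_HOST_SUFFIXES = {
    "workable.com": "Workable",
    "ashbyhq.com": "Ashby",
    "smartrecruiters.com": "SmartRecruiters",
}


def detect_ats(url: str) -> Optional[str]:
    """Return the ATS name for a career-page URL, or None if unrecognized."""
    host = (urlparse(url).hostname or "").lower()
    for suffix, name in ATS_HOST_SUFFIXES.items():
        # Match the bare domain or any subdomain of it.
        if host == suffix or host.endswith("." + suffix):
            return name
    return None
```

The returned name could then be used to pick the matching crawler class, with `None` falling back to a generic scraper.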


In [1]:
from core import init

init()

In [2]:
# Try the Workable crawler on Gable's career page
from data_sources.job_listings import Workable, SmartRecruiters, Ashby

gable_listings = await Workable.crawl_jobs("https://apply.workable.com/gable/")

gable_listings

2024-11-25 17:09:43.013 | INFO     | data_sources.job_listings:scrape_job_links:84 - Scraped https://apply.workable.com/gable/
2024-11-25 17:09:43.037 | INFO     | data_sources.job_listings:scrape_job_descriptions:44 - Scraping ['https://apply.workable.com/gable/j/27509D233B/', 'https://apply.workable.com/gable/j/E176E721AD/', 'https://apply.workable.com/gable/j/CE8AC0E609/', 'https://apply.workable.com/gable/j/99A4F53602/', 'https://apply.workable.com/gable/j/1A48FFBF3B/']
2024-11-25 17:09:43.538 | INFO     | data_sources.job_listings:scrape_job_descriptions:55 - Scraped https://apply.workable.com/gable/j/1A48FFBF3B/
2024-11-25 17:09:44.040 | INFO     | data_sources.job_listings:scrape_job_descriptions:55 - Scraped https://apply.workable.com/gable/j/99A4F53602/
2024-11-25 17:

[<scrapfly.api_response.ScrapeApiResponse at 0x7f2510210610>,
 <scrapfly.api_response.ScrapeApiResponse at 0x7f250f83b5d0>,
 <scrapfly.api_response.ScrapeApiResponse at 0x7f250f84c1d0>,
 <scrapfly.api_response.ScrapeApiResponse at 0x7f250f858d50>,
 <scrapfly.api_response.ScrapeApiResponse at 0x7f250f861810>]

In [3]:
gable_results = [Workable.parse_job(job) for job in gable_listings]
gable_results

[{'url': 'https://apply.workable.com/gable/j/1A48FFBF3B/',
  'job_description_text': "Share this job\xa0SVGs not supported by this browser.DescriptionAbout us:Gable.ai is a Seattle-based startup revolutionizing the data industry. Through our data communication, change management, and collaboration platform, we empower developers to build and manage data assets, bridging the gap between data producers and consumers to upscale data quality. Fresh out of stealth mode and backed by prominent venture partners, our mission is to reshape data management by fostering collaboration and innovation. Join us in transforming the landscape of the data industry!As a Static Code Analysis Expert at Gable.ai, you will be at the forefront of developing and integrating static code analysis tools that are core to our product offerings. Your role will involve designing, implementing, and maintaining static analysis tools and features that help improve the quality, security, and maintainability of our client

In [4]:
{k: len(v) for k, v in gable_results[0].items()}

{'url': 46,
 'job_description_text': 3857,
 'job_description_html': 7327,
 'job_description_md': 4084}
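The parsed Workable text shown above starts with share-widget residue ("Share this job", then "SVGs not supported by this browser.") before the actual description. A possible cleanup step, sketched here under the assumption that the residue always appears as that exact prefix, is to strip it with a regular expression. `strip_share_widget` and `WORKABLE_PREFIX` are hypothetical names, not part of `data_sources.job_listings`.

```python
import re

# Assumed prefix, taken from the parsed Workable output in this notebook.
# `\s` also matches the non-breaking space (\xa0) seen in that output.
WORKABLE_PREFIX = re.compile(
    r"^\s*Share this job\s*SVGs not supported by this browser\.\s*"
)


def strip_share_widget(text: str) -> str:
    """Remove the leading share-widget residue, if present."""
    return WORKABLE_PREFIX.sub("", text, count=1)
```

Text that does not begin with the prefix is returned unchanged, so the function is safe to apply to listings from other ATS sources.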

In [5]:
abridge_listings = await Ashby.crawl_jobs("https://jobs.ashbyhq.com/Abridge")
abridge_listings

2024-11-25 17:09:46.134 | INFO     | data_sources.job_listings:scrape_job_descriptions:44 - Scraping ['https://jobs.ashbyhq.com/Abridge/c37f7f5c-ec63-4983-8f3f-b13bb85e088d', 'https://jobs.ashbyhq.com/Abridge/77e38354-bf42-42de-b404-ed2648414d23', 'https://jobs.ashbyhq.com/Abridge/0481e7b5-7252-472d-b8be-63d347bd2198', 'https://jobs.ashbyhq.com/Abridge/d980a314-1c5f-422e-99f9-d36bda21f49d', 'https://jobs.ashbyhq.com/Abridge/52f68350-2209-4327-bd4d-63eba4a564d5', 'https://jobs.ashbyhq.com/Abridge/f71cb8cf-d160-478c-8391-f7db60582e5e', 'https://jobs.ashbyhq.com/Abridge/25bfeaa6-7d0f-4026-85aa-cdab2aa5b725', 'https://jobs.ashbyhq.com/Abridge/47b16b43-be73-4a97-bb42-79b47d0feb92', 'https://jobs.ashbyhq.com/Abridge/e9af8bf2-21c6-458d-adc8-b0d59d6a9061', 'https://jobs.ashbyhq.com/Abridge/7a28a84b-6756-4fe6-8af9-501fc8772a62', 'https://jobs.ashbyhq.com/Abridge/a8a2b7af-992c-4121-b12c-81149e871469', 'https://jobs.ashbyhq.com/Abridge/03699ed8-5cf5

[<scrapfly.api_response.ScrapeApiResponse at 0x7f250f74bc10>,
 <scrapfly.api_response.ScrapeApiResponse at 0x7f250f787190>,
 <scrapfly.api_response.ScrapeApiResponse at 0x7f250f79a310>,
 <scrapfly.api_response.ScrapeApiResponse at 0x7f250f783e10>,
 <scrapfly.api_response.ScrapeApiResponse at 0x7f250f747490>,
 <scrapfly.api_response.ScrapeApiResponse at 0x7f250f756c50>,
 <scrapfly.api_response.ScrapeApiResponse at 0x7f250f772690>,
 <scrapfly.api_response.ScrapeApiResponse at 0x7f250f729f90>,
 <scrapfly.api_response.ScrapeApiResponse at 0x7f250f730c10>,
 <scrapfly.api_response.ScrapeApiResponse at 0x7f250f76af50>,
 <scrapfly.api_response.ScrapeApiResponse at 0x7f250f7111d0>,
 <scrapfly.api_response.ScrapeApiResponse at 0x7f250f6f5b90>,
 <scrapfly.api_response.ScrapeApiResponse at 0x7f250f6fb0d0>,
 <scrapfly.api_response.ScrapeApiResponse at 0x7f250f6fbd50>,
 <scrapfly.api_response.ScrapeApiResponse at 0x7f250f7d9d10>,
 <scrapfly.api_response.ScrapeApiResponse at 0x7f25101f28d0>,
 <scrapf

In [7]:
logic2020_listings = await SmartRecruiters.crawl_jobs("https://careers.smartrecruiters.com/Logic2020Inc")

2024-11-25 17:11:38.282 | INFO     | data_sources.job_listings:scrape_job_descriptions:44 - Scraping ['https://jobs.smartrecruiters.com/Logic2020Inc/744000027953455-senior-business-development-executive', 'https://jobs.smartrecruiters.com/Logic2020Inc/744000025287191-sr-business-development-executive', 'https://jobs.smartrecruiters.com/Logic2020Inc/744000025114686-sr-business-development-executive', 'https://jobs.smartrecruiters.com/Logic2020Inc/744000025115865-sr-business-development-executive', 'https://jobs.smartrecruiters.com/Logic2020Inc/744000024645686-manager-sap-s-4hana-functional-analyst', 'https://jobs.smartrecruiters.com/Logic2020Inc/744000014624384-consulting-manager-energy-utilities', 'https://jobs.smartrecruiters.com/Logic2020Inc/744000014623806-consulting-manager-energy-utilities', 'https://jobs.smartrecruiters.com/Logic2020Inc/744000011980069-senior-consultant-energy-utilities', 'https://jobs.smartrecruiters.com/Logic2020I

In [11]:

import pandas as pd

job_descriptions = [
    Workable.parse_job(response) for response in gable_listings
] + [
    Ashby.parse_job(response) for response in abridge_listings
] + [
    SmartRecruiters.parse_job(response) for response in logic2020_listings
]

data = pd.DataFrame(job_descriptions)
data.to_csv("scraped_job_descriptions.csv", index=False)
data

Unnamed: 0,url,job_description_text,job_description_html,job_description_md
0,https://apply.workable.com/gable/j/1A48FFBF3B/,Share this job SVGs not supported by this brow...,"<main class=""styles--2d3Fz"" role=""main"">\n <di...",\nShare this job\n\n\nSVGs not supported by th...
1,https://apply.workable.com/gable/j/99A4F53602/,Share this job SVGs not supported by this brow...,"<main class=""styles--2d3Fz"" role=""main"">\n <di...",\nShare this job\n\n\nSVGs not supported by th...
2,https://apply.workable.com/gable/j/CE8AC0E609/,Share this job SVGs not supported by this brow...,"<main class=""styles--2d3Fz"" role=""main"">\n <di...",\nShare this job\n\n\nSVGs not supported by th...
3,https://apply.workable.com/gable/j/E176E721AD/,Share this job SVGs not supported by this brow...,"<main class=""styles--2d3Fz"" role=""main"">\n <di...",\nShare this job\n\n\nSVGs not supported by th...
4,https://apply.workable.com/gable/j/27509D233B/,Share this job SVGs not supported by this brow...,"<main class=""styles--2d3Fz"" role=""main"">\n <di...",\nShare this job\n\n\nSVGs not supported by th...
5,https://jobs.ashbyhq.com/Abridge/15c327d9-fdf0...,Abridge was founded in 2018 with the mission o...,"<div aria-labelledby=""job-overview"" class=""_de...",\n\nAbridge was founded in 2018 with the missi...
6,https://jobs.ashbyhq.com/Abridge/d9234ea7-6052...,Abridge was founded in 2018 with the mission o...,"<div aria-labelledby=""job-overview"" class=""_de...",\n\nAbridge was founded in 2018 with the missi...
7,https://jobs.ashbyhq.com/Abridge/4e740f28-085b...,Abridge was founded in 2018 with the mission o...,"<div aria-labelledby=""job-overview"" class=""_de...",\n\nAbridge was founded in 2018 with the missi...
8,https://jobs.ashbyhq.com/Abridge/ce26ef6a-0d94...,Abridge was founded in 2018 with the mission o...,"<div aria-labelledby=""job-overview"" class=""_de...",\n\nAbridge was founded in 2018 with the missi...
9,https://jobs.ashbyhq.com/Abridge/d5893b75-8f76...,Abridge was founded in 2018 with the mission o...,"<div aria-labelledby=""job-overview"" class=""_de...",\n\nAbridge was founded in 2018 with the missi...


In [None]:
# data.to_csv("scraped_job_descriptions.csv", index=False)