## LinkedIn Jobs API Demo

This notebook demonstrates how to use the Scraping Dog web-scraping API to access LinkedIn job postings from a local machine. A web-scraping API is easy to use (the provider handles CAPTCHAs for us), sidesteps proxy limits, and helps avoid the legal trouble of scraping LinkedIn directly. However, we are constrained by the API's rate limits and by the features it currently supports. The official documentation can be found [here](https://docs.scrapingdog.com/linkedin-jobs-scraper).
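For reference, a raw request to the service might look like the sketch below. It assumes the endpoint `https://api.scrapingdog.com/linkedinjobs/`, the query parameters `api_key`, `field`, `geoid`, and `page` (mirroring the arguments used later in this notebook), and a response body that is a JSON list of job objects. Check all of these against the official documentation before relying on them. `build_jobs_url` and `fetch_jobs` are hypothetical names; they are not part of the project's `scrape` module.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify against the Scraping Dog documentation.
LINKEDIN_JOBS_ENDPOINT = "https://api.scrapingdog.com/linkedinjobs/"


def build_jobs_url(api_key: str, field: str, geoid: str, page: int = 1, **extra) -> str:
    """Return the request URL for one page of LinkedIn job results (hypothetical helper)."""
    params = {"api_key": api_key, "field": field, "geoid": geoid, "page": page}
    params.update(extra)  # any additional optional filters the API documents
    # urlencode percent-encodes values, e.g. "data scientist" -> "data+scientist"
    return LINKEDIN_JOBS_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_jobs(api_key: str, field: str, geoid: str, page: int = 1,
               timeout: float = 60.0, **extra) -> list:
    """Fetch one page of results; assumes the API responds with a JSON list of job dicts."""
    url = build_jobs_url(api_key, field, geoid, page, **extra)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Only the standard library is used, so the sketch runs without extra dependencies; `requests` would work equally well.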

```python
import os
import pandas as pd
from datetime import datetime

# Project-local helper modules
import utils
import scrape
import get_data
```

To use our custom helper, `utils.get_scraping_dog_api_key()`, you will need to write your own `secrets.toml` file containing your Scraping Dog API key. Otherwise, simply set `SCRAPING_DOG_API_KEY` to your key as a string. You can obtain a Scraping Dog API key from the [member area](https://api.scrapingdog.com/login).

```python
# Step 1: Obtain API key.
SCRAPING_DOG_API_KEY = utils.get_scraping_dog_api_key()

# Alternatively, uncomment this line and insert your own API key.
# SCRAPING_DOG_API_KEY = ""
```

```python
# Step 2: Grab job postings (reuse the cached sample if it exists).

SAMPLE_PATH = 'sample.csv'
FIELD = 'data scientist'
SORT_BY = 'day'

if os.path.isfile(SAMPLE_PATH):
    jobs_df = pd.read_csv(SAMPLE_PATH)
else:
    # Alternative helper from get_data:
    # jobs_df = get_data.get_linkedin_jobs_data(
    #     api_key=SCRAPING_DOG_API_KEY,
    #     field=FIELD,
    #     geoid='102095887',
    #     page=1,
    #     sort_by=SORT_BY # Optional parameter to sort jobs based on their posting date: {`day`, `week`, `month`}
    # )

    jobs_df = pd.DataFrame(scrape.sd_fetch_jobs_from_page(
        api_key=SCRAPING_DOG_API_KEY,
        field=FIELD,
        geoid='102095887',  # LinkedIn geoId for the search location (the results below are all in California)
        page=1,
        sort_by=SORT_BY,  # Optional parameter to sort jobs based on their posting date: {`day`, `week`, `month`}
        verbose=True
    ))

    # Save a timestamped copy: data/<YYYYmmdd_HHMMSS>_<field>_linkedin_jobs.csv
    os.makedirs('data', exist_ok=True)
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    field = '_'.join(FIELD.split())
    jobs_df.to_csv('data/{}_{}_linkedin_jobs.csv'.format(timestamp, field), index=False, lineterminator='\n')
```

```
Successful call to Scraping Dog API
Returning 25 job listings
```
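The cell above fetches only `page=1`. One way to collect more listings is to loop over pages and pause between calls to stay within the API's rate limits. The sketch below takes any page-fetching callable (for example a lambda wrapping `scrape.sd_fetch_jobs_from_page`), assumes each page returns a list of dicts with a `job_id` key, stops at the first empty page, and drops duplicate `job_id`s. `fetch_pages` and the one-second delay are assumptions, not documented limits.

```python
import time
from typing import Callable, Dict, List


def fetch_pages(
    fetch_page: Callable[[int], List[Dict]],
    max_pages: int = 3,
    delay_seconds: float = 1.0,
) -> List[Dict]:
    """Collect listings from pages 1..max_pages (hypothetical helper).

    Stops early on an empty page and skips listings whose job_id was already seen.
    """
    seen_ids = set()
    jobs = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break  # no more results
        for job in batch:
            job_id = job.get("job_id")
            if job_id is not None:
                if job_id in seen_ids:
                    continue  # same posting returned on an earlier page
                seen_ids.add(job_id)
            jobs.append(job)
        if page < max_pages:
            time.sleep(delay_seconds)  # crude spacing between calls for rate limits
    return jobs


# Example usage (assumes the notebook's scrape module and constants):
# jobs_df = pd.DataFrame(fetch_pages(
#     lambda p: scrape.sd_fetch_jobs_from_page(
#         api_key=SCRAPING_DOG_API_KEY, field=FIELD, geoid='102095887',
#         page=p, sort_by=SORT_BY),
#     max_pages=3,
# ))
```

Each extra page costs another API call, so keep `max_pages` small while experimenting.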


```python
jobs_df
```

| Unnamed: 0 | job_position | job_link | job_id | company_name | company_profile | job_location | job_posting_date |
|---|---|---|---|---|---|---|---|
| 0 | Junior Data Scientist | https://www.linkedin.com/jobs/view/junior-data... | 3842926205 | Flexon Technologies Inc. | https://www.linkedin.com/company/flexon-techno... | Pleasanton, CA | 2024-02-29 |
| 1 | Junior Data Scientist | https://www.linkedin.com/jobs/view/junior-data... | 3793915483 | Astrana Health | https://www.linkedin.com/company/astranahealth... | Alhambra, CA | 2024-02-29 |
| 2 | Data Scientist | https://www.linkedin.com/jobs/view/data-scient... | 3839533301 | Milestone Technologies, Inc. | https://www.linkedin.com/company/milestone-tec... | Thousand Oaks, CA | 2024-02-29 |
| 3 | Machine Learning- Engineer | https://www.linkedin.com/jobs/view/machine-lea... | 3825424780 | TigerGraph | https://www.linkedin.com/company/tigergraph?tr... | Redwood City, CA | 2024-02-29 |
| 4 | Data Scientist - Marketing Analytics | https://www.linkedin.com/jobs/view/data-scient... | 3840452767 | The Select Group | https://www.linkedin.com/company/the-select-gr... | Culver City, CA | 2024-02-28 |
| 5 | AI/ML Developer | https://www.linkedin.com/jobs/view/ai-ml-devel... | 3840468785 | PETADATA | https://www.linkedin.com/company/petadata?trk=... | San Francisco, CA | 2024-02-28 |
| 6 | Data Scientist, Autonomous Vehicle Infrastruct... | https://www.linkedin.com/jobs/view/data-scient... | 3842698966 | NVIDIA | https://www.linkedin.com/company/nvidia?trk=pu... | California, United States | 2024-02-29 |
| 7 | AI/ML Developer with Python | https://www.linkedin.com/jobs/view/ai-ml-devel... | 3840468767 | PETADATA | https://www.linkedin.com/company/petadata?trk=... | San Francisco, CA | 2024-02-28 |
| 8 | Machine Learning Engineer | https://www.linkedin.com/jobs/view/machine-lea... | 3842845137 | Probably Genetic | https://www.linkedin.com/company/probablygenet... | San Francisco, CA | 2024-02-29 |
| 9 | Machine Learning Engineer | https://www.linkedin.com/jobs/view/machine-lea... | 3842889761 | orbit | https://ae.linkedin.com/company/weorbit-tech?t... | San Francisco Bay Area | 2024-02-29 |
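The `Unnamed: 0` column is what pandas produces when a CSV saved with its index is read back without `index_col=0`. A small clean-up pass could drop such columns, parse `job_posting_date`, strip the tracking query string from `job_link`, and remove repeated `job_id`s. `clean_jobs` below is a hypothetical helper, not part of the project's modules, and it assumes the column names shown in the table.

```python
import pandas as pd


def clean_jobs(df: pd.DataFrame) -> pd.DataFrame:
    """Tidy a raw job-listings frame (hypothetical helper; column names as in the table)."""
    # Drop leftover index columns such as "Unnamed: 0".
    out = df.drop(columns=[c for c in df.columns if str(c).startswith("Unnamed")])
    # Parse dates so listings can be sorted and filtered; bad values become NaT.
    out["job_posting_date"] = pd.to_datetime(out["job_posting_date"], errors="coerce")
    # Strip tracking parameters (refId, trackingId, trk, ...) from job links.
    if "job_link" in out.columns:
        out["job_link"] = out["job_link"].str.split("?").str[0]
    # Keep the first copy of each posting and show the newest first.
    out = out.drop_duplicates(subset="job_id")
    out = out.sort_values("job_posting_date", ascending=False)
    return out.reset_index(drop=True)
```

Reading the cached file with `pd.read_csv(SAMPLE_PATH, index_col=0)` would avoid the extra column in the first place.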


```python
jobs_df['job_link'].tolist()
```

['https://www.linkedin.com/jobs/view/junior-data-scientist-at-flexon-technologies-inc-3842926205?refId=qnDD9xCUjN1AKpSlZjvfng%3D%3D&trackingId=dMA%2F3y9hvlf7ckxD4XfuaQ%3D%3D&position=1&pageNum=0&trk=public_jobs_jserp-result_search-card',
 'https://www.linkedin.com/jobs/view/junior-data-scientist-at-astrana-health-3793915483?refId=qnDD9xCUjN1AKpSlZjvfng%3D%3D&trackingId=uZQKeJJ1v7svc20goLUgcA%3D%3D&position=2&pageNum=0&trk=public_jobs_jserp-result_search-card',
 'https://www.linkedin.com/jobs/view/data-scientist-at-milestone-technologies-inc-3839533301?refId=qnDD9xCUjN1AKpSlZjvfng%3D%3D&trackingId=VO%2Ffn6l2Vme52LlhmFBjkw%3D%3D&position=3&pageNum=0&trk=public_jobs_jserp-result_search-card',
 'https://www.linkedin.com/jobs/view/machine-learning-engineer-at-tigergraph-3825424780?refId=qnDD9xCUjN1AKpSlZjvfng%3D%3D&trackingId=P%2F%2FGgacSZH4x27uhF8dsVA%3D%3D&position=4&pageNum=0&trk=public_jobs_jserp-result_search-card',
 'https://www.linkedin.com/jobs/view/data-scientist-marketing-analytic