## LinkedIn Jobs API Demo

This notebook is a local demonstration of how to use the Scraping Dog web-scraping API to access LinkedIn job postings. A web-scraping API is easy to use: the service handles CAPTCHAs and proxy limits for us, and it reduces the legal risk of scraping LinkedIn directly. However, we are limited by the API's rate limits and its current functionality. See the [official documentation](https://docs.scrapingdog.com/linkedin-jobs-scraper) for details.

In [2]:
import pandas as pd

import get_linkedin_jobs_data

To use our custom function, you will need to write your own `secrets.toml` file containing your Scraping Dog API key. Otherwise, you can simply set `SCRAPING_DOG_API_KEY` to your key as a string. The API key can be obtained in the [member area](https://api.scrapingdog.com/dashboard/65cdb16ffac36d5508c81b26).
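
For readers without access to our helper module, here is a minimal sketch of how such a key could be read from a TOML file. The function name `load_api_key`, the default file path, and the key name inside the file are assumptions for illustration; the layout expected by `get_scraping_dog_api_key()` may differ.

```python
from pathlib import Path

try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:  # older Python: needs `pip install tomli`
    import tomli as tomllib


def load_api_key(path="secrets.toml", key_name="SCRAPING_DOG_API_KEY"):
    """Return the value stored under `key_name` in a TOML file.

    Hypothetical helper: assumes the key sits at the top level of the file,
    e.g. a line such as  SCRAPING_DOG_API_KEY = "your-key-here"
    """
    with Path(path).open("rb") as f:  # tomllib requires binary mode
        secrets = tomllib.load(f)
    if key_name not in secrets:
        raise KeyError(f"{key_name!r} not found in {path}")
    return secrets[key_name]
```

Keeping the key in a separate file that is excluded from version control avoids committing it alongside the notebook.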

In [3]:
# Step 1: Obtain API key.
SCRAPING_DOG_API_KEY = get_linkedin_jobs_data.get_scraping_dog_api_key()

# Alternatively, uncomment this line and insert your own API key.
# SCRAPING_DOG_API_KEY = "" 

In [4]:
# Step 2: Grab job postings.
data = get_linkedin_jobs_data.fetch_jobs_from_page(
    api_key=SCRAPING_DOG_API_KEY,
    field='data scientist',
    geoid='102095887',
    page=1,
    sort_by='day',  # Optional: sort jobs by posting date; one of 'day', 'week', 'month'
)
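
To make the call above less opaque, the sketch below shows how the request URL for one page of results might be assembled from the same parameters. The endpoint in `BASE_URL` and the helper name `build_jobs_url` are assumptions, not taken from our module; check the official documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the official Scraping Dog documentation.
BASE_URL = "https://api.scrapingdog.com/linkedinjobs"


def build_jobs_url(api_key, field, geoid, page=1, sort_by=None):
    """Build the query URL for one page of job postings (hypothetical helper)."""
    params = {
        "api_key": api_key,
        "field": field,   # search term, e.g. 'data scientist'
        "geoid": geoid,   # LinkedIn location identifier
        "page": page,
    }
    if sort_by is not None:  # only send the optional parameter when set
        params["sort_by"] = sort_by
    # urlencode escapes spaces and other reserved characters in the values.
    return f"{BASE_URL}?{urlencode(params)}"
```

The resulting URL could then be passed to an HTTP client such as `requests.get(url).json()`; the network call is left out here so the sketch runs offline.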

In [12]:
# TODO: scrape each job page ourselves to obtain the job description,
# then filter on the years of experience required.
# For now, print the link to each posting.
for job in data:
    job_link = job['job_link']
    print(job_link)

https://www.linkedin.com/jobs/view/data-scientist-marketing-at-notion-3834722481?position=1&pageNum=0&refId=vnasdgjLV%2FNfzYwIMcV5uQ%3D%3D&trackingId=XLj6LIm3ra15oXYfbJcRfw%3D%3D&trk=public_jobs_jserp-result_search-card
https://www.linkedin.com/jobs/view/data-scientist-product-at-notion-3834718979?position=2&pageNum=0&refId=vnasdgjLV%2FNfzYwIMcV5uQ%3D%3D&trackingId=Nd4okBLYgH9GoLnrHQNxNQ%3D%3D&trk=public_jobs_jserp-result_search-card
https://www.linkedin.com/jobs/view/data-scientist-monetization-at-notion-3834718980?position=3&pageNum=0&refId=vnasdgjLV%2FNfzYwIMcV5uQ%3D%3D&trackingId=tRk%2FAfyXTXOVa57QNnFhBA%3D%3D&trk=public_jobs_jserp-result_search-card
https://www.linkedin.com/jobs/view/machine-learning-engineer-at-tigergraph-3835748265?position=4&pageNum=0&refId=vnasdgjLV%2FNfzYwIMcV5uQ%3D%3D&trackingId=%2FypOvaA70rkifznAF%2Bkpkw%3D%3D&trk=public_jobs_jserp-result_search-card
https://www.linkedin.com/jobs/view/biology-scientist-at-pacer-staffing-llc-3834753039?position=5&pageNum=0&r
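
The cell above mentions filtering on the years of experience required once a job description is available. One possible approach is a regular expression over the description text, sketched below. The pattern and the function name `max_years_mentioned` are assumptions for illustration; a heuristic like this can also match unrelated phrases (for example, a company's age), so results should be spot-checked.

```python
import re

# Matches phrases such as "5+ years", "3-5 years", "2 to 4 yrs".
# For a range, only the lower bound is captured.
YEARS_PATTERN = re.compile(
    r"(\d{1,2})\s*\+?\s*(?:(?:[-\u2013]|to)\s*\d{1,2}\s*\+?\s*)?(?:years?|yrs?)",
    re.IGNORECASE,
)


def max_years_mentioned(description):
    """Return the largest year count mentioned in the text, or None if none is found."""
    years = [int(m.group(1)) for m in YEARS_PATTERN.finditer(description)]
    return max(years) if years else None


def within_experience(description, limit):
    """True if no mentioned requirement exceeds `limit` years (hypothetical filter)."""
    years = max_years_mentioned(description)
    return years is None or years <= limit
```

Taking the maximum is a deliberately conservative choice: a posting that asks for "5+ years of Python and 3 years of SQL" is treated as requiring five years.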

In [7]:
jobs_df = pd.DataFrame(data)
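
A single page yields only a handful of postings, so a larger DataFrame needs several pages. The helper below is a sketch of one way to gather them; `collect_jobs` is a hypothetical name, and it assumes the page-fetching function returns an empty list once the results run out, which we have not verified for the API.

```python
import time


def collect_jobs(fetch_page, max_pages=3, pause_seconds=0.0):
    """Call `fetch_page(page)` for pages 1..max_pages and concatenate the results.

    Stops early when a page comes back empty (assumed end-of-results signal).
    """
    all_jobs = []
    for page in range(1, max_pages + 1):
        jobs = fetch_page(page)
        if not jobs:
            break
        all_jobs.extend(jobs)
        time.sleep(pause_seconds)  # optional pause to stay under API rate limits
    return all_jobs


# Possible usage with the function from this notebook:
# data = collect_jobs(
#     lambda page: get_linkedin_jobs_data.fetch_jobs_from_page(
#         api_key=SCRAPING_DOG_API_KEY,
#         field='data scientist',
#         geoid='102095887',
#         page=page,
#     ),
#     max_pages=3,
#     pause_seconds=1.0,
# )
```

Passing the fetcher in as an argument keeps the loop independent of the API client, which also makes it easy to test with a stub.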

In [9]:
jobs_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 7 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   job_position      10 non-null     object
 1   job_link          10 non-null     object
 2   job_id            10 non-null     object
 3   company_name      10 non-null     object
 4   company_profile   10 non-null     object
 5   job_location      10 non-null     object
 6   job_posting_date  10 non-null     object
dtypes: object(7)
memory usage: 688.0+ bytes
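
Every column comes back as `object`, so some light cleaning is useful before analysis. The sketch below parses the posting date and strips the tracking query string from each link. The helper name `tidy_jobs` is hypothetical, and it assumes `job_posting_date` holds date strings that `pd.to_datetime` can parse; unparseable values become `NaT`.

```python
import pandas as pd


def tidy_jobs(df):
    """Return a copy with parsed posting dates and links without query strings."""
    out = df.copy()
    # Convert date strings to datetime64; invalid entries become NaT.
    out["job_posting_date"] = pd.to_datetime(out["job_posting_date"], errors="coerce")
    # Keep only the part of the URL before '?', dropping tracking parameters.
    out["job_link"] = out["job_link"].str.split("?", n=1).str[0]
    return out
```

Calling `tidy_jobs(jobs_df)` leaves the original DataFrame untouched, so the raw API output stays available for comparison.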
