# **Collecting Job Data Using APIs**


## Objectives


After completing this lab, you will be able to:


*   Collect job data from the GitHub Jobs API.
*   Store the collected data in an Excel spreadsheet.


To run the lab, first open the [Jobs_API](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/Jobs_API.ipynb) notebook. It contains the Flask code that must be running to serve the Jobs API data at `http://127.0.0.1:5000/data`.
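Because every later cell depends on the Flask service being reachable, it can help to check the port before sending requests. The following is a minimal sketch using only the standard library; the helper name `is_service_up` is hypothetical, and the host and port are assumed from the `api_url` used later in this lab.

```python
import socket


def is_service_up(host="127.0.0.1", port=5000, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: host and port default to the address that the
    lab's api_url points at.
    """
    try:
        # create_connection raises OSError if nothing is listening
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_service_up()` before the first request gives a clearer signal than a connection error raised from inside `requests.get`.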


## Dataset Used in this Assignment

The dataset used comes from the following source: https://www.kaggle.com/promptcloud/jobs-on-naukricom under a **Public Domain license**.
 

Let's find the number of job postings for the following locations using the API:

* Los Angeles
* New York
* San Francisco
* Washington DC
* Seattle
* Austin
* Detroit


In [24]:
# Import required libraries
import pandas as pd
import json
import requests  # used by every API call below


In [9]:
# function to collect job_postings
def get_number_of_jobs_T(technology):
    api_url="http://127.0.0.1:5000/data"
    payload = { "Key Skills":technology}
    response = requests.get(api_url, params=payload)
    response.raise_for_status()
    data = response.json()
    job_postings=len(data)

    return technology,job_postings

Call the function for Python to check that it works.


In [25]:
get_number_of_jobs_T("Python")

('Python', 1173)
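The `payload` dictionary is sent as a URL query string, so spaces and characters such as `#` and `+` have to be percent-encoded. The sketch below uses the standard library's `urlencode` to show what that encoding looks like; `requests` is expected to produce an equivalent query string from `params`. The helper name `build_query_url` is hypothetical and is not part of the lab.

```python
from urllib.parse import urlencode

API_URL = "http://127.0.0.1:5000/data"  # same local endpoint as in the lab


def build_query_url(base_url, params):
    """Return base_url with params appended as an encoded query string."""
    return f"{base_url}?{urlencode(params)}"


# Spaces become '+', '#' becomes %23 and '+' becomes %2B
print(build_query_url(API_URL, {"Key Skills": "Python"}))
print(build_query_url(API_URL, {"Key Skills": "C#"}))
print(build_query_url(API_URL, {"Key Skills": "C++"}))
```

This is why technology names such as `C#` and `C++` can be passed to the function as plain strings without manual escaping.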

Next, create a function that finds the number of jobs in the US for a location of your choice.


In [26]:
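def get_number_of_jobs_L(locations):
    # api_url must be defined here; in the earlier function it was a local variable
    api_url = "http://127.0.0.1:5000/data"
    payload = {"Location": locations}
    response = requests.get(api_url, params=payload)
    response.raise_for_status()
    data = response.json()
    job_postings = len(data)
    return locations, job_postings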

In [43]:
get_number_of_jobs_L("Los Angeles")

('Los Angeles', 3)

Now let's store the results in an Excel file.


In [44]:
# Python list of all locations for which we need the number of job postings
locations = ["Los Angeles", "New York", "San Francisco", "Washington DC", "Seattle", "Austin", "Detroit"]

In [45]:
# Install and import openpyxl, which is used to create the Excel spreadsheet.
%pip install openpyxl
from openpyxl import Workbook

Note: you may need to restart the kernel to use updated packages.


Create a workbook and select the active worksheet


In [47]:
# Create a new Excel file
wb = Workbook()
ws = wb.active

# Set the header row
ws['A1'] = 'Job Title'
ws['B1'] = 'Job Experience Required'
ws['C1'] = 'Key Skills'
ws['D1'] = 'Role Category'
ws['E1'] = 'Location'
ws['F1'] = 'Functional Area'
ws['G1'] = 'Industry'
ws['H1'] = 'Role'

Now let's fetch the job postings for each location in the above list, print the number of postings per location, and save the details of each posting to the Excel spreadsheet.


In [48]:
def get_job_listings(state):
    api_url="http://127.0.0.1:5000/data"
    params = {
        "Location": state
    }
    try:
        response = requests.get(api_url, params=params)
        # Treat HTTP error statuses (4xx/5xx) as failures too
        response.raise_for_status()
        job_listings = response.json()
        return job_listings
    except requests.RequestException as e:
        print(f"Error making API request: {e}")
        return []

In [49]:
for state in locations:
    job_listings =  get_job_listings(state)
    print(state,len(job_listings))
    for job in job_listings:
        ws.append([job['Job Title'], job['Job Experience Required'], job['Key Skills'], job['Role Category'], job['Location'], job['Functional Area'], job['Industry'], job['Role']])

Los Angeles 640
New York 3226
San Francisco 435
Washington DC 5316
Seattle 3375
Austin 434
Detroit 3945
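The loop above indexes every field directly, so a single posting with a missing key would raise a `KeyError` partway through the export. As a minimal sketch, the column names can be kept in one list and each row built with `dict.get`. The names `COLUMNS` and `job_to_row` are hypothetical, and the sample record is made up for illustration.

```python
# Column order used for the header row in this lab
COLUMNS = [
    "Job Title", "Job Experience Required", "Key Skills", "Role Category",
    "Location", "Functional Area", "Industry", "Role",
]


def job_to_row(job, columns=COLUMNS):
    """Return the job's values in column order; missing keys become ''."""
    return [job.get(column, "") for column in columns]


# Made-up record with most fields missing, for illustration only
sample_job = {"Job Title": "Data Analyst", "Location": "Austin"}
row = job_to_row(sample_job)
print(row)
```

With this helper, `ws.append(COLUMNS)` would write the header row and `ws.append(job_to_row(job))` would write each posting.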


Save the workbook as an Excel spreadsheet named 'github-job-postings.xlsx'.

In [50]:
wb.save('github-job-postings.xlsx')

Similarly, collect the job postings for the following technologies using the API:

*   C
*   C#
*   C++
*   Java
*   JavaScript
*   Python
*   Scala
*   Oracle
*   SQL Server
*   MySQL Server
*   PostgreSQL
*   MongoDB


In [51]:
tech=["C","C#","C++","Java","JavaScript","Python","Scala","Oracle","SQL Server","MySQL Server","PostgreSQL","MongoDB"]


# Create a new worksheet
ws = wb.create_sheet("MyNewSheet")
# Set the header row
ws['A1'] = 'Job Title'
ws['B1'] = 'Job Experience Required'
ws['C1'] = 'Key Skills'
ws['D1'] = 'Role Category'
ws['E1'] = 'Location'
ws['F1'] = 'Functional Area'
ws['G1'] = 'Industry'
ws['H1'] = 'Role'

In [52]:
def get_number_of_jobs(technology):
    # Returns the full list of postings; the caller takes len() to get the count
    api_url="http://127.0.0.1:5000/data"
    payload = { "Key Skills":technology}
    response = requests.get(api_url, params=payload)
    response.raise_for_status()
    job_postings = response.json()
    return job_postings

In [53]:
for technology in tech:
    job_listings = get_number_of_jobs(technology)
    print(technology, len(job_listings))
    for job in job_listings:
        ws.append([job['Job Title'], job['Job Experience Required'], job['Key Skills'], job['Role Category'], job['Location'], job['Functional Area'], job['Industry'], job['Role']])

C 13498
C# 333
C++ 305
Java 2609
JavaScript 355
Python 1173
Scala 33
Oracle 784
SQL Server 250
MySQL Server 0
PostgreSQL 10
MongoDB 174
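The count for C is far larger than for any other technology, and "MySQL Server" returns nothing. Both results would be consistent with the service matching the search text as a case-insensitive substring of the whole `Key Skills` field, although the lab does not state how the API filters. The sketch below only illustrates that guess: the records are made up, the `|` separator is an assumption about the dataset, and both function names are hypothetical.

```python
def filter_by_substring(records, key, value):
    """Keep records whose field contains value anywhere (case-insensitive)."""
    needle = value.lower()
    return [r for r in records if needle in str(r.get(key, "")).lower()]


def filter_by_token(records, key, value):
    """Keep records where value equals one whole '|'-separated entry."""
    needle = value.strip().lower()
    return [
        r for r in records
        if needle in [s.strip().lower() for s in str(r.get(key, "")).split("|")]
    ]


# Made-up records for illustration; not taken from the dataset
records = [
    {"Key Skills": "Python| SQL| Machine Learning"},
    {"Key Skills": "Java| Oracle"},
    {"Key Skills": "C| Embedded Systems"},
    {"Key Skills": "MySQL| PHP"},
]

print(len(filter_by_substring(records, "Key Skills", "C")))             # 3
print(len(filter_by_token(records, "Key Skills", "C")))                 # 1
print(len(filter_by_substring(records, "Key Skills", "MySQL Server")))  # 0
```

If the service does behave this way, the figure for C counts any posting whose skills contain the letter "c", so it should be read with caution.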


In [54]:
wb.save('github-job-postings.xlsx')

<!--| Date (YYYY-MM-DD) | Version | Changed By        | Change Description                 |
| ----------------- | ------- | ----------------- | ---------------------------------- |
| 2022-01-19        | 0.3     | Lakshmi Holla     | Added changes in the markdown      |
| 2021-06-25        | 0.2     | Malika            | Updated GitHub job json link       |
| 2020-10-17        | 0.1     | Ramesh Sannareddy | Created initial version of the lab |-->
