# **Collecting Job Data Using APIs**


## Objectives


*   Collect job data.
*   Store the collected data in an Excel spreadsheet.


## Dataset Used

The dataset used in this lab comes from the following source: [https://www.kaggle.com/promptcloud/jobs-on-naukricom](https://www.kaggle.com/promptcloud/jobs-on-naukricom?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork21426264-2021-01-01) under a **Public Domain license**.



## Warm-Up Exercise


Using an API, let us find out who is currently on the International Space Station (ISS).<br> The API at [http://api.open-notify.org/astros.json](http://api.open-notify.org/astros.json?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork21426264-2021-01-01&cm_mmc=Email_Newsletter-\_-Developer_Ed%2BTech-\_-WW_WW-\_-SkillsNetwork-Courses-IBM-DA0321EN-SkillsNetwork-21426264&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ) returns information about the astronauts currently on the ISS in JSON format.<br>
You can read more about this API at [http://open-notify.org/Open-Notify-API/People-In-Space/](http://open-notify.org/Open-Notify-API/People-In-Space?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork21426264-2021-01-01&cm_mmc=Email_Newsletter-\_-Developer_Ed%2BTech-\_-WW_WW-\_-SkillsNetwork-Courses-IBM-DA0321EN-SkillsNetwork-21426264&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ).


In [None]:
import requests # you need this module to make an API call
import pandas as pd

In [None]:
api_url = "http://api.open-notify.org/astros.json" # this URL gives us the astronaut data

In [None]:
response = requests.get(api_url) # Call the API using the get method and store the
                                # output of the API call in a variable called response.

In [None]:
if response.ok:             # True when the request succeeded (status code below 400)
    data = response.json()  # parse the JSON body and store it in a variable called data;
                            # the variable data is of type dictionary.
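
The `if response.ok:` check above skips the assignment when the request fails, so `data` would be undefined in the cells that follow. Below is a minimal sketch of one way to make that failure explicit. The names `json_or_raise` and `FakeResponse` are hypothetical and not part of the lab; the stub only stands in for a `requests.Response` so the sketch runs without a network connection.

```python
class FakeResponse:
    """Stand-in for requests.Response (illustration only, not a real API)."""

    def __init__(self, ok, status_code, payload=None):
        self.ok = ok
        self.status_code = status_code
        self._payload = payload

    def json(self):
        return self._payload


def json_or_raise(response):
    """Return the decoded JSON body, or raise if the request failed."""
    if not response.ok:
        raise RuntimeError(f"API call failed with status {response.status_code}")
    return response.json()


good = FakeResponse(ok=True, status_code=200, payload={"message": "success", "number": 11})
print(json_or_raise(good)["number"])

bad = FakeResponse(ok=False, status_code=503)
try:
    json_or_raise(bad)
except RuntimeError as err:
    print(err)
```

With a real response this would be called as `data = json_or_raise(response)`. The `requests` library also offers `response.raise_for_status()`, which raises an exception for error status codes.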

In [None]:
print(data)   # print the data just to check the output or for debugging

{'message': 'success', 'number': 11, 'people': [{'name': 'Raja Chari', 'craft': 'ISS'}, {'name': 'Tom Marshburn', 'craft': 'ISS'}, {'name': 'Kayla Barron', 'craft': 'ISS'}, {'name': 'Matthias Maurer', 'craft': 'ISS'}, {'name': 'Oleg Artemyev', 'craft': 'ISS'}, {'name': 'Denis Matveev', 'craft': 'ISS'}, {'name': 'Sergey Korsakov', 'craft': 'ISS'}, {'name': 'Kjell Lindgren', 'craft': 'ISS'}, {'name': 'Bob Hines', 'craft': 'ISS'}, {'name': 'Samantha Cristoforetti', 'craft': 'ISS'}, {'name': 'Jessica Watkins', 'craft': 'ISS'}]}


Print the number of astronauts currently on ISS.


In [None]:
print(data.get('number'))
print(data['people'])

11
[{'name': 'Raja Chari', 'craft': 'ISS'}, {'name': 'Tom Marshburn', 'craft': 'ISS'}, {'name': 'Kayla Barron', 'craft': 'ISS'}, {'name': 'Matthias Maurer', 'craft': 'ISS'}, {'name': 'Oleg Artemyev', 'craft': 'ISS'}, {'name': 'Denis Matveev', 'craft': 'ISS'}, {'name': 'Sergey Korsakov', 'craft': 'ISS'}, {'name': 'Kjell Lindgren', 'craft': 'ISS'}, {'name': 'Bob Hines', 'craft': 'ISS'}, {'name': 'Samantha Cristoforetti', 'craft': 'ISS'}, {'name': 'Jessica Watkins', 'craft': 'ISS'}]


Print the names of the astronauts currently on ISS.


In [None]:
astronauts = data.get('people')
print(f"There are {len(astronauts)} astronauts on ISS")
print("And their names are :")
for i, astronaut in enumerate(astronauts):
    #print(astronaut.get('name'))
    print(f'{i+1}: {astronaut["name"]}, craft: {astronaut["craft"]}')

There are 11 astronauts on ISS
And their names are :
1: Raja Chari, craft: ISS
2: Tom Marshburn, craft: ISS
3: Kayla Barron, craft: ISS
4: Matthias Maurer, craft: ISS
5: Oleg Artemyev, craft: ISS
6: Denis Matveev, craft: ISS
7: Sergey Korsakov, craft: ISS
8: Kjell Lindgren, craft: ISS
9: Bob Hines, craft: ISS
10: Samantha Cristoforetti, craft: ISS
11: Jessica Watkins, craft: ISS
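
Each entry in `people` carries a `craft` field, so the list could in principle include people on other spacecraft. As a small sketch, `collections.Counter` can tally the crew per craft. The first three entries are copied from the output above; the last entry is hypothetical and is there only to show a second craft.

```python
from collections import Counter

# Subset of the 'people' list from the output above.
people = [
    {"name": "Raja Chari", "craft": "ISS"},
    {"name": "Tom Marshburn", "craft": "ISS"},
    {"name": "Kayla Barron", "craft": "ISS"},
    # Hypothetical entry, added only to show a second craft.
    {"name": "Example Astronaut", "craft": "Example Craft"},
]

# Count how many people are aboard each craft.
crew_per_craft = Counter(person["craft"] for person in people)
for craft, count in crew_per_craft.items():
    print(f"{craft}: {count}")
```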


# Collecting Jobs Data using Jobs API


### Objective: Determine the number of jobs currently open for various technologies and for various locations


Collect the number of job postings for the following locations using the API:

*   Los Angeles
*   New York
*   San Francisco
*   Washington DC
*   Seattle
*   Austin
*   Detroit


In [None]:
#Import required libraries
import pandas as pd
import json

#### Write a function to get the number of jobs for the Python technology.

> Note: When calling the API in this lab, pass the **payload** to the **params** argument as **key**-**value** pairs.

Refer to the ungraded **REST API lab** in the course **Python for Data Science, AI & Development**: <a href="https://www.coursera.org/learn/python-for-applied-data-science-ai/ungradedLti/P6sW8/hands-on-lab-access-rest-apis-request-http?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork21426264-2021-01-01">link</a>
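
To illustrate what passing a payload as key-value pairs amounts to: `requests.get(url, params=payload)` appends the pairs to the URL as a query string. The sketch below uses the standard library's `urlencode` to show roughly the same shape without making a call. The base URL and the `description` and `page` keys come from the commented-out code later in this lab and should be treated as assumptions about the API.

```python
from urllib.parse import urlencode

# Placeholder base URL taken from the commented-out code in this lab.
api_url = "http://127.0.0.1:5000/data"

# Key-value pairs to send; the key names are assumptions about the API.
payload = {"description": "Python", "page": 1}

# Encode the pairs as a query string and attach it to the base URL.
query_string = urlencode(payload)
full_url = f"{api_url}?{query_string}"
print(full_url)
```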

##### The keys in the JSON are

*   Job Title

*   Job Experience Required

*   Key Skills

*   Role Category

*   Location

*   Functional Area

*   Industry

*   Role

You can also view the JSON file contents at the following <a href = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork21426264-2021-01-01">json</a> URL.


In [None]:
#using json link

json_url="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork21426264-2021-01-01"

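def get_number_of_jobs_T(technology):
    # Count postings whose 'Key Skills' text contains the technology
    # (case-sensitive substring match).
    number_of_jobs = 0
    df = pd.read_json(json_url)   # download and parse the jobs file
    for skills in df['Key Skills']:
        if technology in skills:
            number_of_jobs += 1
    return number_of_jobs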

#api_url="http://127.0.0.1:5000/data"
#def get_number_of_jobs_T(technology):
#    number_of_jobs = {technology:0}
#    #your code goes here
#    payload = {'description':technology,'page':1}           #parameters to search the api
#    while True:                                             #Loop to get the data of multiples pages
#        r = requests.get(api_url, params=payload)           #Response to be saved
#        if r.ok:                                            #to discard errors
#            data = r.json()                                 #retrieved data to be counted
#            if technology in data['Key Skills']:
#             number_of_jobs[technology] += 1     #count of data
#            payload["page"] = payload["page"] + 1           #go to the next page
#        else:
#            break                                           #exit to the loop in case the request is not "ok"
#        if len(data) != 50:
#            break                                           #exit the loop if the data is less than 50 elements, meaning that is the last page.
#    return number_of_jobs
#
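
The commented-out version above pages through an API until a page comes back with fewer than 50 items. The sketch below shows that loop in isolation. It assumes each page is a list of job dictionaries, which this lab does not confirm; `count_jobs_paged`, `fake_fetch`, and the sample pages are hypothetical names and data used only for illustration.

```python
def count_jobs_paged(fetch_page, technology, page_size=50):
    """Count postings whose 'Key Skills' mention `technology`, page by page.

    fetch_page(page_number) must return a list of job dictionaries; a page
    shorter than page_size is treated as the last one.
    """
    total = 0
    page = 1
    while True:
        jobs = fetch_page(page)
        total += sum(1 for job in jobs if technology in job.get("Key Skills", ""))
        if len(jobs) < page_size:
            break
        page += 1
    return total


# Fake two-page API with a page size of 2, for illustration only.
fake_pages = {
    1: [{"Key Skills": "Python| SQL"}, {"Key Skills": "Java"}],
    2: [{"Key Skills": "Python| Django"}],
}


def fake_fetch(page_number):
    """Return the stored page, or an empty list past the last page."""
    return fake_pages.get(page_number, [])


print(count_jobs_paged(fake_fetch, "Python", page_size=2))
```

Passing the page fetcher in as a function keeps the loop testable without a running server; a real fetcher would wrap `requests.get(api_url, params=payload)`.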

Calling the function for Python and checking if it works.


In [None]:
print(get_number_of_jobs_T("Python"))

1173


#### Write a function to find number of jobs in US for a location of your choice


In [None]:
def get_number_of_jobs_L(location):
    # Count postings whose 'Location' text contains the given location.
    number_of_jobs = 0
    df = pd.read_json(json_url)
    for city in df['Location']:
        if location in city:
            number_of_jobs += 1
    return location, number_of_jobs  # a (location, count) tuple
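
Both `get_number_of_jobs_T` and `get_number_of_jobs_L` download and parse the JSON file on every call. One possible alternative, sketched below, is to load the records once and pass them to a generic counting helper. `count_matching` and the sample records are hypothetical; the field names follow the key list given earlier in this lab.

```python
def count_matching(records, field, term):
    """Count records whose `field` value contains `term` as a substring."""
    return sum(1 for record in records if term in str(record.get(field, "")))


# Hypothetical sample records using the keys listed earlier in this lab.
sample_jobs = [
    {"Location": "Los Angeles", "Key Skills": "Python| SQL"},
    {"Location": "New York", "Key Skills": "Java| Spring"},
    {"Location": "Los Angeles", "Key Skills": "Java| Python"},
]

print(count_matching(sample_jobs, "Location", "Los Angeles"))
print(count_matching(sample_jobs, "Key Skills", "Python"))
```

With the real data, the records could come from a single `pd.read_json(json_url)` call followed by `df.to_dict("records")`.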

Call the function for Los Angeles and check if it is working.


In [None]:
#your code goes here
location = 'Los Angeles'
print(get_number_of_jobs_L(location)[1])

# Question 3 - Of all the locations in the list below, which has the maximum number of job postings?
L = ["Los Angeles", "New York", "San Francisco", "Washington DC", "Seattle"]

def get_number_jobs_city(cities):
    # Count postings per city. Unlike get_number_of_jobs_L, this uses an
    # exact match on 'Location' and reads the file only once for all cities.
    number_of_jobs = {city: 0 for city in cities}
    df = pd.read_json(json_url)
    for location in df['Location']:
        if location in number_of_jobs:
            number_of_jobs[location] += 1
    return number_of_jobs

print(get_number_jobs_city(L))

640
{'Los Angeles': 640, 'New York': 3226, 'San Francisco': 435, 'Washington DC': 5316, 'Seattle': 3375}
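
The dictionary printed above holds the counts but does not name the answer to Question 3 directly. A short sketch of how the maximum could be read off, using the counts copied from that output:

```python
# Counts copied from the output above.
job_counts = {
    "Los Angeles": 640,
    "New York": 3226,
    "San Francisco": 435,
    "Washington DC": 5316,
    "Seattle": 3375,
}

# The key whose value is largest.
top_city = max(job_counts, key=job_counts.get)
print(f"{top_city} has the most postings: {job_counts[top_city]}")

# Full ranking, highest count first.
ranked = sorted(job_counts.items(), key=lambda item: item[1], reverse=True)
for city, count in ranked:
    print(f"{city}: {count}")
```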


### Store the results in an Excel file


Call the API for all the locations listed above and write the results to an Excel spreadsheet.


Create a Python list of all the locations for which you need to find the number of job postings.


In [None]:
#your code goes here
locations = ['Los Angeles', 'New York', 'San Francisco', 'Washington DC', 'Seattle', 'Austin', 'Detroit']

Import the libraries required to create an Excel spreadsheet.


In [None]:
# your code goes here
from openpyxl import Workbook

Create a workbook and select the active worksheet


In [None]:
# your code goes here
wb = Workbook()
active_ws = wb.active
print(type(active_ws))

<class 'openpyxl.worksheet.worksheet.Worksheet'>


Find the number of job postings for each location in the above list.
Write the location name and the number of job postings into the Excel spreadsheet.


In [None]:
#your code goes here
job_postings = get_number_jobs_city(locations)

# Write each city name in row 1 and its job count in row 2, one column per city.
for col, city in enumerate(job_postings, start=1):
    active_ws.cell(row=1, column=col).value = city
    active_ws.cell(row=2, column=col).value = job_postings[city]

Save the workbook to an Excel spreadsheet named 'job-postings.xlsx'.


In [None]:
#your code goes here
wb.save('job-postings.xlsx')  
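
The loop above writes one column per city, with names in row 1 and counts in row 2. A row-per-city layout with a header is a common alternative that is often easier to sort or chart. The sketch below only builds the rows in plain Python; `to_rows` and the header labels are hypothetical, and the sample counts are copied from the earlier output.

```python
def to_rows(counts, header=("Location", "Number of Jobs")):
    """Turn a {name: count} dictionary into a header row plus one row per entry."""
    rows = [tuple(header)]
    rows.extend((name, count) for name, count in counts.items())
    return rows


# Counts copied from the earlier output in this lab.
sample_counts = {"Los Angeles": 640, "New York": 3226, "San Francisco": 435}

for row in to_rows(sample_counts):
    print(row)
```

Each tuple could then be passed to openpyxl's `Worksheet.append`, for example `active_ws.append(row)`, which adds it as the next row of the sheet.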

#### In a similar way, you can collect the counts for the technologies listed below and store the results in an Excel sheet.


Collect the number of job postings for the following languages using the API:

*   C
*   C#
*   C++
*   Java
*   JavaScript
*   Python
*   Scala
*   Oracle
*   SQL Server
*   MySQL Server
*   PostgreSQL
*   MongoDB


In [None]:


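def get_number_jobs_langs(languages):
    # Case-insensitive substring match on 'Key Skills'. Short names such as 'C'
    # also match inside longer words (for example 'Oracle'), and 'Java' also
    # matches 'JavaScript', so those counts are inflated.
    number_of_jobs = {}
    df = pd.read_json(json_url)
    for language in languages:
        number_of_jobs[language] = 0
    for language in languages:
        for skill in df['Key Skills']:
            if language.lower() in skill.lower():
                number_of_jobs[language] += 1
    return number_of_jobs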



{'C': 25114,
 'C#': 526,
 'C++': 506,
 'Java': 3428,
 'JavaScript': 2248,
 'MongoDB': 208,
 'MySQL': 952,
 'Oracle': 899,
 'PostgreSQL': 86,
 'Python': 1173,
 'SQL Server': 423,
 'Scala': 138}
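
The count for 'C' in the output above is far larger than the others because a substring test matches any skill text that contains the letter "c". One possible refinement, sketched below, is to split each skill string into tokens and compare whole tokens. The `|` delimiter is an assumption about how the 'Key Skills' field is formatted, and `count_exact_skill` and the sample strings are hypothetical.

```python
def count_exact_skill(skill_strings, name, delimiter="|"):
    """Count skill strings in which `name` appears as a whole, case-insensitive token."""
    target = name.strip().lower()
    total = 0
    for skills in skill_strings:
        tokens = {token.strip().lower() for token in skills.split(delimiter)}
        if target in tokens:
            total += 1
    return total


# Hypothetical 'Key Skills' values, for illustration only.
sample_skills = ["C| C++| Linux", "JavaScript| CSS", "Oracle| SQL Server", "Embedded C"]

# Substring test, as in get_number_jobs_langs: every sample string contains a "c".
substring_hits = sum(1 for s in sample_skills if "c" in s.lower())

# Whole-token test: only the first sample lists "C" as its own skill.
exact_hits = count_exact_skill(sample_skills, "C")
print(substring_hits, exact_hits)
```

The trade-off is that whole-token matching misses phrases such as "Embedded C", so which rule fits better depends on how the skills are written in the data.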

In [None]:
langs = ['C','C#', 'C++', 'Java', 'JavaScript', 'Python', 'Scala', 'Oracle', 'SQL Server', 'MySQL', 'PostgreSQL', 'MongoDB']
jobs_languages = get_number_jobs_langs(langs)

#creating a workbook
wb1 = Workbook()
active_ws1 = wb1.active

# Language names go in row 1 and their job counts in row 2, one column per language.
for col, lang in enumerate(langs, start=1):
    active_ws1.cell(row=1, column=col).value = lang
    active_ws1.cell(row=2, column=col).value = jobs_languages[lang]

#saving
wb1.save('job-postings-langs.xlsx')


## Author


Ayushi Jain


### Other Contributors


Rav Ahuja

Lakshmi Holla

Malika


## Change Log


| Date (YYYY-MM-DD) | Version | Changed By        | Change Description                 |
| ----------------- | ------- | ----------------- | ---------------------------------- |
| 2022-01-19        | 0.3     | Lakshmi Holla     | Added changes in the markdown      |
| 2021-06-25        | 0.2     | Malika            | Updated GitHub job json link       |
| 2020-10-17        | 0.1     | Ramesh Sannareddy | Created initial version of the lab |


Copyright © 2022 IBM Corporation. All rights reserved.
