# **Collecting Data Using APIs**


In this notebook, we will:
*   Collect job data from Jobs API
*   Store the collected data in CSV files.


**Dataset Used in this Assignment**

The dataset used in this notebook comes from the following source: [https://www.kaggle.com/promptcloud/jobs-on-naukricom](https://www.kaggle.com/promptcloud/jobs-on-naukricom?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork21426264-2022-01-01) under a **Public Domain license**.


The original dataset is a CSV file, but it has already been converted to JSON.

### Collecting Jobs Data using Jobs API


We are going to determine the number of jobs currently open for various technologies and for various locations.


First, we will collect the number of job postings for the following locations using the API:

*   New York
*   Los Angeles
*   San Francisco
*   Washington DC
*   Seattle
*   Austin
*   Detroit


In [1]:
# Import the required libraries
import pandas as pd
import json
import requests

Next, we write a function that returns the number of jobs for a given technology, such as Python.


##### The keys in the JSON are:

*   Job Title

*   Job Experience Required

*   Key Skills

*   Role Category

*   Location

*   Functional Area

*   Industry

*   Role

We can also view the JSON file contents at the following <a href = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork21426264-2022-01-01">json</a> URL.


In [2]:
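api_url = "http://127.0.0.1:5000/data/all"

def get_number_of_jobs_T(technology):
    # Fetch all job postings from the local Jobs API
    response_api = requests.get(api_url)
    # Raise on a failed request instead of reaching an undefined `jobs` below
    response_api.raise_for_status()
    jobs = response_api.json()

    number_of_jobs = 0
    for job in jobs:
        # Treat a missing 'Key Skills' field as an empty string
        key = job.get('Key Skills') or ''
        # Substring match: 'C' also matches 'C++' and 'C#'
        if technology in key:
            number_of_jobs = number_of_jobs + 1

    return technology, number_of_jobs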

Let's call the function for Python and check that it works.


In [3]:
get_number_of_jobs_T("Python")

('Python', 1173)
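
Because `get_number_of_jobs_T` uses a substring test, a search for `'C'` also counts postings that only list `'C++'` or `'C#'`, and `'Java'` also counts `'JavaScript'`. The sketch below shows one way to count whole skills instead. It assumes that `Key Skills` is a single string with skills separated by `|`, which should be checked against the actual data. The function name `count_exact_skill` and the sample records are hypothetical, and the comparison is case-insensitive.

```python
def count_exact_skill(jobs, technology, separator="|"):
    """Count jobs whose 'Key Skills' field lists `technology` as a whole skill.

    Assumes skills are joined by `separator`; this is an assumption about the data.
    """
    target = technology.strip().lower()
    count = 0
    for job in jobs:
        # A missing or null field is treated as an empty skills string
        skills_text = job.get("Key Skills") or ""
        skills = {skill.strip().lower() for skill in skills_text.split(separator)}
        if target in skills:
            count += 1
    return count


# Made-up records, for illustration only
sample_jobs = [
    {"Key Skills": "Java| Spring| SQL"},
    {"Key Skills": "JavaScript| HTML| CSS"},
    {"Key Skills": "C++| Python"},
    {"Key Skills": None},
]

print(count_exact_skill(sample_jobs, "Java"))  # 1
print(count_exact_skill(sample_jobs, "C"))     # 0
```

With exact matching, counts for short names such as `'C'` will usually be lower than the substring counts, so the two approaches should not be mixed in one table.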

Next, we write a function to find the number of jobs in the US for a location of your choice.


In [4]:
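def get_number_of_jobs_L(location):
    # Fetch all job postings from the local Jobs API
    response_api = requests.get(api_url)
    # Raise on a failed request instead of reaching an undefined `jobs` below
    response_api.raise_for_status()
    jobs = response_api.json()

    number_of_jobs = 0
    for job in jobs:
        # Treat a missing 'Location' field as an empty string
        loc = job.get('Location') or ''
        # Substring match on the location name
        if location in loc:
            number_of_jobs = number_of_jobs + 1

    return location, number_of_jobs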

Let's call the function for New York and check if it is working.


In [5]:
get_number_of_jobs_L("New York")

('New York', 3226)

### Saving the data

In [6]:
# List of locations for which to find the number of job postings
locations = ['New York', 'Los Angeles', 'San Francisco', 'Washington DC', 'Seattle', 'Austin', 'Detroit']

# List of languages for which to find the number of job postings
languages = ['C', 'C#', 'C++', 'Java', 'JavaScript', 'Python', 'Scala', 'Oracle', 'SQL Server', 'MySQL Server', 'PostgreSQL', 'MongoDB']

In [7]:
# Lists to hold the (name, count) results
job_loc = []
job_lang = []

# Collecting data for locations
for location in locations:
    job_loc.append(get_number_of_jobs_L(location))

# Collecting data for languages
for language in languages:
    job_lang.append(get_number_of_jobs_T(language))
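
The loops above send one request per location and one per language (19 in total), even though every request returns the same full dataset. A minimal sketch of an alternative is to fetch the data once and count against the in-memory list. The helper name `count_matches` is hypothetical, the sample records are made up, and the substring test is the same one used by the functions above.

```python
def count_matches(jobs, field, terms):
    """Return (term, count) pairs using a substring test on `field`."""
    results = []
    for term in terms:
        # A missing or null field is treated as an empty string
        count = sum(1 for job in jobs if term in (job.get(field) or ""))
        results.append((term, count))
    return results


# In the notebook, `jobs` would be fetched once, for example:
#     jobs = requests.get(api_url).json()
# Here made-up records stand in for the API response.
jobs = [
    {"Location": "New York", "Key Skills": "Python| SQL Server"},
    {"Location": "Seattle", "Key Skills": "Java| Python"},
    {"Location": "New York", "Key Skills": "Scala"},
]

job_loc = count_matches(jobs, "Location", ["New York", "Seattle", "Austin"])
job_lang = count_matches(jobs, "Key Skills", ["Python", "Scala"])
print(job_loc)   # [('New York', 2), ('Seattle', 1), ('Austin', 0)]
print(job_lang)  # [('Python', 2), ('Scala', 1)]
```

The resulting lists of tuples have the same shape as `job_loc` and `job_lang`, so they can be passed to `pd.DataFrame` in the same way.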

In [8]:
df1 = pd.DataFrame(job_loc, columns=['Location', 'Number of Jobs'])
df2 = pd.DataFrame(job_lang, columns=['Language', 'Number of Jobs'])

In [9]:
df1.to_csv('1.1 job_postings_data_location.csv', index=False)
df2.to_csv('1.1 job_postings_data_language.csv', index=False)