<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="200" alt="Skills Network Logo">
    </a>
</p>


# **Collecting Job Data Using APIs**


Estimated time needed: **30** minutes


## Objectives


After completing this lab, you will be able to:


*   Collect job data using the Jobs API.
*   Store the collected data in an Excel spreadsheet.


><strong>Note: Before starting with the assignment make sure to read all the instructions and then move ahead with the coding part.</strong>


#### Instructions


To run the lab, first click the [Jobs_API](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/Jobs_API.ipynb) notebook link. The file contains the Flask code required to serve the Jobs API data.

To run the code in the file that opens, follow the steps below.

Step 1: Download the file.

Step 2: Upload the file into your current Jupyter environment using the "Upload" button in your Jupyter interface. Ensure that the Jobs_API notebook is in the same folder as your working .ipynb file.

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/Upload.PNG">

Step 3: Open the Jobs_API notebook and run all the cells to start the Flask application. Once the server is running, you can access the API from the URL provided in the notebook.

If you want to learn more about Flask (optional), see [this page](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/FLASK_API.md.html).

Once the Flask code is running, you can start the assignment.


## Dataset Used in this Assignment

The dataset used in this lab comes from https://www.kaggle.com/promptcloud/jobs-on-naukricom and is available under a **Public Domain license**.

> Note: We are using a modified subset of that dataset for the lab, so to follow the lab instructions successfully please use the dataset provided with the lab, rather than the dataset from the original source.

The original dataset is a CSV file. We have converted it to JSON, as the lab requires.
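
As a rough illustration of that kind of conversion (not the script used to prepare the lab data), the sketch below turns CSV text with a header row into a JSON array of records using only the standard library. The two sample rows are invented; only the column names `Job Title`, `Key Skills`, and `Location` are taken from the keys listed later in this lab.

```python
import csv
import io
import json

# Hypothetical CSV content; the rows are invented for illustration.
csv_text = """Job Title,Key Skills,Location
Data Engineer,Python| SQL,Seattle
Web Developer,JavaScript| HTML,Austin
"""


def csv_to_json(text):
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return json.dumps(rows, indent=2)


records = json.loads(csv_to_json(csv_text))
print(len(records))            # number of records
print(records[0]["Location"])  # value from the first row
```

Each CSV row becomes one JSON object keyed by the header names, which is the shape the later cells index into with expressions such as `data[i]["Location"]`.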


## Warm-Up Exercise


Before you attempt the actual lab, here is a fully solved warmup exercise that will help you to learn how to access an API.


Using an API, let us find out who is currently in space, including the crew of the International Space Station (ISS).<br> The API at [http://api.open-notify.org/astros.json](http://api.open-notify.org/astros.json?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork21426264-2021-01-01&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBM-DA0321EN-SkillsNetwork-21426264&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ) returns the people currently in space, and the craft each one is aboard, in JSON format.<br>
You can read more about this API at [http://open-notify.org/Open-Notify-API/People-In-Space/](http://open-notify.org/Open-Notify-API/People-In-Space?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork21426264-2021-01-01&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBM-DA0321EN-SkillsNetwork-21426264&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)


In [1]:
import requests # you need this module to make an API call
import pandas as pd

In [2]:
api_url = "http://api.open-notify.org/astros.json" # this url gives us the astronaut data

In [3]:
response = requests.get(api_url) # Call the API using the get method and store the
                                # output of the API call in a variable called response.

In [4]:
if response.ok:             # if all is well (no errors, no network timeouts)
    data = response.json()  # store the result in json format in a variable called data
                            # the variable data is of type dictionary.

In [5]:
print(data)   # print the data just to check the output or for debugging

{'people': [{'craft': 'ISS', 'name': 'Oleg Kononenko'}, {'craft': 'ISS', 'name': 'Nikolai Chub'}, {'craft': 'ISS', 'name': 'Tracy Caldwell Dyson'}, {'craft': 'ISS', 'name': 'Matthew Dominick'}, {'craft': 'ISS', 'name': 'Michael Barratt'}, {'craft': 'ISS', 'name': 'Jeanette Epps'}, {'craft': 'ISS', 'name': 'Alexander Grebenkin'}, {'craft': 'ISS', 'name': 'Butch Wilmore'}, {'craft': 'ISS', 'name': 'Sunita Williams'}, {'craft': 'Tiangong', 'name': 'Li Guangsu'}, {'craft': 'Tiangong', 'name': 'Li Cong'}, {'craft': 'Tiangong', 'name': 'Ye Guangfu'}], 'number': 12, 'message': 'success'}
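
Note that the `if response.ok:` check above assigns `data` only when the call succeeds, so a failed call would leave `data` undefined for the later cells. A minimal sketch of one way to guard against that follows. `FakeResponse` and `json_or_default` are hypothetical names made up for this example; the stand-in class only exists so the snippet runs without a network connection. In the notebook you would pass the object returned by `requests.get(api_url, timeout=10)` instead.

```python
class FakeResponse:
    """Hypothetical stand-in for a requests.Response, so the example runs offline."""

    def __init__(self, ok, payload=None):
        self.ok = ok
        self._payload = payload

    def json(self):
        return self._payload


def json_or_default(response, default=None):
    """Return the decoded JSON body if the call succeeded, otherwise a default."""
    if response.ok:
        return response.json()
    return default


good = FakeResponse(ok=True, payload={"number": 12, "message": "success"})
bad = FakeResponse(ok=False)

print(json_or_default(good)["number"])    # 12
print(json_or_default(bad, default={}))   # {}
```

Returning an explicit default keeps the following cells from failing with a `NameError` and makes the failure case visible in one place.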


Print the number of people currently in space. The `number` field covers every craft, not only the ISS.


In [6]:
print(data.get('number'))

12


Print the names of the people currently in space.


In [7]:
astronauts = data.get('people')
print("There are {} people in space".format(len(astronauts)))
print("And their names are :")
for astronaut in astronauts:
    print(astronaut.get('name'))

There are 12 people in space
And their names are :
Oleg Kononenko
Nikolai Chub
Tracy Caldwell Dyson
Matthew Dominick
Michael Barratt
Jeanette Epps
Alexander Grebenkin
Butch Wilmore
Sunita Williams
Li Guangsu
Li Cong
Ye Guangfu
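
The `people` list in the sample output above covers more than one craft (ISS and Tiangong), so counting per craft gives the ISS figure on its own. The sketch below does this with `collections.Counter`; the `people` list is copied from that sample output rather than fetched live, so the numbers reflect that snapshot only.

```python
from collections import Counter

# 'people' list copied from the sample response printed above.
people = [
    {"craft": "ISS", "name": "Oleg Kononenko"},
    {"craft": "ISS", "name": "Nikolai Chub"},
    {"craft": "ISS", "name": "Tracy Caldwell Dyson"},
    {"craft": "ISS", "name": "Matthew Dominick"},
    {"craft": "ISS", "name": "Michael Barratt"},
    {"craft": "ISS", "name": "Jeanette Epps"},
    {"craft": "ISS", "name": "Alexander Grebenkin"},
    {"craft": "ISS", "name": "Butch Wilmore"},
    {"craft": "ISS", "name": "Sunita Williams"},
    {"craft": "Tiangong", "name": "Li Guangsu"},
    {"craft": "Tiangong", "name": "Li Cong"},
    {"craft": "Tiangong", "name": "Ye Guangfu"},
]

# Count how many people are aboard each craft.
craft_counts = Counter(person["craft"] for person in people)
for craft, count in craft_counts.items():
    print(f"{craft}: {count}")

# Names of the people aboard the ISS only.
iss_names = [person["name"] for person in people if person["craft"] == "ISS"]
print(len(iss_names))
```

For this snapshot the counts are 9 for the ISS and 3 for Tiangong, which together make up the 12 reported in the `number` field.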


Hope the warmup was helpful. Good luck with your next lab!


## Lab: Collect Jobs Data using Jobs API


### Objective: Determine the number of jobs currently open for various technologies and for various locations


Collect the number of job postings for the following locations using the API:

* Los Angeles
* New York
* San Francisco
* Washington DC
* Seattle
* Austin
* Detroit


In [8]:
#Import required libraries
import pandas as pd
import json
import requests


#### Write a function to get the number of jobs for the Python technology.<br>
> Note: While using the lab you need to pass the **payload** information for the **params** attribute in the form of **key** **value** pairs.
  Refer to the ungraded **rest api lab** in the course **Python for Data Science, AI & Development**  <a href="https://www.coursera.org/learn/python-for-applied-data-science-ai/ungradedLti/P6sW8/hands-on-lab-access-rest-apis-request-http?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork928-2022-01-01">link</a>
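
To make the key-value idea concrete, the sketch below shows how a payload dictionary turns into a URL query string. It uses the standard library's `urlencode`, which should produce the same kind of encoding that `requests` applies to `params`, though that equivalence is stated here as an assumption rather than checked against the library. The base URL is a hypothetical placeholder, not the lab's endpoint.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only to show the shape of the final URL.
base_url = "http://example.com/data"


def build_url(base, payload):
    """Append the payload to the base URL as an encoded query string."""
    return f"{base}?{urlencode(payload)}"


print(build_url(base_url, {"Key Skills": "Python"}))
print(build_url(base_url, {"Key Skills": "C++"}))
print(build_url(base_url, {"Key Skills": "C#"}))
```

Spaces and characters such as `+` and `#` are escaped in the query string, which is why technology names like `C++` and `C#` are better passed through `params` than pasted into the URL by hand.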
  
##### The keys in the JSON are

* Job Title
* Job Experience Required
* Key Skills
* Role Category
* Location
* Functional Area
* Industry
* Role

You can also view the JSON file contents at the following <a href = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json">json</a> URL.



In [14]:
api_url="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json"

# I added the following due to the checklist.
## Note to self: add a POST request; see Python for Data Science, AI and Development.
## Used the notes from "5 API and Data Collection".
def get_number_of_jobs_T1(technology):
    # Practice POST against httpbin; this does not query the jobs data.
    url_post = "http://httpbin.org/post"
    response = requests.get(url_post)   # /post only accepts POST, so this GET returns 405
    payload = {"Key Skills": technology}
    r_post = requests.post(url_post, data = payload)
    print("This is the post request: ", r_post)
    print("This is the get request: ", response)
    print(r_post.json()['form'])        # echoes the form data that was sent


def get_number_of_jobs_T(technology):
    #your code goes here
    # Used notes and labs from earlier courses for this (Python for Data Science,
    # AI and Development; the String Operations lab).
    payload = {"Key Skills": technology}
    response1 = requests.get(api_url, params = payload)
    # Fall back to an empty list so a failed request gives a count of 0
    # instead of a NameError.
    data = response1.json() if response1.ok else []

    ## The params do not seem to filter anything on my end (changing the key
    ## gives the same result), so the filtering is done here instead.
    ## api_url is a plain JSON file, which may be why the query string is ignored.
    number_of_jobs = 0
    if technology in ("C", "C++", "Java"):
        # These names are substrings of other skills ("C" in "C++", "Java" in
        # "JavaScript"), so compare whole whitespace-separated tokens, with or
        # without the trailing "|" separator, ignoring case.
        wanted = {technology.lower(), technology.lower() + "|"}
        for job in data:
            tokens = job["Key Skills"].split()
            # any() counts each posting once, even if several tokens match.
            if any(token.lower() in wanted for token in tokens):
                number_of_jobs += 1
    else:
        # Plain case-sensitive substring match for every other technology.
        for job in data:
            if technology in job["Key Skills"]:
                number_of_jobs += 1

    return technology,number_of_jobs
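
An alternative to special-casing individual technologies is to split `Key Skills` on its separator and compare whole entries. The sketch below assumes the field is a `|`-separated string, which the trailing-`|` handling in the function above suggests but the lab text does not state. `has_skill`, `count_jobs`, and the sample records are hypothetical names and data made up for illustration.

```python
def has_skill(key_skills, technology):
    """Return True if technology is one of the '|'-separated skills (case-insensitive)."""
    wanted = technology.strip().lower()
    skills = [skill.strip().lower() for skill in key_skills.split("|")]
    return wanted in skills


def count_jobs(records, technology):
    """Count the records that list the technology, counting each record once."""
    return sum(1 for record in records if has_skill(record["Key Skills"], technology))


# Invented records; real entries come from the parsed JSON list.
sample_records = [
    {"Key Skills": "C++| Java| Linux"},
    {"Key Skills": "JavaScript| HTML| CSS"},
    {"Key Skills": "c| Embedded C| Python"},
    {"Key Skills": "Java| java| Spring"},
]

for tech in ["C", "C++", "Java", "JavaScript", "Python"]:
    print(tech, count_jobs(sample_records, tech))
```

Exact matching keeps `Java` from being counted inside `JavaScript` and `C` inside `C++`, but it also means an entry such as `Embedded C` is not counted as `C`. Whether that is the desired behaviour depends on how the counts will be used.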


Calling the function for Python and checking if it works.


In [15]:
get_number_of_jobs_T("Python")    # return value is not shown: only the last expression in a cell is echoed
get_number_of_jobs_T1("Python")
# Other checks I added:
#get_number_of_jobs_T("Java")
#get_number_of_jobs_T("JavaScript")
# Matching C and C++ took some work: a plain search for "C" also hits "C++",
# and "+" is a special character in a regular expression. Used the JSON file and
# the Hands-On Lab: String Operations to work out the token comparison now used
# in get_number_of_jobs_T.

This is the post request:  <Response [200]>
This is the get request:  <Response [405]>
{'Key Skills': 'Python'}


#### Write a function to find number of jobs in US for a location of your choice


In [16]:
def get_number_of_jobs_L(location):
    # This is my answer.
    ## Used notes from other courses like Python for Data Science, AI and Development for this.
    payload1 = {"Location": location}
    response2 = requests.get(api_url, params = payload1)
    # Fall back to an empty list so a failed request gives a count of 0
    # instead of a NameError.
    data = response2.json() if response2.ok else []

    ## The params do not seem to filter anything on my end, so keep only
    ## the postings whose Location matches exactly.
    ## Note: the data has 27005 entries in total.
    data1 = [job for job in data if job["Location"] == location]
    number_of_jobs = len(data1)

    return location,number_of_jobs

# To do: a function that counts postings for a technology in a specific location.
# This draft doesn't work; it has the same problem as before (every entry comes back).
#def get_number_of_jobs_TL(technology, location):
#    payload = {"Technology":technology, "Location":location}
#    response = requests.get(api_url, params = payload)
#    if response.ok:
#        data = response.json()
#    print(data.get({"Key Skills"}))
    
    

Call the function for Los Angeles and check if it is working.


In [18]:
#your code goes here
get_number_of_jobs_L("Los Angeles")  # Number of jobs for this location only.

('Los Angeles', 640)
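
Since the full list of records is already in memory after one download, the counts for all seven locations can be produced in a single pass instead of one request per city. The sketch below shows the idea with `collections.Counter`. `count_by_location` is a hypothetical helper and the sample records are invented; in the notebook the records would be the parsed JSON list.

```python
from collections import Counter

locations = ["Los Angeles", "New York", "San Francisco", "Washington DC",
             "Seattle", "Austin", "Detroit"]

# Invented records standing in for the parsed JSON list.
sample_records = [
    {"Location": "Los Angeles"},
    {"Location": "Seattle"},
    {"Location": "Los Angeles"},
    {"Location": "Bengaluru"},
    {"Location": "Austin"},
]


def count_by_location(records, wanted):
    """Count records per location in one pass, then report only the wanted ones."""
    counts = Counter(record["Location"] for record in records)
    return {location: counts[location] for location in wanted}


result = count_by_location(sample_records, locations)
print(result["Los Angeles"])  # 2
print(result["Detroit"])      # 0
```

A `Counter` returns 0 for locations that never appear, so cities with no postings still get a row in the result.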

### Store the results in an excel file


Call the API for all the given technologies above and write the results in an Excel spreadsheet.


If you do not know how to create an Excel file using Python, double-click here for **hints**.

<!--

from openpyxl import Workbook        # import Workbook class from module openpyxl
wb=Workbook()                        # create a workbook object
ws=wb.active                         # use the active worksheet
ws.append(['Country','Continent'])   # add a row with two columns 'Country' and 'Continent'
ws.append(['Egypt','Africa'])        # add a row with two columns 'Egypt' and 'Africa'
ws.append(['India','Asia'])          # add another row
ws.append(['France','Europe'])       # add another row
wb.save("countries.xlsx")            # save the workbook into a file called countries.xlsx


-->


Create a Python list of all technologies for which you need to find the number of job postings.


In [19]:
#your code goes here
## Used hints and the json list.
list_T = ["Python", "C", "C#", "C++", "Java", "JavaScript", "Scala", "Oracle", "SQL Server", "MySQL Server", "PostgreSQL", "MongoDB",
          "web technologies", "digital", "Electronics", "Git", "Azure", "Splunk", "xml", "website", "Google analytics", "email", 
          "facebook", "media"]  # Note to self: add more technologies if needed.

Import the libraries required to create the Excel spreadsheet


In [20]:
# your code goes here
# Used the hint and notes from _Data Analysis with Python course,Data Visualization with Python and Python for Data Science AI and Development__ for this.
!pip install openpyxl
from openpyxl import Workbook


Collecting openpyxl
  Downloading openpyxl-3.1.3-py2.py3-none-any.whl (251 kB)
Collecting et-xmlfile (from openpyxl)
  Downloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 openpyxl-3.1.3


Create a workbook and select the active worksheet


In [21]:
# your code goes here
# Used the hint and notes from _Data Analysis with Python course, Data Visualization course and Python for Data Science AI and Development_ for this.
wb = Workbook()
ws = wb.active


Find the number of job postings for each technology in the above list.
Write the technology name and the number of job postings into the Excel spreadsheet.


In [22]:
#your code goes here
# Used the hint and notes from _Data Analysis with Python course, Data Visualization course and Python for Data Science AI and Development_ for this.
# Also used the JSON file.
# Each call downloads the data again, so this loop can take a while.
for technology in list_T:
    ws.append(get_number_of_jobs_T(technology))   # one row per technology: (name, count)



Save the results into an Excel spreadsheet named **job-postings.xlsx**.


In [23]:
#your code goes here
# Used the hint and notes from _Data Analysis with Python course, Data Visualization course and Python for Data Science AI and Development_ for this.
wb.save("job-postings.xlsx")
wb.save("github-job-postings.xlsx")
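
One way to check the saved file is to read it back. The sketch below writes a header row and two rows of hypothetical counts, saves the workbook to a temporary folder, and reloads it with `load_workbook`. The counts, the header labels, and the `job-postings-demo.xlsx` file name are made up for illustration and are not results from the lab data.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Hypothetical counts, used only to show the shape of the rows.
results = [("Python", 10), ("Java", 7)]

wb_demo = Workbook()
ws_demo = wb_demo.active
ws_demo.append(["Technology", "Number of job postings"])  # header row
for row in results:
    ws_demo.append(row)  # append accepts a list or a tuple

# Save to a temporary folder so the lab's own files are left untouched.
path = os.path.join(tempfile.mkdtemp(), "job-postings-demo.xlsx")
wb_demo.save(path)

# Read the file back to confirm what was written.
rows = list(load_workbook(path).active.iter_rows(values_only=True))
print(rows)
```

A header row is optional, but it makes the spreadsheet easier to read when it is opened outside the notebook.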


#### In a similar way, you can try the technologies given below and store the results in an Excel sheet.


Collect the number of job postings for the following technologies using the API:

*   C
*   C#
*   C++
*   Java
*   JavaScript
*   Python
*   Scala
*   Oracle
*   SQL Server
*   MySQL Server
*   PostgreSQL
*   MongoDB


In [24]:
# your code goes here
# Used the hint and notes from _Data Analysis with Python course, Data Visualization course and Python for Data Science AI and Development_ for this.
wb1 = Workbook()
ws1 = wb1.active

# These are the technologies that are needed.
list_T1 = ["C", "C#", "C++", "Java", "JavaScript", "Python", "Scala", "Oracle", "SQL Server", "MySQL Server", "PostgreSQL", "MongoDB"]

# Append the job postings to the Excel spreadsheet, one row per technology.
for technology in list_T1:
    ws1.append(get_number_of_jobs_T(technology))

# Save it to another file.
wb1.save("job-postings1.xlsx")
wb1.save("github-job-postings.xlsx")   # note: overwrites the file of the same name saved in In [23]
    


## Authors


Ayushi Jain


### Other Contributors


Rav Ahuja

Lakshmi Holla

Malika


Copyright © IBM Corporation.


<!--
## Change Log

| Date (YYYY-MM-DD) | Version | Changed By        | Change Description                 |
| ----------------- | ------- | ----------------- | ---------------------------------- |
| 2022-01-19        | 0.3     | Lakshmi Holla     | Added changes in the markdown      |
| 2021-06-25        | 0.2     | Malika            | Updated GitHub job json link       |
| 2020-10-17        | 0.1     | Ramesh Sannareddy | Created initial version of the lab |
-->
