<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="200" alt="Skills Network Logo">
    </a>
</p>


# **Collecting Job Data Using APIs**


Estimated time needed: **45 to 60** minutes


## Objectives


After completing this lab, you will be able to:


*   Collect job data from the GitHub Jobs API.
*   Store the collected data in an Excel spreadsheet.


><strong>Note: Before starting the assignment, read all the instructions, then move on to the coding part.</strong>


#### Instructions


To run the lab, first click the [Jobs_API](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/Jobs_API.ipynb) notebook link. The file contains the Flask code that serves the Jobs API data.

To run the code in that file, follow the steps below.

Step 1: Download the file.

Step 2: Upload it to IBM Watson Studio. (If the IBM Watson cloud service does not work on your system, follow the alternate Step 2 below.)

Step 2 (alternate): Upload it to your SN Labs environment using the upload button highlighted in red in the image below.
Remember to upload the Jobs_API file to the same folder as your current .ipynb file.

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/Upload.PNG">

Step 3: Run all the cells of the Jobs_API file. (An asterisk next to the last cell after you run it is expected: the cell keeps running while the Flask server is up, and the code works fine.)

If you want to learn more about Flask (optional), you can click on this link [here](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/FLASK_API.md.html).

Once the Flask code is running, you can start the assignment.


## Dataset Used in this Assignment

The dataset used in this lab comes from the following source: https://www.kaggle.com/promptcloud/jobs-on-naukricom under a **Public Domain license**.

> Note: We are using a modified subset of that dataset for the lab, so to follow the lab instructions successfully please use the dataset provided with the lab, rather than the dataset from the original source.

The original dataset is a CSV file. We have converted it to JSON as required by the lab.


## Warm-Up Exercise


Before you attempt the actual lab, here is a fully solved warm-up exercise that will help you learn how to access an API.


Using an API, let us find out who is currently on the International Space Station (ISS).<br> The API at [http://api.open-notify.org/astros.json](http://api.open-notify.org/astros.json?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork21426264-2021-01-01&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBM-DA0321EN-SkillsNetwork-21426264&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ) returns, in JSON format, the people currently in space together with the craft each one is on.<br>
You can read more about this API at [http://open-notify.org/Open-Notify-API/People-In-Space/](http://open-notify.org/Open-Notify-API/People-In-Space?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork21426264-2021-01-01&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBM-DA0321EN-SkillsNetwork-21426264&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)


In [1]:
import requests # you need this module to make an API call
import pandas as pd

In [2]:
api_url = "http://api.open-notify.org/astros.json" # this URL gives us the astronaut data

In [3]:
response = requests.get(api_url) # Call the API using the get method and store the
                                # output of the API call in a variable called response.

In [6]:
if response.ok:             # True when the HTTP status code is below 400
    data = response.json()  # parse the JSON body into a Python object
                            # here the variable data is of type dictionary.

In [9]:
print(data)   # print the data just to check the output or for debugging

{'people': [{'craft': 'ISS', 'name': 'Oleg Kononenko'}, {'craft': 'ISS', 'name': 'Nikolai Chub'}, {'craft': 'ISS', 'name': 'Tracy Caldwell Dyson'}, {'craft': 'ISS', 'name': 'Matthew Dominick'}, {'craft': 'ISS', 'name': 'Michael Barratt'}, {'craft': 'ISS', 'name': 'Jeanette Epps'}, {'craft': 'ISS', 'name': 'Alexander Grebenkin'}, {'craft': 'ISS', 'name': 'Butch Wilmore'}, {'craft': 'ISS', 'name': 'Sunita Williams'}, {'craft': 'Tiangong', 'name': 'Li Guangsu'}, {'craft': 'Tiangong', 'name': 'Li Cong'}, {'craft': 'Tiangong', 'name': 'Ye Guangfu'}], 'number': 12, 'message': 'success'}

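The response printed above lists people on more than one craft, so it can help to group the entries before counting. The sketch below works on a shortened, hand-copied version of that output rather than a live API call; the helper name `people_per_craft` is hypothetical.

```python
from collections import Counter

# A shortened copy of the response printed above (not a live API call).
sample = {
    "people": [
        {"craft": "ISS", "name": "Oleg Kononenko"},
        {"craft": "ISS", "name": "Sunita Williams"},
        {"craft": "Tiangong", "name": "Li Guangsu"},
    ],
    "number": 3,
    "message": "success",
}


def people_per_craft(payload):
    """Count the entries of payload['people'] grouped by their 'craft' value."""
    return Counter(person.get("craft") for person in payload.get("people", []))


counts = people_per_craft(sample)
print(counts["ISS"], counts["Tiangong"])
```

The per-craft counts add up to the `number` field, which is the total across all craft.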

Print the number of people currently in space. This count covers every craft in the response, not only the ISS.


In [None]:
print(data.get('number'))

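Print the names of the astronauts currently on the ISS.


In [None]:
# keep only the people whose craft is the ISS; the response also lists other craft
astronauts = [p for p in data.get('people', []) if p.get('craft') == 'ISS']
print("There are {} astronauts on the ISS".format(len(astronauts)))
print("And their names are:")
for astronaut in astronauts:
    print(astronaut.get('name'))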

Hope the warmup was helpful. Good luck with your next lab!


## Lab: Collect Jobs Data using GitHub Jobs API


### Objective: Determine the number of jobs currently open for various technologies and for various locations


Collect the number of job postings for the following locations using the API:

* Los Angeles
* New York
* San Francisco
* Washington DC
* Seattle
* Austin
* Detroit


In [23]:
%reset

Once deleted, variables cannot be recovered. Proceed (y/[n])?  y


#### Write a function to get the number of jobs for the Python technology.<br>
> Note: In this lab you need to pass the **payload** to the **params** argument as **key**-**value** pairs.
  Refer to the ungraded **REST API lab** in the course **Python for Data Science, AI & Development** (<a href="https://www.coursera.org/learn/python-for-applied-data-science-ai/ungradedLti/P6sW8/hands-on-lab-access-rest-apis-request-http?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork928-2022-01-01">link</a>).
  
 ##### The keys in the JSON are:
 * Job Title
 
 * Job Experience Required
 
 * Key Skills
 
 * Role Category
 
 * Location
 
 * Functional Area
 
 * Industry
 
 * Role 
 
You can also view the contents of the JSON file at the following <a href = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json">json</a> URL.

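To make the note about **payload** and **params** concrete, here is a minimal sketch of how a key-value payload becomes the query string of a GET request. It uses only the standard library so it runs without the Flask server. The helper `build_query_url` is hypothetical, and the choice of `Key Skills` as the parameter name is an assumption based on the key list above, not something the server is confirmed to accept.

```python
from urllib.parse import urlencode

API_URL = "http://127.0.0.1:5000/data"  # local Flask endpoint used in this lab


def build_query_url(base_url, payload):
    """Return base_url with payload encoded as a query string."""
    return base_url + "?" + urlencode(payload)


# Assumption: the server filters on the JSON key passed as the parameter name.
payload = {"Key Skills": "Python"}
url = build_query_url(API_URL, payload)
print(url)
```

Passing the same dictionary as `requests.get(API_URL, params=payload)` should produce an equivalent URL, which you can inspect afterwards through `response.url`.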


In [None]:
api_url="http://127.0.0.1:5000/data"
def get_number_of_jobs_T(technology):
    
    #your code goes here
    return technology,number_of_jobs

Calling the function for Python and checking if it works.


In [None]:
get_number_of_jobs_T("Python")

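One possible shape for such a function is sketched below. It separates the HTTP call from the counting so the counting can be tried without the server. It assumes the server filters on the `Key Skills` parameter and returns a JSON list of matching records; `fetch_jobs`, `count_jobs_for_technology` and `fake_fetch` are hypothetical names, and the two rows in `fake_fetch` are abbreviated from the sample data.

```python
def fetch_jobs(payload, url="http://127.0.0.1:5000/data"):
    """Send payload as query parameters and return the decoded JSON list."""
    import requests  # third-party; imported here so the offline demo needs no install

    response = requests.get(url, params=payload, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of returning bad data
    return response.json()


def count_jobs_for_technology(technology, fetch=fetch_jobs):
    """Return (technology, number of records the fetcher returns for it)."""
    # Assumption: the server filters on the "Key Skills" field.
    records = fetch({"Key Skills": technology})
    return technology, len(records)


def fake_fetch(payload):
    """Hypothetical stand-in for the Flask server, used only for this demo."""
    rows = [
        {"Key Skills": "Media Planning| Digital Media", "Location": "Los Angeles"},
        {"Key Skills": "pre sales| closing| technology", "Location": "New York"},
    ]
    wanted = payload["Key Skills"].lower()
    return [row for row in rows if wanted in row["Key Skills"].lower()]


result = count_jobs_for_technology("technology", fetch=fake_fetch)
print(result)
```

With the Flask server running, calling `count_jobs_for_technology("Python")` without the `fetch` argument would send the real request instead.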
#### Write a function to find the number of jobs in the US for a location of your choice


In [3]:
import pandas as pd
import requests
import json
api_url="http://127.0.0.1:5000/data/all"
response = requests.get(api_url)
data = {}
if response.ok:
    print('encoding is ', response.encoding)
    print('response.headers=', response.headers)
    print('response.request.headers=', response.request.headers)
    print('type of response.text = ', type(response.text))
    print('type of response.content = ', type(response.content))
    print('type of RESPONSE.JSON() = ', type(response.json()))  # it says type list
    data = response.json()
    # print("r.json()['args'] is ", response.json()['args'])
    dicy = dict(enumerate(data))  # # turn list into dictionary
    print('type of dicy is ', type(dicy))
    #print('DICT = ', dicy, '\n')  # THIS ONE WORKS, but looks like a long string
    '''
    {0: {'Functional Area': 'Marketing , Advertising , MR , PR , Media Planning', 'Id': 0, 'Industry': 'Advertising, PR, MR, Event Managemen', 'Job Experience Required': '5 - 10 yrs', 'Job Title': 'Digital Media Planner', 'Key Skills': 'Media Planning| Digital Media', 'Location': 'Los Angeles', 'Role': 'Media Planning Executive/Manager', 'Role Category': 'Advertising'}, 1: {'Functional Area': 'Sales , Retail , Business Developmen', 'Id': 1, 'Industry': 'IT-Software, Software Services', 'Job Experience Required': '2 - 5 yrs', 'Job Title': 'Online Bidding Executive', 'Key Skills': 'pre sales| closing| software knowledge| clients| requirements| negotiating| client| online bidding| good communication| technology', 'Location': 'New York', 'Role': 'Sales Executive/Officer', 'Role Category': 'Retail Sales'}, 2: {'Functional Area': 'Engineering Design , R&D', 'Id': 2, 'Industry': 'Recruitment, Staffing', 'Job Experience Required': '0 - 1 yrs', 'Job Title': 'Trainee Research/ Research Executive- Hi- Tech Operations', 'Key Skills': 'Computer science| Fabrication| Quality check| Intellectual property| Electronics| Support services| Research| Management| Human resource management| Research Executive', 'Location': 'San Francisco', 'Role': 'R&D Executive', 'Role Category': 'R&D'}, 3: {'Functional Area': 'IT Software - Application Programming , Maintenance', 'Id': 3, 'Industry': 'IT-Software, Software Services', 'Job Experience Required': '0 - 5 yrs', 'Job Title': 'Technical Suppor', 'Key Skills': 'Technical Suppor', 'Location': 'Washington DC', 'Role': 'Technical Support
    ''';
    for i in dicy.items():
        print('DICY DICT.ITEM is ', i, '\n')
else:
    print('Not OK, response.status_code=', response.status_code)

#locsort = sorted(data, key=lambda x: x['Location'])
#print(locsort[:3])  # is []

encoding is  utf-8
response.headers= {'Server': 'Werkzeug/3.0.4 Python/3.12.3', 'Date': 'Fri, 27 Sep 2024 23:20:04 GMT', 'Content-Type': 'application/json', 'Content-Length': '774', 'Connection': 'close'}
response.request.headers= {'User-Agent': 'python-requests/2.32.3', 'Accept-Encoding': 'gzip, deflate, br, zstd', 'Accept': '*/*', 'Connection': 'keep-alive'}
type of response.text =  <class 'str'>
type of response.content =  <class 'bytes'>
type of RESPONSE.JSON() =  <class 'list'>
type of dicy is  <class 'dict'>
DICY DICT.ITEM is  (0, {'Functional Area': 'Marketing , Advertising , MR , PR , Media Planning', 'Id': '0', 'Industry': 'Advertising, PR, MR, Event Management', 'Job Experience Required': '5 - 10 yrs', 'Job Title': 'Digital Media Planner', 'Key Skills': 'Media Planning| Digital Media', 'Location': 'Los Angeles', 'Role': 'Media Planning Executive/Manager', 'Role Category': 'Advertising'}) 

DICY DICT.ITEM is  (1, {'Functional Area': 'Sales , Retail , Business Development', 'Id

In [5]:

# r.json()['args']
print('dict list data.items is \n', dict(list(enumerate(data))[:3]))

# Iterating over a record (a dict) yields its keys, so this comprehension
# maps column position -> key name; it does not turn the records into a dictionary.
dicy = {k:v for e in data for(k,v) in enumerate(e)}
print('dicy = ', type(dicy), dicy)
print(dict(list(dicy.items())[:3]))

#print(dict(list(d.items())[:3]))
#print('type(dicy) is ', type(dicy))
#print(dicy)  # it only has key names, or first row
de = dict(enumerate(data))  # record index -> record
#di = dict.items()  # no works

'''
for k in d.keys():
    newdict = d.values()
print(type(newdict))
#print(newdict)
dicy = {}
dicy = {k:v for e newdict for(k,v) in enumerate(e)}  # comprehension list into dictionary
for k in list(newdict.keys()):
    print('key is ', k)
print('type(dicy) is ', type(dicy))
print(dicy)

kl = []
vl = []
kl = list(dicy.keys())
vl = dicy.get('Location')
print(kl,'\n', vl)
''';


dict list data.items is 
 {0: {'Functional Area': 'Marketing , Advertising , MR , PR , Media Planning', 'Id': '0', 'Industry': 'Advertising, PR, MR, Event Management', 'Job Experience Required': '5 - 10 yrs', 'Job Title': 'Digital Media Planner', 'Key Skills': 'Media Planning| Digital Media', 'Location': 'Los Angeles', 'Role': 'Media Planning Executive/Manager', 'Role Category': 'Advertising'}, 1: {'Functional Area': 'Sales , Retail , Business Development', 'Id': '1', 'Industry': 'IT-Software, Software Services', 'Job Experience Required': '2 - 5 yrs', 'Job Title': 'Online Bidding Executive', 'Key Skills': 'pre sales| closing| software knowledge| clients| requirements| negotiating| client| online bidding| good communication| technology', 'Location': 'New York', 'Role': 'Sales Executive/Officer', 'Role Category': 'Retail Sales'}}
dicy =  <class 'dict'> {0: 'Functional Area', 1: 'Id', 2: 'Industry', 3: 'Job Experience Required', 4: 'Job Title', 5: 'Key Skills', 6: 'Location', 7: 'Role', 

In [7]:
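# If supposed to use payload and params, then not expected to use DataFrames?
# print(dicy)  # THIS ONE WORKS, but looks like a long string
dfwhole = pd.json_normalize(data)  # flatten the list of records into a DataFrame
# .copy() makes df independent of dfwhole, so the assignment below
# does not trigger pandas' SettingWithCopyWarning
df = dfwhole[['Id', 'Key Skills', 'Location', 'Job Experience Required']].copy()
print('type of df is ', type(df))
df['Key Skills'] = df['Key Skills'].str.lower()  # lower-case for case-insensitive matching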


type of df is  <class 'pandas.core.frame.DataFrame'>


In [9]:
def get_number_of_jobs_T(technology):
    technology = technology.lower()
    # regex=False treats names such as 'c++' literally; na=False skips missing values
    df1 = df[df['Key Skills'].str.contains(technology, regex=False, na=False)]
    number_of_jobs = len(df1)  # one row per matching job posting
    return technology, number_of_jobs


In [11]:
print(get_number_of_jobs_T('python'))
print(get_number_of_jobs_T('javascript'))
print(get_number_of_jobs_T('excel'))
print(get_number_of_jobs_T('database'))
print(get_number_of_jobs_T('Artificial Intelligence'))
print(get_number_of_jobs_T('Machine Learning'))
print(get_number_of_jobs_T('Data Science'))
print(get_number_of_jobs_T('Data Analy'))
print(get_number_of_jobs_T('Data Engineer'))


('python', 0)
('javascript', 0)
('excel', 0)
('database', 0)
('artificial intelligence', 0)
('machine learning', 0)
('data science', 0)
('data analy', 0)
('data engineer', 0)


In [13]:
# Count postings per location once, then look values up on demand.
locs = df['Location'].value_counts()
def get_number_of_jobs_L(location):
    # .get() returns 0 for a location that has no postings
    count = int(locs.get(location, 0))
    return location, count


Call the function for Los Angeles and check if it is working.


In [15]:
#your code goes here
# #Call the function for Los Angeles and check if it is working.
print(get_number_of_jobs_L('Los Angeles'))
'''
print(get_number_of_jobs_L('San Francisco'))
print(get_number_of_jobs_L('Washington DC'))
print(get_number_of_jobs_L('Seattle'))
print(get_number_of_jobs_L('Austin'))
print(get_number_of_jobs_L('Detroit'))
print(get_number_of_jobs_L('New York'))
''';

('Los Angeles', 1)

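A location that does not occur in the data is worth handling explicitly. As a minimal sketch without pandas, the standard library's `Counter` returns 0 for a key it has never seen. The two records below are abbreviated from the output shown earlier, and `jobs_in` is a hypothetical helper name.

```python
from collections import Counter

# Two records abbreviated from the output shown earlier in this notebook.
records = [
    {"Id": "0", "Location": "Los Angeles"},
    {"Id": "1", "Location": "New York"},
]

location_counts = Counter(record["Location"] for record in records)


def jobs_in(location):
    """Return (location, count); a Counter yields 0 for unseen keys."""
    return location, location_counts[location]


print(jobs_in("Los Angeles"))
print(jobs_in("Seattle"))
```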

### Store the results in an Excel file


Call the API for all the given technologies above and write the results to an Excel spreadsheet.


If you do not know how to create an Excel file using Python, double-click here for **hints**.

<!--

from openpyxl import Workbook        # import Workbook class from module openpyxl
wb=Workbook()                        # create a workbook object
ws=wb.active                         # use the active worksheet
ws.append(['Country','Continent'])   # add a row with two columns 'Country' and 'Continent'
ws.append(['Egypt','Africa'])        # add a row with two columns 'Egypt' and 'Africa'
ws.append(['India','Asia'])          # add another row
ws.append(['France','Europe'])       # add another row
wb.save("countries.xlsx")            # save the workbook into a file called countries.xlsx


-->

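As a sketch of how the collected counts could reach a spreadsheet, the example below builds the rows first and writes them in a separate step, using the openpyxl calls `Workbook`, `ws.append` and `wb.save`. The column titles, the helper names `build_rows` and `save_rows`, and the example counts are assumptions made for illustration; real counts come from the API.

```python
def build_rows(counts):
    """Turn {technology: count} into worksheet rows, header first."""
    rows = [["Technology", "Number of Jobs"]]
    rows.extend([name, number] for name, number in counts.items())
    return rows


def save_rows(rows, filename):
    """Write rows to filename with openpyxl (third-party package)."""
    from openpyxl import Workbook  # imported here so build_rows works without it

    wb = Workbook()   # create a workbook object
    ws = wb.active    # use the active worksheet
    for row in rows:
        ws.append(row)
    wb.save(filename)


# Hypothetical counts, for illustration only; real values come from the API.
example_counts = {"Python": 3, "SQL Server": 1}
rows = build_rows(example_counts)
print(rows)
# save_rows(rows, "github-job-postings.xlsx")  # uncomment to write the file
```

Keeping the row building apart from the file writing makes it easy to check the rows before anything is saved.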

In [89]:
# but also, how to get rid of excess columns? either slice, match, 
print('TESTING  CELL')

dicy = dict(enumerate(data))
print(data)
elist = enumerate(data)
print(elist)

cities = ['Los Angeles','San Francisco','Washington DC','Seattle','Austin','Detroit','New York']
techs = ['python','javascript','excel','database','Artificial Intelligence',
    'Machine Learning','Data Science','Data Analy','Data Engineer']
cities = ['Los Angeles']
techs = ['python']
outp = []
'''
def collectitems(sourcelist, item, filteredlist):
    elist = enumerate(sourcelist)
    for i in sourcelist:
        #print('i=',i, 'sourcelist[i]', enumerate(sourcelist[i]))  # only getting 1st line
        for loc in cities:
            #print('for loc in cities ', loc)
            if loc in i:
                filteredlist.append([i])
    return len(filteredlist)
print(collectitems(edata, 3, outp))

myfile = 'shorter.json'
pathfn=os.path.join(os.getcwd(), myfile)

# JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 10)

#ll   # or ll[0] = []
for lo in range(0, len(cities)-1):
    filtd[lo] = []
    coli = collectitems (data, lo, ll[lo])  #  or ll[0], not quite right
    tl = []
    for t in techs:
        collectitems (ll[lo], t, tl[t])
df.to_excel(fn, tl)  # or tl.to_excel(fn) ???

cnt = 0
for i in data:
    cnt += 1
print(cnt)  # 27001
''';
api_url="http://127.0.0.1:5000/data/all"
response = requests.get(api_url)
data = {}
if response.ok:
    print( response.headers)
    print( response.request.headers)
    print( type(response.text))
    print( type(response.content))
    print( type(response.json()))  # it says type list
    x = response.json()
    
    print('DUMP x')
    json.dumps(x)
print('DUMP dicy')
json.dumps(dicy)


TESTING  CELL
[{'Functional Area': 'Marketing , Advertising , MR , PR , Media Planning', 'Id': '0', 'Industry': 'Advertising, PR, MR, Event Management', 'Job Experience Required': '5 - 10 yrs', 'Job Title': 'Digital Media Planner', 'Key Skills': 'Media Planning| Digital Media', 'Location': 'Los Angeles', 'Role': 'Media Planning Executive/Manager', 'Role Category': 'Advertising'}, {'Functional Area': 'Sales , Retail , Business Development', 'Id': '1', 'Industry': 'IT-Software, Software Services', 'Job Experience Required': '2 - 5 yrs', 'Job Title': 'Online Bidding Executive', 'Key Skills': 'pre sales| closing| software knowledge| clients| requirements| negotiating| client| online bidding| good communication| technology', 'Location': 'New York', 'Role': 'Sales Executive/Officer', 'Role Category': 'Retail Sales'}]
<enumerate object at 0x000001B737752D90>
{'Server': 'Werkzeug/3.0.4 Python/3.12.3', 'Date': 'Fri, 27 Sep 2024 22:24:28 GMT', 'Content-Type': 'application/json', 'Content-Length'



Create a Python list of all technologies for which you need to find the number of job postings.


In [93]:
'''
table = []  # a list, because dictionaries have no append() method
table.append(get_number_of_jobs_L('Los Angeles'))
table.append(get_number_of_jobs_L('San Francisco'))
table.append(get_number_of_jobs_L('Washington DC'))
table.append(get_number_of_jobs_L('Seattle'))
table.append(get_number_of_jobs_L('Austin'))
table.append(get_number_of_jobs_L('Detroit'))
table.append(get_number_of_jobs_L('New York'))
''';

In [None]:
#your code goes here
# pd.DataFrame(table).to_excel('github-job-postings.xlsx')  # to_excel is a DataFrame method


Import the libraries required to create an Excel spreadsheet.


In [None]:
# your code goes here

Create a workbook and select the active worksheet.


In [None]:
# your code goes here

Find the number of job postings for each technology in the above list.
Write the technology name and the number of job postings into the Excel spreadsheet.


In [None]:
#your code goes here

Save into an Excel spreadsheet named 'github-job-postings.xlsx'.


In [None]:
#your code goes here

#### In a similar way, you can try the technologies given below and store the results in an Excel sheet.


Collect the number of job postings for the following languages using the API:

*   C
*   C#
*   C++
*   Java
*   JavaScript
*   Python
*   Scala
*   Oracle
*   SQL Server
*   MySQL Server
*   PostgreSQL
*   MongoDB

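Several names in this list are substrings of others ("C" appears inside "C++" and "JavaScript", "Java" inside "JavaScript"), so a plain substring test can overcount if you do the matching yourself. The sketch below splits the pipe-separated `Key Skills` string into whole tokens and compares those instead. How the lab's server matches is not stated here, so this only applies to client-side filtering; the helper names and the three records are hypothetical.

```python
def skill_tokens(key_skills):
    """Split a pipe-separated 'Key Skills' string into lower-case tokens."""
    return {token.strip().lower() for token in key_skills.split("|") if token.strip()}


def count_exact_skill(records, skill):
    """Count records whose 'Key Skills' contain skill as a whole token."""
    wanted = skill.strip().lower()
    return sum(1 for r in records if wanted in skill_tokens(r.get("Key Skills", "")))


# Hypothetical records, for illustration only.
records = [
    {"Key Skills": "Java| Spring| SQL Server"},
    {"Key Skills": "JavaScript| C++| HTML"},
    {"Key Skills": "C| Embedded C"},
]

for name in ["C", "C++", "Java", "JavaScript"]:
    print(name, count_exact_skill(records, name))
```

Whole-token matching is stricter than substring matching: "Embedded C" does not count as "C" here, which may or may not be what you want for a given analysis.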

In [None]:
# your code goes here


## Authors


Ayushi Jain


### Other Contributors


Rav Ahuja

Lakshmi Holla

Malika


Copyright © 2020 IBM Corporation. This notebook and its source code are released under the terms of the [MIT License](https://cognitiveclass.ai/mit-license?utm_medium=Exinfluencer\&utm_source=Exinfluencer\&utm_content=000026UJ\&utm_term=10006555\&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork21426264-2021-01-01\&cm_mmc=Email_Newsletter-\_-Developer_Ed%2BTech-\_-WW_WW-\_-SkillsNetwork-Courses-IBM-DA0321EN-SkillsNetwork-21426264\&cm_mmca1=000026UJ\&cm_mmca2=10006555\&cm_mmca3=M12345678\&cvosrc=email.Newsletter.M12345678\&cvo_campaign=000026UJ).


<!--## Change Log


<!--| Date (YYYY-MM-DD) | Version | Changed By        | Change Description                 |
| ----------------- | ------- | ----------------- | ---------------------------------- | 
| 2022-01-19        | 0.3     | Lakshmi Holla        | Added changes in the markdown      |
| 2021-06-25        | 0.2     | Malika            | Updated GitHub job json link       |
| 2020-10-17        | 0.1     | Ramesh Sannareddy | Created initial version of the lab |-->
