<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="200" alt="Skills Network Logo">
    </a>
</p>


# **Collecting Job Data Using APIs**


Estimated time needed: **30** minutes


## Objectives


After completing this lab, you will be able to:


*   Collect job data using Jobs API
*   Store the collected data into an excel spreadsheet.


><strong>Note: Before starting with the assignment make sure to read all the instructions and then move ahead with the coding part.</strong>


#### Instructions


To run the actual lab, firstly you need to click on the [Jobs_API](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/Jobs_API.ipynb) notebook link. The file contains flask code which is required to run the Jobs API data.

Now, to run the code in the file that opens up follow the below steps.

Step1: Download the file. 

Step2: Upload the file into your current Jupyter environment using the upload button in your Jupyter interface. Ensure that the file is in the same folder as your working .ipynb file.

Step 2: If working in a local Jupyter environment, use the "Upload" button in your Jupyter interface to upload the Jobs_API notebook into the same folder as your current .ipynb file.

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/Upload.PNG">

Step3:  Open the Jobs_API notebook, and run all the cells to start the Flask application. Once the server is running, you can access the API from the URL provided in the notebook.

If you want to learn more about flask, which is optional, you can click on this link [here](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/FLASK_API.md.html).

Once you run the flask code, you can start with your assignment.


## Dataset Used in this Assignment

The dataset used in this lab comes from the following source: https://www.kaggle.com/promptcloud/jobs-on-naukricom under the under a **Public Domain license**.

> Note: We are using a modified subset of that dataset for the lab, so to follow the lab instructions successfully please use the dataset provided with the lab, rather than the dataset from the original source.

The original dataset is a csv. We have converted the csv to json as per the requirement of the lab.


## Warm-Up Exercise


Before you attempt the actual lab, here is a fully solved warmup exercise that will help you to learn how to access an API.


Using an API, let us find out who currently are on the International Space Station (ISS).<br> The API at [http://api.open-notify.org/astros.json](http://api.open-notify.org/astros.json?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork21426264-2021-01-01&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBM-DA0321EN-SkillsNetwork-21426264&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ) gives us the information of astronauts currently on ISS in json format.<br>
You can read more about this API at [http://open-notify.org/Open-Notify-API/People-In-Space/](http://open-notify.org/Open-Notify-API/People-In-Space?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork21426264-2021-01-01&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBM-DA0321EN-SkillsNetwork-21426264&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)


In [1]:
import requests # you need this module to make an API call
import pandas as pd

In [2]:
api_url = "http://api.open-notify.org/astros.json" # this url gives use the astronaut data

In [None]:
response = requests.get(api_url) # Call the API using the get method and store the
                                # output of the API call in a variable called response.

In [None]:
if response.ok:             # if all is well() no errors, no network timeouts)
    data = response.json()  # store the result in json format in a variable called data
                            # the variable data is of type dictionary.

In [None]:
print(data)   # print the data just to check the output or for debugging

Print the number of astronauts currently on ISS.


In [None]:
print(data.get('number'))

Print the names of the astronauts currently on ISS.


In [None]:
astronauts = data.get('people')
print("There are {} astronauts on ISS".format(len(astronauts)))
print("And their names are :")
for astronaut in astronauts:
    print(astronaut.get('name'))

Hope the warmup was helpful. Good luck with your next lab!


## Lab: Collect Jobs Data using Jobs API


### Objective: Determine the number of jobs currently open for various technologies  and for various locations


Collect the number of job postings for the following locations using the API:

* Los Angeles
* New York
* San Francisco
* Washington DC
* Seattle
* Austin
* Detroit


In [3]:
import pandas as pd
import json
import requests


https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json#### Write a function to get the number of jobs for the Python technology.<br>
> Note: While using the lab you need to pass the **payload** information for the **params** attribute in the form of **key** **value** pairs.
  Refer the ungraded **rest api lab** in the course **Python for Data Science, AI & Development**  <a href="https://www.coursera.org/learn/python-for-applied-data-science-ai/ungradedLti/P6sW8/hands-on-lab-access-rest-apis-request-http?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork928-2022-01-01">link</a>
  
 ##### The keys in the json are 
 * Job Title
 
 * Job Experience Required
 
 * Key Skills
 
 * Role Category
 
 * Location
 
 * Functional Area
 
 * Industry
 
 * Role 
 
You can also view  the json file contents  from the following <a href = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json">json</a> URL.



In [4]:
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json"

# Veriyi çek ve DataFrame'e çevir
response = requests.get(url)
data = response.json()
df = pd.DataFrame(data)

# Şehir listesi
cities = ["Los Angeles", "New York", "San Francisco", "Washington DC", "Seattle", "Austin", "Detroit"]

# Fonksiyon: Belirli şehir ve teknoloji için iş ilanı sayısı
def count_jobs(city, technology="Python"):
    filtered = df[(df['Location'] == city) & (df['Key Skills'].str.contains(technology, case=False))]
    return len(filtered)

# Sonuçları toplama ve gösterme
results = {city: count_jobs(city) for city in cities}
for city, count in results.items():
    print(f"{city}: {count} Python job(s)")

Los Angeles: 24 Python job(s)
New York: 143 Python job(s)
San Francisco: 17 Python job(s)
Washington DC: 258 Python job(s)
Seattle: 133 Python job(s)
Austin: 15 Python job(s)
Detroit: 170 Python job(s)


Calling the function for Python and checking if it works.


In [5]:
for city in cities:
    print(f"{city}: {count_jobs(city)} Python job(s)")


Los Angeles: 24 Python job(s)
New York: 143 Python job(s)
San Francisco: 17 Python job(s)
Washington DC: 258 Python job(s)
Seattle: 133 Python job(s)
Austin: 15 Python job(s)
Detroit: 170 Python job(s)


#### Write a function to find number of jobs in US for a location of your choice


In [15]:
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json"

# Veriyi çek ve DataFrame'e çevir
response = requests.get(url)
data = response.json()
df = pd.DataFrame(data)

# Fonksiyon: Belirli şehir ve teknoloji için iş ilanı sayısı
def count_jobs_in_location(city, technology="Python"):
    """
    city: Şehir adı (örn. "New York")
    technology: Aranacak teknoloji (default "Python")
    """
    filtered = df[(df['Location'] == city) & (df['Key Skills'].str.contains(technology, case=False))]
    return len(filtered)

# Örnek kullanım: New York için Python işleri say
city_name = "New York"
num_jobs = count_jobs_in_location(city_name, "Python")
print(f"{city_name}: {num_jobs} Python job(s)")

New York: 143 Python job(s)


Call the function for Los Angeles and check if it is working.


In [16]:
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json"

# Veriyi çek ve DataFrame'e çevir
response = requests.get(url)
data = response.json()
df = pd.DataFrame(data)

# Fonksiyon: Belirli şehir ve teknoloji için iş ilanı sayısı
def count_jobs_in_location(city, technology="Python"):
    filtered = df[(df['Location'] == city) & (df['Key Skills'].str.contains(technology, case=False, na=False))]
    return len(filtered)

# Fonksiyonu Los Angeles için çağır
city_name = "Los Angeles"
num_jobs = count_jobs_in_location(city_name, "Python")

print(f"{city_name}: {num_jobs} Python job(s) found.")

Los Angeles: 24 Python job(s) found.


### Store the results in an excel file


Call the API for all the given technologies above and write the results in an excel spreadsheet.


If you do not know how create excel file using python, double click here for **hints**.

<!--

from openpyxl import Workbook        # import Workbook class from module openpyxl
wb=Workbook()                        # create a workbook object
ws=wb.active                         # use the active worksheet
ws.append(['Country','Continent'])   # add a row with two columns 'Country' and 'Continent'
ws.append(['Eygpt','Africa'])        # add a row with two columns 'Egypt' and 'Africa'
ws.append(['India','Asia'])          # add another row
ws.append(['France','Europe'])       # add another row
wb.save("countries.xlsx")            # save the workbook into a file called countries.xlsx


-->


Create a python list of all technologies for which you need to find the number of jobs postings.


In [17]:
# Teknolojilerin listesi
technologies = ["Python", "Java", "Javascript", "C++", "AWS", "SQL", "Azure", "SAP", "React", "Node.js"]

Import libraries required to create excel spreadsheet


In [18]:
# Excel dosyası oluşturmak ve veri işlemek için gerekli kütüphaneler
import pandas as pd
import requests
import json
!pip install openpyxl



Create a workbook and select the active worksheet


In [19]:
# Excel dosyası oluşturmak için gerekli kütüphane
from openpyxl import Workbook

# Yeni bir çalışma kitabı (workbook) oluştur
workbook = Workbook()

# Aktif çalışma sayfasını (worksheet) seç
sheet = workbook.active

# Sayfaya başlık ver (isteğe bağlı)
sheet.title = "Job Data"

print("Yeni Excel dosyası oluşturuldu ve aktif sayfa seçildi!")

Yeni Excel dosyası oluşturuldu ve aktif sayfa seçildi!


Find the number of jobs postings for each of the technology in the above list.
Write the technology name and the number of jobs postings into the excel spreadsheet.


In [27]:
import requests

def get_number_of_jobs_for_location(location):
    url = "http://127.0.0.1:5000/data"
    params = {"Location": location}
    response = requests.get(url, params=params)

    if response.status_code == 200:
        data = response.json()
        return len(data)
    else:
        print(f"Error fetching data for {location}: {response.status_code}")
        return 0
print(get_number_of_jobs_for_technology("Python"))
print(get_number_of_jobs_for_location("Los Angeles"))

1173
640


Save into an excel spreadsheet named **job-postings.xlsx**.


In [28]:
import sys
import subprocess
import requests
import pandas as pd
from time import sleep

# openpyxl yoksa yüklemeye çalış
try:
    import openpyxl  # noqa: F401
except Exception:
    print("openpyxl yüklü değil, yüklüyorum... (birkaç saniye sürebilir)")
    try:
        # Jupyter içinde çalışıyorsa !pip de çalışır, ama burada güvenli pip install
        subprocess.check_call([sys.executable, "-m", "pip", "install", "openpyxl"])
        import openpyxl  # noqa: F401
        print("openpyxl yüklendi.")
    except Exception as e:
        print("openpyxl yüklenemedi:", e)
        print("Excel .xlsx kaydetme başarısız olabilir. Alternatif olarak CSV kaydetme kodu da ekledim.")

# API base URL (Flask uygulamanızın çalıştığı adres)
BASE_URL = "http://127.0.0.1:5000/data"

# Teknolojiler ve şehirler (labda istenenleri ve örnek teknolojileri ekledim)
technologies = ["Python", "Java", "Javascript", "C", "C++", "AWS", "SQL", "React", "Node.js"]
locations = ["Los Angeles", "New York", "San Francisco", "Washington DC", "Seattle", "Austin", "Detroit"]

# Fonksiyonlar: API'den count al
def get_number_of_jobs_for_technology(technology, timeout=10):
    """
    'Key Skills' parametresi ile /data endpoint'ine istek atar.
    Dönen JSON listesi uzunluğunu döner.
    """
    try:
        params = {"Key Skills": technology}
        r = requests.get(BASE_URL, params=params, timeout=timeout)
        if r.status_code == 200:
            data = r.json()
            return len(data) if data is not None else 0
        else:
            print(f"[TECH] {technology}: HTTP {r.status_code}")
            return 0
    except requests.exceptions.RequestException as e:
        print(f"[TECH] {technology}: Request failed ->", e)
        return 0

def get_number_of_jobs_for_location(location, timeout=10):
    """
    'Location' parametresi ile /data endpoint'ine istek atar.
    Dönen JSON listesi uzunluğunu döner.
    """
    try:
        params = {"Location": location}
        r = requests.get(BASE_URL, params=params, timeout=timeout)
        if r.status_code == 200:
            data = r.json()
            return len(data) if data is not None else 0
        else:
            print(f"[LOC] {location}: HTTP {r.status_code}")
            return 0
    except requests.exceptions.RequestException as e:
        print(f"[LOC] {location}: Request failed ->", e)
        return 0

# Verileri topla
tech_results = []
loc_results = []

print("API'ye istek atılıyor — lütfen Flask server'ın çalıştığından emin ol (http://127.0.0.1:5000).")
sleep(0.5)

for tech in technologies:
    cnt = get_number_of_jobs_for_technology(tech)
    print(f"Technology: {tech} -> {cnt}")
    tech_results.append({"Technology": tech, "Number of Jobs": cnt})
    sleep(0.1)

for loc in locations:
    cnt = get_number_of_jobs_for_location(loc)
    print(f"Location: {loc} -> {cnt}")
    loc_results.append({"Location": loc, "Number of Jobs": cnt})
    sleep(0.1)

# DataFrame oluştur
df_tech = pd.DataFrame(tech_results)
df_loc = pd.DataFrame(loc_results)

# Excel'e kaydet (iki sheet)
output_filename = "job-postings.xlsx"
try:
    with pd.ExcelWriter(output_filename, engine="openpyxl") as writer:
        df_tech.to_excel(writer, sheet_name="By_Technology", index=False)
        df_loc.to_excel(writer, sheet_name="By_Location", index=False)
    print(f"\nBaşarılı: '{output_filename}' oluşturuldu. (Sheets: By_Technology, By_Location)")
except Exception as e:
    print("Excel kaydetme sırasında hata:", e)
    # fallback: CSV olarak kaydet
    csv_tech = "job-postings-by-technology.csv"
    csv_loc = "job-postings-by-location.csv"
    df_tech.to_csv(csv_tech, index=False)
    df_loc.to_csv(csv_loc, index=False)
    print(f"Alternatif olarak CSV olarak kaydedildi: '{csv_tech}', '{csv_loc}'")


API'ye istek atılıyor — lütfen Flask server'ın çalıştığından emin ol (http://127.0.0.1:5000).
Technology: Python -> 1173
Technology: Java -> 2609
[TECH] Javascript: HTTP 500
Technology: Javascript -> 0
Technology: C -> 13498
Technology: C++ -> 305
[TECH] AWS: HTTP 500
Technology: AWS -> 0
[TECH] SQL: HTTP 500
Technology: SQL -> 0
[TECH] React: HTTP 500
Technology: React -> 0
[TECH] Node.js: HTTP 500
Technology: Node.js -> 0
Location: Los Angeles -> 640
Location: New York -> 3226
Location: San Francisco -> 435
Location: Washington DC -> 5316
Location: Seattle -> 3375
Location: Austin -> 434
Location: Detroit -> 3945

Başarılı: 'job-postings.xlsx' oluşturuldu. (Sheets: By_Technology, By_Location)


#### In the similar way, you can try for below given technologies and results  can be stored in an excel sheet.


Collect the number of job postings for the following languages using the API:

*   C
*   C#
*   C++
*   Java
*   JavaScript
*   Python
*   Scala
*   Oracle
*   SQL Server
*   MySQL Server
*   PostgreSQL
*   MongoDB


In [29]:
import requests
from openpyxl import Workbook

# Base URL of your Flask API
BASE_URL = "http://127.0.0.1:5000/data"

# List of technologies to query
technologies = [
    "C", "C#", "C++", "Java", "JavaScript",
    "Python", "Scala", "Oracle", "SQL Server",
    "MySQL Server", "PostgreSQL", "MongoDB"
]

def get_number_of_jobs_for_technology(tech):
    """Fetch number of jobs for a given technology via local Flask API"""
    params = {"Key Skills": tech}
    try:
        response = requests.get(BASE_URL, params=params)
        if response.status_code == 200:
            jobs = response.json()
            return len(jobs)
        else:
            print(f"Error fetching data for {tech}: {response.status_code}")
            return 0
    except Exception as e:
        print(f"Error fetching data for {tech}: {e}")
        return 0

# Create Excel workbook
wb = Workbook()
sheet = wb.active
sheet.title = "Job Postings"
sheet.append(["Technology", "Number of Job Postings"])

# Loop through each technology and store results
for tech in technologies:
    count = get_number_of_jobs_for_technology(tech)
    print(f"{tech}: {count} jobs found")
    sheet.append([tech, count])

# Save results to Excel file
wb.save("job-postings.xlsx")
print("\n✅ Results saved to job-postings.xlsx successfully.")


C: 13498 jobs found
C#: 333 jobs found
C++: 305 jobs found
Java: 2609 jobs found
JavaScript: 355 jobs found
Python: 1173 jobs found
Scala: 33 jobs found
Oracle: 784 jobs found
SQL Server: 250 jobs found
MySQL Server: 0 jobs found
PostgreSQL: 10 jobs found
MongoDB: 174 jobs found

✅ Results saved to job-postings.xlsx successfully.


## Authors


Ayushi Jain


### Other Contributors


Rav Ahuja

Lakshmi Holla

Malika


Copyright © IBM Corporation.


<!--## Change Log


<!--| Date (YYYY-MM-DD) | Version | Changed By        | Change Description                 |
| ----------------- | ------- | ----------------- | ---------------------------------- | 
| 2022-01-19        | 0.3     | Lakshmi Holla        | Added changes in the markdown      |
| 2021-06-25        | 0.2     | Malika            | Updated GitHub job json link       |
| 2020-10-17        | 0.1     | Ramesh Sannareddy | Created initial version of the lab |--!>
