# **Collecting job data using APIs**

## Objectives

* Collect job data from a jobs API (originally the GitHub Jobs API; this lab uses the Careerjet API instead).
* Store the collected data in an Excel spreadsheet.



## Warm-Up Exercise

Before attempting the actual lab, work through this fully solved warm-up exercise, which will help you learn how to access an API.

Using an API, let us find out who is currently on the International Space Station (ISS).
The API at http://api.open-notify.org/astros.json returns information about the astronauts currently in space, including those on the ISS, in JSON format.
You can read more about this API at http://open-notify.org/Open-Notify-API/People-In-Space/


In [3]:
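# Call the API and check that everything is OK
import requests

api_url = "http://api.open-notify.org/astros.json"

# A timeout keeps the call from hanging if the server does not respond
response = requests.get(api_url, timeout=10)

# Raise an HTTPError on 4xx/5xx responses so `data` is never left undefined
response.raise_for_status()
data = response.json()

data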

{'people': [{'name': 'Mark Vande Hei', 'craft': 'ISS'},
  {'name': 'Oleg Novitskiy', 'craft': 'ISS'},
  {'name': 'Pyotr Dubrov', 'craft': 'ISS'},
  {'name': 'Thomas Pesquet', 'craft': 'ISS'},
  {'name': 'Megan McArthur', 'craft': 'ISS'},
  {'name': 'Shane Kimbrough', 'craft': 'ISS'},
  {'name': 'Akihiko Hoshide', 'craft': 'ISS'},
  {'name': 'Nie Haisheng', 'craft': 'Tiangong'},
  {'name': 'Liu Boming', 'craft': 'Tiangong'},
  {'name': 'Tang Hongbo', 'craft': 'Tiangong'}],
 'number': 10,
 'message': 'success'}

In [5]:
#Extract the number of astronauts
data["number"] 

10

In [6]:
# Extract the names of the astronauts
astronauts = data.get("people", [])
for astro in astronauts:
    print(astro["name"])

Mark Vande Hei
Oleg Novitskiy
Pyotr Dubrov
Thomas Pesquet
Megan McArthur
Shane Kimbrough
Akihiko Hoshide
Nie Haisheng
Liu Boming
Tang Hongbo
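
The same response can also be summarized per spacecraft. The following is a minimal sketch that uses the response shown above as a hardcoded sample, so it runs without network access; the helper name `group_by_craft` is hypothetical and not part of the original exercise.

```python
from collections import defaultdict

# Sample payload copied from the API response shown above
sample = {
    "people": [
        {"name": "Mark Vande Hei", "craft": "ISS"},
        {"name": "Oleg Novitskiy", "craft": "ISS"},
        {"name": "Pyotr Dubrov", "craft": "ISS"},
        {"name": "Thomas Pesquet", "craft": "ISS"},
        {"name": "Megan McArthur", "craft": "ISS"},
        {"name": "Shane Kimbrough", "craft": "ISS"},
        {"name": "Akihiko Hoshide", "craft": "ISS"},
        {"name": "Nie Haisheng", "craft": "Tiangong"},
        {"name": "Liu Boming", "craft": "Tiangong"},
        {"name": "Tang Hongbo", "craft": "Tiangong"},
    ],
    "number": 10,
    "message": "success",
}


def group_by_craft(payload):
    """Return a dict mapping each craft to the names of the astronauts aboard."""
    crews = defaultdict(list)
    for person in payload.get("people", []):
        crews[person["craft"]].append(person["name"])
    return dict(crews)


crews = group_by_craft(sample)
for craft, names in crews.items():
    print(f"{craft}: {len(names)} astronauts")
```

Grouping by the `craft` key shows that only some of the astronauts returned by the API are on the ISS; the rest are aboard Tiangong.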



## Collecting job data using the Careerjet API

This lab was originally designed around the GitHub Jobs API. Because that service has been deprecated, the Careerjet API is used instead to collect the job data.

For more information about the Careerjet API, see the documentation of its Python client: https://pypi.org/project/careerjet-api/

### Objective: Determine the number of jobs currently open for various technologies

Collect the number of job postings in Mexico City for the following technologies using the API:

* C++
* C#
* Java
* JavaScript
* Python
* Scala
* Oracle
* SQL Server
* MySQL Server
* PostgreSQL
* MongoDB
* Excel



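Write a function to get the number of jobs per technology. The function does not filter by date, so the counts are the postings listed when the search runs (2021 for the results shown here).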

In [1]:
# Import the libraries
import careerjet_api
import pandas as pd

In [2]:
def get_number_jobs(technology):
    """Return (technology, number of Careerjet postings found in Mexico City)."""
    cj = careerjet_api.CareerjetAPIClient("es_MX")
    result_json = cj.search({
        "keywords": technology,
        "location": "Ciudad de México",
        "sort": "date",
        "affid": "a3150e0699e2de8a7acbd5123e366838",
        "user_ip": "192.168.1.65",
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0",
        "url": "http://www.example.com/jobsearch?q=python&l=Tabasco",
    })

    # "hits" is the field of the response used here as the job count
    job_count = result_json["hits"]

    return technology, job_count




In [3]:
get_number_jobs("Python")

('Python', 771)
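
Note that `result_json["hits"]` raises a `KeyError` if the response does not contain a `hits` key, for example when a search fails. The following is a minimal sketch of a more defensive lookup; the helper name `extract_hits` is hypothetical, and the dictionaries passed to it are hand-written samples, not real Careerjet responses.

```python
def extract_hits(result_json):
    """Return the job count as an int, or None if there is no usable "hits" value."""
    hits = result_json.get("hits")
    try:
        return int(hits)
    except (TypeError, ValueError):
        # TypeError: "hits" is missing (None); ValueError: it is not numeric
        return None


# Hand-written sample dictionaries (assumptions for illustration only)
print(extract_hits({"hits": 771}))
print(extract_hits({"message": "no results"}))
```

Returning `None` instead of raising lets the calling code decide whether to skip a technology, retry the search, or record a missing value.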

Get the number of jobs for every technology

In [8]:
technologies = [
    "C#",
    "C++",
    "Java",
    "JavaScript",
    "Python",
    "Scala",
    "Oracle",
    "SQL",
    "MySQL",
    "PostgreSQL",
    "MongoDB",
    "Excel"
]

tec_dict = {}
counts = []
TECHS = []

In [9]:

for tech in technologies:
    tecnologia, conteo = get_number_jobs(tech)
    TECHS.append(tecnologia)
    counts.append(conteo)

# Build the column-oriented dict once, after the loop has filled both lists
tec_dict["Technologies"] = TECHS
tec_dict["Number_jobs"] = counts
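
The collection step can also be written so that the lookup function is passed in as a parameter, which makes it possible to try the logic without calling the API. This is only a sketch: `collect_counts` and `fake_lookup` are hypothetical names, and the stand-in counts are copied from the results shown in this notebook.

```python
def collect_counts(technologies, lookup):
    """Call lookup(tech) for each technology and return a column-oriented dict."""
    results = [lookup(tech) for tech in technologies]
    return {
        "Technologies": [name for name, _ in results],
        "Number_jobs": [count for _, count in results],
    }


# Stand-in for get_number_jobs so the example runs offline (values from this notebook)
known_counts = {"Python": 771, "Scala": 61, "MongoDB": 162}


def fake_lookup(technology):
    return technology, known_counts[technology]


table = collect_counts(["Python", "Scala", "MongoDB"], fake_lookup)
print(table)
```

In the notebook itself, the call would be `collect_counts(technologies, get_number_jobs)`, which produces a dictionary with the same two keys as `tec_dict`.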

In [10]:
tec_dict

{'Technologies': ['C#',
  'C++',
  'Java',
  'JavaScript',
  'Python',
  'Scala',
  'Oracle',
  'SQL',
  'MySQL',
  'PostgreSQL',
  'MongoDB',
  'Excel'],
 'Number_jobs': [4800,
  4800,
  1378,
  894,
  771,
  61,
  749,
  1692,
  380,
  156,
  162,
  4537]}
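
To see which technologies have the most postings, the two lists can be paired and sorted by count. A minimal sketch, with the dictionary above hardcoded so it runs on its own; the variable name `ranked` is an illustrative choice.

```python
# Results copied from the tec_dict output shown above
tec_dict = {
    "Technologies": ["C#", "C++", "Java", "JavaScript", "Python", "Scala",
                     "Oracle", "SQL", "MySQL", "PostgreSQL", "MongoDB", "Excel"],
    "Number_jobs": [4800, 4800, 1378, 894, 771, 61,
                    749, 1692, 380, 156, 162, 4537],
}

# Pair each technology with its count, then sort by count, largest first
ranked = sorted(
    zip(tec_dict["Technologies"], tec_dict["Number_jobs"]),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, count in ranked[:3]:
    print(f"{name}: {count}")
```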

Create a DataFrame to hold your data

In [11]:
df_jobs = pd.DataFrame(tec_dict)

df_jobs

| | Technologies | Number_jobs |
|---|---|---|
| 0 | C# | 4800 |
| 1 | C++ | 4800 |
| 2 | Java | 1378 |
| 3 | JavaScript | 894 |
| 4 | Python | 771 |
| 5 | Scala | 61 |
| 6 | Oracle | 749 |
| 7 | SQL | 1692 |
| 8 | MySQL | 380 |
| 9 | PostgreSQL | 156 |


Save your DataFrame in an Excel file

In [12]:
# Writing .xlsx files requires an Excel engine such as openpyxl to be installed
df_jobs.to_excel("Careerjet_posting.xlsx")