In [1]:
# import libraries
import requests
import pandas as pd
import pprint

### A Little About Requests

Requests allows you to send HTTP/1.1 requests easily. There’s no need to manually add query strings to your URLs, or to form-encode your `PUT` & `POST` data.

### Example 1
We are going to ingest data from this API.

 https://wizard-world-api.herokuapp.com/swagger/index.html



 DOCUMENTATION - THE SOURCE OF TRUTH

 --
 The first step is to examine the API's documentation, which is found on the API's website. The documentation contains useful information about the API:
 * endpoints
 * parameters
 * datatypes of the parameters
 * code snippets and tutorials  

Our API in the link above has the following endpoints:
* Elixirs
* Houses
* Ingredients
* Spells
* Wizards

Let's take a close look at the Spells endpoint. From the documentation, we see that the endpoint accepts `name`, `type` and `incantation`, all of which are strings.

If we click `Try it out` at the top right of the endpoint's panel, we can send a sample GET request to the endpoint and check that we get a valid response.
Since we are not familiar with any spells, leave the `name` and `incantation` fields empty, set `type` to `Spell` and click the Execute button.

In the response body, you should see data in JSON format, which looks like a list of nested Python dictionaries.

Above the response body we can see the request URL, including its query string:
`https://wizard-world-api.herokuapp.com/Spells?Type=Spell`

This URL is what we need when making an API call.
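
As a minimal sketch of how Requests can build this URL for us, the snippet below prepares (but does not send) the same request from a base URL and a `params` dictionary. The variable names `base_url`, `params` and `prepared` are illustrative only.

```python
import requests

# Base endpoint without any query string
base_url = "https://wizard-world-api.herokuapp.com/Spells"

# Query parameters as a plain dictionary; Requests encodes them for us
params = {"Type": "Spell"}

# Prepare the request without sending it, so we can inspect the final URL
prepared = requests.Request("GET", base_url, params=params).prepare()
print(prepared.url)
```

The same dictionary can be passed directly to `requests.get(base_url, params=params)` when we actually want to make the call.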



In [2]:
# making first API call
response = requests.get("https://wizard-world-api.herokuapp.com/Spells?Type=Spell")
response

<Response [200]>

### Response
The response we get from the server includes a number called a `status code`. Status codes tell us whether our API call was successful or not.

* `200` - API call successful
* `400` - Bad request (a client error)
* `404` - Resource not found on server

Find out more about status codes in the link below:

https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
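
To make the grouping of status codes concrete, here is a small sketch that labels a code by its family and looks up its standard reason phrase with Python's built-in `http.HTTPStatus`. The helper name `describe_status` is hypothetical and not part of Requests.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a label such as '404 Not Found (client error)'."""
    if 100 <= code < 200:
        family = "informational"
    elif 200 <= code < 300:
        family = "success"
    elif 300 <= code < 400:
        family = "redirection"
    elif 400 <= code < 500:
        family = "client error"
    elif 500 <= code < 600:
        family = "server error"
    else:
        family = "unknown"
    try:
        # HTTPStatus maps a numeric code to its standard reason phrase
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes that the standard library does not know about
        phrase = "Unknown"
    return f"{code} {phrase} ({family})"

for code in (200, 400, 404):
    print(describe_status(code))
```

With a real response object, the same helper could be called as `describe_status(response.status_code)`.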

In [3]:
# display the methods and attributes of the response object
help(response)

Help on Response in module requests.models object:

class Response(builtins.object)
 |  The :class:`Response <Response>` object, which contains a
 |  server's response to an HTTP request.
 |
 |  Methods defined here:
 |
 |  __bool__(self)
 |      Returns True if :attr:`status_code` is less than 400.
 |
 |      This attribute checks if the status code of the response is between
 |      400 and 600 to see if there was a client error or a server error. If
 |      the status code, is between 200 and 400, this will return True. This
 |      is **not** a check to see if the response code is ``200 OK``.
 |
 |  __enter__(self)
 |
 |  __exit__(self, *args)
 |
 |  __getstate__(self)
 |      Helper for pickle.
 |
 |  __init__(self)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |
 |  __iter__(self)
 |      Allows you to use a response as an iterator.
 |
 |  __nonzero__(self)
 |      Returns True if :attr:`status_code` is less than 400.
 |
 |      This attribute checks if 

In [4]:
# get the response content as text
text = response.text
text

'[{"id":"3ba417ce-8165-464d-9f29-daf23da1b2bc","name":"Albus Dumbledore\'s forceful spell","incantation":null,"effect":"Great force","canBeVerbal":null,"type":"Spell","light":"Transparent","creator":null},{"id":"0cc4ee69-4aa8-49e1-bb89-125793ed2e95","name":"False memory spell","incantation":null,"effect":"Implants a false memory in the victim\'s mind","canBeVerbal":null,"type":"Spell","light":"None","creator":"Mnemone Radford c. 1900s (possibly)"},{"id":"5984fe39-c7e6-4ff9-ad8d-35cca813a239","name":"Shield penetration spell","incantation":null,"effect":"Used to break down magical shields","canBeVerbal":null,"type":"Spell","light":"BlueishWhite","creator":null},{"id":"e93e47e2-fcae-4c8c-ad0e-f6f4804b04b2","name":"Shooting spell","incantation":null,"effect":"Small explosion with a gunshot-sound","canBeVerbal":null,"type":"Spell","light":"Transparent","creator":null}]'

In [5]:
# We are interested in parsing the data as json
data = response.json()
data

[{'id': '3ba417ce-8165-464d-9f29-daf23da1b2bc',
  'name': "Albus Dumbledore's forceful spell",
  'incantation': None,
  'effect': 'Great force',
  'canBeVerbal': None,
  'type': 'Spell',
  'light': 'Transparent',
  'creator': None},
 {'id': '0cc4ee69-4aa8-49e1-bb89-125793ed2e95',
  'name': 'False memory spell',
  'incantation': None,
  'effect': "Implants a false memory in the victim's mind",
  'canBeVerbal': None,
  'type': 'Spell',
  'light': 'None',
  'creator': 'Mnemone Radford c. 1900s (possibly)'},
 {'id': '5984fe39-c7e6-4ff9-ad8d-35cca813a239',
  'name': 'Shield penetration spell',
  'incantation': None,
  'effect': 'Used to break down magical shields',
  'canBeVerbal': None,
  'type': 'Spell',
  'light': 'BlueishWhite',
  'creator': None},
 {'id': 'e93e47e2-fcae-4c8c-ad0e-f6f4804b04b2',
  'name': 'Shooting spell',
  'incantation': None,
  'effect': 'Small explosion with a gunshot-sound',
  'canBeVerbal': None,
  'type': 'Spell',
  'light': 'Transparent',
  'creator': None}]

### Cleaning up our code

The code we have works. However, if we want to call a different endpoint, we have to write the same code again. We want to refactor our code to adhere to the `DRY` principle: `Don't Repeat Yourself`. If you find yourself writing the same code over and over again, MODULARIZE IT.

In [6]:
# modularizing our code
def extract_api_basic(uri):
  """Description:
      Ingest data from API endpoint
    Parameters:
      uri : URL of API endpoint where we are ingesting data from
    Returns:
      JSON data
      """
  res = requests.get(uri)
  data = res.json()

  return data

extract_api_basic("https://wizard-world-api.herokuapp.com/Spells?Type=Spell")

[{'id': '3ba417ce-8165-464d-9f29-daf23da1b2bc',
  'name': "Albus Dumbledore's forceful spell",
  'incantation': None,
  'effect': 'Great force',
  'canBeVerbal': None,
  'type': 'Spell',
  'light': 'Transparent',
  'creator': None},
 {'id': '0cc4ee69-4aa8-49e1-bb89-125793ed2e95',
  'name': 'False memory spell',
  'incantation': None,
  'effect': "Implants a false memory in the victim's mind",
  'canBeVerbal': None,
  'type': 'Spell',
  'light': 'None',
  'creator': 'Mnemone Radford c. 1900s (possibly)'},
 {'id': '5984fe39-c7e6-4ff9-ad8d-35cca813a239',
  'name': 'Shield penetration spell',
  'incantation': None,
  'effect': 'Used to break down magical shields',
  'canBeVerbal': None,
  'type': 'Spell',
  'light': 'BlueishWhite',
  'creator': None},
 {'id': 'e93e47e2-fcae-4c8c-ad0e-f6f4804b04b2',
  'name': 'Shooting spell',
  'incantation': None,
  'effect': 'Small explosion with a gunshot-sound',
  'canBeVerbal': None,
  'type': 'Spell',
  'light': 'Transparent',
  'creator': None}]

### Pimping Our Code
The code is better now that it is modular.
The following improvements can still be made:

*  Make the code robust, so that it handles unexpected errors gracefully without crashing.
*  Introduce observability: checkpoints that let us know whether an action was successful or not.

NOTE: For this purpose we will use print statements. The best practice is to implement logging in your code. **YOU WILL NOT RUN YOUR PIPELINES IN NOTEBOOKS!** This is only for demonstration purposes.



In [7]:
def extract_api_enhanced(uri):
  """Description:
      Ingest json data from API endpoint
    Parameters:
      uri : URL of API endpoint where we are ingesting data from
    Returns:
      JSON data
      """
  print(f"Extracting data from {uri}")

  try:
    res = requests.get(uri)
    data = res.json()
  except Exception as e:
    # `res` may not exist if the request itself failed, so report the URI and return None
    print(f"Exception {e} while extracting data from {uri}")
    data = None

  return data

data_json = extract_api_enhanced("https://wizard-world-api.herokuapp.com/Spells?Type=Spell")
#error = extract_api_enhanced("https://wizard-world-api.herokuapp.com/Spells?Type=Spells")



Extracting data from https://wizard-world-api.herokuapp.com/Spells?Type=Spell


In [8]:
error = extract_api_enhanced("https://wizard-world-api.herokuapp.com/Spells?Type=Spells")
error

Extracting data from https://wizard-world-api.herokuapp.com/Spells?Type=Spells


{'errors': {'Type': ["The value 'Spells' is not valid for Type."]},
 'type': 'https://tools.ietf.org/html/rfc7231#section-6.5.1',
 'title': 'One or more validation errors occurred.',
 'status': 400,
 'traceId': '00-d32d1f14ff247e41a6b8e7964c07e808-3ede910b22c0e64d-00'}

### Some Commentary
Notice that the call with the misspelled value ('Spells' instead of 'Spell') does not crash the program: the server answers with status code 400 and a JSON body describing the validation error, which is returned like any other response. The `try`/`except` block protects against failures where no valid JSON comes back at all, such as a misspelled host name or a non-JSON response. Try introducing such an error in a call to the basic function and notice how the program crashes when it encounters the unexpected error. It is good practice to implement error handling in your pipelines.
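
As a sketch of what more targeted error handling could look like, the function below separates request failures from JSON parsing failures. The name `safe_get_json` and its `get` parameter (which lets us swap in a fake request function for demonstration) are assumptions for this example. Note that, unlike the function above, it calls `raise_for_status()`, so a 400 response would be reported as an error instead of being returned.

```python
import requests

def safe_get_json(uri, get=requests.get, timeout=10):
    """Return parsed JSON, or None if the request or the parsing fails."""
    try:
        res = get(uri, timeout=timeout)
        res.raise_for_status()  # turn 4xx/5xx responses into HTTPError
    except requests.exceptions.RequestException as e:
        # Connection problems, timeouts, invalid URLs and HTTP error statuses
        print(f"Request to {uri} failed: {e}")
        return None

    try:
        return res.json()
    except ValueError as e:
        # The body was not valid JSON
        print(f"Could not parse JSON from {uri}: {e}")
        return None

# Simulate a network failure without touching the network
def broken_get(uri, timeout):
    raise requests.exceptions.ConnectionError("simulated network failure")

result = safe_get_json("https://example.invalid/Spells", get=broken_get)
print(result)
```

Passing a `timeout` is a design choice worth keeping in real pipelines, since a request without one can wait indefinitely for a server that never answers.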

### 2. Cat Ninjas API

Link to the API:

https://catfact.ninja

In [9]:
## 2. CAT NINJAS API

In [10]:
# Exercise - using the best practices mentioned in this notebook, extract the cat breeds from the API
# https://catfact.ninja

import logging

# Create a logger 
logger = logging.getLogger('etl_logger') 

# Set the level of the logger (DEBUG, INFO, WARNING, ERROR, CRITICAL) 
logger.setLevel(logging.INFO) 

# Create a file handler to log messages to a file 
file_handler = logging.FileHandler('etl_log.log') 

# Create a formatter 
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s') 

# Add the formatter to the handler 
file_handler.setFormatter(formatter) 
 
# Add the handler to the logger 
logger.addHandler(file_handler) 

def extract_cats(uri):
  """Description:
      Ingest data from API endpoint
    Parameters:
      uri : URL of API endpoint where we are ingesting data from
    Returns:
      JSON data
      """

  logger.info(f"Extracting data from {uri}")

  try:
    res = requests.get(uri)
    data = res.json()
  except Exception as e:
    # `res` may not exist if the request itself failed, so log the URI and return None
    logger.error(f"Exception {e} while extracting data from {uri}")
    data = None

  return data

In [11]:
uri = "https://catfact.ninja/breeds"
cats = extract_cats(uri)  

In [12]:
cats

{'current_page': 1,
 'data': [{'breed': 'Abyssinian',
   'country': 'Ethiopia',
   'origin': 'Natural/Standard',
   'coat': 'Short',
   'pattern': 'Ticked'},
  {'breed': 'Aegean',
   'country': 'Greece',
   'origin': 'Natural/Standard',
   'coat': 'Semi-long',
   'pattern': 'Bi- or tri-colored'},
  {'breed': 'American Curl',
   'country': 'United States',
   'origin': 'Mutation',
   'coat': 'Short/Long',
   'pattern': 'All'},
  {'breed': 'American Bobtail',
   'country': 'United States',
   'origin': 'Mutation',
   'coat': 'Short/Long',
   'pattern': 'All'},
  {'breed': 'American Shorthair',
   'country': 'United States',
   'origin': 'Natural',
   'coat': 'Short',
   'pattern': 'All but colorpoint'},
  {'breed': 'American Wirehair',
   'country': 'United States',
   'origin': 'Mutation',
   'coat': 'Rex',
   'pattern': 'All but colorpoint'},
  {'breed': 'Arabian Mau',
   'country': 'Arabian Peninsula',
   'origin': 'Natural',
   'coat': 'Short',
   'pattern': ''},
  {'breed': 'Austral

In [13]:
# Extract a random cat fact from the api using the same function you wrote
uri = "https://catfact.ninja/fact"
cat_fact = extract_cats(uri)



In [14]:
cat_fact

{'fact': 'There is a species of cat smaller than the average housecat. It is native to Africa and it is the Black-footed cat (Felis nigripes). Its top weight is 5.5 pounds.',
 'length': 162}

### 3. Vehicle API

After a demonstration of ingesting data from this API, you are to attempt some exercises on ingesting data from the different endpoints in the API.

https://vpic.nhtsa.dot.gov/api/

In [15]:
# Example 1 - Get all manufacturers

# we can still use our enhanced function by passing the entire query string

manufacturer_data = extract_api_enhanced("https://vpic.nhtsa.dot.gov/api/vehicles/getallmanufacturers?format=json&page=2")
pprint.pprint(manufacturer_data)


Extracting data from https://vpic.nhtsa.dot.gov/api/vehicles/getallmanufacturers?format=json&page=2
{'Count': 100,
 'Message': 'Response returned successfully',
 'Results': [{'Country': 'UNITED STATES (USA)',
              'Mfr_CommonName': 'Toyota',
              'Mfr_ID': 1090,
              'Mfr_Name': 'TOYOTA MOTOR MANUFACTURING, TEXAS, INC.',
              'VehicleTypes': []},
             {'Country': 'UNITED STATES (USA)',
              'Mfr_CommonName': 'Toyota',
              'Mfr_ID': 1091,
              'Mfr_Name': 'TOYOTA MOTOR MANUFACTURING, NORTHERN KENTUCKY, INC.',
              'VehicleTypes': []},
             {'Country': 'UNITED STATES (USA)',
              'Mfr_CommonName': 'Thomas Grinding',
              'Mfr_ID': 1092,
              'Mfr_Name': 'THOMAS GRINDING INC.',
              'VehicleTypes': [{'IsPrimary': True, 'Name': 'Trailer'}]},
             {'Country': 'UNITED STATES (USA)',
              'Mfr_CommonName': 'AM GENERAL',
              'Mfr_ID': 1093,
   

In [16]:
# getting creative with our extract function

def extract_vehicle_data(endpoint, search_param=None, format="json" ):
  """Description:
      Ingest data from API endpoint
    Parameters:
      endpoint : API endpoint hosting data
      search_param : string to query endpoint e.g. model name - "mercedes"
      format : data format for response - Options are json, csv, xml, defaults to json
    Returns:
      JSON data
      """

  params = {"format" : format}

  if search_param is None:
    uri = f"https://vpic.nhtsa.dot.gov/api/vehicles/{endpoint}"
  else:
    #params[endpoint] = search_param
    uri = f"https://vpic.nhtsa.dot.gov/api/vehicles/{endpoint}/{search_param}"


  print(f"Extracting data from {uri}")

  try:
    res = requests.get(uri, params=params)
    data = res.json()
  except Exception as e:
    # report the URI that failed and fall back to an empty DataFrame
    print(f"Exception {e} while extracting data from {uri}")
    data = pd.DataFrame()


  return data

manufacturer_data = extract_vehicle_data("getvehicletypesformake","mercedes")
manufacturer_data


Extracting data from https://vpic.nhtsa.dot.gov/api/vehicles/getvehicletypesformake/mercedes


{'Count': 5,
 'Message': 'Response returned successfully',
 'SearchCriteria': 'Make: mercedes',
 'Results': [{'MakeId': 449,
   'MakeName': 'MERCEDES-BENZ',
   'VehicleTypeId': 2,
   'VehicleTypeName': 'Passenger Car'},
  {'MakeId': 449,
   'MakeName': 'MERCEDES-BENZ',
   'VehicleTypeId': 3,
   'VehicleTypeName': 'Truck'},
  {'MakeId': 449,
   'MakeName': 'MERCEDES-BENZ',
   'VehicleTypeId': 5,
   'VehicleTypeName': 'Bus'},
  {'MakeId': 449,
   'MakeName': 'MERCEDES-BENZ',
   'VehicleTypeId': 7,
   'VehicleTypeName': 'Multipurpose Passenger Vehicle (MPV)'},
  {'MakeId': 449,
   'MakeName': 'MERCEDES-BENZ',
   'VehicleTypeId': 10,
   'VehicleTypeName': 'Incomplete Vehicle'}]}

In [17]:
# Example 1 - get all makes

makes = extract_vehicle_data("getallmakes")
makes

Extracting data from https://vpic.nhtsa.dot.gov/api/vehicles/getallmakes


{'Count': 11445,
 'Message': 'Response returned successfully',
 'SearchCriteria': None,
 'Results': [{'Make_ID': 12858, 'Make_Name': '#1 ALPINE CUSTOMS'},
  {'Make_ID': 4877, 'Make_Name': '1/OFF KUSTOMS, LLC'},
  {'Make_ID': 11257, 'Make_Name': '102 IRONWORKS, INC.'},
  {'Make_ID': 12255, 'Make_Name': '12832429 CANADA INC.'},
  {'Make_ID': 13053, 'Make_Name': '137 INDUSTRIES INC.'},
  {'Make_ID': 6387, 'Make_Name': '17 CREEK ENTERPRISES'},
  {'Make_ID': 12948, 'Make_Name': '1955 CUSTOM BELAIR'},
  {'Make_ID': 9172, 'Make_Name': '1M CUSTOM CAR TRANSPORTS, INC.'},
  {'Make_ID': 6124, 'Make_Name': '1ST CHOICE MANUFACTURING INC'},
  {'Make_ID': 12972, 'Make_Name': '2 GOLDEN EAGLES'},
  {'Make_ID': 6488, 'Make_Name': '2-G TRAILER CO LLC'},
  {'Make_ID': 612, 'Make_Name': '2231545 ONTARIO'},
  {'Make_ID': 11399, 'Make_Name': '24/7 ONSITE CAMERAS INC'},
  {'Make_ID': 608, 'Make_Name': '280 TRAILERS'},
  {'Make_ID': 10123, 'Make_Name': '3 CUSTOM SOLUTIONS'},
  {'Make_ID': 11253, 'Make_Name': '

In [18]:
# Example 2 - get all the makes for the manufacturer subaru (returned as JSON)
subaru_makes = extract_vehicle_data("getmakeformanufacturer","subaru")
subaru_makes

Extracting data from https://vpic.nhtsa.dot.gov/api/vehicles/getmakeformanufacturer/subaru


{'Count': 6,
 'Message': 'Results returned successfully',
 'SearchCriteria': 'Manufacturer:subaru',
 'Results': [{'Make_ID': 448,
   'Make_Name': 'TOYOTA',
   'Mfr_Name': 'FUJI HEAVY INDUSTRIES U.S.A., INC. (C/O SUBARU OF AMERICA)'},
  {'Make_ID': 448, 'Make_Name': 'TOYOTA', 'Mfr_Name': 'SUBARU CORPORATION'},
  {'Make_ID': 523,
   'Make_Name': 'SUBARU',
   'Mfr_Name': 'FUJI HEAVY INDUSTRIES U.S.A., INC. (C/O SUBARU OF AMERICA)'},
  {'Make_ID': 523,
   'Make_Name': 'SUBARU',
   'Mfr_Name': 'SUBARU OF AMERICA, INC'},
  {'Make_ID': 523, 'Make_Name': 'SUBARU', 'Mfr_Name': 'SUBARU CORPORATION'},
  {'Make_ID': 572,
   'Make_Name': 'SAAB',
   'Mfr_Name': 'FUJI HEAVY INDUSTRIES U.S.A., INC. (C/O SUBARU OF AMERICA)'}]}

In [19]:
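# Exercise 1 - from the API endpoint - get all models for the make toyota
# NOTE: the call below uses getmakeformanufacturer, which returns makes per manufacturer;
# the vPIC documentation lists a getmodelsformake endpoint for models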
toyota_makes = extract_vehicle_data("getmakeformanufacturer","toyota")
toyota_makes

Extracting data from https://vpic.nhtsa.dot.gov/api/vehicles/getmakeformanufacturer/toyota


{'Count': 21,
 'Message': 'Results returned successfully',
 'SearchCriteria': 'Manufacturer:toyota',
 'Results': [{'Make_ID': 448,
   'Make_Name': 'TOYOTA',
   'Mfr_Name': 'TOYOTA MOTOR NORTH AMERICA, INC'},
  {'Make_ID': 448,
   'Make_Name': 'TOYOTA',
   'Mfr_Name': 'TOYOTA MOTOR CORPORATION'},
  {'Make_ID': 448,
   'Make_Name': 'TOYOTA',
   'Mfr_Name': 'TOYOTA MOTOR MFG DE BAJA CALIFORNIA S DE RL DE CV'},
  {'Make_ID': 448,
   'Make_Name': 'TOYOTA',
   'Mfr_Name': 'TOYOTA MOTOR MANUFACTURING, CALIFORNIA, INC.'},
  {'Make_ID': 448,
   'Make_Name': 'TOYOTA',
   'Mfr_Name': 'TOYOTA MOTOR MANUFACTURING CANADA'},
  {'Make_ID': 448,
   'Make_Name': 'TOYOTA',
   'Mfr_Name': 'TOYOTA MOTOR MANUFACTURING, INDIANA, INC.'},
  {'Make_ID': 448,
   'Make_Name': 'TOYOTA',
   'Mfr_Name': 'TOYOTA MOTOR MANUFACTURING, KENTUCKY, INC.'},
  {'Make_ID': 448,
   'Make_Name': 'TOYOTA',
   'Mfr_Name': 'TOYOTA MOTOR MANUFACTURING, TEXAS, INC.'},
  {'Make_ID': 448,
   'Make_Name': 'TOYOTA',
   'Mfr_Name': 'TOYO

In [20]:
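# Exercise 2 - get all the manufacturer details for ford
# NOTE: the call below uses getmakeformanufacturer;
# the vPIC documentation lists a getmanufacturerdetails endpoint for manufacturer details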
ford_makes = extract_vehicle_data("getmakeformanufacturer","ford")
ford_makes

Extracting data from https://vpic.nhtsa.dot.gov/api/vehicles/getmakeformanufacturer/ford


{'Count': 39,
 'Message': 'Results returned successfully',
 'SearchCriteria': 'Manufacturer:ford',
 'Results': [{'Make_ID': 460,
   'Make_Name': 'FORD',
   'Mfr_Name': 'FORD MOTOR COMPANY'},
  {'Make_ID': 460,
   'Make_Name': 'FORD',
   'Mfr_Name': 'FORD MOTOR COMPANY OF CANADA, LTD.'},
  {'Make_ID': 460,
   'Make_Name': 'FORD',
   'Mfr_Name': 'FORD MOTOR COMPANY, MEXICO'},
  {'Make_ID': 460,
   'Make_Name': 'FORD',
   'Mfr_Name': 'FORD OTOMOTIV SANAYI A.S., TURKEY'},
  {'Make_ID': 460,
   'Make_Name': 'FORD',
   'Mfr_Name': 'KIA MOTORS (FORD IMPORTS)'},
  {'Make_ID': 460, 'Make_Name': 'FORD', 'Mfr_Name': 'FORD WERKE AG'},
  {'Make_ID': 460,
   'Make_Name': 'FORD',
   'Mfr_Name': 'FORD MOTOR COMPANY BRASIL LTDA.'},
  {'Make_ID': 460, 'Make_Name': 'FORD', 'Mfr_Name': 'FORD INDIA LTD'},
  {'Make_ID': 464, 'Make_Name': 'LINCOLN', 'Mfr_Name': 'FORD MOTOR COMPANY'},
  {'Make_ID': 464,
   'Make_Name': 'LINCOLN',
   'Mfr_Name': 'FORD MOTOR COMPANY OF CANADA, LTD.'},
  {'Make_ID': 464,
   'Mak

In [21]:
# NOTE - consult the documentation for these exercises

### 4. Football data API

https://www.football-data.org/documentation/quickstart

key : `e257b2f572c2479090faf7220e4dd673`

NOTE: The API key is limited to 10 API calls / minute.
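
One way to stay within a limit such as 10 calls per minute is to space calls evenly. The class below is a minimal sketch of that idea; `MinIntervalThrottle` and its parameters are hypothetical names, and the `clock` and `sleep` arguments exist only so the behaviour can be tested without real waiting.

```python
import time

class MinIntervalThrottle:
    """Keep consecutive calls at least `period / max_calls` seconds apart."""

    def __init__(self, max_calls=10, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = period / max_calls
        self.clock = clock
        self.sleep = sleep
        self.last_call = None  # time of the previous call, if any

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now

# 10 calls per 60 seconds means one call every 6 seconds at most
throttle = MinIntervalThrottle(max_calls=10, period=60.0)
print(throttle.min_interval)
```

Calling `throttle.wait()` immediately before each API request in a loop would then pause only when the previous request was made less than 6 seconds earlier.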


In [22]:
import requests

def extract_api_enhanced(uri, api_key=None):
    """Description:
        Ingest JSON data from API endpoint
    Parameters:
        uri : URL of API endpoint where we are ingesting data from
        api_key : API key for authentication (optional)
    Returns:
        JSON data
    """
    print(f"Extracting data from {uri}")

    headers = {}
    if api_key:
        headers['Authorization'] = f'Bearer {api_key}'

    try:
        res = requests.get(uri, headers=headers)
        res.raise_for_status()  # Raises HTTPError for bad responses
        data = res.json()
    except requests.exceptions.HTTPError as http_err:
        print(f"HTTP error occurred: {http_err}")
        data = None
    except Exception as e:
        print(f"Exception occurred: {e}")
        data = None

    return data


In [23]:
teams = extract_api_enhanced("http://api.football-data.org/v4/competitions/WC/teams", 
                             api_key="e257b2f572c2479090faf7220e4dd673")
teams

Extracting data from http://api.football-data.org/v4/competitions/WC/teams
HTTP error occurred: 403 Client Error:  for url: http://api.football-data.org/v4/competitions/WC/teams
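
The 403 above may mean that the server did not recognize the key. The football-data.org quickstart describes sending the key in an `X-Auth-Token` header, not as a `Bearer` token in `Authorization`. The sketch below assumes that header name, uses a placeholder key, and only prepares the request so the headers can be inspected without making a call; `build_football_request` is a hypothetical helper.

```python
import requests

def build_football_request(uri, api_key):
    """Prepare (without sending) a GET request that carries the key in X-Auth-Token."""
    # Assumed header name, taken from the API's quickstart documentation
    headers = {"X-Auth-Token": api_key}
    return requests.Request("GET", uri, headers=headers).prepare()

prepared = build_football_request(
    "http://api.football-data.org/v4/competitions/WC/teams",
    api_key="YOUR_API_KEY",  # placeholder, replace with your own key
)
print(prepared.headers["X-Auth-Token"])
```

To send the request for real, the same headers dictionary can be passed to `requests.get(uri, headers=headers)`.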
