# API Data

An **API** (Application Programming Interface) is a web application that allows a user (machine or human) to "talk" to a server and request information. The API receives the request and responds with the information. APIs control the access point for the server. Although many public APIs are available, organizations with valuable or proprietary information may require users to supply an authentication credential before accessing their servers.

![API request process](https://miro.medium.com/max/1000/1*OcmVkcsM5BWRHrg8GC17iw.png)
Source: [Medium: What exactly IS an API?](https://medium.com/@perrysetgo/what-exactly-is-an-api-69f36968a41f)


In this lesson, we will connect to the [BreweryDB API](https://www.brewerydb.com/developers/apps) using an API key credential for authorization. We will load the JSON-formatted data returned by the API and explore it to identify its structure. Then we will extract the necessary information to make a dataframe of beers and their characteristics.

In [1]:
import requests
import json
import pandas as pd
import numpy as np

## Load API key

Before you can connect to the API, you must first create an account on the [BreweryDB website](https://www.brewerydb.com/developers/apps). After your account is created, go to "Developers" in the menu area on the website and select "My API Keys" from the dropdown list. In the "Sandbox API Keys" section, copy the string of characters for the Sandbox key.

*Note*: For the purposes of the lesson, we will access their "sandbox" version of the API. An API key for the full database can be purchased through the website.

Then open a plain text file and paste the sandbox key into it. Save the file and name it `brewDB_key.txt`. Make sure that the text file is saved in the same location as this notebook file.

**CAUTION**: When using an authentication credential for an API, treat your credential information like a password. You do not want anybody to see or use your API credentials! There are many methods for storing API authentication credentials so that a program can load them. In this example we are using a plain text file, which is simple but is one of the least secure methods.

In [3]:
with open("brewDB_key.txt") as file:
    # strip() removes any trailing newline saved with the key
    api_key = file.read().strip()

In [4]:
# DO NOT LEAVE YOUR API KEY SHOWING IN YOUR CODE
api_key

'38e61f7e2824bd6e11bad498122148b7'
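
As one alternative to the plain text file used above, the key can be read from an environment variable so that it never sits next to the notebook. The following is a minimal sketch, not the method used in this lesson: the variable name `BREWERYDB_API_KEY` and the helper `load_api_key` are hypothetical names chosen for illustration.

```python
import os
from pathlib import Path


def load_api_key(env_var="BREWERYDB_API_KEY", fallback_path="brewDB_key.txt"):
    """Return the API key from an environment variable, else from a text file.

    Both the variable name and this helper are illustrative assumptions.
    """
    key = os.environ.get(env_var)
    if key:
        return key.strip()
    # Fall back to the plain text file used in this lesson.
    path = Path(fallback_path)
    if path.is_file():
        return path.read_text().strip()
    raise RuntimeError(f"No API key found in {env_var} or {fallback_path}")
```

With the variable set in the shell before the notebook is started, `api_key = load_api_key()` would replace the `open(...)` cell.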

## Connect to API

Each API has its own endpoint structure (URL) for sending a request (with authentication, if required). Read the API documentation for the website before proceeding with any requests. Limitations can include the number of request calls, access privileges, and legal use of the data.

Using the `requests` library, we will send a request to the API (along with the authentication key). Then we will load the JSON data from the response.

In [5]:
# endpoint structure for BreweryDB API
url = r"https://sandbox-api.brewerydb.com/v2/beers/?key="

In [6]:
# full URL with API attached
url + api_key

'https://sandbox-api.brewerydb.com/v2/beers/?key=38e61f7e2824bd6e11bad498122148b7'
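
String concatenation works here because the key contains only letters and digits. As a hedged sketch of a more general approach, the query string can be built with the standard library's `urlencode`, which escapes special characters. The helper `build_url` and the placeholder key are illustrative; `requests.get` also accepts a `params` dictionary that serves the same purpose.

```python
from urllib.parse import urlencode

BASE_URL = "https://sandbox-api.brewerydb.com/v2/beers/"


def build_url(base_url, **params):
    """Append URL-encoded query parameters to a base endpoint."""
    return f"{base_url}?{urlencode(params)}"


# "demo-key" is a placeholder, not a real credential.
print(build_url(BASE_URL, key="demo-key"))
```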

In [7]:
# send the request to the API
response = requests.get(url + api_key)

In [8]:
# check the status code
# if other than 200 (OK), check API key
response.status_code

200
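
When the status code is not 200, it helps to know what the number means. The sketch below uses the standard library's `HTTPStatus` enum to turn a code into a readable label; `describe_status` is a hypothetical helper. Which codes this particular API actually returns is not confirmed here, so the list is only a sample.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short human-readable label for an HTTP status code."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        return f"{code} (unrecognized status code)"
    return f"{status.value} {status.phrase}"


# A few common codes; the exact set an API uses is in its documentation.
for code in (200, 401, 404, 429):
    print(describe_status(code))
```

In a script, calling `response.raise_for_status()` on a `requests` response raises an exception for 4xx and 5xx codes, which avoids silently continuing with a failed request.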

In [9]:
# load JSON data from API
beerdata = response.json()

In [10]:
# identify root data structure
type(beerdata)

dict
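
To see how JSON text maps onto Python objects, the small sketch below parses a made-up payload with the standard `json` module. The payload is invented for illustration and only imitates the shape of the top level of the response.

```python
import json

# A tiny, made-up payload shaped like the top level of the response.
raw_text = '{"currentPage": 1, "status": "success", "data": [{"id": "abc", "abv": "5.5"}]}'

parsed = json.loads(raw_text)

print(type(parsed))                 # JSON object becomes a dict
print(type(parsed["data"]))         # JSON array becomes a list
print(type(parsed["currentPage"]))  # JSON number becomes an int
```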

## Explore data structure

Because the root data structure of the JSON sent back from the API is a dictionary, we can check what keys are available to access other information. From there, we can find the structure path for the data that we want to extract for the dataframe to analyze.

In [11]:
# get keys for dictionary
beerdata.keys()

dict_keys(['currentPage', 'numberOfPages', 'totalResults', 'data', 'status'])

In [12]:
# access value in "currentPage" key
beerdata['currentPage']

1

In [13]:
# access value in "numberOfPages" key
beerdata['numberOfPages']

23

In [14]:
# access value in "totalResults" key
beerdata['totalResults']

1109

In [15]:
# access value in "status" key
beerdata['status']

'success'

`currentPage`, `numberOfPages`, `totalResults`, and `status` each hold a single value, so there is nothing further to explore under those keys. Because we are working in the sandbox environment, only the first page of the data is available to view. With the full dataset, a data collection script would need to iterate through each page to retrieve the next chunk of data.
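
The page-by-page iteration described above could be sketched as follows. This is an illustration under assumptions, not tested against the full API: `collect_all_pages` is a hypothetical helper, and `fake_fetch` stands in for a real request so the example runs offline. In practice the callable would send a request for the given page; the name of the page query parameter should be taken from the API documentation.

```python
def collect_all_pages(fetch_page):
    """Gather the items from every page of a paginated response.

    fetch_page is any callable that takes a page number and returns
    the parsed JSON dictionary for that page.
    """
    first = fetch_page(1)
    items = list(first.get("data", []))
    # numberOfPages tells us how many further requests are needed
    for page in range(2, first.get("numberOfPages", 1) + 1):
        items.extend(fetch_page(page).get("data", []))
    return items


# Stand-in for a real request so the sketch runs offline.
def fake_fetch(page):
    return {"currentPage": page, "numberOfPages": 3, "data": [f"beer-{page}"]}


print(collect_all_pages(fake_fetch))
```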

In [16]:
# access value in "data" key
beerdata['data']

[{'id': 'c4f2KE',
  'name': "'Murican Pilsner",
  'nameDisplay': "'Murican Pilsner",
  'abv': '5.5',
  'glasswareId': 4,
  'styleId': 98,
  'isOrganic': 'N',
  'isRetired': 'N',
  'labels': {'icon': 'https://brewerydb-images.s3.amazonaws.com/beer/c4f2KE/upload_jjKJ7g-icon.png',
   'medium': 'https://brewerydb-images.s3.amazonaws.com/beer/c4f2KE/upload_jjKJ7g-medium.png',
   'large': 'https://brewerydb-images.s3.amazonaws.com/beer/c4f2KE/upload_jjKJ7g-large.png',
   'contentAwareIcon': 'https://brewerydb-images.s3.amazonaws.com/beer/c4f2KE/upload_jjKJ7g-contentAwareIcon.png',
   'contentAwareMedium': 'https://brewerydb-images.s3.amazonaws.com/beer/c4f2KE/upload_jjKJ7g-contentAwareMedium.png',
   'contentAwareLarge': 'https://brewerydb-images.s3.amazonaws.com/beer/c4f2KE/upload_jjKJ7g-contentAwareLarge.png'},
  'status': 'verified',
  'statusDisplay': 'Verified',
  'createDate': '2013-08-19 11:58:12',
  'updateDate': '2018-11-02 02:15:14',
  'glass': {'id': 4, 'name': 'Pilsner', 'createD

In [17]:
# check data type
type(beerdata['data'])

list

In [18]:
# check number of items in list
len(beerdata['data'])

50

In [19]:
# access the first item in the list
beerdata['data'][0]

{'id': 'c4f2KE',
 'name': "'Murican Pilsner",
 'nameDisplay': "'Murican Pilsner",
 'abv': '5.5',
 'glasswareId': 4,
 'styleId': 98,
 'isOrganic': 'N',
 'isRetired': 'N',
 'labels': {'icon': 'https://brewerydb-images.s3.amazonaws.com/beer/c4f2KE/upload_jjKJ7g-icon.png',
  'medium': 'https://brewerydb-images.s3.amazonaws.com/beer/c4f2KE/upload_jjKJ7g-medium.png',
  'large': 'https://brewerydb-images.s3.amazonaws.com/beer/c4f2KE/upload_jjKJ7g-large.png',
  'contentAwareIcon': 'https://brewerydb-images.s3.amazonaws.com/beer/c4f2KE/upload_jjKJ7g-contentAwareIcon.png',
  'contentAwareMedium': 'https://brewerydb-images.s3.amazonaws.com/beer/c4f2KE/upload_jjKJ7g-contentAwareMedium.png',
  'contentAwareLarge': 'https://brewerydb-images.s3.amazonaws.com/beer/c4f2KE/upload_jjKJ7g-contentAwareLarge.png'},
 'status': 'verified',
 'statusDisplay': 'Verified',
 'createDate': '2013-08-19 11:58:12',
 'updateDate': '2018-11-02 02:15:14',
 'glass': {'id': 4, 'name': 'Pilsner', 'createDate': '2012-01-03 0

In [20]:
# check data type
type(beerdata['data'][0])

dict

In [21]:
# access second item in list
beerdata['data'][1]

{'id': 'zTTWa2',
 'name': '11.5° PLATO',
 'nameDisplay': '11.5° PLATO',
 'description': 'The Plato scale is a measurement of the density of liquid. The number tells brewers how big or small a resulting beer will be—the larger the number the bigger the beer. We designed 11.5° Plato—a lower number on the beer scale—to give us just enough body to support a heavy heap of hops. The result is an easy-drinking session IPA which satisfies the thirst for hops, but urges you to have another round.',
 'abv': '4.5',
 'ibu': '35',
 'styleId': 164,
 'isOrganic': 'N',
 'isRetired': 'N',
 'status': 'verified',
 'statusDisplay': 'Verified',
 'originalGravity': '1.046',
 'createDate': '2016-08-09 14:44:42',
 'updateDate': '2018-11-02 02:15:14',
 'style': {'id': 164,
  'categoryId': 3,
  'category': {'id': 3,
   'name': 'North American Origin Ales',
   'createDate': '2012-03-21 20:06:45'},
  'name': 'Session India Pale Ale',
  'shortName': 'Session IPA',
  'description': 'Session India Pale Ales are gold

The first and second items (nested dictionaries) in the list under the `beerdata['data']` key seem to be similar in structure. At a quick glance, each dictionary looks like the information for a single beer.

In [22]:
# get keys for the first beer's dictionary
beerdata['data'][0].keys()

dict_keys(['id', 'name', 'nameDisplay', 'abv', 'glasswareId', 'styleId', 'isOrganic', 'isRetired', 'labels', 'status', 'statusDisplay', 'createDate', 'updateDate', 'glass', 'style'])

In [23]:
# get keys for second beer's dictionary
beerdata['data'][1].keys()

dict_keys(['id', 'name', 'nameDisplay', 'description', 'abv', 'ibu', 'styleId', 'isOrganic', 'isRetired', 'status', 'statusDisplay', 'originalGravity', 'createDate', 'updateDate', 'style'])

The second beer in the dataset has dictionary keys that do not exist in the first beer's dictionary, such as `originalGravity`. This data is ***missing*** from the first beer, which we will need to account for later when extracting the data.
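
Rather than comparing the two key listings by eye, the difference can be computed with set operations. The sketch below uses trimmed-down stand-ins for the first two beers; `compare_keys` is a hypothetical helper.

```python
def compare_keys(first, second):
    """Report which keys are unique to each dictionary and which are shared."""
    a, b = set(first), set(second)
    return {
        "only_in_first": sorted(a - b),
        "only_in_second": sorted(b - a),
        "shared": sorted(a & b),
    }


# Trimmed-down stand-ins for the first two beers.
beer_a = {"id": "c4f2KE", "abv": "5.5", "glass": {}}
beer_b = {"id": "zTTWa2", "abv": "4.5", "ibu": "35", "originalGravity": "1.046"}

print(compare_keys(beer_a, beer_b))
```

With the real data, the call would be `compare_keys(beerdata['data'][0], beerdata['data'][1])`.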

In [24]:
# access value for "id" key
beerdata['data'][1]['id']

'zTTWa2'

In [25]:
# access value for "name" key
beerdata['data'][1]['name']

'11.5° PLATO'

In [26]:
# access value for "nameDisplay" key
beerdata['data'][1]['nameDisplay']

'11.5° PLATO'

In [27]:
# access value for "description" key
beerdata['data'][1]['description']

'The Plato scale is a measurement of the density of liquid. The number tells brewers how big or small a resulting beer will be—the larger the number the bigger the beer. We designed 11.5° Plato—a lower number on the beer scale—to give us just enough body to support a heavy heap of hops. The result is an easy-drinking session IPA which satisfies the thirst for hops, but urges you to have another round.'

In [28]:
# access value for "abv" key
beerdata['data'][1]['abv']

'4.5'

In [29]:
# access value for "ibu" key
beerdata['data'][1]['ibu']

'35'

In [30]:
# access value for "isOrganic" key
beerdata['data'][1]['isOrganic']

'N'

In [31]:
# access value for "isRetired" key
beerdata['data'][1]['isRetired']

'N'

In [32]:
# access value for "originalGravity" key
beerdata['data'][1]['originalGravity']

'1.046'

In [33]:
# access value for "createDate" key
beerdata['data'][1]['createDate']

'2016-08-09 14:44:42'

In [34]:
# access value for "updateDate" key
beerdata['data'][1]['updateDate']

'2018-11-02 02:15:14'

Many of the keys in this dictionary hold a single value, which is information we can extract for the dataframe. However, the `style` key accesses a nested dictionary with information about the beer category that a particular beer belongs to.

In [35]:
# access value for "style" key
beerdata['data'][1]['style']

{'id': 164,
 'categoryId': 3,
 'category': {'id': 3,
  'name': 'North American Origin Ales',
  'createDate': '2012-03-21 20:06:45'},
 'name': 'Session India Pale Ale',
 'shortName': 'Session IPA',
 'description': 'Session India Pale Ales are gold to copper. Chill haze is allowable at cold temperatures and hop haze is allowable at any temperature. Fruity-ester aroma is light to moderate. Hop aroma is medium to high with qualities from a wide variety of hops from all over the world. Low to medium maltiness is present. Hop flavor is strong, characterized by flavors from a wide variety of hops. Hop bitterness is medium to high. Fruity-ester flavors are low to moderate. Diacetyl is absent or at very low levels. Body is low to medium.',
 'createDate': '2015-04-07 17:07:27'}

In [36]:
# verify data type
type(beerdata['data'][1]['style'])

dict

In [37]:
# get keys for dictionary
beerdata['data'][1]['style'].keys()

dict_keys(['id', 'categoryId', 'category', 'name', 'shortName', 'description', 'createDate'])

In [38]:
# access value for "name" key
beerdata['data'][1]['style']['name']

'Session India Pale Ale'

In [39]:
# access value for "description" key
beerdata['data'][1]['style']['description']

'Session India Pale Ales are gold to copper. Chill haze is allowable at cold temperatures and hop haze is allowable at any temperature. Fruity-ester aroma is light to moderate. Hop aroma is medium to high with qualities from a wide variety of hops from all over the world. Low to medium maltiness is present. Hop flavor is strong, characterized by flavors from a wide variety of hops. Hop bitterness is medium to high. Fruity-ester flavors are low to moderate. Diacetyl is absent or at very low levels. Body is low to medium.'
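
The key-by-key exploration above can also be automated. The following sketch walks nested dictionaries and lists every path together with the type of value found at the end of it. `key_paths` is a hypothetical helper, and `sample` is a shortened stand-in for one beer.

```python
def key_paths(obj, prefix=""):
    """Yield (dotted_path, type_name) for every leaf value in nested dicts."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # descend into the nested dictionary
            yield from key_paths(value, path)
        else:
            yield path, type(value).__name__


sample = {
    "id": "zTTWa2",
    "styleId": 164,
    "style": {"name": "Session India Pale Ale", "category": {"id": 3}},
}

for path, type_name in key_paths(sample):
    print(path, type_name)
```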

## Extract Data

Now that we have identified the structure of the beers dataset, we can collect the following information:
- ID
- date created
- date updated
- beer name
- description
- ABV (alcohol by volume)
- gravity
- IBU (international bittering unit)
- if it is organic
- if it is retired
- beer style category name
- beer style category description

We will use the same key names as the dataset for our dictionary, so the key names can be used to access the values within the JSON data. However, because some of the keys do not exist in every beer's dictionary, we will use `try`/`except` to handle any `KeyError`s.

In [40]:
# each dictionary in the "data" key list is a beer
beers = beerdata['data']

In [41]:
beers_info = {'id':[],
              'createDate':[],
              'updateDate':[],
              'name':[],
              'description':[],
              'abv':[],
              'originalGravity':[],
              'ibu':[],
              'isOrganic':[],
              'isRetired':[]
             }

# hold the style category information
style_name = []
style_descr = []

In [42]:
for beer in beers:

    # collect the top-level fields, using NaN when a key is missing
    for key in beers_info.keys():
        try:
            beers_info[key].append(beer[key])
        except KeyError:
            beers_info[key].append(np.nan)

    # collect the nested style fields, using NaN when a key is missing
    try:
        style_name.append(beer['style']['name'])
    except KeyError:
        style_name.append(np.nan)

    try:
        style_descr.append(beer['style']['description'])
    except KeyError:
        style_descr.append(np.nan)
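
An alternative to `try`/`except` is `dict.get`, which returns a default when a key is missing. The sketch below is a variation on the loop above rather than a replacement for it: `flatten_beer`, the shortened `FIELDS` list, and the sample beers are all illustrative, and `math.nan` is used in place of `np.nan` so the example needs only the standard library.

```python
import math

FIELDS = ["id", "name", "abv", "ibu"]


def flatten_beer(beer, fields=FIELDS):
    """Return one flat record per beer, using NaN for anything missing."""
    record = {field: beer.get(field, math.nan) for field in fields}
    style = beer.get("style", {})  # an absent style becomes an empty dict
    record["style_name"] = style.get("name", math.nan)
    record["style_description"] = style.get("description", math.nan)
    return record


# Made-up beers: one with a partial style entry, one with none at all.
sample = [
    {"id": "a1", "name": "With style", "abv": "5.5", "style": {"name": "Pilsener"}},
    {"id": "b2", "name": "No style", "ibu": "35"},
]
records = [flatten_beer(beer) for beer in sample]
print(records[1])
```

A list of records like this can be passed directly to `pd.DataFrame(records)`, which keeps the style fields in the same structure as the other columns.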

In [43]:
# check collected information in "name" key of beers_info dictionary
beers_info['name']

["'Murican Pilsner",
 '11.5° PLATO',
 '12th Of Never',
 '15th Anniversary Ale',
 '16 So Fine Red Wheat Wine',
 '1794 The Fergal Project',
 '17th Saison',
 '18th Anniversary Belgian Tripel',
 '19 - Golden Belgian Style Ale',
 '1904 American Red Lager',
 '2 x 4',
 '200th Anniversary Export Stout',
 '2017 Beer Camp',
 '20th Anniversary Imperial Hash IPA on Brett',
 '20th Street Ale Citra',
 '20th Street Ale Crystal',
 '20th Street Ale Magnum',
 '21st Anniversary',
 '25th Anniversary',
 '3 Weight',
 "30th Anniversary - Charlie, Fred & Ken's Bock",
 "30th Anniversary - Fritz and Ken's Ale",
 '30th Anniversary - Grand Cru',
 "30th Anniversary - Jack & Ken's Ale",
 "35th Anniversary  - Brewer's Reserve",
 '420 Extra Pale Ale',
 '420 Fest',
 '420 Strain G13 IPA',
 '471 Double IPA - Hull Melon',
 '471 ESB - Extra Special Bitter',
 '471 IPA Barrel Series: Citra',
 '471 IPA Barrel Series: Eureka!',
 '471 IPA. Aggressive Hoppiness',
 '471 Pilsner',
 '471 Small Batch IPA',
 "5 C's IPA",
 '7 Birds',

In [44]:
# check collected information in style_name list
style_name

['American-Style Pilsener',
 'Session India Pale Ale',
 'American-Style Pale Ale',
 'Extra Special Bitter',
 'American-Style Wheat Wine Ale',
 'American-Style Stout',
 'French & Belgian-Style Saison',
 'Belgian-Style Tripel',
 'Wood- and Barrel-Aged Strong Beer',
 'American-Style Lager',
 'Other Belgian-Style Ales',
 'Foreign (Export)-Style Stout',
 'American-Style India Pale Ale',
 'Imperial or Double India Pale Ale',
 nan,
 nan,
 nan,
 'Belgian-Style Flanders Oud Bruin or Oud Red Ales',
 'American-Style Imperial Porter',
 'Session India Pale Ale',
 'German-Style Heller Bock/Maibock',
 'American-Style Imperial Stout',
 'Other Strong Ale or Lager',
 'American-Style Barley Wine Ale',
 'Strong Ale',
 'American-Style Pale Ale',
 'Imperial or Double India Pale Ale',
 'American-Style India Pale Ale',
 'Imperial or Double India Pale Ale',
 'Extra Special Bitter',
 'Imperial or Double India Pale Ale',
 'Imperial or Double India Pale Ale',
 'Imperial or Double India Pale Ale',
 'American-Style

## Create Dataframe

Now that the data is collected, we can put it into a dataframe. The style name and description are in separate lists, so we will attach them to the dataframe after it is created.

In [45]:
# use beers_info dictionary to make dataframe
beer_df = pd.DataFrame(data=beers_info)
beer_df.head()

| | id | createDate | updateDate | name | description | abv | originalGravity | ibu | isOrganic | isRetired |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | c4f2KE | 2013-08-19 11:58:12 | 2018-11-02 02:15:14 | 'Murican Pilsner | NaN | 5.5 | NaN | NaN | N | N |
| 1 | zTTWa2 | 2016-08-09 14:44:42 | 2018-11-02 02:15:14 | 11.5° PLATO | The Plato scale is a measurement of the densit... | 4.5 | 1.046 | 35.0 | N | N |
| 2 | zfP2fK | 2016-08-03 23:25:54 | 2018-11-02 02:15:14 | 12th Of Never | Tropically Hoppy. Light, yet Full-Bodied. Brig... | 5.5 | 1.05 | 45.0 | N | N |
| 3 | xwYSL2 | 2015-04-16 15:44:15 | 2018-11-02 02:15:14 | 15th Anniversary Ale | For the ﬁrst ever SweetWater anniversary beer ... | NaN | NaN | NaN | N | N |
| 4 | UJGpVS | 2013-02-24 16:31:05 | 2018-11-02 02:15:14 | 16 So Fine Red Wheat Wine | For our super heady 16 year anniversary beer w... | 11.0 | NaN | NaN | N | N |


In [46]:
# add two new columns for style_name and style_descr lists
beer_df['style_name'] = style_name
beer_df['style_description'] = style_descr

In [47]:
# display completed dataframe (first 5 rows)
beer_df.head()

| | id | createDate | updateDate | name | description | abv | originalGravity | ibu | isOrganic | isRetired | style_name | style_description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | c4f2KE | 2013-08-19 11:58:12 | 2018-11-02 02:15:14 | 'Murican Pilsner | NaN | 5.5 | NaN | NaN | N | N | American-Style Pilsener | This classic and unique pre-Prohibition Americ... |
| 1 | zTTWa2 | 2016-08-09 14:44:42 | 2018-11-02 02:15:14 | 11.5° PLATO | The Plato scale is a measurement of the densit... | 4.5 | 1.046 | 35.0 | N | N | Session India Pale Ale | Session India Pale Ales are gold to copper. Ch... |
| 2 | zfP2fK | 2016-08-03 23:25:54 | 2018-11-02 02:15:14 | 12th Of Never | Tropically Hoppy. Light, yet Full-Bodied. Brig... | 5.5 | 1.05 | 45.0 | N | N | American-Style Pale Ale | American pale ales range from deep golden to c... |
| 3 | xwYSL2 | 2015-04-16 15:44:15 | 2018-11-02 02:15:14 | 15th Anniversary Ale | For the ﬁrst ever SweetWater anniversary beer ... | NaN | NaN | NaN | N | N | Extra Special Bitter | Extra special bitter possesses medium to stron... |
| 4 | UJGpVS | 2013-02-24 16:31:05 | 2018-11-02 02:15:14 | 16 So Fine Red Wheat Wine | For our super heady 16 year anniversary beer w... | 11.0 | NaN | NaN | N | N | American-Style Wheat Wine Ale | American style wheat wines range from gold to ... |
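
The raw JSON stores values such as `abv`, `ibu`, and `createDate` as strings, so they usually need converting before any numeric or date-based analysis. The sketch below shows the idea with the standard library only. `to_float` and `to_datetime` are hypothetical helpers, and the timestamp format is inferred from the values displayed earlier. In pandas, `pd.to_numeric(..., errors="coerce")` and `pd.to_datetime` do the equivalent work on whole columns.

```python
import math
from datetime import datetime


def to_float(value):
    """Convert a numeric string to float; missing or bad values become NaN."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


def to_datetime(value, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a timestamp string in the format seen in createDate/updateDate."""
    return datetime.strptime(value, fmt)


print(to_float("4.5"))
print(to_float(None))
print(to_datetime("2016-08-09 14:44:42"))
```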
