# HW08: API
_Hamidah Alatas_
_ha2573_

### 1. Choose an API

#### a) Choose an API and briefly describe the type of data you can obtain from it

For this assignment, I will be using the [**SeatGeek Events API**](https://platform.seatgeek.com/). This API lets me obtain a directory of live events in the United States and Canada. I will use it to retrieve Broadway show information such as the number of tickets available, average price, city, theater name, lowest price, and highest price.

#### b) Provide a link to the API documentation

This is the link to the API documentation: https://platform.seatgeek.com/

#### c) the base URL of the API you intend to use

This is the base URL of the API: https://api.seatgeek.com/2/events

However, since I am only interested in getting the data for Broadway shows, I will specifically use this URL: https://api.seatgeek.com/2/events?taxonomies.name=broadway_tickets_national

### 2. Authentication

#### a) Briefly explain how the API authenticates the user

The SeatGeek API requires the user to pass a valid API key (a client ID and a client secret). The platform's API follows RESTful principles. You need a SeatGeek account to proceed. There are two ways to authenticate: `Query String Parameters` and `HTTP Basic Auth`.
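To make the two options concrete, here is a minimal standard-library sketch of what each one sends. The helper names `query_string_url` and `basic_auth_header` are hypothetical, and I am assuming the query-string option uses parameters named `client_id` and `client_secret`; the SeatGeek documentation should be checked before relying on that.

```python
import base64
from urllib.parse import urlencode

BASE_URL = "https://api.seatgeek.com/2/events"

def query_string_url(client_id, client_secret, **filters):
    """Option 1: append the credentials to the URL as query string parameters.

    Assumption: the parameter names are client_id and client_secret.
    """
    params = {**filters, "client_id": client_id, "client_secret": client_secret}
    return f"{BASE_URL}?{urlencode(params)}"

def basic_auth_header(client_id, client_secret):
    """Option 2: HTTP Basic Auth, i.e. base64("id:secret") in an Authorization header."""
    token = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

# Placeholder credentials only; never hard-code real keys
print(query_string_url("ID", "SECRET", per_page=1))
print(basic_auth_header("ID", "SECRET"))
```

With `requests`, passing `auth=(client_id, client_secret)` builds the same kind of Basic header, which keeps the keys out of the URL.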

#### b) Apply for an API key if necessary and provide the information (with relevant URL) how that can be done. Do not include the API key in the assignment submission.

`client_id` and `client_secret` can be obtained from the [SeatGeek Developers Page](https://seatgeek.com/account/develop). I have applied for and received my keys for this assignment.
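Since the keys must stay out of the submission, one way to handle them is to read them from environment variables and fail early when they are missing. A minimal sketch follows; the helper name `load_credentials` is hypothetical, and the variable names `MY_CLIENT_ID` and `MY_CLIENT_SECRET` are the ones this notebook reads in the GET request cell.

```python
import os

def load_credentials(env=os.environ):
    """Return (client_id, client_secret) from the environment, or raise if either is missing."""
    client_id = env.get("MY_CLIENT_ID")
    client_secret = env.get("MY_CLIENT_SECRET")
    # Collect the names of any variables that are unset or empty
    missing = [name for name, value in
               [("MY_CLIENT_ID", client_id), ("MY_CLIENT_SECRET", client_secret)]
               if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return client_id, client_secret
```

Failing here gives a clearer message than an authentication error from the API later on.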

### 3. Send a Simple GET request

#### a) Execute a simple GET request to obtain a small amount of data from the API

Below, I send a simple GET request for up to 200 Broadway events in New York during November and December 2021 that still have ticket listings available.

In [1]:
import requests
import os
from dotenv import load_dotenv
load_dotenv()

client_id = os.getenv("MY_CLIENT_ID")
client_secret = os.getenv("MY_CLIENT_SECRET")

params = {'taxonomies.name' : 'broadway_tickets_national', 'venue.city' : 'New York', 
          "listing_count.gt" : 0, "per_page" : 200,
          "datetime_local.gte" : "2021-11-01", "datetime_local.lte" : "2021-12-31"}

r = requests.get('https://api.seatgeek.com/2/events', auth=(client_id, client_secret), params = params)

Above, I added query parameters for:

- `taxonomies.name = "broadway_tickets_national"` to restrict the search to Broadway shows
- `venue.city = "New York"` to include only events in New York City (not national tours of Broadway shows)
- `listing_count.gt = 0` to include only events with available listings (tickets are still available)
- `datetime_local.gte = 2021-11-01` to include only events starting on or after 2021-11-01
- `datetime_local.lte = 2021-12-31` to include only events starting on or before 2021-12-31
- `per_page = 200` to take the first 200 events
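The filters listed above can also be collected by a small helper, which makes it easy to inspect the query string before sending anything. This is only a sketch: `build_event_params` is a hypothetical name, and `urlencode` from the standard library is used here just to preview the encoding.

```python
from urllib.parse import urlencode

def build_event_params(city, date_start, date_end, per_page=200):
    """Collect the Broadway filters into one dict (hypothetical helper)."""
    return {
        "taxonomies.name": "broadway_tickets_national",  # Broadway shows only
        "venue.city": city,                               # restrict to one city
        "listing_count.gt": 0,                            # tickets still listed
        "per_page": per_page,                             # page size
        "datetime_local.gte": date_start,                 # on or after this date
        "datetime_local.lte": date_end,                   # on or before this date
    }

params = build_event_params("New York", "2021-11-01", "2021-12-31")
# Preview the query string that would follow "?" in the request URL
print(urlencode(params))
```

The resulting dict can be passed to `requests.get(..., params=params)` in the same way as the hand-written one.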

#### b) Check (and show) the status of the request

The status code is 200, which means the API request went through without error.

In [2]:
r.status_code

200
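For codes other than 200 it helps to turn the number into a readable message. The sketch below uses only the standard library; `describe_status` is a hypothetical helper, not part of `requests` or of the SeatGeek API.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short human-readable summary of an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Code is not in the standard registry
        phrase = "Unknown status"
    if 200 <= code < 300:
        outcome = "success"
    elif 400 <= code < 500:
        outcome = "client error"
    elif 500 <= code < 600:
        outcome = "server error"
    else:
        outcome = "other"
    return f"{code} {phrase} ({outcome})"

print(describe_status(200))
print(describe_status(404))
```

In a notebook using `requests`, calling `r.raise_for_status()` is a shorter alternative that raises an exception for 4xx and 5xx responses.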

#### c) Check (and show) the type of the response (e.g. XML, JSON, csv)

The response type is JSON, as seen below.

In [3]:
r.headers['content-type']

'application/json'
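Some servers append parameters such as a charset to this header, so an exact string comparison can be fragile. A small sketch of a more tolerant check follows; `media_type` and `is_json` are hypothetical helpers.

```python
def media_type(content_type):
    """Drop parameters such as charset and normalise case."""
    return content_type.split(";", 1)[0].strip().lower()

def is_json(content_type):
    """True when the header announces a JSON body."""
    return media_type(content_type) == "application/json"

print(is_json("application/json"))
print(is_json("Application/JSON; charset=utf-8"))
print(is_json("text/html"))
```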

### 4. Parse the response and Create a dataset

#### a) Take the response returned by the API and turn it into a useful Python object (e.g. a list, vector, or pandas data frame). Show the code how this is done.

In [4]:
import json

broadway_ny = r.json()
print(type(broadway_ny))  # dictionary
print(broadway_ny.keys())

<class 'dict'>
dict_keys(['events', 'meta'])
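The `meta` entry is where paging information would normally live. As a sketch only, and assuming it carries fields named `total` and `per_page` (I have not confirmed these names against the documentation), the number of requests needed to retrieve every matching event could be estimated like this:

```python
import math

def pages_needed(meta):
    """Number of pages required to fetch every matching event.

    Assumption: meta has 'total' (matching events) and 'per_page' (page size).
    """
    total = meta["total"]
    per_page = meta["per_page"]
    return math.ceil(total / per_page) if per_page else 0

# Made-up values for illustration, not taken from a real response
example_meta = {"total": 431, "per_page": 200}
print(pages_needed(example_meta))
```

Each further page would then need its own request, for example with a `page` query parameter if the API supports one (also an assumption).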


In [5]:
import pandas as pd

broadway_ny_df = pd.DataFrame(broadway_ny['events'])
broadway_ny_df.shape

(200, 31)

#### b) Using the API, create a dataset (in data frame format) for multiple records. I'd say a sample size greater than 100 is sufficient for the example but feel free to get more data if you feel ambitious and the API allows you to do that fairly easily. The dataset can include only a small subset of the returned data. Just choose some interesting features. There is no need to be inclusive here.

In [6]:
# Look at all available columns
for col in broadway_ny_df.columns:
    print(col)

type
id
datetime_utc
venue
datetime_tbd
performers
is_open
links
datetime_local
time_tbd
short_title
visible_until_utc
stats
taxonomies
url
score
announce_date
created_at
date_tbd
title
popularity
description
status
access_method
event_promotion
announcements
conditional
enddatetime_utc
last_full_index_at
themes
domain_information


In [7]:
broadway_ny_df = broadway_ny_df[["short_title", "url", "venue", "datetime_local", "score", "stats", "popularity", "event_promotion"]]

In [8]:
venue_col = broadway_ny_df['venue'].apply(pd.Series).add_prefix('venue_')
stats_col = broadway_ny_df['stats'].apply(pd.Series).add_prefix('stats_')
broadway_ny_df = pd.concat([broadway_ny_df.drop(['venue', 'stats'], axis=1), venue_col, stats_col], axis=1)
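The `venue` and `stats` columns hold one dictionary per event, and the step above spreads each dictionary into its own prefixed columns. A plain-Python sketch of the same idea on a single record is shown here; `flatten_record` is a hypothetical helper and the sample event is made up.

```python
def flatten_record(record, nested_keys=("venue", "stats")):
    """Copy a record, replacing each nested dict with prefixed top-level keys."""
    flat = {}
    for key, value in record.items():
        if key in nested_keys and isinstance(value, dict):
            # e.g. venue -> venue_name, venue_extended_address
            for sub_key, sub_value in value.items():
                flat[f"{key}_{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat

# Made-up record shaped like one element of broadway_ny['events']
event = {
    "short_title": "Example Show",
    "venue": {"name": "Example Theatre", "extended_address": "New York, NY 10036"},
    "stats": {"listing_count": 2, "lowest_price": 59},
}
print(flatten_record(event))
```

In pandas, `pd.json_normalize(broadway_ny['events'], sep='_')` should produce similarly named `venue_` and `stats_` columns in one call, although it also flattens any other nested dictionaries in the response.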

In [9]:
col = ["short_title", "url", "datetime_local", "popularity", "score", "venue_name", "venue_extended_address", "stats_listing_count", "stats_average_price", "stats_lowest_price_good_deals", "stats_lowest_price", "stats_highest_price", "event_promotion"]
broadway_ny_df_clean = broadway_ny_df[col]

In [10]:
broadway_ny_df_clean.head()

| | short_title | url | datetime_local | popularity | score | venue_name | venue_extended_address | stats_listing_count | stats_average_price | stats_lowest_price_good_deals | stats_lowest_price | stats_highest_price | event_promotion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Waitress | https://seatgeek.com/waitress-tickets/barrymor... | 2021-11-15T19:00:00 | 0.634 | 0.407 | Barrymore Theatre | New York, NY 10036 | 2.0 | 314.0 | 324.0 | 304.0 | 333.0 | |
| 1 | Company - New York | https://seatgeek.com/company-new-york-tickets/... | 2021-11-15T20:00:00 | 0.57 | 0.328 | Bernard B. Jacobs Theatre | New York, NY 10036 | 1.0 | 1281.0 | | 1281.0 | 1314.0 | |
| 2 | Chicago (Musical) - New York | https://seatgeek.com/chicago-musical-new-york-... | 2021-11-15T20:00:00 | 0.609 | 0.392 | Ambassador Theatre | New York, NY 10019 | 1.0 | 285.0 | | 285.0 | 292.0 | |
| 3 | Six the Musical - New York | https://seatgeek.com/six-the-musical-new-york-... | 2021-11-15T20:00:00 | 0.616 | 0.398 | Brooks Atkinson Theatre | New York, NY 10036 | 20.0 | 513.0 | 101.0 | 66.0 | 1642.0 | |
| 4 | Chicken and Biscuits - New York | https://seatgeek.com/chicken-and-biscuits-new-... | 2021-11-16T19:00:00 | 0.585 | 0.359 | Circle In The Square Theatre | New York, NY 10019 | 1.0 | 236.0 | | 233.0 | 239.0 | |


#### c) Provide some summary statistics of the data. Include the data frame in a .csv file called data.csv with your submission for the grader.

In [11]:
# Dimension of the data
broadway_ny_df_clean.shape

(200, 13)

The data has 200 rows and 13 columns.

In [12]:
# Summary statistics for the numerical variables
broadway_ny_df_clean.describe()

| | popularity | score | stats_listing_count | stats_average_price | stats_lowest_price_good_deals | stats_lowest_price | stats_highest_price |
|---|---|---|---|---|---|---|---|
| count | 200.0 | 200.0 | 200.0 | 200.0 | 172.0 | 200.0 | 200.0 |
| mean | 0.613255 | 0.436325 | 76.27 | 496.86 | 185.784884 | 174.595 | 1071.46 |
| std | 0.076411 | 0.100109 | 75.733645 | 278.662582 | 151.380065 | 148.579339 | 955.688469 |
| min | 0.521 | 0.315 | 1.0 | 91.0 | 60.0 | 59.0 | 124.0 |
| 25% | 0.565 | 0.36575 | 20.0 | 283.25 | 112.5 | 108.0 | 578.5 |
| 50% | 0.587 | 0.397 | 70.5 | 459.5 | 138.5 | 133.0 | 987.0 |
| 75% | 0.63075 | 0.48125 | 98.25 | 643.75 | 168.0 | 164.0 | 1409.0 |
| max | 0.829 | 0.676 | 435.0 | 1457.0 | 1135.0 | 1281.0 | 11607.0 |


Below, I look at the average ticket price by show.

In [13]:
show_avg_price = broadway_ny_df_clean[['short_title', 'stats_average_price', 'stats_lowest_price']].groupby('short_title').mean().reset_index()

The 5 shows with the lowest average ticket price in November - December 2021:

In [14]:
show_avg_price.nsmallest(5, 'stats_average_price', keep='all')

| | short_title | stats_average_price | stats_lowest_price |
|---|---|---|---|
| 17 | Perfect Crime - New York | 100.333333 | 68.5 |
| 24 | The Office! A Musical Parody - New York | 103.6 | 61.4 |
| 21 | The Book of Mormon | 148.25 | 106.5 |
| 8 | David Byrne's American Utopia | 170.833333 | 68.0 |
| 12 | Is This A Room - New York | 189.666667 | 120.0 |


The 5 shows with the highest average ticket price in November - December 2021:

In [15]:
show_avg_price.nlargest(5, 'stats_average_price', keep='all')

| | short_title | stats_average_price | stats_lowest_price |
|---|---|---|---|
| 19 | Six the Musical - New York | 1017.25 | 314.5 |
| 5 | Clyde's - New York | 997.142857 | 638.428571 |
| 25 | To Kill a Mockingbird - New York | 995.285714 | 106.714286 |
| 7 | Company - New York | 925.833333 | 512.0 |
| 16 | Mrs. Doubtfire - New York | 846.8 | 189.4 |


In [16]:
# Export the result
broadway_ny_df_clean.to_csv('broadway_seatgeek_nov_dec.csv')

### 5. API Client

#### a) API client function

Lastly, let's try to wrap the code from the previous sections into a simple API client function. 
For your API function, try to create a simple function that does the following things:

- allows the user to specify some smallish set of query parameters (from Q.3a)
- run a GET request with these parameters
- check the status of the request the server returns and inform the user of any errors (from Q.3b)
- parse the response and return a Python object to the user of the function. You can choose whether returning a list (from Q.4a) or a data frame (from Q.4b) is best.
- Add docstrings to the API client function that explain the parameters, the output, and ideally include a quick example.

Note: There is no need to make this into a Python package here. A simple function is sufficient.

In [17]:
import pandas as pd
import requests

def seatgeek_broadway_cheapest(date_start = '2021-12-01', 
                      date_end = '2021-12-31', city = 'New York',
                      n = 10, type = 'lowest'):
    """
    Get the n cheapest Broadway (or Off-Broadway) performances within a date range from the SeatGeek API.

    Parameters
    ----------
    date_start : str
      Start date of the search in YYYY-MM-DD format.
    date_end : str
      End date of the search in YYYY-MM-DD format.
    city : str
      City of the Broadway shows to search; 'New York' is recommended.
    n : int
      Number of cheapest performances to return.
    type : str
      How to rank by price: by the lowest ticket price of each performance ('lowest') or by its average ticket price ('average').
      
    Returns
    -------
    pandas.core.frame.DataFrame
      Table with the n cheapest Broadway performances in the date range (more than n rows if prices tie).

    Examples
    --------
    >>> seatgeek_broadway_cheapest(date_start = '2021-12-01',date_end = '2021-12-31',n = 3, type = 'average')
    short_title	datetime_local	popularity	venue_name	stats_listing_count	stats_average_price	stats_lowest_price	url
    The Office! A Musical Parody - New York	2021-12-05T14:00:00	0.467	The Theater Center	9.0	88.0	59.0	https://seatgeek.com/the-office-a-musical-paro...
    The Office! A Musical Parody - New York	2021-12-04T14:00:00	0.468	The Theater Center	9.0	89.0	59.0	https://seatgeek.com/the-office-a-musical-paro...
    The Office! A Musical Parody - New York	2021-12-04T21:30:00	0.480	The Theater Center	8.0	89.0	57.0	https://seatgeek.com/the-office-a-musical-paro...
    """
    params = {'taxonomies.name' : 'broadway_tickets_national', 
              'venue.city' : city,
              "listing_count.gt" : 0, 
              "per_page" : 200,
              "datetime_local.gte" : date_start, "datetime_local.lte" : date_end}
    
    response = requests.get('https://api.seatgeek.com/2/events', auth=(client_id, client_secret), params = params)
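    status = response.status_code
    # Report any non-200 status to the user instead of trying to parse an error response
    if status == 404:
        raise Exception("404 : error (failed to make request)")
    elif status == 500:
        raise Exception("500 : successfully made request but had internal error")
    elif status != 200:
        raise Exception(f"{status} : request failed")
    else: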
        broadway_ny = response.json()
        broadway_ny_df = pd.DataFrame(broadway_ny['events'])
        venue_col = broadway_ny_df['venue'].apply(pd.Series).add_prefix('venue_')
        stats_col = broadway_ny_df['stats'].apply(pd.Series).add_prefix('stats_')
        broadway_ny_df = pd.concat([broadway_ny_df.drop(['venue', 'stats'], axis=1), venue_col, stats_col], axis=1)
        col = ["short_title", "datetime_local", "popularity", "venue_name", "stats_listing_count", "stats_average_price", "stats_lowest_price", "url"]
        broadway_ny_df_clean = broadway_ny_df[col]
        if type == 'lowest':
            result = broadway_ny_df_clean.nsmallest(n, 'stats_lowest_price', keep='all')
        else:
            result = broadway_ny_df_clean.nsmallest(n, 'stats_average_price', keep='all')
    return result

In [18]:
seatgeek_broadway_cheapest(date_start = '2021-12-01', 
                      date_end = '2021-12-31', 
                      n = 10, type = 'average')

| | short_title | datetime_local | popularity | venue_name | stats_listing_count | stats_average_price | stats_lowest_price | url |
|---|---|---|---|---|---|---|---|---|
| 149 | The Office! A Musical Parody - New York | 2021-12-05T14:00:00 | 0.467 | The Theater Center | 9.0 | 88.0 | 59.0 | https://seatgeek.com/the-office-a-musical-paro... |
| 93 | The Office! A Musical Parody - New York | 2021-12-04T14:00:00 | 0.468 | The Theater Center | 9.0 | 89.0 | 59.0 | https://seatgeek.com/the-office-a-musical-paro... |
| 139 | The Office! A Musical Parody - New York | 2021-12-04T21:30:00 | 0.48 | The Theater Center | 8.0 | 89.0 | 57.0 | https://seatgeek.com/the-office-a-musical-paro... |
| 44 | The Office! A Musical Parody - New York | 2021-12-02T19:30:00 | 0.486 | The Theater Center | 9.0 | 90.0 | 61.0 | https://seatgeek.com/the-office-a-musical-paro... |
| 65 | Perfect Crime - New York | 2021-12-03T20:00:00 | 0.473 | The Theater Center | 6.0 | 92.0 | 59.0 | https://seatgeek.com/perfect-crime-new-york-ti... |
| 2 | Slave Play | 2021-12-01T14:00:00 | 0.488 | August Wilson Theatre | 156.0 | 93.0 | 47.0 | https://seatgeek.com/slave-play-tickets/august... |
| 120 | Perfect Crime - New York | 2021-12-04T20:00:00 | 0.473 | The Theater Center | 6.0 | 94.0 | 60.0 | https://seatgeek.com/perfect-crime-new-york-ti... |
| 178 | Perfect Crime - New York | 2021-12-05T19:30:00 | 0.466 | The Theater Center | 3.0 | 98.0 | 68.0 | https://seatgeek.com/perfect-crime-new-york-ti... |
| 92 | Perfect Crime - New York | 2021-12-04T14:00:00 | 0.467 | The Theater Center | 3.0 | 99.0 | 68.0 | https://seatgeek.com/perfect-crime-new-york-ti... |
| 159 | Perfect Crime - New York | 2021-12-05T15:00:00 | 0.465 | The Theater Center | 3.0 | 99.0 | 68.0 | https://seatgeek.com/perfect-crime-new-york-ti... |
