# <font size="6"><b>Assignment V: GitHub and the ticketmaster.com API (Python)</b></font>
<font size="5">Data Science Project Management (DS400) | Winter Term 2022/23</font>

<br>

<div class="author_year">
Submitted by Jakob Zgonc (6293178)<br>
Submitted on 23.01.2023
</div>

<br>

---

<h2>Table of Contents<span class="tocSkip"></span></h2>

<div class="toc">
    <ul class="toc-item" style="list-style-type:none;">
        <li><span><a href="#1-Interacting-with-the-API---the-basics">1 Interacting with the API - the basics</a></span></li>
        <li><span><a href="#2-Interacting-with-the-API---advanced">2 Interacting with the API - advanced</a></span></li>
        <li><span><a href="#3-Visualizing-the-extracted-data">3 Visualizing the extracted data</a></span></li>
        <li><span><a href="#4-Event-locations-in-other-countries">4 Event locations in other countries</a></span></li>
    </ul>
</div>

<br>

---

**Code of conduct**

*I hereby acknowledge that the submitted assignment is my own work. During the preparation of this assignment I have worked together with Max Mohr and Felix Koehn.*

<br>

---

In [1]:
%%html
<style>

div.author_year {
    color: #708090;
    font-size: 17px;
    font-style: italic;
}

div.task {
    background-color:#DAE3F3;
    color: #337AB7; /*  #002060;*/
    border-radius: 10px; 
    padding: 20px;
    margin-top: 20px;
    margin-bottom: 20px;
    margin-left: -5px;
    margin-right: -20px;
    font-size: 15px;
    border-left: 10px solid #337AB7;
}

body {
    max-width:960px;
    margin:0 auto;
}

</style>

This document is also available in [this GitHub repository](https://github.com/jzgonc/dspm_2022_assignment_5).

---

## 1 Interacting with the API - the basics

<div class="task">
    <ol start=7>
        <li>Perform a first <code>GET</code> request, that searches for event venues in Germany (<code>countryCode = "DE"</code>). Extract the content from the response object and inspect the resulting list. Describe what you can see.
        </li>
    </ol>
</div>

In [2]:
# import packages
import requests
import numpy as np
import pandas as pd
import time

In [3]:
# get API key (strip whitespace such as a trailing newline in the file)
with open('ticketmaster_api_key.txt', 'r') as file:
    apikey = file.read().strip()

In [4]:
# url for venues
url = 'https://app.ticketmaster.com/discovery/v2/venues'

In [5]:
# define params for API call
params = dict(apikey=apikey, countryCode = 'DE')

# get response from API call as json (i.e. dictionary)
venues_DE_first_page = requests.get(url=url, params=params).json()

# show response
venues_DE_first_page

{'_embedded': {'venues': [{'name': 'Gruenspan',
    'type': 'venue',
    'id': 'KovZpZAneakA',
    'test': False,
    'url': 'http://www.ticketmaster.de/venue/287155',
    'locale': 'en-de',
    'images': [{'ratio': '16_9',
      'url': 'https://s1.ticketm.net/dbimages/2057v.',
      'width': 205,
      'height': 115,
      'fallback': False}],
    'postalCode': '22767',
    'timezone': 'Europe/Berlin',
    'city': {'name': 'Hamburg'},
    'country': {'name': 'Germany', 'countryCode': 'DE'},
    'address': {'line1': 'Grosse Freiheit 58'},
    'location': {'longitude': '9.958075', 'latitude': '53.551885'},
    'markets': [{'name': 'Germany', 'id': '210'}],
    'dmas': [{'id': 610}],
    'boxOfficeInfo': {'phoneNumberDetail': 'Gruenspan Große Freiheit 58 22767 Hamburg Tel: 040-313616 mail: info@gruenspan.de web: www.gruenspan.de'},
    'upcomingEvents': {'_total': 2, 'mfx-de': 2, '_filtered': 0},
    'ada': {'adaPhones': '+49.(0)1805 - 969 0000 (14 Ct./Min.)',
     'adaCustomCopy': 'Soll

The response of the API call is a nested dictionary (not a list, as the task states). At the top level it has three items:
* `_embedded` holds the data we actually requested. Its value is itself a dictionary whose `venues` entry is a list with one dictionary per returned venue.
* `_links` contains the URL that was used for the API call (without the API key and the base URL).
* `page` contains paging information about the returned results page, such as the total number of pages (`totalPages`).
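
The three top-level items can also be checked programmatically. The following is a minimal sketch, assuming a response shaped like the one above. `sample_response` is a hand-written stand-in (not real API output) and the helper name `summarize_response` is hypothetical. Of the fields under `page`, only `totalPages` is assumed, because it is the one used later in this notebook.

```python
def summarize_response(response: dict) -> dict:
    """Summarize the top-level structure of a venue search response."""
    venues = response.get('_embedded', {}).get('venues', [])
    return {
        'top_level_keys': sorted(response.keys()),
        'n_venues_on_page': len(venues),
        'total_pages': response.get('page', {}).get('totalPages'),
    }


# hand-written stand-in for a real response (values are illustrative only)
sample_response = {
    '_embedded': {'venues': [{'name': 'Venue A'}, {'name': 'Venue B'}]},
    '_links': {},
    'page': {'totalPages': 3},
}

summary = summarize_response(sample_response)
print(summary)
```

In the notebook, the same helper could be called with `venues_DE_first_page` instead of the stand-in.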

<div class="task">
    <ol start=8>
        <li>Extract the <code>name</code>, the <code>city</code>, the <code>postalCode</code> and <code>address</code>, as well as the <code>url</code> and the <code>longitude</code> and <code>latitude</code> of the <code>venues</code> to a data frame. This data frame should have the following structure:
        <br><br>
<code>## Rows: 20
## Columns: 7
## $ name       chr "Gruenspan", "Huxleys Neue Welt", "Kleine Olympiahalle", "Z~
## $ city       chr "Hamburg", "Berlin", "Munich", "Emmelshausen", "Mülheim", "~
## $ postalCode dbl 22767, 10967, 80809, 56281, 45479, 76646, 68766, 44263, 542~
## $ address    chr "Grosse Freiheit 58", "Hasenheide 107 – 113", "Spiridon-Lou~
## $ url        chr "http://www.ticketmaster.de/venue/287155", "http://www.tick~
## $ longitude  dbl 9.958075, 13.421380, 11.550920, 7.556560, 6.874710, 8.59908~
## $ latitude   dbl 53.55188, 52.48639, 48.17543, 50.15544, 51.42778, 49.12692,~</code></li>
</ol>
</div>

In [6]:
def retrieve_from_dict(d: dict, first_level: str, second_level: str = None):
    """Retrieve a value from a dictionary. The value might be on the second
    level of the dictionary. If the specified key does not exist in the
    dictionary, None will be returned.
    
    Args:
    -----
    d: dictionary
        dictionary where to retrieve the value from
        
    first_level: string
        key for the dictionary item
        
    second_level: string
        key for the dictionary on the second level, i.e. if the first level
        value was a dictionary itself
        
    Returns:
    --------
    value:
        retrieved value
    
    """
  
    # retrieve first level, key does not exist in d -> set value to None
    value = d.get(first_level, None)

    # retrieve second level, but only if first level key existed
    # if key does not exist in d -> set value to None
    if (value is not None) and (second_level is not None):
        value = value.get(second_level, None)
    
    return value
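
The same idea extends to any nesting depth. As a sketch (the name `get_nested` is hypothetical and not used elsewhere in this notebook), the keys can be passed as a variable-length path, and the lookup stops as soon as a key is missing or an intermediate value is not a dictionary:

```python
from typing import Any


def get_nested(d: dict, *keys: str) -> Any:
    """Follow `keys` through nested dictionaries; return None if the path breaks."""
    value: Any = d
    for key in keys:
        # stop if the current level is not a dictionary (e.g. None or a string)
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value


# stand-in venue with the same shape as the entries shown above
venue = {'name': 'Gruenspan', 'city': {'name': 'Hamburg'}}

print(get_nested(venue, 'name'))
print(get_nested(venue, 'city', 'name'))
print(get_nested(venue, 'location', 'longitude'))
```

Unlike `retrieve_from_dict`, this variant also returns `None` when a second-level key is requested but the first-level value is a plain string, a case in which calling `.get` on the string would raise an `AttributeError`.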

In [7]:
def create_df(venue: dict) -> pd.DataFrame:
    """Create a dataframe from a dictionary for a single venue.
    
    Args:
    -----
    venue: dictionary
        dictionary that contains the response from the API call for a single
        venue
        
    Returns:
    --------
    venue_simplified_df: pandas DataFrame
        dataframe with simplified information for a single venue retrieved from
        the input dictionary
        
    """
      
    # simplify venue dictionary by extracting relevant information
    venue_simplified = dict(
        name=retrieve_from_dict(venue, 'name'),
        city=retrieve_from_dict(venue, 'city', 'name'),
        postalCode=retrieve_from_dict(venue, 'postalCode'),
        address=retrieve_from_dict(venue, 'address', 'line1'),
        url=retrieve_from_dict(venue, 'url'),
        longitude=retrieve_from_dict(venue, 'location', 'longitude'),
        latitude=retrieve_from_dict(venue, 'location', 'latitude')
    )

    # convert to pandas data frame (an index is needed because all values
    # are scalars, i.e. there is only one row)
    venue_simplified_df = pd.DataFrame(venue_simplified, index=[0])

    return venue_simplified_df

In [8]:
def get_venues_df_single_page(venues: list[dict]) -> pd.DataFrame:
    """Create a data frame of all venues from a single page. Each venue forms
    one row.
    
    Args:
    -----
    venues: list of dictionaries
        list with one dictionary per venue
    
    Returns:
    --------
    venues_df_single_page: pandas DataFrame
        data frame that contains information for all venues from a single page
    
    """
      
    # call create_df function for each dictionary in venues list
    # and concatenate to a single data frame
    venues_df_single_page = pd.concat(
        [create_df(venue) for venue in venues]
    ).reset_index(drop=True)

    return venues_df_single_page

In [9]:
# call function to get information of all venues on the first page in a data frame
get_venues_df_single_page(venues = venues_DE_first_page['_embedded']['venues'])

Unnamed: 0,name,city,postalCode,address,url,longitude,latitude
0,Gruenspan,Hamburg,22767.0,Grosse Freiheit 58,http://www.ticketmaster.de/venue/287155,9.958075,53.551885
1,Huxleys Neue Welt,Berlin,10967.0,Hasenheide 107 – 113,http://www.ticketmaster.de/venue/286842,13.42138,52.486391
2,Virtual Event,Worldwide,,,https://www.ticketmaster.de/venue/virtuelles-e...,10.0,50.0
3,Ev. St. Jacobi Kirche,Sangerhausen,6526.0,Marktplatz,http://www.ticketmaster.de/venue/290061,,
4,Evangelische Kirche,Senden,48308.0,Steverstrasse 5,http://www.ticketmaster.de/venue/290066,,
5,HDI Arena,Hannover,,Robert-Enke-Straße 1,http://www.ticketmaster.de/venue/461692,9.73371,52.361993
6,Arsenal,Berlin,10785.0,Potsdamer Strasse 2,http://www.ticketmaster.de/venue/290646,,
7,Freilichtbühne Heppenheim,Heppenheim,64646.0,Oberhalb der Stadt,http://www.ticketmaster.de/venue/290639,,
8,Schlosswallhalle,Osnabrück,49074.0,Schlosswall 10,http://www.ticketmaster.de/venue/290630,,
9,Metropol Theater,Vechta,49377.0,Kolpingstrasse 27,http://www.ticketmaster.de/venue/290631,,
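
Creating one data frame per venue and concatenating them works, but it can become slow for thousands of venues. A possible alternative, sketched here with plain dictionaries so that it runs without pandas, is to flatten every venue into one row dictionary and build the data frame only once at the end with `pd.DataFrame(rows)`. The names `COLUMNS`, `flatten_venue` and `flatten_venues` are hypothetical.

```python
# target column -> path of keys inside the venue dictionary
COLUMNS = {
    'name': ('name',),
    'city': ('city', 'name'),
    'postalCode': ('postalCode',),
    'address': ('address', 'line1'),
    'url': ('url',),
    'longitude': ('location', 'longitude'),
    'latitude': ('location', 'latitude'),
}


def flatten_venue(venue: dict) -> dict:
    """Map one venue dictionary to a flat row with the seven target columns."""
    row = {}
    for column, path in COLUMNS.items():
        value = venue
        for key in path:
            # a missing key or non-dictionary level yields None for this column
            value = value.get(key) if isinstance(value, dict) else None
        row[column] = value
    return row


def flatten_venues(venues: list) -> list:
    """Flatten all venues of one results page into a list of row dictionaries."""
    return [flatten_venue(venue) for venue in venues]


# stand-in input; in the notebook this would be
# venues_DE_first_page['_embedded']['venues']
venues = [
    {'name': 'Gruenspan', 'city': {'name': 'Hamburg'}, 'postalCode': '22767',
     'address': {'line1': 'Grosse Freiheit 58'},
     'location': {'longitude': '9.958075', 'latitude': '53.551885'}},
    {'name': 'Arsenal', 'city': {'name': 'Berlin'}},
]
rows = flatten_venues(venues)
print(rows[0]['city'], rows[1]['longitude'])
```

Because every row has the same seven keys, the resulting list has a fixed column layout even when a venue lacks some fields.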


## 2 Interacting with the API - advanced

<div class="task">
    <ol start=9>
        <li>Have a closer look at the list element named <code>page</code>. Did your <code>GET</code> request from exercise (7) return <i>all</i> event locations in Germany? Obviously not - there are of course much more venues in Germany than those contained in this list. Your <code>GET</code> request only yielded the first results page containing the first 20 out of several thousands of venues. Check the API documentation under the section <a href="https://developer.ticketmaster.com/products-and-docs/apis/discovery-api/v2/#search-venues-v2">Venue Search</a>. How can you request the venues from the remaining results pages? Iterate over the results pages and perform <code>GET</code> requests for all venues in Germany. After each iteration, extract the seven variables <code>name</code>, <code>city</code>, <code>postalCode</code>, <code>address</code>, <code>url</code>, <code>longitude</code>, and <code>latitude</code>. Join the information in one large data frame. Print the first 10 rows and the shape of the resulting data frame. The resulting data frame should look something like this (note that the exact number of search results may have changed since this document has been last modified):<br><br>
<code>## Rows: 12,671
## Columns: 7
## $ name       chr "Gruenspan", "Huxleys Neue Welt", "Kleine Olympiahalle", "Z~
## $ city       chr "Hamburg", "Berlin", "Munich", "Emmelshausen", "Mülheim", "~
## $ postalCode dbl 22767, 10967, 80809, 56281, 45479, 76646, 68766, 44263, 542~
## $ address    chr "Grosse Freiheit 58", "Hasenheide 107 – 113", "Spiridon-Lou~
## $ url        chr "http://www.ticketmaster.de/venue/287155", "http://www.tick~
## $ longitude  dbl 9.958075, 13.421380, 11.550920, 7.556560, 6.874710, 8.59908~
## $ latitude   dbl 53.55188, 52.48639, 48.17543, 50.15544, 51.42778, 49.12692,~</code></li>
</ol>
</div>

In [10]:
def get_venues_in_country(country_code: str, verbose: bool = True) -> pd.DataFrame:
    """Get all venues from ticketmaster via its API and convert the response
    to a data frame that contains a row for each venue.
    
    Note: This function uses the global variables 'apikey' and 'url'.
    These have to be defined before the function is called.
    
    Args:
    -----
    country_code: string
        country code to be used as parameter for the API call
        
    verbose: boolean
        whether to print status messages to console
    
    Returns:
    --------
    all_venues_df: pandas DataFrame
        data frame that contains one row for every retrieved venue
    
    """
    
    # define params for API calls (size parameter must be less than 500)
    params = dict(apikey=apikey, countryCode = country_code, size = 499)

    # first API call to get number of pages
    n_pages = (
        requests
        .get(url=url, params=params)
        .json()
        .get('page')
        .get('totalPages')
    )

    # save timestamp of when API call was completed
    time_last_apicall = time.time()

    # create empty list to store single page data frames in
    single_page_dfs = []

    # print message
    if verbose:
        print(f'Starting to retrieve venues in country "{country_code}"'
              f' from {n_pages} pages...')

    # loop through pages
    for page in range(n_pages):

        # add page number to params dictionary
        params['page'] = page

        # make sure we do not make more than 5 requests per second
        time_since_last_apicall = time.time() - time_last_apicall
        time.sleep(max(0, 1/5 - time_since_last_apicall))

        # get venues list from API call response
        venues = (
            requests
            .get(url=url, params=params)
            .json()
            .get('_embedded')
            .get('venues')
        )

        # update timestamp of when API call was completed
        time_last_apicall = time.time()

        # get data frame of all venues on the current page
        # and append to single_page_dfs list
        single_page_dfs.append(get_venues_df_single_page(venues = venues))

        # print message (progress), only if status messages are requested
        if verbose:
            fraction = (page+1)/n_pages
            bar_length = 10
            arrow = int(fraction * bar_length) * '#'
            padding = int(bar_length - len(arrow)) * ' '
            print(f'Progress: [{arrow}{padding}] {int(fraction*100)}%',
                  end=('\n' if fraction == 1 else '\r'))

    # concatenate all data frames in single_page_dfs list
    all_venues_df = pd.concat(single_page_dfs)

    # print message
    if verbose:
        print(f'Retrieved data for {all_venues_df.shape[0]} venues.')

    return all_venues_df
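
The throttling logic inside the loop can be factored out so that it is reusable and testable. The class below is a sketch and not part of the original solution: `MinIntervalLimiter` is a hypothetical name, and the limit of five requests per second is taken from the comment in the function above. The clock and sleep functions are injectable so that the behaviour can be checked without actually waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Enforce a minimum time interval between consecutive calls to wait()."""

    def __init__(self, max_per_second: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Sleep just long enough to respect the interval; return the delay."""
        delay = 0.0
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last_call = self._clock()
        return delay


# demonstration with a fake clock, so nothing really sleeps
fake_now = [0.0]
limiter = MinIntervalLimiter(
    max_per_second=5,
    clock=lambda: fake_now[0],
    sleep=lambda seconds: fake_now.__setitem__(0, fake_now[0] + seconds),
)
delays = [limiter.wait() for _ in range(3)]
print(delays)
```

In the loop, `limiter.wait()` would be called right before each `requests.get`. Note that this variant measures the interval between the starts of consecutive requests, whereas the loop above measures from the completion of the previous request.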

In [11]:
# call function for germany (country_code='DE')
venues_DE = get_venues_in_country(country_code = 'DE')

Starting to retrieve venues in country "DE" from 10 pages...
Progress: [##########] 100%
Retrieved data for 4745 venues.


In [12]:
# show data frame
venues_DE

Unnamed: 0,name,city,postalCode,address,url,longitude,latitude
0,Gruenspan,Hamburg,22767,Grosse Freiheit 58,http://www.ticketmaster.de/venue/287155,9.958075,53.551885
1,Huxleys Neue Welt,Berlin,10967,Hasenheide 107 – 113,http://www.ticketmaster.de/venue/286842,13.42138,52.486391
2,Virtual Event,Worldwide,,,https://www.ticketmaster.de/venue/virtuelles-e...,10.0,50.0
3,Ev. St. Jacobi Kirche,Sangerhausen,06526,Marktplatz,http://www.ticketmaster.de/venue/290061,,
4,Evangelische Kirche,Senden,48308,Steverstrasse 5,http://www.ticketmaster.de/venue/290066,,
...,...,...,...,...,...,...,...
249,PreZero Arena,Sinsheim,74889,Baden-Wurttemberg,,0,0
250,BayArena,Leverkusen,51371,Bismarckstrake 122,,0,0
251,Impuls Arena,Augsburg,86150,Burgermeister Ulrich-strasse 90,,0,0
252,Apollo Theater - Stuttgart,Stuttgart,70567,Plieninger Str 102,,0,0


## 3 Visualizing the extracted data

<div class="task">
    <ol start=10>
        <li>Below, you can find code that produces a map of Germany. Add points to the map indicating the locations of the event venues across Germany.</li>
    </ol>
</div>

<div class="task">
    <ol start=11>
        <li>You will find that some coordinates lie way beyond the German borders and can be assumed to be faulty. Set coordinate values to <code>NA</code> where the value of <code>longitude</code> is outside the range (<code>5.866, 15.042</code>) or where the value of <code>latitude</code> is outside the range (<code>47.270, 55.059</code>) (these coordinate ranges have been derived from the extreme points of Germany as listed on Wikipedia (see <a href="https://en.wikipedia.org/wiki/Geography_of_Germany#Extreme_points">here</a>). For extreme points of other countries, see <a href="https://en.wikipedia.org/wiki/Lists_of_extreme_points#Sovereign_states">here</a>).</li>
    </ol>
</div>
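
One way to implement the range check from task 11 is sketched below in plain Python, so that it runs without pandas. The coordinates arrive as strings from the API, so they are first converted to floats. Both values are then set to NaN when either of them lies outside the German bounding box given in the task. The function names are hypothetical, and blanking both coordinates when only one is out of range is an assumption about how the task is meant.

```python
import math

# bounding box for Germany as given in the task description
LON_RANGE_DE = (5.866, 15.042)
LAT_RANGE_DE = (47.270, 55.059)


def to_float(value) -> float:
    """Convert a coordinate (string, number or None) to float, NaN if impossible."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


def clean_coordinates(longitude, latitude,
                      lon_range=LON_RANGE_DE, lat_range=LAT_RANGE_DE):
    """Return (longitude, latitude) as floats, or (NaN, NaN) if out of range."""
    lon, lat = to_float(longitude), to_float(latitude)
    inside = (lon_range[0] <= lon <= lon_range[1]
              and lat_range[0] <= lat <= lat_range[1])
    # comparisons with NaN are False, so missing values also end up as NaN
    return (lon, lat) if inside else (math.nan, math.nan)


print(clean_coordinates('9.958075', '53.551885'))  # Hamburg venue from above
print(clean_coordinates('0', '0'))                 # placeholder coordinates
print(clean_coordinates(None, None))               # missing coordinates
```

With the data frame `venues_DE`, the same rule could be applied column-wise after converting the two coordinate columns with `pd.to_numeric`.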

## 4 Event locations in other countries

<div class="task">
    <ol start=12>
        <li>Repeat exercises (9)–(11) for another European country of your choice. (Hint: Clean code pays off! If you have coded the exercises efficiently, only very few adaptions need to be made.)</li>
    </ol>
</div>
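
To repeat the cleaning step for another country, only the country code and the bounding box need to change. A small sketch of how the two could be kept together follows. `CountryBounds`, `BOUNDS` and `bounds_for` are hypothetical names, only the German values come from the task above, and the box for any other country has to be looked up in the linked list of extreme points before it is added.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountryBounds:
    """Longitude/latitude bounding box for one country."""
    lon_min: float
    lon_max: float
    lat_min: float
    lat_max: float

    def contains(self, longitude: float, latitude: float) -> bool:
        """Check whether a coordinate pair lies inside the bounding box."""
        return (self.lon_min <= longitude <= self.lon_max
                and self.lat_min <= latitude <= self.lat_max)


# registry keyed by country code; only Germany is filled in here
BOUNDS = {
    'DE': CountryBounds(lon_min=5.866, lon_max=15.042,
                        lat_min=47.270, lat_max=55.059),
}


def bounds_for(country_code: str) -> CountryBounds:
    """Look up the bounding box, with a clear error for unregistered countries."""
    try:
        return BOUNDS[country_code]
    except KeyError:
        raise KeyError(
            f'no bounding box registered for country code "{country_code}"'
        ) from None


print(bounds_for('DE').contains(9.958075, 53.551885))
print(bounds_for('DE').contains(0.0, 0.0))
```

In the notebook, `get_venues_in_country(country_code=...)` would supply the venues and `bounds_for(...)` the matching box, so that the rest of the pipeline stays unchanged.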


<br>

***