# Backend Database (Part 4: Songkick)

The foundation of our backend is an extensive list of artists and the information relevant to them. We collect this information from five sources (three APIs and two scraped websites):

1. Last.fm API to gather a long list of artists, mainly those popular in the US.
2. SeatGeek API to gather data on upcoming concerts, particularly ticket pricing and concert size.
3. Scrape the SeatGeek website for capacity information on concert venues.
4. **Songkick API to retrieve data on historical concerts.**
5. Scrape the Billboard website for recent successful concerts, including information like revenue and attendance.


In [1]:
import requests
import pickle
import MySQLdb as mdb
import sys
from datetime import datetime

## Defining a few functions to fetch data

**(i) Functions to get ids and names used in Songkick's database**

In [2]:
#Functions that use the artist search API
#Send the search query and get the matching artists as a json response | returns a dictionary:
def search_api(sql_name):
    url_id = "http://api.songkick.com/api/3.0/search/artists.json"
    #Passing params lets requests URL-encode names that contain spaces or '&'
    params = {"apikey": "<apikey>", "query": sql_name}
    ids = requests.get(url_id, params=params).json()
    return ids

#Function to get the artist ID of the first match | returns an integer, or None if nothing matched
def search_id(ids):
    try:
        return ids['resultsPage']['results']['artist'][0]['id']
    except (KeyError, IndexError, TypeError):
        return None

#Function to get the displayName of the first match | returns a string, or None if nothing matched:
def search_displayName(ids):
    try:
        return ids['resultsPage']['results']['artist'][0]['displayName']
    except (KeyError, IndexError, TypeError):
        return None
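To make the nested lookup concrete, here is a minimal sketch that runs it against hand-written dictionaries instead of a live request. The sample payloads (artist id, name, and the shape of an empty result) are invented for illustration and only mirror the keys used above; `first_artist_field` is a hypothetical helper, not part of Songkick's API.

```python
# Hypothetical sample mirroring only the keys the lookups above rely on.
sample_response = {
    "resultsPage": {
        "results": {"artist": [{"id": 12345, "displayName": "Example Artist"}]}
    }
}
# Assumed shape of a search with no matches: 'results' has no 'artist' key.
empty_response = {"resultsPage": {"results": {}}}

def first_artist_field(ids, field):
    """Return `field` of the first matching artist, or None if there is none."""
    try:
        return ids["resultsPage"]["results"]["artist"][0][field]
    except (KeyError, IndexError, TypeError):
        return None

print(first_artist_field(sample_response, "id"))
print(first_artist_field(sample_response, "displayName"))
print(first_artist_field(empty_response, "id"))
```

Returning `None` explicitly for a missing match lets the caller decide whether to skip the artist.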

**(ii) Functions to get past concerts using ids retrieved in (i)**

In [3]:
#Function to query the gigography API with an artist id
#Returns a list of past events from up to five pages, most recent first:
def get_gigography(artist_id):
    gigography_cleaned = []
    base = "http://api.songkick.com/api/3.0/artists/"
    api_key = "<apikey>"
    for page in range(1, 6):
        url_gig = base + "{}/gigography.json?apikey={}&query&order=desc&page={}".format(artist_id, api_key, page)
        try:
            response = requests.get(url_gig).json()
            gigography_result = response['resultsPage']['results']['event']
        except (KeyError, TypeError, ValueError, requests.RequestException):
            #No 'event' key (e.g. a page past the last one) or a failed request:
            #stop here and keep the pages collected so far
            break
        gigography_cleaned += gigography_result
    return gigography_cleaned
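One case worth thinking through when paging is an artist whose history spans fewer than five pages. The sketch below shows one possible way to stop at the first page without events while keeping the earlier pages. `collect_events`, `fetch_page`, and the fake pages are hypothetical stand-ins so the logic can run without an API key; the shape of an empty page is an assumption.

```python
def collect_events(fetch_page, max_pages=5):
    """Collect events page by page, stopping at the first page without events.

    `fetch_page` is a hypothetical callable that takes a page number and
    returns the decoded JSON dictionary for that page.
    """
    events = []
    for page in range(1, max_pages + 1):
        response = fetch_page(page)
        page_events = response.get("resultsPage", {}).get("results", {}).get("event")
        if not page_events:
            break  # no more results: keep what has been collected so far
        events.extend(page_events)
    return events

# Stand-in for the API: two pages of results, then empty pages (assumed shape).
fake_pages = {
    1: {"resultsPage": {"results": {"event": [{"id": 1}, {"id": 2}]}}},
    2: {"resultsPage": {"results": {"event": [{"id": 3}]}}},
}

def fake_fetch(page):
    return fake_pages.get(page, {"resultsPage": {"results": {}}})

events = collect_events(fake_fetch)
print([e["id"] for e in events])  # prints [1, 2, 3]
```

Injecting the fetch function keeps the paging logic testable independently of the network.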

In [4]:
url_gig = "http://api.songkick.com/api/3.0/artists/315398/gigography.json?apikey=<apikey>&query&order=desc&page=2"
a = requests.get(url_gig).json()

In [5]:
#Manual check: fetch the first two gigography pages for a single artist id (315398)
base = "http://api.songkick.com/api/3.0/artists/"
page = 1
page2 = 2
api_key = "<apikey>"
url_gig = base + "{}/gigography.json?apikey={}&query&order=desc&page={}".format(315398, api_key, page)
response1 = requests.get(url_gig).json()
gigography_result1 = response1['resultsPage']['results']['event']

url_gig2 = base + "{}/gigography.json?apikey={}&query&order=desc&page={}".format(315398, api_key, page2)
response2 = requests.get(url_gig2).json()
gigography_result2 = response2['resultsPage']['results']['event']

**(iii) Function to get relevant data from the json response in (ii)**

In [6]:
#Function to use gigography_result
#Begin with response['resultsPage']['results']['event'] | returns a list of concerts, or None if the data is malformed:
def get_details(gigography_result, displayName, artistname):
    concerts = []
    try:
        for event in gigography_result:
            dict1 = {}
            dict1['artist'] = artistname
            dict1['concert'] = event['displayName']
            dict1['date'] = datetime.strptime(event['start']['date'], '%Y-%m-%d')
            dict1['city'] = event['location']['city']
            dict1['lat'] = event['location']['lat']
            dict1['lon'] = event['location']['lng']
            for performance in event['performance']: #each performer on the bill
                if performance['displayName'] == displayName: #find the entry for our artist
                    dict1['role'] = performance['billing'] #the artist's billing for this concert
                    concerts.append(dict1)
                    break #append each concert only once
        return concerts
    except (KeyError, TypeError, ValueError):
        return None
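As a worked example of the fields being pulled out, the sketch below applies the same lookups to one hand-written event. All values in `sample_event` are invented, and the billing strings are only assumed examples; the keys mirror those used above. `billing_for` is a hypothetical helper introduced for illustration.

```python
from datetime import datetime

# Hand-written sample event; values are invented, keys mirror those used above.
sample_event = {
    "displayName": "Example Artist at Example Hall (June 1, 2016)",
    "start": {"date": "2016-06-01"},
    "location": {"city": "New York, NY, US", "lat": 40.7128, "lng": -74.006},
    "performance": [
        {"displayName": "Opening Act", "billing": "support"},
        {"displayName": "Example Artist", "billing": "headline"},
    ],
}

def billing_for(event, display_name):
    """Return the billing of `display_name` in `event`, or None if absent."""
    for performance in event["performance"]:
        if performance["displayName"] == display_name:
            return performance["billing"]
    return None

concert_date = datetime.strptime(sample_event["start"]["date"], "%Y-%m-%d")
print(concert_date.year, billing_for(sample_event, "Example Artist"))
print(billing_for(sample_event, "Someone Else"))
```

Matching on `displayName` rather than on the name stored in our own database matters because the two spellings can differ.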

## Creating the Songkick table and inserting data

In [7]:
#Rotate the list left by n places, so the next run starts with the artists that have not been processed yet
def shift(pickled_list, n):
    return pickled_list[n:] + pickled_list[:n]
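A small illustration of how rotating the list lets successive runs work through every artist in batches and then wrap around. The names and batch size below are placeholders, and `shift` is redefined so the snippet runs on its own.

```python
def shift(items, n):
    """Rotate `items` left by n positions."""
    return items[n:] + items[:n]

artists = ["a", "b", "c", "d", "e"]  # placeholder names
batch_size = 2

seen = []
for run in range(3):
    seen.extend(artists[:batch_size])  # the batch this run would process
    artists = shift(artists, batch_size)

print(seen)     # prints ['a', 'b', 'c', 'd', 'e', 'a']
print(artists)  # prints ['b', 'c', 'd', 'e', 'a']
```

After three runs every artist has been processed once and the third batch has wrapped around to the start of the list.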

In [8]:
#Open the pickled file containing the artist names (the with-block closes the file afterwards)
with open("/home/ubuntu/jupyter/Student_Notebooks/Assignments/Project/Crontab/songkick_artist_name_list", 'rb') as fileObject:
    old_pickled_list = pickle.load(fileObject)

In [9]:
#Restricting the search to 30 artists
list_30 = old_pickled_list[0:30]

In [10]:
input_list = []
for i in list_30:
    name_artist = search_api(i)
    id_artist = search_id(name_artist)
    displayName = search_displayName(name_artist)
    gigography_result = get_gigography(id_artist)
    details = get_details(gigography_result, displayName, i)
    input_list.append(details)

In [11]:
#Drop artists for which no details could be retrieved
input_list = [x for x in input_list if x is not None]

In [12]:
#Connecting to MySQL database
con = mdb.connect(host='localhost',
                  user='root',
                  passwd='<password>',
                  charset='utf8', use_unicode=True)

In [13]:
#Create a table for Songkick
cursor = con.cursor()
table_name = 'songkick'

#Latitude and longitude are stored as doubles so the fractional part of the coordinates is kept
create_table_query = '''CREATE TABLE IF NOT EXISTS {db}.{table} 
                                (artist varchar(250), 
                                concert_date datetime,
                                concert_name varchar(250),
                                role varchar(250),
                                city varchar(250),
                                latitude double,
                                longitude double,
                                PRIMARY KEY(artist, concert_date)
                                )'''.format(db='Project', table=table_name)
cursor.execute(create_table_query)
cursor.close()

In [14]:
#Insert songkick data
cursor = con.cursor()
table_name = 'songkick'

query_template = '''INSERT IGNORE INTO {db}.{table}(artist, 
                                            concert_date,
                                            concert_name,
                                            role,
                                            city,
                                            latitude,
                                            longitude) 
                                            VALUES (%s, %s, %s, %s, %s, %s, %s)'''.format(db="Project", table=table_name)

#INSERT IGNORE skips rows whose (artist, concert_date) key is already in the table
for data in input_list:
    for concert in data:
        query_parameters = (concert['artist'], concert['date'], concert['concert'],
                            concert['role'], concert['city'], concert['lat'], concert['lon'])
        cursor.execute(query_template, query_parameters)

con.commit()
cursor.close()
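To see how the composite primary key and the ignore-on-conflict insert interact, here is a self-contained sketch that uses Python's built-in `sqlite3` as a stand-in for MySQL. This is a swapped-in database, not the project's setup: SQLite's `INSERT OR IGNORE` plays the role of MySQL's `INSERT IGNORE`, and the rows are invented.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL 'Project' database.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE songkick (
                   artist TEXT,
                   concert_date TEXT,
                   concert_name TEXT,
                   role TEXT,
                   city TEXT,
                   latitude REAL,
                   longitude REAL,
                   PRIMARY KEY (artist, concert_date))""")

rows = [
    ("Example Artist", "2016-06-01", "Show A", "headline", "New York", 40.7128, -74.006),
    # Same artist and date as the first row: skipped because of the primary key
    ("Example Artist", "2016-06-01", "Show A (duplicate)", "headline", "New York", 40.7128, -74.006),
    ("Example Artist", "2016-06-02", "Show B", "support", "Boston", 42.3601, -71.0589),
]
con.executemany("INSERT OR IGNORE INTO songkick VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
con.commit()

stored = con.execute(
    "SELECT concert_name, latitude FROM songkick ORDER BY concert_date"
).fetchall()
print(stored)  # prints [('Show A', 40.7128), ('Show B', 42.3601)]
```

One consequence of keying on (artist, concert_date) is that two different concerts by the same artist on the same date collapse into a single row.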



## Shift the entire list and pickle the new list to replace file

In [15]:
shifted_list_names = shift(old_pickled_list, 30)

In [16]:
#Overwrite the pickle file with the shifted list so the next run starts with a new batch of artists
file_Name = "/home/ubuntu/jupyter/Student_Notebooks/Assignments/Project/Crontab/songkick_artist_name_list"
with open(file_Name, 'wb') as fileObject:
    pickle.dump(shifted_list_names, fileObject)
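The load, process, shift, and re-pickle cycle can be sketched end to end as follows. A temporary file with placeholder names stands in for the real artist list, and `next_batch` is a hypothetical helper that combines the steps.

```python
import os
import pickle
import tempfile

def shift(items, n):
    """Rotate `items` left by n positions."""
    return items[n:] + items[:n]

# Temporary file standing in for the real artist list (placeholder names).
path = os.path.join(tempfile.mkdtemp(), "artist_name_list")
with open(path, "wb") as f:
    pickle.dump(["a", "b", "c", "d"], f)

def next_batch(path, batch_size):
    """Load the list, return its first batch, and save the rotated list."""
    with open(path, "rb") as f:
        names = pickle.load(f)
    with open(path, "wb") as f:
        pickle.dump(shift(names, batch_size), f)
    return names[:batch_size]

first = next_batch(path, 3)
second = next_batch(path, 3)
print(first, second)  # prints ['a', 'b', 'c'] ['d', 'a', 'b']
```

Each call corresponds to one scheduled run: the second run picks up where the first left off and wraps around once the list is exhausted.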