# Ticketmaster Ticket Price Prediction Model

Our goal was to predict the minimum and maximum ticket price of a Hip-Hop/Rap concert using data from the Ticketmaster API.

![alt text](https://miro.medium.com/max/5484/1*Abrgi5f4y7VrVk97s8nSmQ.jpeg)

# Data Collection & Cleaning

The endpoints provided by Ticketmaster focus on three major entities: events, attractions, and venues. We chose the Event Search endpoint because it let us gather data on up to 200 events and narrow the results by location and by attributes of the event itself.

In [1]:
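import pandas as pd
import requests

api_key = "ZI0iUaNUAZvGVZt7ufAfvEGS2FBlaF2V"
# Event Search query: U.S. music events for one genre ID, up to 200 results.
params = {"classificationName": "music", "genreId": "KnvZfZ7vAv1",
          "countryCode": "US", "size": 200, "apikey": api_key}
m_resp = requests.get("https://app.ticketmaster.com/discovery/v2/events.json", params=params)
m_resp.raise_for_status()  # fail early on an HTTP error
music_json = m_resp.json()

# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize import.
df_music = pd.json_normalize(music_json["_embedded"]["events"])
df_music.head()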

Unnamed: 0,name,type,id,test,url,locale,images,classifications,promoters,priceRanges,sales.public.startDateTime,sales.public.startTBD,sales.public.endDateTime,dates.start.localDate,dates.start.localTime,dates.start.dateTime,dates.start.dateTBD,dates.start.dateTBA,dates.start.timeTBA,dates.start.noSpecificTime,dates.timezone,dates.status.code,dates.spanMultipleDays,promoter.id,promoter.name,promoter.description,seatmap.staticUrl,_links.self.href,_links.attractions,_links.venues,_embedded.venues,_embedded.attractions,info,pleaseNote,sales.presales,ticketLimit.info,accessibility.info,products,dates.access.startDateTime,dates.access.startApproximate,dates.access.endApproximate,dates.end.approximate,dates.end.noSpecificTime,dates.access.endDateTime,dates.end.localDate,dates.end.localTime,dates.end.dateTime,outlets
0,"Lit In Ac 2020 With Lil Kim, Fat Joe, Ja Rule,...",event,vv1AeZAOUGkdbcBub,False,https://www.ticketmaster.com/lit-in-ac-2020-wi...,en-us,"[{'ratio': '16_9', 'url': 'https://s1.ticketm....","[{'primary': True, 'segment': {'id': 'KZFzniwn...","[{'id': '494', 'name': 'PROMOTED BY VENUE', 'd...","[{'type': 'standard', 'currency': 'USD', 'min'...",2020-01-24T17:00:00Z,False,2020-06-21T00:00:00Z,2020-06-20,19:00:00,2020-06-20T23:00:00Z,False,False,False,False,America/New_York,rescheduled,False,494,PROMOTED BY VENUE,PROMOTED BY VENUE / NTL / USA,https://maps.ticketmaster.com/maps/geometry/3/...,/discovery/v2/events/vv1AeZAOUGkdbcBub?locale=...,[{'href': '/discovery/v2/attractions/K8vZ9171s...,[{'href': '/discovery/v2/venues/KovZpZA6AaJA?l...,"[{'name': 'Boardwalk Hall', 'type': 'venue', '...","[{'name': 'Lil Kim', 'type': 'attraction', 'id...",,,,,,,,,,,,,,,,
1,The Weeknd with Special Guests Sabrina Claudio...,event,vv1AeZA-3GkdJ8HaN,False,https://www.ticketmaster.com/the-weeknd-with-s...,en-us,"[{'ratio': '16_9', 'url': 'https://s1.ticketm....","[{'primary': True, 'segment': {'id': 'KZFzniwn...","[{'id': '4018', 'name': 'LIVE NATION - NO LN C...","[{'type': 'standard', 'currency': 'USD', 'min'...",2020-02-28T15:00:00Z,False,2020-07-07T23:00:00Z,2020-07-07,19:00:00,2020-07-07T23:00:00Z,False,False,False,False,America/New_York,onsale,False,4018,LIVE NATION - NO LN CONCERTS BRANDING,LIVE NATION - NO LN CONCERTS BRANDING / NTL / USA,https://maps.ticketmaster.com/maps/geometry/3/...,/discovery/v2/events/vv1AeZA-3GkdJ8HaN?locale=...,[{'href': '/discovery/v2/attractions/K8vZ9172L...,[{'href': '/discovery/v2/venues/KovZpZAE7vaA?l...,"[{'name': 'Prudential Center', 'type': 'venue'...","[{'name': 'The Weeknd', 'type': 'attraction', ...","To purchase advance parking for this event, pl...",To allow for more Card Members to enjoy the sh...,"[{'startDateTime': '2020-02-25T15:00:00Z', 'en...",There is an overall 8 ticket limit for this ev...,,,,,,,,,,,,
2,"Yo Gotti, Da Baby, Kevin Gates, Kash Doll & more",event,vv1AFZAqVGkdEWUmn,False,https://www.ticketmaster.com/yo-gotti-da-baby-...,en-us,"[{'ratio': '4_3', 'url': 'https://s1.ticketm.n...","[{'primary': True, 'segment': {'id': 'KZFzniwn...","[{'id': '494', 'name': 'PROMOTED BY VENUE', 'd...","[{'type': 'standard', 'currency': 'USD', 'min'...",2019-11-29T15:00:00Z,False,2020-05-20T23:00:00Z,2020-05-20,19:00:00,2020-05-20T23:00:00Z,False,False,False,False,America/New_York,rescheduled,False,494,PROMOTED BY VENUE,PROMOTED BY VENUE / NTL / USA,https://maps.ticketmaster.com/maps/geometry/3/...,/discovery/v2/events/vv1AFZAqVGkdEWUmn?locale=...,[{'href': '/discovery/v2/attractions/K8vZ917uW...,[{'href': '/discovery/v2/venues/KovZ917A25V?lo...,"[{'name': 'Little Caesars Arena', 'type': 'ven...","[{'name': 'Yo Gotti', 'type': 'attraction', 'i...",,"Originally scheduled to take place Sunday, Mar...",,There is an overall 8 ticket limit for this ev...,Accessible seating is available for wheelchair...,,,,,,,,,,,
3,Tech N9ne Enterfear Tour 2020,event,1kk8vbo9GAuEwvf,False,https://www.ticketmaster.com/tech-n9ne-enterfe...,en-us,"[{'ratio': '16_9', 'url': 'https://s1.ticketm....","[{'primary': True, 'segment': {'id': 'KZFzniwn...","[{'id': '494', 'name': 'PROMOTED BY VENUE', 'd...","[{'type': 'standard', 'currency': 'USD', 'min'...",2020-02-07T15:43:35Z,False,2020-09-11T05:00:00Z,2020-09-10,20:00:00,2020-09-11T03:00:00Z,False,False,False,False,America/Phoenix,rescheduled,False,494,PROMOTED BY VENUE,PROMOTED BY VENUE / NTL / USA,https://maps.ticketmaster.com/maps/geometry/3/...,/discovery/v2/events/1kk8vbo9GAuEwvf?locale=en-us,[{'href': '/discovery/v2/attractions/K8vZ91753...,[{'href': '/discovery/v2/venues/KovZpZAdlFaA?l...,"[{'name': 'Rialto Theatre-Tucson', 'type': 've...","[{'name': 'Tech N9ne', 'type': 'attraction', '...",Doors 7:00PM | Show 8:00PM | Ages 7+,This event has been rescheduled: Original Date...,,There is a 6 ticket limit,,,,,,,,,,,,
4,Feed The Streetz Tour 2020,event,vvG1zZpdS2wU2Y,False,https://www.ticketmaster.com/feed-the-streetz-...,en-us,"[{'ratio': '3_2', 'url': 'https://s1.ticketm.n...","[{'primary': True, 'segment': {'id': 'KZFzniwn...","[{'id': '494', 'name': 'PROMOTED BY VENUE', 'd...","[{'type': 'standard', 'currency': 'USD', 'min'...",2020-01-31T15:00:00Z,False,2020-04-24T23:00:00Z,2020-04-24,19:00:00,2020-04-24T23:00:00Z,False,False,False,False,America/New_York,onsale,False,494,PROMOTED BY VENUE,PROMOTED BY VENUE / NTL / USA,https://maps.ticketmaster.com/maps/geometry/3/...,/discovery/v2/events/vvG1zZpdS2wU2Y?locale=en-us,[{'href': '/discovery/v2/attractions/K8vZ917Ci...,[{'href': '/discovery/v2/venues/KovZpa2Xke?loc...,"[{'name': 'State Farm Arena', 'type': 'venue',...","[{'name': '2 Chainz', 'type': 'attraction', 'i...",,,,There is an overall 12 ticket limit for this e...,,,,,,,,,,,,


From there, we focused on U.S. concerts in the "Hip Hop/Rap" genre. We believed that restricting the data to one genre would produce a better model, because the remaining variation in ticket price could then be explained by other factors such as artist, location, and venue.

Most of the columns held nested JSON objects, so we mapped over them to pull individual fields (for example, the minimum and maximum price, the venue name, and the city) into their own columns, which are more useful as model features.
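
As a rough illustration of that mapping step, the sketch below pulls a few fields out of nested columns shaped like the raw output above. The helper `first_field`, the toy rows, and the `max` key are assumptions made for the example (the raw output only shows `min` before it is truncated). Computing `meanPrice` as the midpoint of the two prices is inferred from the cleaned table, where the values match that formula; it is not confirmed project code.

```python
import pandas as pd

# Two toy rows shaped like the raw API output (values are illustrative).
raw = pd.DataFrame({
    "name": ["Show A", "Show B"],
    "priceRanges": [
        [{"type": "standard", "currency": "USD", "min": 52.0, "max": 92.0}],
        None,  # some events carry no price information
    ],
    "_embedded.venues": [
        [{"name": "Boardwalk Hall", "type": "venue"}],
        [{"name": "Rams Head Live", "type": "venue"}],
    ],
})

def first_field(cell, key):
    """Return cell[0][key] when cell is a non-empty list of dicts, else None."""
    if isinstance(cell, list) and cell and isinstance(cell[0], dict):
        return cell[0].get(key)
    return None

flat = raw[["name"]].copy()
# pd.to_numeric turns missing values into NaN so the arithmetic below works.
flat["priceMin"] = pd.to_numeric(raw["priceRanges"].map(lambda c: first_field(c, "min")))
flat["priceMax"] = pd.to_numeric(raw["priceRanges"].map(lambda c: first_field(c, "max")))
flat["venueName"] = raw["_embedded.venues"].map(lambda c: first_field(c, "name"))
flat["meanPrice"] = (flat["priceMin"] + flat["priceMax"]) / 2
print(flat)
```

Events without a price range end up with NaN prices, which makes them easy to separate from the priced events later.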

In [2]:
df_music_train = pd.read_csv("https://raw.githubusercontent.com/reillynski/data301-finalproject/master/df_music.csv", index_col=0)
df_music_train.head()

Unnamed: 0,name,type,promoter.name,info,pleaseNote,priceMin,priceMax,subGenre,city,state,venueName,attractionName,artists,num.artists,latitude,longitude,venueUpcoming,meanPrice,date
0,"Lit In Ac 2020 With Lil Kim, Fat Joe, Ja Rule,...",event,PROMOTED BY VENUE,,,52.0,92.0,French Rap,Atlantic City,New Jersey,Boardwalk Hall,Lil Kim,"['Lil Kim', 'Fat Joe', 'Ja Rule', 'State Prope...",10,39.354905,-74.438391,15,72.0,2020-04-04
2,"Yo Gotti, Da Baby, Kevin Gates, Kash Doll & more",event,PROMOTED BY VENUE,,"Originally scheduled to take place Sunday, Mar...",54.0,154.0,French Rap,Detroit,Michigan,Little Caesars Arena,Yo Gotti,"['Yo Gotti', 'Kash Doll', 'DaBaby', 'Kevin Gat...",7,42.341089,-83.055434,26,104.0,2020-05-20
3,Spring MegaFest,event,PROMOTED BY VENUE,,,53.0,179.0,French Rap,Indianapolis,Indiana,Bankers Life Fieldhouse,Lil Baby,"['Lil Baby', '2 Chainz', 'Rod Wave', 'Jacquees...",5,39.764064,-86.155507,8,116.0,2020-04-10
4,No Limit Reunion Tour,event,PROMOTED BY VENUE,,Artists subject to change. All sales are final...,55.0,195.0,French Rap,Atlanta,Georgia,State Farm Arena,Master P,"['Master P', 'Mia X', 'Silkk the Shocker', 'My...",5,33.757796,-84.394569,21,125.0,2020-05-01
5,Feed The Streetz Tour 2020,event,PROMOTED BY VENUE,,Lineup subject to change.,75.0,175.0,French Rap,Brooklyn,New York,Barclays Center,Rick Ross,"['Rick Ross', 'Jeezy', '2 Chainz', 'Yo Gotti',...",9,40.683504,-73.976617,21,125.0,2020-05-15


# Data Exploration

In [3]:
from altair import Chart

df_music = pd.read_csv("https://raw.githubusercontent.com/reillynski/data301-finalproject/master/df_music.csv", index_col=0)

df_music["date"] = pd.to_datetime(df_music["date"])
df_music["month"] = df_music["date"].dt.month

# Number of events in each month
Chart(df_music).mark_line().encode(
    x="month",
    y="count()"
)

In this dataset, events run from March through October, with the most events in May. If we repeated this analysis, we would expect the distribution above to shift to the right, because many events were postponed by weeks or even months during the COVID-19 outbreak.

In [4]:
Chart(df_music).mark_errorbar().encode(
    x="state",
    y="min(priceMin)",
    y2="max(priceMax)"
) + Chart(df_music).mark_circle().encode(
    x="state",
    y="mean(meanPrice)"
)

This graph shows the minimum, maximum, and average ticket prices for each state with events in our dataset. Most states have a minimum price of around \$30 and an average price between \$50 and \$150. Maximum prices vary much more: some states top out around \$50, while others exceed \$600.
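
The same per-state aggregates can also be computed as a table. The sketch below uses five rows copied from the cleaned table shown earlier; the output column names `lowest`, `highest`, and `average` are made up for the example.

```python
import pandas as pd

# Five rows copied from the head of the cleaned table
sample = pd.DataFrame({
    "state": ["New Jersey", "Michigan", "Indiana", "Georgia", "New York"],
    "priceMin": [52.0, 54.0, 53.0, 55.0, 75.0],
    "priceMax": [92.0, 154.0, 179.0, 195.0, 175.0],
    "meanPrice": [72.0, 104.0, 116.0, 125.0, 125.0],
})

# Same aggregates as the chart: lowest minimum, highest maximum, average mean price
by_state = sample.groupby("state").agg(
    lowest=("priceMin", "min"),
    highest=("priceMax", "max"),
    average=("meanPrice", "mean"),
)
print(by_state)
```

Running the same aggregation on the full `df_music` frame would give the numbers behind each error bar and dot.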

# Machine Learning

We decided to create two separate machine learning models: one to predict the minimum ticket price and another to predict the maximum ticket price. For both priceMin and priceMax, we searched for the set of features that minimized error for each type of model.

The features we tested were the following: 


1.   **Quantitative:** num.artists, latitude, longitude, venueUpcoming, date_quant
2.   **Categorical:** promoter.name, subGenre, city, state, venueName, attractionName
3.   **Text:** name, info, pleaseNote
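
The feature date_quant is not defined in this write-up. Its values in the predictions table further down are consistent with `month * 30 + day`, so the sketch below reproduces a few of them under that assumption. The function is a guess at the encoding, not confirmed project code.

```python
from datetime import date

def date_quant(d: date) -> int:
    """Hypothetical encoding: 30 per month plus the day of the month."""
    return d.month * 30 + d.day

# (date, date_quant) pairs copied from the predictions table
examples = [
    (date(2020, 6, 9), 189),
    (date(2020, 4, 10), 130),
    (date(2020, 5, 12), 162),
    (date(2020, 8, 30), 270),
]

for d, listed in examples:
    print(d.isoformat(), "computed:", date_quant(d), "listed:", listed)
```

An encoding like this gives the models a single number that increases through the year, at the cost of treating every month as 30 days long.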



The machine learning models we used were the following: 

1.   K-Nearest Neighbors
2.   Linear Regression
3.   RandomForest



For priceMin, the model with the lowest error was RandomForest, using the following features: num.artists, latitude, promoter.name, subGenre, city, state, venueName, attractionName, name, info, pleaseNote. It had an RMSE of about 13.72.

For priceMax, the model with the lowest error was also RandomForest, using the following features: num.artists, latitude, longitude, venueUpcoming, promoter.name, subGenre, city, state, venueName, attractionName, name. It had an RMSE of about 63.44.
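
A comparison like this could be set up with scikit-learn roughly as follows. This is a minimal sketch on synthetic data, not the project's code: the write-up does not say how the RMSE was estimated, so 5-fold cross-validation, the scaling and encoding choices, and all hyperparameters are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the cleaned events table (all values are made up).
rng = np.random.default_rng(0)
n = 60
toy = pd.DataFrame({
    "num.artists": rng.integers(1, 10, n),
    "latitude": rng.uniform(26, 45, n),
    "state": rng.choice(["New York", "Texas", "Georgia"], n),
    "name": rng.choice(["Spring Tour", "Reunion Tour", "Fest 2020"], n),
})
toy["priceMin"] = 30 + 3 * toy["num.artists"] + rng.normal(0, 5, n)

# One transformer per feature type: quantitative, categorical, text.
features = ColumnTransformer([
    ("quant", StandardScaler(), ["num.artists", "latitude"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["state"]),
    ("text", TfidfVectorizer(), "name"),  # a text column is passed as a single name
])

models = {
    "K-Nearest Neighbors": KNeighborsRegressor(n_neighbors=5),
    "Linear Regression": LinearRegression(),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
}

X, y = toy.drop(columns="priceMin"), toy["priceMin"]
rmse = {}
for label, model in models.items():
    pipeline = make_pipeline(features, model)
    scores = cross_val_score(pipeline, X, y, cv=5,
                             scoring="neg_root_mean_squared_error")
    rmse[label] = -scores.mean()  # scores are negated RMSE values
    print(f"{label}: RMSE = {rmse[label]:.2f}")
```

Repeating the loop over different feature subsets, once with priceMin and once with priceMax as the target, would give the kind of search described above.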

**Predictions**

To make our predictions, we used a test set consisting of events whose priceRanges field was missing. We predicted both priceMin and priceMax for each event using the two models described above.
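
The split and the two predictions can be sketched as follows. The toy table, the two numeric features, and the RandomForest settings are assumptions chosen to keep the example short; the project's models also used categorical and text features.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Toy events table (illustrative values); NaN prices mark events with no priceRanges.
events = pd.DataFrame({
    "num.artists":   [10, 7, 5, 5, 9, 1, 2],
    "venueUpcoming": [15, 26, 8, 21, 21, 16, 56],
    "priceMin": [52.0, 54.0, 53.0, 55.0, 75.0, np.nan, np.nan],
    "priceMax": [92.0, 154.0, 179.0, 195.0, 175.0, np.nan, np.nan],
})
feature_cols = ["num.artists", "venueUpcoming"]

# Priced events train the models; unpriced events form the test set.
train = events[events["priceMin"].notna()]
test = events[events["priceMin"].isna()].copy()

# One independent model per target.
for target in ["priceMin", "priceMax"]:
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(train[feature_cols], train[target])
    test[target] = model.predict(test[feature_cols])

print(test)
```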

In [5]:
df_music_predictions = pd.read_csv("https://raw.githubusercontent.com/reillynski/data301-finalproject/master/df_music_predictions.csv", index_col=0)
df_music_predictions

Unnamed: 0,name,type,promoter.name,info,pleaseNote,subGenre,city,state,venueName,attractionName,artists,num.artists,latitude,longitude,venueUpcoming,date,date_quant,priceMin,priceMax
0,Kevin Gates,event,,,,Hip-Hop/Rap,Baltimore,Maryland,Rams Head Live,Kevin Gates,['Kevin Gates'],1,39.297401,-76.607399,16,2020-06-09,189,37.8339,69.1504
1,NF - The Search Tour,event,LIVE NATION MUSIC,,,French Rap,Tulsa,Oklahoma,Brady Theater,NF,['NF'],1,36.158186,-95.995284,6,2020-04-10,130,32.7263,74.3717
2,NF - The Search Tour,event,LIVE NATION MUSIC,,,French Rap,Milwaukee,Wisconsin,Eagles Club/The Rave/Eagles Ballroom,NF,['NF'],1,43.038074,-87.943308,19,2020-04-16,136,37.3423,70.783
3,NF - The Search Tour,event,LIVE NATION MUSIC,,,French Rap,Buffalo,New York,Buffalo RiverWorks,NF,['NF'],1,42.869917,-78.872638,1,2020-04-18,138,37.0948,80.139
4,NF - The Search Tour,event,LIVE NATION MUSIC,,,French Rap,Kansas City,Missouri,Starlight Theatre,NF,['NF'],1,39.006963,-94.531517,49,2020-05-12,162,35.5312,54.8035
5,POSTPONED :: Watsky - Placement Album Tour,event,,Doors: 7 p.m. || Music: 8 p.m. || All Ages$20:...,,Alternative Rap,Lincoln,Nebraska,Bourbon Theatre,Watsky,"['Watsky', 'Feed the Biirds']",2,40.813344,-96.700617,56,2020-04-28,148,30.3495,43.3395
6,[POSTPONED] Watsky - Placement Album Tour,event,,"ALL AGESSHOW POSTPONED:Unfortunately, WATSKY a...",,Urban,Boise,Idaho,Knitting Factory Concert House - Boise,Watsky,"['Watsky', 'Hollis']",2,43.613149,-116.207134,56,2020-05-05,155,35.2915,63.445
7,Pitbull,event,,,,Hip-Hop/Rap,Edinburg,Texas,Bert Ogden Arena,Pitbull,['Pitbull'],1,26.2938,-98.1548,11,2020-05-10,160,35.071,72.5727
8,Pitbull,event,,,,Hip-Hop/Rap,Vienna,Virginia,Filene Center,Pitbull,['Pitbull'],1,38.9062,-77.294899,54,2020-08-30,270,35.0288,33.8


In [6]:
df_music_predictions[["name", "subGenre", "city", "state", "attractionName", "num.artists", "priceMin", "priceMax"]]

Unnamed: 0,name,subGenre,city,state,attractionName,num.artists,priceMin,priceMax
0,Kevin Gates,Hip-Hop/Rap,Baltimore,Maryland,Kevin Gates,1,37.8339,69.1504
1,NF - The Search Tour,French Rap,Tulsa,Oklahoma,NF,1,32.7263,74.3717
2,NF - The Search Tour,French Rap,Milwaukee,Wisconsin,NF,1,37.3423,70.783
3,NF - The Search Tour,French Rap,Buffalo,New York,NF,1,37.0948,80.139
4,NF - The Search Tour,French Rap,Kansas City,Missouri,NF,1,35.5312,54.8035
5,POSTPONED :: Watsky - Placement Album Tour,Alternative Rap,Lincoln,Nebraska,Watsky,2,30.3495,43.3395
6,[POSTPONED] Watsky - Placement Album Tour,Urban,Boise,Idaho,Watsky,2,35.2915,63.445
7,Pitbull,Hip-Hop/Rap,Edinburg,Texas,Pitbull,1,35.071,72.5727
8,Pitbull,Hip-Hop/Rap,Vienna,Virginia,Pitbull,1,35.0288,33.8


In [7]:
Chart(df_music_predictions).mark_circle().encode(
    x="priceMin", 
    y="priceMax",
    color="subGenre"
)

Based on the scatterplot, there appears to be a positive linear relationship between the predicted priceMin and the predicted priceMax.
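
One way to check this impression numerically is to compute the correlation between the two predicted columns. The sketch below uses the nine predictions listed above. It also flags rows where the predicted maximum is lower than the predicted minimum, which can happen because the two models are fit independently.

```python
import pandas as pd

# priceMin / priceMax copied from the nine predictions shown above
pred = pd.DataFrame({
    "priceMin": [37.8339, 32.7263, 37.3423, 37.0948, 35.5312,
                 30.3495, 35.2915, 35.071, 35.0288],
    "priceMax": [69.1504, 74.3717, 70.783, 80.139, 54.8035,
                 43.3395, 63.445, 72.5727, 33.8],
})

# Pearson correlation between the two predicted prices
r = pred["priceMin"].corr(pred["priceMax"])
print(f"Pearson r = {r:.2f}")

# Rows where the predicted maximum falls below the predicted minimum
inverted = pred[pred["priceMax"] < pred["priceMin"]]
print(inverted)
```

With only nine points, the correlation is a rough indication rather than strong evidence of a linear relationship.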

# Questions?