# Ticketmaster Ticket Price Prediction Model

Our goal was to predict the minimum and maximum ticket prices of Hip-Hop/Rap concerts using data from the Ticketmaster API.

![alt text](https://miro.medium.com/max/5484/1*Abrgi5f4y7VrVk97s8nSmQ.jpeg)

# Data Collection & Cleaning

Ticketmaster's endpoints focus on three major entities: events, attractions, and venues. We chose the Event Search endpoint because it let us gather data on up to 200 events, which we could filter by location and by attributes of the event itself.
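As a rough illustration of such a query, the sketch below builds an Event Search URL without sending it. The endpoint path and parameter names follow the public Discovery API v2, but the filter values and the `build_event_search_url` helper are assumptions for illustration, not the exact request this project used.

```python
from urllib.parse import urlencode

# Discovery API v2 Event Search endpoint.
EVENT_SEARCH_URL = "https://app.ticketmaster.com/discovery/v2/events.json"


def build_event_search_url(api_key, page=0, size=200):
    """Build an Event Search URL (hypothetical helper, illustrative filters)."""
    params = {
        "apikey": api_key,
        "classificationName": "Hip-Hop/Rap",  # assumed genre filter
        "countryCode": "US",                  # assumed location filter
        "size": size,                         # number of events per page
        "page": page,                         # page number to request
    }
    return f"{EVENT_SEARCH_URL}?{urlencode(params)}"


url = build_event_search_url("YOUR_API_KEY")
print(url)
```

Building the URL separately from sending it keeps the filters easy to inspect before spending any API quota.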

From there, we decided to analyze U.S. concerts in the "Hip-Hop/Rap" genre. We believed that focusing on a single genre would produce a better model: with genre held fixed, the model could attribute variation in ticket price to the other factors we considered, such as artist, location, and venue.
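To turn each event into one row of artist, location, venue, and price fields, the nested JSON has to be flattened. The sketch below shows one way this might look. The `flatten_event` helper and the sample event are hypothetical, and the JSON keys reflect our understanding of the Event Search response rather than the project's actual cleaning code.

```python
def flatten_event(event):
    """Pull a few modeling fields out of one event record (hypothetical helper)."""
    embedded = event.get("_embedded", {})
    venue = (embedded.get("venues") or [{}])[0]
    attractions = embedded.get("attractions") or []
    # Events without a priceRanges entry end up with missing prices.
    price_range = (event.get("priceRanges") or [{}])[0]
    return {
        "name": event.get("name"),
        "city": venue.get("city", {}).get("name"),
        "state": venue.get("state", {}).get("name"),
        "venueName": venue.get("name"),
        "artists": [a.get("name") for a in attractions],
        "num.artists": len(attractions),
        "priceMin": price_range.get("min"),
        "priceMax": price_range.get("max"),
    }


# Made-up event used only to exercise the helper.
sample_event = {
    "name": "Example Rap Show",
    "priceRanges": [{"type": "standard", "currency": "USD", "min": 35.0, "max": 80.0}],
    "_embedded": {
        "venues": [{"name": "Example Hall", "city": {"name": "Tulsa"}, "state": {"name": "Oklahoma"}}],
        "attractions": [{"name": "Artist A"}, {"name": "Artist B"}],
    },
}

row = flatten_event(sample_event)
unpriced = flatten_event({"name": "No Prices Yet"})
print(row)
```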

# Data Exploration

In [1]:
import pandas as pd
from altair import Chart

# Load the event data collected from the Ticketmaster API.
df_music = pd.read_csv("https://raw.githubusercontent.com/reillynski/data301-finalproject/master/df_music.csv", index_col=0)

# Parse the event date and extract the month number (1-12).
df_music['date'] = pd.to_datetime(df_music['date'])
df_music["month"] = df_music["date"].dt.month

# Plot the number of events in each month.
Chart(df_music).mark_line().encode(
    x="month",
    y="count()"
)

In this dataset, events run from March through October, and May has the most events. If we repeated this analysis, we would expect the graph above to shift to the right, because many events were postponed by weeks or even months due to the COVID-19 outbreak.
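The same monthly counts can also be read off as a table. The sketch below uses a small made-up stand-in for `df_music` (the dates are not from the real dataset) to show the idea.

```python
import pandas as pd

# Tiny stand-in for df_music; these dates are made up for illustration.
events = pd.DataFrame({"date": ["2020-03-14", "2020-05-02", "2020-05-20", "2020-10-01"]})
events["date"] = pd.to_datetime(events["date"])
events["month"] = events["date"].dt.month

# Count events per month and find the busiest month.
events_per_month = events.groupby("month").size()
busiest_month = events_per_month.idxmax()

print(events_per_month)
print("Busiest month:", busiest_month)
```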

In [2]:
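# Error bars span each state's lowest priceMin to its highest priceMax.
Chart(df_music).mark_errorbar().encode(
    x="state",
    y="min(priceMin)",
    y2="max(priceMax)"
) + Chart(df_music).mark_circle().encode(  # circles mark each state's mean price
    x="state",
    y="mean(meanPrice)"
)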

This graph shows the minimum, maximum, and average ticket prices in each state that has events in our dataset. Most states have a minimum price of around \$30 and an average price between \$50 and \$150. Maximum prices vary widely: some states top out at around \$50, while others exceed \$600.
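The numbers behind a chart like this can be computed with a grouped aggregation. The sketch below uses made-up prices (not values from `df_music`) and reuses the column names from the chart.

```python
import pandas as pd

# Made-up prices for illustration; the column names mirror df_music.
prices = pd.DataFrame({
    "state": ["Texas", "Texas", "Ohio"],
    "priceMin": [30.0, 45.0, 25.0],
    "priceMax": [120.0, 600.0, 50.0],
    "meanPrice": [75.0, 322.5, 37.5],
})

# One row per state: lowest minimum, highest maximum, and average price.
state_summary = prices.groupby("state").agg(
    lowest_min=("priceMin", "min"),
    highest_max=("priceMax", "max"),
    average_price=("meanPrice", "mean"),
)
print(state_summary)
```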

# Machine Learning

We built two separate machine learning models: one to predict the minimum ticket price (priceMin) and another to predict the maximum ticket price (priceMax). For each target, we searched for the set of features that minimized error under each type of model.

The features we tested were the following: 


1.   **Quantitative:** num.artists, latitude, longitude, venueUpcoming, date_quant
2.   **Categorical:** promoter.name, subGenre, city, state, venueName, attractionName
3.   **Text:** name, info, pleaseNote
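This report does not define date_quant. In the prediction table later in this report, every displayed value equals month × 30 + day (for example, 2020-06-09 maps to 189), so one plausible reconstruction, offered as an inference rather than the project's confirmed code, is:

```python
import pandas as pd

# Three dates taken from the prediction table in this report.
dates = pd.to_datetime(pd.Series(["2020-06-09", "2020-04-10", "2020-08-30"]))

# Inferred formula: month * 30 + day (an assumption based on the table values).
date_quant = dates.dt.month * 30 + dates.dt.day
print(date_quant.tolist())  # [189, 130, 270]
```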



The machine learning models we used were the following: 

1.   K-Nearest Neighbors
2.   Linear Regression
3.   RandomForest
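A comparison of these three model types could be set up roughly as follows. This is a minimal sketch on synthetic data: the preprocessing choices (scaling the quantitative columns, one-hot encoding the categorical ones, TF-IDF for text) and the cross-validation setup are assumptions, since the report does not say how the features were encoded.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the training data; every value is made up.
rng = np.random.default_rng(0)
n = 60
train = pd.DataFrame({
    "num.artists": rng.integers(1, 4, n),
    "latitude": rng.uniform(25, 48, n),
    "state": rng.choice(["Texas", "Ohio", "Idaho"], n),
    "name": rng.choice(["Spring Tour", "Summer Fest", "Album Release Show"], n),
})
train["priceMin"] = 20 + 5 * train["num.artists"] + rng.normal(0, 2, n)

# Assumed encoding: one transformer per feature type.
features = ColumnTransformer([
    ("quantitative", StandardScaler(), ["num.artists", "latitude"]),
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["state"]),
    ("text", TfidfVectorizer(), "name"),
])

models = {
    "K-Nearest Neighbors": KNeighborsRegressor(),
    "Linear Regression": LinearRegression(),
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
}

X = train.drop(columns="priceMin")
y = train["priceMin"]

# Cross-validated RMSE for each model type (lower is better).
cv_rmse = {}
for label, model in models.items():
    pipeline = make_pipeline(features, model)
    scores = cross_val_score(pipeline, X, y, cv=5,
                             scoring="neg_root_mean_squared_error")
    cv_rmse[label] = -scores.mean()

for label, score in cv_rmse.items():
    print(f"{label}: {score:.2f}")
```

Repeating this loop over different feature subsets is one way to search for the combination with the lowest error.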



For priceMin, the model with the lowest error was RandomForest, using the following features: num.artists, latitude, promoter.name, subGenre, city, state, venueName, attractionName, name, info, pleaseNote. It had an RMSE of about \$13.72.

For priceMax, the model with the lowest error was RandomForest, using the following features: num.artists, latitude, longitude, venueUpcoming, promoter.name, subGenre, city, state, venueName, attractionName, name. It had an RMSE of about \$63.44.
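For reference, RMSE is the square root of the mean squared difference between actual and predicted prices, so it is expressed in the same units as the prices. A small worked example with made-up numbers:

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up prices: the errors are 3, 4, and 0 dollars.
actual_prices = [30.0, 45.0, 60.0]
predicted_prices = [33.0, 41.0, 60.0]

example_rmse = rmse(actual_prices, predicted_prices)
print(round(example_rmse, 2))  # 2.89
```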

**Predictions**

To make our predictions, we used a test set consisting of events whose priceRanges variable was missing. We predicted priceMin and priceMax for each event using the two models described above.
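The split described here can be sketched as follows: rows with a known price train the model, and rows with a missing price receive predictions. The data and the two-feature model are made up for illustration and are far simpler than the models described above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Made-up events; NaN marks events whose price range was missing.
events = pd.DataFrame({
    "num.artists": [1, 2, 1, 3, 2, 1],
    "latitude": [39.3, 36.2, 43.0, 42.9, 39.0, 40.8],
    "priceMin": [35.0, 40.0, 30.0, 55.0, np.nan, np.nan],
})
feature_columns = ["num.artists", "latitude"]

# Train on events with a known price; predict for the rest.
has_price = events["priceMin"].notna()
train = events[has_price]
test = events[~has_price].copy()

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(train[feature_columns], train["priceMin"])
test["priceMin"] = model.predict(test[feature_columns])

print(test)
```

The same steps would be repeated with priceMax as the target to fill in the second column.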

In [3]:
df_music_predictions = pd.read_csv("https://raw.githubusercontent.com/reillynski/data301-finalproject/master/df_music_predictions.csv", index_col=0)
df_music_predictions

| | name | type | promoter.name | info | pleaseNote | subGenre | city | state | venueName | attractionName | artists | num.artists | latitude | longitude | venueUpcoming | date | date_quant | priceMin | priceMax |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Kevin Gates | event | | | | Hip-Hop/Rap | Baltimore | Maryland | Rams Head Live | Kevin Gates | ['Kevin Gates'] | 1 | 39.297401 | -76.607399 | 16 | 2020-06-09 | 189 | 37.8339 | 69.1504 |
| 1 | NF - The Search Tour | event | LIVE NATION MUSIC | | | French Rap | Tulsa | Oklahoma | Brady Theater | NF | ['NF'] | 1 | 36.158186 | -95.995284 | 6 | 2020-04-10 | 130 | 32.7263 | 74.3717 |
| 2 | NF - The Search Tour | event | LIVE NATION MUSIC | | | French Rap | Milwaukee | Wisconsin | Eagles Club/The Rave/Eagles Ballroom | NF | ['NF'] | 1 | 43.038074 | -87.943308 | 19 | 2020-04-16 | 136 | 37.3423 | 70.783 |
| 3 | NF - The Search Tour | event | LIVE NATION MUSIC | | | French Rap | Buffalo | New York | Buffalo RiverWorks | NF | ['NF'] | 1 | 42.869917 | -78.872638 | 1 | 2020-04-18 | 138 | 37.0948 | 80.139 |
| 4 | NF - The Search Tour | event | LIVE NATION MUSIC | | | French Rap | Kansas City | Missouri | Starlight Theatre | NF | ['NF'] | 1 | 39.006963 | -94.531517 | 49 | 2020-05-12 | 162 | 35.5312 | 54.8035 |
| 5 | POSTPONED :: Watsky - Placement Album Tour | event | | Doors: 7 p.m. \|\| Music: 8 p.m. \|\| All Ages$20:... | | Alternative Rap | Lincoln | Nebraska | Bourbon Theatre | Watsky | ['Watsky', 'Feed the Biirds'] | 2 | 40.813344 | -96.700617 | 56 | 2020-04-28 | 148 | 30.3495 | 43.3395 |
| 6 | [POSTPONED] Watsky - Placement Album Tour | event | | ALL AGESSHOW POSTPONED:Unfortunately, WATSKY a... | | Urban | Boise | Idaho | Knitting Factory Concert House - Boise | Watsky | ['Watsky', 'Hollis'] | 2 | 43.613149 | -116.207134 | 56 | 2020-05-05 | 155 | 35.2915 | 63.445 |
| 7 | Pitbull | event | | | | Hip-Hop/Rap | Edinburg | Texas | Bert Ogden Arena | Pitbull | ['Pitbull'] | 1 | 26.2938 | -98.1548 | 11 | 2020-05-10 | 160 | 35.071 | 72.5727 |
| 8 | Pitbull | event | | | | Hip-Hop/Rap | Vienna | Virginia | Filene Center | Pitbull | ['Pitbull'] | 1 | 38.9062 | -77.294899 | 54 | 2020-08-30 | 270 | 35.0288 | 33.8 |


In [4]:
df_music_predictions[["name", "subGenre", "city", "state", "attractionName", "num.artists", "priceMin", "priceMax"]]

| | name | subGenre | city | state | attractionName | num.artists | priceMin | priceMax |
|---|---|---|---|---|---|---|---|---|
| 0 | Kevin Gates | Hip-Hop/Rap | Baltimore | Maryland | Kevin Gates | 1 | 37.8339 | 69.1504 |
| 1 | NF - The Search Tour | French Rap | Tulsa | Oklahoma | NF | 1 | 32.7263 | 74.3717 |
| 2 | NF - The Search Tour | French Rap | Milwaukee | Wisconsin | NF | 1 | 37.3423 | 70.783 |
| 3 | NF - The Search Tour | French Rap | Buffalo | New York | NF | 1 | 37.0948 | 80.139 |
| 4 | NF - The Search Tour | French Rap | Kansas City | Missouri | NF | 1 | 35.5312 | 54.8035 |
| 5 | POSTPONED :: Watsky - Placement Album Tour | Alternative Rap | Lincoln | Nebraska | Watsky | 2 | 30.3495 | 43.3395 |
| 6 | [POSTPONED] Watsky - Placement Album Tour | Urban | Boise | Idaho | Watsky | 2 | 35.2915 | 63.445 |
| 7 | Pitbull | Hip-Hop/Rap | Edinburg | Texas | Pitbull | 1 | 35.071 | 72.5727 |
| 8 | Pitbull | Hip-Hop/Rap | Vienna | Virginia | Pitbull | 1 | 35.0288 | 33.8 |


In [5]:
Chart(df_music_predictions).mark_circle().encode(
    x="priceMin", 
    y="priceMax",
    color="subGenre"
)

Based on the scatterplot, there appears to be a positive linear relationship between priceMin and priceMax. 
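One way to put a number on this relationship is the Pearson correlation. The sketch below uses only the nine predicted rows displayed earlier in this report, so it is a rough check rather than a full analysis. Because the two prices come from separate models, it also flags any row where the predicted maximum falls below the predicted minimum.

```python
import pandas as pd

# The nine predicted rows displayed earlier in this report.
predictions = pd.DataFrame({
    "city": ["Baltimore", "Tulsa", "Milwaukee", "Buffalo", "Kansas City",
             "Lincoln", "Boise", "Edinburg", "Vienna"],
    "priceMin": [37.8339, 32.7263, 37.3423, 37.0948, 35.5312,
                 30.3495, 35.2915, 35.071, 35.0288],
    "priceMax": [69.1504, 74.3717, 70.783, 80.139, 54.8035,
                 43.3395, 63.445, 72.5727, 33.8],
})

# Pearson correlation between the two predicted prices.
correlation = predictions["priceMin"].corr(predictions["priceMax"])
print("Correlation:", round(correlation, 2))

# Rows where the separately predicted maximum is below the minimum.
inconsistent = predictions[predictions["priceMax"] < predictions["priceMin"]]
print(inconsistent)
```

Among these rows, only the Vienna, Virginia event has a predicted priceMax (33.8) below its predicted priceMin (35.0288).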

# Questions?