# Business Analytics

We have provided a sample of 460 games (230 from the 2016-17 season and 230 from the 2017-18 season) for which your task is to predict total international viewers. To assist with your model, we have provided viewership numbers by country for 1,000 games in each of the two seasons, plus stats on team and player performance and status.

#### Using these inputs, we would like you to predict the total number of international viewers for each of the 460 games in the test set.

Team game stats and player stats are provided for all games in the training data set, but have been removed from the games in the test set.

You will be graded on Mean Absolute Percentage Error (MAPE) on Total Viewers. We selected this metric due to natural scaling in the international viewership data. This metric is defined as:

\begin{equation*}
\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^n \left\lvert \frac{A_i-P_i}{A_i} \right\rvert
\end{equation*}

where $n = 460$ is the number of observations in the test set, and $A_i$ and $P_i$ are the $i$th actual and predicted Total Viewers.

Some tips to help you in your modeling:
- Consider all factors that may drive viewership. Team strength may be one, but there may be others such as market size and/or social following.
- Consider temporal/seasonal effects such as day of week, opening day/week and holidays.
- Consider using other public information, such as Google Trends or further historical team performance, if you find it helpful. This is not required.
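
As a quick check on the grading metric, the MAPE definition above can be sketched in plain Python. The function name `mape` and the example numbers are illustrative assumptions, not part of the provided materials; the sketch returns the error as a fraction and refuses actual values of zero, since the formula divides by $A_i$.

```python
def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.10 means 10%)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one observation is required")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            # The metric divides by the actual value, so zero is undefined.
            raise ValueError("actual values must be non-zero")
        total += abs((a - p) / a)
    return total / len(actual)


# Illustrative numbers only; these are NOT taken from the data set.
example_actual = [100, 200, 400]
example_predicted = [110, 180, 400]
example_score = mape(example_actual, example_predicted)
print(f"MAPE: {example_score:.4f}")
```

Because each error is divided by the actual value, a miss of 10 viewers on a game with 100 viewers costs as much as a miss of 20 on a game with 200.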

Please return a copy of test\_set\_[Team\_Name].csv with the “Total\_Viewers” column filled in with your predictions. Please also return all code or relevant working files. Thank you!

In [1]:
# import packages
import pandas as pd

### Load 'game_data.csv'

This dataset includes incoming wins and losses (i.e., record going into a game), dates, and selected game stats for each game in the 2016-17 and 2017-18 seasons. Game stats are only included for games in the training set – not in the test set.

In [2]:
Game_Data = pd.read_csv('game_data.csv', sep=',', header = 0)
Game_Data.head()

Unnamed: 0,Season,Game_ID,Game_Date,Team,Location,Wins_Entering_Gm,Losses_Entering_Gm,Team_Minutes,Final_Score,Lead_Changes,Ties,Largest_Lead,Full_Timeouts,Short_Timeouts,Qtr_4_Score,L2M_Score
0,2016-17,21600001,10/25/2016,CLE,H,0,0,240.0,117.0,1.0,3.0,32.0,4.0,1.0,82.0,113.0
1,2016-17,21600001,10/25/2016,NYK,A,0,0,240.0,88.0,0.0,1.0,4.0,4.0,2.0,64.0,84.0
2,2016-17,21600002,10/25/2016,POR,H,0,0,240.0,113.0,6.0,6.0,13.0,5.0,0.0,77.0,104.0
3,2016-17,21600002,10/25/2016,UTA,A,0,0,240.0,104.0,5.0,1.0,8.0,5.0,2.0,83.0,102.0
4,2016-17,21600003,10/25/2016,GSW,H,0,0,240.0,100.0,1.0,2.0,4.0,3.0,1.0,77.0,100.0
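
The tips above mention day-of-week effects. One possible way to derive such features from `Game_Date` is sketched below, using a few columns of the five rows displayed above rather than the full file. The helper name `add_date_features` and the new column names `Day_Of_Week` and `Is_Weekend` are assumptions made for illustration.

```python
import pandas as pd

# A few columns from the five rows shown above (not the full file).
game_sample = pd.DataFrame({
    "Game_ID": [21600001, 21600001, 21600002, 21600002, 21600003],
    "Game_Date": ["10/25/2016"] * 5,
    "Team": ["CLE", "NYK", "POR", "UTA", "GSW"],
    "Location": ["H", "A", "H", "A", "H"],
})


def add_date_features(games):
    """Return a copy with parsed dates and simple calendar features."""
    out = games.copy()
    # Dates in the displayed rows look like month/day/year.
    out["Game_Date"] = pd.to_datetime(out["Game_Date"], format="%m/%d/%Y")
    out["Day_Of_Week"] = out["Game_Date"].dt.day_name()
    # pandas numbers Monday as 0, so 5 and 6 are Saturday and Sunday.
    out["Is_Weekend"] = out["Game_Date"].dt.dayofweek >= 5
    return out


game_features = add_date_features(game_sample)
print(game_features[["Game_ID", "Team", "Day_Of_Week", "Is_Weekend"]])
```

Holiday or opening-week flags could be added in the same way, but they would need a calendar that is not part of the provided files.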


### Load 'player_data.csv'

This dataset includes performance stats for each player in each game in the 2016-17 and 2017-18 seasons. It also includes indicators for whether the player was selected as an All-Star for the season in question and whether the player was active for the game. Game statistics are only included for games in the training set – not in the test set. All-Star status and active vs. inactive status are provided for all games.

In [3]:
Player_Data = pd.read_csv('player_data.csv', sep=',', header = 0)
Player_Data.head()

Unnamed: 0,Season,Game_ID,Game_Date,Team,Person_ID,Name,ASG_Team,Active_Status,Minutes,Points,...,Steals,Blocks,Turnovers,Field_Goals,Field_Goals_Attempted,Three_Pointers,Three_Pointers_Attempted,Free_Throws,Free_Throws_Attempted,Personal_Fouls
0,2016-17,21600001,10/25/2016,CLE,101112,Channing Frye,,Active,14.0,6.0,...,1.0,1.0,1.0,2.0,6.0,2.0,5.0,0.0,0.0,4.0
1,2016-17,21600001,10/25/2016,CLE,201567,Kevin Love,East ASG,Active,25.0,23.0,...,3.0,0.0,2.0,6.0,15.0,2.0,6.0,9.0,12.0,3.0
2,2016-17,21600001,10/25/2016,CLE,202681,Kyrie Irving,East ASG,Active,30.0,29.0,...,2.0,0.0,0.0,12.0,22.0,4.0,7.0,1.0,1.0,1.0
3,2016-17,21600001,10/25/2016,CLE,202684,Tristan Thompson,,Active,22.0,0.0,...,0.0,0.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,4.0
4,2016-17,21600001,10/25/2016,CLE,202697,Iman Shumpert,,Active,14.0,2.0,...,0.0,1.0,2.0,1.0,3.0,0.0,1.0,0.0,0.0,2.0
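
Since All-Star and active status are available for every game, including test games, one candidate feature is the number of active All-Stars per team per game. The sketch below uses only the five rows displayed above. It assumes that a blank `ASG_Team` is read as missing and that `"Active"` is the label for an active player; the helper name and the `Active_All_Stars` column are hypothetical.

```python
import pandas as pd

# Selected columns from the five rows shown above (not the full file).
player_sample = pd.DataFrame({
    "Game_ID": [21600001] * 5,
    "Team": ["CLE"] * 5,
    "Name": ["Channing Frye", "Kevin Love", "Kyrie Irving",
             "Tristan Thompson", "Iman Shumpert"],
    "ASG_Team": [None, "East ASG", "East ASG", None, None],
    "Active_Status": ["Active"] * 5,
})


def count_active_all_stars(players):
    """Count All-Stars marked active, per game and team."""
    # Assumption: a missing ASG_Team means the player is not an All-Star.
    is_active_all_star = (
        players["ASG_Team"].notna() & players["Active_Status"].eq("Active")
    )
    return (
        players.assign(Active_All_Stars=is_active_all_star)
        .groupby(["Game_ID", "Team"], as_index=False)["Active_All_Stars"]
        .sum()
    )


all_star_counts = count_active_all_stars(player_sample)
print(all_star_counts)
```

The count here covers only the five displayed players, so it is not necessarily the full roster figure for that game.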


### Load 'training_set.csv'

This dataset includes viewership in each international country for 1,000 games from each of the 2016-17 and 2017-18 seasons.

In [4]:
Training_Set = pd.read_csv('training_set.csv', sep=',', header = 0)
Training_Set.head()

Unnamed: 0,Season,Game_ID,Game_Date,Away_Team,Home_Team,Country,Rounded Viewers
0,2016-17,21600001,10/25/2016,NYK,CLE,C113,18
1,2016-17,21600001,10/25/2016,NYK,CLE,C193,1
2,2016-17,21600001,10/25/2016,NYK,CLE,C126,4
3,2016-17,21600001,10/25/2016,NYK,CLE,C163,11
4,2016-17,21600001,10/25/2016,NYK,CLE,C73,3
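
The training data has one row per game and country, while the target is the total across countries. A minimal sketch of that aggregation follows, using only the five rows displayed above; the helper name `total_viewers_per_game` is an assumption, and the output column is named `Total_Viewers` to match the test set.

```python
import pandas as pd

# The five rows shown above (not the full file).
training_sample = pd.DataFrame({
    "Season": ["2016-17"] * 5,
    "Game_ID": [21600001] * 5,
    "Game_Date": ["10/25/2016"] * 5,
    "Away_Team": ["NYK"] * 5,
    "Home_Team": ["CLE"] * 5,
    "Country": ["C113", "C193", "C126", "C163", "C73"],
    "Rounded Viewers": [18, 1, 4, 11, 3],
})


def total_viewers_per_game(viewers):
    """Sum country-level viewers into one row per game."""
    keys = ["Season", "Game_ID", "Game_Date", "Away_Team", "Home_Team"]
    return (
        viewers.groupby(keys, as_index=False)["Rounded Viewers"]
        .sum()
        .rename(columns={"Rounded Viewers": "Total_Viewers"})
    )


game_totals = total_viewers_per_game(training_sample)
# This total covers only the five displayed countries, not the whole game.
print(game_totals)
```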


### Load 'test_set.csv'

This dataset includes a list of games in the 2016-17 and 2017-18 seasons for which you are expected to predict total international viewership (i.e., the sum across all countries). Included are 230 games from each season.

In [5]:
Test_Set = pd.read_csv('test_set.csv', sep=',', header = 0)
Test_Set.head()

Unnamed: 0,Season,Game_ID,Game_Date,Away_Team,Home_Team,Total_Viewers
0,2016-17,21600010,10/26/2016,DEN,NOP,
1,2016-17,21600023,10/28/2016,HOU,DAL,
2,2016-17,21600027,10/29/2016,BOS,CHA,
3,2016-17,21600026,10/29/2016,ATL,PHI,
4,2016-17,21600048,11/1/2016,LAL,IND,
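
To show how the empty `Total_Viewers` column might be filled and exported, here is a deliberately simple baseline sketch: the median training total for each home team, falling back to the overall median for teams not seen in training. This is only a starting point under stated assumptions, not a recommended model. The helper `fill_baseline` is hypothetical, and the training totals below are placeholder numbers, not real viewership.

```python
import pandas as pd

# First three test games shown above.
test_sample = pd.DataFrame({
    "Game_ID": [21600010, 21600023, 21600027],
    "Away_Team": ["DEN", "HOU", "BOS"],
    "Home_Team": ["NOP", "DAL", "CHA"],
})

# PLACEHOLDER per-game totals for illustration only (not real data).
placeholder_totals = pd.DataFrame({
    "Home_Team": ["NOP", "NOP", "DAL"],
    "Total_Viewers": [100, 300, 50],
})


def fill_baseline(test_games, train_totals):
    """Fill Total_Viewers with the home-team median, else the overall median."""
    overall_median = train_totals["Total_Viewers"].median()
    by_home = train_totals.groupby("Home_Team")["Total_Viewers"].median()
    out = test_games.copy()
    out["Total_Viewers"] = out["Home_Team"].map(by_home).fillna(overall_median)
    return out


filled = fill_baseline(test_sample, placeholder_totals)
# With no path, to_csv returns the CSV text; pass a file name to write to disk.
csv_text = filled.to_csv(index=False)
print(csv_text)
```

Comparing any richer model against a baseline like this on held-out training games gives a sense of how much the extra features reduce MAPE.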
