# Data Bootcamp Final Project
### Deven Patnaik, Catherine Sang, and Salomon Ruiz

This dataset (cited in README.md) describes Premier League soccer matches from the 2010/11 season through the 2020/21 season. With this data, we seek to answer the question: "Do penalties affect the result of a match?"

In [3]:
import pandas as pd
import numpy as np
# Load the match-level data (one row per match)
df = pd.read_csv('df_full_premierleague.csv')
df

|  | Unnamed: 0 | link_match | season | date | home_team | away_team | result_full | result_ht | home_clearances | home_corners | ... | tackles_avg_away | touches_avg_away | yellow_cards_avg_away | goals_scored_ft_avg_away | goals_conced_ft_avg_away | sg_match_ft_acum_away | goals_scored_ht_avg_away | goals_conced_ht_avg_away | sg_match_ht_acum_away | performance_acum_away |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | https://www.premierleague.com/match/7186 | 10/11 | 2010-11-01 | Blackpool | West Bromwich Albion | 2-1 | 1-0 | 15.0 | 8.0 | ... | 20.0 | 584.9 | 1.6 | 1.44 | 1.67 | -2.0 | 0.33 | 0.78 | -4.0 | 55.6 |
| 1 | 1 | https://www.premierleague.com/match/7404 | 10/11 | 2011-04-11 | Liverpool | Manchester City | 3-0 | 3-0 | 16.0 | 6.0 | ... | 22.0 | 681.4 | 2.0 | 1.61 | 0.87 | 23.0 | 0.87 | 0.32 | 17.0 | 60.2 |
| 2 | 2 | https://www.premierleague.com/match/7255 | 10/11 | 2010-12-13 | Manchester United | Arsenal | 1-0 | 1-0 | 26.0 | 5.0 | ... | 21.2 | 748.0 | 1.8 | 2.12 | 1.12 | 16.0 | 0.94 | 0.38 | 9.0 | 66.7 |
| 3 | 3 | https://www.premierleague.com/match/7126 | 10/11 | 2010-09-13 | Stoke City | Aston Villa | 2-1 | 0-1 | 26.0 | 8.0 | ... | 25.0 | 567.3 | 2.0 | 1.33 | 2.00 | -2.0 | 1.00 | 1.00 | 0.0 | 66.7 |
| 4 | 4 | https://www.premierleague.com/match/7350 | 10/11 | 2011-02-14 | Fulham | Chelsea | 0-0 | 0-0 | 50.0 | 4.0 | ... | 19.4 | 728.6 | 1.4 | 1.84 | 0.88 | 24.0 | 0.84 | 0.48 | 9.0 | 58.7 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4065 | 4065 | https://www.premierleague.com/match/59113 | 20/21 | 2021-02-03 | Liverpool | Brighton and Hove Albion | 0-1 | 0-0 | 15.0 | 5.0 | ... | 16.8 | 651.0 | 1.5 | 1.10 | 1.38 | -6.0 | 0.52 | 0.67 | -3.0 | 33.3 |
| 4066 | 4066 | https://www.premierleague.com/match/59177 | 20/21 | 2021-03-03 | Burnley | Leicester City | 1-1 | 1-1 | 13.0 | 5.0 | ... | 17.7 | 679.0 | 1.8 | 1.73 | 1.15 | 15.0 | 0.77 | 0.58 | 5.0 | 62.8 |
| 4067 | 4067 | https://www.premierleague.com/match/59178 | 20/21 | 2021-03-03 | Crystal Palace | Manchester United | 0-0 | 0-0 | 25.0 | 4.0 | ... | 14.9 | 742.6 | 1.6 | 2.04 | 1.23 | 21.0 | 0.85 | 0.69 | 4.0 | 64.1 |
| 4068 | 4068 | https://www.premierleague.com/match/59182 | 20/21 | 2021-03-03 | Sheffield United | Aston Villa | 1-0 | 1-0 | 47.0 | 2.0 | ... | 13.3 | 587.0 | 1.7 | 1.58 | 1.08 | 12.0 | 0.67 | 0.42 | 6.0 | 54.2 |
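The `Unnamed: 0` column in the output above appears to be a row index that was saved into the CSV along with the data (an assumption based on its values matching the row numbers). Passing `index_col=0` to `read_csv` reuses that column as the index instead of keeping it as a data column. A minimal sketch on a small in-memory CSV with the same assumed layout:

```python
import io

import pandas as pd

# In-memory stand-in for df_full_premierleague.csv, assuming its first column
# is an unnamed, previously saved row index (rows copied from the output above)
csv_text = """,link_match,season,home_team,away_team,result_full
0,https://www.premierleague.com/match/7186,10/11,Blackpool,West Bromwich Albion,2-1
1,https://www.premierleague.com/match/7404,10/11,Liverpool,Manchester City,3-0
"""

# Default read: the blank header becomes a data column named 'Unnamed: 0'
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns[0])  # Unnamed: 0

# index_col=0: the first column becomes the row index instead
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(clean.columns))
```

With the real file, `pd.read_csv('df_full_premierleague.csv', index_col=0)` would apply the same idea, if the assumption about the first column holds.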


#### I. Shaping, Cleaning, and Filtering 
* Title-case all column names
* Convert the date column to datetime format
* Reindex by date
* Split the result columns (`Result_Full`, `Result_Ht`) into home and away goals
* Drop matches with null or empty values
* Remove columns unrelated to the score or penalties
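A minimal sketch of the splitting and null-dropping steps, assuming `Result_Full` and `Result_Ht` always hold scores in `home-away` form such as `2-1`. The new column names (`Home_Goals_Ft`, `Away_Goals_Ft`, `Home_Goals_Ht`, `Away_Goals_Ht`) are hypothetical, and the three-row frame stands in for the real data:

```python
import numpy as np
import pandas as pd

# Stand-in frame with rows copied from the output above (the real df has many more columns)
df = pd.DataFrame({
    "Home_Team": ["Blackpool", "Liverpool", "Fulham"],
    "Away_Team": ["West Bromwich Albion", "Manchester City", "Chelsea"],
    "Result_Full": ["2-1", "3-0", "0-0"],
    "Result_Ht": ["1-0", "3-0", "0-0"],
})
df.loc[2, "Result_Ht"] = ""  # blank one value on purpose to exercise the cleaning step

# Treat empty strings as missing, then drop any match with a missing value
df = df.replace("", np.nan).dropna().copy()

# Split "home-away" score strings into integer goal columns (hypothetical names)
for col, suffix in [("Result_Full", "Ft"), ("Result_Ht", "Ht")]:
    goals = df[col].str.split("-", expand=True).astype(int)
    df[f"Home_Goals_{suffix}"] = goals[0]
    df[f"Away_Goals_{suffix}"] = goals[1]

print(df[["Home_Team", "Home_Goals_Ft", "Away_Goals_Ft", "Home_Goals_Ht", "Away_Goals_Ht"]])
```

On the full data, `dropna()` with no arguments checks every column, so it may discard more matches than intended; passing `subset=[...]` restricts the check to the columns the analysis actually uses.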

In [7]:
# Title-case every column name, e.g. 'home_team' -> 'Home_Team'
df.columns = [i.title() for i in df.columns]
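Python's `str.title()` capitalizes the first letter of every run of letters, so underscores are kept and each underscore-separated word is capitalized. A quick check on a few column names taken from the raw output above, plus the vectorized pandas equivalent:

```python
import pandas as pd

# Raw column names copied from the first output above
cols = ["link_match", "home_team", "sg_match_ft_acum_away", "Unnamed: 0"]

# str.title() capitalizes the first letter of each run of letters
titled = [c.title() for c in cols]
print(titled)  # ['Link_Match', 'Home_Team', 'Sg_Match_Ft_Acum_Away', 'Unnamed: 0']

# Vectorized equivalent on a pandas Index
assert list(pd.Index(cols).str.title()) == titled
```

So `df.columns = df.columns.str.title()` would do the same job as the list comprehension in the cell above.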

In [9]:
# Convert date to datetime format
df['Date'] = pd.to_datetime(df['Date'])
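The dates in this file look like ISO `YYYY-MM-DD` strings (e.g. `2010-11-01`). Passing that format explicitly avoids relying on format inference, and `errors="coerce"` turns any string that does not match into `NaT`, so bad entries can be counted and handled with the other nulls. A sketch, assuming every valid date follows that pattern (the malformed entry is invented for illustration):

```python
import pandas as pd

# Stand-in column: three dates from the output above plus one deliberately malformed entry
dates = pd.Series(["2010-11-01", "2011-04-11", "2021-03-03", "not a date"], name="Date")

# Parse with an explicit format; unparseable strings become NaT instead of raising
parsed = pd.to_datetime(dates, format="%Y-%m-%d", errors="coerce")

print(parsed.isna().sum())  # 1
```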

In [10]:
# Reindex by date; set_index returns a new DataFrame, so assign it back
df = df.set_index(df['Date'])
df

| Date | Unnamed: 0 | Link_Match | Season | Date | Home_Team | Away_Team | Result_Full | Result_Ht | Home_Clearances | Home_Corners | ... | Tackles_Avg_Away | Touches_Avg_Away | Yellow_Cards_Avg_Away | Goals_Scored_Ft_Avg_Away | Goals_Conced_Ft_Avg_Away | Sg_Match_Ft_Acum_Away | Goals_Scored_Ht_Avg_Away | Goals_Conced_Ht_Avg_Away | Sg_Match_Ht_Acum_Away | Performance_Acum_Away |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010-11-01 | 0 | https://www.premierleague.com/match/7186 | 10/11 | 2010-11-01 | Blackpool | West Bromwich Albion | 2-1 | 1-0 | 15.0 | 8.0 | ... | 20.0 | 584.9 | 1.6 | 1.44 | 1.67 | -2.0 | 0.33 | 0.78 | -4.0 | 55.6 |
| 2011-04-11 | 1 | https://www.premierleague.com/match/7404 | 10/11 | 2011-04-11 | Liverpool | Manchester City | 3-0 | 3-0 | 16.0 | 6.0 | ... | 22.0 | 681.4 | 2.0 | 1.61 | 0.87 | 23.0 | 0.87 | 0.32 | 17.0 | 60.2 |
| 2010-12-13 | 2 | https://www.premierleague.com/match/7255 | 10/11 | 2010-12-13 | Manchester United | Arsenal | 1-0 | 1-0 | 26.0 | 5.0 | ... | 21.2 | 748.0 | 1.8 | 2.12 | 1.12 | 16.0 | 0.94 | 0.38 | 9.0 | 66.7 |
| 2010-09-13 | 3 | https://www.premierleague.com/match/7126 | 10/11 | 2010-09-13 | Stoke City | Aston Villa | 2-1 | 0-1 | 26.0 | 8.0 | ... | 25.0 | 567.3 | 2.0 | 1.33 | 2.00 | -2.0 | 1.00 | 1.00 | 0.0 | 66.7 |
| 2011-02-14 | 4 | https://www.premierleague.com/match/7350 | 10/11 | 2011-02-14 | Fulham | Chelsea | 0-0 | 0-0 | 50.0 | 4.0 | ... | 19.4 | 728.6 | 1.4 | 1.84 | 0.88 | 24.0 | 0.84 | 0.48 | 9.0 | 58.7 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2021-02-03 | 4065 | https://www.premierleague.com/match/59113 | 20/21 | 2021-02-03 | Liverpool | Brighton and Hove Albion | 0-1 | 0-0 | 15.0 | 5.0 | ... | 16.8 | 651.0 | 1.5 | 1.10 | 1.38 | -6.0 | 0.52 | 0.67 | -3.0 | 33.3 |
| 2021-03-03 | 4066 | https://www.premierleague.com/match/59177 | 20/21 | 2021-03-03 | Burnley | Leicester City | 1-1 | 1-1 | 13.0 | 5.0 | ... | 17.7 | 679.0 | 1.8 | 1.73 | 1.15 | 15.0 | 0.77 | 0.58 | 5.0 | 62.8 |
| 2021-03-03 | 4067 | https://www.premierleague.com/match/59178 | 20/21 | 2021-03-03 | Crystal Palace | Manchester United | 0-0 | 0-0 | 25.0 | 4.0 | ... | 14.9 | 742.6 | 1.6 | 2.04 | 1.23 | 21.0 | 0.85 | 0.69 | 4.0 | 64.1 |
| 2021-03-03 | 4068 | https://www.premierleague.com/match/59182 | 20/21 | 2021-03-03 | Sheffield United | Aston Villa | 1-0 | 1-0 | 47.0 | 2.0 | ... | 13.3 | 587.0 | 1.7 | 1.58 | 1.08 | 12.0 | 0.67 | 0.42 | 6.0 | 54.2 |
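`set_index` returns a new DataFrame rather than modifying `df` in place, and the output above shows that the matches are not stored in date order (2010-11-01 is followed by 2011-04-11, then 2010-12-13). Sorting the new index makes date-based slicing straightforward. A sketch on a few rows copied from that output:

```python
import pandas as pd

# A few rows copied from the output above
df = pd.DataFrame({
    "Date": pd.to_datetime(["2010-11-01", "2011-04-11", "2010-12-13", "2010-09-13"]),
    "Home_Team": ["Blackpool", "Liverpool", "Manchester United", "Stoke City"],
    "Result_Full": ["2-1", "3-0", "1-0", "2-1"],
})

# Assign the result back, then sort so the DatetimeIndex is in chronological order
df = df.set_index("Date").sort_index()

# Partial-string indexing on a sorted DatetimeIndex: every match played in 2010
matches_2010 = df.loc["2010"]
print(matches_2010["Home_Team"].tolist())
# ['Stoke City', 'Blackpool', 'Manchester United']
```

Passing the column name, as here, moves `Date` out of the columns; passing the Series `df['Date']`, as the cell above does, sets the index and also keeps `Date` as a regular column.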


1. Cleaning and sorting
2. Visualization
3. Experiments
4. (if time) Predictions
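For the "Experiments" step, one simple comparison is the share of home wins, draws, and away wins in matches with and without a penalty. The dataset's actual penalty columns are not shown in this section, so `Had_Penalty` below is a hypothetical boolean column, the goal column names are hypothetical too, and all values are toy numbers for illustration only, not drawn from the dataset:

```python
import numpy as np
import pandas as pd

# Toy data, NOT from the dataset; Had_Penalty and the goal columns are hypothetical names
df = pd.DataFrame({
    "Home_Goals_Ft": [2, 0, 1, 3, 1, 0],
    "Away_Goals_Ft": [1, 0, 1, 0, 2, 1],
    "Had_Penalty":   [True, False, True, False, False, True],
})

# Label each match from the home side's point of view
df["Outcome"] = np.select(
    [df["Home_Goals_Ft"] > df["Away_Goals_Ft"],
     df["Home_Goals_Ft"] == df["Away_Goals_Ft"]],
    ["Home win", "Draw"],
    default="Away win",
)

# Share of each outcome within matches with / without a penalty (each row sums to 1)
table = pd.crosstab(df["Had_Penalty"], df["Outcome"], normalize="index")
print(table)
```

A gap between the two rows on the real data would be descriptive only; on its own it would not show that penalties cause the difference, since other match factors could drive both.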