## 2021: Week 16 - The Super League

If you follow football or the recent news, you have probably heard about the proposed 'Super League' planned by a selection of European clubs. It would have involved a group of 12 teams playing in a competition each year without having to qualify or face relegation. The lack of fair competition with other clubs caused an uproar among fans, players, media outlets, football associations, and even the UK government!

The 'big 6' English teams to propose the Super League were Arsenal, Chelsea, Liverpool, Manchester United, Manchester City, and Tottenham Hotspur.

One idea to discourage the clubs from proceeding with the new league was to threaten the English teams with expulsion from the English Premier League. The challenge this week is to understand how the current league table would change if these clubs were 'kicked out'.

### Input
The input this week is a list of all of the fixtures from the 2020/21 season (up until 19/04/2021). 

### Requirement
- Input the data
- Calculate the Total Points for each team. The points are as follows: 
    - Win - 3 Points
    - Draw - 1 Point
    - Loss - 0 Points
- Calculate the goal difference for each team. Goal difference is the difference between goals scored and goals conceded. 
- Calculate the current rank/position of each team. This is based on Total Points (high to low) and, in the case of a tie, Goal Difference (high to low).
- The current league table is our first output.

- Assuming that the 'Big 6' didn't play any games this season, recalculate the league table.
- After removing the 6 clubs, how has the position changed for the remaining clubs?
- The updated league table is the second output.
- Bonus - Think about features in Tableau Prep to make this repeatable process easier!

### Output
1. Current League Table
    - 5 Fields, 20 Rows (21 including header)
    - Position
    - Team
    - Total Games Played
    - Total Points
    - Goal Difference
    
    
2. Updated League Table
    - 5 Fields, 14 Rows (15 including header)
    - Position
    - Team
    - Total Games Played
    - Total Points
    - Goal Difference

In [251]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Input the data

In [252]:
df = pd.read_csv("./data/PL Fixtures.csv", parse_dates=["Date"])
df.head()

Unnamed: 0,Round Number,Date,Location,Home Team,Away Team,Result
0,1,2020-12-09 12:30:00,Craven Cottage,Fulham,Arsenal,0 - 3
1,1,2020-12-09 15:00:00,Selhurst Park,Crystal Palace,Southampton,1 - 0
2,1,2020-12-09 17:30:00,Anfield,Liverpool,Leeds,4 - 3
3,1,2020-12-09 20:00:00,London Stadium,West Ham,Newcastle,0 - 2
4,1,2020-09-13 14:00:00,The Hawthorns,West Brom,Leicester,0 - 3
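
In the preview above, several round 1 fixtures are dated 2020-12-09 while the next one is dated 2020-09-13, which suggests that ambiguous day/month values were read month-first. The dates are not used in the calculations that follow, so the tables are unaffected, but a minimal sketch of day-first parsing may be useful. It assumes the raw file stores dates as `dd/mm/yyyy HH:MM`; the raw format is not shown here, so both that format and the two sample rows are assumptions.

```python
import io
import pandas as pd

# Hypothetical two-row sample in an ASSUMED dd/mm/yyyy HH:MM format
sample_csv = io.StringIO(
    "Round Number,Date,Home Team,Away Team,Result\n"
    "1,12/09/2020 12:30,Fulham,Arsenal,0 - 3\n"
    "1,13/09/2020 14:00,West Brom,Leicester,0 - 3\n"
)

sample = pd.read_csv(sample_csv)
# An explicit format removes any guessing about day/month order
sample["Date"] = pd.to_datetime(sample["Date"], format="%d/%m/%Y %H:%M")
print(sample["Date"].dt.strftime("%Y-%m-%d").tolist())
# → ['2020-09-12', '2020-09-13']
```

If the real file uses a different layout, the `format` string would need to change accordingly.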


In [253]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 380 entries, 0 to 379
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   Round Number  380 non-null    int64         
 1   Date          380 non-null    datetime64[ns]
 2   Location      380 non-null    object        
 3   Home Team     380 non-null    object        
 4   Away Team     380 non-null    object        
 5   Result        316 non-null    object        
dtypes: datetime64[ns](1), int64(1), object(4)
memory usage: 17.9+ KB


### Calculate the Total Points for each team
- Win: 3 points, Draw: 1 point, Loss: 0 points

In [254]:
# Fixtures that have not been played yet have no Result, so drop them
df = df.dropna(subset=["Result"])
# Result looks like "0 - 3": home score before the hyphen, away score after it
df["Home Score"] = df["Result"].map(lambda x: x.split("-")[0])
df["Away Score"] = df["Result"].map(lambda x: x.split("-")[-1])
df.sample(10)

Unnamed: 0,Round Number,Date,Location,Home Team,Away Team,Result,Home Score,Away Score
101,11,2020-05-12 20:00:00,Stamford Bridge,Chelsea,Leeds,3 - 1,3,1
213,22,2021-03-02 20:15:00,Villa Park,Aston Villa,West Ham,1 - 3,1,3
84,9,2020-11-22 16:30:00,Elland Road,Leeds,Arsenal,0 - 0,0,0
9,2,2020-09-19 15:00:00,Elland Road,Leeds,Fulham,4 - 3,4,3
139,15,2020-12-26 15:00:00,Craven Cottage,Fulham,Southampton,0 - 0,0,0
206,22,2021-02-02 18:00:00,Bramall Lane,Sheffield Utd,West Brom,2 - 1,2,1
186,20,2021-01-26 18:00:00,St. James' Park,Newcastle,Leeds,1 - 2,1,2
221,23,2021-07-02 12:00:00,Tottenham Hotspur Stadium,Spurs,West Brom,2 - 0,2,0
29,4,2020-03-10 15:00:00,Goodison Park,Everton,Brighton,4 - 2,4,2
120,13,2020-12-16 18:00:00,Elland Road,Leeds,Newcastle,5 - 2,5,2


In [255]:
df["Home Score"] = df["Home Score"].astype(int)
df["Away Score"] = df["Away Score"].astype(int)

In [256]:
def calculate_point(row_):
    """Return (home points, away points) for one fixture."""
    diff = row_["Home Score"] - row_["Away Score"]
    if diff < 0:
        return 0, 3  # away win
    elif diff > 0:
        return 3, 0  # home win
    else:
        return 1, 1  # draw

In [257]:
# result_type="expand" turns each returned tuple into columns 0 and 1
result = df.apply(calculate_point, axis=1, result_type="expand")
df = pd.concat([df, result], axis=1)
df = df.rename(columns={0: "Home Point", 1: "Away Point"})
df.sample(10)

Unnamed: 0,Round Number,Date,Location,Home Team,Away Team,Result,Home Score,Away Score,Home Point,Away Point
244,25,2021-02-21 14:05:00,Villa Park,Aston Villa,Leicester,1 - 2,1,2,0,3
131,14,2020-12-20 12:00:00,Amex Stadium,Brighton,Sheffield Utd,1 - 1,1,1,1,1
288,29,2021-03-20 20:00:00,Amex Stadium,Brighton,Newcastle,3 - 0,3,0,3,0
26,3,2020-09-28 17:45:00,Craven Cottage,Fulham,Aston Villa,0 - 3,0,3,0,3
155,17,2021-01-01 17:30:00,Goodison Park,Everton,West Ham,0 - 1,0,1,0,3
284,28,2021-03-14 16:30:00,Emirates Stadium,Arsenal,Spurs,2 - 1,2,1,3,0
125,13,2020-12-17 18:00:00,Villa Park,Aston Villa,Burnley,0 - 0,0,0,1,1
20,3,2020-09-26 17:30:00,The Hawthorns,West Brom,Chelsea,3 - 3,3,3,1,1
165,18,2021-12-01 20:15:00,Turf Moor,Burnley,Man Utd,0 - 1,0,1,0,3
191,20,2021-01-27 18:00:00,Stamford Bridge,Chelsea,Wolves,0 - 0,0,0,1,1
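
The row-wise `apply` above can also be written as whole-column operations, which tends to be faster on larger tables. The following is a minimal sketch rather than the notebook's method: it uses a three-row toy frame in the same `Result` format, and the variable names (`toy`, `scores`, `home_win`, `away_win`) are illustrative.

```python
import numpy as np
import pandas as pd

# Toy fixtures in the same "home - away" Result format as the data
toy = pd.DataFrame({
    "Home Team": ["Fulham", "Crystal Palace", "Leeds"],
    "Away Team": ["Arsenal", "Southampton", "Arsenal"],
    "Result": ["0 - 3", "1 - 0", "0 - 0"],
})

# Split once on " - " into two columns and convert both to integers
scores = toy["Result"].str.split(" - ", expand=True).astype(int)
toy["Home Score"] = scores[0]
toy["Away Score"] = scores[1]

# Win = 3, draw = 1, loss = 0, decided by column-wise comparison
home_win = toy["Home Score"] > toy["Away Score"]
away_win = toy["Home Score"] < toy["Away Score"]
toy["Home Point"] = np.select([home_win, away_win], [3, 0], default=1)
toy["Away Point"] = np.select([away_win, home_win], [3, 0], default=1)

print(toy[["Home Point", "Away Point"]].values.tolist())
# → [[0, 3], [3, 0], [1, 1]]
```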


### Calculate the goal difference for each team. 
- Goal difference is the difference between goals scored and goals conceded

In [258]:
home_group = df.groupby(["Home Team"])[["Home Score", "Away Score"]].sum()
home_goal_diff = home_group["Home Score"] - home_group["Away Score"]
home_goal_diff

Home Team
Arsenal            0
Aston Villa        4
Brighton          -3
Burnley           -5
Chelsea           11
Crystal Palace    -9
Everton           -3
Fulham           -15
Leeds              3
Leicester          4
Liverpool          5
Man City          22
Man Utd           13
Newcastle         -5
Sheffield Utd    -15
Southampton        1
Spurs              9
West Brom        -21
West Ham           9
Wolves             0
dtype: int32

In [259]:
away_group = df.groupby(["Away Team"])[["Home Score", "Away Score"]].sum()
away_goal_diff = away_group["Away Score"] - away_group["Home Score"]
away_goal_diff

Away Team
Arsenal            8
Aston Villa        6
Brighton          -2
Burnley          -14
Chelsea            8
Crystal Palace   -10
Everton            6
Fulham            -3
Leeds             -3
Leicester         14
Liverpool         11
Man City          22
Man Utd           16
Newcastle        -13
Sheffield Utd    -24
Southampton      -18
Spurs              8
West Brom        -10
West Ham           2
Wolves            -9
dtype: int32

In [260]:
total_goal_diff = home_goal_diff + away_goal_diff
total_goal_diff

Home Team
Arsenal            8
Aston Villa       10
Brighton          -5
Burnley          -19
Chelsea           19
Crystal Palace   -19
Everton            3
Fulham           -18
Leeds              0
Leicester         18
Liverpool         16
Man City          44
Man Utd           29
Newcastle        -18
Sheffield Utd    -39
Southampton      -17
Spurs             17
West Brom        -31
West Ham          11
Wolves            -9
dtype: int32
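
Adding the two Series aligns them on team name, which works here because every team has both home and away fixtures. If a team were missing from one side, plain `+` would produce `NaN` for it. A hedged sketch of the safer form with `Series.add(..., fill_value=0)`, using the Leeds and Fulham values from the tables above and a hypothetical case in which Wolves have no home fixtures:

```python
import pandas as pd

# Leeds and Fulham values come from the tables above;
# Wolves having no home fixtures is a HYPOTHETICAL case
home_goal_diff = pd.Series({"Leeds": 3, "Fulham": -15})
away_goal_diff = pd.Series({"Leeds": -3, "Fulham": -3, "Wolves": -9})

# Plain addition leaves NaN where a team is missing on one side
plain_sum = home_goal_diff + away_goal_diff

# fill_value=0 treats the missing side as zero goal difference
safe_sum = home_goal_diff.add(away_goal_diff, fill_value=0).astype(int)

print(safe_sum.to_dict())
```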

### Calculate the current rank/position of each team. 
- This is based on Total Points (high to low) and, in the case of a tie, Goal Difference (high to low).

In [261]:
home_df = df[["Round Number", "Date", "Home Team", "Home Score", "Home Point"]]
home_df.columns = ["Round Number", "Date", "Team", "Score", "Point"]

away_df = df[["Round Number", "Date", "Away Team", "Away Score", "Away Point"]]
away_df.columns = ["Round Number", "Date", "Team", "Score", "Point"]

total_df = pd.concat([home_df, away_df], axis=0)
total_df

Unnamed: 0,Round Number,Date,Team,Score,Point
0,1,2020-12-09 12:30:00,Fulham,0,0
1,1,2020-12-09 15:00:00,Crystal Palace,1,3
2,1,2020-12-09 17:30:00,Liverpool,4,3
3,1,2020-12-09 20:00:00,West Ham,0,0
4,1,2020-09-13 14:00:00,West Brom,0,0
...,...,...,...,...,...
312,32,2021-04-17 12:30:00,West Ham,2,0
314,32,2021-04-17 20:15:00,Sheffield Utd,0,0
315,32,2021-04-18 13:30:00,Fulham,1,1
316,32,2021-04-18 16:00:00,Burnley,1,0


In [262]:
current_rank = total_df.groupby(["Team"])["Point"].sum().reset_index()
# Look up each team's goal difference by name rather than relying on row order
current_rank["Goal Difference"] = current_rank["Team"].map(total_goal_diff)
current_rank

Unnamed: 0,Team,Point,Goal Difference
0,Arsenal,46,8
1,Aston Villa,44,10
2,Brighton,33,-5
3,Burnley,33,-19
4,Chelsea,54,19
5,Crystal Palace,38,-19
6,Everton,49,3
7,Fulham,27,-18
8,Leeds,46,0
9,Leicester,56,18


In [263]:
current_rank["Position"] = current_rank[["Point", "Goal Difference"]].apply(tuple, axis=1).rank(method="dense", ascending=False).astype(int)
current_rank = current_rank.sort_values(by="Position", ascending=True)
current_rank

Unnamed: 0,Team,Point,Goal Difference,Position
11,Man City,74,44,1
12,Man Utd,66,29,2
9,Leicester,56,18,3
18,West Ham,55,11,4
4,Chelsea,54,19,5
10,Liverpool,53,16,6
16,Spurs,50,17,7
6,Everton,49,3,8
0,Arsenal,46,8,9
8,Leeds,46,0,10
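
An alternative to ranking tuples is to sort by both columns and number the rows. The sketch below uses four rows copied from the table above. One difference to keep in mind: the dense rank gives teams with identical points and goal difference the same position, whereas numbering sorted rows always yields distinct positions.

```python
import pandas as pd

# Four rows copied from the table above
table = pd.DataFrame({
    "Team": ["Arsenal", "Leeds", "Everton", "Spurs"],
    "Point": [46, 46, 49, 50],
    "Goal Difference": [8, 0, 3, 17],
})

# Sort by points, then by goal difference, both high to low
table = table.sort_values(
    by=["Point", "Goal Difference"], ascending=False
).reset_index(drop=True)

# Position within this four-team subset is the row number after sorting
table["Position"] = table.index + 1

print(table["Team"].tolist())
# → ['Spurs', 'Everton', 'Arsenal', 'Leeds']
```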


### The current league table is our first output

In [264]:
num_match = total_df.groupby(["Team"])["Score"].count().reset_index()
current_rank = current_rank.merge(num_match, how="inner", on="Team").rename(columns={"Score": "Total GamesPlayed", "Point": "Total Points"})
current_rank = current_rank.loc[:, ["Position", "Team", "Total GamesPlayed", "Total Points", "Goal Difference"]]
current_rank

Unnamed: 0,Position,Team,Total GamesPlayed,Total Points,Goal Difference
0,1,Man City,32,74,44
1,2,Man Utd,32,66,29
2,3,Leicester,31,56,18
3,4,West Ham,32,55,11
4,5,Chelsea,31,54,19
5,6,Liverpool,32,53,16
6,7,Spurs,32,50,17
7,8,Everton,31,49,3
8,9,Arsenal,32,46,8
9,10,Leeds,32,46,0


In [265]:
# index=False keeps the file to the 5 required fields
current_rank.to_csv("./output/Week15_output_1.csv", index=False)

### Assuming the 'Big 6' didn't play any games this season, recalculate the league table
- 'Big 6': Arsenal, Chelsea, Liverpool, Manchester United, Manchester City, and Tottenham Hotspur (named Man Utd, Man City, and Spurs in the data)

In [266]:
# Team names as they appear in the fixture data
big_6_teams = ["Arsenal", "Chelsea", "Liverpool", "Man City", "Man Utd", "Spurs"]
# Drop every fixture in which a Big 6 club played, home or away
involves_big_6 = df["Home Team"].isin(big_6_teams) | df["Away Team"].isin(big_6_teams)
without_6_df = df.loc[~involves_big_6].copy()
without_6_df.sample(10)

Unnamed: 0,Round Number,Date,Location,Home Team,Away Team,Result,Home Score,Away Score,Home Point,Away Point
47,5,2020-10-19 20:00:00,Elland Road,Leeds,Wolves,0 - 1,0,1,0,3
242,25,2021-02-20 20:00:00,Craven Cottage,Fulham,Sheffield Utd,1 - 0,1,0,3,0
32,4,2020-04-10 12:00:00,King Power Stadium,Leicester,West Ham,0 - 3,0,3,0,3
67,7,2020-02-11 20:00:00,Elland Road,Leeds,Leicester,1 - 4,1,4,0,3
31,4,2020-03-10 20:00:00,St. James' Park,Newcastle,Burnley,3 - 1,3,1,3,0
172,19,2021-01-16 15:00:00,London Stadium,West Ham,Burnley,1 - 0,1,0,3,0
164,18,2021-12-01 18:00:00,Bramall Lane,Sheffield Utd,Newcastle,1 - 0,1,0,3,0
217,23,2021-06-02 15:00:00,Turf Moor,Burnley,Brighton,1 - 1,1,1,1,1
58,7,2020-10-30 20:00:00,Molineux Stadium,Wolves,Crystal Palace,2 - 0,2,0,3,0
230,24,2021-02-14 12:00:00,St. Mary's Stadium,Southampton,Wolves,1 - 2,1,2,0,3


### After removing the 6 clubs, how has the position changed for the remaining clubs?

In [267]:
# Calculate goal difference for each team
home_group = without_6_df.groupby(["Home Team"])[["Home Score", "Away Score"]].sum()
home_goal_diff = home_group["Home Score"] - home_group["Away Score"]


away_group = without_6_df.groupby(["Away Team"])[["Home Score", "Away Score"]].sum()
away_goal_diff = away_group["Away Score"] - away_group["Home Score"]

total_goal_diff = home_goal_diff + away_goal_diff
total_goal_diff

Home Team
Aston Villa        7
Brighton           0
Burnley           -1
Crystal Palace     0
Everton            4
Fulham            -8
Leeds             11
Leicester         15
Newcastle         -4
Sheffield Utd    -24
Southampton        1
West Brom        -21
West Ham          21
Wolves            -1
dtype: int32

In [268]:
# Home & Away Team record concatenation
home_df = without_6_df[["Round Number", "Date", "Home Team", "Home Score", "Home Point"]]
home_df.columns = ["Round Number", "Date", "Team", "Score", "Point"]

away_df = without_6_df[["Round Number", "Date", "Away Team", "Away Score", "Away Point"]]
away_df.columns = ["Round Number", "Date", "Team", "Score", "Point"]

total_without_6_df = pd.concat([home_df, away_df], axis=0)
total_without_6_df.sample(10)

Unnamed: 0,Round Number,Date,Team,Score,Point
112,12,2020-12-13 12:00:00,Southampton,3,3
261,26,2021-03-03 18:00:00,Sheffield Utd,1,3
209,22,2021-02-02 20:15:00,Crystal Palace,2,3
97,10,2020-11-30 20:00:00,West Ham,2,3
268,27,2021-06-03 17:30:00,Wolves,0,1
159,17,2021-02-01 17:30:00,Wolves,3,1
288,29,2021-03-20 20:00:00,Brighton,3,3
63,7,2020-01-11 14:00:00,Newcastle,2,3
253,26,2021-02-28 12:00:00,Fulham,0,1
197,21,2021-01-30 15:00:00,Wolves,0,0


In [269]:
# Calculate total points and goal difference for each team
update_rank = total_without_6_df.groupby(["Team"])["Point"].sum().reset_index()
# Look up each team's goal difference by name rather than relying on row order
update_rank["Goal Difference"] = update_rank["Team"].map(total_goal_diff)
update_rank

Unnamed: 0,Team,Point,Goal Difference
0,Aston Villa,34,7
1,Brighton,26,0
2,Burnley,26,-1
3,Crystal Palace,32,0
4,Everton,34,4
5,Fulham,21,-8
6,Leeds,39,11
7,Leicester,40,15
8,Newcastle,32,-4
9,Sheffield Utd,11,-24


In [270]:
# Calculate ranking based on "total points & goal difference"
update_rank["Position"] = update_rank[["Point", "Goal Difference"]].apply(tuple, axis=1).rank(method="dense", ascending=False).astype(int)
update_rank = update_rank.sort_values(by="Position", ascending=True)
update_rank

Unnamed: 0,Team,Point,Goal Difference,Position
12,West Ham,49,21,1
7,Leicester,40,15,2
6,Leeds,39,11,3
0,Aston Villa,34,7,4
4,Everton,34,4,5
3,Crystal Palace,32,0,6
8,Newcastle,32,-4,7
10,Southampton,30,1,8
13,Wolves,30,-1,9
1,Brighton,26,0,10


In [271]:
# Add 'Total GamesPlayed' column to the update_rank DataFrame
num_match = total_without_6_df.groupby(["Team"])["Score"].count().reset_index()
update_rank = update_rank.merge(num_match, how="inner", on="Team")
update_rank = update_rank.rename(columns={"Score": "Total GamesPlayed", "Point": "Total Points"})
update_rank = update_rank.loc[:, ["Position", "Team", "Total GamesPlayed", "Total Points", "Goal Difference"]]
update_rank

Unnamed: 0,Position,Team,Total GamesPlayed,Total Points,Goal Difference
0,1,West Ham,21,49,21
1,2,Leicester,22,40,15
2,3,Leeds,22,39,11
3,4,Aston Villa,22,34,7
4,5,Everton,21,34,4
5,6,Crystal Palace,22,32,0
6,7,Newcastle,23,32,-4
7,8,Southampton,21,30,1
8,9,Wolves,22,30,-1
9,10,Brighton,22,26,0


### The updated league table is the second output.

In [272]:
# Add Position Change: the absolute number of places each team moved compared with the current table
update_rank = update_rank.merge(current_rank[["Team", "Position"]], how="left", on="Team")
update_rank["Position Change"] = abs(update_rank["Position_x"] - update_rank["Position_y"])
update_rank = update_rank.rename(columns={"Position_x": "Position"})
update_rank = update_rank.drop("Position_y", axis=1)
update_rank

Unnamed: 0,Position,Team,Total GamesPlayed,Total Points,Goal Difference,Position Change
0,1,West Ham,21,49,21,3
1,2,Leicester,22,40,15,1
2,3,Leeds,22,39,11,7
3,4,Aston Villa,22,34,7,7
4,5,Everton,21,34,4,3
5,6,Crystal Palace,22,32,0,7
6,7,Newcastle,23,32,-4,8
7,8,Southampton,21,30,1,6
8,9,Wolves,22,30,-1,3
9,10,Brighton,22,26,0,6
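
Because `abs` is applied, Position Change records how far a team moved but not in which direction. A minimal sketch of a signed variant, using three rows from the tables above; the column names `Position Before` and `Places Gained` are illustrative and not part of the challenge's required output.

```python
import pandas as pd

# Positions copied from the current and updated tables above
current = pd.DataFrame({
    "Team": ["West Ham", "Leicester", "Leeds"],
    "Position": [4, 3, 10],
})
updated = pd.DataFrame({
    "Team": ["West Ham", "Leicester", "Leeds"],
    "Position": [1, 2, 3],
})

# Explicit suffixes avoid the default Position_x / Position_y names
merged = updated.merge(current, on="Team", how="left", suffixes=("", " Before"))

# Positive values mean the team climbed the table
merged["Places Gained"] = merged["Position Before"] - merged["Position"]

print(merged["Places Gained"].tolist())
# → [3, 1, 7]
```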


In [273]:
# index=False avoids writing the DataFrame index as an extra field
update_rank.to_csv("./output/Week15_output_2.csv", index=False)
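
The two passes above repeat the same steps on different sets of fixtures. In the spirit of the bonus requirement about repeatability, the steps could be wrapped in one function. This is a sketch under stated assumptions, not the notebook's method: `league_table` and its `excluded` parameter are hypothetical names, the input is assumed to already hold integer `Home Score` and `Away Score` columns, and the toy fixtures are made up for illustration.

```python
import pandas as pd


def league_table(fixtures, excluded=()):
    """Build a league table from played fixtures, ignoring excluded teams.

    Hypothetical helper: `fixtures` needs the columns
    Home Team, Away Team, Home Score, Away Score.
    """
    excluded = list(excluded)
    keep = ~fixtures["Home Team"].isin(excluded) & ~fixtures["Away Team"].isin(excluded)
    played = fixtures[keep]

    # One row per team per match, seen from that team's side
    home = pd.DataFrame({
        "Team": played["Home Team"],
        "For": played["Home Score"],
        "Against": played["Away Score"],
    })
    away = pd.DataFrame({
        "Team": played["Away Team"],
        "For": played["Away Score"],
        "Against": played["Home Score"],
    })
    long = pd.concat([home, away], ignore_index=True)

    # Win = 3, draw = 1, loss = 0
    long["Goal Difference"] = long["For"] - long["Against"]
    long["Points"] = 1
    long.loc[long["Goal Difference"] > 0, "Points"] = 3
    long.loc[long["Goal Difference"] < 0, "Points"] = 0

    table = (
        long.groupby("Team")
        .agg(**{
            "Total Games Played": ("Points", "size"),
            "Total Points": ("Points", "sum"),
            "Goal Difference": ("Goal Difference", "sum"),
        })
        .reset_index()
        .sort_values(["Total Points", "Goal Difference"], ascending=False)
        .reset_index(drop=True)
    )
    table.insert(0, "Position", range(1, len(table) + 1))
    return table


# Made-up fixtures, for illustration only
toy = pd.DataFrame({
    "Home Team": ["Fulham", "Leeds", "Leeds", "Fulham"],
    "Away Team": ["Arsenal", "Fulham", "Arsenal", "Leeds"],
    "Home Score": [0, 4, 0, 1],
    "Away Score": [3, 3, 0, 1],
})

full = league_table(toy)
without_arsenal = league_table(toy, excluded=["Arsenal"])
print(full)
print(without_arsenal)
```

Both outputs would then come from a single call each, for example `league_table(df)` and `league_table(df, excluded=[...])` with the six team names as they appear in the data. Note that positions here come from sorted row order, so two teams with identical points and goal difference would receive consecutive positions rather than a shared one.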