# 1. An Introduction to the English Premier League

## 1.1 About the league

Football (soccer) is played by approximately **250 million** players in over **200 countries and dependencies**, making it the [world's most popular sport](https://en.wikipedia.org/wiki/Association_football#:~:text=It%20is%20played%20by%20approximately,the%20world's%20most%20popular%20sport.).

The **English Premier League** is the top division of men's professional football in **England**, contested by **20 clubs**. It operates a system of **promotion and relegation** with the second tier, the Championship. Separately, the clubs that finish near the top qualify for European competitions such as the **UEFA Champions League**. The league was founded in February 1992, its first season (1992/1993) began in August 1992, and it has been played every season since. The **29th season** (2020/2021) is currently in progress, with each club having played 12 to 14 of its 38 matches.

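##### Note: The number of clubs and matches has not been constant since the start. The league began with 22 clubs (42 matches per club) and was reduced to the current 20 clubs (38 matches per club) from the 1995/1996 season.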
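
Each club plays every other club home and away (a double round-robin), so the number of matches follows directly from the number of clubs. A minimal sketch of that arithmetic, assuming a plain double round-robin with no extra fixtures; the function names are illustrative only:

```python
def matches_per_club(n_clubs: int) -> int:
    """Each club meets every other club twice: once at home, once away."""
    return 2 * (n_clubs - 1)


def total_matches(n_clubs: int) -> int:
    """Every ordered (home, away) pair of distinct clubs is one fixture."""
    return n_clubs * (n_clubs - 1)


for n in (22, 20):
    print(n, matches_per_club(n), total_matches(n))
# 22 42 462
# 20 38 380
```

The 22-club case matches the 42 matches played per club in 1992/1993, and the 20-club case gives the 38 matches per club and 380 matches per season quoted in the rules.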

## 1.2 Rules & Regulations

Since our goal is not to understand how the league proceeds but to analyse the factors, or common patterns, shared by the teams that get relegated, we will take only a brief overview of the rules.

Some of the rules, taken from the [Premier League official site](https://www.premierleague.com/premier-league-explained#:~:text=The%20league%20takes%20place%20between,winning%20the%20Premier%20League%20title.), are:
- The league takes place between August and May and involves the teams playing each other home and away across the season, a total of 380 matches.
- Three points are awarded for a win, one point for a draw and none for a defeat, with the team with the most points at the end of the season winning the Premier League title.
- The teams that finish in the bottom three of the league table at the end of the campaign are relegated to the Championship, the second tier of English football.
- Those teams are replaced by three clubs promoted from the Championship; the sides that finish in first and second place and the third via the end-of-season playoffs. 
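
The scoring and relegation rules above can be sketched in a few lines of plain Python. This is a simplified illustration, not the league's official procedure: it assumes that clubs level on points are separated by goal difference and then goals scored, and the six-club table is hypothetical.

```python
from typing import NamedTuple


class Row(NamedTuple):
    club: str
    won: int
    drawn: int
    lost: int
    gf: int  # goals for
    ga: int  # goals against


def points(row: Row) -> int:
    # 3 points for a win, 1 for a draw, 0 for a defeat
    return 3 * row.won + row.drawn


def rank(table: list[Row]) -> list[Row]:
    # Assumption: ties on points are broken by goal difference, then goals scored
    return sorted(table, key=lambda r: (points(r), r.gf - r.ga, r.gf), reverse=True)


def relegated(table: list[Row], n: int = 3) -> list[str]:
    # The bottom n clubs of the final table go down to the Championship
    return [r.club for r in rank(table)[-n:]]


# Hypothetical six-club mini-table, purely for illustration
table = [
    Row("Club A", 3, 1, 1, 9, 4),
    Row("Club B", 1, 2, 2, 5, 7),
    Row("Club C", 4, 0, 1, 10, 3),
    Row("Club D", 1, 2, 2, 6, 6),
    Row("Club E", 0, 3, 2, 3, 6),
    Row("Club F", 2, 1, 2, 7, 8),
]
print([r.club for r in rank(table)])  # ['Club C', 'Club A', 'Club F', 'Club D', 'Club B', 'Club E']
print(relegated(table))               # ['Club D', 'Club B', 'Club E']

# Manchester United's 1992/1993 record from the dataset: 24 wins, 12 draws
print(points(Row("Manchester United", 24, 12, 6, 67, 31)))  # 84
```

Club D and Club B are level on 5 points, so goal difference (0 vs -2) decides which of them finishes higher.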

In [1]:
import pandas as pd
import numpy as np
import glob
import kaggle
from zipfile import ZipFile
import os
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [2]:
from kaggle.api.kaggle_api_extended import KaggleApi

In [3]:
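api = KaggleApi()
# Credentials come from the KAGGLE_USERNAME/KAGGLE_KEY environment variables or ~/.kaggle/kaggle.json
api.authenticate()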

In [4]:
# Download the dataset as a .zip archive into the current working directory
api.dataset_download_files("egadharmawan/premier-league-standing-all-season-19922020")

In [5]:
# Extract every file in the archive into Dataset/; the with-block closes the archive afterwards
with ZipFile('premier-league-standing-all-season-19922020.zip') as zf:
    zf.extractall('Dataset/')
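
Before extracting, it can help to see which CSV files the archive actually contains, since the next steps rely on the `Premier League*.csv` pattern and the `2019_2021/2019_2021.csv` path. A small sketch using only the standard `zipfile` module; `csv_members` is a hypothetical helper, and the in-memory archive below uses made-up file names so the example runs without the real download.

```python
import io
import zipfile


def csv_members(zip_source) -> list[str]:
    """Return the sorted names of all .csv entries in a zip archive.

    zip_source may be a path or a binary file-like object, as accepted by zipfile.ZipFile.
    """
    with zipfile.ZipFile(zip_source) as zf:
        return sorted(name for name in zf.namelist() if name.lower().endswith(".csv"))


# Self-contained demo: a tiny in-memory archive with hypothetical entries
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("Premier League 1992-1993.csv", "Pos,Club\n1,Manchester United\n")
    zf.writestr("2019_2021/2019_2021.csv", "Rk,Squad\n1,Liverpool\n")
    zf.writestr("readme.txt", "not a table")

print(csv_members(buf))  # ['2019_2021/2019_2021.csv', 'Premier League 1992-1993.csv']
```

In the notebook, `csv_members('premier-league-standing-all-season-19922020.zip')` would list the real entries.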

In [45]:
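# os.path.join builds the pattern with the correct separator on any OS
# (a literal "Dataset\Premier League*.csv" works only on Windows, and "\P" is an invalid escape)
all_files = glob.glob(os.path.join("Dataset", "Premier League*.csv"))

# Read every season file and give them all the same column names
frames = []
for filename in all_files:
    season_df = pd.read_csv(filename, index_col=None, header=0)
    season_df.columns = ['Position','Club','Played','Won','Drawn','Lost','GF','GA','GD','Points','Season']
    frames.append(season_df)

# Combine all seasons and sort chronologically, then by league position
# (glob returns files in arbitrary order, so the sort is required)
df1 = pd.concat(frames, axis=0, ignore_index=True, sort=False)
df1 = df1.sort_values(['Season','Position'], ascending=[True, True])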
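
Since the number of clubs changed from 22 to 20 in 1995/1996, a quick check that every season in the combined data has the expected number of rows can catch a missing or duplicated file. A plain-Python sketch; `expected_clubs` and `seasons_with_wrong_size` are hypothetical helpers, and the sample column is made up for illustration.

```python
from collections import Counter


def expected_clubs(season: str) -> int:
    # 22 clubs from 1992/1993 to 1994/1995, 20 clubs from 1995/1996 onwards
    return 22 if int(season[:4]) < 1995 else 20


def seasons_with_wrong_size(season_column) -> dict:
    """Map each season whose row count is unexpected to the count actually found."""
    counts = Counter(season_column)
    return {s: n for s, n in counts.items() if n != expected_clubs(s)}


# Hypothetical column: two complete seasons and one season with a missing row
column = ["1992/1993"] * 22 + ["1995/1996"] * 20 + ["2000/2001"] * 19
print(seasons_with_wrong_size(column))  # {'2000/2001': 19}
```

Iterating a pandas Series yields its values, so `seasons_with_wrong_size(df1['Season'])` works directly; an empty dict means every season has the expected number of clubs.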

In [47]:
df1['Season'].unique()

array(['1992/1993', '1993/1994', '1994/1995', '1995/1996', '1996/1997',
       '1997/1998', '1998/1999', '1999/2000', '2000/2001', '2001/2002',
       '2002/2003', '2003/2004', '2004/2005', '2005/2006', '2006/2007',
       '2007/2008', '2008/2009', '2009/2010', '2010/2011', '2011/2012',
       '2012/2013', '2013/2014', '2014/2015', '2015/2016', '2016/2017',
       '2017/2018', '2018/2019', '2019/2020'], dtype=object)
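
The array lists 28 seasons, from 1992/1993 to 2019/2020. A sketch that generates the expected labels so that gaps can be found programmatically rather than by eye, assuming the dataset keeps its `YYYY/YYYY` label format; the helper names are hypothetical.

```python
def expected_seasons(first_year: int, last_year: int) -> list[str]:
    """Season labels in the dataset's 'YYYY/YYYY' format, e.g. '1992/1993'."""
    return [f"{y}/{y + 1}" for y in range(first_year, last_year + 1)]


def missing_seasons(observed, first_year: int, last_year: int) -> list[str]:
    """Expected season labels that do not appear in `observed`."""
    seen = set(observed)
    return [s for s in expected_seasons(first_year, last_year) if s not in seen]


seasons = expected_seasons(1992, 2019)
print(len(seasons), seasons[0], seasons[-1])  # 28 1992/1993 2019/2020
print(missing_seasons(seasons, 1992, 2019))   # []

# Every label has the same fixed width, so plain string order is also
# chronological order, which the sort on 'Season' relies on.
print(sorted(seasons) == seasons)             # True
```

Applied to the notebook data, `missing_seasons(df1['Season'].unique(), 1992, 2019)` should return an empty list.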

In [13]:
df1['Club'].unique()

array(['Manchester United', 'Aston Villa', 'Norwich City',
       'Blackburn Rovers', 'Queens Park Rangers', 'Liverpool',
       'Sheffield Wednesday', 'Tottenham Hotspur', 'Manchester City',
       'Arsenal', 'Chelsea', 'Wimbledon', 'Everton', 'Sheffield United',
       'Coventry City', 'Ipswich Town', 'Leeds United', 'Southampton',
       'Oldham Athletic', 'Crystal Palace', 'Middlesbrough',
       'Nottingham Forest', 'Newcastle United', 'West Ham United',
       'Swindon Town', 'Leicester City', 'Bolton Wanderers',
       'Derby County', 'Sunderland', 'Barnsley', 'Charlton Athletic',
       'Bradford City', 'Watford', 'WestHam United', 'Fulham',
       'Birmingham City', 'West Bromwich Albion', 'Portsmouth',
       'Wolverhampton Wanderers', 'Wigan Athletic', 'Reading',
       'Stoke City', 'Hull City', 'Burnley', 'Blackpool', 'Swansea City',
       'Cardiff City', 'Bournemouth', 'Brighton and Hove Albion',
       'Huddersfield Town'], dtype=object)
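
The list above contains both `'West Ham United'` and `'WestHam United'`, so the same club appears under two spellings. A plain-Python sketch for spotting names that differ only in spacing or letter case; `near_duplicate_names` is a hypothetical helper, and the sample list is taken from the output above.

```python
from collections import defaultdict


def near_duplicate_names(names):
    """Group names that differ only in whitespace or letter case."""
    groups = defaultdict(list)
    for name in names:
        # Normalise: drop every whitespace character and lower-case the rest
        key = "".join(name.split()).lower()
        groups[key].append(name)
    return [sorted(set(g)) for g in groups.values() if len(set(g)) > 1]


clubs = ["Manchester United", "West Ham United", "Watford", "WestHam United", "Fulham"]
print(near_duplicate_names(clubs))  # [['West Ham United', 'WestHam United']]
```

In the notebook, `near_duplicate_names(df1['Club'].unique())` lists such pairs, which can then be merged with, for example, `df1['Club'] = df1['Club'].replace({'WestHam United': 'West Ham United'})`.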

In [57]:
df2=pd.read_csv("Dataset/2019_2021/2019_2021.csv",index_col=0)

In [58]:
df2['Squad'].unique()

array(['Liverpool', 'Manchester City', 'Manchester Utd', 'Chelsea',
       'Leicester City', 'Tottenham', 'Wolves', 'Arsenal',
       'Sheffield Utd', 'Burnley', 'Southampton', 'Everton',
       'Newcastle Utd', 'Crystal Palace', 'Brighton', 'West Ham',
       'Aston Villa', 'Bournemouth', 'Watford', 'Norwich City',
       'Leeds United', 'Fulham', 'West Brom'], dtype=object)

In [59]:
# Squad names in df2 that do not appear in df1's club list
known_clubs = set(df1['Club'].unique())
team_names_tobecorrected = [x for x in df2['Squad'].unique() if x not in known_clubs]

In [60]:
team_names_tobecorrected

['Manchester Utd',
 'Tottenham',
 'Wolves',
 'Sheffield Utd',
 'Newcastle Utd',
 'Brighton',
 'West Ham',
 'West Brom']

In [61]:
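# Map each df2 spelling to the name used in df1 (a dict keeps each pair together)
name_map = {
    'Manchester Utd': 'Manchester United',
    'Tottenham': 'Tottenham Hotspur',
    'Wolves': 'Wolverhampton Wanderers',
    'Sheffield Utd': 'Sheffield United',
    'Newcastle Utd': 'Newcastle United',
    'Brighton': 'Brighton and Hove Albion',
    'West Ham': 'West Ham United',
    'West Brom': 'West Bromwich Albion',
}

In [62]:
# Assign the result back: replace(..., inplace=True) on df2['Squad'] is chained assignment
# and does not update df2 under pandas Copy-on-Write
df2['Squad'] = df2['Squad'].replace(name_map)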
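
After the renaming, every `Squad` name should exist among df1's clubs. A quick plain-Python check; `unresolved_names` is a hypothetical helper, and the sample names are copied from the outputs above.

```python
def unresolved_names(names, name_map, reference):
    """Names that are still absent from `reference` after applying `name_map`."""
    known = set(reference)
    return [n for n in names if name_map.get(n, n) not in known]


# The eight renamings used above
name_map = {
    "Manchester Utd": "Manchester United",
    "Tottenham": "Tottenham Hotspur",
    "Wolves": "Wolverhampton Wanderers",
    "Sheffield Utd": "Sheffield United",
    "Newcastle Utd": "Newcastle United",
    "Brighton": "Brighton and Hove Albion",
    "West Ham": "West Ham United",
    "West Brom": "West Bromwich Albion",
}

# A few squad names from df2 and club names from df1
squads = ["Liverpool", "Manchester Utd", "Tottenham", "Wolves", "Leeds United"]
clubs = ["Liverpool", "Manchester United", "Tottenham Hotspur",
         "Wolverhampton Wanderers", "Leeds United"]

print(unresolved_names(squads, name_map, clubs))  # []

# Dropping one entry from the mapping makes the gap visible
partial = {k: v for k, v in name_map.items() if k != "Wolves"}
print(unresolved_names(squads, partial, clubs))   # ['Wolves']
```

On the real data, `unresolved_names(df2['Squad'].unique(), {}, df1['Club'].unique())` should return an empty list once the replacement has been applied.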

In [63]:
# The first 20 rows hold the 2019/2020 table and the remaining rows the 2020/2021 table.
# Writing through df2.iloc (a single indexing step) avoids the chained assignment
# df2['Season'][20:] = ..., which raises SettingWithCopyWarning.
df2['Season'] = '2019/2020'
df2.iloc[20:, df2.columns.get_loc('Season')] = '2020/2021'
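
The season labels in this step are derived purely from row position. The same idea can be written as a small function that works for any number of consecutive seasons. This sketch assumes, as the slice at row 20 does, that the file holds complete seasons of `clubs_per_season` rows in chronological order; `season_labels` is a hypothetical helper.

```python
def season_labels(n_rows: int, first_start_year: int, clubs_per_season: int = 20) -> list[str]:
    """Label rows season by season, assuming blocks of `clubs_per_season` rows in order."""
    labels = []
    for i in range(n_rows):
        start = first_start_year + i // clubs_per_season
        labels.append(f"{start}/{start + 1}")
    return labels


labels = season_labels(40, 2019)
print(labels[0], labels[19], labels[20], labels[39])
# 2019/2020 2019/2020 2020/2021 2020/2021
```

Under that assumption, `df2['Season'] = season_labels(len(df2), 2019)` produces the same column. A column in the source file that identifies the season directly would be more robust than relying on row order.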
  


In [64]:
df2.columns=df1.columns

In [73]:
df=pd.concat([df1,df2])

In [77]:
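# reset_index returns a new DataFrame, so assign it back to replace the duplicated index with 0..n-1
df = df.reset_index(drop=True)
df

|  | Position | Club | Played | Won | Drawn | Lost | GF | GA | GD | Points | Season |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Manchester United | 42 | 24 | 12 | 6 | 67 | 31 | 36 | 84.0 | 1992/1993 |
| 1 | 2 | Aston Villa | 42 | 21 | 11 | 10 | 57 | 40 | 17 | 74.0 | 1992/1993 |
| 2 | 3 | Norwich City | 42 | 21 | 9 | 12 | 61 | 65 | -4 | 72.0 | 1992/1993 |
| 3 | 4 | Blackburn Rovers | 42 | 20 | 11 | 11 | 68 | 46 | 22 | 71.0 | 1992/1993 |
| 4 | 5 | Queens Park Rangers | 42 | 17 | 12 | 13 | 63 | 55 | 8 | 63.0 | 1992/1993 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 601 | 16 | Brighton and Hove Albion | 13 | 2 | 5 | 6 | 15 | 21 | -6 | 11.0 | 2020/2021 |
| 602 | 17 | Fulham | 14 | 2 | 4 | 8 | 13 | 23 | -10 | 10.0 | 2020/2021 |
| 603 | 18 | Burnley | 12 | 2 | 4 | 6 | 6 | 18 | -12 | 10.0 | 2020/2021 |
| 604 | 19 | West Bromwich Albion | 13 | 1 | 4 | 8 | 10 | 26 | -16 | 7.0 | 2020/2021 |

In [78]:
df.head()

|  | Position | Club | Played | Won | Drawn | Lost | GF | GA | GD | Points | Season |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Manchester United | 42 | 24 | 12 | 6 | 67 | 31 | 36 | 84.0 | 1992/1993 |
| 1 | 2 | Aston Villa | 42 | 21 | 11 | 10 | 57 | 40 | 17 | 74.0 | 1992/1993 |
| 2 | 3 | Norwich City | 42 | 21 | 9 | 12 | 61 | 65 | -4 | 72.0 | 1992/1993 |
| 3 | 4 | Blackburn Rovers | 42 | 20 | 11 | 11 | 68 | 46 | 22 | 71.0 | 1992/1993 |
| 4 | 5 | Queens Park Rangers | 42 | 17 | 12 | 13 | 63 | 55 | 8 | 63.0 | 1992/1993 |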
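
The combined table can be sanity-checked against the scoring rule from section 1.2 (3 points for a win, 1 for a draw, 0 for a defeat). A plain-Python sketch that uses only the rows displayed above; `points_mismatches` is a hypothetical helper.

```python
# Rows copied from the displayed output: (club, season, won, drawn, points)
rows = [
    ("Manchester United", "1992/1993", 24, 12, 84),
    ("Aston Villa", "1992/1993", 21, 11, 74),
    ("Norwich City", "1992/1993", 21, 9, 72),
    ("Blackburn Rovers", "1992/1993", 20, 11, 71),
    ("Queens Park Rangers", "1992/1993", 17, 12, 63),
    ("Brighton and Hove Albion", "2020/2021", 2, 5, 11),
    ("Fulham", "2020/2021", 2, 4, 10),
    ("Burnley", "2020/2021", 2, 4, 10),
    ("West Bromwich Albion", "2020/2021", 1, 4, 7),
]


def points_mismatches(rows):
    """(club, season) pairs whose recorded points differ from 3 * won + drawn."""
    return [(club, season) for club, season, won, drawn, pts in rows
            if 3 * won + drawn != pts]


print(points_mismatches(rows))  # []
```

On the full DataFrame the equivalent filter is `df[df['Points'] != 3 * df['Won'] + df['Drawn']]`. A non-empty result is not automatically a data error: clubs have occasionally been deducted points, which matters when studying relegation.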
