# Introduction

Hi Kagglers, this is my tutorial on how to get insights from the Premier League All Season dataset.
We will perform some EDA and visualization, so let's get into it.

## Import Modules

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import glob

As you probably already know, this dataset provides the final standings for every season, as well as the managers.
For now we will go through the final standings first. There are 28 seasons in total; here is how you set up a good
DataFrame for this dataset.

In [None]:
# Import all files
all_files = glob.glob("../input/premier-league-standing-all-season-19922020/Premier League*.csv")

# Combine all DataFrames (the list is not named `all`, which would shadow the built-in)
frames = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    df.columns = ['Position','Club','Played','Won','Drawn','Lost','GF','GA','GD','Points','season']
    frames.append(df)

# Sort DataFrame
df = pd.concat(frames, axis=0, ignore_index=True, sort=False)
df = df.sort_values(['season','Position'], ascending=[True, True])
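If you want the loading step to fail loudly when a file does not have the expected layout, a small wrapper could look like the sketch below. The names `load_standings` and `COLUMNS` are my own and not part of the dataset or the original notebook, and the demo files are made-up one-row tables, not real standings.

```python
import glob
import os
import tempfile

import pandas as pd

# Column names used throughout this notebook
COLUMNS = ['Position', 'Club', 'Played', 'Won', 'Drawn', 'Lost',
           'GF', 'GA', 'GD', 'Points', 'season']


def load_standings(pattern):
    """Hypothetical helper: read every CSV matching `pattern` into one DataFrame."""
    frames = []
    for filename in sorted(glob.glob(pattern)):  # sorted() makes the read order deterministic
        frame = pd.read_csv(filename, header=0)
        if len(frame.columns) != len(COLUMNS):
            raise ValueError(
                f"{filename}: expected {len(COLUMNS)} columns, got {len(frame.columns)}"
            )
        frame.columns = COLUMNS
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"no files match {pattern!r}")
    combined = pd.concat(frames, axis=0, ignore_index=True)
    return combined.sort_values(['season', 'Position']).reset_index(drop=True)


# Tiny demo with two made-up one-row files (toy values, not real standings)
with tempfile.TemporaryDirectory() as tmp:
    for season in ('2000-2001', '1999-2000'):
        toy = pd.DataFrame(
            [[1, 'Club A', 38, 25, 8, 5, 70, 30, 40, 83, season]], columns=COLUMNS
        )
        toy.to_csv(os.path.join(tmp, f'Premier League {season}.csv'), index=False)
    demo = load_standings(os.path.join(tmp, 'Premier League*.csv'))

print(demo[['season', 'Club', 'Points']])
```

On Kaggle you would call it with the same glob pattern used in the cell above.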

Let's check how it looks.

In [None]:
df.head()

In [None]:
df.info()

In [None]:
df.isnull().sum()

Cool, there are 566 rows with 11 features we can explore here.
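The row count can be sanity-checked with a bit of arithmetic. This is a quick sketch that assumes the first 3 seasons were played with 22 clubs and the remaining 25 seasons with 20 clubs; the variable names are only for illustration.

```python
# Assumption: 3 seasons with 22 clubs, then 25 seasons with 20 clubs (28 in total)
seasons_with_22 = 3
seasons_with_20 = 28 - seasons_with_22

expected_rows = seasons_with_22 * 22 + seasons_with_20 * 20
print(expected_rows)
```

On the real DataFrame, `df.groupby('season')['Club'].count()` should show the per-season club counts behind this total.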

## Best Team

As always, let's start with the best team: the club that finished in first place each season.

In [None]:
# Ignore this, this is just my setup on how to use seaborn
sns.set(rc={'figure.figsize':(18,9), 'lines.linewidth': 5, 'lines.markersize': 5, "axes.labelsize":15}, style="whitegrid")

# Get the best team
win = df.Club[df['Position'] == 1].value_counts()

In [None]:
# x and y are passed directly, so no `data=` argument is needed
sns.barplot(x=win.values, y=win.index)

Quite shocking here. I'm not a fan of the Premier League, but I know Manchester United has a great history. Now I can see it in the data: it is really amazing that MU won almost half of all Premier League seasons.

## Worst Team

There is always a worst team in every league, and it's interesting to see which clubs keep finishing in the worst position season after season.

In [None]:
# Get every club that finished in position 20 in each season
lose = df.Club[df['Position'] == 20].value_counts()

In [None]:
# x and y are passed directly, so no `data=` argument is needed
sns.barplot(x=lose.values, y=lose.index)

Sunderland is the worst team in the Premier League by this measure, since it finished in last position in 3 seasons.
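One caveat: position 20 is only the bottom spot in a 20-club season. In a season with more clubs, the last-placed club sits lower than 20th. A more general sketch is to take the largest `Position` within each season. The toy frame below uses made-up clubs and seasons, only to show the pattern.

```python
import pandas as pd

# Toy standings (made-up clubs): a 4-club season and a 3-club season
toy = pd.DataFrame({
    'season':   ['S1', 'S1', 'S1', 'S1', 'S2', 'S2', 'S2'],
    'Club':     ['A', 'B', 'C', 'D', 'A', 'B', 'D'],
    'Position': [1, 2, 3, 4, 1, 2, 3],
})

# Row label of the largest Position within each season, i.e. the bottom club
last_rows = toy.groupby('season')['Position'].idxmax()

# Count how often each club finished bottom
last_place = toy.loc[last_rows, 'Club'].value_counts()
print(last_place)
```

Applied to the real `df`, the same two lines would count last-place finishes regardless of how many clubs played in a given season.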

## Most UCL Participant

UCL places go to the top teams of each season; here we count every top 4 finish as a UCL spot.

In [None]:
# Get the top 5 most UCL participant from premier league
UCL = df.Club[df['Position'].between(1,4)].value_counts().nlargest(5)

In [None]:
# x and y are passed directly, so no `data=` argument is needed
sns.barplot(y=UCL.values, x=UCL.index)

Manchester United earned a UCL spot 23 times, finishing in the top 4 in almost every season. What an achievement. Arsenal follows with 21 times, and so on.

## Interesting Fact

It is time to gather some interesting facts from this dataset. First, let's take a look at which teams have played in the Premier League the most.

In [None]:
# Count how many seasons each club appears in
Part = df.Club.value_counts()

In [None]:
# x and y are passed directly, so no `data=` argument is needed
sns.barplot(x=Part.values, y=Part.index)

### Monster

The clubs with the most Premier League seasons are Chelsea, Liverpool, Everton, Arsenal, Manchester United, and Tottenham Hotspur. They played in all 28 seasons and were never relegated to a lower league. What monsters.

It is quite interesting to see that teams like Everton and Tottenham Hotspur never won the Premier League even though they played in every season.

### Invincible

Another fact that caught my interest is a team that never lost a match in a whole season; they were simply invincible.

In [None]:
df[df['Lost'] == 0]

In 2004, Arsenal came out as the winner of the Premier League with 0 losses.

## Points Master

Total points accumulated by each team across all seasons.

In [None]:
point = df.groupby('Club')[['Points']].sum().sort_values('Points', ascending=False)
point

## Total Win Rate

Overall win rate of each team across all seasons.

In [None]:
# Total wins and matches played per club across all seasons
results = df.groupby('Club')[['Won', 'Played']].sum().sort_values('Won', ascending=False)

# Win rate = total wins / total matches played (vectorized, so no loop or
# integer lookup on the club-name index is needed)
results['win_rate'] = results['Won'] / results['Played']
results

## Performance Charts

My favorite section: here we can see performance charts for the best teams.

In [None]:
# First we need a time column; in this case I put 4 seasons in one row
dates = []

c = df['season'].unique()
ser = [c[x:x+4] for x in range(0, len(c), 4)]

for x in range(len(ser)):
    s = ' '.join(ser[x])
    f1 = s[:4]
    f2 = s[-4:]
    f3 = str(f1) + '-' + str(f2)
    dates.append(f3)

# Make a different Dataframe and put time data into it
performance = pd.DataFrame(dates, columns=['Time'])
performance

### Manchester United

In [None]:
MU = []

win = df[(df['Position'] == 1) & (df['Club'] == 'Manchester United')]

for y in range(len(ser)):
    wins = 0
    for s in ser[y]:
        for x in win.season:
            if x == s:
                wins += 1
    MU.append(wins)
    
performance['Manchester United'] = MU

In [None]:
sns.lineplot(x='Time', y='Manchester United', data=performance)

Manchester United won the Premier League 13 times, most of them between 1992 and 2012. We can see here that Manchester United has not performed well lately.
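The nested counting loop above is repeated for every club and manager in this notebook. If you prefer less copy and paste, it could be factored into a small function. This is only a sketch: `titles_per_period` is a name I made up, and the toy frame uses made-up clubs and seasons.

```python
import pandas as pd


def titles_per_period(frame, mask, periods):
    """Hypothetical helper: count first-place rows matching `mask` in each group of seasons."""
    champion_seasons = frame.loc[mask & (frame['Position'] == 1), 'season']
    return [int(champion_seasons.isin(list(group)).sum()) for group in periods]


# Toy data (made-up): four seasons split into two periods of two seasons each
toy = pd.DataFrame({
    'season':   ['S1', 'S1', 'S2', 'S2', 'S3', 'S3', 'S4', 'S4'],
    'Club':     ['A', 'B', 'A', 'B', 'B', 'A', 'A', 'B'],
    'Position': [1, 2, 1, 2, 1, 2, 1, 2],
})
periods = [['S1', 'S2'], ['S3', 'S4']]

a_titles = titles_per_period(toy, toy['Club'] == 'A', periods)
b_titles = titles_per_period(toy, toy['Club'] == 'B', periods)
print(a_titles, b_titles)
```

With the notebook's own objects, the call would look like `performance['Manchester United'] = titles_per_period(df, df['Club'] == 'Manchester United', ser)`.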

### Chelsea

In [None]:
Chelsea = []

win = df[(df['Position'] == 1) & (df['Club'] == 'Chelsea')]

for y in range(len(ser)):
    wins = 0
    for s in ser[y]:
        for x in win.season:
            if x == s:
                wins += 1
    Chelsea.append(wins)
    
performance['Chelsea'] = Chelsea

In [None]:
sns.lineplot(x='Time', y='Chelsea', data=performance)

Chelsea is the 2nd best team here. The performance chart shows that Chelsea was on fire around 2004-2008 and has been doing just fine up until now.

### Manchester City

In [None]:
MC = []

win = df[(df['Position'] == 1) & (df['Club'] == 'Manchester City')]

for y in range(len(ser)):
    wins = 0
    for s in ser[y]:
        for x in win.season:
            if x == s:
                wins += 1
    MC.append(wins)
    
performance['Manchester City'] = MC

In [None]:
sns.lineplot(x='Time', y='Manchester City', data=performance)

Manchester City made a breakthrough in 2012 and is doing great right now.

### Arsenal

In [None]:
ARS = []

win = df[(df['Position'] == 1) & (df['Club'] == 'Arsenal')]

for y in range(len(ser)):
    wins = 0
    for s in ser[y]:
        for x in win.season:
            if x == s:
                wins += 1
    ARS.append(wins)
    
performance['Arsenal'] = ARS

In [None]:
sns.lineplot(x='Time', y='Arsenal', data=performance)

Arsenal had their best achievements around 1996-2004. After winning the Premier League in 2004 without losing a match, they have not finished first again.

# Managers

Now let's talk about managers; a team won't be a good team without a great manager. Let's take a look at our manager dataset.

In [None]:
df_man = pd.read_csv('../input/premier-league-standing-all-season-19922020/PL Manager All Season (1992-2020).csv')

In [None]:
df_man.head()

In [None]:
df_man.info()

This dataset has 4 features: name, club, nationality, and season. Now let's merge this dataset with the standings dataset.

In [None]:
# merge dataset
complete = pd.merge(df_man, df, on=['season', 'Club'])
complete.head()
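An inner merge silently drops rows whose `season` and `Club` keys do not match on both sides, for example when a club name is spelled differently in the two files. One way to check for that is an outer merge with `indicator=True`. The frames below are made-up toy data, not rows from the dataset.

```python
import pandas as pd

# Toy frames (made-up values) with the same key columns as the notebook
managers = pd.DataFrame({
    'Name':   ['Manager X', 'Manager Y'],
    'Club':   ['Club A', 'Club B'],
    'season': ['S1', 'S1'],
})
standings = pd.DataFrame({
    'Club':     ['Club A', 'Club B FC'],   # note the spelling mismatch for Club B
    'season':   ['S1', 'S1'],
    'Position': [1, 2],
})

# indicator=True adds a `_merge` column: 'both', 'left_only' or 'right_only'
check = pd.merge(managers, standings, on=['season', 'Club'], how='outer', indicator=True)

# Rows that would be lost in an inner merge
unmatched = check[check['_merge'] != 'both']
print(unmatched[['Club', 'season', '_merge']])
```

If `unmatched` is empty on the real data, the inner merge above keeps every row.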

## Best Managers

Let's see who is the best manager ever.

In [None]:
# get managers name on standing position #1
best_man = complete.Name[complete['Position'] == 1].value_counts()

In [None]:
# x and y are passed directly, so no `data=` argument is needed
sns.barplot(y=best_man.index, x=best_man.values)

Alex Ferguson has the most Premier League titles here, exactly the same number as his team, Manchester United.
We can conclude that MU only won the title when Alex Ferguson was their manager.

## Longest Career

The managers with the longest careers in the Premier League; they are just loyal.

In [None]:
longst_career = complete.Name.value_counts().head(8)

In [None]:
# x and y are passed directly, so no `data=` argument is needed
sns.barplot(y=longst_career.index, x=longst_career.values)

Arsène Wenger is the longest-serving manager in the Premier League, with 22 seasons, followed by Alex Ferguson with 21 seasons.
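Note that `value_counts()` counts rows, so a manager who appears at two clubs in the same season would be counted twice for that season. Counting distinct seasons avoids this. Here is a small sketch on made-up data; whether the real dataset contains such cases is something I have not checked.

```python
import pandas as pd

# Toy data (made-up): Manager X appears twice in season S2, at two different clubs
toy = pd.DataFrame({
    'Name':   ['Manager X', 'Manager X', 'Manager X', 'Manager Y'],
    'Club':   ['Club A', 'Club A', 'Club B', 'Club B'],
    'season': ['S1', 'S2', 'S2', 'S1'],
})

rows_per_manager = toy['Name'].value_counts()                  # counts rows
seasons_per_manager = toy.groupby('Name')['season'].nunique()  # counts distinct seasons

print(rows_per_manager)
print(seasons_per_manager)
```

On the merged frame, the equivalent would be `complete.groupby('Name')['season'].nunique().nlargest(8)`.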

## Performance Chart Best Manager

Let's take a look at how a few of the best managers performed and compare them with their teams' performance.

### Alex Ferguson

In [None]:
AF = []

win = complete[(complete['Position'] == 1) & (complete['Name'] == 'Alex Ferguson')]

for y in range(len(ser)):
    wins = 0
    for s in ser[y]:
        for x in win.season:
            if x == s:
                wins += 1
    AF.append(wins)
    
performance['Alex Ferguson'] = AF

In [None]:
sns.lineplot(x='Time', y='Alex Ferguson', data=performance)

Alex Ferguson was at his best from the early seasons up until 2012, and retired after the 2012-13 season.

#### Comparison With Team

His team is Manchester United, so we are going to compare their performance.

In [None]:
complete[(complete['Name'] == 'Alex Ferguson') & (complete['Position'] == 1)]

In [None]:
complete[(complete['Club'] == 'Manchester United') & (complete['Position'] == 1)]

In [None]:
All = performance[['Time', 'Manchester United', 'Alex Ferguson']].melt('Time', var_name='cols',  value_name='vals')
sns.lineplot(x="Time", y="vals", hue='cols', data=All)

This is a comparison between Alex Ferguson's performance and that of his team, Manchester United. Their performance charts are exactly the same, which is why we only see one line here. This shows how much MU has struggled without Alex Ferguson as their manager.
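When two lines overlap completely on a chart, it is worth confirming in code that the two columns really are identical. A minimal sketch with made-up counts shaped like the `performance` frame:

```python
import pandas as pd

# Toy counts (made-up), shaped like the `performance` frame
toy_performance = pd.DataFrame({
    'Time':    ['P1', 'P2', 'P3'],
    'Team':    [2, 1, 0],
    'Manager': [2, 1, 0],
})

# Series.equals is True only when both columns hold the same values
same = toy_performance['Team'].equals(toy_performance['Manager'])
print(same)
```

On the real data, this would be `performance['Manchester United'].equals(performance['Alex Ferguson'])`.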

### José Mourinho

In [None]:
JM = []

win = complete[(complete['Position'] == 1) & (complete['Name'] == 'José Mourinho')]

for y in range(len(ser)):
    wins = 0
    for s in ser[y]:
        for x in win.season:
            if x == s:
                wins += 1
    JM.append(wins)
    
performance['José Mourinho'] = JM

In [None]:
sns.lineplot(x='Time', y='José Mourinho', data=performance)

José Mourinho performed best in the 2004-2008 period.

#### Comparison With Team

Since he spent most of his time at Chelsea, we are going to compare him with Chelsea's performance.

In [None]:
complete[(complete['Name'] == 'José Mourinho') & (complete['Position'] == 1)]

In [None]:
complete[(complete['Club'] == 'Chelsea') & (complete['Position'] == 1)]

In [None]:
All = performance[['Time', 'Chelsea', 'José Mourinho']].melt('Time', var_name='cols',  value_name='vals')
sns.lineplot(x="Time", y="vals", hue='cols', data=All)

This comparison shows that Chelsea can still perform well no matter who their manager is.

### Arsène Wenger

In [None]:
AW = []

win = complete[(complete['Position'] == 1) & (complete['Name'] == 'Arsène Wenger')]

for y in range(len(ser)):
    wins = 0
    for s in ser[y]:
        for x in win.season:
            if x == s:
                wins += 1
    AW.append(wins)
    
performance['Arsène Wenger'] = AW

In [None]:
sns.lineplot(x='Time', y='Arsène Wenger', data=performance)

#### Comparison With Team

This loyal manager was always with Arsenal.

In [None]:
complete[(complete['Name'] == 'Arsène Wenger') & (complete['Position'] == 1)]

In [None]:
complete[(complete['Club'] == 'Arsenal') & (complete['Position'] == 1)]

In [None]:
All = performance[['Time', 'Arsenal', 'Arsène Wenger']].melt('Time', var_name='cols',  value_name='vals')
sns.lineplot(x="Time", y="vals", hue='cols', data=All)

Both Arsenal and Arsène Wenger were at their best around 2000-2004; since then they have not won the Premier League at all.

### Josep Guardiola

In [None]:
JG = []

win = complete[(complete['Position'] == 1) & (complete['Name'] == 'Josep Guardiola')]

for y in range(len(ser)):
    wins = 0
    for s in ser[y]:
        for x in win.season:
            if x == s:
                wins += 1
    JG.append(wins)
    
performance['Josep Guardiola'] = JG

In [None]:
sns.lineplot(x='Time', y='Josep Guardiola', data=performance)

Last is Josep Guardiola. This Spanish manager has performed great lately, winning two seasons in a row, 2018 and 2019.

#### Comparison With Team

In [None]:
complete[(complete['Name'] == 'Josep Guardiola') & (complete['Position'] == 1)]

In [None]:
complete[(complete['Club'] == 'Manchester City') & (complete['Position'] == 1)]

In [None]:
All = performance[['Time', 'Manchester City', 'Josep Guardiola']].melt('Time', var_name='cols',  value_name='vals')
sns.lineplot(x="Time", y="vals", hue='cols', data=All)

This chart shows that Josep Guardiola has helped Manchester City win more Premier League titles in the last 4 years.

## End

I have put everything I could come up with into this dataset exploration. I hope you enjoy exploring this dataset, and I would really appreciate it if you would help me with an upvote. Thank you for your time.

Have a good day, Kagglers.