# Pandas Operations
---

What else can pandas do?

In [3]:
import pandas as pd
import numpy as np

df = pd.read_csv('results.csv')
df = df[['DateTime', 'HomeTeam', 'AwayTeam', 'FTHG', 'FTAG', 'Referee']]

df.head()

| | DateTime | HomeTeam | AwayTeam | FTHG | FTAG | Referee |
|---|---|---|---|---|---|---|
| 0 | 2020-09-12T12:30:00Z | Fulham | Arsenal | 0 | 3 | C Kavanagh |
| 1 | 2020-09-12T15:00:00Z | Crystal Palace | Southampton | 1 | 0 | J Moss |
| 2 | 2020-09-12T17:30:00Z | Liverpool | Leeds | 4 | 3 | M Oliver |
| 3 | 2020-09-12T20:00:00Z | West Ham | Newcastle | 0 | 2 | S Attwell |
| 4 | 2020-09-13T14:00:00Z | West Brom | Leicester | 0 | 3 | A Taylor |


Let's look at which teams are in the dataset with `.unique()`.

In [4]:
unique_teams = df['HomeTeam'].unique()

print(unique_teams)

print(f'{len(unique_teams)} unique teams.')

['Fulham' 'Crystal Palace' 'Liverpool' 'West Ham' 'West Brom' 'Tottenham'
 'Brighton' 'Sheffield United' 'Everton' 'Leeds' 'Man United' 'Arsenal'
 'Southampton' 'Newcastle' 'Chelsea' 'Leicester' 'Aston Villa' 'Wolves'
 'Burnley' 'Man City']
20 unique teams.


20 teams. Each team hosts the other 19 once, so a full season should have 20 × 19 = 380 games. Let's check this.

In [5]:
len(df) == (20*19)

True
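The length check confirms the total, but not that every team has the same number of home games. A rough sketch of a stricter check follows, assuming a double round-robin format. The four team names and the generated `fixtures` frame are made up for illustration and stand in for `results.csv`.

```python
from itertools import permutations

import pandas as pd

teams = ['A', 'B', 'C', 'D']  # hypothetical team names, not real data

# Every ordered (home, away) pair is one fixture in a double round-robin
fixtures = pd.DataFrame(list(permutations(teams, 2)),
                        columns=['HomeTeam', 'AwayTeam'])

n = len(teams)
total_ok = len(fixtures) == n * (n - 1)

# value_counts() tallies how many home games each team has
home_counts = fixtures['HomeTeam'].value_counts()
balanced = (home_counts == n - 1).all()

print(total_ok, balanced)
```

On the real data, the same idea would be `df['HomeTeam'].value_counts()`, where each team should show 19 home games.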

In [6]:
# What do we have in the columns?

df.columns

Index(['DateTime', 'HomeTeam', 'AwayTeam', 'FTHG', 'FTAG', 'Referee'], dtype='object')

We're not interested in referees, so let's use `del` to permanently delete the `Referee` column.

In [7]:
del df['Referee']
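`del` changes the DataFrame in place. If you would rather keep the original around, `drop` returns a new DataFrame instead. A minimal sketch on a one-row toy frame (the values are placeholders, not rows from the dataset):

```python
import pandas as pd

# Toy frame with placeholder values
toy = pd.DataFrame({'HomeTeam': ['A'], 'AwayTeam': ['B'], 'Referee': ['X']})

# drop() returns a copy without the column; toy itself is unchanged
trimmed = toy.drop(columns=['Referee'])

print(list(toy.columns))
print(list(trimmed.columns))
```

Which one to use is mostly a matter of taste. `drop` is handy when a later cell still needs the removed column.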

## Functions with data frames

Pandas allows us to easily apply functions and arithmetic to DataFrames and the Series that they are made of.

Let's create two new columns:

1. Result - home score minus away score
2. ResultText - strings saying whether the home team won, the away team won, or the game was a draw

In [9]:
# series can do lots of sums for us very quickly

df['Result'] = df['FTHG'] - df['FTAG']

# define a new function that calculates the winner from the above number

def find_winner(value):
    if value > 0:
        return 'Home Win'
    elif value == 0:
        return 'Draw'
    else:
        return 'Away Win'

df['ResultText'] = df['Result'].apply(find_winner)

df.head(3)

| | DateTime | HomeTeam | AwayTeam | FTHG | FTAG | Result | ResultText |
|---|---|---|---|---|---|---|---|
| 0 | 2020-09-12T12:30:00Z | Fulham | Arsenal | 0 | 3 | -3 | Away Win |
| 1 | 2020-09-12T15:00:00Z | Crystal Palace | Southampton | 1 | 0 | 1 | Home Win |
| 2 | 2020-09-12T17:30:00Z | Liverpool | Leeds | 4 | 3 | 1 | Home Win |
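`.apply` calls the Python function once per row, which is fine for 380 games. For larger tables, a vectorised alternative is worth knowing. The sketch below uses `np.select` from NumPy on a three-row toy frame (made-up scores). It is one possible approach, not the only one.

```python
import numpy as np
import pandas as pd

# Toy scores, one game per outcome
toy = pd.DataFrame({'FTHG': [0, 1, 2], 'FTAG': [3, 1, 0]})
toy['Result'] = toy['FTHG'] - toy['FTAG']

# Conditions are checked in order; rows matching none get the default label
conditions = [toy['Result'] > 0, toy['Result'] == 0]
labels = ['Home Win', 'Draw']
toy['ResultText'] = np.select(conditions, labels, default='Away Win')

print(toy)
```

The labels match what `find_winner` would return for the same values.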


Another application would be to see if more goals are scored by the home or away team. Let's check the means.

In [10]:
print(df['FTHG'].mean())
print(df['FTAG'].mean())

1.3526315789473684
1.3421052631578947


Across the season, home teams averaged only about 0.01 more goals per game than away teams. Very slim.

What are the average home and away goals in home wins, away wins and draws?

In [12]:
df.groupby('ResultText').mean(numeric_only=True)

| ResultText | FTHG | FTAG | Result |
|---|---|---|---|
| Away Win | 0.581699 | 2.339869 | -1.75817 |
| Draw | 0.86747 | 0.86747 | 0.0 |
| Home Win | 2.451389 | 0.555556 | 1.895833 |
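Group means say nothing about how many games fall into each group. One way to get both at once is named aggregation with `.agg`. This is a sketch on a four-row toy frame with made-up scores, and the output column names (`games`, `mean_home_goals`, `mean_away_goals`) are just illustrative choices.

```python
import pandas as pd

# Toy results with made-up scores
toy = pd.DataFrame({
    'FTHG': [2, 0, 1, 3],
    'FTAG': [0, 1, 1, 1],
    'ResultText': ['Home Win', 'Away Win', 'Draw', 'Home Win'],
})

# Each keyword is an output column: (source column, aggregation)
summary = toy.groupby('ResultText').agg(
    games=('FTHG', 'size'),
    mean_home_goals=('FTHG', 'mean'),
    mean_away_goals=('FTAG', 'mean'),
)

print(summary)
```

Swapping `toy` for `df` should give the game count per result alongside the means shown above.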


If I love to see lots of goals, which team should I check out?

In [14]:
# create a total goals field by adding home and away

df['TotalGoals'] = df['FTHG'] + df['FTAG']

# group dataframe by home team and look at the mean total goals
# then sort in descending order

df.groupby('HomeTeam')['TotalGoals'].mean().sort_values(ascending=False)

HomeTeam
Man United          3.473684
Leicester           3.368421
Man City            3.157895
Newcastle           3.105263
Aston Villa         2.947368
Tottenham           2.894737
West Ham            2.842105
West Brom           2.842105
Southampton         2.789474
Crystal Palace      2.736842
Everton             2.736842
Liverpool           2.578947
Chelsea             2.578947
Leeds               2.578947
Wolves              2.421053
Arsenal             2.368421
Brighton            2.315789
Burnley             2.157895
Sheffield United    2.052632
Fulham              1.947368
Name: TotalGoals, dtype: float64

Looks like we should watch Man United at home. No thanks. What about away games?

In [15]:
df.groupby('AwayTeam')['TotalGoals'].mean().sort_values(ascending=False)

AwayTeam
Leeds               3.526316
Southampton         3.263158
Liverpool           3.210526
Tottenham           3.052632
West Brom           3.000000
Man City            2.894737
West Ham            2.894737
Crystal Palace      2.894737
Leicester           2.842105
Man United          2.684211
Newcastle           2.578947
Arsenal             2.578947
Burnley             2.473684
Aston Villa         2.368421
Chelsea             2.368421
Sheffield United    2.315789
Fulham              2.263158
Everton             2.263158
Brighton            2.210526
Wolves              2.210526
Name: TotalGoals, dtype: float64

Watch Leeds when they play away.
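The two rankings treat home and away games separately. To rank teams across all their games, one option is to reshape the table so each game appears once per team involved, then group by team. A minimal sketch with `melt` on a three-game toy frame (made-up teams and goals; the `Venue` and `Team` column names are assumptions for this example):

```python
import pandas as pd

# Toy fixtures with made-up goal totals
toy = pd.DataFrame({
    'HomeTeam': ['A', 'B', 'C'],
    'AwayTeam': ['B', 'C', 'A'],
    'TotalGoals': [4, 2, 0],
})

# Stack the home and away columns: one row per (game, team) pair
long = toy.melt(id_vars='TotalGoals',
                value_vars=['HomeTeam', 'AwayTeam'],
                var_name='Venue', value_name='Team')

# Mean total goals across every game a team played, home or away
per_team = long.groupby('Team')['TotalGoals'].mean().sort_values(ascending=False)

print(per_team)
```

On the real data, each team would then be averaged over all 38 of its games rather than 19.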