# Introduction

This notebook walks through various tasks that can be performed with pandas. We will look at how to read CSV files, analyse the overall dataset, create subsets of data, and add and remove columns.

**NOTE**: This is a markdown cell. These are useful for adding text to Jupyter notebooks. To turn a cell into a markdown cell, select it and press `m`.

<a id="contents"></a>

# Contents

- ### [Packages](#packages)
- ### [Data](#data)
  - [Reading Data](#reading-data)
  - [DataFrame Overview](#df-overview)
  - [dtypes](#dtypes)
  - [Converting Strings to Dates](#string-to-date)
  - [Picking and Renaming Columns](#pick-n-rename) 
  - [Renaming Values](#rename-vals)
  - [Creating new columns from existing data](#new-cols-existing-data)

<a id="packages"></a>

# Packages

It is best practice to keep all imported packages at or near the top of notebooks or Python scripts. For this walkthrough, we will need a couple of packages. It is standard to `import pandas as pd`. `pd` is like a nickname here that means we do not have to type `pandas` every time we want to use it. You *could* name this anything you want, but `pd` is the accepted term and makes it easier for everyone to follow your code.

In [1]:
import pandas as pd

<a id="data"></a>

# Data

We can load data directly from a file by using its file path. Similarly, if a CSV file is hosted at a URL, we can load the data *directly* from that URL. For this tutorial we will be looking at Premier League results and odds from `https://www.football-data.co.uk/`.

The link below was found on the website; however, there are simple ways of grabbing all of that data through Python instead. An explainer of the columns can be found [here](https://www.football-data.co.uk/notes.txt), although we will only be using some of these columns and making the names a little clearer.
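As a minimal sketch of loading from a file path, the snippet below writes a tiny CSV to a temporary folder and reads it back with `pd.read_csv`. The file name `sample_results.csv` is hypothetical, and the two rows are copied from the results shown in this notebook. Passing a URL string instead of a path works in the same way.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two rows copied from this notebook's results; the file name is made up.
csv_text = (
    "HomeTeam,AwayTeam,FTHG,FTAG\n"
    "Liverpool,Norwich,4,1\n"
    "West Ham,Man City,0,5\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "sample_results.csv"
    csv_path.write_text(csv_text)

    # read_csv accepts a local path here; a URL string is passed the same way.
    sample = pd.read_csv(csv_path)

print(sample)
```

The whole file is loaded into memory inside the `with` block, so `sample` remains usable after the temporary folder is removed.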



<a id="reading-data"></a>
### Reading Data

In [2]:
df = pd.read_csv("https://www.football-data.co.uk/mmz4281/1920/E0.csv")
df

Unnamed: 0,Div,Date,Time,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,...,AvgC<2.5,AHCh,B365CAHH,B365CAHA,PCAHH,PCAHA,MaxCAHH,MaxCAHA,AvgCAHH,AvgCAHA
0,E0,09/08/2019,20:00,Liverpool,Norwich,4,1,H,4,0,...,3.43,-2.25,1.91,1.99,1.94,1.98,1.99,2.07,1.90,1.99
1,E0,10/08/2019,12:30,West Ham,Man City,0,5,A,0,1,...,2.91,1.75,1.95,1.95,1.96,1.97,2.07,1.98,1.97,1.92
2,E0,10/08/2019,15:00,Bournemouth,Sheffield United,1,1,D,0,0,...,1.92,-0.50,1.95,1.95,1.98,1.95,2.00,1.96,1.96,1.92
3,E0,10/08/2019,15:00,Burnley,Southampton,3,0,H,0,0,...,1.71,0.00,1.87,2.03,1.89,2.03,1.90,2.07,1.86,2.02
4,E0,10/08/2019,15:00,Crystal Palace,Everton,0,0,D,0,0,...,1.71,0.25,1.82,2.08,1.97,1.96,2.03,2.08,1.96,1.93
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
365,E0,20/07/2020,20:15,Wolves,Crystal Palace,2,0,H,1,0,...,1.69,-1.00,1.79,2.11,1.83,2.10,1.88,2.15,1.79,2.11
366,E0,21/07/2020,18:00,Watford,Man City,0,4,A,0,2,...,2.69,1.50,2.05,1.88,2.06,1.85,2.09,1.91,2.03,1.85
367,E0,21/07/2020,20:15,Aston Villa,Arsenal,1,0,H,1,0,...,2.13,0.25,1.96,1.97,1.98,1.94,1.99,2.00,1.95,1.94
368,E0,22/07/2020,18:00,Man United,West Ham,1,1,D,0,1,...,2.81,-1.75,2.05,1.85,2.04,1.88,2.09,1.97,1.98,1.89


The above output is known as a DataFrame, and `df` is the most common variable name given to one. Looking at the above, we can see `...` halfway across and halfway down the view. This indicates that some columns and rows are hidden because there are too many to display. For rows this is usually fine, as we shall see, but it's good to have a full view of our columns. Let's fix that:

<a id="df-overview"></a>

### DataFrame Overview

In [3]:
pd.options.display.max_columns = 999

The above sets the maximum number of visible columns to 999. If we display our dataframe again we'll see all columns are now shown. We *could* do the same for rows, but I'd advise against that unless you want to scroll through a lot of data each time you display your dataframe.
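If changing the setting for the whole notebook feels heavy-handed, one alternative (a sketch, not something this walkthrough relies on) is `pd.option_context`, which applies a display option only inside a `with` block. Setting `display.max_columns` to `None` means "no limit".

```python
import pandas as pd

# Remember the current setting so we can confirm it is restored afterwards.
before = pd.get_option("display.max_columns")

# Inside the block there is no column limit; outside it nothing changes.
with pd.option_context("display.max_columns", None):
    inside = pd.get_option("display.max_columns")

after = pd.get_option("display.max_columns")
print(before, inside, after)
```

Displaying a dataframe inside the `with` block shows every column, while the rest of the notebook keeps whatever limit was already in place.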

In [4]:
df

Unnamed: 0,Div,Date,Time,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Referee,HS,AS,HST,AST,HF,AF,HC,AC,HY,AY,HR,AR,B365H,B365D,B365A,BWH,BWD,BWA,IWH,IWD,IWA,PSH,PSD,PSA,WHH,WHD,WHA,VCH,VCD,VCA,MaxH,MaxD,MaxA,AvgH,AvgD,AvgA,B365>2.5,B365<2.5,P>2.5,P<2.5,Max>2.5,Max<2.5,Avg>2.5,Avg<2.5,AHh,B365AHH,B365AHA,PAHH,PAHA,MaxAHH,MaxAHA,AvgAHH,AvgAHA,B365CH,B365CD,B365CA,BWCH,BWCD,BWCA,IWCH,IWCD,IWCA,PSCH,PSCD,PSCA,WHCH,WHCD,WHCA,VCCH,VCCD,VCCA,MaxCH,MaxCD,MaxCA,AvgCH,AvgCD,AvgCA,B365C>2.5,B365C<2.5,PC>2.5,PC<2.5,MaxC>2.5,MaxC<2.5,AvgC>2.5,AvgC<2.5,AHCh,B365CAHH,B365CAHA,PCAHH,PCAHA,MaxCAHH,MaxCAHA,AvgCAHH,AvgCAHA
0,E0,09/08/2019,20:00,Liverpool,Norwich,4,1,H,4,0,H,M Oliver,15,12,7,5,9,9,11,2,0,2,0,0,1.14,10.00,19.00,1.14,8.25,18.50,1.15,8.00,18.00,1.15,9.59,18.05,1.12,8.5,21.00,1.14,9.50,23.00,1.16,10.00,23.00,1.14,8.75,19.83,1.40,3.00,1.40,3.11,1.45,3.11,1.41,2.92,-2.25,1.96,1.94,1.97,1.95,1.97,2.00,1.94,1.94,1.14,9.50,21.00,1.14,9.00,20.00,1.15,8.00,18.00,1.14,10.43,19.63,1.11,9.5,21.00,1.14,9.50,23.00,1.16,10.50,23.00,1.14,9.52,19.18,1.30,3.50,1.34,3.44,1.36,3.76,1.32,3.43,-2.25,1.91,1.99,1.94,1.98,1.99,2.07,1.90,1.99
1,E0,10/08/2019,12:30,West Ham,Man City,0,5,A,0,1,A,M Dean,5,14,3,9,6,13,1,1,2,2,0,0,12.00,6.50,1.22,11.50,5.75,1.26,11.00,6.10,1.25,11.68,6.53,1.26,13.00,6.0,1.24,12.00,6.50,1.25,13.00,6.75,1.29,11.84,6.28,1.25,1.44,2.75,1.49,2.77,1.51,2.77,1.48,2.65,1.75,2.00,1.90,2.02,1.90,2.02,1.92,1.99,1.89,12.00,7.00,1.25,11.00,6.00,1.26,11.00,6.10,1.25,11.11,6.68,1.27,11.00,6.5,1.24,12.00,6.50,1.25,13.00,7.00,1.29,11.14,6.46,1.26,1.40,3.00,1.43,3.03,1.50,3.22,1.41,2.91,1.75,1.95,1.95,1.96,1.97,2.07,1.98,1.97,1.92
2,E0,10/08/2019,15:00,Bournemouth,Sheffield United,1,1,D,0,0,D,K Friend,13,8,3,3,10,19,3,4,2,1,0,0,1.95,3.60,3.60,1.95,3.60,3.90,1.97,3.55,3.80,2.04,3.57,3.90,2.00,3.5,3.80,2.00,3.60,4.00,2.06,3.65,4.00,2.01,3.53,3.83,1.90,1.90,1.96,1.96,2.00,1.99,1.90,1.93,-0.50,2.01,1.89,2.04,1.88,2.04,1.91,2.00,1.88,1.95,3.70,4.20,1.95,3.60,3.90,1.97,3.55,3.85,1.98,3.67,4.06,1.95,3.6,3.90,2.00,3.60,4.00,2.03,3.70,4.20,1.98,3.58,3.96,1.90,1.90,1.94,1.97,1.97,1.98,1.91,1.92,-0.50,1.95,1.95,1.98,1.95,2.00,1.96,1.96,1.92
3,E0,10/08/2019,15:00,Burnley,Southampton,3,0,H,0,0,D,G Scott,10,11,4,3,6,12,2,7,0,0,0,0,2.62,3.20,2.75,2.65,3.20,2.75,2.65,3.20,2.75,2.71,3.31,2.81,2.70,3.2,2.75,2.70,3.30,2.80,2.80,3.33,2.85,2.68,3.22,2.78,2.10,1.72,2.17,1.77,2.20,1.78,2.12,1.73,0.00,1.92,1.98,1.93,2.00,1.94,2.00,1.91,1.98,2.70,3.25,2.90,2.65,3.10,2.85,2.60,3.20,2.85,2.71,3.19,2.90,2.62,3.2,2.80,2.70,3.25,2.90,2.72,3.26,2.95,2.65,3.18,2.88,2.10,1.72,2.19,1.76,2.25,1.78,2.17,1.71,0.00,1.87,2.03,1.89,2.03,1.90,2.07,1.86,2.02
4,E0,10/08/2019,15:00,Crystal Palace,Everton,0,0,D,0,0,D,J Moss,6,10,2,3,16,14,6,2,2,1,0,1,3.00,3.25,2.37,3.20,3.20,2.35,3.10,3.20,2.40,3.21,3.37,2.39,3.10,3.3,2.35,3.20,3.30,2.45,3.21,3.40,2.52,3.13,3.27,2.40,2.20,1.66,2.23,1.74,2.25,1.74,2.18,1.70,0.25,1.85,2.05,1.88,2.05,1.88,2.09,1.84,2.04,3.40,3.50,2.25,3.30,3.30,2.25,3.40,3.30,2.20,3.37,3.45,2.27,3.30,3.3,2.25,3.40,3.30,2.25,3.55,3.50,2.34,3.41,3.37,2.23,2.20,1.66,2.22,1.74,2.28,1.77,2.17,1.71,0.25,1.82,2.08,1.97,1.96,2.03,2.08,1.96,1.93
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
365,E0,20/07/2020,20:15,Wolves,Crystal Palace,2,0,H,1,0,H,P Bankes,11,7,5,3,11,15,5,2,2,1,0,0,1.50,4.00,7.50,1.50,3.90,7.75,1.53,3.80,7.50,1.51,4.12,8.08,1.47,4.0,8.00,1.50,3.90,8.00,1.54,4.35,8.30,1.50,4.03,7.72,2.30,1.61,2.33,1.68,2.39,1.71,2.29,1.64,-1.00,1.94,1.96,1.97,1.96,1.99,2.02,1.90,1.97,1.40,4.20,9.50,1.44,4.25,8.25,1.45,4.00,8.25,1.46,4.31,8.70,1.42,4.2,9.00,1.45,4.20,9.00,1.49,4.55,10.30,1.44,4.24,8.68,2.20,1.66,2.21,1.75,2.27,1.82,2.18,1.69,-1.00,1.79,2.11,1.83,2.10,1.88,2.15,1.79,2.11
366,E0,21/07/2020,18:00,Watford,Man City,0,4,A,0,2,A,M Oliver,2,26,0,10,14,11,0,8,2,0,0,0,8.50,5.50,1.33,9.00,5.75,1.30,8.00,5.25,1.35,8.57,5.73,1.34,9.50,5.8,1.30,10.00,5.75,1.30,10.00,6.15,1.37,8.62,5.60,1.33,1.50,2.62,1.50,2.74,1.50,2.79,1.48,2.67,1.50,1.97,1.96,1.94,1.98,2.00,1.99,1.94,1.94,9.00,5.75,1.30,7.25,5.75,1.35,9.00,5.75,1.33,9.20,6.37,1.30,9.50,6.0,1.29,9.50,6.00,1.30,10.50,6.60,1.33,9.13,6.01,1.30,1.50,2.62,1.49,2.79,1.50,2.82,1.47,2.69,1.50,2.05,1.88,2.06,1.85,2.09,1.91,2.03,1.85
367,E0,21/07/2020,20:15,Aston Villa,Arsenal,1,0,H,1,0,H,C Kavanagh,8,7,3,0,13,19,8,9,2,4,0,0,3.30,3.60,2.10,3.25,3.90,2.05,3.25,3.55,2.15,3.24,3.94,2.14,3.20,3.8,2.10,3.30,3.70,2.15,3.40,3.98,2.20,3.24,3.78,2.11,1.66,2.20,1.70,2.28,1.74,2.37,1.66,2.25,0.25,2.04,1.89,2.04,1.88,2.09,1.92,2.02,1.85,3.10,3.75,2.20,3.10,3.60,2.20,3.10,3.65,2.25,3.14,3.86,2.21,3.20,3.7,2.15,3.13,3.60,2.25,3.35,3.88,2.26,3.13,3.70,2.20,1.72,2.10,1.76,2.18,1.79,2.25,1.73,2.13,0.25,1.96,1.97,1.98,1.94,1.99,2.00,1.95,1.94
368,E0,22/07/2020,18:00,Man United,West Ham,1,1,D,0,1,A,P Tierney,11,12,4,3,10,9,2,3,3,1,0,0,1.22,6.50,11.00,1.25,6.25,11.50,1.27,6.00,11.00,1.26,6.37,12.18,1.24,6.5,12.00,1.22,6.50,13.00,1.28,7.00,13.50,1.24,6.38,11.56,1.50,2.62,1.48,2.84,1.50,2.84,1.47,2.68,-1.75,1.95,1.98,1.97,1.95,1.97,2.15,1.92,1.96,1.22,6.50,12.00,1.25,6.50,10.50,1.25,6.00,9.00,1.27,6.55,10.55,1.25,6.0,12.00,1.29,6.00,11.50,1.30,7.00,13.00,1.26,6.40,10.94,1.44,2.75,1.45,2.92,1.47,2.93,1.44,2.81,-1.75,2.05,1.85,2.04,1.88,2.09,1.97,1.98,1.89


If we want to see the first 20 results, we have a couple of options:

```python
df.head(20)
df[:20]
```

The second option uses slice notation, `[from:to]`. It includes the `from` position and **excludes** the `to` position. In Python, indexes start at `0`, so `:20` or `0:20` will get the first 20 results (positions 0 to 19).
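To check that the two options really are equivalent, here is a small sketch on a stand-in dataframe (the name `demo` and its single `match` column are made up for illustration):

```python
import pandas as pd

# A 30-row stand-in for the real dataframe.
demo = pd.DataFrame({"match": range(30)})

first_20_head = demo.head(20)
first_20_slice = demo[:20]  # same as demo[0:20]: positions 0 to 19

# Both approaches return the same rows.
print(first_20_head.equals(first_20_slice))
print(first_20_slice.index.min(), first_20_slice.index.max())
```

`demo.iloc[:20]` is a more explicit spelling of the same positional slice.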

In [5]:
df.head(20)

Unnamed: 0,Div,Date,Time,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Referee,HS,AS,HST,AST,HF,AF,HC,AC,HY,AY,HR,AR,B365H,B365D,B365A,BWH,BWD,BWA,IWH,IWD,IWA,PSH,PSD,PSA,WHH,WHD,WHA,VCH,VCD,VCA,MaxH,MaxD,MaxA,AvgH,AvgD,AvgA,B365>2.5,B365<2.5,P>2.5,P<2.5,Max>2.5,Max<2.5,Avg>2.5,Avg<2.5,AHh,B365AHH,B365AHA,PAHH,PAHA,MaxAHH,MaxAHA,AvgAHH,AvgAHA,B365CH,B365CD,B365CA,BWCH,BWCD,BWCA,IWCH,IWCD,IWCA,PSCH,PSCD,PSCA,WHCH,WHCD,WHCA,VCCH,VCCD,VCCA,MaxCH,MaxCD,MaxCA,AvgCH,AvgCD,AvgCA,B365C>2.5,B365C<2.5,PC>2.5,PC<2.5,MaxC>2.5,MaxC<2.5,AvgC>2.5,AvgC<2.5,AHCh,B365CAHH,B365CAHA,PCAHH,PCAHA,MaxCAHH,MaxCAHA,AvgCAHH,AvgCAHA
0,E0,09/08/2019,20:00,Liverpool,Norwich,4,1,H,4,0,H,M Oliver,15,12,7,5,9,9,11,2,0,2,0,0,1.14,10.0,19.0,1.14,8.25,18.5,1.15,8.0,18.0,1.15,9.59,18.05,1.12,8.5,21.0,1.14,9.5,23.0,1.16,10.0,23.0,1.14,8.75,19.83,1.4,3.0,1.4,3.11,1.45,3.11,1.41,2.92,-2.25,1.96,1.94,1.97,1.95,1.97,2.0,1.94,1.94,1.14,9.5,21.0,1.14,9.0,20.0,1.15,8.0,18.0,1.14,10.43,19.63,1.11,9.5,21.0,1.14,9.5,23.0,1.16,10.5,23.0,1.14,9.52,19.18,1.3,3.5,1.34,3.44,1.36,3.76,1.32,3.43,-2.25,1.91,1.99,1.94,1.98,1.99,2.07,1.9,1.99
1,E0,10/08/2019,12:30,West Ham,Man City,0,5,A,0,1,A,M Dean,5,14,3,9,6,13,1,1,2,2,0,0,12.0,6.5,1.22,11.5,5.75,1.26,11.0,6.1,1.25,11.68,6.53,1.26,13.0,6.0,1.24,12.0,6.5,1.25,13.0,6.75,1.29,11.84,6.28,1.25,1.44,2.75,1.49,2.77,1.51,2.77,1.48,2.65,1.75,2.0,1.9,2.02,1.9,2.02,1.92,1.99,1.89,12.0,7.0,1.25,11.0,6.0,1.26,11.0,6.1,1.25,11.11,6.68,1.27,11.0,6.5,1.24,12.0,6.5,1.25,13.0,7.0,1.29,11.14,6.46,1.26,1.4,3.0,1.43,3.03,1.5,3.22,1.41,2.91,1.75,1.95,1.95,1.96,1.97,2.07,1.98,1.97,1.92
2,E0,10/08/2019,15:00,Bournemouth,Sheffield United,1,1,D,0,0,D,K Friend,13,8,3,3,10,19,3,4,2,1,0,0,1.95,3.6,3.6,1.95,3.6,3.9,1.97,3.55,3.8,2.04,3.57,3.9,2.0,3.5,3.8,2.0,3.6,4.0,2.06,3.65,4.0,2.01,3.53,3.83,1.9,1.9,1.96,1.96,2.0,1.99,1.9,1.93,-0.5,2.01,1.89,2.04,1.88,2.04,1.91,2.0,1.88,1.95,3.7,4.2,1.95,3.6,3.9,1.97,3.55,3.85,1.98,3.67,4.06,1.95,3.6,3.9,2.0,3.6,4.0,2.03,3.7,4.2,1.98,3.58,3.96,1.9,1.9,1.94,1.97,1.97,1.98,1.91,1.92,-0.5,1.95,1.95,1.98,1.95,2.0,1.96,1.96,1.92
3,E0,10/08/2019,15:00,Burnley,Southampton,3,0,H,0,0,D,G Scott,10,11,4,3,6,12,2,7,0,0,0,0,2.62,3.2,2.75,2.65,3.2,2.75,2.65,3.2,2.75,2.71,3.31,2.81,2.7,3.2,2.75,2.7,3.3,2.8,2.8,3.33,2.85,2.68,3.22,2.78,2.1,1.72,2.17,1.77,2.2,1.78,2.12,1.73,0.0,1.92,1.98,1.93,2.0,1.94,2.0,1.91,1.98,2.7,3.25,2.9,2.65,3.1,2.85,2.6,3.2,2.85,2.71,3.19,2.9,2.62,3.2,2.8,2.7,3.25,2.9,2.72,3.26,2.95,2.65,3.18,2.88,2.1,1.72,2.19,1.76,2.25,1.78,2.17,1.71,0.0,1.87,2.03,1.89,2.03,1.9,2.07,1.86,2.02
4,E0,10/08/2019,15:00,Crystal Palace,Everton,0,0,D,0,0,D,J Moss,6,10,2,3,16,14,6,2,2,1,0,1,3.0,3.25,2.37,3.2,3.2,2.35,3.1,3.2,2.4,3.21,3.37,2.39,3.1,3.3,2.35,3.2,3.3,2.45,3.21,3.4,2.52,3.13,3.27,2.4,2.2,1.66,2.23,1.74,2.25,1.74,2.18,1.7,0.25,1.85,2.05,1.88,2.05,1.88,2.09,1.84,2.04,3.4,3.5,2.25,3.3,3.3,2.25,3.4,3.3,2.2,3.37,3.45,2.27,3.3,3.3,2.25,3.4,3.3,2.25,3.55,3.5,2.34,3.41,3.37,2.23,2.2,1.66,2.22,1.74,2.28,1.77,2.17,1.71,0.25,1.82,2.08,1.97,1.96,2.03,2.08,1.96,1.93
5,E0,10/08/2019,15:00,Watford,Brighton,0,3,A,0,1,A,C Pawson,11,5,3,3,15,11,5,2,0,1,0,0,1.9,3.4,4.0,1.9,3.4,4.33,1.93,3.4,4.25,1.98,3.44,4.37,1.95,3.4,4.2,1.95,3.5,4.33,2.0,3.5,4.6,1.94,3.41,4.26,2.1,1.72,2.19,1.76,2.24,1.76,2.16,1.71,-0.5,1.95,1.95,1.98,1.95,1.98,1.98,1.94,1.94,2.1,3.25,4.2,2.1,3.1,4.0,2.05,3.2,4.0,2.05,3.38,4.12,2.05,3.25,4.0,2.15,3.3,3.9,2.15,3.38,4.2,2.07,3.27,4.04,2.1,1.72,2.16,1.78,2.2,1.78,2.14,1.73,-0.5,2.04,1.86,2.05,1.88,2.12,1.91,2.05,1.84
6,E0,10/08/2019,17:30,Tottenham,Aston Villa,3,1,H,0,1,A,C Kavanagh,31,7,7,4,13,9,14,0,1,0,0,0,1.3,5.25,10.0,1.3,5.5,10.0,1.3,5.5,9.6,1.3,5.84,10.96,1.29,5.5,10.0,1.3,5.5,12.0,1.33,5.95,12.0,1.3,5.53,10.51,1.66,2.2,1.64,2.4,1.7,2.4,1.65,2.26,-1.5,1.97,1.93,1.99,1.93,2.0,2.0,1.93,1.94,1.36,5.5,9.0,1.35,5.0,9.0,1.3,5.5,9.6,1.39,5.35,8.42,1.35,5.25,8.0,1.4,5.2,9.0,1.4,5.7,10.0,1.36,5.29,8.82,1.57,2.37,1.58,2.52,1.65,2.55,1.58,2.4,-1.5,2.1,1.7,2.18,1.77,2.21,1.87,2.08,1.8
7,E0,11/08/2019,14:00,Leicester,Wolves,0,0,D,0,0,D,A Marriner,15,8,1,2,3,13,12,3,0,2,0,0,2.2,3.2,3.4,2.25,3.3,3.3,2.2,3.25,3.45,2.21,3.34,3.66,2.2,3.25,3.5,2.25,3.3,3.6,2.29,3.38,3.66,2.22,3.28,3.48,2.2,1.66,2.23,1.74,2.25,1.74,2.17,1.7,-0.25,1.9,2.0,1.9,2.04,1.95,2.04,1.91,1.98,2.4,3.25,3.3,2.35,3.2,3.3,2.35,3.15,3.2,2.5,3.12,3.3,2.35,3.1,3.3,2.45,3.2,3.3,2.55,3.25,3.58,2.41,3.14,3.29,2.3,1.61,2.45,1.63,2.45,1.71,2.33,1.62,-0.25,2.07,1.83,2.11,1.83,2.12,1.98,2.06,1.84
8,E0,11/08/2019,14:00,Newcastle,Arsenal,0,1,A,0,0,D,M Atkinson,9,8,2,2,12,7,5,3,1,3,0,0,4.5,3.75,1.72,4.5,3.75,1.78,4.4,3.85,1.77,4.58,3.93,1.81,4.5,3.75,1.78,4.6,3.9,1.8,4.7,4.0,1.83,4.49,3.82,1.79,1.8,2.0,1.83,2.1,1.83,2.14,1.77,2.07,0.75,1.85,2.05,1.86,2.07,1.88,2.08,1.85,2.03,3.4,3.6,2.2,3.3,3.5,2.2,3.25,3.5,2.2,3.36,3.56,2.25,3.5,3.4,2.15,3.4,3.5,2.25,3.76,3.65,2.25,3.36,3.51,2.2,1.8,2.0,1.83,2.09,1.85,2.17,1.79,2.05,0.25,1.99,1.91,1.99,1.95,2.17,1.97,2.0,1.89
9,E0,11/08/2019,16:30,Man United,Chelsea,4,0,H,1,0,H,A Taylor,11,18,5,7,15,13,3,5,3,4,0,0,2.1,3.3,3.5,2.15,3.3,3.5,2.15,3.35,3.4,2.21,3.37,3.63,2.15,3.3,3.5,2.25,3.3,3.5,2.28,3.43,3.63,2.19,3.32,3.49,2.0,1.8,2.05,1.87,2.1,1.87,2.01,1.83,-0.25,1.9,2.0,1.9,2.04,1.92,2.04,1.89,2.0,2.05,3.5,4.0,2.1,3.3,3.8,2.05,3.3,3.85,2.04,3.44,4.14,2.0,3.4,4.0,2.05,3.4,4.1,2.2,3.5,4.4,2.05,3.36,3.99,2.0,1.8,2.05,1.88,2.07,2.04,1.99,1.84,-0.5,2.02,1.88,2.04,1.9,2.1,1.91,2.04,1.85


Similarly, if we wanted to see the *bottom* 10 results we could use:

```python
df.tail(10)
df[-10:]
```

The `-10` means we start 10 rows from the bottom and, with nothing after the colon, carry on to the end. That gives us the last ten results.
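The same kind of check works for the bottom of a dataframe. This sketch again uses a made-up 30-row `demo` dataframe rather than the real data:

```python
import pandas as pd

# A 30-row stand-in for the real dataframe.
demo = pd.DataFrame({"match": range(30)})

last_10_tail = demo.tail(10)
last_10_slice = demo[-10:]  # start 10 from the end, run to the end

# Both approaches return rows 20 to 29.
print(last_10_tail.equals(last_10_slice))
print(list(last_10_slice.index))
```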

In [6]:
df.tail(10)

Unnamed: 0,Div,Date,Time,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Referee,HS,AS,HST,AST,HF,AF,HC,AC,HY,AY,HR,AR,B365H,B365D,B365A,BWH,BWD,BWA,IWH,IWD,IWA,PSH,PSD,PSA,WHH,WHD,WHA,VCH,VCD,VCA,MaxH,MaxD,MaxA,AvgH,AvgD,AvgA,B365>2.5,B365<2.5,P>2.5,P<2.5,Max>2.5,Max<2.5,Avg>2.5,Avg<2.5,AHh,B365AHH,B365AHA,PAHH,PAHA,MaxAHH,MaxAHA,AvgAHH,AvgAHA,B365CH,B365CD,B365CA,BWCH,BWCD,BWCA,IWCH,IWCD,IWCA,PSCH,PSCD,PSCA,WHCH,WHCD,WHCA,VCCH,VCCD,VCCA,MaxCH,MaxCD,MaxCA,AvgCH,AvgCD,AvgCA,B365C>2.5,B365C<2.5,PC>2.5,PC<2.5,MaxC>2.5,MaxC<2.5,AvgC>2.5,AvgC<2.5,AHCh,B365CAHH,B365CAHA,PCAHH,PCAHA,MaxCAHH,MaxCAHA,AvgCAHH,AvgCAHA
360,E0,18/07/2020,17:30,Norwich,Burnley,0,2,A,0,1,A,K Friend,6,23,2,8,9,16,6,8,0,0,2,0,3.2,3.4,2.25,3.4,3.3,2.2,3.3,3.15,2.25,3.4,3.43,2.25,3.3,3.3,2.25,3.4,3.3,2.25,3.52,3.43,2.36,3.32,3.33,2.25,2.0,1.8,2.07,1.85,2.12,1.88,2.05,1.78,0.25,1.97,1.93,1.98,1.94,1.98,1.95,1.94,1.93,3.1,3.4,2.3,3.3,3.4,2.2,3.15,3.15,2.35,3.19,3.45,2.34,3.2,3.4,2.25,3.3,3.3,2.3,3.37,3.6,2.4,3.2,3.39,2.3,2.1,1.72,2.14,1.79,2.21,1.83,2.13,1.73,0.25,1.9,2.0,1.9,2.02,2.01,2.04,1.9,1.98
361,E0,19/07/2020,14:00,Bournemouth,Southampton,0,2,A,0,1,A,C Pawson,11,16,3,8,12,11,12,6,2,2,0,0,2.4,3.6,2.8,2.3,3.7,2.9,2.45,3.4,2.85,2.43,3.78,2.82,2.4,3.6,2.8,2.38,3.5,2.9,2.5,3.82,3.01,2.39,3.65,2.84,1.72,2.1,1.74,2.22,1.79,2.25,1.71,2.16,-0.25,2.07,1.83,2.13,1.81,2.13,1.86,2.08,1.81,2.4,3.75,2.7,2.35,3.7,2.8,2.45,3.5,2.75,2.46,3.82,2.76,2.45,3.7,2.7,2.45,3.6,2.8,2.55,3.92,2.85,2.43,3.73,2.74,1.66,2.2,1.66,2.35,1.75,2.37,1.65,2.27,0.0,1.84,2.06,1.85,2.08,1.91,2.15,1.83,2.06
362,E0,19/07/2020,16:00,Tottenham,Leicester,3,0,H,3,0,H,A Taylor,7,24,3,6,15,10,4,13,2,1,0,0,2.25,3.3,3.25,2.35,3.3,3.1,2.35,3.25,3.05,2.33,3.53,3.15,2.3,3.4,3.1,2.3,3.3,3.2,2.41,3.58,3.25,2.32,3.42,3.11,1.9,1.9,2.01,1.91,2.04,1.95,1.95,1.87,-0.25,2.01,1.89,2.02,1.91,2.05,1.93,2.01,1.87,2.25,3.4,3.2,2.4,3.3,3.0,2.35,3.2,3.1,2.36,3.39,3.22,2.25,3.3,3.3,2.38,3.3,3.13,2.44,3.5,3.51,2.33,3.36,3.14,2.0,1.8,2.07,1.85,2.11,1.92,1.99,1.84,-0.25,2.03,1.87,2.03,1.89,2.07,1.9,2.02,1.86
363,E0,20/07/2020,18:00,Brighton,Newcastle,0,0,D,0,0,D,S Hooper,11,12,3,1,13,12,9,7,4,2,0,0,1.9,3.4,4.33,1.95,3.4,4.2,1.95,3.25,4.1,1.97,3.48,4.3,1.91,3.5,4.2,1.91,3.4,4.33,2.0,3.62,4.5,1.93,3.42,4.23,2.1,1.72,2.14,1.79,2.2,1.82,2.12,1.73,-0.5,1.95,1.95,1.97,1.96,1.98,1.99,1.93,1.93,1.95,3.5,4.0,1.95,3.5,4.0,2.0,3.3,3.85,2.01,3.55,4.0,2.0,3.5,3.9,2.0,3.5,3.9,2.05,3.67,4.2,1.99,3.49,3.92,1.9,1.9,1.93,1.99,1.98,2.03,1.9,1.92,-0.5,2.04,1.89,2.01,1.91,2.06,1.95,2.0,1.88
364,E0,20/07/2020,18:00,Sheffield United,Everton,0,1,A,0,0,D,S Attwell,8,5,0,2,11,19,7,1,1,3,0,0,2.2,3.25,3.5,2.15,3.3,3.5,2.2,3.1,3.55,2.18,3.42,3.57,2.15,3.3,3.5,2.15,3.2,3.7,2.27,3.45,3.7,2.17,3.29,3.52,2.3,1.61,2.37,1.66,2.43,1.7,2.32,1.62,-0.25,1.88,2.02,1.88,2.04,1.93,2.07,1.87,2.01,1.95,3.3,4.2,1.91,3.4,4.25,1.97,3.2,4.1,2.0,3.41,4.25,1.95,3.3,4.2,1.95,3.3,4.33,2.05,3.53,4.4,1.98,3.35,4.15,2.2,1.66,2.31,1.68,2.43,1.69,2.28,1.64,-0.5,2.0,1.93,2.0,1.93,2.05,1.95,1.99,1.89
365,E0,20/07/2020,20:15,Wolves,Crystal Palace,2,0,H,1,0,H,P Bankes,11,7,5,3,11,15,5,2,2,1,0,0,1.5,4.0,7.5,1.5,3.9,7.75,1.53,3.8,7.5,1.51,4.12,8.08,1.47,4.0,8.0,1.5,3.9,8.0,1.54,4.35,8.3,1.5,4.03,7.72,2.3,1.61,2.33,1.68,2.39,1.71,2.29,1.64,-1.0,1.94,1.96,1.97,1.96,1.99,2.02,1.9,1.97,1.4,4.2,9.5,1.44,4.25,8.25,1.45,4.0,8.25,1.46,4.31,8.7,1.42,4.2,9.0,1.45,4.2,9.0,1.49,4.55,10.3,1.44,4.24,8.68,2.2,1.66,2.21,1.75,2.27,1.82,2.18,1.69,-1.0,1.79,2.11,1.83,2.1,1.88,2.15,1.79,2.11
366,E0,21/07/2020,18:00,Watford,Man City,0,4,A,0,2,A,M Oliver,2,26,0,10,14,11,0,8,2,0,0,0,8.5,5.5,1.33,9.0,5.75,1.3,8.0,5.25,1.35,8.57,5.73,1.34,9.5,5.8,1.3,10.0,5.75,1.3,10.0,6.15,1.37,8.62,5.6,1.33,1.5,2.62,1.5,2.74,1.5,2.79,1.48,2.67,1.5,1.97,1.96,1.94,1.98,2.0,1.99,1.94,1.94,9.0,5.75,1.3,7.25,5.75,1.35,9.0,5.75,1.33,9.2,6.37,1.3,9.5,6.0,1.29,9.5,6.0,1.3,10.5,6.6,1.33,9.13,6.01,1.3,1.5,2.62,1.49,2.79,1.5,2.82,1.47,2.69,1.5,2.05,1.88,2.06,1.85,2.09,1.91,2.03,1.85
367,E0,21/07/2020,20:15,Aston Villa,Arsenal,1,0,H,1,0,H,C Kavanagh,8,7,3,0,13,19,8,9,2,4,0,0,3.3,3.6,2.1,3.25,3.9,2.05,3.25,3.55,2.15,3.24,3.94,2.14,3.2,3.8,2.1,3.3,3.7,2.15,3.4,3.98,2.2,3.24,3.78,2.11,1.66,2.2,1.7,2.28,1.74,2.37,1.66,2.25,0.25,2.04,1.89,2.04,1.88,2.09,1.92,2.02,1.85,3.1,3.75,2.2,3.1,3.6,2.2,3.1,3.65,2.25,3.14,3.86,2.21,3.2,3.7,2.15,3.13,3.6,2.25,3.35,3.88,2.26,3.13,3.7,2.2,1.72,2.1,1.76,2.18,1.79,2.25,1.73,2.13,0.25,1.96,1.97,1.98,1.94,1.99,2.0,1.95,1.94
368,E0,22/07/2020,18:00,Man United,West Ham,1,1,D,0,1,A,P Tierney,11,12,4,3,10,9,2,3,3,1,0,0,1.22,6.5,11.0,1.25,6.25,11.5,1.27,6.0,11.0,1.26,6.37,12.18,1.24,6.5,12.0,1.22,6.5,13.0,1.28,7.0,13.5,1.24,6.38,11.56,1.5,2.62,1.48,2.84,1.5,2.84,1.47,2.68,-1.75,1.95,1.98,1.97,1.95,1.97,2.15,1.92,1.96,1.22,6.5,12.0,1.25,6.5,10.5,1.25,6.0,9.0,1.27,6.55,10.55,1.25,6.0,12.0,1.29,6.0,11.5,1.3,7.0,13.0,1.26,6.4,10.94,1.44,2.75,1.45,2.92,1.47,2.93,1.44,2.81,-1.75,2.05,1.85,2.04,1.88,2.09,1.97,1.98,1.89
369,E0,22/07/2020,20:15,Liverpool,Chelsea,5,3,H,3,1,H,A Marriner,10,10,7,5,8,11,6,0,1,0,0,0,2.0,3.75,3.5,1.95,3.8,3.6,2.1,3.5,3.4,2.08,3.71,3.59,2.05,3.7,3.4,2.05,3.75,3.4,2.13,3.95,3.7,2.04,3.7,3.51,1.57,2.37,1.65,2.39,1.66,2.41,1.6,2.34,-0.5,2.09,1.84,2.08,1.85,2.13,1.87,2.05,1.82,1.9,3.8,3.8,1.9,3.8,3.8,2.0,3.55,3.75,1.93,3.8,4.09,1.88,3.7,4.0,1.87,3.75,4.1,2.0,3.9,4.25,1.91,3.73,3.97,1.72,2.1,1.78,2.16,1.81,2.23,1.72,2.13,-0.5,1.91,1.99,1.93,2.01,1.95,2.04,1.91,1.96



<div class="back_to_top" style="float:right;">

<a href="#contents" style="color:red;">[Back to Top &uarr;]</a>

</div>




<a id="dtypes"></a>
### Data Types (dtypes)

When doing any analysis, it's important to understand the kind of data you are working with. Pandas uses 7 general data types:


- **object** - most often strings (text), or a mix of numeric and non-numeric values
- **int64** - integers - whole numbers
- **float64** - floats or floating points - numbers with a decimal point, e.g. 5.3 or 10.0
- **bool** - boolean - True or False
- **datetime64** - datetime - date and time values
- **timedelta64** - timedelta - difference between two datetimes
- **category** - categorical - values drawn from a limited set of labels, usually text
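To see these dtypes side by side, here is a small sketch that builds a dataframe with one column of each kind. The column names are hypothetical; the values are taken from the first two matches shown earlier. Note that the text column is reported as `object` in most pandas versions, though newer versions may label it as a dedicated string dtype.

```python
import pandas as pd

# Kick-off times of the first two matches, written in ISO format.
kickoff = pd.to_datetime(["2019-08-09 20:00", "2019-08-10 12:30"])

demo = pd.DataFrame(
    {
        "home_team": ["Liverpool", "West Ham"],  # text
        "home_goals": [4, 0],                    # int64
        "avg_home_odds": [1.14, 11.84],          # float64
        "home_win": [True, False],               # bool
        "kickoff": kickoff,                      # datetime64
    }
)

# Subtracting two datetimes gives a timedelta64 column.
demo["since_first"] = demo["kickoff"] - demo["kickoff"].min()

# A categorical column stores values from a fixed set of labels.
demo["result"] = pd.Categorical(["H", "A"])

print(demo.dtypes)
```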

However, there are more specific data types within those. The dtypes we will be looking at in this walkthrough are **objects**, **ints**, and **floats**, but we will also look at booleans and how they can be used. Usually we could just use `df.dtypes` to see the data type of each column; however, we have many columns in our dataframe. Instead, let's just look at the first ten. We'll do this in a short `for` loop.

In [7]:
for col in df.columns[:10]:
    print("{}: {}".format(col,df[col].dtype))

Div: object
Date: object
Time: object
HomeTeam: object
AwayTeam: object
FTHG: int64
FTAG: int64
FTR: object
HTHG: int64
HTAG: int64


We can see that our columns `Div, Date, Time, HomeTeam, AwayTeam, FTR` are all object dtypes. Why does this matter? If we wanted to make sure our data is sorted chronologically we could use `sort_values`, but we have a problem...

In [8]:
df.sort_values("Date")

Unnamed: 0,Div,Date,Time,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Referee,HS,AS,HST,AST,HF,AF,HC,AC,HY,AY,HR,AR,B365H,B365D,B365A,BWH,BWD,BWA,IWH,IWD,IWA,PSH,PSD,PSA,WHH,WHD,WHA,VCH,VCD,VCA,MaxH,MaxD,MaxA,AvgH,AvgD,AvgA,B365>2.5,B365<2.5,P>2.5,P<2.5,Max>2.5,Max<2.5,Avg>2.5,Avg<2.5,AHh,B365AHH,B365AHA,PAHH,PAHA,MaxAHH,MaxAHA,AvgAHH,AvgAHA,B365CH,B365CD,B365CA,BWCH,BWCD,BWCA,IWCH,IWCD,IWCA,PSCH,PSCD,PSCA,WHCH,WHCD,WHCA,VCCH,VCCD,VCCA,MaxCH,MaxCD,MaxCA,AvgCH,AvgCD,AvgCA,B365C>2.5,B365C<2.5,PC>2.5,PC<2.5,MaxC>2.5,MaxC<2.5,AvgC>2.5,AvgC<2.5,AHCh,B365CAHH,B365CAHA,PCAHH,PCAHA,MaxCAHH,MaxCAHA,AvgCAHH,AvgCAHA
202,E0,01/01/2020,15:00,Southampton,Tottenham,1,0,H,1,0,H,M Dean,12,11,3,5,21,8,6,9,3,4,0,0,3.30,3.50,2.10,3.40,3.60,2.10,3.25,3.55,2.15,3.41,3.54,2.20,3.30,3.50,2.15,3.25,3.60,2.15,3.51,3.88,2.25,3.34,3.54,2.16,1.66,2.20,1.69,2.30,1.70,2.33,1.66,2.23,0.25,2.00,1.90,2.02,1.91,2.04,1.93,1.99,1.87,3.50,3.90,1.95,3.50,3.80,2.05,3.35,3.60,2.10,3.51,3.91,2.05,3.70,3.75,1.95,3.50,3.75,2.05,3.94,4.00,2.14,3.51,3.84,2.02,1.66,2.20,1.70,2.28,1.72,2.31,1.67,2.22,0.50,1.87,2.06,1.88,2.05,1.93,2.09,1.86,2.02
203,E0,01/01/2020,15:00,Watford,Wolves,2,1,H,1,0,H,A Madley,9,16,3,4,12,6,4,7,3,1,1,0,3.00,3.40,2.30,3.20,3.25,2.30,3.10,3.35,2.30,3.25,3.46,2.31,3.20,3.30,2.30,3.20,3.30,2.30,3.30,3.47,2.42,3.16,3.37,2.32,2.00,1.80,2.01,1.91,2.08,1.91,2.00,1.82,0.25,1.89,2.01,1.93,2.00,1.93,2.05,1.87,1.99,3.20,3.30,2.30,3.30,3.30,2.30,3.20,3.20,2.30,3.17,3.27,2.45,3.30,3.20,2.30,3.20,3.25,2.40,3.58,3.46,2.51,3.19,3.26,2.38,2.20,1.66,2.31,1.68,2.35,1.87,2.21,1.68,0.25,1.82,2.11,1.83,2.09,1.90,2.17,1.83,2.05
204,E0,01/01/2020,17:30,Man City,Everton,2,1,H,0,0,D,A Marriner,16,7,7,2,11,11,3,6,0,4,0,0,1.25,6.50,10.00,1.26,6.25,10.50,1.28,6.10,9.50,1.28,6.15,11.04,1.27,6.00,11.00,1.25,6.00,12.00,1.30,6.63,12.00,1.27,6.19,10.49,1.36,3.20,1.40,3.13,1.41,3.20,1.37,3.09,-1.75,1.97,1.93,2.01,1.90,2.01,1.97,1.95,1.91,1.28,6.00,9.00,1.34,5.50,9.25,1.33,5.50,8.50,1.33,5.78,9.31,1.29,5.80,10.00,1.29,6.00,10.50,1.35,6.20,10.50,1.32,5.77,9.14,1.44,2.75,1.47,2.89,1.48,3.00,1.45,2.76,-1.50,1.88,2.05,1.92,2.01,1.95,2.14,1.88,2.00
205,E0,01/01/2020,17:30,Norwich,Crystal Palace,1,1,D,1,0,H,J Moss,15,12,4,3,12,9,2,5,5,0,0,0,2.50,3.40,2.75,2.55,3.50,2.70,2.50,3.40,2.75,2.56,3.51,2.83,2.55,3.40,2.75,2.55,3.40,2.75,2.60,3.57,2.88,2.53,3.46,2.77,1.90,1.90,1.95,1.97,1.95,2.07,1.88,1.94,0.00,1.86,2.04,1.86,2.06,1.88,2.06,1.84,2.02,2.35,3.40,3.00,2.35,3.40,3.10,2.35,3.30,3.05,2.37,3.45,3.14,2.45,3.30,3.00,2.40,3.30,3.13,2.58,3.50,3.25,2.36,3.37,3.10,2.00,1.80,2.10,1.82,2.27,1.82,2.09,1.76,-0.25,2.08,1.85,2.05,1.88,2.19,1.92,2.04,1.84
206,E0,01/01/2020,17:30,West Ham,Bournemouth,4,0,H,3,0,H,G Scott,14,3,7,2,3,12,9,2,1,2,0,0,1.90,3.75,3.80,1.95,3.80,3.60,1.95,3.80,3.65,1.97,3.99,3.72,1.91,3.80,3.75,1.90,3.75,3.90,2.02,4.00,3.90,1.95,3.84,3.68,1.66,2.20,1.71,2.26,1.72,2.31,1.67,2.22,-0.50,1.94,1.96,1.97,1.96,1.98,1.98,1.94,1.92,1.86,3.70,4.00,1.91,3.60,4.20,1.90,3.60,4.10,1.97,3.61,4.12,1.88,3.60,4.20,1.95,3.50,4.10,2.02,3.85,4.54,1.92,3.63,4.10,1.90,1.90,1.92,2.00,1.99,2.03,1.88,1.94,-0.50,1.97,1.96,1.97,1.96,2.00,2.00,1.93,1.94
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
33,E0,31/08/2019,15:00,Leicester,Bournemouth,3,1,H,2,1,H,P Bankes,13,8,5,2,9,11,4,5,1,3,0,0,1.70,4.00,4.75,1.70,4.00,4.75,1.73,4.05,4.50,1.71,4.12,4.95,1.70,3.90,4.80,1.73,4.10,5.00,1.77,4.12,5.00,1.72,4.02,4.81,1.66,2.20,1.67,2.33,1.70,2.33,1.65,2.26,-0.75,1.93,2.00,1.93,2.00,1.95,2.01,1.91,1.98,1.61,4.00,5.25,1.65,4.20,5.00,1.75,3.80,4.60,1.68,4.03,5.40,1.65,3.90,5.25,1.70,4.00,5.40,1.75,4.20,5.90,1.67,4.03,5.16,1.72,2.10,1.81,2.11,1.82,2.30,1.72,2.14,-0.75,1.87,2.06,1.88,2.04,1.90,2.08,1.86,2.03
34,E0,31/08/2019,15:00,Man City,Brighton,4,0,H,2,0,H,J Moss,15,6,6,2,10,6,8,1,1,1,0,0,1.08,10.00,26.00,1.07,11.50,31.00,1.08,10.50,29.00,1.08,13.01,34.71,1.07,11.00,34.00,1.10,12.00,34.00,1.10,13.01,36.00,1.09,11.73,30.92,1.30,3.50,1.30,3.82,1.32,3.82,1.30,3.53,-2.75,1.97,1.96,1.95,1.97,1.98,1.98,1.95,1.94,1.08,11.00,26.00,1.09,11.00,29.00,1.10,9.10,28.00,1.09,13.06,35.17,1.07,11.00,34.00,1.09,11.50,36.00,1.10,13.06,38.00,1.09,11.50,31.01,1.33,3.40,1.35,3.38,1.36,3.55,1.33,3.31,-2.50,1.98,1.95,1.96,1.96,2.00,2.05,1.94,1.95
35,E0,31/08/2019,15:00,Newcastle,Watford,1,1,D,1,1,D,G Scott,13,13,5,3,5,11,6,5,2,3,0,0,2.50,3.25,2.87,2.50,3.30,2.85,2.60,3.25,2.80,2.73,3.22,2.86,2.62,3.20,2.80,2.70,3.30,2.88,2.73,3.32,3.00,2.63,3.24,2.86,2.00,1.80,2.12,1.81,2.20,1.83,2.08,1.77,0.00,1.90,2.03,1.92,2.01,1.92,2.04,1.88,2.01,2.55,3.10,2.90,2.60,3.20,2.90,2.45,3.25,2.95,2.53,3.21,3.13,2.50,3.20,3.00,2.55,3.25,3.10,2.63,3.28,3.28,2.53,3.19,3.01,2.10,1.72,2.14,1.79,2.20,1.81,2.09,1.76,-0.25,2.07,1.72,2.15,1.79,2.20,1.82,2.15,1.77
37,E0,31/08/2019,17:30,Burnley,Liverpool,0,3,A,0,2,A,C Kavanagh,7,15,2,7,10,16,6,4,0,0,0,0,9.50,5.50,1.30,9.75,5.50,1.30,8.40,5.50,1.33,10.24,5.37,1.34,10.00,5.25,1.32,11.00,5.50,1.33,11.00,5.70,1.37,9.45,5.43,1.33,1.61,2.30,1.65,2.39,1.65,2.40,1.61,2.33,1.50,1.90,2.03,1.88,2.04,1.92,2.05,1.89,2.00,8.50,5.25,1.33,8.75,5.25,1.36,8.40,5.50,1.33,9.22,5.77,1.34,9.50,5.25,1.32,10.00,5.40,1.36,10.00,6.05,1.38,8.83,5.43,1.35,1.53,2.50,1.54,2.65,1.60,2.66,1.55,2.49,1.50,1.90,2.03,1.92,2.02,1.96,2.05,1.89,2.00


Our `Date` column is stored as a string, not a datetime, so pandas sorts it as text, character by character. That is why `01/01/2020` appears before `31/08/2019` above. Before we fix this, let's sort a different column to explore `sort_values` further.
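The problem is easier to see on three values. The sketch below sorts three dates from the `Date` column first as text and then as real dates. The call to `pd.to_datetime` with `format="%d/%m/%Y"` is only there for comparison and assumes the day/month/year layout seen in the data; converting the column properly is covered in its own section.

```python
import pandas as pd

# Three dates from the Date column, in day/month/year format.
dates = pd.Series(["09/08/2019", "01/01/2020", "31/08/2019"])

# Sorted as text: compared character by character, so "01/..." comes first.
as_text = dates.sort_values().tolist()

# Sorted as dates: parse first, then reorder the original strings.
parsed = pd.to_datetime(dates, format="%d/%m/%Y")
as_dates = dates.loc[parsed.sort_values().index].tolist()

print(as_text)
print(as_dates)
```

The text sort puts the January 2020 match first, while the date sort places it last, after both August 2019 matches.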

In [9]:
df.sort_values("HST").head(10) # HST is Home Shots on Target. head(10) used to limit output to first 10 rows

Unnamed: 0,Div,Date,Time,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Referee,HS,AS,HST,AST,HF,AF,HC,AC,HY,AY,HR,AR,B365H,B365D,B365A,BWH,BWD,BWA,IWH,IWD,IWA,PSH,PSD,PSA,WHH,WHD,WHA,VCH,VCD,VCA,MaxH,MaxD,MaxA,AvgH,AvgD,AvgA,B365>2.5,B365<2.5,P>2.5,P<2.5,Max>2.5,Max<2.5,Avg>2.5,Avg<2.5,AHh,B365AHH,B365AHA,PAHH,PAHA,MaxAHH,MaxAHA,AvgAHH,AvgAHA,B365CH,B365CD,B365CA,BWCH,BWCD,BWCA,IWCH,IWCD,IWCA,PSCH,PSCD,PSCA,WHCH,WHCD,WHCA,VCCH,VCCD,VCCA,MaxCH,MaxCD,MaxCA,AvgCH,AvgCD,AvgCA,B365C>2.5,B365C<2.5,PC>2.5,PC<2.5,MaxC>2.5,MaxC<2.5,AvgC>2.5,AvgC<2.5,AHCh,B365CAHH,B365CAHA,PCAHH,PCAHA,MaxCAHH,MaxCAHA,AvgCAHH,AvgCAHA
344,E0,11/07/2020,20:00,Brighton,Man City,0,5,A,0,2,A,G Scott,3,26,0,8,7,3,2,7,1,0,0,0,10.0,5.5,1.28,10.0,5.5,1.3,9.6,5.6,1.3,10.19,5.83,1.31,11.0,5.5,1.29,12.0,5.5,1.29,12.0,6.05,1.35,9.95,5.62,1.31,1.53,2.5,1.54,2.64,1.57,2.66,1.52,2.55,1.5,2.04,1.86,2.08,1.85,2.08,1.94,2.03,1.86,11.0,7.0,1.22,12.0,7.0,1.22,8.4,5.7,1.32,12.68,6.77,1.24,13.0,6.5,1.22,13.0,7.0,1.22,16.25,7.5,1.32,12.13,6.65,1.24,1.44,2.75,1.47,2.85,1.5,2.96,1.45,2.74,1.75,2.05,1.85,2.07,1.85,2.16,1.93,2.03,1.85
170,E0,21/12/2019,12:30,Everton,Arsenal,0,0,D,0,0,D,K Friend,9,6,0,2,10,11,5,4,2,3,0,0,2.3,3.8,2.87,2.3,3.75,2.85,2.3,3.65,2.85,2.34,3.8,2.94,2.3,3.7,2.9,2.3,3.75,3.0,2.4,3.88,3.0,2.33,3.75,2.9,1.57,2.37,1.6,2.48,1.61,2.55,1.57,2.4,-0.25,2.05,1.88,2.05,1.87,2.09,1.91,2.03,1.84,2.05,3.8,3.25,2.1,3.8,3.3,2.1,3.8,3.2,2.14,3.84,3.34,2.2,3.8,3.25,2.1,3.8,3.3,2.22,3.95,3.42,2.12,3.82,3.25,1.53,2.5,1.56,2.6,1.56,2.79,1.52,2.53,-0.25,1.87,2.06,1.88,2.06,1.94,2.1,1.85,2.03
311,E0,28/06/2020,16:30,Watford,Southampton,1,3,A,0,1,A,M Oliver,8,11,0,7,18,9,5,3,3,0,0,0,2.5,3.3,2.87,2.5,3.3,2.9,2.5,3.3,2.9,2.51,3.39,2.96,2.45,3.3,2.9,2.5,3.3,2.9,2.6,3.42,3.1,2.49,3.32,2.92,2.0,1.8,2.06,1.86,2.1,1.91,2.0,1.83,0.0,1.8,2.1,1.81,2.13,1.86,2.16,1.8,2.09,2.4,3.25,3.1,2.35,3.3,3.1,2.35,3.2,3.2,2.41,3.28,3.31,2.35,3.2,3.2,2.38,3.2,3.25,2.49,3.34,3.4,2.38,3.23,3.2,2.2,1.66,2.28,1.71,2.34,1.75,2.21,1.68,-0.25,2.04,1.86,2.05,1.88,2.08,1.9,2.03,1.85
123,E0,23/11/2019,15:00,Brighton,Leicester,0,2,A,0,0,D,M Dean,7,19,0,9,9,1,6,7,2,0,0,0,3.75,3.5,2.0,3.7,3.6,2.0,3.6,3.5,2.05,3.73,3.58,2.09,3.7,3.4,2.05,3.75,3.5,2.05,3.85,3.7,2.1,3.69,3.53,2.04,1.9,1.9,1.93,1.98,1.95,1.99,1.89,1.93,0.25,2.07,1.72,2.13,1.8,2.17,1.8,2.13,1.76,3.75,3.6,1.95,3.75,3.6,2.0,3.65,3.55,2.0,3.9,3.56,2.05,3.75,3.5,2.0,3.8,3.6,2.0,4.14,3.71,2.08,3.8,3.57,2.01,1.9,1.9,1.93,2.0,1.95,2.01,1.88,1.95,0.5,1.87,2.03,1.88,2.05,1.91,2.06,1.86,2.02
187,E0,26/12/2019,20:00,Leicester,Liverpool,0,4,A,0,1,A,M Oliver,3,15,0,6,5,7,2,8,1,1,0,0,3.4,3.75,2.05,3.4,3.7,2.05,3.35,3.7,2.05,3.39,3.81,2.12,3.4,3.6,2.1,3.4,3.6,2.1,3.5,3.95,2.16,3.38,3.72,2.08,1.66,2.2,1.66,2.36,1.7,2.38,1.64,2.27,0.25,2.02,1.77,2.07,1.85,2.1,1.87,2.05,1.81,3.4,4.0,1.95,3.5,4.0,1.95,3.55,3.7,2.0,3.61,4.17,1.97,3.6,4.0,1.91,3.7,3.9,1.95,3.84,4.2,2.01,3.53,4.03,1.96,1.53,2.5,1.56,2.58,1.6,2.63,1.53,2.5,0.5,1.93,1.97,1.97,1.97,1.99,2.01,1.91,1.96
151,E0,07/12/2019,15:00,Bournemouth,Liverpool,0,3,A,0,2,A,C Kavanagh,3,21,0,9,5,6,1,3,0,1,0,0,7.0,5.25,1.4,7.25,5.0,1.4,6.7,4.9,1.43,7.01,5.22,1.44,7.5,5.0,1.4,8.0,5.0,1.4,8.0,5.3,1.47,7.01,5.03,1.42,1.5,2.62,1.51,2.74,1.54,2.74,1.5,2.59,1.25,1.98,1.95,1.98,1.95,1.98,1.96,1.96,1.92,6.0,4.5,1.5,6.25,4.6,1.5,7.0,4.8,1.43,6.58,4.63,1.51,7.0,4.5,1.47,6.5,4.6,1.5,7.0,5.0,1.55,6.39,4.63,1.5,1.57,2.37,1.63,2.43,1.63,2.5,1.59,2.38,1.0,2.05,1.88,2.06,1.87,2.17,1.9,2.07,1.82
231,E0,21/01/2020,19:30,Crystal Palace,Southampton,0,2,A,0,1,A,A Marriner,6,15,0,6,12,13,0,8,1,2,0,0,2.7,3.25,2.7,2.75,3.3,2.7,2.7,3.15,2.7,2.73,3.29,2.77,2.7,3.2,2.7,2.75,3.13,2.8,2.85,3.35,2.84,2.73,3.26,2.72,2.2,1.66,2.28,1.7,2.31,1.72,2.21,1.68,0.0,1.96,1.97,1.95,1.98,1.99,1.99,1.93,1.93,2.75,3.2,2.7,2.65,3.25,2.8,2.8,3.1,2.7,2.82,3.24,2.72,2.7,3.1,2.8,2.8,3.13,2.75,2.9,3.26,3.04,2.78,3.17,2.73,2.37,1.57,2.46,1.61,2.47,1.65,2.34,1.61,0.0,2.0,1.93,2.0,1.93,2.02,2.06,1.96,1.91
366,E0,21/07/2020,18:00,Watford,Man City,0,4,A,0,2,A,M Oliver,2,26,0,10,14,11,0,8,2,0,0,0,8.5,5.5,1.33,9.0,5.75,1.3,8.0,5.25,1.35,8.57,5.73,1.34,9.5,5.8,1.3,10.0,5.75,1.3,10.0,6.15,1.37,8.62,5.6,1.33,1.5,2.62,1.5,2.74,1.5,2.79,1.48,2.67,1.5,1.97,1.96,1.94,1.98,2.0,1.99,1.94,1.94,9.0,5.75,1.3,7.25,5.75,1.35,9.0,5.75,1.33,9.2,6.37,1.3,9.5,6.0,1.29,9.5,6.0,1.3,10.5,6.6,1.33,9.13,6.01,1.3,1.5,2.62,1.49,2.79,1.5,2.82,1.47,2.69,1.5,2.05,1.88,2.06,1.85,2.09,1.91,2.03,1.85
172,E0,21/12/2019,15:00,Bournemouth,Burnley,0,1,A,0,0,D,M Atkinson,3,2,0,1,13,21,4,5,2,4,0,0,2.4,3.4,2.9,2.4,3.4,2.95,2.4,3.35,2.95,2.5,3.36,3.01,2.45,3.4,2.9,2.4,3.4,3.0,2.55,3.5,3.05,2.43,3.38,2.97,1.9,1.9,1.99,1.93,2.05,1.98,1.92,1.9,-0.25,2.13,1.81,2.14,1.79,2.14,1.85,2.09,1.8,2.37,3.3,3.0,2.4,3.3,3.1,2.45,3.3,2.95,2.37,3.35,3.22,2.4,3.25,3.1,2.4,3.3,3.13,2.51,3.4,3.22,2.4,3.31,3.08,2.0,1.8,2.07,1.85,2.12,1.88,2.02,1.81,-0.25,2.05,1.88,2.04,1.88,2.09,1.9,2.04,1.84
364,E0,20/07/2020,18:00,Sheffield United,Everton,0,1,A,0,0,D,S Attwell,8,5,0,2,11,19,7,1,1,3,0,0,2.2,3.25,3.5,2.15,3.3,3.5,2.2,3.1,3.55,2.18,3.42,3.57,2.15,3.3,3.5,2.15,3.2,3.7,2.27,3.45,3.7,2.17,3.29,3.52,2.3,1.61,2.37,1.66,2.43,1.7,2.32,1.62,-0.25,1.88,2.02,1.88,2.04,1.93,2.07,1.87,2.01,1.95,3.3,4.2,1.91,3.4,4.25,1.97,3.2,4.1,2.0,3.41,4.25,1.95,3.3,4.2,1.95,3.3,4.33,2.05,3.53,4.4,1.98,3.35,4.15,2.2,1.66,2.31,1.68,2.43,1.69,2.28,1.64,-0.5,2.0,1.93,2.0,1.93,2.05,1.95,1.99,1.89


Hmm, the fewest shots on target is not very interesting to look at. Let's sort again to show the **most** shots on target. We do this by passing the `ascending=False` argument to `sort_values`:


As a side note, we can peek at the arguments we can pass to functions and methods in Jupyter notebooks by pressing SHIFT + TAB
<br />
<br />
<div>
  <img src="./../public/imgs/peak-args.png" />
</div>
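Outside of a notebook, the same information can be pulled up in code. The sketch below uses the standard library's `inspect` module to list the parameters of `sort_values`; the variable name `params` is just for illustration.

```python
import inspect
import pandas as pd

# Look up the parameters that DataFrame.sort_values accepts.
params = inspect.signature(pd.DataFrame.sort_values).parameters

# Print the parameter names, e.g. to confirm that `ascending` is one of them.
print(list(params))
```

Calling `help(pd.DataFrame.sort_values)` prints the full docstring if the names alone are not enough.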

In [10]:
df.sort_values("HST", ascending=False).head(10) # again just looking at first 10 rows

Unnamed: 0,Div,Date,Time,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Referee,HS,AS,HST,AST,HF,AF,HC,AC,HY,AY,HR,AR,B365H,B365D,B365A,BWH,BWD,BWA,IWH,IWD,IWA,PSH,PSD,PSA,WHH,WHD,WHA,VCH,VCD,VCA,MaxH,MaxD,MaxA,AvgH,AvgD,AvgA,B365>2.5,B365<2.5,P>2.5,P<2.5,Max>2.5,Max<2.5,Avg>2.5,Avg<2.5,AHh,B365AHH,B365AHA,PAHH,PAHA,MaxAHH,MaxAHA,AvgAHH,AvgAHA,B365CH,B365CD,B365CA,BWCH,BWCD,BWCA,IWCH,IWCD,IWCA,PSCH,PSCD,PSCA,WHCH,WHCD,WHCA,VCCH,VCCD,VCCA,MaxCH,MaxCD,MaxCA,AvgCH,AvgCD,AvgCA,B365C>2.5,B365C<2.5,PC>2.5,PC<2.5,MaxC>2.5,MaxC<2.5,AvgC>2.5,AvgC<2.5,AHCh,B365CAHH,B365CAHA,PCAHH,PCAHA,MaxCAHH,MaxCAHA,AvgCAHH,AvgCAHA
98,E0,27/10/2019,16:30,Liverpool,Tottenham,2,1,H,0,1,A,A Taylor,21,11,13,4,9,11,8,3,3,3,0,0,1.5,4.33,6.5,1.5,4.5,6.25,1.55,4.4,5.8,1.55,4.48,6.22,1.52,4.4,6.0,1.53,4.4,6.0,1.57,4.65,6.5,1.54,4.45,6.11,1.53,2.5,1.57,2.55,1.57,2.6,1.53,2.51,-1.0,1.92,2.01,1.93,2.0,1.94,2.06,1.87,2.0,1.45,4.75,6.5,1.48,4.75,6.5,1.55,4.5,5.4,1.47,5.04,6.57,1.44,4.75,7.0,1.45,4.8,7.0,1.55,5.09,7.3,1.48,4.8,6.44,1.4,3.0,1.42,3.09,1.5,3.22,1.42,2.88,-1.25,2.02,1.91,2.04,1.89,2.06,1.92,2.01,1.88
289,E0,17/06/2020,20:15,Man City,Arsenal,3,0,H,1,0,H,A Taylor,20,3,12,0,9,7,5,2,1,1,0,1,1.36,5.25,7.5,1.34,5.25,8.25,1.35,5.5,7.8,1.37,5.65,8.52,1.33,5.5,8.5,1.33,5.5,9.0,1.4,6.05,9.0,1.35,5.5,8.11,1.36,3.2,1.4,3.23,1.42,3.23,1.38,3.04,-1.5,1.93,1.97,1.93,2.0,1.99,2.01,1.92,1.95,1.28,6.0,9.0,1.28,6.25,9.25,1.35,5.5,7.7,1.29,6.6,9.96,1.27,6.0,10.0,1.25,6.5,9.0,1.35,6.9,11.0,1.28,6.3,9.64,1.44,2.75,1.48,2.87,1.49,3.34,1.42,2.85,-1.75,1.97,1.96,1.98,1.94,2.01,1.99,1.95,1.93
176,E0,21/12/2019,17:30,Man City,Leicester,3,1,H,2,1,H,M Dean,23,5,12,2,14,9,6,1,2,2,0,0,1.36,5.25,7.5,1.36,5.5,7.5,1.37,5.3,7.4,1.38,5.46,8.01,1.36,5.25,8.0,1.36,5.4,8.0,1.4,5.8,8.25,1.37,5.41,7.72,1.4,3.0,1.44,3.02,1.47,3.02,1.42,2.86,-2.5,3.3,1.32,3.65,1.33,3.65,1.35,3.38,1.32,1.36,5.75,7.0,1.35,6.0,7.5,1.37,5.5,7.2,1.37,5.85,7.73,1.36,5.5,7.5,1.36,5.5,8.0,1.42,6.15,8.0,1.38,5.67,7.28,1.36,3.2,1.37,3.34,1.4,3.36,1.37,3.1,-1.5,2.09,1.84,2.08,1.85,2.12,1.9,2.03,1.84
112,E0,09/11/2019,15:00,Burnley,West Ham,3,0,H,2,0,H,K Friend,14,7,12,4,17,10,11,4,2,1,0,0,2.25,3.5,3.1,2.3,3.5,3.0,2.25,3.55,3.1,2.27,3.59,3.24,2.25,3.5,3.1,2.25,3.5,3.2,2.33,3.73,3.3,2.27,3.55,3.13,1.66,2.2,1.69,2.29,1.75,2.29,1.71,2.17,-0.25,1.98,1.95,1.97,1.96,2.0,1.96,1.97,1.92,2.1,3.6,3.3,2.15,3.6,3.4,2.15,3.6,3.25,2.18,3.63,3.41,2.1,3.6,3.4,2.15,3.7,3.3,2.22,3.7,3.43,2.16,3.61,3.34,1.66,2.2,1.73,2.23,1.73,2.34,1.67,2.22,-0.25,1.87,2.06,1.89,2.03,1.91,2.06,1.88,2.01
54,E0,21/09/2019,15:00,Man City,Watford,8,0,H,5,0,H,M Dean,28,5,11,4,5,9,5,4,2,2,0,0,1.1,11.0,21.0,1.11,10.5,17.5,1.12,9.0,20.0,1.11,10.69,27.58,1.1,11.0,21.0,1.1,11.0,23.0,1.13,11.5,31.0,1.11,10.24,23.21,1.25,4.0,1.28,4.01,1.29,4.2,1.25,3.91,-2.5,1.98,1.95,1.97,1.95,1.99,1.98,1.95,1.94,1.08,11.0,23.0,1.11,10.5,23.0,1.12,9.0,20.0,1.11,11.33,25.74,1.1,11.0,19.0,1.09,11.5,29.0,1.13,12.5,29.0,1.1,10.72,24.55,1.25,4.0,1.28,4.01,1.28,4.33,1.25,3.99,-2.5,1.9,2.03,1.93,1.99,1.93,2.16,1.88,2.0
117,E0,10/11/2019,14:00,Man United,Brighton,3,1,H,2,0,H,J Moss,21,6,11,2,10,14,5,2,2,5,0,0,1.6,3.75,6.5,1.6,3.8,6.0,1.7,3.65,5.4,1.7,3.66,5.98,1.67,3.6,5.8,1.65,3.7,5.75,1.72,3.8,6.5,1.67,3.67,5.82,2.0,1.8,2.0,1.92,2.1,1.92,2.02,1.82,-0.75,1.9,2.03,1.92,2.01,1.92,2.04,1.89,1.99,1.6,3.8,6.0,1.62,3.75,6.25,1.6,3.8,6.0,1.66,3.69,6.41,1.6,3.75,6.5,1.65,3.8,5.75,1.68,3.9,6.75,1.64,3.77,6.18,1.9,1.9,2.04,1.88,2.06,1.94,1.96,1.87,-0.75,1.86,2.07,1.87,2.06,1.9,2.16,1.83,2.07
285,E0,08/03/2020,14:00,Chelsea,Everton,4,0,H,2,0,H,K Friend,17,3,11,1,8,10,6,1,1,2,0,0,1.8,3.9,4.2,1.83,3.6,4.4,1.87,3.75,4.05,1.9,3.72,4.28,1.85,3.7,4.2,1.83,3.75,4.2,1.95,4.0,4.4,1.87,3.74,4.2,1.72,2.1,1.72,2.25,1.75,2.26,1.71,2.14,-0.5,1.88,2.02,1.9,2.03,1.91,2.06,1.86,2.01,1.83,3.9,4.0,1.87,3.75,4.2,2.0,3.6,3.6,1.88,3.86,4.19,1.83,3.8,4.2,1.83,3.8,4.33,2.0,3.98,4.4,1.87,3.8,4.13,1.72,2.1,1.76,2.18,1.78,2.23,1.72,2.13,-0.5,1.85,2.05,1.88,2.04,2.0,2.11,1.86,2.01
261,E0,22/02/2020,15:00,Burnley,Bournemouth,3,0,H,0,0,D,M Dean,16,12,10,4,9,9,7,3,4,3,0,0,2.1,3.3,3.7,2.15,3.25,3.6,2.15,3.25,3.65,2.16,3.32,3.76,2.1,3.25,3.7,2.15,3.25,3.75,2.25,3.4,3.8,2.15,3.28,3.65,2.2,1.66,2.33,1.68,2.33,1.74,2.23,1.66,-0.25,1.86,2.07,1.85,2.08,1.9,2.09,1.83,2.05,2.15,3.25,3.5,2.3,3.3,3.3,2.2,3.25,3.45,2.22,3.38,3.53,2.2,3.3,3.4,2.2,3.25,3.6,2.34,3.45,3.78,2.25,3.3,3.42,2.2,1.66,2.28,1.7,2.31,1.75,2.21,1.68,-0.25,1.93,2.0,1.91,2.02,1.97,2.08,1.92,1.97
322,E0,04/07/2020,15:00,Man United,Bournemouth,5,2,H,3,1,H,M Dean,19,7,10,3,13,12,8,3,0,1,0,0,1.16,7.5,17.0,1.17,7.0,18.0,1.18,7.1,16.0,1.17,7.66,18.7,1.15,7.5,19.0,1.17,7.5,20.0,1.19,8.3,21.0,1.17,7.46,17.41,1.53,2.5,1.53,2.62,1.59,2.71,1.53,2.54,-2.0,1.92,1.98,1.93,1.98,1.99,2.02,1.92,1.96,1.18,7.5,13.0,1.18,7.5,14.5,1.22,6.2,13.5,1.19,7.81,16.0,1.17,7.5,17.0,1.2,7.0,17.0,1.22,8.2,19.5,1.19,7.3,15.59,1.53,2.5,1.54,2.61,1.61,2.63,1.55,2.48,-1.75,1.82,2.08,1.81,2.11,1.85,2.16,1.8,2.09
186,E0,26/12/2019,17:30,Man United,Newcastle,4,1,H,3,1,H,K Friend,22,7,10,2,10,7,5,0,2,2,0,0,1.33,5.25,9.0,1.35,5.25,8.5,1.35,5.0,8.9,1.35,5.32,9.59,1.33,5.25,9.5,1.33,5.0,10.0,1.39,5.5,10.0,1.35,5.19,9.18,1.72,2.1,1.79,2.14,1.81,2.2,1.72,2.13,-1.5,2.06,1.84,2.06,1.87,2.09,1.88,2.03,1.84,1.33,5.5,8.5,1.34,5.25,8.75,1.4,4.8,7.6,1.36,5.42,9.37,1.33,5.25,9.0,1.33,5.25,10.0,1.4,5.6,10.0,1.35,5.34,8.84,1.72,2.1,1.72,2.25,1.73,2.37,1.68,2.2,-1.5,2.04,1.86,2.04,1.89,2.1,1.9,2.03,1.85


Note that the above does not change the order of the data in our dataframe. Think of it like a preview. If we wanted to make that sorting the new order of our dataframe, we can either add `inplace=True` to `sort_values()` or assign the sorted result to a dataframe (for example `df = df.sort_values(...)`). If we look at `df` again, it will show our original order.

In [11]:
df.head(10)

Unnamed: 0,Div,Date,Time,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Referee,HS,AS,HST,AST,HF,AF,HC,AC,HY,AY,HR,AR,B365H,B365D,B365A,BWH,BWD,BWA,IWH,IWD,IWA,PSH,PSD,PSA,WHH,WHD,WHA,VCH,VCD,VCA,MaxH,MaxD,MaxA,AvgH,AvgD,AvgA,B365>2.5,B365<2.5,P>2.5,P<2.5,Max>2.5,Max<2.5,Avg>2.5,Avg<2.5,AHh,B365AHH,B365AHA,PAHH,PAHA,MaxAHH,MaxAHA,AvgAHH,AvgAHA,B365CH,B365CD,B365CA,BWCH,BWCD,BWCA,IWCH,IWCD,IWCA,PSCH,PSCD,PSCA,WHCH,WHCD,WHCA,VCCH,VCCD,VCCA,MaxCH,MaxCD,MaxCA,AvgCH,AvgCD,AvgCA,B365C>2.5,B365C<2.5,PC>2.5,PC<2.5,MaxC>2.5,MaxC<2.5,AvgC>2.5,AvgC<2.5,AHCh,B365CAHH,B365CAHA,PCAHH,PCAHA,MaxCAHH,MaxCAHA,AvgCAHH,AvgCAHA
0,E0,09/08/2019,20:00,Liverpool,Norwich,4,1,H,4,0,H,M Oliver,15,12,7,5,9,9,11,2,0,2,0,0,1.14,10.0,19.0,1.14,8.25,18.5,1.15,8.0,18.0,1.15,9.59,18.05,1.12,8.5,21.0,1.14,9.5,23.0,1.16,10.0,23.0,1.14,8.75,19.83,1.4,3.0,1.4,3.11,1.45,3.11,1.41,2.92,-2.25,1.96,1.94,1.97,1.95,1.97,2.0,1.94,1.94,1.14,9.5,21.0,1.14,9.0,20.0,1.15,8.0,18.0,1.14,10.43,19.63,1.11,9.5,21.0,1.14,9.5,23.0,1.16,10.5,23.0,1.14,9.52,19.18,1.3,3.5,1.34,3.44,1.36,3.76,1.32,3.43,-2.25,1.91,1.99,1.94,1.98,1.99,2.07,1.9,1.99
1,E0,10/08/2019,12:30,West Ham,Man City,0,5,A,0,1,A,M Dean,5,14,3,9,6,13,1,1,2,2,0,0,12.0,6.5,1.22,11.5,5.75,1.26,11.0,6.1,1.25,11.68,6.53,1.26,13.0,6.0,1.24,12.0,6.5,1.25,13.0,6.75,1.29,11.84,6.28,1.25,1.44,2.75,1.49,2.77,1.51,2.77,1.48,2.65,1.75,2.0,1.9,2.02,1.9,2.02,1.92,1.99,1.89,12.0,7.0,1.25,11.0,6.0,1.26,11.0,6.1,1.25,11.11,6.68,1.27,11.0,6.5,1.24,12.0,6.5,1.25,13.0,7.0,1.29,11.14,6.46,1.26,1.4,3.0,1.43,3.03,1.5,3.22,1.41,2.91,1.75,1.95,1.95,1.96,1.97,2.07,1.98,1.97,1.92
2,E0,10/08/2019,15:00,Bournemouth,Sheffield United,1,1,D,0,0,D,K Friend,13,8,3,3,10,19,3,4,2,1,0,0,1.95,3.6,3.6,1.95,3.6,3.9,1.97,3.55,3.8,2.04,3.57,3.9,2.0,3.5,3.8,2.0,3.6,4.0,2.06,3.65,4.0,2.01,3.53,3.83,1.9,1.9,1.96,1.96,2.0,1.99,1.9,1.93,-0.5,2.01,1.89,2.04,1.88,2.04,1.91,2.0,1.88,1.95,3.7,4.2,1.95,3.6,3.9,1.97,3.55,3.85,1.98,3.67,4.06,1.95,3.6,3.9,2.0,3.6,4.0,2.03,3.7,4.2,1.98,3.58,3.96,1.9,1.9,1.94,1.97,1.97,1.98,1.91,1.92,-0.5,1.95,1.95,1.98,1.95,2.0,1.96,1.96,1.92
3,E0,10/08/2019,15:00,Burnley,Southampton,3,0,H,0,0,D,G Scott,10,11,4,3,6,12,2,7,0,0,0,0,2.62,3.2,2.75,2.65,3.2,2.75,2.65,3.2,2.75,2.71,3.31,2.81,2.7,3.2,2.75,2.7,3.3,2.8,2.8,3.33,2.85,2.68,3.22,2.78,2.1,1.72,2.17,1.77,2.2,1.78,2.12,1.73,0.0,1.92,1.98,1.93,2.0,1.94,2.0,1.91,1.98,2.7,3.25,2.9,2.65,3.1,2.85,2.6,3.2,2.85,2.71,3.19,2.9,2.62,3.2,2.8,2.7,3.25,2.9,2.72,3.26,2.95,2.65,3.18,2.88,2.1,1.72,2.19,1.76,2.25,1.78,2.17,1.71,0.0,1.87,2.03,1.89,2.03,1.9,2.07,1.86,2.02
4,E0,10/08/2019,15:00,Crystal Palace,Everton,0,0,D,0,0,D,J Moss,6,10,2,3,16,14,6,2,2,1,0,1,3.0,3.25,2.37,3.2,3.2,2.35,3.1,3.2,2.4,3.21,3.37,2.39,3.1,3.3,2.35,3.2,3.3,2.45,3.21,3.4,2.52,3.13,3.27,2.4,2.2,1.66,2.23,1.74,2.25,1.74,2.18,1.7,0.25,1.85,2.05,1.88,2.05,1.88,2.09,1.84,2.04,3.4,3.5,2.25,3.3,3.3,2.25,3.4,3.3,2.2,3.37,3.45,2.27,3.3,3.3,2.25,3.4,3.3,2.25,3.55,3.5,2.34,3.41,3.37,2.23,2.2,1.66,2.22,1.74,2.28,1.77,2.17,1.71,0.25,1.82,2.08,1.97,1.96,2.03,2.08,1.96,1.93
5,E0,10/08/2019,15:00,Watford,Brighton,0,3,A,0,1,A,C Pawson,11,5,3,3,15,11,5,2,0,1,0,0,1.9,3.4,4.0,1.9,3.4,4.33,1.93,3.4,4.25,1.98,3.44,4.37,1.95,3.4,4.2,1.95,3.5,4.33,2.0,3.5,4.6,1.94,3.41,4.26,2.1,1.72,2.19,1.76,2.24,1.76,2.16,1.71,-0.5,1.95,1.95,1.98,1.95,1.98,1.98,1.94,1.94,2.1,3.25,4.2,2.1,3.1,4.0,2.05,3.2,4.0,2.05,3.38,4.12,2.05,3.25,4.0,2.15,3.3,3.9,2.15,3.38,4.2,2.07,3.27,4.04,2.1,1.72,2.16,1.78,2.2,1.78,2.14,1.73,-0.5,2.04,1.86,2.05,1.88,2.12,1.91,2.05,1.84
6,E0,10/08/2019,17:30,Tottenham,Aston Villa,3,1,H,0,1,A,C Kavanagh,31,7,7,4,13,9,14,0,1,0,0,0,1.3,5.25,10.0,1.3,5.5,10.0,1.3,5.5,9.6,1.3,5.84,10.96,1.29,5.5,10.0,1.3,5.5,12.0,1.33,5.95,12.0,1.3,5.53,10.51,1.66,2.2,1.64,2.4,1.7,2.4,1.65,2.26,-1.5,1.97,1.93,1.99,1.93,2.0,2.0,1.93,1.94,1.36,5.5,9.0,1.35,5.0,9.0,1.3,5.5,9.6,1.39,5.35,8.42,1.35,5.25,8.0,1.4,5.2,9.0,1.4,5.7,10.0,1.36,5.29,8.82,1.57,2.37,1.58,2.52,1.65,2.55,1.58,2.4,-1.5,2.1,1.7,2.18,1.77,2.21,1.87,2.08,1.8
7,E0,11/08/2019,14:00,Leicester,Wolves,0,0,D,0,0,D,A Marriner,15,8,1,2,3,13,12,3,0,2,0,0,2.2,3.2,3.4,2.25,3.3,3.3,2.2,3.25,3.45,2.21,3.34,3.66,2.2,3.25,3.5,2.25,3.3,3.6,2.29,3.38,3.66,2.22,3.28,3.48,2.2,1.66,2.23,1.74,2.25,1.74,2.17,1.7,-0.25,1.9,2.0,1.9,2.04,1.95,2.04,1.91,1.98,2.4,3.25,3.3,2.35,3.2,3.3,2.35,3.15,3.2,2.5,3.12,3.3,2.35,3.1,3.3,2.45,3.2,3.3,2.55,3.25,3.58,2.41,3.14,3.29,2.3,1.61,2.45,1.63,2.45,1.71,2.33,1.62,-0.25,2.07,1.83,2.11,1.83,2.12,1.98,2.06,1.84
8,E0,11/08/2019,14:00,Newcastle,Arsenal,0,1,A,0,0,D,M Atkinson,9,8,2,2,12,7,5,3,1,3,0,0,4.5,3.75,1.72,4.5,3.75,1.78,4.4,3.85,1.77,4.58,3.93,1.81,4.5,3.75,1.78,4.6,3.9,1.8,4.7,4.0,1.83,4.49,3.82,1.79,1.8,2.0,1.83,2.1,1.83,2.14,1.77,2.07,0.75,1.85,2.05,1.86,2.07,1.88,2.08,1.85,2.03,3.4,3.6,2.2,3.3,3.5,2.2,3.25,3.5,2.2,3.36,3.56,2.25,3.5,3.4,2.15,3.4,3.5,2.25,3.76,3.65,2.25,3.36,3.51,2.2,1.8,2.0,1.83,2.09,1.85,2.17,1.79,2.05,0.25,1.99,1.91,1.99,1.95,2.17,1.97,2.0,1.89
9,E0,11/08/2019,16:30,Man United,Chelsea,4,0,H,1,0,H,A Taylor,11,18,5,7,15,13,3,5,3,4,0,0,2.1,3.3,3.5,2.15,3.3,3.5,2.15,3.35,3.4,2.21,3.37,3.63,2.15,3.3,3.5,2.25,3.3,3.5,2.28,3.43,3.63,2.19,3.32,3.49,2.0,1.8,2.05,1.87,2.1,1.87,2.01,1.83,-0.25,1.9,2.0,1.9,2.04,1.92,2.04,1.89,2.0,2.05,3.5,4.0,2.1,3.3,3.8,2.05,3.3,3.85,2.04,3.44,4.14,2.0,3.4,4.0,2.05,3.4,4.1,2.2,3.5,4.4,2.05,3.36,3.99,2.0,1.8,2.05,1.88,2.07,2.04,1.99,1.84,-0.5,2.02,1.88,2.04,1.9,2.1,1.91,2.04,1.85
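To make the "preview" behaviour concrete, here is a minimal sketch on a made-up three-row frame. The name `toy` and its values are invented for illustration and are not part of the dataset.

```python
import pandas as pd

# A tiny invented frame standing in for the real data.
toy = pd.DataFrame({"HomeTeam": ["A", "B", "C"], "HST": [3, 9, 5]})

# sort_values returns a new, sorted DataFrame and leaves `toy` alone.
preview = toy.sort_values("HST", ascending=False)
order_after_preview = toy["HST"].tolist()   # still the original order

# Assigning the result back is what makes the new order stick.
toy = toy.sort_values("HST", ascending=False)
order_after_assign = toy["HST"].tolist()

print(order_after_preview, order_after_assign)
```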


We'll come back to `sort_values` and more of its functionality shortly, but first let's get our `Date` column sorted out

<div class="back_to_top" style="float:right;">

<a href="#contents" style="color:red;">[Back to Top &uarr;]</a>

</div>




<a id="string-to-date"></a>

### Converting Strings to Dates

We may want to look at specific periods over the season, so it's important we can filter our data properly. In order to do that we have to convert the `Date` column above, which currently holds plain strings, into `datetime` values. The dates above are written as DD/MM/YYYY; once converted they will display as YYYY-MM-DD instead.

We create a new column by using the name of our dataframe - `df` - followed by the name we want for the column in square brackets - `['game_date']`

In [12]:
df['game_date'] = pd.to_datetime(df.Date, dayfirst=True)

If we scroll across to the end of our dataframe we'll see our new `game_date` column

In [13]:
df.head()

Unnamed: 0,Div,Date,Time,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Referee,HS,AS,HST,AST,HF,AF,HC,AC,HY,AY,HR,AR,B365H,B365D,B365A,BWH,BWD,BWA,IWH,IWD,IWA,PSH,PSD,PSA,WHH,WHD,WHA,VCH,VCD,VCA,MaxH,MaxD,MaxA,AvgH,AvgD,AvgA,B365>2.5,B365<2.5,P>2.5,P<2.5,Max>2.5,Max<2.5,Avg>2.5,Avg<2.5,AHh,B365AHH,B365AHA,PAHH,PAHA,MaxAHH,MaxAHA,AvgAHH,AvgAHA,B365CH,B365CD,B365CA,BWCH,BWCD,BWCA,IWCH,IWCD,IWCA,PSCH,PSCD,PSCA,WHCH,WHCD,WHCA,VCCH,VCCD,VCCA,MaxCH,MaxCD,MaxCA,AvgCH,AvgCD,AvgCA,B365C>2.5,B365C<2.5,PC>2.5,PC<2.5,MaxC>2.5,MaxC<2.5,AvgC>2.5,AvgC<2.5,AHCh,B365CAHH,B365CAHA,PCAHH,PCAHA,MaxCAHH,MaxCAHA,AvgCAHH,AvgCAHA,game_date
0,E0,09/08/2019,20:00,Liverpool,Norwich,4,1,H,4,0,H,M Oliver,15,12,7,5,9,9,11,2,0,2,0,0,1.14,10.0,19.0,1.14,8.25,18.5,1.15,8.0,18.0,1.15,9.59,18.05,1.12,8.5,21.0,1.14,9.5,23.0,1.16,10.0,23.0,1.14,8.75,19.83,1.4,3.0,1.4,3.11,1.45,3.11,1.41,2.92,-2.25,1.96,1.94,1.97,1.95,1.97,2.0,1.94,1.94,1.14,9.5,21.0,1.14,9.0,20.0,1.15,8.0,18.0,1.14,10.43,19.63,1.11,9.5,21.0,1.14,9.5,23.0,1.16,10.5,23.0,1.14,9.52,19.18,1.3,3.5,1.34,3.44,1.36,3.76,1.32,3.43,-2.25,1.91,1.99,1.94,1.98,1.99,2.07,1.9,1.99,2019-08-09
1,E0,10/08/2019,12:30,West Ham,Man City,0,5,A,0,1,A,M Dean,5,14,3,9,6,13,1,1,2,2,0,0,12.0,6.5,1.22,11.5,5.75,1.26,11.0,6.1,1.25,11.68,6.53,1.26,13.0,6.0,1.24,12.0,6.5,1.25,13.0,6.75,1.29,11.84,6.28,1.25,1.44,2.75,1.49,2.77,1.51,2.77,1.48,2.65,1.75,2.0,1.9,2.02,1.9,2.02,1.92,1.99,1.89,12.0,7.0,1.25,11.0,6.0,1.26,11.0,6.1,1.25,11.11,6.68,1.27,11.0,6.5,1.24,12.0,6.5,1.25,13.0,7.0,1.29,11.14,6.46,1.26,1.4,3.0,1.43,3.03,1.5,3.22,1.41,2.91,1.75,1.95,1.95,1.96,1.97,2.07,1.98,1.97,1.92,2019-08-10
2,E0,10/08/2019,15:00,Bournemouth,Sheffield United,1,1,D,0,0,D,K Friend,13,8,3,3,10,19,3,4,2,1,0,0,1.95,3.6,3.6,1.95,3.6,3.9,1.97,3.55,3.8,2.04,3.57,3.9,2.0,3.5,3.8,2.0,3.6,4.0,2.06,3.65,4.0,2.01,3.53,3.83,1.9,1.9,1.96,1.96,2.0,1.99,1.9,1.93,-0.5,2.01,1.89,2.04,1.88,2.04,1.91,2.0,1.88,1.95,3.7,4.2,1.95,3.6,3.9,1.97,3.55,3.85,1.98,3.67,4.06,1.95,3.6,3.9,2.0,3.6,4.0,2.03,3.7,4.2,1.98,3.58,3.96,1.9,1.9,1.94,1.97,1.97,1.98,1.91,1.92,-0.5,1.95,1.95,1.98,1.95,2.0,1.96,1.96,1.92,2019-08-10
3,E0,10/08/2019,15:00,Burnley,Southampton,3,0,H,0,0,D,G Scott,10,11,4,3,6,12,2,7,0,0,0,0,2.62,3.2,2.75,2.65,3.2,2.75,2.65,3.2,2.75,2.71,3.31,2.81,2.7,3.2,2.75,2.7,3.3,2.8,2.8,3.33,2.85,2.68,3.22,2.78,2.1,1.72,2.17,1.77,2.2,1.78,2.12,1.73,0.0,1.92,1.98,1.93,2.0,1.94,2.0,1.91,1.98,2.7,3.25,2.9,2.65,3.1,2.85,2.6,3.2,2.85,2.71,3.19,2.9,2.62,3.2,2.8,2.7,3.25,2.9,2.72,3.26,2.95,2.65,3.18,2.88,2.1,1.72,2.19,1.76,2.25,1.78,2.17,1.71,0.0,1.87,2.03,1.89,2.03,1.9,2.07,1.86,2.02,2019-08-10
4,E0,10/08/2019,15:00,Crystal Palace,Everton,0,0,D,0,0,D,J Moss,6,10,2,3,16,14,6,2,2,1,0,1,3.0,3.25,2.37,3.2,3.2,2.35,3.1,3.2,2.4,3.21,3.37,2.39,3.1,3.3,2.35,3.2,3.3,2.45,3.21,3.4,2.52,3.13,3.27,2.4,2.2,1.66,2.23,1.74,2.25,1.74,2.18,1.7,0.25,1.85,2.05,1.88,2.05,1.88,2.09,1.84,2.04,3.4,3.5,2.25,3.3,3.3,2.25,3.4,3.3,2.2,3.37,3.45,2.27,3.3,3.3,2.25,3.4,3.3,2.25,3.55,3.5,2.34,3.41,3.37,2.23,2.2,1.66,2.22,1.74,2.28,1.77,2.17,1.71,0.25,1.82,2.08,1.97,1.96,2.03,2.08,1.96,1.93,2019-08-10
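The `dayfirst=True` argument matters because a string such as `09/08/2019` is ambiguous. A small sketch of the difference, using two date strings in the same shape as the `Date` column (the variable names are illustrative):

```python
import pandas as pd

raw = pd.Series(["09/08/2019", "10/08/2019"])

# With dayfirst=True the first number is read as the day: 9 and 10 August.
day_first = pd.to_datetime(raw, dayfirst=True)

# Without it, pandas reads these strings month first: 8 September and 8 October.
month_first = pd.to_datetime(raw)

print(day_first.tolist())
print(month_first.tolist())
```

Neither call raises an error here, which is why it is worth checking a few converted values by eye.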


But hold your horses. We also have a `Time` column! We can combine these to ensure our data is sorted by Date and Time!

In [14]:
df['date_and_time'] = df.Date.astype(str)+" "+df.Time.astype(str)

df['game_date'] = pd.to_datetime(df.date_and_time, dayfirst=True)
df.head()

Unnamed: 0,Div,Date,Time,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Referee,HS,AS,HST,AST,HF,AF,HC,AC,HY,AY,HR,AR,B365H,B365D,B365A,BWH,BWD,BWA,IWH,IWD,IWA,PSH,PSD,PSA,WHH,WHD,WHA,VCH,VCD,VCA,MaxH,MaxD,MaxA,AvgH,AvgD,AvgA,B365>2.5,B365<2.5,P>2.5,P<2.5,Max>2.5,Max<2.5,Avg>2.5,Avg<2.5,AHh,B365AHH,B365AHA,PAHH,PAHA,MaxAHH,MaxAHA,AvgAHH,AvgAHA,B365CH,B365CD,B365CA,BWCH,BWCD,BWCA,IWCH,IWCD,IWCA,PSCH,PSCD,PSCA,WHCH,WHCD,WHCA,VCCH,VCCD,VCCA,MaxCH,MaxCD,MaxCA,AvgCH,AvgCD,AvgCA,B365C>2.5,B365C<2.5,PC>2.5,PC<2.5,MaxC>2.5,MaxC<2.5,AvgC>2.5,AvgC<2.5,AHCh,B365CAHH,B365CAHA,PCAHH,PCAHA,MaxCAHH,MaxCAHA,AvgCAHH,AvgCAHA,game_date,date_and_time
0,E0,09/08/2019,20:00,Liverpool,Norwich,4,1,H,4,0,H,M Oliver,15,12,7,5,9,9,11,2,0,2,0,0,1.14,10.0,19.0,1.14,8.25,18.5,1.15,8.0,18.0,1.15,9.59,18.05,1.12,8.5,21.0,1.14,9.5,23.0,1.16,10.0,23.0,1.14,8.75,19.83,1.4,3.0,1.4,3.11,1.45,3.11,1.41,2.92,-2.25,1.96,1.94,1.97,1.95,1.97,2.0,1.94,1.94,1.14,9.5,21.0,1.14,9.0,20.0,1.15,8.0,18.0,1.14,10.43,19.63,1.11,9.5,21.0,1.14,9.5,23.0,1.16,10.5,23.0,1.14,9.52,19.18,1.3,3.5,1.34,3.44,1.36,3.76,1.32,3.43,-2.25,1.91,1.99,1.94,1.98,1.99,2.07,1.9,1.99,2019-08-09 20:00:00,09/08/2019 20:00
1,E0,10/08/2019,12:30,West Ham,Man City,0,5,A,0,1,A,M Dean,5,14,3,9,6,13,1,1,2,2,0,0,12.0,6.5,1.22,11.5,5.75,1.26,11.0,6.1,1.25,11.68,6.53,1.26,13.0,6.0,1.24,12.0,6.5,1.25,13.0,6.75,1.29,11.84,6.28,1.25,1.44,2.75,1.49,2.77,1.51,2.77,1.48,2.65,1.75,2.0,1.9,2.02,1.9,2.02,1.92,1.99,1.89,12.0,7.0,1.25,11.0,6.0,1.26,11.0,6.1,1.25,11.11,6.68,1.27,11.0,6.5,1.24,12.0,6.5,1.25,13.0,7.0,1.29,11.14,6.46,1.26,1.4,3.0,1.43,3.03,1.5,3.22,1.41,2.91,1.75,1.95,1.95,1.96,1.97,2.07,1.98,1.97,1.92,2019-08-10 12:30:00,10/08/2019 12:30
2,E0,10/08/2019,15:00,Bournemouth,Sheffield United,1,1,D,0,0,D,K Friend,13,8,3,3,10,19,3,4,2,1,0,0,1.95,3.6,3.6,1.95,3.6,3.9,1.97,3.55,3.8,2.04,3.57,3.9,2.0,3.5,3.8,2.0,3.6,4.0,2.06,3.65,4.0,2.01,3.53,3.83,1.9,1.9,1.96,1.96,2.0,1.99,1.9,1.93,-0.5,2.01,1.89,2.04,1.88,2.04,1.91,2.0,1.88,1.95,3.7,4.2,1.95,3.6,3.9,1.97,3.55,3.85,1.98,3.67,4.06,1.95,3.6,3.9,2.0,3.6,4.0,2.03,3.7,4.2,1.98,3.58,3.96,1.9,1.9,1.94,1.97,1.97,1.98,1.91,1.92,-0.5,1.95,1.95,1.98,1.95,2.0,1.96,1.96,1.92,2019-08-10 15:00:00,10/08/2019 15:00
3,E0,10/08/2019,15:00,Burnley,Southampton,3,0,H,0,0,D,G Scott,10,11,4,3,6,12,2,7,0,0,0,0,2.62,3.2,2.75,2.65,3.2,2.75,2.65,3.2,2.75,2.71,3.31,2.81,2.7,3.2,2.75,2.7,3.3,2.8,2.8,3.33,2.85,2.68,3.22,2.78,2.1,1.72,2.17,1.77,2.2,1.78,2.12,1.73,0.0,1.92,1.98,1.93,2.0,1.94,2.0,1.91,1.98,2.7,3.25,2.9,2.65,3.1,2.85,2.6,3.2,2.85,2.71,3.19,2.9,2.62,3.2,2.8,2.7,3.25,2.9,2.72,3.26,2.95,2.65,3.18,2.88,2.1,1.72,2.19,1.76,2.25,1.78,2.17,1.71,0.0,1.87,2.03,1.89,2.03,1.9,2.07,1.86,2.02,2019-08-10 15:00:00,10/08/2019 15:00
4,E0,10/08/2019,15:00,Crystal Palace,Everton,0,0,D,0,0,D,J Moss,6,10,2,3,16,14,6,2,2,1,0,1,3.0,3.25,2.37,3.2,3.2,2.35,3.1,3.2,2.4,3.21,3.37,2.39,3.1,3.3,2.35,3.2,3.3,2.45,3.21,3.4,2.52,3.13,3.27,2.4,2.2,1.66,2.23,1.74,2.25,1.74,2.18,1.7,0.25,1.85,2.05,1.88,2.05,1.88,2.09,1.84,2.04,3.4,3.5,2.25,3.3,3.3,2.25,3.4,3.3,2.2,3.37,3.45,2.27,3.3,3.3,2.25,3.4,3.3,2.25,3.55,3.5,2.34,3.41,3.37,2.23,2.2,1.66,2.22,1.74,2.28,1.77,2.17,1.71,0.25,1.82,2.08,1.97,1.96,2.03,2.08,1.96,1.93,2019-08-10 15:00:00,10/08/2019 15:00
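An alternative to `dayfirst=True` is to spell the layout out with `format`. This is a sketch on invented rows, assuming the strings always look like `DD/MM/YYYY HH:MM` as they do in the output above:

```python
import pandas as pd

# Two made-up rows in the same shape as the Date and Time columns.
toy = pd.DataFrame({"Date": ["09/08/2019", "10/08/2019"],
                    "Time": ["20:00", "12:30"]})

# Join the two strings, then parse them with an explicit format.
stamp = pd.to_datetime(toy["Date"] + " " + toy["Time"],
                       format="%d/%m/%Y %H:%M")

print(stamp.tolist())
```

With an explicit format, a value that does not match the layout raises an error instead of being guessed at.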


Now we can sort our data by most recent game!

In [15]:
df = df.sort_values("game_date", ascending=False)
df.head(10)

Unnamed: 0,Div,Date,Time,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Referee,HS,AS,HST,AST,HF,AF,HC,AC,HY,AY,HR,AR,B365H,B365D,B365A,BWH,BWD,BWA,IWH,IWD,IWA,PSH,PSD,PSA,WHH,WHD,WHA,VCH,VCD,VCA,MaxH,MaxD,MaxA,AvgH,AvgD,AvgA,B365>2.5,B365<2.5,P>2.5,P<2.5,Max>2.5,Max<2.5,Avg>2.5,Avg<2.5,AHh,B365AHH,B365AHA,PAHH,PAHA,MaxAHH,MaxAHA,AvgAHH,AvgAHA,B365CH,B365CD,B365CA,BWCH,BWCD,BWCA,IWCH,IWCD,IWCA,PSCH,PSCD,PSCA,WHCH,WHCD,WHCA,VCCH,VCCD,VCCA,MaxCH,MaxCD,MaxCA,AvgCH,AvgCD,AvgCA,B365C>2.5,B365C<2.5,PC>2.5,PC<2.5,MaxC>2.5,MaxC<2.5,AvgC>2.5,AvgC<2.5,AHCh,B365CAHH,B365CAHA,PCAHH,PCAHA,MaxCAHH,MaxCAHA,AvgCAHH,AvgCAHA,game_date,date_and_time
369,E0,22/07/2020,20:15,Liverpool,Chelsea,5,3,H,3,1,H,A Marriner,10,10,7,5,8,11,6,0,1,0,0,0,2.0,3.75,3.5,1.95,3.8,3.6,2.1,3.5,3.4,2.08,3.71,3.59,2.05,3.7,3.4,2.05,3.75,3.4,2.13,3.95,3.7,2.04,3.7,3.51,1.57,2.37,1.65,2.39,1.66,2.41,1.6,2.34,-0.5,2.09,1.84,2.08,1.85,2.13,1.87,2.05,1.82,1.9,3.8,3.8,1.9,3.8,3.8,2.0,3.55,3.75,1.93,3.8,4.09,1.88,3.7,4.0,1.87,3.75,4.1,2.0,3.9,4.25,1.91,3.73,3.97,1.72,2.1,1.78,2.16,1.81,2.23,1.72,2.13,-0.5,1.91,1.99,1.93,2.01,1.95,2.04,1.91,1.96,2020-07-22 20:15:00,22/07/2020 20:15
368,E0,22/07/2020,18:00,Man United,West Ham,1,1,D,0,1,A,P Tierney,11,12,4,3,10,9,2,3,3,1,0,0,1.22,6.5,11.0,1.25,6.25,11.5,1.27,6.0,11.0,1.26,6.37,12.18,1.24,6.5,12.0,1.22,6.5,13.0,1.28,7.0,13.5,1.24,6.38,11.56,1.5,2.62,1.48,2.84,1.5,2.84,1.47,2.68,-1.75,1.95,1.98,1.97,1.95,1.97,2.15,1.92,1.96,1.22,6.5,12.0,1.25,6.5,10.5,1.25,6.0,9.0,1.27,6.55,10.55,1.25,6.0,12.0,1.29,6.0,11.5,1.3,7.0,13.0,1.26,6.4,10.94,1.44,2.75,1.45,2.92,1.47,2.93,1.44,2.81,-1.75,2.05,1.85,2.04,1.88,2.09,1.97,1.98,1.89,2020-07-22 18:00:00,22/07/2020 18:00
367,E0,21/07/2020,20:15,Aston Villa,Arsenal,1,0,H,1,0,H,C Kavanagh,8,7,3,0,13,19,8,9,2,4,0,0,3.3,3.6,2.1,3.25,3.9,2.05,3.25,3.55,2.15,3.24,3.94,2.14,3.2,3.8,2.1,3.3,3.7,2.15,3.4,3.98,2.2,3.24,3.78,2.11,1.66,2.2,1.7,2.28,1.74,2.37,1.66,2.25,0.25,2.04,1.89,2.04,1.88,2.09,1.92,2.02,1.85,3.1,3.75,2.2,3.1,3.6,2.2,3.1,3.65,2.25,3.14,3.86,2.21,3.2,3.7,2.15,3.13,3.6,2.25,3.35,3.88,2.26,3.13,3.7,2.2,1.72,2.1,1.76,2.18,1.79,2.25,1.73,2.13,0.25,1.96,1.97,1.98,1.94,1.99,2.0,1.95,1.94,2020-07-21 20:15:00,21/07/2020 20:15
366,E0,21/07/2020,18:00,Watford,Man City,0,4,A,0,2,A,M Oliver,2,26,0,10,14,11,0,8,2,0,0,0,8.5,5.5,1.33,9.0,5.75,1.3,8.0,5.25,1.35,8.57,5.73,1.34,9.5,5.8,1.3,10.0,5.75,1.3,10.0,6.15,1.37,8.62,5.6,1.33,1.5,2.62,1.5,2.74,1.5,2.79,1.48,2.67,1.5,1.97,1.96,1.94,1.98,2.0,1.99,1.94,1.94,9.0,5.75,1.3,7.25,5.75,1.35,9.0,5.75,1.33,9.2,6.37,1.3,9.5,6.0,1.29,9.5,6.0,1.3,10.5,6.6,1.33,9.13,6.01,1.3,1.5,2.62,1.49,2.79,1.5,2.82,1.47,2.69,1.5,2.05,1.88,2.06,1.85,2.09,1.91,2.03,1.85,2020-07-21 18:00:00,21/07/2020 18:00
365,E0,20/07/2020,20:15,Wolves,Crystal Palace,2,0,H,1,0,H,P Bankes,11,7,5,3,11,15,5,2,2,1,0,0,1.5,4.0,7.5,1.5,3.9,7.75,1.53,3.8,7.5,1.51,4.12,8.08,1.47,4.0,8.0,1.5,3.9,8.0,1.54,4.35,8.3,1.5,4.03,7.72,2.3,1.61,2.33,1.68,2.39,1.71,2.29,1.64,-1.0,1.94,1.96,1.97,1.96,1.99,2.02,1.9,1.97,1.4,4.2,9.5,1.44,4.25,8.25,1.45,4.0,8.25,1.46,4.31,8.7,1.42,4.2,9.0,1.45,4.2,9.0,1.49,4.55,10.3,1.44,4.24,8.68,2.2,1.66,2.21,1.75,2.27,1.82,2.18,1.69,-1.0,1.79,2.11,1.83,2.1,1.88,2.15,1.79,2.11,2020-07-20 20:15:00,20/07/2020 20:15
364,E0,20/07/2020,18:00,Sheffield United,Everton,0,1,A,0,0,D,S Attwell,8,5,0,2,11,19,7,1,1,3,0,0,2.2,3.25,3.5,2.15,3.3,3.5,2.2,3.1,3.55,2.18,3.42,3.57,2.15,3.3,3.5,2.15,3.2,3.7,2.27,3.45,3.7,2.17,3.29,3.52,2.3,1.61,2.37,1.66,2.43,1.7,2.32,1.62,-0.25,1.88,2.02,1.88,2.04,1.93,2.07,1.87,2.01,1.95,3.3,4.2,1.91,3.4,4.25,1.97,3.2,4.1,2.0,3.41,4.25,1.95,3.3,4.2,1.95,3.3,4.33,2.05,3.53,4.4,1.98,3.35,4.15,2.2,1.66,2.31,1.68,2.43,1.69,2.28,1.64,-0.5,2.0,1.93,2.0,1.93,2.05,1.95,1.99,1.89,2020-07-20 18:00:00,20/07/2020 18:00
363,E0,20/07/2020,18:00,Brighton,Newcastle,0,0,D,0,0,D,S Hooper,11,12,3,1,13,12,9,7,4,2,0,0,1.9,3.4,4.33,1.95,3.4,4.2,1.95,3.25,4.1,1.97,3.48,4.3,1.91,3.5,4.2,1.91,3.4,4.33,2.0,3.62,4.5,1.93,3.42,4.23,2.1,1.72,2.14,1.79,2.2,1.82,2.12,1.73,-0.5,1.95,1.95,1.97,1.96,1.98,1.99,1.93,1.93,1.95,3.5,4.0,1.95,3.5,4.0,2.0,3.3,3.85,2.01,3.55,4.0,2.0,3.5,3.9,2.0,3.5,3.9,2.05,3.67,4.2,1.99,3.49,3.92,1.9,1.9,1.93,1.99,1.98,2.03,1.9,1.92,-0.5,2.04,1.89,2.01,1.91,2.06,1.95,2.0,1.88,2020-07-20 18:00:00,20/07/2020 18:00
362,E0,19/07/2020,16:00,Tottenham,Leicester,3,0,H,3,0,H,A Taylor,7,24,3,6,15,10,4,13,2,1,0,0,2.25,3.3,3.25,2.35,3.3,3.1,2.35,3.25,3.05,2.33,3.53,3.15,2.3,3.4,3.1,2.3,3.3,3.2,2.41,3.58,3.25,2.32,3.42,3.11,1.9,1.9,2.01,1.91,2.04,1.95,1.95,1.87,-0.25,2.01,1.89,2.02,1.91,2.05,1.93,2.01,1.87,2.25,3.4,3.2,2.4,3.3,3.0,2.35,3.2,3.1,2.36,3.39,3.22,2.25,3.3,3.3,2.38,3.3,3.13,2.44,3.5,3.51,2.33,3.36,3.14,2.0,1.8,2.07,1.85,2.11,1.92,1.99,1.84,-0.25,2.03,1.87,2.03,1.89,2.07,1.9,2.02,1.86,2020-07-19 16:00:00,19/07/2020 16:00
361,E0,19/07/2020,14:00,Bournemouth,Southampton,0,2,A,0,1,A,C Pawson,11,16,3,8,12,11,12,6,2,2,0,0,2.4,3.6,2.8,2.3,3.7,2.9,2.45,3.4,2.85,2.43,3.78,2.82,2.4,3.6,2.8,2.38,3.5,2.9,2.5,3.82,3.01,2.39,3.65,2.84,1.72,2.1,1.74,2.22,1.79,2.25,1.71,2.16,-0.25,2.07,1.83,2.13,1.81,2.13,1.86,2.08,1.81,2.4,3.75,2.7,2.35,3.7,2.8,2.45,3.5,2.75,2.46,3.82,2.76,2.45,3.7,2.7,2.45,3.6,2.8,2.55,3.92,2.85,2.43,3.73,2.74,1.66,2.2,1.66,2.35,1.75,2.37,1.65,2.27,0.0,1.84,2.06,1.85,2.08,1.91,2.15,1.83,2.06,2020-07-19 14:00:00,19/07/2020 14:00
360,E0,18/07/2020,17:30,Norwich,Burnley,0,2,A,0,1,A,K Friend,6,23,2,8,9,16,6,8,0,0,2,0,3.2,3.4,2.25,3.4,3.3,2.2,3.3,3.15,2.25,3.4,3.43,2.25,3.3,3.3,2.25,3.4,3.3,2.25,3.52,3.43,2.36,3.32,3.33,2.25,2.0,1.8,2.07,1.85,2.12,1.88,2.05,1.78,0.25,1.97,1.93,1.98,1.94,1.98,1.95,1.94,1.93,3.1,3.4,2.3,3.3,3.4,2.2,3.15,3.15,2.35,3.19,3.45,2.34,3.2,3.4,2.25,3.3,3.3,2.3,3.37,3.6,2.4,3.2,3.39,2.3,2.1,1.72,2.14,1.79,2.21,1.83,2.13,1.73,0.25,1.9,2.0,1.9,2.02,2.01,2.04,1.9,1.98,2020-07-18 17:30:00,18/07/2020 17:30
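With `game_date` stored as real datetimes, picking out a period of the season becomes a simple comparison. A minimal sketch on a made-up frame (`toy` and its three rows are invented for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "HomeTeam": ["Liverpool", "Man City", "Watford"],
    "game_date": pd.to_datetime(["2019-08-09 20:00",
                                 "2019-12-21 17:30",
                                 "2020-07-21 18:00"]),
})

# Build a True/False mask for games played in December 2019 ...
in_december = (toy["game_date"] >= "2019-12-01") & (toy["game_date"] < "2020-01-01")

# ... and use it to keep only the matching rows.
december = toy[in_december]
print(december)
```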



<div class="back_to_top" style="float:right;">

<a href="#contents" style="color:red;">[Back to Top &uarr;]</a>

</div>





<a id="pick-n-rename"></a>

### Picking and Renaming Columns

For our purposes today, we don't need all of these columns. We'll choose a handful and rename some to make them clearer.

In [16]:
cols = ['Div','game_date','HomeTeam','AwayTeam',
        'FTHG','FTAG','FTR','HS','AS','HST','AST',
        'PSCH','PSCD','PSCA']

In the above we create a list of column names. We can now use this list to get those columns from our dataframe by using `df = df[cols]`. Before we do that though, let's see what it would look like:

In [17]:
df[cols]

Unnamed: 0,Div,game_date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HS,AS,HST,AST,PSCH,PSCD,PSCA
369,E0,2020-07-22 20:15:00,Liverpool,Chelsea,5,3,H,10,10,7,5,1.93,3.80,4.09
368,E0,2020-07-22 18:00:00,Man United,West Ham,1,1,D,11,12,4,3,1.27,6.55,10.55
367,E0,2020-07-21 20:15:00,Aston Villa,Arsenal,1,0,H,8,7,3,0,3.14,3.86,2.21
366,E0,2020-07-21 18:00:00,Watford,Man City,0,4,A,2,26,0,10,9.20,6.37,1.30
365,E0,2020-07-20 20:15:00,Wolves,Crystal Palace,2,0,H,11,7,5,3,1.46,4.31,8.70
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2,E0,2019-08-10 15:00:00,Bournemouth,Sheffield United,1,1,D,13,8,3,3,1.98,3.67,4.06
5,E0,2019-08-10 15:00:00,Watford,Brighton,0,3,A,11,5,3,3,2.05,3.38,4.12
3,E0,2019-08-10 15:00:00,Burnley,Southampton,3,0,H,10,11,4,3,2.71,3.19,2.90
1,E0,2019-08-10 12:30:00,West Ham,Man City,0,5,A,5,14,3,9,11.11,6.68,1.27
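Two details of selecting with a list are worth knowing: the columns come back in the order the list gives them, and a name that is not in the dataframe raises a `KeyError`. A sketch on a one-row invented frame (the column `Goals` is deliberately non-existent):

```python
import pandas as pd

toy = pd.DataFrame({"Div": ["E0"], "HomeTeam": ["Burnley"],
                    "FTHG": [3], "FTAG": [0]})

# Columns are returned in the order they appear in the list.
subset = toy[["FTHG", "HomeTeam"]]
print(list(subset.columns))

# Asking for a column that does not exist raises KeyError.
missing = None
try:
    toy[["HomeTeam", "Goals"]]
except KeyError as err:
    missing = str(err)
print(missing)
```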


This looks good. Let's make this our new dataframe and rename our columns. We can do both in one go through a technique called **chaining**. We've already seen this earlier when we used

`df.sort_values("HST", ascending=False).head(10)`

where we chained `.sort_values()` with `.head()`.

Below we will do the same, but we will wrap the whole expression in parentheses `()` so that the chain can be split across several lines.

In [18]:
(df[cols]
 .rename(columns={"Div":"competition","HomeTeam":"home","AwayTeam":"away",
                  "FTHG":"home_goals","FTAG":"away_goals","FTR":"result",
                  "HS":"home_shots","AS":"away_shots","HST":"home_sot","AST":"away_sot",
                  "PSCH":"home_odds","PSCD":"draw_odds","PSCA":"away_odds"})
)

Unnamed: 0,competition,game_date,home,away,home_goals,away_goals,result,home_shots,away_shots,home_sot,away_sot,home_odds,draw_odds,away_odds
369,E0,2020-07-22 20:15:00,Liverpool,Chelsea,5,3,H,10,10,7,5,1.93,3.80,4.09
368,E0,2020-07-22 18:00:00,Man United,West Ham,1,1,D,11,12,4,3,1.27,6.55,10.55
367,E0,2020-07-21 20:15:00,Aston Villa,Arsenal,1,0,H,8,7,3,0,3.14,3.86,2.21
366,E0,2020-07-21 18:00:00,Watford,Man City,0,4,A,2,26,0,10,9.20,6.37,1.30
365,E0,2020-07-20 20:15:00,Wolves,Crystal Palace,2,0,H,11,7,5,3,1.46,4.31,8.70
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2,E0,2019-08-10 15:00:00,Bournemouth,Sheffield United,1,1,D,13,8,3,3,1.98,3.67,4.06
5,E0,2019-08-10 15:00:00,Watford,Brighton,0,3,A,11,5,3,3,2.05,3.38,4.12
3,E0,2019-08-10 15:00:00,Burnley,Southampton,3,0,H,10,11,4,3,2.71,3.19,2.90
1,E0,2019-08-10 12:30:00,West Ham,Man City,0,5,A,5,14,3,9,11.11,6.68,1.27
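One thing to watch with `rename`: by default it quietly skips any key that does not match a column, so a typo goes unnoticed. The sketch below uses an invented frame and a deliberate typo (`FTGH`) to show this, along with the `errors="raise"` option that turns the typo into an error.

```python
import pandas as pd

toy = pd.DataFrame({"FTHG": [3], "FTAG": [0]})

# "FTGH" is a deliberate typo: rename skips it without complaint.
renamed = toy.rename(columns={"FTGH": "home_goals", "FTAG": "away_goals"})
print(list(renamed.columns))

# With errors="raise" the same typo raises KeyError instead.
caught = False
try:
    toy.rename(columns={"FTGH": "home_goals"}, errors="raise")
except KeyError:
    caught = True
print(caught)
```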


This looks fine. We'll add `df =` to the very start of that to create our new dataframe, this time naming the result column `match_result`

In [19]:
df = (df[cols]
      .rename(columns={"Div":"competition","HomeTeam":"home","AwayTeam":"away",
                       "FTHG":"home_goals","FTAG":"away_goals","FTR":"match_result",
                       "HS":"home_shots","AS":"away_shots","HST":"home_sot","AST":"away_sot",
                       "PSCH":"home_odds","PSCD":"draw_odds","PSCA":"away_odds"})
     )

In [20]:
df.head()

Unnamed: 0,competition,game_date,home,away,home_goals,away_goals,match_result,home_shots,away_shots,home_sot,away_sot,home_odds,draw_odds,away_odds
369,E0,2020-07-22 20:15:00,Liverpool,Chelsea,5,3,H,10,10,7,5,1.93,3.8,4.09
368,E0,2020-07-22 18:00:00,Man United,West Ham,1,1,D,11,12,4,3,1.27,6.55,10.55
367,E0,2020-07-21 20:15:00,Aston Villa,Arsenal,1,0,H,8,7,3,0,3.14,3.86,2.21
366,E0,2020-07-21 18:00:00,Watford,Man City,0,4,A,2,26,0,10,9.2,6.37,1.3
365,E0,2020-07-20 20:15:00,Wolves,Crystal Palace,2,0,H,11,7,5,3,1.46,4.31,8.7



<div class="back_to_top" style="float:right;">

<a href="#contents" style="color:red;">[Back to Top &uarr;]</a>

</div>





<a id="rename-vals"></a>

### Renaming Values

While in our dataset above it's fine to show the result as H, D, or A, we'll see how we can use `replace` to change the values in a column. This becomes useful when working with more than one competition, where we may want to replace the codes in the `competition` column with the competition names. Note that we do not have to overwrite the column itself; we can store the replaced values in a new column with a different name instead. First we will create a dictionary that maps the existing values to what we would like them to be. We can check the existing values by using `unique()` on our column (a single column is also known as a **Series**)

In [21]:
df.match_result.unique()

array(['H', 'D', 'A'], dtype=object)

In [22]:
result_dict = {
    "H":"home",
    "D":"draw",
    "A":"away"
}

In [23]:
df["temp_result"] = df.match_result.replace(result_dict)
df.head()

Unnamed: 0,competition,game_date,home,away,home_goals,away_goals,match_result,home_shots,away_shots,home_sot,away_sot,home_odds,draw_odds,away_odds,temp_result
369,E0,2020-07-22 20:15:00,Liverpool,Chelsea,5,3,H,10,10,7,5,1.93,3.8,4.09,home
368,E0,2020-07-22 18:00:00,Man United,West Ham,1,1,D,11,12,4,3,1.27,6.55,10.55,draw
367,E0,2020-07-21 20:15:00,Aston Villa,Arsenal,1,0,H,8,7,3,0,3.14,3.86,2.21,home
366,E0,2020-07-21 18:00:00,Watford,Man City,0,4,A,2,26,0,10,9.2,6.37,1.3,away
365,E0,2020-07-20 20:15:00,Wolves,Crystal Palace,2,0,H,11,7,5,3,1.46,4.31,8.7,home


If we scroll to the right-hand end of our dataframe, we see that the new `temp_result` column holds `home`, `draw`, `away` in place of `H`, `D`, `A`. We can check this by looking at the unique values in `temp_result`:

In [24]:
df.temp_result.unique()

array(['home', 'draw', 'away'], dtype=object)
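To see how `replace` behaves with a dictionary away from the full dataset, here is a minimal, self-contained sketch. The small Series `results` and the extra value `"X"` are made up for illustration and are not part of the football data. It also shows `map`, which differs in that any value missing from the dictionary becomes `NaN` rather than being left unchanged.

```python
import pandas as pd

# Toy data (hypothetical), including a value "X" that has no dictionary entry
results = pd.Series(["H", "D", "A", "X"])

result_dict = {"H": "home", "D": "draw", "A": "away"}

# replace leaves values without a dictionary entry untouched
replaced = results.replace(result_dict)

# map turns values without a dictionary entry into NaN
mapped = results.map(result_dict)

print(replaced.tolist())
print(mapped.tolist())
```

For the `match_result` column used here every value has an entry in the dictionary, so either approach would give the same result.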

I'd prefer these values to be in the `match_result` column itself, so I will delete `temp_result` and run the replacement again, this time overwriting our `match_result` column.

In [25]:
del df['temp_result']
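`del` removes the column from the dataframe in place. An alternative, sketched below on a small made-up frame (`toy` is not part of the tutorial data), is `DataFrame.drop`, which returns a new dataframe and can remove several columns at once.

```python
import pandas as pd

# Toy frame (hypothetical) with a temporary column we no longer need
toy = pd.DataFrame({
    "match_result": ["H", "D"],
    "temp_result": ["home", "draw"],
})

# drop returns a new DataFrame; toy itself is left unchanged
trimmed = toy.drop(columns=["temp_result"])

print(trimmed.columns.tolist())
```

Because `drop` does not modify the original by default, the result needs to be assigned back (for example `toy = toy.drop(...)`) if we want the change to stick.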

In [26]:
df['match_result'] = df.match_result.replace(result_dict)
df

Unnamed: 0,competition,game_date,home,away,home_goals,away_goals,match_result,home_shots,away_shots,home_sot,away_sot,home_odds,draw_odds,away_odds
369,E0,2020-07-22 20:15:00,Liverpool,Chelsea,5,3,home,10,10,7,5,1.93,3.80,4.09
368,E0,2020-07-22 18:00:00,Man United,West Ham,1,1,draw,11,12,4,3,1.27,6.55,10.55
367,E0,2020-07-21 20:15:00,Aston Villa,Arsenal,1,0,home,8,7,3,0,3.14,3.86,2.21
366,E0,2020-07-21 18:00:00,Watford,Man City,0,4,away,2,26,0,10,9.20,6.37,1.30
365,E0,2020-07-20 20:15:00,Wolves,Crystal Palace,2,0,home,11,7,5,3,1.46,4.31,8.70
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2,E0,2019-08-10 15:00:00,Bournemouth,Sheffield United,1,1,draw,13,8,3,3,1.98,3.67,4.06
5,E0,2019-08-10 15:00:00,Watford,Brighton,0,3,away,11,5,3,3,2.05,3.38,4.12
3,E0,2019-08-10 15:00:00,Burnley,Southampton,3,0,home,10,11,4,3,2.71,3.19,2.90
1,E0,2019-08-10 12:30:00,West Ham,Man City,0,5,away,5,14,3,9,11.11,6.68,1.27



<div class="back_to_top" style="float:right;">

<a href="#contents" style="color:red;">[Back to Top &uarr;]</a>

</div>






<a id="new-cols-existing-data"></a>

### Creating new columns from existing data

We can create match totals for each game in a couple of different ways. The quickest is to call `sum` with the `axis` argument. Here we will create columns to show the total goals, shots, and shots on target in each match.

#### Total Goals

In [27]:
cols = ['home_goals','away_goals']
df['total_goals'] = df[cols].sum(axis=1)
df.head()

Unnamed: 0,competition,game_date,home,away,home_goals,away_goals,match_result,home_shots,away_shots,home_sot,away_sot,home_odds,draw_odds,away_odds,total_goals
369,E0,2020-07-22 20:15:00,Liverpool,Chelsea,5,3,home,10,10,7,5,1.93,3.8,4.09,8
368,E0,2020-07-22 18:00:00,Man United,West Ham,1,1,draw,11,12,4,3,1.27,6.55,10.55,2
367,E0,2020-07-21 20:15:00,Aston Villa,Arsenal,1,0,home,8,7,3,0,3.14,3.86,2.21,1
366,E0,2020-07-21 18:00:00,Watford,Man City,0,4,away,2,26,0,10,9.2,6.37,1.3,4
365,E0,2020-07-20 20:15:00,Wolves,Crystal Palace,2,0,home,11,7,5,3,1.46,4.31,8.7,2


We set **axis=1** so that the sum runs across the selected columns within each row, giving one total per match. If we set axis=0, the sum runs down each column instead, giving the total home and away goals for this season:

In [28]:
df[cols].sum(axis=0)

home_goals    558
away_goals    443
dtype: int64
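The difference between the two axes is easier to see on a tiny frame. The sketch below uses three made-up matches (`toy` is illustrative only) and compares the two calls side by side.

```python
import pandas as pd

# Three made-up matches
toy = pd.DataFrame({
    "home_goals": [5, 1, 1],
    "away_goals": [3, 1, 0],
})

# axis=1: add across the columns, one value per row (match)
per_match = toy.sum(axis=1)

# axis=0: add down the rows, one value per column
per_column = toy.sum(axis=0)

print(per_match.tolist())    # [8, 2, 1]
print(per_column.to_dict())  # home_goals: 7, away_goals: 4
```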

We can do the same for our other metrics now too.

In [29]:
cols = ['home_shots','away_shots']
df['total_shots'] = df[cols].sum(axis=1)
df.head()

Unnamed: 0,competition,game_date,home,away,home_goals,away_goals,match_result,home_shots,away_shots,home_sot,away_sot,home_odds,draw_odds,away_odds,total_goals,total_shots
369,E0,2020-07-22 20:15:00,Liverpool,Chelsea,5,3,home,10,10,7,5,1.93,3.8,4.09,8,20
368,E0,2020-07-22 18:00:00,Man United,West Ham,1,1,draw,11,12,4,3,1.27,6.55,10.55,2,23
367,E0,2020-07-21 20:15:00,Aston Villa,Arsenal,1,0,home,8,7,3,0,3.14,3.86,2.21,1,15
366,E0,2020-07-21 18:00:00,Watford,Man City,0,4,away,2,26,0,10,9.2,6.37,1.3,4,28
365,E0,2020-07-20 20:15:00,Wolves,Crystal Palace,2,0,home,11,7,5,3,1.46,4.31,8.7,2,18


In [30]:
cols = ['home_sot','away_sot']
df['total_sot'] = df[cols].sum(axis=1)
df.head()

Unnamed: 0,competition,game_date,home,away,home_goals,away_goals,match_result,home_shots,away_shots,home_sot,away_sot,home_odds,draw_odds,away_odds,total_goals,total_shots,total_sot
369,E0,2020-07-22 20:15:00,Liverpool,Chelsea,5,3,home,10,10,7,5,1.93,3.8,4.09,8,20,12
368,E0,2020-07-22 18:00:00,Man United,West Ham,1,1,draw,11,12,4,3,1.27,6.55,10.55,2,23,7
367,E0,2020-07-21 20:15:00,Aston Villa,Arsenal,1,0,home,8,7,3,0,3.14,3.86,2.21,1,15,3
366,E0,2020-07-21 18:00:00,Watford,Man City,0,4,away,2,26,0,10,9.2,6.37,1.3,4,28,10
365,E0,2020-07-20 20:15:00,Wolves,Crystal Palace,2,0,home,11,7,5,3,1.46,4.31,8.7,2,18,8
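The three cells above repeat the same pattern, so the totals could also be built in a loop. This is a minimal sketch, assuming the `home_<metric>` / `away_<metric>` column naming used in this notebook and using a small made-up frame (`toy`) instead of the full dataset.

```python
import pandas as pd

# Two made-up matches using the same column naming as the tutorial
toy = pd.DataFrame({
    "home_goals": [5, 1], "away_goals": [3, 1],
    "home_shots": [10, 11], "away_shots": [10, 12],
    "home_sot": [7, 4], "away_sot": [5, 3],
})

# Build one total_<metric> column per metric
for metric in ["goals", "shots", "sot"]:
    cols = [f"home_{metric}", f"away_{metric}"]
    toy[f"total_{metric}"] = toy[cols].sum(axis=1)

print(toy[["total_goals", "total_shots", "total_sot"]])
```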


Other than sum we could use mean or median, which we'll talk about further in the next walkthrough when we explore filtering and grouping data.
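As a small preview, swapping `sum` for another aggregation works the same way. The sketch below uses made-up shot counts (`toy` is illustrative only) to take a row-wise mean and a column-wise median.

```python
import pandas as pd

# Three made-up matches
toy = pd.DataFrame({
    "home_shots": [10, 11, 8],
    "away_shots": [10, 12, 7],
})

# Average shots per team in each match (across columns)
mean_per_match = toy.mean(axis=1)

# Median shots for home and away sides over all matches (down rows)
median_per_column = toy.median(axis=0)

print(mean_per_match.tolist())      # [10.0, 11.5, 7.5]
print(median_per_column.to_dict())  # both medians are 10.0
```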


This was part one of ??. If there is interest, I'm happy to continue. For any questions or feedback, feel free to reach out to me [@petermckeever](https://twitter.com/petermckeever)


<div class="back_to_top" style="float:right;">

<a href="#contents" style="color:red;">[Back to Top &uarr;]</a>

</div>