# Grouping Data
---

Sometimes we will want to group particular parts of our data set for analysis. We can use the pandas `groupby` method to do this.

In [1]:
import pandas as pd

# data is in a dictionary. each entry will be a column
# the first part of the entry is the column name, the second the values

data = {
  'Opponent': ['Atletico Jave', 'Newtown FC', 'Buton Town', 'Fentborough Dynamo'],
  'Location': ['Home', 'Away', 'Away', 'Home'],
  'GoalsFor': [2, 4, 3, 0],
  'GoalsAgainst': [4, 0, 2, 2]
}

Matches = pd.DataFrame(data)
Matches

             Opponent Location  GoalsFor  GoalsAgainst
0       Atletico Jave     Home         2             4
1          Newtown FC     Away         4             0
2          Buton Town     Away         3             2
3  Fentborough Dynamo     Home         0             2


An obvious way to group this data is by home and away matches.

Use the `.groupby()` method and pass the column we want to group by as an argument; in this case, `Location`. We'll assign the result to a variable, and then call `.mean()` on it to find the average of each numeric column for each group.

In [5]:
HAMatches = Matches.groupby('Location')
HAMatches.mean(numeric_only=True)

          GoalsFor  GoalsAgainst
Location
Away           3.5           1.0
Home           1.0           3.0
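To see what the grouping is doing, here is a minimal sketch that rebuilds the same data, loops over the groups, and checks one of the averages by hand. The lowercase names (`matches`, `grouped`, `away`, `away_mean_goals_for`) are just illustrative and not part of the cells above.

```python
import pandas as pd

# Same match data as in the first cell
matches = pd.DataFrame({
    'Opponent': ['Atletico Jave', 'Newtown FC', 'Buton Town', 'Fentborough Dynamo'],
    'Location': ['Home', 'Away', 'Away', 'Home'],
    'GoalsFor': [2, 4, 3, 0],
    'GoalsAgainst': [4, 0, 2, 2],
})

grouped = matches.groupby('Location')

# Iterating over a GroupBy object yields (group key, sub-DataFrame) pairs
for location, group in grouped:
    print(location, list(group['Opponent']))

# Work out the away average by filtering manually: (4 + 3) / 2
away = matches[matches['Location'] == 'Away']
away_mean_goals_for = away['GoalsFor'].mean()
print(away_mean_goals_for)
```

The hand-computed value should match the `Away` row of the `GoalsFor` column in the grouped result.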


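Make sure you specify `numeric_only=True`. The `Opponent` column holds text, which cannot be averaged; in recent pandas versions (2.0 and later) calling `.mean()` without this argument raises an error instead of silently skipping that column.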
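If you would rather be explicit about which columns are averaged, one alternative is to select the numeric columns on the grouped object before calling `.mean()`. This is a small sketch of that approach, not the only way to do it; the variable names (`matches`, `by_flag`, `by_columns`, `same`) are made up for illustration.

```python
import pandas as pd

# Same match data as in the first cell
matches = pd.DataFrame({
    'Opponent': ['Atletico Jave', 'Newtown FC', 'Buton Town', 'Fentborough Dynamo'],
    'Location': ['Home', 'Away', 'Away', 'Home'],
    'GoalsFor': [2, 4, 3, 0],
    'GoalsAgainst': [4, 0, 2, 2],
})

# Option 1: let pandas skip non-numeric columns
by_flag = matches.groupby('Location').mean(numeric_only=True)

# Option 2: pick the columns to average ourselves
by_columns = matches.groupby('Location')[['GoalsFor', 'GoalsAgainst']].mean()

# Both approaches should give the same table
same = by_flag.equals(by_columns)
print(same)
```

Selecting columns by name also keeps the result stable if more numeric columns are added to the data later.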

In [6]:
# Summary statistics for each numeric column, within each group
Matches.groupby('Location').describe()

         GoalsFor                                           GoalsAgainst
            count mean       std  min   25%  50%   75%  max        count mean       std  min  25%  50%  75%  max
Location
Away          2.0  3.5  0.707107  3.0  3.25  3.5  3.75  4.0          2.0  1.0  1.414214  0.0  0.5  1.0  1.5  2.0
Home          2.0  1.0  1.414214  0.0  0.50  1.0  1.50  2.0          2.0  3.0  1.414214  2.0  2.5  3.0  3.5  4.0


In [7]:
# transpose() swaps rows and columns, so each location becomes a column
# then select just the 'Away' column

Matches.groupby('Location').describe().transpose()['Away']

GoalsFor      count    2.000000
              mean     3.500000
              std      0.707107
              min      3.000000
              25%      3.250000
              50%      3.500000
              75%      3.750000
              max      4.000000
GoalsAgainst  count    2.000000
              mean     1.000000
              std      1.414214
              min      0.000000
              25%      0.500000
              50%      1.000000
              75%      1.500000
              max      2.000000
Name: Away, dtype: float64
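Because `Location` is the row index of the `describe()` result, the same away summary can probably be pulled out without transposing, by selecting the row with `.loc`. The sketch below compares the two routes; the variable names (`matches`, `summary`, `away_via_transpose`, `away_via_loc`) are illustrative only.

```python
import pandas as pd

# Same match data as in the first cell
matches = pd.DataFrame({
    'Opponent': ['Atletico Jave', 'Newtown FC', 'Buton Town', 'Fentborough Dynamo'],
    'Location': ['Home', 'Away', 'Away', 'Home'],
    'GoalsFor': [2, 4, 3, 0],
    'GoalsAgainst': [4, 0, 2, 2],
})

summary = matches.groupby('Location').describe()

# Route 1: flip the table, then pick the 'Away' column
away_via_transpose = summary.transpose()['Away']

# Route 2: pick the 'Away' row directly by its index label
away_via_loc = summary.loc['Away']

# Individual statistics are addressed with a (column, statistic) pair
print(away_via_loc[('GoalsFor', 'mean')])
print(away_via_loc.equals(away_via_transpose))
```

Either route returns a Series indexed by column name and statistic, so use whichever reads more clearly in your notebook.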