Aggregation and Grouping

Goal: group data imported from an external source and compute a statistic, such as the mean, for each group.

NOTE: convert each column to the correct datatype (for example, a numeric dtype) before performing the calculation. Text (object) columns cannot be averaged, and numeric dtypes let pandas use fast vectorized operations.
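A minimal sketch of that conversion step, assuming a column that was read as strings. The DataFrame and the "Non-detect" placeholder are hypothetical stand-ins, not taken from radData.csv; pd.to_numeric with errors="coerce" turns anything it cannot parse into NaN.

```python
import pandas as pd

# Hypothetical sample: values read from a CSV can arrive as strings (object dtype),
# for example when some cells hold a text placeholder instead of a number
raw = pd.DataFrame({
    "State": ["AK", "AK", "AL"],
    "Cs-134": ["0.0", "Non-detect", "0.3"],  # "Non-detect" is an assumed placeholder
})
print(raw["Cs-134"].dtype)  # object

# Convert to a numeric dtype; errors="coerce" turns unparseable strings into NaN
raw["Cs-134"] = pd.to_numeric(raw["Cs-134"], errors="coerce")
print(raw["Cs-134"].dtype)  # float64

# Group means now work; mean() skips NaN by default
means = raw.groupby("State")["Cs-134"].mean()
print(means)
```

Coercing to NaN rather than dropping rows keeps every measurement row in place, and the grouped mean simply ignores the missing values.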

Exercise 6: Aggregation and grouping data

Import the required libraries:

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Load the radionuclide data:

In [5]:
df = pd.read_csv('radData.csv')

Group the DataFrame by the State column using the groupby method:

In [8]:
df.groupby('State')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x00000200B78E4A88>
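The printed DataFrameGroupBy object holds the grouping, but no aggregation is computed until a function such as mean is called. A minimal sketch, using a small hypothetical DataFrame in place of radData.csv, shows a few standard ways to inspect it: ngroups, size(), and get_group().

```python
import pandas as pd

# Hypothetical miniature version of the radionuclide table
df = pd.DataFrame({
    "State": ["AK", "AL", "AK", "AZ"],
    "Cs-134": [0.0, 0.3, 0.0, 0.0],
})

grouped = df.groupby("State")    # no aggregation computed yet
print(grouped.ngroups)           # number of distinct State keys
print(grouped.size())            # number of rows in each group
print(grouped.get_group("AK"))   # the rows that belong to one group
```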

Select the radionuclide Cs-134 and calculate the average value per group:

In [10]:
df.groupby('State')['Cs-134'].mean().head()

State
AK     0.0
AK     0.0
AL     0.3
AR     0.0
AZ     0.0
Name: Cs-134, dtype: float64
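AK appears twice in the output. One possible cause, assumed here rather than confirmed from the dataset, is that some State values carry stray whitespace (for example "AK" and "AK "), which pandas treats as different keys. A minimal sketch with hypothetical data shows the effect and a fix with str.strip():

```python
import pandas as pd

# Hypothetical data: one State value carries a trailing space
df = pd.DataFrame({
    "State": ["AK", "AK ", "AL"],
    "Cs-134": [0.0, 0.0, 0.3],
})
before = df.groupby("State")["Cs-134"].mean()
print(before)   # "AK" and "AK " form two separate groups

# Strip surrounding whitespace so equal-looking keys merge into one group
df["State"] = df["State"].str.strip()
cleaned = df.groupby("State")["Cs-134"].mean()
print(cleaned)
```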

Do the same for all numeric columns, grouping by state and applying the mean function directly:

In [11]:
df.groupby('State').mean(numeric_only=True).head()

       Ba-140  Co-60  Cs-134  Cs-136  Cs-137     I-131  I-132  I-133  Te-129  Te-129m  Te-132  Ba-140.1
State
AK        0.0    0.0     0.0     0.0     0.0  0.157143    0.0    0.0     0.0      0.0     0.0       NaN
AK        0.0    0.0     0.0     0.0     0.0       0.0    0.0    0.0     0.0      0.0     0.0       NaN
AL        0.0    0.0     0.3     0.0     0.0      1.05    0.0    0.0     0.0      0.0     0.0       NaN
AR        0.0    0.0     0.0     0.0     0.0       5.9    0.0    0.0     0.0      0.0     0.0       NaN
AZ        0.0    0.0     0.0     0.0     0.0       1.0    0.0    0.0     0.0      0.0     0.0       NaN
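In pandas 2.0 and later, calling mean on a grouped frame that still contains text columns (such as Location) raises a TypeError instead of silently dropping them; numeric_only=True restricts the calculation to numeric columns. A minimal sketch with a hypothetical mixed-type DataFrame:

```python
import pandas as pd

# Hypothetical data mixing a text column with numeric measurements
df = pd.DataFrame({
    "State": ["AK", "AK", "AL"],
    "Location": ["Nome", "Juneau", "Mobile"],  # text: cannot be averaged
    "Cs-134": [0.0, 0.0, 0.3],
    "I-131": [0.2, 0.0, 1.05],
})

# numeric_only=True excludes Location rather than raising a TypeError
state_means = df.groupby("State").mean(numeric_only=True)
print(state_means)
```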


Now group by more than one column by passing a list of column names, and use the agg method to apply several aggregation functions per column. Group by the State and Location columns:

In [15]:
df.groupby(["State","Location"]).agg({'Cs-134':['mean', 'std'], 'Te-129':['min', 'max']})

                    Cs-134     Te-129
                    mean  std  min  max
State Location
AK    Dutch Harbor   0.0  0.0  0.0  0.0
AK    Fairbanks      0.0  0.0  0.0  0.0
AK    Juneau         0.0  0.0  0.0  0.0
AK    Nome           0.0  0.0  0.0  0.0
AK    Juneau         0.0  NaN  0.0  0.0
...   ...            ...  ...  ...  ...
WA    Seattle        0.0  0.0  0.0  0.0
WA    Spokane        0.0  0.0  0.0  0.0
WA    Tacoma         0.0  0.0  0.0  0.0
WI    Madison        0.0  0.0  0.0  0.0
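The dictionary form of agg produces two-level column labels. As an alternative, named aggregation (keyword arguments of the form name=(column, function)) yields one flat column per result. The sketch uses hypothetical data, and the output column names are illustrative choices. It also shows why one std value in the output above is missing: pandas computes the sample standard deviation (ddof=1), which is undefined, and reported as NaN, for a group with a single row.

```python
import pandas as pd

# Hypothetical data: AK/Nome has two rows, the other groups have one
df = pd.DataFrame({
    "State": ["AK", "AK", "AK", "WA"],
    "Location": ["Nome", "Nome", "Juneau", "Seattle"],
    "Cs-134": [0.0, 0.2, 0.0, 0.1],
    "Te-129": [0.0, 0.5, 0.0, 0.3],
})

# Named aggregation: each keyword becomes one flat output column
summary = df.groupby(["State", "Location"]).agg(
    cs134_mean=("Cs-134", "mean"),
    cs134_std=("Cs-134", "std"),   # NaN for single-row groups (ddof=1)
    te129_min=("Te-129", "min"),
    te129_max=("Te-129", "max"),
)
print(summary)
```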


NumPy on Pandas

NumPy functions can be applied to DataFrames directly or through the apply and applymap methods (since pandas 2.1, applymap is deprecated in favor of DataFrame.map). Other NumPy functions, such as np.where, also accept DataFrames and Series as input.
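A minimal sketch of these patterns on a small hypothetical DataFrame: a NumPy ufunc applied directly (element-wise, labels preserved), a NumPy reduction passed to apply (called once per column), and np.where used for an element-wise choice. The 1.0 threshold and the "high"/"low" labels are arbitrary illustrations.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements
df = pd.DataFrame({"Cs-134": [0.0, 0.3, 0.0], "I-131": [0.2, 1.05, 5.9]})

# A ufunc applied directly works element-wise and returns a labeled DataFrame
logged = np.log1p(df)

# apply passes each column (a Series) to the function; here: per-column means
col_means = df.apply(np.mean)

# np.where chooses element-wise between two values based on a condition
flags = np.where(df["I-131"] > 1.0, "high", "low")

print(logged)
print(col_means)
print(flags)
```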