# Groupby: Candy!
Use `groupby` when you want to collapse a dataframe by a categorical column and run an aggregation on each group. For example, if you had a list of candy consumed, you could look at the total calories by brand or by type. Let's look.
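Here is a minimal sketch of the idea. It uses a few made-up rows, not the real dataset, and the names `candy` and `calories_by_brand` are only for illustration. `groupby` gathers the rows that share a `brand` value into one group, and `sum()` collapses each group to a single total.

```python
import pandas as pd

# Made-up sample rows for illustration only; not the real candy data
candy = pd.DataFrame({
    "brand": ["Snickers", "Snickers", "KitKat", "KitKat", "KitKat"],
    "calories_per_serving": [43, 43, 70, 70, 70],
})

# One output row per brand, holding the summed calories of that brand's rows
calories_by_brand = candy.groupby("brand")["calories_per_serving"].sum()
print(calories_by_brand)
```

By default the group keys come back sorted, so `KitKat` is listed before `Snickers`.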

#### Load Python tools

In [4]:
import pandas as pd

#### Read our candy dataset

In [5]:
# CSV from here: https://docs.google.com/spreadsheets/d/1aLw0zKcOQD7d16kmrjc6nFb_jdXmoEhSTJFzO2i5kPM/edit?usp=sharing

In [6]:
# Adjacent string literals are joined into one URL
url = (
    "https://docs.google.com/spreadsheets/d/e/2PACX-1vQ5SEEqfwggc80EtHHb4s5eYd06ulLjJbOtNcjcGvktph-"
    "jCp9d8llRsUNs8BB9hM6Ze4IZ25AC9fRe/pub?gid=0&single=true&output=csv"
)

#### Read the data

In [12]:
df = pd.read_csv(url)

#### First five rows

In [13]:
df.head()

| | brand | colors | nuts | calories_per_serving | fat_per_serving | description | satisfacton_score | owner | group | id | weight |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hershey's | Brown | False | 60 | 3.5 | Milk Chocolate | 6 | The Hershey Company | A | 1 | 13.0 |
| 1 | Hershey's | Brown | False | 60 | 3.5 | Milk Chocolate | 5 | The Hershey Company | A | 2 | 13.0 |
| 2 | Hershey's | Brown | False | 60 | 3.5 | Milk Chocolate | 4 | The Hershey Company | A | 3 | 13.0 |
| 3 | Hershey's | Brown | False | 60 | 3.5 | Milk Chocolate | 5 | The Hershey Company | A | 4 | 13.0 |
| 4 | Hershey's | Brown | False | 60 | 3.5 | Milk Chocolate | 10 | The Hershey Company | A | 5 | 13.0 |


#### Last five rows

In [14]:
df.tail()

| | brand | colors | nuts | calories_per_serving | fat_per_serving | description | satisfacton_score | owner | group | id | weight |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 55 | 3 Musketeers | Silver | False | 26 | 0.7 | Candy Bar | 2 | Mars Wrigley | B | 56 | 11.3 |
| 56 | 3 Musketeers | Silver | False | 26 | 0.7 | Candy Bar | 7 | Mars Wrigley | B | 57 | 11.3 |
| 57 | 3 Musketeers | Silver | False | 26 | 0.7 | Candy Bar | 9 | Mars Wrigley | B | 58 | 11.3 |
| 58 | 3 Musketeers | Silver | False | 26 | 0.7 | Candy Bar | 7 | Mars Wrigley | B | 59 | 11.3 |
| 59 | 3 Musketeers | Silver | False | 26 | 0.7 | Candy Bar | 10 | Mars Wrigley | B | 60 | 11.3 |


#### How many records? 

In [15]:
len(df)

60

#### How many calories (and how much fat) are we talking?

In [17]:
df['calories_per_serving'].sum()

3416

In [18]:
df['fat_per_serving'].sum()

172.19999999999993
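The long tail on the fat total is ordinary binary floating-point error, not a problem with the data. Here is a minimal sketch with made-up values (the series and variable names are hypothetical) showing that rounding the result is enough for display.

```python
import pandas as pd

# Made-up fat values; binary floating point cannot store 0.1 or 0.2 exactly
fat = pd.Series([0.1, 0.2])

raw_total = fat.sum()             # a tiny representation error creeps in
tidy_total = round(raw_total, 2)  # round to two decimals for display

print(raw_total, tidy_total)
```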

#### Just those with nuts (and those without)

In [20]:
len(df[df['nuts'] == True])

26

In [21]:
len(df[df['nuts'] == False])

34
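The two filters above can also be done in one call. Here is a sketch on made-up rows (the `candy` frame is hypothetical, though it reuses the real `nuts` column name). `value_counts()` tallies every distinct value, and summing a boolean column counts its `True` values.

```python
import pandas as pd

# Made-up rows; the real dataset's `nuts` column holds True/False
candy = pd.DataFrame({"nuts": [True, False, False, True, False]})

# Count rows for each value of `nuts` in a single call
nut_counts = candy["nuts"].value_counts()
print(nut_counts)

# Summing a boolean column counts the True values
with_nuts = int(candy["nuts"].sum())
```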

#### Just those with a user rating above 5

In [22]:
len(df[df['satisfacton_score'] > 5])

29

#### Just those with a user rating above 9

In [24]:
df[df['satisfacton_score'] > 9]

| | brand | colors | nuts | calories_per_serving | fat_per_serving | description | satisfacton_score | owner | group | id | weight |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | Hershey's | Brown | False | 60 | 3.5 | Milk Chocolate | 10 | The Hershey Company | A | 5 | 13.0 |
| 5 | Hershey's | Brown | False | 60 | 3.5 | Milk Chocolate | 10 | The Hershey Company | A | 6 | 13.0 |
| 9 | KitKat | Red | False | 70 | 3.5 | Wafer Bar | 10 | The Hershey Company | A | 10 | 14.0 |
| 32 | Snickers | Brown | True | 43 | 2.0 | Candy Bar | 10 | Mars Wrigley | B | 33 | 9.0 |
| 49 | Txix | Gold | False | 50 | 2.33 | Cookie Bar | 10 | Mars Wrigley | B | 50 | 10.0 |
| 54 | 3 Musketeers | Silver | False | 26 | 0.7 | Candy Bar | 10 | Mars Wrigley | B | 55 | 11.3 |
| 59 | 3 Musketeers | Silver | False | 26 | 0.7 | Candy Bar | 10 | Mars Wrigley | B | 60 | 11.3 |
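Filters can be combined. Here is a sketch on made-up rows (hypothetical `candy` frame, real column names). You join the conditions with `&` (and) or `|` (or). Python's `and`/`or` do not work element-wise on a Series. Each comparison needs parentheses because `&` binds more tightly than `>`.

```python
import pandas as pd

# Made-up rows using the same column names as the candy data
candy = pd.DataFrame({
    "brand": ["Snickers", "Snickers", "KitKat", "Reese's"],
    "nuts": [True, True, False, True],
    "satisfacton_score": [10, 3, 8, 6],
})

# Rows that contain nuts AND scored above 5; the comparison must be parenthesized
nutty_and_liked = candy[candy["nuts"] & (candy["satisfacton_score"] > 5)]
print(nutty_and_liked)
```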


---

#### How many of each brand? 

In [25]:
df.groupby(['brand'])['id'].count()

brand
3 Musketeers     7
Almond Joy       7
Hershey's        9
KitKat           8
Reese's          6
Snickers        13
Txix            10
Name: id, dtype: int64
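`count()` counts only the non-missing values in the selected column. `size()` counts every row in the group. In this dataset `id` appears to be filled in for every row, so the two should agree. Here is a sketch with made-up rows (one `id` deliberately missing) that shows where they differ.

```python
import numpy as np
import pandas as pd

# Made-up rows; one id is missing to show the difference
candy = pd.DataFrame({
    "brand": ["KitKat", "KitKat", "Snickers"],
    "id": [1, np.nan, 3],
})

# count() skips missing values in the selected column
non_missing = candy.groupby("brand")["id"].count()
# size() counts every row in each group, missing or not
rows = candy.groupby("brand").size()
print(non_missing, rows, sep="\n")
```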

In [27]:
df.groupby(['brand']).agg({'id': 'count', 'satisfacton_score':'mean'}).reset_index()

| | brand | id | satisfacton_score |
|---|---|---|---|
| 0 | 3 Musketeers | 7 | 7.285714 |
| 1 | Almond Joy | 7 | 5.142857 |
| 2 | Hershey's | 9 | 5.888889 |
| 3 | KitKat | 8 | 5.75 |
| 4 | Reese's | 6 | 4.833333 |
| 5 | Snickers | 13 | 5.153846 |
| 6 | Txix | 10 | 5.2 |
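In the table above, the `id` column now holds a count and `satisfacton_score` holds a mean, which is easy to misread. pandas (0.25 and later) supports *named aggregation*, where you pick the output column names yourself. Here is a sketch on made-up rows. The names `n_rows` and `avg_score` are arbitrary choices.

```python
import pandas as pd

# Made-up rows using the candy data's column names
candy = pd.DataFrame({
    "brand": ["KitKat", "KitKat", "Snickers"],
    "id": [1, 2, 3],
    "satisfacton_score": [6, 8, 5],
})

# Named aggregation: new_column=(source_column, function)
summary = (
    candy.groupby("brand")
    .agg(n_rows=("id", "count"), avg_score=("satisfacton_score", "mean"))
    .reset_index()
)
print(summary)
```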


#### How many of each color? 

In [29]:
df.groupby(['colors'])['id'].count()

colors
Blue       7
Brown     22
Gold      10
Orange     6
Red        8
Silver     7
Name: id, dtype: int64

#### What's the average user rating for each brand? 

In [30]:
df.groupby(['brand'])['satisfacton_score'].mean()

brand
3 Musketeers    7.285714
Almond Joy      5.142857
Hershey's       5.888889
KitKat          5.750000
Reese's         4.833333
Snickers        5.153846
Txix            5.200000
Name: satisfacton_score, dtype: float64
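The result above is sorted by brand name. To rank brands from best to worst rated, you can chain `sort_values`. Here is a sketch on made-up rows (hypothetical values).

```python
import pandas as pd

# Made-up rows using the candy data's column names
candy = pd.DataFrame({
    "brand": ["KitKat", "KitKat", "Snickers", "Reese's"],
    "satisfacton_score": [6, 8, 5, 9],
})

# Average score per brand, highest first
ranked = (
    candy.groupby("brand")["satisfacton_score"]
    .mean()
    .sort_values(ascending=False)
)
print(ranked)
```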

#### What's the average rating for products with and without nuts?
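To compare ratings for candy with and without nuts, one approach is to group on the boolean `nuts` column. That gives one group for `False` and one for `True`. Here is a sketch on made-up rows. The real answer needs the full `df`.

```python
import pandas as pd

# Made-up rows using the candy data's column names
candy = pd.DataFrame({
    "nuts": [True, True, False, False],
    "satisfacton_score": [4, 6, 9, 7],
})

# Mean rating for the False group and for the True group
avg_by_nuts = candy.groupby("nuts")["satisfacton_score"].mean()
print(avg_by_nuts)
```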

#### What's the average rating for each color?

In [None]:
df.groupby(['colors'])['satisfacton_score'].mean()

---

#### What are the average calories by type?

#### And the average fat by type?
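Here is a sketch for the calories and fat questions. It assumes "type" means the `description` column (values such as `Candy Bar` and `Wafer Bar`), and the rows are made up. Selecting a list of columns after `groupby` averages both in one table.

```python
import pandas as pd

# Made-up rows; assumes `description` is the "type" column
candy = pd.DataFrame({
    "description": ["Candy Bar", "Candy Bar", "Wafer Bar"],
    "calories_per_serving": [43, 26, 70],
    "fat_per_serving": [2.0, 0.7, 3.5],
})

# Mean calories and mean fat for each type
cols = ["calories_per_serving", "fat_per_serving"]
by_type = candy.groupby("description")[cols].mean()
print(by_type)
```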

#### What if you ate all the Snickers? How many calories?
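One way to answer this is to filter to the Snickers rows and then sum the calories column. This sketch assumes each row stands for one serving eaten, which the dataset does not state. The rows are made up.

```python
import pandas as pd

# Made-up rows; assumes one row == one serving eaten
candy = pd.DataFrame({
    "brand": ["Snickers", "Snickers", "KitKat"],
    "calories_per_serving": [43, 43, 70],
})

# Keep only Snickers rows, then add up their calories
is_snickers = candy["brand"] == "Snickers"
snickers_calories = candy.loc[is_snickers, "calories_per_serving"].sum()
print(snickers_calories)
```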

#### Which owner has the highest average user score? 
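Here is a sketch for the owner question on made-up rows. Group by `owner`, average the score, then use `idxmax()` to return the label (the owner) with the largest mean. If two owners tie, `idxmax()` returns the first one.

```python
import pandas as pd

# Made-up rows using the candy data's column names
candy = pd.DataFrame({
    "owner": ["The Hershey Company", "The Hershey Company", "Mars Wrigley", "Mars Wrigley"],
    "satisfacton_score": [6, 4, 9, 7],
})

avg_by_owner = candy.groupby("owner")["satisfacton_score"].mean()
# idxmax() returns the index label (here, the owner) with the largest mean
top_owner = avg_by_owner.idxmax()
print(top_owner)
```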

#### What other questions? 
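One follow-up you might try is splitting ratings by two columns at once. Passing a list to `groupby` creates one group per combination of values. Here is a sketch on made-up rows. Pairing `owner` with `nuts` is just an example.

```python
import pandas as pd

# Made-up rows using the candy data's column names
candy = pd.DataFrame({
    "owner": ["Mars Wrigley", "Mars Wrigley", "Mars Wrigley", "The Hershey Company"],
    "nuts": [True, False, False, False],
    "satisfacton_score": [5, 9, 7, 6],
})

# Passing a list of columns groups on every combination of their values
avg_by_owner_and_nuts = (
    candy.groupby(["owner", "nuts"])["satisfacton_score"]
    .mean()
    .reset_index()
)
print(avg_by_owner_and_nuts)
```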