# Groupby: Candy!
Use `groupby` when you want to collapse a dataframe by a categorical column and run an aggregation on each group. For example, if you had a list of candy consumed, you could look at the total calories by brand or type. Let's look.
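
Before loading the real data, here is a minimal sketch of the idea. The `toy_df` rows below are hypothetical stand-ins, not the candy dataset itself, and `calories_by_brand` is just an illustrative name.

```python
import pandas as pd

# Hypothetical toy rows (one row per piece of candy), not the real dataset
toy_df = pd.DataFrame(
    {
        "brand": ["Hershey's", "Hershey's", "Almond Joy", "Almond Joy", "Almond Joy"],
        "calories_per_serving": [60, 60, 85, 85, 85],
    }
)

# Collapse to one row per brand and sum the calories within each brand
calories_by_brand = (
    toy_df.groupby("brand")["calories_per_serving"].sum().reset_index()
)
print(calories_by_brand)
```

Five rows go in and two come out: one per brand, each carrying the total for its group.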

#### Load Python tools

In [59]:
import pandas as pd

#### Read our candy dataset

In [60]:
# CSV from here: https://docs.google.com/spreadsheets/d/1aLw0zKcOQD7d16kmrjc6nFb_jdXmoEhSTJFzO2i5kPM/edit?usp=sharing

In [61]:
url = "https://docs.google.com/spreadsheets/d/e/2PACX-1vQ5SEEqfwggc80EtHHb4s5eYd06ulLjJbOtNcjcGvktph-\
jCp9d8llRsUNs8BB9hM6Ze4IZ25AC9fRe/pub?gid=0&single=true&output=csv"

#### Read the data

In [62]:
candy_df = pd.read_csv(url)

#### First five rows

In [63]:
candy_df.head()

Unnamed: 0,brand,colors,nuts,calories_per_serving,fat_per_serving,description,satisfacton_score,owner,group,id,weight
0,Hershey's,Brown,False,60,3.5,Milk Chocolate,3,The Hershey Company,A,1,13.0
1,Hershey's,Brown,False,60,3.5,Milk Chocolate,9,The Hershey Company,A,2,13.0
2,Hershey's,Brown,False,60,3.5,Milk Chocolate,9,The Hershey Company,A,3,13.0
3,Hershey's,Brown,False,60,3.5,Milk Chocolate,5,The Hershey Company,A,4,13.0
4,Hershey's,Brown,False,60,3.5,Milk Chocolate,7,The Hershey Company,A,5,13.0


#### Last five rows

In [64]:
candy_df.tail()

Unnamed: 0,brand,colors,nuts,calories_per_serving,fat_per_serving,description,satisfacton_score,owner,group,id,weight
55,3 Musketeers,Silver,False,26,0.7,Candy Bar,8,Mars Wrigley,B,56,11.3
56,3 Musketeers,Silver,False,26,0.7,Candy Bar,4,Mars Wrigley,B,57,11.3
57,3 Musketeers,Silver,False,26,0.7,Candy Bar,8,Mars Wrigley,B,58,11.3
58,3 Musketeers,Silver,False,26,0.7,Candy Bar,5,Mars Wrigley,B,59,11.3
59,3 Musketeers,Silver,False,26,0.7,Candy Bar,10,Mars Wrigley,B,60,11.3


#### How many records? 

In [65]:
print(f"The total number of candy pieces is {len(candy_df)}.")

The total number of candy pieces is 60.


#### How many calories we talking? 

In [66]:
print(f"The total number of calories you would consume if you ate all 60 pieces of candy is {candy_df['calories_per_serving'].sum()}.")

The total number of calories you would consume if you ate all 60 pieces of candy is 3416.


In [67]:
candy_df['fat_per_serving'].sum()

172.19999999999993

#### Just those with nuts

In [68]:
candy_df[candy_df['nuts']==True]

Unnamed: 0,brand,colors,nuts,calories_per_serving,fat_per_serving,description,satisfacton_score,owner,group,id,weight
17,Almond Joy,Blue,True,85,4.5,Candy Bar,2,The Hershey Company,A,18,17.0
18,Almond Joy,Blue,True,85,4.5,Candy Bar,1,The Hershey Company,A,19,17.0
19,Almond Joy,Blue,True,85,4.5,Candy Bar,8,The Hershey Company,A,20,17.0
20,Almond Joy,Blue,True,85,4.5,Candy Bar,6,The Hershey Company,A,21,17.0
21,Almond Joy,Blue,True,85,4.5,Candy Bar,2,The Hershey Company,A,22,17.0
22,Almond Joy,Blue,True,85,4.5,Candy Bar,7,The Hershey Company,A,23,17.0
23,Almond Joy,Blue,True,85,4.5,Candy Bar,6,The Hershey Company,A,24,17.0
24,Reese's,Orange,True,80,4.5,Peanut Butter Cup,4,The Hershey Company,A,25,15.5
25,Reese's,Orange,True,80,4.5,Peanut Butter Cup,1,The Hershey Company,A,26,15.5
26,Reese's,Orange,True,80,4.5,Peanut Butter Cup,2,The Hershey Company,A,27,15.5


#### Just those with a user rating above 9

In [69]:
candy_df[candy_df['satisfacton_score'] > 9]

Unnamed: 0,brand,colors,nuts,calories_per_serving,fat_per_serving,description,satisfacton_score,owner,group,id,weight
27,Reese's,Orange,True,80,4.5,Peanut Butter Cup,10,The Hershey Company,A,28,15.5
50,Txix,Gold,False,50,2.33,Cookie Bar,10,Mars Wrigley,B,51,10.0
59,3 Musketeers,Silver,False,26,0.7,Candy Bar,10,Mars Wrigley,B,60,11.3


---

#### How many of each brand? 

In [70]:
candy_df.groupby(['brand'])['id'].count()

brand
3 Musketeers     7
Almond Joy       7
Hershey's        9
KitKat           8
Reese's          6
Snickers        13
Txix            10
Name: id, dtype: int64

In [71]:
candy_df.groupby(['brand']).agg({'id':'count','satisfacton_score':'mean'}).reset_index()

Unnamed: 0,brand,id,satisfacton_score
0,3 Musketeers,7,5.714286
1,Almond Joy,7,4.571429
2,Hershey's,9,4.888889
3,KitKat,8,5.375
4,Reese's,6,4.666667
5,Snickers,13,5.923077
6,Txix,10,4.8
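
The dictionary form of `.agg()` keeps the original column names, so the count ends up in a column called `id`. One alternative is pandas named aggregation, sketched here on hypothetical toy rows; the output names `pieces` and `avg_score` are illustrative choices, not part of the dataset.

```python
import pandas as pd

# Hypothetical toy rows, not the real candy dataset
toy_df = pd.DataFrame(
    {
        "brand": ["KitKat", "KitKat", "Reese's", "Reese's", "Reese's"],
        "id": [1, 2, 3, 4, 5],
        "satisfacton_score": [4, 8, 10, 1, 4],
    }
)

# Named aggregation: new_column=(source_column, aggregation)
brand_summary = (
    toy_df.groupby("brand")
    .agg(pieces=("id", "count"), avg_score=("satisfacton_score", "mean"))
    .reset_index()
)
print(brand_summary)
```

The result has the columns `brand`, `pieces` and `avg_score`, which reads more clearly than a count stored under `id`.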


#### How many of each color? 

In [72]:
candy_df.groupby(['colors'])['id'].count()

colors
Blue       7
Brown     22
Gold      10
Orange     6
Red        8
Silver     7
Name: id, dtype: int64

#### What's the average user rating for each brand? 

In [58]:
candy_df.groupby(['brand'])['satisfacton_score'].mean()

brand
3 Musketeers    5.714286
Almond Joy      4.571429
Hershey's       4.888889
KitKat          5.375000
Reese's         4.666667
Snickers        5.923077
Txix            4.800000
Name: satisfacton_score, dtype: float64

#### What's the average rating for products with nuts? 
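
One possible approach, sketched on hypothetical toy rows rather than the real data: group on the boolean `nuts` column and take the mean of `satisfacton_score` (the column name is spelled that way in the dataset).

```python
import pandas as pd

# Hypothetical toy rows, not the real candy dataset
toy_df = pd.DataFrame(
    {
        "nuts": [True, True, False, False, False],
        "satisfacton_score": [4, 10, 3, 9, 6],
    }
)

# A boolean column works as a grouping key: one group for True, one for False
score_by_nuts = toy_df.groupby("nuts")["satisfacton_score"].mean()
print(score_by_nuts)
```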

#### Nuts vs No Nuts

In [74]:
candy_df.groupby(['nuts']).agg({'weight':'mean', 'id':'count'}).reset_index()

Unnamed: 0,nuts,weight,id
0,False,12.002941,34
1,True,12.653846,26


In [75]:
candy_df.groupby(['nuts']).agg({'weight':'sum'}).reset_index()

Unnamed: 0,nuts,weight
0,False,408.1
1,True,329.0


#### What's the average rating by color? 
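
A sketch of one way to answer this, again on hypothetical toy rows: group by `colors`, average the score, then sort so the best-rated color comes first.

```python
import pandas as pd

# Hypothetical toy rows, not the real candy dataset
toy_df = pd.DataFrame(
    {
        "colors": ["Brown", "Brown", "Blue", "Gold", "Gold"],
        "satisfacton_score": [3, 9, 2, 10, 4],
    }
)

# Mean score per color, highest first
score_by_color = (
    toy_df.groupby("colors")["satisfacton_score"]
    .mean()
    .sort_values(ascending=False)
)
print(score_by_color)
```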

---

#### What are the average calories, by type? 

#### Fat?
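
Calories and fat can be averaged in one pass by selecting both columns before aggregating. This sketch assumes "type" means the `description` column, which is a guess, and the rows are hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy rows, not the real candy dataset
toy_df = pd.DataFrame(
    {
        "description": ["Milk Chocolate", "Milk Chocolate", "Candy Bar", "Candy Bar"],
        "calories_per_serving": [60, 60, 85, 26],
        "fat_per_serving": [3.5, 3.5, 4.5, 0.7],
    }
)

# Select two numeric columns, then average each within every description
nutrition_by_type = (
    toy_df.groupby("description")[["calories_per_serving", "fat_per_serving"]]
    .mean()
    .reset_index()
)
print(nutrition_by_type)
```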

#### What if you ate all the Snickers? How many calories? 
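
This one needs a filter followed by a sum, not a groupby. The sketch below uses hypothetical calorie values and assumes each row is one piece eaten as one serving.

```python
import pandas as pd

# Hypothetical toy rows; the calorie values are made up for illustration
toy_df = pd.DataFrame(
    {
        "brand": ["Snickers", "Snickers", "Snickers", "KitKat"],
        "calories_per_serving": [80, 80, 80, 70],
    }
)

# Keep only the Snickers rows, then total their calories
is_snickers = toy_df["brand"] == "Snickers"
snickers_calories = toy_df.loc[is_snickers, "calories_per_serving"].sum()
print(f"Eating all the Snickers would be {snickers_calories} calories.")
```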

#### Which owner has the highest average user score? 
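
One way to get at this, sketched on hypothetical toy rows: average the score by `owner`, then use `idxmax()` to pull out the label with the highest mean.

```python
import pandas as pd

# Hypothetical toy rows, not the real candy dataset
toy_df = pd.DataFrame(
    {
        "owner": [
            "The Hershey Company",
            "The Hershey Company",
            "The Hershey Company",
            "Mars Wrigley",
            "Mars Wrigley",
        ],
        "satisfacton_score": [3, 9, 6, 8, 10],
    }
)

# Mean score per owner, then the owner whose mean is largest
avg_by_owner = toy_df.groupby("owner")["satisfacton_score"].mean()
top_owner = avg_by_owner.idxmax()
print(f"{top_owner} has the highest average score: {avg_by_owner.max()}")
```

If two owners tie, `idxmax()` returns only the first one, so sorting the full series with `sort_values()` may be the safer way to inspect the result.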

#### What other questions? 