### Group By

---

`groupby` groups the rows of a DataFrame by the values of one or more columns. The remaining columns can then be analyzed group by group, for example by counting, averaging, or summing them.
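
As a minimal sketch of what grouping does, the snippet below compares a manual split, apply, and combine loop with the equivalent one-line `groupby` call. The toy DataFrame is hypothetical and is not the bestsellers file used in the rest of this notebook.

```python
import pandas as pd

# Hypothetical toy data, not the bestsellers CSV used in this notebook
df = pd.DataFrame({
    'Author': ['A', 'B', 'A', 'B', 'C'],
    'Reviews': [10, 20, 30, 40, 50],
})

# Split the rows by Author, apply sum() to each group, combine into a dict
manual = {author: group['Reviews'].sum() for author, group in df.groupby('Author')}

# The same result in one groupby expression
grouped = df.groupby('Author')['Reviews'].sum()
print(grouped)
```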

In [1]:
import pandas as pd
import numpy as np

In [2]:
df_books = pd.read_csv('/content/bestsellers-with-categories.csv', sep=',', header=0)
df_books

,Name,Author,User Rating,Reviews,Price,Year,Genre
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016,Non Fiction
1,11/22/63: A Novel,Stephen King,4.6,2052,22,2011,Fiction
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,15,2018,Non Fiction
3,1984 (Signet Classics),George Orwell,4.7,21424,6,2017,Fiction
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,12,2019,Non Fiction
...,...,...,...,...,...,...,...
545,Wrecking Ball (Diary of a Wimpy Kid Book 14),Jeff Kinney,4.9,9413,8,2019,Fiction
546,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2016,Non Fiction
547,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2017,Non Fiction
548,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2018,Non Fiction


### Let's start by grouping by Author and showing the count of values in the other columns.

In [3]:
df_books.groupby('Author').count()

Author,Name,User Rating,Reviews,Price,Year,Genre
Abraham Verghese,2,2,2,2,2,2
Adam Gasiewski,1,1,1,1,1,1
Adam Mansbach,1,1,1,1,1,1
Adir Levy,1,1,1,1,1,1
Admiral William H. McRaven,1,1,1,1,1,1
...,...,...,...,...,...,...
Walter Isaacson,3,3,3,3,3,3
William Davis,2,2,2,2,2,2
William P. Young,2,2,2,2,2,2
Wizards RPG Team,3,3,3,3,3,3
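
Note that `count()` counts the non-null values in each column, which is not always the same as the number of rows in a group. The sketch below, on a hypothetical toy DataFrame with one missing price, contrasts it with `size()`, which counts rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing Price for author 'A'
df = pd.DataFrame({
    'Author': ['A', 'A', 'B'],
    'Price': [8.0, np.nan, 12.0],
})

non_null = df.groupby('Author').count()  # non-null values per column
rows = df.groupby('Author').size()       # rows per group, NaN included

print(non_null)
print(rows)
```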


### Group by Author and show the average of the data in the other columns

In [4]:
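# numeric_only=True skips the text columns (Name, Genre); pandas 2.0+ raises a TypeError without it
df_books.groupby('Author').mean(numeric_only=True)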

Author,User Rating,Reviews,Price,Year
Abraham Verghese,4.600000,4866.000000,11.000000,2010.500000
Adam Gasiewski,4.400000,3113.000000,6.000000,2017.000000
Adam Mansbach,4.800000,9568.000000,9.000000,2011.000000
Adir Levy,4.800000,8170.000000,13.000000,2019.000000
Admiral William H. McRaven,4.700000,10199.000000,11.000000,2017.000000
...,...,...,...,...
Walter Isaacson,4.566667,6222.666667,20.333333,2013.333333
William Davis,4.400000,7497.000000,6.000000,2012.500000
William P. Young,4.600000,19720.000000,8.000000,2013.000000
Wizards RPG Team,4.800000,16990.000000,27.000000,2018.000000
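
The output above contains only the numeric columns. In recent pandas versions (2.0 and later, as far as I know) text columns are no longer dropped silently, so it is safer to say explicitly which columns to average. A minimal sketch on hypothetical toy data, showing two equivalent ways:

```python
import pandas as pd

# Hypothetical toy data with one text column (Name) besides the group key
df = pd.DataFrame({
    'Author': ['A', 'A', 'B'],
    'Name': ['x', 'y', 'z'],
    'Price': [8, 12, 20],
    'Year': [2010, 2011, 2019],
})

# Option 1: ask mean() to ignore non-numeric columns
means_all = df.groupby('Author').mean(numeric_only=True)

# Option 2: select the numeric columns before aggregating
means_sel = df.groupby('Author')[['Price', 'Year']].mean()

print(means_all)
```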


### The Author column, in the above cases, becomes the index.

---
*   Because Author is now the index, we can use `loc` to access the row for a specific author.
*   Group by Author and show the sum of the values in the other columns for William Davis.

In [5]:
df_books.groupby('Author').sum(numeric_only=True).loc['William Davis']

User Rating        8.8
Reviews        14994.0
Price             12.0
Year            4025.0
Name: William Davis, dtype: float64

### Group by Author and show the sum of the values in the other columns. Use reset_index() to restore the default integer index, so that Author becomes a regular column again

In [6]:
df_books.groupby('Author').sum(numeric_only=True).reset_index()

,Author,User Rating,Reviews,Price,Year
0,Abraham Verghese,9.2,9732,22,4021
1,Adam Gasiewski,4.4,3113,6,2017
2,Adam Mansbach,4.8,9568,9,2011
3,Adir Levy,4.8,8170,13,2019
4,Admiral William H. McRaven,4.7,10199,11,2017
...,...,...,...,...,...
243,Walter Isaacson,13.7,18668,61,6040
244,William Davis,8.8,14994,12,4025
245,William P. Young,9.2,39440,16,4026
246,Wizards RPG Team,14.4,50970,81,6054
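
An alternative to calling `reset_index()` afterwards is to pass `as_index=False` to `groupby`, which keeps the grouping column as a regular column from the start. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    'Author': ['B', 'A', 'B'],
    'Reviews': [1, 2, 3],
})

# Group, aggregate, then move Author from the index back to a column
with_reset = df.groupby('Author').sum().reset_index()

# Same table, produced directly with a default integer index
no_index = df.groupby('Author', as_index=False).sum()

print(no_index)
```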


### The agg() function allows various functions to be applied to the DataFrame once it has been grouped according to a specific column. Group by Author and show the min and max of the other columns

In [9]:
df_books.groupby('Author').agg(['min','max'])

,Name,Name,User Rating,User Rating,Reviews,Reviews,Price,Price,Year,Year,Genre,Genre
Author,min,max,min,max,min,max,min,max,min,max,min,max
Abraham Verghese,Cutting for Stone,Cutting for Stone,4.6,4.6,4866,4866,11,11,2010,2011,Fiction,Fiction
Adam Gasiewski,Milk and Vine: Inspirational Quotes From Class...,Milk and Vine: Inspirational Quotes From Class...,4.4,4.4,3113,3113,6,6,2017,2017,Non Fiction,Non Fiction
Adam Mansbach,Go the F**k to Sleep,Go the F**k to Sleep,4.8,4.8,9568,9568,9,9,2011,2011,Fiction,Fiction
Adir Levy,What Should Danny Do? (The Power to Choose Ser...,What Should Danny Do? (The Power to Choose Ser...,4.8,4.8,8170,8170,13,13,2019,2019,Fiction,Fiction
Admiral William H. McRaven,Make Your Bed: Little Things That Can Change Y...,Make Your Bed: Little Things That Can Change Y...,4.7,4.7,10199,10199,11,11,2017,2017,Non Fiction,Non Fiction
...,...,...,...,...,...,...,...,...,...,...,...,...
Walter Isaacson,Leonardo da Vinci,Steve Jobs,4.5,4.6,3014,7827,20,21,2011,2017,Non Fiction,Non Fiction
William Davis,"Wheat Belly: Lose the Wheat, Lose the Weight, ...","Wheat Belly: Lose the Wheat, Lose the Weight, ...",4.4,4.4,7497,7497,6,6,2012,2013,Non Fiction,Non Fiction
William P. Young,The Shack: Where Tragedy Confronts Eternity,The Shack: Where Tragedy Confronts Eternity,4.6,4.6,19720,19720,8,8,2009,2017,Fiction,Fiction
Wizards RPG Team,Player's Handbook (Dungeons & Dragons),Player's Handbook (Dungeons & Dragons),4.8,4.8,16990,16990,27,27,2017,2019,Fiction,Fiction


### Group by Author, get the min and max of the 'Reviews' column and sum the values of the 'User Rating' column

In [10]:
df_books.groupby('Author').agg({'Reviews':['min','max'], 'User Rating':'sum'})

,Reviews,Reviews,User Rating
Author,min,max,sum
Abraham Verghese,4866,4866,9.2
Adam Gasiewski,3113,3113,4.4
Adam Mansbach,9568,9568,4.8
Adir Levy,8170,8170,4.8
Admiral William H. McRaven,10199,10199,4.7
...,...,...,...
Walter Isaacson,3014,7827,13.7
William Davis,7497,7497,8.8
William P. Young,19720,19720,9.2
Wizards RPG Team,16990,16990,14.4
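
Passing a dictionary to `agg()` yields two-level column labels, as in the output above. If flat column names are preferred, pandas also supports named aggregation, where each keyword is the output column name and its value is a `(column, function)` pair. The sketch below uses hypothetical toy data, and the output names `reviews_min`, `reviews_max`, and `rating_sum` are my own choice.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    'Author': ['A', 'A', 'B'],
    'Reviews': [100, 300, 50],
    'User Rating': [4.5, 4.0, 4.0],
})

# Named aggregation: output_name=(input_column, aggregation)
summary = df.groupby('Author').agg(
    reviews_min=('Reviews', 'min'),
    reviews_max=('Reviews', 'max'),
    rating_sum=('User Rating', 'sum'),
)

print(summary)
```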


### Group by Author and Year together and count the values of the other columns

In [11]:
df_books.groupby(['Author','Year']).count()

Author,Year,Name,User Rating,Reviews,Price,Genre
Abraham Verghese,2010,1,1,1,1,1
Abraham Verghese,2011,1,1,1,1,1
Adam Gasiewski,2017,1,1,1,1,1
Adam Mansbach,2011,1,1,1,1,1
Adir Levy,2019,1,1,1,1,1
...,...,...,...,...,...,...
Wizards RPG Team,2017,1,1,1,1,1
Wizards RPG Team,2018,1,1,1,1,1
Wizards RPG Team,2019,1,1,1,1,1
Zhi Gang Sha,2009,1,1,1,1,1
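
Grouping by two columns produces a two-level index (Author, Year). The sketch below, on hypothetical toy data, shows how such a result might be queried with `loc` and how one of its columns can be pivoted into a wide table with `unstack`.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    'Author': ['A', 'A', 'A', 'B'],
    'Year': [2010, 2010, 2011, 2019],
    'Name': ['x', 'y', 'z', 'w'],
})

counts = df.groupby(['Author', 'Year']).count()

# A tuple selects one (Author, Year) pair
one_group = counts.loc[('A', 2010), 'Name']

# A single key selects every year for that author
per_author = counts.loc['A']

# Pivot the Year level into columns; missing pairs become 0
wide = counts['Name'].unstack(fill_value=0)

print(wide)
```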
