---
author: Krtin Juneja (KJUNEJA@falcon.bentley.edu)
---

The solution below uses an example dataset about the tooth growth of 60 guinea pigs, 10 at each combination of three Vitamin C dosage levels (in mg) and two delivery methods (orange juice vs. ascorbic acid).  (See how to quickly load some sample data.)

In [None]:
from rdatasets import data
df = data('ToothGrowth')

To obtain the descriptive statistics of the quantitative column ("len" for length of teeth) based on the treatment levels ("supp"), we can combine the `groupby` and `describe` functions.

In [None]:
df.groupby('supp')['len'].describe()

supp,count,mean,std,min,25%,50%,75%,max
OJ,30.0,20.663333,6.605561,8.2,15.525,22.7,25.725,30.9
VC,30.0,16.963333,8.266029,4.2,11.2,16.5,23.1,33.9


To choose which statistics you want to see, you could use the `agg` function and list the statistics you want.

In [None]:
df.groupby('supp')['len'].agg(['min','median','mean','max','std','count'])

supp,min,median,mean,max,std,count
OJ,8.2,22.7,20.663333,30.9,6.605561,30
VC,4.2,16.5,16.963333,33.9,8.266029,30
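
If you also want to control the output column names, or mix in a statistic of your own, `agg` accepts keyword arguments (pandas calls this named aggregation). The following is a minimal sketch on a small made-up table, not the real ToothGrowth data; the names `toy`, `shortest`, `typical`, and `spread` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Small made-up stand-in with the same column names as above (not the real data).
toy = pd.DataFrame({
    'supp': ['OJ', 'OJ', 'OJ', 'VC', 'VC', 'VC'],
    'len':  [10.0, 20.0, 30.0, 5.0, 15.0, 40.0],
})

# Named aggregation: each keyword becomes an output column,
# and its value is the statistic to compute for that column.
summary = toy.groupby('supp')['len'].agg(
    shortest='min',
    typical='median',
    spread=lambda x: x.max() - x.min(),  # custom statistic: the range
)
print(summary)
```

Built-in statistics are passed by name as strings, while a custom statistic can be any function that takes the group's values and returns a single number.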


If your focus is on just one statistic, you can often call it by name in place of `agg`, as shown below using the `quantile` function.

In [None]:
df.groupby('supp')['len'].quantile([0.25, 0.5, 0.75])  # quartiles; with no argument, quantile() returns the median (0.5)

supp      
OJ    0.25    15.525
      0.50    22.700
      0.75    25.725
VC    0.25    11.200
      0.50    16.500
      0.75    23.100
Name: len, dtype: float64
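
The same shortcut works for other common statistics. As a quick check, the sketch below compares calling `mean` directly with passing its name to `agg`. It uses a small made-up table rather than the real data, and the names `toy`, `by_name`, and `via_agg` are hypothetical.

```python
import pandas as pd

# Small made-up stand-in with the same column names as above (not the real data).
toy = pd.DataFrame({
    'supp': ['OJ', 'OJ', 'OJ', 'VC', 'VC', 'VC'],
    'len':  [10.0, 20.0, 30.0, 5.0, 15.0, 25.0],
})

# Calling the statistic by name...
by_name = toy.groupby('supp')['len'].mean()

# ...gives the same per-group result as passing its name to agg.
via_agg = toy.groupby('supp')['len'].agg('mean')

print(by_name)
print(by_name.equals(via_agg))
```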

In this example, we grouped by just one category ("supp"), but the `groupby` function also accepts a list of columns if you need to break the groups down into subcategories.
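
As a rough sketch of grouping by more than one column, the example below groups a small made-up table by both delivery method and dosage. It assumes the dosage column is named "dose"; the values are invented for illustration and are not the real ToothGrowth measurements, and `toy` and `by_both` are hypothetical names.

```python
import pandas as pd

# Small made-up table; 'dose' is the assumed name of the dosage column.
toy = pd.DataFrame({
    'supp': ['OJ', 'OJ', 'OJ', 'OJ', 'VC', 'VC', 'VC', 'VC'],
    'dose': [0.5, 0.5, 1.0, 1.0, 0.5, 0.5, 1.0, 1.0],
    'len':  [10.0, 14.0, 20.0, 24.0, 6.0, 8.0, 15.0, 19.0],
})

# Passing a list of columns creates one group per (supp, dose) combination.
by_both = toy.groupby(['supp', 'dose'])['len'].agg(['mean', 'count'])
print(by_both)

# Individual subgroups are looked up with a tuple of the group keys.
print(by_both.loc[('OJ', 0.5), 'mean'])
```

The result is indexed by both columns, so each row corresponds to one delivery method at one dosage level.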