# SQL Tutorial 2: Aggregation

-   Reconnect to the penguins database and remind ourselves of the table structure

In [1]:
%load_ext sql
%config SqlMagic.autolimit = 0
%config SqlMagic.displaylimit = 0
%sql sqlite:///data/penguins.db

In [2]:
%%sql
select *
from penguins
limit 5;

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex
Adelie,Torgersen,39.1,18.7,181.0,3750.0,MALE
Adelie,Torgersen,39.5,17.4,186.0,3800.0,FEMALE
Adelie,Torgersen,40.3,18.0,195.0,3250.0,FEMALE
Adelie,Torgersen,,,,,
Adelie,Torgersen,36.7,19.3,193.0,3450.0,FEMALE


-   How heavy are all the penguins?

In [3]:
%%sql
select sum(body_mass_g)
from penguins;

sum(body_mass_g)
1437000


-   Use `as` to give calculated columns a meaningful name

In [4]:
%%sql
select sum(body_mass_g) as total_mass_g
from penguins;

total_mass_g
1437000


> Arithmetic on `null` produces `null`, so we might expect the total to be `null` because some of the masses are missing,
> but SQL *aggregation functions* skip nulls, which is almost always what we want
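
The difference between ordinary arithmetic and aggregation can be checked with a small sketch. It uses Python's built-in `sqlite3` module and a made-up in-memory table called `toy` (not the penguins database), so the names and values here are illustrative assumptions only.

```python
import sqlite3

# Hypothetical three-row table; one mass is missing (null).
conn = sqlite3.connect(":memory:")
conn.execute("create table toy (name text, mass_g real)")
conn.executemany(
    "insert into toy values (?, ?)",
    [("a", 3000.0), ("b", None), ("c", 4000.0)],
)

# Ordinary arithmetic propagates null: 3000 + null is null (None in Python).
plus_null = conn.execute("select 3000.0 + null").fetchone()[0]

# Aggregation functions skip the null row instead of returning null.
total, average, n_masses = conn.execute(
    "select sum(mass_g), avg(mass_g), count(mass_g) from toy"
).fetchone()

print(plus_null)                 # None
print(total, average, n_masses)  # 7000.0 3500.0 2
```

Note that `avg` divides by the number of non-null values (2), not by the number of rows (3).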

-   Calculate many aggregations at once

In [5]:
%%sql
select
    max(bill_length_mm) as longest_bill,
    min(flipper_length_mm) as shortest_flipper,
    avg(bill_length_mm) / avg(bill_depth_mm) as weird_ratio
from penguins;

longest_bill,shortest_flipper,weird_ratio
59.6,172,2.5608708253064427


-   Often want to calculate aggregations by group
-   Add a `group by` clause to the query

In [6]:
%%sql
select
    avg(body_mass_g) as average_mass_g
from penguins
group by sex;

average_mass_g
4005.555555555556
3862.272727272728
4545.684523809524


-   Why are there three results?
-   Which one is which?

In [7]:
%%sql
select
    sex,
    avg(body_mass_g) as average_mass_g
from penguins
group by sex;

sex,average_mass_g
,4005.555555555556
FEMALE,3862.272727272728
MALE,4545.684523809524


-   Put conditions on aggregations using `having`
    -   `where` filters individual rows before they are grouped, so in standard SQL it cannot test an aggregate; `having` filters the groups after aggregation

In [8]:
%%sql
select
    sex,
    avg(body_mass_g) as average_mass_g
from penguins
group by sex
having average_mass_g > 4000.0;

sex,average_mass_g
,4005.555555555556
MALE,4545.684523809524
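
To see how filtering rows differs from filtering groups, here is a minimal sketch using Python's built-in `sqlite3` module. The `toy` table and its four rows are invented for illustration, and the last step assumes the behavior of the SQLite library bundled with Python.

```python
import sqlite3

# Hypothetical data: two groups of two rows each.
conn = sqlite3.connect(":memory:")
conn.execute("create table toy (sex text, mass_g real)")
conn.executemany(
    "insert into toy values (?, ?)",
    [("FEMALE", 3000.0), ("FEMALE", 5000.0), ("MALE", 4200.0), ("MALE", 4400.0)],
)

# where: drop individual rows first, then average whatever is left.
where_rows = conn.execute(
    "select sex, avg(mass_g) as average_mass_g from toy "
    "where mass_g > 4000.0 group by sex order by sex"
).fetchall()

# having: average every row, then drop whole groups.
having_rows = conn.execute(
    "select sex, avg(mass_g) as average_mass_g from toy "
    "group by sex having average_mass_g > 4000.0 order by sex"
).fetchall()

# Testing an aggregate inside where is rejected by SQLite.
try:
    conn.execute("select sex from toy where avg(mass_g) > 4000.0 group by sex")
    where_aggregate_ok = True
except sqlite3.OperationalError:
    where_aggregate_ok = False

print(where_rows)          # [('FEMALE', 5000.0), ('MALE', 4300.0)]
print(having_rows)         # [('MALE', 4300.0)]
print(where_aggregate_ok)  # False
```

The `FEMALE` group survives the `where` version because one of its rows passes the test, but it is removed by `having` because its average is exactly 4000.0.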


-   If the query selects a column that is neither aggregated nor listed in `group by`, SQLite returns that column's value from an arbitrary row in each group (many other databases reject such a query)

In [9]:
%%sql
select sex, island
from penguins
group by sex;

sex,island
,Torgersen
FEMALE,Torgersen
MALE,Torgersen
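
One way to avoid depending on an arbitrary choice is to say explicitly which value is wanted. The sketch below uses Python's built-in `sqlite3` module with an invented `toy` table; picking `min(island)` and `count(distinct island)` is just one possible way to summarize the column, not something the original query requires.

```python
import sqlite3

# Hypothetical data: each sex appears on two different islands.
conn = sqlite3.connect(":memory:")
conn.execute("create table toy (sex text, island text)")
conn.executemany(
    "insert into toy values (?, ?)",
    [("FEMALE", "Biscoe"), ("FEMALE", "Dream"), ("MALE", "Dream"), ("MALE", "Torgersen")],
)

# Bare column: SQLite returns the island from some row of each group,
# but which row is not something to rely on.
arbitrary = conn.execute(
    "select sex, island from toy group by sex order by sex"
).fetchall()

# Explicit aggregates state exactly which summary of the group we want.
explicit = conn.execute(
    "select sex, min(island), count(distinct island) "
    "from toy group by sex order by sex"
).fetchall()

print(arbitrary)
print(explicit)  # [('FEMALE', 'Biscoe', 2), ('MALE', 'Dream', 2)]
```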


-   Use `count` to count the non-null values in a group

In [10]:
%%sql
select
    sex, sum(body_mass_g) / count(body_mass_g) as avg_mass_g
from penguins
group by sex;

sex,avg_mass_g
,4005
FEMALE,3862
MALE,4545


-   It is very common to use `count(*)`, which counts rows rather than one column's values, but it includes rows where `body_mass_g` is null, which is why the first average below is lower

In [11]:
%%sql
select
    sex, sum(body_mass_g) / count(*) as avg_mass_g
from penguins
group by sex;

sex,avg_mass_g
,3277
FEMALE,3862
MALE,4545
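
The gap between the two counts can be reproduced on a tiny example. This sketch uses Python's built-in `sqlite3` module and an invented `toy` table in which one row has neither a sex nor a mass; the column aliases are illustrative names, not part of the tutorial's database.

```python
import sqlite3

# Hypothetical data: the last row is missing both sex and mass.
conn = sqlite3.connect(":memory:")
conn.execute("create table toy (sex text, mass_g real)")
conn.executemany(
    "insert into toy values (?, ?)",
    [("MALE", 4000.0), ("MALE", 5000.0), (None, 3000.0), (None, None)],
)

rows = conn.execute(
    "select sex, "
    "count(*) as n_rows, "           # every row in the group
    "count(mass_g) as n_masses, "    # only rows with a non-null mass
    "sum(mass_g) / count(mass_g) as mean_known, "
    "sum(mass_g) / count(*) as mean_all_rows "
    "from toy group by sex order by sex"
).fetchall()

for row in rows:
    print(row)
# (None, 2, 1, 3000.0, 1500.0)
# ('MALE', 2, 2, 4500.0, 4500.0)
```

Dividing by `count(*)` treats the missing mass as if it were zero, so the two means only agree for groups with no nulls.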


-   And yes, you can group by multiple columns
    -   But grouping by integer or real fields doesn't usually make sense, because nearly every distinct value ends up in its own group

In [12]:
%%sql
select
    sex, island, avg(body_mass_g)
from penguins
group by sex, island
order by island asc, sex asc;

sex,island,avg(body_mass_g)
,Biscoe,4587.5
FEMALE,Biscoe,4319.375
MALE,Biscoe,5104.518072289156
,Dream,2975.0
FEMALE,Dream,3446.311475409836
MALE,Dream,3987.096774193548
,Torgersen,3681.25
FEMALE,Torgersen,3395.833333333333
MALE,Torgersen,4034.782608695652
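
A short sketch of why grouping by a measured number is rarely useful, using Python's built-in `sqlite3` module and an invented four-row `toy` table (the values are made up for illustration):

```python
import sqlite3

# Hypothetical data: two islands, four distinct masses.
conn = sqlite3.connect(":memory:")
conn.execute("create table toy (island text, mass_g real)")
conn.executemany(
    "insert into toy values (?, ?)",
    [("Biscoe", 3700.0), ("Biscoe", 3750.0), ("Dream", 3800.0), ("Dream", 3825.0)],
)

# Grouping by a categorical column gives a few meaningful groups.
by_island = conn.execute(
    "select island, count(*) from toy group by island order by island"
).fetchall()

# Grouping by a measured value gives one group per distinct value.
by_mass = conn.execute(
    "select mass_g, count(*) from toy group by mass_g order by mass_g"
).fetchall()

print(by_island)  # [('Biscoe', 2), ('Dream', 2)]
print(by_mass)    # [(3700.0, 1), (3750.0, 1), (3800.0, 1), (3825.0, 1)]
```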
