In [1]:
%load_ext sql
%sql sqlite://

'Connected: @None'

# Groups

An aggregate function computes a single value from the values of some expression (e.g., a column name) across all the rows in a group. By default, all rows that contribute to the result (those that pass the `where` clause filter) belong to one group, so applying an aggregate function produces a single row. However, a `select` statement can also define multiple groups.
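
To make the single-group default concrete, here is a minimal sketch that uses Python's standard `sqlite3` module instead of the notebook's `%%sql` magic. It builds an in-memory copy of the `animals` table introduced below (the declared column types `text`/`integer` are an assumption; the notebook does not declare them) and applies aggregates without any `group by`.

```python
import sqlite3

# In-memory copy of the animals table; the column types are an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("create table animals (kind text, legs integer, weight integer)")
conn.executemany(
    "insert into animals values (?, ?, ?)",
    [("dog", 4, 20), ("cat", 4, 10), ("ferret", 4, 10),
     ("parrot", 2, 6), ("penguin", 2, 10), ("t-rex", 2, 12000)],
)

# No group by: every row belongs to one group, so the result has one row.
all_rows = conn.execute("select max(weight), count(*) from animals").fetchall()

# A where clause removes rows first, but the remaining rows still form one group.
four_legged = conn.execute(
    "select max(weight), count(*) from animals where legs = 4"
).fetchall()

print(all_rows)     # [(12000, 6)]
print(four_legged)  # [(20, 3)]
```

Both queries return exactly one row, whether or not a `where` clause is present.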

## Grouping Rows

Rows in a table can be grouped with a `group by` clause, and aggregate functions are then evaluated separately for each group.

Recall the structure of a `select` statement from previous lectures:

<img src = 'expression.png' width = 600/>

With grouping, the structure looks like the following:

<img src = 'having.png' width = 600/>

The number of groups equals the **number of unique values** of the `group by` expression.
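
A quick way to check this claim is to compare the number of rows produced by `group by` with `count(distinct ...)` on the same expression. The sketch below assumes Python's `sqlite3` module and the same in-memory `animals` table as in the lecture (column types assumed); the column names are interpolated with an f-string only because they are fixed constants here.

```python
import sqlite3

# In-memory copy of the animals table; the column types are an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("create table animals (kind text, legs integer, weight integer)")
conn.executemany(
    "insert into animals values (?, ?, ?)",
    [("dog", 4, 20), ("cat", 4, 10), ("ferret", 4, 10),
     ("parrot", 2, 6), ("penguin", 2, 10), ("t-rex", 2, 12000)],
)

counts = {}
for column in ("legs", "weight"):  # trusted constant names, safe to interpolate
    groups = conn.execute(f"select {column} from animals group by {column}").fetchall()
    distinct = conn.execute(f"select count(distinct {column}) from animals").fetchone()[0]
    # (number of groups, number of distinct values)
    counts[column] = (len(groups), distinct)

print(counts)  # {'legs': (2, 2), 'weight': (4, 4)}
```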

Now recall the `animals` table from the previous video:

In [2]:
%%sql
create table animals as
    select 'dog' as kind, 4 as legs, 20 as weight union
    select 'cat', 4, 10 union
    select 'ferret', 4, 10 union
    select 'parrot', 2, 6 union
    select 'penguin', 2, 10 union
    select 't-rex', 2, 12000;

In [4]:
%%sql
select * from animals;

kind,legs,weight
cat,4,10
dog,4,20
ferret,4,10
parrot,2,6
penguin,2,10
t-rex,2,12000


The following example groups the rows of `animals` by their unique `legs` values:

In [6]:
%%sql
select legs from animals group by legs;

legs
2
4


We can extend the `select` statement above so that it also finds the maximum weight in each `legs` group:

In [5]:
%%sql
select legs, max(weight) from animals group by legs;

legs,max(weight)
2,12000
4,20
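
Conceptually, `group by legs` partitions the rows by their `legs` value and then evaluates `max(weight)` once per partition. The sketch below (assuming Python's `sqlite3` module and an in-memory `animals` table with assumed column types) performs that grouping by hand with a dictionary and checks that it agrees with the SQL result. An `order by` is added to the query because SQL does not otherwise promise any particular row order.

```python
import sqlite3
from collections import defaultdict

# In-memory copy of the animals table; the column types are an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("create table animals (kind text, legs integer, weight integer)")
conn.executemany(
    "insert into animals values (?, ?, ?)",
    [("dog", 4, 20), ("cat", 4, 10), ("ferret", 4, 10),
     ("parrot", 2, 6), ("penguin", 2, 10), ("t-rex", 2, 12000)],
)

# Partition the rows by legs, collecting each group's weights.
weights_by_legs = defaultdict(list)
for kind, legs, weight in conn.execute("select kind, legs, weight from animals"):
    weights_by_legs[legs].append(weight)

# One output row per group: the group key and the aggregate over that group.
manual = sorted((legs, max(weights)) for legs, weights in weights_by_legs.items())

from_sql = conn.execute(
    "select legs, max(weight) from animals group by legs order by legs"
).fetchall()

print(manual)              # [(2, 12000), (4, 20)]
print(manual == from_sql)  # True
```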


We can also count the number of rows in each `legs` group:

In [7]:
%%sql
select legs, count(*) from animals group by legs;

legs,count(*)
2,3
4,3
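
Several aggregate functions can appear in the same `select`; each one is evaluated separately for every group. A minimal sketch, again assuming Python's `sqlite3` module and an in-memory `animals` table with assumed column types:

```python
import sqlite3

# In-memory copy of the animals table; the column types are an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("create table animals (kind text, legs integer, weight integer)")
conn.executemany(
    "insert into animals values (?, ?, ?)",
    [("dog", 4, 20), ("cat", 4, 10), ("ferret", 4, 10),
     ("parrot", 2, 6), ("penguin", 2, 10), ("t-rex", 2, 12000)],
)

# count, min, max and sum are each computed once per legs group.
summary = conn.execute(
    "select legs, count(*), min(weight), max(weight), sum(weight) "
    "from animals group by legs order by legs"
).fetchall()

for row in summary:
    print(row)
# (2, 3, 6, 12000, 12016)
# (4, 3, 10, 20, 40)
```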


It is also possible to group by more than one column:

In [8]:
%%sql
select legs, weight from animals group by legs, weight;

legs,weight
2,6
2,10
2,12000
4,10
4,20


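The result contains every unique (`legs`, `weight`) pair. There are only five rows for six animals because `cat` and `ferret` (both 4 legs, weight 10) fall into the same group.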
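
Adding `count(*)` makes the size of each (`legs`, `weight`) group visible. This is a sketch assuming Python's `sqlite3` module and an in-memory `animals` table with assumed column types:

```python
import sqlite3

# In-memory copy of the animals table; the column types are an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("create table animals (kind text, legs integer, weight integer)")
conn.executemany(
    "insert into animals values (?, ?, ?)",
    [("dog", 4, 20), ("cat", 4, 10), ("ferret", 4, 10),
     ("parrot", 2, 6), ("penguin", 2, 10), ("t-rex", 2, 12000)],
)

# One group per distinct (legs, weight) combination.
pairs = conn.execute(
    "select legs, weight, count(*) from animals "
    "group by legs, weight order by legs, weight"
).fetchall()

for row in pairs:
    print(row)
# (2, 6, 1)
# (2, 10, 1)
# (2, 12000, 1)
# (4, 10, 2)   <- cat and ferret
# (4, 20, 1)
```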

The expression used in `group by` does not have to be a plain column name; it can also be the result of an operation:

In [10]:
%%sql
select max(kind), weight/legs from animals group by weight/legs;

max(kind),weight/legs
ferret,2
parrot,3
penguin,5
t-rex,6000


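Because `weight` and `legs` are both integers, SQLite performs integer division here and discards the fractional part (e.g., `cat` gives 10/4 = 2, not 2.5). This behavior is not universal across SQL databases, so check the documentation of the system you use.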

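`cat` does not appear in the result because `cat` and `ferret` share the same ratio (10/4 = 2), and `max(kind)` returns the alphabetically greatest name in that group, which is `ferret`. Likewise, `dog` (20/4 = 5) and `penguin` (10/2 = 5) fall into the same group, so only `penguin` appears.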
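
The sketch below (assuming Python's `sqlite3` module and an in-memory `animals` table with assumed `integer` column types) shows both effects: integer versus real division, where multiplying by `1.0` is one common way to force real division, and which animals end up sharing a `weight / legs` group, using `min(kind)` and `max(kind)` side by side.

```python
import sqlite3

# In-memory copy of the animals table; the column types are an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("create table animals (kind text, legs integer, weight integer)")
conn.executemany(
    "insert into animals values (?, ?, ?)",
    [("dog", 4, 20), ("cat", 4, 10), ("ferret", 4, 10),
     ("parrot", 2, 6), ("penguin", 2, 10), ("t-rex", 2, 12000)],
)

# integer / integer truncates; multiplying by 1.0 first gives a real result.
ratios = conn.execute(
    "select kind, weight / legs, weight * 1.0 / legs from animals order by kind"
).fetchall()

# Show the alphabetically smallest and largest kind in each ratio group.
groups = conn.execute(
    "select weight / legs, min(kind), max(kind), count(*) from animals "
    "group by weight / legs order by weight / legs"
).fetchall()

print(ratios[0])  # ('cat', 2, 2.5)
for row in groups:
    print(row)
# (2, 'cat', 'ferret', 2)
# (3, 'parrot', 'parrot', 1)
# (5, 'dog', 'penguin', 2)
# (6000, 't-rex', 't-rex', 1)
```

Only the `max(kind)` column of `groups` corresponds to the notebook query above; `min(kind)` reveals the name that `max(kind)` hid.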

## Selecting Groups

When we group rows, we sometimes want to filter the result as well. The `having` clause filters the set of groups that are aggregated. It works like `where`, except that `where` filters individual rows before they are grouped, while `having` filters whole groups afterwards, so its condition can use aggregate functions such as `count(*)`. Below, we group by `weight/legs` as in the previous example and keep only the groups that contain more than one row:

In [11]:
%%sql
select weight/legs, count(*) from animals group by weight/legs having count(*)>1;

weight/legs,count(*)
2,2
5,2
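
To contrast `where` and `having` directly, here is a minimal sketch assuming Python's `sqlite3` module and an in-memory `animals` table with assumed column types. The threshold of 1000 is an arbitrary illustrative value: filtering rows with `where` removes only the `t-rex` before grouping, while filtering groups with `having` removes the entire 2-legged group because that group contains the `t-rex`.

```python
import sqlite3

# In-memory copy of the animals table; the column types are an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("create table animals (kind text, legs integer, weight integer)")
conn.executemany(
    "insert into animals values (?, ?, ?)",
    [("dog", 4, 20), ("cat", 4, 10), ("ferret", 4, 10),
     ("parrot", 2, 6), ("penguin", 2, 10), ("t-rex", 2, 12000)],
)

# The notebook's query, with an explicit order by.
shared_ratios = conn.execute(
    "select weight / legs, count(*) from animals "
    "group by weight / legs having count(*) > 1 order by weight / legs"
).fetchall()

# where: drop individual rows first, then group what is left.
where_first = conn.execute(
    "select legs, count(*) from animals where weight < 1000 "
    "group by legs order by legs"
).fetchall()

# having: group everything first, then drop whole groups.
having_after = conn.execute(
    "select legs, count(*) from animals group by legs "
    "having max(weight) < 1000 order by legs"
).fetchall()

print(shared_ratios)  # [(2, 2), (5, 2)]
print(where_first)    # [(2, 2), (4, 3)]
print(having_after)   # [(4, 3)]
```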
