# Basic Aggregation Functions Practice


    
## Implementation in queries

We will again use the PostgreSQL database to query the data and see how aggregate functions work.

Connect again using the following commands:

```
%load_ext sql
%sql postgres://dsa_ro_user:readonly@pgsql.dsa.lan/dsa_ro
```


### COUNT

`COUNT` returns the number of rows in a table or table expression (for example, the result of a join).

To count rows, use `COUNT(*)` in the SELECT list. `COUNT(column)` instead counts only the rows where that column is not NULL.
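To see the difference between `COUNT(*)` and `COUNT(column)` without the lab server, here is a minimal sketch using Python's built-in `sqlite3` module and a tiny hypothetical `cities` table (the rows are invented for illustration). SQLite and PostgreSQL treat NULLs in `COUNT` the same way, so the behavior carries over.

```python
import sqlite3

# Hypothetical stand-in for the course's `cities` table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, country TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [
        ("Alpha", "Atlantis", 1500000),
        ("Beta", "Atlantis", None),  # population unknown, stored as NULL
        ("Gamma", "Lemuria", 3000000),
    ],
)

# COUNT(*) counts every row that survives the WHERE filter.
total_rows = conn.execute(
    "SELECT COUNT(*) FROM cities WHERE country = 'Atlantis'"
).fetchone()[0]

# COUNT(column) counts only rows where that column is not NULL.
known_populations = conn.execute(
    "SELECT COUNT(population) FROM cities WHERE country = 'Atlantis'"
).fetchone()[0]

print(total_rows, known_populations)  # prints: 2 1
```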

## <span style="background:yellow">Your Turn</span>

Find the number of cities in China.

The number you receive should be 61
  


```SQL
%%sql
SELECT COUNT(*)
FROM cities
WHERE country = 'China';
```




Find the number of cities in Canada with a population greater than 2,000,000.

The number you receive should be 1

```SQL
%%sql
SELECT COUNT(*)
FROM cities
WHERE country = 'Canada'
  AND population > 2000000;
```





### MIN

`MIN` returns the smallest value in a given column across the selected rows.
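A minimal sketch, again assuming Python's `sqlite3` module and invented rows, of two `MIN` details that also hold in PostgreSQL: NULL values are ignored, and when no rows match, the result is NULL rather than 0.

```python
import sqlite3

# Hypothetical stand-in for `cities` with one unknown (NULL) population.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, country TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [
        ("Alpha", "Atlantis", 1500000),
        ("Beta", "Atlantis", 900000),
        ("Gamma", "Atlantis", None),
    ],
)

# MIN ignores NULLs, so the unknown population is not treated as the smallest.
smallest = conn.execute(
    "SELECT MIN(population) FROM cities WHERE country = 'Atlantis'"
).fetchone()[0]

# With no matching rows, MIN returns NULL (None in Python), not 0.
no_match = conn.execute(
    "SELECT MIN(population) FROM cities WHERE country = 'Nowhere'"
).fetchone()[0]

print(smallest, no_match)  # prints: 900000 None
```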

## <span style="background-color:yellow">Your Turn</span>
What if we wanted to find the smallest city population in Japan?

How would we write this?

The number you receive should be 1063100


```SQL
%%sql
SELECT MIN(population)
FROM cities
WHERE country = 'Japan';
```




### MAX

`MAX` returns the largest value in a given column across the selected rows.
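`MAX` gives back only the value. The sketch below (Python `sqlite3`, hypothetical rows) shows one common way to also get the matching city: sort descending and keep one row. Two caveats: if several cities share the maximum, `LIMIT 1` keeps an arbitrary one, and in PostgreSQL `ORDER BY ... DESC` places NULLs first unless you add `NULLS LAST`, so the rows here deliberately contain no NULLs.

```python
import sqlite3

# Hypothetical stand-in for `cities`; no NULL populations here on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, country TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [("Alpha", "Atlantis", 1500000), ("Beta", "Atlantis", 900000)],
)

# MAX returns only the largest value, not the row it came from.
largest = conn.execute(
    "SELECT MAX(population) FROM cities WHERE country = 'Atlantis'"
).fetchone()[0]

# To see which city holds that value, sort descending and keep one row.
name, pop = conn.execute(
    "SELECT name, population FROM cities WHERE country = 'Atlantis' "
    "ORDER BY population DESC LIMIT 1"
).fetchone()

print(largest, name)  # prints: 1500000 Alpha
```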


## <span style="background:yellow">Your Turn</span>

What if we wanted to find the largest city population in Canada?

How would we write this?

The number you receive should be 2600000

```SQL
%%sql
SELECT MAX(population)
FROM cities
WHERE country = 'Canada';
```





### AVG

`AVG` returns the average (arithmetic mean) of a given column across the selected rows.
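A minimal sketch (Python `sqlite3`, invented rows) showing that `AVG` skips NULLs, so it equals `SUM(x) / COUNT(x)` rather than `SUM(x) / COUNT(*)`. One difference from the lab database: SQLite returns a floating-point average, while PostgreSQL returns a `numeric` value for integer input, which is why the expected answer below has so many decimal places.

```python
import sqlite3

# Hypothetical stand-in for `cities` with one NULL population.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, country TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [
        ("Alpha", "Atlantis", 1000000),
        ("Beta", "Atlantis", 2000000),
        ("Gamma", "Atlantis", None),  # skipped by AVG, SUM and COUNT(population)
    ],
)

avg_pop, total, counted, rows = conn.execute(
    "SELECT AVG(population), SUM(population), COUNT(population), COUNT(*) "
    "FROM cities WHERE country = 'Atlantis'"
).fetchone()

# Dividing by COUNT(*) instead treats the NULL row as if its population were 0.
naive = total / rows

print(avg_pop, naive)  # prints: 1500000.0 1000000.0
```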

## <span style="background:yellow">Your Turn</span>

What if we wanted to find the average population of cities in the United States?

How would you write this?

The number you receive should be 2385623.076923076923


```SQL
%%sql
SELECT AVG(population)
FROM cities
WHERE country = 'United States';
```





### SUM

`SUM` returns the total of a given column's values across the selected rows.
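A minimal sketch (Python `sqlite3`, invented countries and numbers) of `SUM` over an `IN` list, plus a pitfall shared with PostgreSQL: summing zero rows yields NULL, and `COALESCE` turns that into 0.

```python
import sqlite3

# Hypothetical stand-in for `cities` with made-up countries and populations.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, country TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [
        ("Alpha", "Atlantis", 100),
        ("Beta", "Lemuria", 250),
        ("Gamma", "Mu", 999),
    ],
)

# SUM adds the column across every row whose country is in the IN list.
two_countries = conn.execute(
    "SELECT SUM(population) FROM cities WHERE country IN ('Atlantis', 'Lemuria')"
).fetchone()[0]

# With no matching rows, SUM returns NULL (None), not 0 ...
empty_sum = conn.execute(
    "SELECT SUM(population) FROM cities WHERE country = 'Nowhere'"
).fetchone()[0]

# ... so wrap it in COALESCE when a zero is wanted instead.
zero_sum = conn.execute(
    "SELECT COALESCE(SUM(population), 0) FROM cities WHERE country = 'Nowhere'"
).fetchone()[0]

print(two_countries, empty_sum, zero_sum)  # prints: 350 None 0
```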


## <span style="background:yellow">Your Turn</span>
What if we wanted the total population of all listed cities in Canada and Mexico?

The number you should receive is 30334100

```SQL
%%sql
SELECT SUM(population)
FROM cities
WHERE country IN ('Canada', 'Mexico');
```








# GROUP BY

`GROUP BY` groups all the records with the same value for the specified grouping field(s) together so that aggregation can process each set separately. 


Think of each **group** as a set of rows from the table; the aggregate function returns one value per group.

Every column in the SELECT list that is not inside an aggregate function must appear in the `GROUP BY` clause.
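To make the "groups as sets of rows" idea concrete, the sketch below (Python `sqlite3` with invented rows) runs a `GROUP BY` query and then reproduces it by hand with a dictionary of buckets. Note that SQLite is more permissive than PostgreSQL about the column rule above: it accepts non-aggregated columns that are missing from `GROUP BY`, whereas PostgreSQL generally rejects them with an error, so test that rule on the lab database.

```python
import sqlite3
from collections import defaultdict

# Hypothetical rows: (name, country, population).
rows = [
    ("Alpha", "Atlantis", 100),
    ("Beta", "Atlantis", 300),
    ("Gamma", "Lemuria", 50),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, country TEXT, population INTEGER)")
conn.executemany("INSERT INTO cities VALUES (?, ?, ?)", rows)

# One output row per distinct country, with AVG computed inside each group.
sql_result = dict(
    conn.execute(
        "SELECT country, AVG(population) FROM cities GROUP BY country"
    ).fetchall()
)

# The same idea by hand: bucket rows by country, then aggregate each bucket.
groups = defaultdict(list)
for _name, country, population in rows:
    groups[country].append(population)
manual = {country: sum(pops) / len(pops) for country, pops in groups.items()}

print(sorted(sql_result.items()))  # prints: [('Atlantis', 200.0), ('Lemuria', 50.0)]
```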

## <span style="background:yellow">Your Turn</span>

Write a SELECT statement to display each country's name and its average city population.



```SQL
%%sql
SELECT country, AVG(population)
FROM cities
GROUP BY country;
```





Write a SELECT statement to display each country's name and the population of its most populous city.

```SQL
%%sql
SELECT country, MAX(population)
FROM cities
GROUP BY country;
```




# HAVING Clause

`HAVING` filters groups after aggregation: only the groups for which the aggregate condition is true appear in the result. Unlike `WHERE`, which filters individual rows before grouping, `HAVING` can refer to aggregate functions.


```SQL
%%sql
SELECT country, COUNT(*)
FROM cities
GROUP BY country
HAVING COUNT(*) > 10;
```

This query lists a country only if it appears in more than 10 rows of `cities`, that is, if the table contains more than 10 of its cities (`COUNT(*) > 10`).
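A minimal sketch (Python `sqlite3`, invented rows) contrasting `WHERE`, which removes rows before grouping, with `HAVING`, which removes whole groups afterwards. The last query mirrors the exercise that follows: keep only countries whose largest city has fewer than 5,000,000 people.

```python
import sqlite3

# Hypothetical stand-in for `cities`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, country TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [
        ("Alpha", "Atlantis", 100),
        ("Beta", "Atlantis", 300),
        ("Gamma", "Atlantis", 6000000),
        ("Delta", "Lemuria", 50),
        ("Epsilon", "Lemuria", 70),
    ],
)

# HAVING filters whole groups after aggregation.
having_count = conn.execute(
    "SELECT country, COUNT(*) FROM cities GROUP BY country HAVING COUNT(*) > 2"
).fetchall()

# WHERE filters individual rows before they are grouped.
where_then_group = conn.execute(
    "SELECT country, COUNT(*) FROM cities WHERE population < 5000000 "
    "GROUP BY country ORDER BY country"
).fetchall()

# Keep only countries whose largest city is under 5,000,000.
having_max = conn.execute(
    "SELECT country FROM cities GROUP BY country HAVING MAX(population) < 5000000"
).fetchall()

print(having_count)      # prints: [('Atlantis', 3)]
print(where_then_group)  # prints: [('Atlantis', 2), ('Lemuria', 2)]
print(having_max)        # prints: [('Lemuria',)]
```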



## <span style="background:yellow">Your Turn</span>

Write a SELECT statement to show each country's name and its average city population, but only for countries whose largest city has fewer than 5,000,000 people.


```SQL
%%sql
SELECT country, AVG(population)
FROM cities
GROUP BY country
HAVING MAX(population) < 5000000;
```





Write a SELECT statement to show each country's name and the population of its smallest city but only for countries with an average city population between 2 and 5 million people

```SQL
%%sql
SELECT country, MIN(population)
FROM cities
GROUP BY country
HAVING AVG(population) BETWEEN 2000000 AND 5000000;
```





# Combining JOIN and GROUPING for aggregates

As foreshadowed, the true power of the relational database comes from combining tables and computing statistics.

Consider the following database tables:
  * us_second_order_divisions
  * util_us_states

```SQL
dsa_ro=> \d us_second_order_divisions
        Table "public.us_second_order_divisions"
       Column       |          Type          | Modifiers 
--------------------+------------------------+-----------
 state_number_code  | smallint               | not null
 county_number_code | character varying(5)   | not null
 county_name        | character varying(100) | 
Indexes:
    "us_second_order_divisions_pkey" PRIMARY KEY, btree (state_number_code, county_number_code)

dsa_ro=> \d util_us_states
             Table "public.util_us_states"
      Column       |         Type          | Modifiers 
-------------------+-----------------------+-----------
 state_alpha_code  | character(2)          | not null
 state_number_code | smallint              | 
 state_name        | character varying(50) | 
Indexes:
    "util_us_states_pkey" PRIMARY KEY, btree (state_alpha_code)
    "util_us_states_state_number_code" btree (state_number_code)
```

Imagine we want a list of the state names and the number of counties per state.
What would the SQL look like?

We will build it up in pieces, to help you develop a methodology of query construction.

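**First**: the counties are listed in `us_second_order_divisions`, so that table can give us a count of counties per state (grouped by `state_number_code`).

**Second**: the state names live in `util_us_states`, so we join the two tables on their shared `state_number_code` column and group by `state_name`.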
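The sketch below builds miniature, hypothetical versions of the two tables in an in-memory SQLite database (Python `sqlite3`; the codes and names are invented) and runs the join-then-group pattern. It also shows a design choice worth knowing: an inner join silently drops states that have no counties, while a `LEFT JOIN` with `COUNT(C.county_number_code)` keeps them with a count of 0. `COUNT(*)` would report 1 for such a state, because the outer join still produces one row for it.

```python
import sqlite3

# Hypothetical miniature versions of the two tables described above.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE util_us_states (
        state_alpha_code CHAR(2) PRIMARY KEY,
        state_number_code SMALLINT,
        state_name VARCHAR(50)
    );
    CREATE TABLE us_second_order_divisions (
        state_number_code SMALLINT NOT NULL,
        county_number_code VARCHAR(5) NOT NULL,
        county_name VARCHAR(100),
        PRIMARY KEY (state_number_code, county_number_code)
    );
    INSERT INTO util_us_states VALUES
        ('AA', 1, 'Alphastate'), ('BB', 2, 'Betastate'), ('CC', 3, 'Gammastate');
    INSERT INTO us_second_order_divisions VALUES
        (1, '001', 'First'), (1, '003', 'Second'), (2, '001', 'Third');
    """
)

# Inner join: a state with no counties (Gammastate) drops out entirely.
inner = conn.execute(
    "SELECT S.state_name, COUNT(*) "
    "FROM us_second_order_divisions AS C "
    "JOIN util_us_states AS S ON C.state_number_code = S.state_number_code "
    "GROUP BY S.state_name ORDER BY S.state_name"
).fetchall()

# LEFT JOIN keeps every state; COUNT(C.county_number_code) skips the NULLs
# the outer join produces, so a county-less state reports 0 rather than 1.
left = conn.execute(
    "SELECT S.state_name, COUNT(C.county_number_code) "
    "FROM util_us_states AS S "
    "LEFT JOIN us_second_order_divisions AS C "
    "ON C.state_number_code = S.state_number_code "
    "GROUP BY S.state_name ORDER BY S.state_name"
).fetchall()

print(inner)  # prints: [('Alphastate', 2), ('Betastate', 1)]
print(left)   # prints: [('Alphastate', 2), ('Betastate', 1), ('Gammastate', 0)]
```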

## <span style="background:yellow">Your Turn</span>
Write a SELECT statement that shows the state name and number of counties for states with fewer than 20 counties.



```SQL
%%sql
SELECT state_name, COUNT(*)
FROM us_second_order_divisions AS C
JOIN util_us_states AS S
    ON (C.state_number_code = S.state_number_code)
GROUP BY S.state_name
HAVING COUNT(*) < 20;
```





Write a SELECT statement that shows the five states with the fewest counties.
Write the code initially in the EXPLAIN cell (`EXPLAIN` shows PostgreSQL's query plan instead of running the query), then copy the SQL without EXPLAIN into the next cell.

```SQL
%%sql
EXPLAIN
SELECT state_name, COUNT(*)
FROM us_second_order_divisions AS C
JOIN util_us_states AS S
    ON (C.state_number_code = S.state_number_code)
GROUP BY S.state_name
ORDER BY COUNT(*) ASC
LIMIT 5;
```





```SQL
%%sql
SELECT state_name, COUNT(*)
FROM us_second_order_divisions AS C
JOIN util_us_states AS S
    ON (C.state_number_code = S.state_number_code)
GROUP BY S.state_name
ORDER BY COUNT(*) ASC
LIMIT 5;
```
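When several groups tie on the sort key and the tie straddles the cutoff, `ORDER BY COUNT(*) ... LIMIT 5` may return any of the tied rows, and the choice can change between runs. Here is a minimal sketch (Python `sqlite3`, with a hypothetical single `counties` table rather than the two lab tables) of adding a tie-breaking sort key so the result is deterministic:

```python
import sqlite3

# Hypothetical county list: Betastate and Deltastate tie with one county each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counties (state_name TEXT, county_name TEXT)")
conn.executemany(
    "INSERT INTO counties VALUES (?, ?)",
    [
        ("Alphastate", "A1"), ("Alphastate", "A2"), ("Alphastate", "A3"),
        ("Betastate", "B1"),
        ("Gammastate", "G1"), ("Gammastate", "G2"),
        ("Deltastate", "D1"),
    ],
)

# Sorting by the count alone leaves the order among tied states unspecified;
# a second sort key (the name) makes LIMIT return the same rows every time.
fewest = conn.execute(
    "SELECT state_name, COUNT(*) FROM counties "
    "GROUP BY state_name "
    "ORDER BY COUNT(*) ASC, state_name ASC "
    "LIMIT 3"
).fetchall()

print(fewest)  # prints: [('Betastate', 1), ('Deltastate', 1), ('Gammastate', 2)]
```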





# Save your Notebook, then `File > Close and Halt`

---