Group By Aggregation in SQL
-----

In [None]:
%load_ext sql
%sql sqlite:///world-db

Let's see how to use `GROUP BY`. The following SQL query computes the number of countries in every continent.

In [None]:
%%sql
SELECT Continent, COUNT(*) 
FROM Country 
GROUP BY Continent;
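To see what `GROUP BY` does without the notebook's `world-db` file, here is a minimal sketch using Python's built-in `sqlite3` module on a tiny in-memory table. The table mimics only the `Code`, `Name` and `Continent` columns of `Country`, and its rows are made-up toy data, not the real database contents. An `ORDER BY Continent` is added only so that the output order is deterministic.

```python
import sqlite3

# In-memory database with a toy subset of the Country table (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country (Code TEXT PRIMARY KEY, Name TEXT, Continent TEXT)")
conn.executemany(
    "INSERT INTO Country VALUES (?, ?, ?)",
    [
        ("FRA", "France", "Europe"),
        ("DEU", "Germany", "Europe"),
        ("JPN", "Japan", "Asia"),
        ("BRA", "Brazil", "South America"),
        ("ARG", "Argentina", "South America"),
        ("ITA", "Italy", "Europe"),
    ],
)

# GROUP BY puts rows with the same Continent into one group;
# COUNT(*) is then evaluated once per group, giving one output row per continent.
rows = conn.execute(
    "SELECT Continent, COUNT(*) FROM Country GROUP BY Continent ORDER BY Continent"
).fetchall()
print(rows)  # [('Asia', 1), ('Europe', 3), ('South America', 2)]
```

Each distinct `Continent` value yields exactly one output row, however many input rows it had.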

We can combine `GROUP BY` with `ORDER BY` as well. The following SQL query counts, for each language, the number of countries in which it is spoken by at least 50% of the population, and lists the languages in decreasing order of that count.

In [None]:
%%sql
SELECT Language, COUNT(CountryCode) AS N
FROM CountryLanguage
WHERE Percentage >= 50
GROUP BY Language
ORDER BY N DESC;
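The order of evaluation matters here: `WHERE` discards individual rows *before* they are grouped, and `ORDER BY` sorts the grouped result, referring to the aggregate through its alias `N`. The sketch below runs the same query with Python's `sqlite3` on a small, hypothetical `CountryLanguage` table (the percentages are invented for illustration). A secondary sort on `Language` is added only to make ties deterministic.

```python
import sqlite3

# Toy subset of CountryLanguage (hypothetical rows, not the real world-db data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CountryLanguage (CountryCode TEXT, Language TEXT, Percentage REAL)")
conn.executemany(
    "INSERT INTO CountryLanguage VALUES (?, ?, ?)",
    [
        ("ESP", "Spanish", 74.4),
        ("MEX", "Spanish", 92.1),
        ("ARG", "Spanish", 96.8),
        ("USA", "Spanish", 7.5),   # removed by WHERE before grouping
        ("USA", "English", 86.2),
        ("GBR", "English", 97.3),
        ("CHE", "German", 63.6),
        ("CHE", "French", 19.2),   # removed by WHERE before grouping
    ],
)

# WHERE filters individual rows first; GROUP BY then groups the surviving rows,
# and ORDER BY can refer to the aggregate through its alias N.
query = """
SELECT Language, COUNT(CountryCode) AS N
FROM CountryLanguage
WHERE Percentage >= 50
GROUP BY Language
ORDER BY N DESC, Language
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Spanish', 3), ('English', 2), ('German', 1)]
```

Note that `French` disappears entirely: its only row fails the `WHERE` test, so no group is ever formed for it.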

The `HAVING` clause lets us express conditions over properties of *groups*, not only of individual tuples: `WHERE` filters rows before they are grouped, while `HAVING` filters the groups produced by `GROUP BY`, so it is normally used together with `GROUP BY` and written right after it. As an example, the following SQL query finds the languages that are spoken by at least 50% of the population in at least 3 different countries.

In [None]:
%%sql
SELECT Language
FROM CountryLanguage
WHERE Percentage >= 50 
GROUP BY Language
HAVING COUNT(CountryCode) > 2;
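The difference between `WHERE` and `HAVING` can be checked on toy data. This sketch (Python `sqlite3`, hypothetical rows) runs the same `HAVING` query and then shows that moving the aggregate condition into `WHERE` is rejected, because `WHERE` is evaluated per row, before any group exists. The code catches the generic `sqlite3.Error` rather than relying on a specific error message.

```python
import sqlite3

# Toy subset of CountryLanguage (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CountryLanguage (CountryCode TEXT, Language TEXT, Percentage REAL)")
conn.executemany(
    "INSERT INTO CountryLanguage VALUES (?, ?, ?)",
    [
        ("ESP", "Spanish", 74.4),
        ("MEX", "Spanish", 92.1),
        ("ARG", "Spanish", 96.8),
        ("USA", "English", 86.2),
        ("GBR", "English", 97.3),
        ("CHE", "German", 63.6),
    ],
)

# WHERE keeps rows with Percentage >= 50; HAVING then keeps only the groups
# (languages) whose row count exceeds 2, i.e. at least 3 countries.
having_query = """
SELECT Language
FROM CountryLanguage
WHERE Percentage >= 50
GROUP BY Language
HAVING COUNT(CountryCode) > 2
"""
languages = [lang for (lang,) in conn.execute(having_query)]
print(languages)  # ['Spanish']

# An aggregate is not allowed in WHERE, because WHERE is evaluated per row,
# before any group exists; SQLite raises an error for this query.
try:
    conn.execute("SELECT Language FROM CountryLanguage WHERE COUNT(CountryCode) > 2")
    where_failed = False
except sqlite3.Error:
    where_failed = True
print(where_failed)  # True
```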

**Exercise #1**: Write a query that outputs, for each country with at least 10 cities, the population of its most populous city.

Let's see how the use of `HAVING` compares with the use of correlated subqueries. Suppose that we want to find the names of the countries that have more than 10 cities with a population of at least 1 million. Here is a nested query that computes that:

In [None]:
%%sql
SELECT C.name
FROM Country C
WHERE (SELECT COUNT(*) 
       FROM City
       WHERE City.CountryCode = C.Code
       AND City.Population >= 1000000) > 10; 
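The inner `SELECT` refers to `C.Code`, so it is *correlated* with the outer query: semantically, it is re-evaluated once for every row of `Country`. The sketch below runs the same query shape with Python's `sqlite3` on hypothetical toy tables, then reproduces its meaning with an explicit nested loop. Because the toy data is tiny, the threshold `MIN_CITIES = 1` stands in for the notebook's `> 10`. The loop describes what the query means, not necessarily how SQLite's planner actually executes it; read naively, it touches every `City` row once per `Country` row.

```python
import sqlite3

# Toy Country and City tables (hypothetical rows, far smaller than world-db).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country (Code TEXT PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE City (Name TEXT, CountryCode TEXT, Population INTEGER)")
conn.executemany("INSERT INTO Country VALUES (?, ?)",
                 [("AAA", "Alphaland"), ("BBB", "Betaland"), ("CCC", "Gammaland")])
conn.executemany(
    "INSERT INTO City VALUES (?, ?, ?)",
    [
        ("A1", "AAA", 2_000_000), ("A2", "AAA", 1_500_000), ("A3", "AAA", 300_000),
        ("B1", "BBB", 1_200_000), ("B2", "BBB", 900_000),
        ("C1", "CCC", 50_000),
    ],
)

MIN_CITIES = 1  # toy threshold standing in for the notebook's "> 10"

# Same shape as the notebook query: the inner SELECT refers to C.Code,
# so it is correlated with the current outer row.
correlated = """
SELECT C.Name
FROM Country C
WHERE (SELECT COUNT(*)
       FROM City
       WHERE City.CountryCode = C.Code
       AND City.Population >= 1000000) > ?
ORDER BY C.Name
"""
sql_result = [name for (name,) in conn.execute(correlated, (MIN_CITIES,))]

# Pure-Python reading of the same semantics: for every outer Country row,
# re-run the inner count over all City rows.
countries = conn.execute("SELECT Code, Name FROM Country").fetchall()
cities = conn.execute("SELECT CountryCode, Population FROM City").fetchall()
loop_result = sorted(
    name
    for code, name in countries
    if sum(1 for cc, pop in cities if cc == code and pop >= 1_000_000) > MIN_CITIES
)

print(sql_result)   # ['Alphaland']
print(loop_result)  # ['Alphaland']
```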

We can measure the execution time of the query by prefixing the `%sql` line magic with IPython's `%time` magic.

In [None]:
%time %sql SELECT C.name FROM Country C WHERE (SELECT COUNT(*) FROM City WHERE City.CountryCode=C.Code AND City.Population >= 1000000) > 10; 
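Outside a notebook, where `%time` is not available, a similar single measurement can be taken with Python's `time.perf_counter()`. This is a minimal sketch on synthetic, hypothetical data: country `i` gets `i % 15` cities of 2,000,000 people plus 5 small ones, so only some countries pass the "more than 10" test. One run is a noisy measurement; repeating it (or using IPython's `%timeit` in the notebook) gives a steadier figure.

```python
import sqlite3
import time

# Synthetic data (hypothetical): country i gets (i % 15) cities of 2,000,000 people
# plus 5 small cities, so only some countries exceed the 10-city threshold.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country (Code TEXT PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE City (CountryCode TEXT, Population INTEGER)")
for i in range(200):
    code = f"C{i:03d}"
    conn.execute("INSERT INTO Country VALUES (?, ?)", (code, f"Country {i}"))
    big = [(code, 2_000_000)] * (i % 15)
    small = [(code, 10_000)] * 5
    conn.executemany("INSERT INTO City VALUES (?, ?)", big + small)

query = """
SELECT C.Name FROM Country C
WHERE (SELECT COUNT(*) FROM City
       WHERE City.CountryCode = C.Code AND City.Population >= 1000000) > 10
"""

# time.perf_counter() is a monotonic high-resolution clock, suited to
# measuring short intervals; fetchall() forces the query to run to completion.
start = time.perf_counter()
names = conn.execute(query).fetchall()
elapsed = time.perf_counter() - start

print(len(names))                           # 52
print(f"elapsed: {elapsed * 1000:.2f} ms")  # varies by machine
```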

**Exercise #2**: Rewrite the above query using `GROUP BY` and `HAVING` instead of a correlated subquery, and time its execution. How much faster does it run?