# 4. Subqueries
**In this chapter, you'll learn how to use nested queries and you'll use what you’ve learned in this course to solve three challenge problems.**

In [1]:
%load_ext sql
%sql sqlite://

## Subqueries inside WHERE and SELECT clauses
This chapter focuses on embedding queries inside other queries. These are called nested queries, or subqueries, as you saw in Chapter 3. The most common type of subquery is one inside a `WHERE` clause. Let's look at another example, after a little setup.

### Subquery inside WHERE clause set-up
You've seen many examples of using a subquery inside a `WHERE` clause already with the semi-join and anti-join examples and exercises you just completed. With the `WHERE` clause being the most common place for a subquery to be found, it's important that you see just one more example of doing so. With this being the final chapter, it's time to unveil the remaining fields in the states table. 

Note that the `continent` field is left out of the display so that all the new fields fit. The `fert_rate` field gives an estimate of the average number of babies born per woman in each country. The `women_parli_perc` field gives the percentage of women in the elected federal parliament of each country. Across these 13 countries, how would you determine the average fertility rate?

### Average fert_rate
We will use the average fertility rate as part of a subquery. Recall how to compute it. Across these countries, the average number of babies born per woman is about 2.28.

In [2]:
%sql sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite

In [3]:
%%sql
SELECT AVG(fert_rate)
FROM states;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


AVG(fert_rate)
2.2838461538461536


### Asian countries below average `fert_rate`
Let's use the previous query as a subquery to find the Asian countries that fall below this average. We'll build the code in a couple of steps. First, we select the country name and the fertility rate for Asian countries.

Next, we want to keep only the records where `fert_rate` is smaller than... what comes next?

The subquery that computes the average fertility rate! Now we can check the result to make sure it makes sense.

It appears so. These are the two Asian countries we were looking for, with fertility rates below 2.28 babies per woman.

In [4]:
%%sql
SELECT name, fert_rate
FROM states
WHERE continent = 'Asia'
    AND fert_rate <
        (SELECT AVG(fert_rate)
        FROM states);

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


name,fert_rate
Brunei,1.96
Vietnam,1.7
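
If you don't have `leaders.sqlite` at hand, the same pattern can be tried on a toy table. The following is a minimal sketch using Python's built-in `sqlite3` module; the table contents (`Aland`, `Bland`, and so on) are made up for illustration and are not the course data.

```python
import sqlite3

# Hypothetical toy data; not the contents of leaders.sqlite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (name TEXT, continent TEXT, fert_rate REAL)")
conn.executemany(
    "INSERT INTO states VALUES (?, ?, ?)",
    [
        ("Aland", "Asia", 1.5),
        ("Bland", "Asia", 3.0),
        ("Cland", "Europe", 1.0),
        ("Dland", "Africa", 4.5),
    ],
)

# The inner query returns a single value (the overall average),
# which the outer WHERE clause compares against each row.
below_avg_asia = conn.execute(
    """
    SELECT name, fert_rate
    FROM states
    WHERE continent = 'Asia'
      AND fert_rate < (SELECT AVG(fert_rate) FROM states)
    """
).fetchall()

print(below_avg_asia)
```

With this toy data the overall average is 2.5, so only the Asian row with a fertility rate of 1.5 comes back.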




### Subqueries inside SELECT clauses - setup
The second most common type of subquery is one inside a `SELECT` clause. The task here is to count the number of countries listed in the `states` table for each continent in the `prime_ministers` table. Let's again take a stepwise approach to setting up the problem. What does this code do?

In [5]:
%%sql
SELECT DISTINCT continent
FROM prime_ministers;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


continent
Africa
Europe
Asia
North America
Oceania


It gives each of the five continents in the `prime_ministers` table. Let's keep building our answer in the next step.

### Subquery inside SELECT clause - complete
Next, we determine the number of countries in `states` for each of the continents found in the last step. Combining the `COUNT` function with a `WHERE` clause that matches the `continent` fields in the two tables gets us there.

Let's check out the code and then discuss it a bit further. The subquery on `states` can also reference the `prime_ministers` table from the main query. Any time you write a subquery inside a `SELECT` clause like this, give the subquery an alias, such as `countries_num` here, so that the resulting column has a readable name. Take a moment to review this code carefully. The result of the query follows.

In [6]:
%%sql
SELECT DISTINCT continent,
    (SELECT COUNT(*)
    FROM states
    WHERE prime_ministers.continent = states.continent) AS countries_num
FROM prime_ministers;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


continent,countries_num
Africa,2
Europe,3
Asia,4
North America,1
Oceania,1


It's kinda like magic that this works, huh?! If you haven't discovered it already, there are often many different ways to solve problems with SQL queries. You could use a carefully constructed `JOIN` to achieve this same result, for example.
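
As a rough illustration of that last point, the sketch below runs the `SELECT` subquery and one possible `JOIN` formulation against a small in-memory SQLite database and compares the results. The rows are hypothetical, and the `LEFT JOIN` on a `DISTINCT` list of continents is just one way to write it, chosen so that continents with no matching rows in `states` still show up with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical toy rows, not the contents of leaders.sqlite.
    CREATE TABLE prime_ministers (country TEXT, continent TEXT);
    CREATE TABLE states (name TEXT, continent TEXT);
    INSERT INTO prime_ministers VALUES
        ('P1', 'Asia'), ('P2', 'Asia'), ('P3', 'Europe'), ('P4', 'Oceania');
    INSERT INTO states VALUES
        ('S1', 'Asia'), ('S2', 'Asia'), ('S3', 'Asia'), ('S4', 'Europe');
    """
)

# Correlated subquery in SELECT: evaluated once per outer row.
via_subquery = conn.execute(
    """
    SELECT DISTINCT continent,
        (SELECT COUNT(*)
         FROM states
         WHERE states.continent = prime_ministers.continent) AS countries_num
    FROM prime_ministers
    ORDER BY continent
    """
).fetchall()

# One possible JOIN formulation. COUNT(s.name) ignores the NULL
# produced by the LEFT JOIN when a continent has no states.
via_join = conn.execute(
    """
    SELECT pm.continent, COUNT(s.name) AS countries_num
    FROM (SELECT DISTINCT continent FROM prime_ministers) AS pm
        LEFT JOIN states AS s
            ON s.continent = pm.continent
    GROUP BY pm.continent
    ORDER BY pm.continent
    """
).fetchall()

print(via_subquery)
print(via_join)
```

An `INNER JOIN` would drop the continent that has no rows in `states`, which is why this sketch uses a `LEFT JOIN`.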

## Subquery inside where
You'll now try to figure out which countries had high average life expectancies (at the country level) in 2015.

- Begin by calculating the average life expectancy across all countries for 2015.

In [7]:
%sql sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite

In [8]:
%%sql
SELECT AVG(life_expectancy)
FROM populations
WHERE year = 2015;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


AVG(life_expectancy)
71.67634158659767


- Recall that you can use SQL to do calculations for you. Suppose we wanted only records that were above `1.15 * 100` in terms of life expectancy for 2015:
```sql
SELECT *
  FROM populations
WHERE life_expectancy > 1.15 * 100
  AND year = 2015;
```
Select all fields from `populations` for records with a life expectancy larger than 1.15 times the average you calculated in the first task for 2015. In other words, replace the `100` in the example above with a subquery.

In [9]:
%%sql
SELECT * FROM populations
WHERE life_expectancy >
  1.15 * 
  (SELECT AVG(life_expectancy)
  FROM populations
  WHERE year = 2015)
  AND year = 2015;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


pop_id,country_code,year,fertility_rate,life_expectancy,size
21,AUS,2015,1.833,82.4512195121951,23789752
376,CHE,2015,1.54,83.1975609756098,8281430
356,ESP,2015,1.32,83.3804878048781,46443994
134,FRA,2015,2.01,82.6707317073171,66538391
170,HKG,2015,1.195,84.2780487804878,7305700
174,ISL,2015,1.93,82.8609756097561,330815
190,ITA,2015,1.37,83.490243902439,60730582
194,JPN,2015,1.46,83.8436585365854,126958472
340,SGP,2015,1.24,82.5951219512195,5535002
374,SWE,2015,1.88,82.5512195121951,9799186


## Subquery inside where (2)
Use your knowledge of subqueries in `WHERE` to get the urban area population for only capital cities.

- Make use of the `capital` field in the `countries` table in your subquery.
- Select the city name, country code, and urban area population fields.

In [10]:
%%sql
SELECT name, country_code, urbanarea_pop
FROM cities
WHERE name IN
  (SELECT capital
   FROM countries)
ORDER BY urbanarea_pop DESC
LIMIT 10;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


name,country_code,urbanarea_pop
Beijing,CHN,21516000
Dhaka,BGD,14543124
Tokyo,JPN,13513734
Moscow,RUS,12197596
Cairo,EGY,10230350
Kinshasa,COD,10130000
Jakarta,IDN,10075310
Seoul,KOR,9995784
Mexico City,MEX,8974724
Lima,PER,8852000


## Subquery inside select
In this exercise, you'll see how some queries can be written using either a join or a subquery.

You have seen previously how to use `GROUP BY` with aggregate functions and an inner join to get summarized information from multiple tables.

The code given in the first query selects the top nine countries in terms of number of cities appearing in the `cities` table. Recall that this corresponds to the most populous cities in the world. Your task will be to convert the second query to get the same result as the provided code.

In [17]:
%%sql
SELECT countries.country_name AS country, COUNT(*) AS cities_num
  FROM cities
    INNER JOIN countries
    ON countries.code = cities.country_code
GROUP BY country
ORDER BY cities_num DESC, country
LIMIT 9;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


country,cities_num
China,36
India,18
Japan,11
Brazil,10
Pakistan,9
United States,9
Indonesia,7
Russian Federation,7
South Korea,7


- Convert the `GROUP BY` code to use a subquery inside of `SELECT` by filling in the blanks to get a result that matches the one given using the `GROUP BY` code in the first query.
- Again, sort the result by `cities_num` descending and then by `country` ascending.

In [18]:
%%sql
SELECT countries.country_name AS country,
  (SELECT COUNT(*)
   FROM cities
   WHERE countries.code = cities.country_code) AS cities_num
FROM countries
ORDER BY cities_num DESC, country
LIMIT 9;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


country,cities_num
China,36
India,18
Japan,11
Brazil,10
Pakistan,9
United States,9
Indonesia,7
Russian Federation,7
South Korea,7


---
## Subquery inside FROM clause
The last basic type of a subquery exists inside of a `FROM` clause. A motivating example pertaining to the percentage of women in parliament will be used now to help you understand this style of subquery. Let's dig in!

### Build-up
First, let's determine the maximum percentage of women in parliament for each continent listed in `states`. Since we are grouping by `continent`, include it as one of the fields in the `SELECT` clause so that you can tell which continent each maximum belongs to.

In [23]:
%sql sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite

In [26]:
%%sql
SELECT continent, MAX(women_parli_perc) AS max_perc
FROM states
GROUP BY continent
ORDER BY max_perc DESC;

   sqlite://
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


continent,max_perc
Europe,39.6
Oceania,32.74
Asia,24.0
South America,22.31
Africa,14.9
North America,2.74


Let's check out the result. We see that Europe has the largest value and North America has the smallest value for the countries listed in the states table.

### Focusing on records in monarchs
What if you weren't interested in all continents, but only in those in the `monarchs` table? You haven't seen this yet in the course, but you can include multiple tables in a `FROM` clause by adding a comma between them. Let's investigate a way to get only the continents in `monarchs` using this new trick.

In [31]:
%%sql
SELECT monarchs.continent
FROM monarchs, states
WHERE monarchs.continent = states.continent
ORDER BY monarchs.continent;

   sqlite://
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


continent
Asia
Asia
Asia
Asia
Asia
Asia
Asia
Asia
Europe
Europe


We have at least part of our answer here, but how do we get rid of those duplicate entries? And what about the maximum column?

### Finishing off the subquery
To get Asia and Europe to appear only once, use the `DISTINCT` keyword in your `SELECT` clause. But how do you get that maximum column to come along with Asia and Europe? Instead of including `states` in the `FROM` clause, include the subquery and give it an alias such as `subquery`. There you have it. That's how to include a subquery as a temporary table in your `FROM` clause.

In [34]:
%%sql
SELECT DISTINCT monarchs.continent, subquery.max_perc
FROM monarchs,
    (SELECT continent, MAX(women_parli_perc) AS max_perc
    FROM states
    GROUP BY continent) AS subquery
WHERE monarchs.continent = subquery.continent
ORDER BY monarchs.continent;

   sqlite://
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


continent,max_perc
Asia,24.0
Europe,39.6
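
The same idea can be tried without the course database. This is a minimal sketch on made-up rows, written with an explicit `INNER JOIN ... ON` instead of the comma syntax (a substitution made here for readability; both forms should return the same rows). The alias `sub` and all table contents are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical toy rows, not the contents of leaders.sqlite.
    CREATE TABLE monarchs (country TEXT, continent TEXT);
    CREATE TABLE states (name TEXT, continent TEXT, women_parli_perc REAL);
    INSERT INTO monarchs VALUES
        ('M1', 'Asia'), ('M2', 'Asia'), ('M3', 'Europe');
    INSERT INTO states VALUES
        ('S1', 'Asia', 10.0), ('S2', 'Asia', 24.0),
        ('S3', 'Europe', 39.6), ('S4', 'Africa', 14.9);
    """
)

# The subquery in FROM acts as a temporary table named "sub" with
# one row per continent; the outer query then joins against it.
max_by_monarch_continent = conn.execute(
    """
    SELECT DISTINCT monarchs.continent, sub.max_perc
    FROM monarchs
        INNER JOIN
            (SELECT continent, MAX(women_parli_perc) AS max_perc
             FROM states
             GROUP BY continent) AS sub
            ON monarchs.continent = sub.continent
    ORDER BY monarchs.continent
    """
).fetchall()

print(max_by_monarch_continent)
```

Africa appears in the toy `states` table but not in `monarchs`, so it is filtered out by the join.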


## Subquery inside from
The last type of subquery you will work with is one inside of `FROM`.

You will use this to determine the number of languages spoken for each country, identified by the country's local name. (Note this may be different than the `name` field and is stored in the `local_name` field.)

- Begin by determining for each country code how many `languages` are listed in the languages table using `SELECT`, `FROM`, and `GROUP BY`.
- Alias the aggregated field as `lang_num`.

In [36]:
%sql sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite

In [37]:
%%sql
SELECT code, COUNT(*) AS lang_num
FROM languages
GROUP BY code
LIMIT 10;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


code,lang_num
ABW,7
AFG,4
AGO,12
AIA,1
ALB,4
AND,4
ARE,5
ARG,6
ARM,3
ASM,5


```
Showing 10 out of 212 rows
```

- Include the previous query (aliased as `subquery`) as a subquery in the `FROM` clause of a new query.
- Select the local name of the country from `countries`.
- Also, select `lang_num` from `subquery`.
- Make sure to use `WHERE` appropriately to match `code` in `countries` and in `subquery`.
- Sort by `lang_num` in descending order.

In [38]:
%%sql
SELECT countries.local_name, subquery.lang_num
FROM countries,
    (SELECT code, COUNT(*) AS lang_num
     FROM languages
     GROUP BY code) AS subquery
WHERE countries.code = subquery.code
ORDER BY lang_num DESC
LIMIT 10;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


local_name,lang_num
Zambia,19
YeItyop´iya,16
Zimbabwe,16
Bharat/India,14
Nepal,14
South Africa,13
Mali,13
France,13
Angola,12
Malawi,12


```
Showing 10 out of 198 rows
```

## Advanced subquery
You can also nest multiple subqueries to answer even more specific questions.

In this exercise, for each of the six continents listed in 2015, you'll identify which country had the maximum inflation rate, and how high it was, using multiple subqueries. The table result of your final query should look something like the following, where anything between `<` `>` will be filled in with appropriate values:
```
+------------+---------------+-------------------+
| name       | continent     | inflation_rate    |
|------------+---------------+-------------------|
| <country1> | North America | <max_inflation1>  |
| <country2> | Africa        | <max_inflation2>  |
| <country3> | Oceania       | <max_inflation3>  |
| <country4> | Europe        | <max_inflation4>  |
| <country5> | South America | <max_inflation5>  |
| <country6> | Asia          | <max_inflation6>  |
+------------+---------------+-------------------+
```
Again, there are multiple ways to get to this solution using only joins, but the focus here is on showing you an introduction into advanced subqueries.

- Create an `INNER JOIN` with `countries` on the left and `economies` on the right with `USING`, without aliasing your tables or columns.
- Retrieve the country's name, continent, and inflation rate for 2015.

In [40]:
%%sql
SELECT country_name, continent, inflation_rate
FROM countries
    INNER JOIN economies
        USING (code)
WHERE year = 2015
LIMIT 10;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


country_name,continent,inflation_rate
Afghanistan,Asia,-1.549
Angola,Africa,10.287
Albania,Europe,1.896
United Arab Emirates,Asia,4.07
Argentina,South America,
Armenia,Asia,3.731
Antigua and Barbuda,North America,0.969
Australia,Oceania,1.461
Austria,Europe,0.81
Azerbaijan,Asia,4.049


```
Showing 10 out of 184 rows
```

- Select the maximum inflation rate in 2015 `AS max_inf` grouped by continent using the previous step's query as a subquery in the `FROM` clause.
    - Thus, in your subquery you should:
        - Create an inner join with `countries` on the left and `economies` on the right with `USING` (without aliasing your tables or columns).
        - Retrieve the country name, continent, and inflation rate for 2015.
        - Alias the subquery as `subquery`.
- This will result in the six maximum inflation rates in 2015 for the six continents as a one-field table. Make sure not to include `continent` in the outer `SELECT` statement.

In [59]:
%%sql
SELECT MAX(cast(inflation_rate as unsigned)) AS max_inf
FROM (
    SELECT country_name, continent, inflation_rate
    FROM countries
        INNER JOIN economies
            USING (code)
    WHERE year = 2015) AS subquery
GROUP BY continent;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


max_inf
21.858
39.403
48.684
7.524
9.784
121.738


- Now it's time to append your second query to your first query using `AND` and `IN` to obtain the name of the country, its continent, and the maximum inflation rate for each continent in 2015.
- For the sake of practice, change all joining conditions to use `ON` instead of `USING`.

In [60]:
%%sql
SELECT country_name, continent, inflation_rate
FROM countries
    INNER JOIN economies
        ON countries.code = economies.code
WHERE year = 2015
    AND inflation_rate IN (
        SELECT MAX(cast(inflation_rate as unsigned)) AS max_inf
        FROM (
             SELECT country_name, continent, inflation_rate
             FROM countries
             INNER JOIN economies
             ON countries.code = economies.code
             WHERE year = 2015) AS subquery
        GROUP BY continent);

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


country_name,continent,inflation_rate
Haiti,North America,7.524
Malawi,Africa,21.858
Nauru,Oceania,9.784
Ukraine,Europe,48.684
Venezuela,South America,121.738
Yemen,Asia,39.403


*This code works because each of the six maximum inflation rate values occurs only once in the 2015 data. Think about whether this particular code involving subqueries would work in cases where there are ties for the maximum inflation rate values.*
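
To make the concern about ties concrete, here is a small sketch on hypothetical data in which one country's rate happens to equal another continent's maximum. The table name `rates`, its rows, and the correlated-subquery alternative are all assumptions for illustration, not part of the exercise solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical toy rows. A1 is NOT Asia's maximum, but its
    -- rate (5.0) equals Europe's maximum.
    CREATE TABLE rates (country TEXT, continent TEXT, inflation_rate REAL);
    INSERT INTO rates VALUES
        ('A1', 'Asia', 5.0), ('A2', 'Asia', 9.0),
        ('E1', 'Europe', 5.0), ('E2', 'Europe', 3.0);
    """
)

# IN only checks membership in the set of per-continent maxima,
# without checking which continent each maximum came from.
with_in = conn.execute(
    """
    SELECT country, continent, inflation_rate
    FROM rates
    WHERE inflation_rate IN
        (SELECT MAX(inflation_rate) FROM rates GROUP BY continent)
    ORDER BY country
    """
).fetchall()

# A correlated subquery compares each row only with the maximum
# of its own continent.
correlated = conn.execute(
    """
    SELECT country, continent, inflation_rate
    FROM rates AS r1
    WHERE inflation_rate =
        (SELECT MAX(r2.inflation_rate)
         FROM rates AS r2
         WHERE r2.continent = r1.continent)
    ORDER BY country
    """
).fetchall()

print(with_in)
print(correlated)
```

On this toy data the `IN` version also returns `A1`, while the correlated version returns only the true maximum for each continent.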

## Subquery challenge
Let's test your understanding of subqueries with a challenge problem! Use a subquery to get 2015 economic data for countries whose `gov_form` is **not** `'Constitutional Monarchy'` and does **not** contain `'Republic'`.

Here, `gov_form` stands for the form of the government for each country. Review the different entries for `gov_form` in the `countries` table.

- Select the country code, inflation rate, and unemployment rate.
- Order by inflation rate ascending.
- Do not use table aliasing in this exercise.

In [62]:
%%sql
SELECT code, inflation_rate, unemployment_rate
FROM economies
WHERE year = 2015 AND code NOT IN
    (SELECT code
     FROM countries
     WHERE (gov_form = 'Constitutional Monarchy'
            OR gov_form LIKE '%Republic%'))
ORDER BY cast(inflation_rate as unsigned)
LIMIT 10;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


code,inflation_rate,unemployment_rate
AFG,-1.549,
CHE,-1.14,3.178
PRI,-0.751,12.0
ROU,-0.596,6.812
BRN,-0.423,6.9
TON,-0.283,
OMN,0.065,
TLS,0.553,
BEL,0.62,8.492
CAN,1.132,6.9


```
Showing 10 out of 26 rows
```

---
## Course review
Before you tackle the three challenge problems, let's review the main topics covered throughout the course.

### Types of joins
In SQL, a join combines columns from one or more tables in a relational database via a lookup process. You learned about four different types of joins in this course.
1. An `INNER JOIN` can also be written as just `JOIN` in SQL. A special case of an `INNER JOIN` that you explored is the self-join.
2. There are three `OUTER JOIN`s: `LEFT JOIN` (or `LEFT OUTER JOIN`), `RIGHT JOIN` (or `RIGHT OUTER JOIN`), and `FULL JOIN` (or `FULL OUTER JOIN`).
3. You worked with `CROSS JOIN`s to create all possible combinations between two tables.
4. You investigated semi-joins and anti-joins.

Remember that the joins written in ALL capital letters have their own SQL syntax. Self-joins, semi-joins, and anti-joins don't have built-in SQL syntax.

### INNER JOIN vs LEFT JOIN
An `INNER JOIN` keeps only the records whose key field (or fields) values appear in both tables. A `LEFT JOIN` keeps all the records from the left table and adds the matching values from the right table based on the key field or fields. Where a left-table key has no match in the right table, the right-table fields are filled with missing values in the result.
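
A quick way to see the difference is to run both joins on two tiny tables. This is only a sketch; the tables `left_t` and `right_t` and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical toy tables sharing the key field "id".
    CREATE TABLE left_t (id INTEGER, l_val TEXT);
    CREATE TABLE right_t (id INTEGER, r_val TEXT);
    INSERT INTO left_t VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO right_t VALUES (1, 'x'), (3, 'z'), (4, 'w');
    """
)

# INNER JOIN: only ids present in both tables survive.
inner_rows = conn.execute(
    """
    SELECT l.id, l_val, r_val
    FROM left_t AS l
        INNER JOIN right_t AS r
            ON l.id = r.id
    ORDER BY l.id
    """
).fetchall()

# LEFT JOIN: every left row survives; unmatched rows get NULL
# (None in Python) for the right-table field.
left_rows = conn.execute(
    """
    SELECT l.id, l_val, r_val
    FROM left_t AS l
        LEFT JOIN right_t AS r
            ON l.id = r.id
    ORDER BY l.id
    """
).fetchall()

print(inner_rows)
print(left_rows)
```

The row with `id = 2` has no partner in `right_t`, so it appears only in the `LEFT JOIN` result, with a missing `r_val`.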

### RIGHT JOIN vs FULL JOIN
A `RIGHT JOIN` keeps all the records specified in the right table and includes the matches from the key field (or fields) in the left table. Those that don't match are included as missing values in the resulting table from the `RIGHT JOIN` query. A `FULL JOIN` is a combination of a `LEFT JOIN` and a `RIGHT JOIN` showing exactly which values appear in both tables and those that appear in only one or the other table.

### CROSS JOIN with code
A `CROSS JOIN` pairs every record in one table with every record in another table. Remember that a `CROSS JOIN` does not have an `ON` or `USING` clause, but otherwise looks very similar to the code for an `INNER JOIN`, `LEFT JOIN`, `RIGHT JOIN`, or `FULL JOIN`.
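
As a small sketch of this, the following pairs two hypothetical tables, `sizes` and `colors`, so that a table of 2 rows and a table of 3 rows yield 2 × 3 = 6 combinations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical toy tables with no shared key field.
    CREATE TABLE sizes (size TEXT);
    CREATE TABLE colors (color TEXT);
    INSERT INTO sizes VALUES ('S'), ('M');
    INSERT INTO colors VALUES ('red'), ('blue'), ('green');
    """
)

# No ON or USING clause: every size is paired with every color.
combos = conn.execute(
    """
    SELECT size, color
    FROM sizes
        CROSS JOIN colors
    ORDER BY size, color
    """
).fetchall()

print(len(combos))
print(combos)
```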

### Set Theory Clauses
Recall that `UNION` includes every record in both tables but **DOES NOT** double count those that are in both tables whereas `UNION ALL` **DOES** replicate those that are in both tables. `INTERSECT` gives only those records found in both of the two tables. `EXCEPT` gives only those records in one table **BUT NOT** the other.
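
The four operators can be compared side by side on two small single-column tables. This is a minimal sketch; the tables `t1` and `t2`, their values, and the helper `run` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical toy tables: 2 and 3 appear in both.
    CREATE TABLE t1 (x INTEGER);
    CREATE TABLE t2 (x INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (2), (3), (4);
    """
)


def run(operator):
    """Combine t1 and t2 with the given set operator; return sorted values."""
    sql = f"SELECT x FROM t1 {operator} SELECT x FROM t2 ORDER BY x"
    return [row[0] for row in conn.execute(sql)]


union = run("UNION")          # shared values counted once
union_all = run("UNION ALL")  # shared values repeated
intersect = run("INTERSECT")  # only values found in both tables
except_ = run("EXCEPT")       # values in t1 but not in t2

print(union, union_all, intersect, except_)
```

Note that `EXCEPT` is not symmetric: swapping the two tables would return the values in `t2` that are not in `t1` instead.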

### Semi-joins and Anti-joins
When you'd like to filter your first table based on conditions set on a second table, you should use a semi-join to accomplish your task. If instead you'd like to filter your first table based on conditions **NOT** being met on a second table, you should use an anti-join. Anti-joins are particularly useful in diagnosing problems with other joins in terms of getting fewer or more records than you expected.
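
Both patterns can be sketched with a subquery inside `WHERE`, using `IN` for the semi-join and `NOT IN` for the anti-join. The toy `countries` and `languages` tables below are hypothetical stand-ins, not the course data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical toy rows: 'BBB' has no entry in languages.
    CREATE TABLE countries (code TEXT, name TEXT);
    CREATE TABLE languages (code TEXT, lang TEXT);
    INSERT INTO countries VALUES
        ('AAA', 'Aland'), ('BBB', 'Bland'), ('CCC', 'Cland');
    INSERT INTO languages VALUES
        ('AAA', 'X'), ('AAA', 'Y'), ('CCC', 'Z');
    """
)

# Semi-join: keep countries that have at least one language row.
# Unlike an INNER JOIN, Aland is not duplicated by its two languages.
semi = [
    row[0]
    for row in conn.execute(
        """
        SELECT name
        FROM countries
        WHERE code IN (SELECT code FROM languages)
        ORDER BY name
        """
    )
]

# Anti-join: keep countries that have no language row at all.
anti = [
    row[0]
    for row in conn.execute(
        """
        SELECT name
        FROM countries
        WHERE code NOT IN (SELECT code FROM languages)
        ORDER BY name
        """
    )
]

print(semi)
print(anti)
```

Together the two results account for every row in the first table, which is what makes the anti-join handy for checking which records a join would miss.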

### Types of basic subqueries
The most common type of subquery sits inside a `WHERE` clause. The next most frequent types are subqueries inside `SELECT` clauses and inside `FROM` clauses. As you'll see in the challenge exercises, subqueries can also find their way into the `ON` clause of a join, in ways similar to what you've seen inside `WHERE` clauses.

## Final challenge
The next three exercises will test your knowledge of the content covered in this course and ask you to apply many of the ideas you've seen to difficult problems.

In this exercise, you'll need to get the country names and other 2015 data in the `economies` table and the `countries` table for **Central American countries with an official language**.

- Select unique country names. Also select the total investment and imports fields.
- Use a left join with `countries` on the left. (An inner join would also work, but please use a left join here.)
- Match on `code` in the two tables `AND` use a subquery inside of `ON` to choose the appropriate `languages` records.
- Order by country name ascending.
- Use table aliasing but **not** field aliasing in this exercise.

In [70]:
%%sql
SELECT DISTINCT country_name, total_investment, imports
FROM countries AS c
    LEFT JOIN economies AS e
        ON (c.code = e.code 
            AND c.code IN (
                SELECT l.code
                FROM languages AS l
                WHERE official = 'TRUE'
            ))
WHERE region = 'Central America' AND year = 2015
ORDER BY country_name;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


country_name,total_investment,imports
Belize,22.014,6.743
Costa Rica,20.218,4.629
El Salvador,13.983,8.193
Guatemala,13.433,15.124
Honduras,24.633,9.353
Nicaragua,31.862,11.665
Panama,46.557,5.898


## Final challenge (2)
Let's ease up a bit and calculate the average fertility rate for each region in 2015.

- Include the name of region, its continent, and average fertility rate aliased as `avg_fert_rate`.
- Sort based on `avg_fert_rate` ascending.
- Remember that you'll need to `GROUP BY` all fields that aren't included in the aggregate function of `SELECT`.

In [72]:
%%sql
SELECT region, continent, AVG(fertility_rate) AS avg_fert_rate
FROM countries AS c
    INNER JOIN populations AS p
        ON c.code = p.country_code
WHERE year = 2015
GROUP BY region, continent
ORDER BY avg_fert_rate;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


region,continent,avg_fert_rate
Southern Europe,Europe,1.4261
Eastern Europe,Europe,1.4908888888888887
Baltic Countries,Europe,1.6033333333333333
Eastern Asia,Asia,1.6207142857142856
Western Europe,Europe,1.6324999999999998
North America,North America,1.7657500000000002
British Islands,Europe,1.875
Nordic Countries,Europe,1.893333333333333
Australia and New Zealand,Oceania,1.9115
Caribbean,North America,1.9505714285714284


```
Showing 10 out of 23 rows
```

*It seems that the average fertility rate is lowest in Southern Europe and highest in Central Africa.*

## Final challenge (3)
You are now tasked with determining the top 10 capital cities in Europe and the Americas in terms of a calculated percentage using `city_proper_pop` and `metroarea_pop` in `cities`.

Do not use table aliasing in this exercise.

- Select the city name, country code, city proper population, and metro area population.
- Calculate the percentage of metro area population composed of city proper population for each city in `cities`, aliased as `city_perc`.
- Focus only on capital cities in Europe and the Americas in a subquery.
- Make sure to exclude records with missing data on metro area population.
- Order the result by `city_perc` descending.
- Then determine the top 10 capital cities in Europe and the Americas in terms of this `city_perc` percentage.

In [79]:
%%sql
SELECT name, country_code, city_proper_pop, metroarea_pop, 
       city_proper_pop * 100 / metroarea_pop AS city_perc
FROM cities
WHERE name IN
    (SELECT capital
     FROM countries
     WHERE (continent = 'Europe'
            OR continent LIKE '%America'))
  AND metroarea_pop IS NOT NULL
ORDER BY city_perc DESC
LIMIT 10;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


name,country_code,city_proper_pop,metroarea_pop,city_perc
Lima,PER,8852000,10750000,82
Bogota,COL,7878783,9800000,80
Moscow,RUS,12197596,16170000,75
Vienna,AUT,1863881,2600000,71
Montevideo,URY,1305082,1947604,67
Caracas,VEN,1943901,2923959,66
Rome,ITA,2877215,4353775,66
Brasilia,BRA,2556149,3919864,65
London,GBR,8673713,13879757,62
Budapest,HUN,1759407,2927944,60
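
The `city_perc` values above are whole numbers. A likely explanation, assuming both population columns are stored as integers, is that SQLite truncates the result when it divides one integer by another. The sketch below reproduces this with Lima's figures from the output above and shows that multiplying by `100.0` instead of `100` keeps the fractional part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Lima's city_proper_pop and metroarea_pop, taken from the output above.
int_perc, float_perc = conn.execute(
    """
    SELECT 8852000 * 100 / 10750000,    -- integer / integer truncates
           8852000 * 100.0 / 10750000   -- a REAL operand keeps decimals
    """
).fetchone()

print(int_perc)
print(round(float_perc, 2))
```

Whether you want the truncated or the fractional percentage depends on the task; other database systems may behave differently for the same expression.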
