# Introduction to JOINs
## Using INNER JOIN
* "id" field is also the "key" field.  
* An INNER JOIN only includes records whose key appears in both tables.
* With INNER JOIN, each entry in the key field of the left table is matched against the key field of the right table; rows without a match are dropped.

In [0]:
SELECT *
FROM table_1;
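
The points above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the course's dataset: the tables `left_table` and `right_table` and their contents are hypothetical.

```python
import sqlite3

# Hypothetical toy tables; ids and values are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_table  (id INTEGER, val TEXT);
    CREATE TABLE right_table (id INTEGER, val TEXT);
    INSERT INTO left_table  VALUES (1, 'L1'), (2, 'L2'), (3, 'L3'), (4, 'L4');
    INSERT INTO right_table VALUES (1, 'R1'), (4, 'R2'), (5, 'R3'), (6, 'R4');
""")

# Only ids present in BOTH tables survive the INNER JOIN.
inner_rows = conn.execute("""
    SELECT l.id, l.val AS l_val, r.val AS r_val
    FROM left_table AS l
      INNER JOIN right_table AS r
        ON l.id = r.id
    ORDER BY l.id;
""").fetchall()

print(inner_rows)  # [(1, 'L1', 'R1'), (4, 'L4', 'R2')]
```

Ids 2 and 3 (left only) and 5 and 6 (right only) do not appear in the result.
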

## INNER JOIN in SQL

In [0]:
SELECT p1.country, p1.continent, prime_minister, president
FROM prime_ministers AS p1
INNER JOIN presidents AS p2
ON p1.country = p2.country;

In [0]:
-- 1. Select name fields (with alias) and region 
SELECT cities.name AS city, countries.name AS country, region
FROM cities
  INNER JOIN countries
    ON cities.country_code = countries.code;

Instead of writing the full table name, you can use a table alias as a shortcut. As with fields, you add the alias with `AS` immediately after the table name, separated by a space. Check out the aliasing of `cities` and `countries` below.
```
SELECT c1.name AS city, c2.name AS country
FROM cities AS c1
INNER JOIN countries AS c2
ON c1.country_code = c2.code;
```
Notice that to select a field in your query that appears in multiple tables, you'll need to identify which table/table alias you're referring to by using a `.` in your `SELECT` statement.

Sometimes it's easier to write SQL code out of order: you write the `SELECT` statement after you've done the `JOIN`.

In [0]:
-- 3. Select fields with aliases
SELECT c.code AS country_code, name, year, inflation_rate
FROM countries AS c
  -- 1. Join to economies (alias e)
  INNER JOIN economies AS e
    -- 2. Match on code
    ON c.code = e.code;

The ability to combine multiple joins in a single query is a powerful feature of SQL, for example:
```
SELECT *
FROM left_table
  INNER JOIN right_table
    ON left_table.id = right_table.id
  INNER JOIN another_table
    ON left_table.id = another_table.id;
```
As you can see, continually writing long table names in joins becomes tedious. This is when it helps to alias each table using the first letter of its name (e.g. `countries AS c`). It is standard practice to alias in this way and, if you choose to alias tables or are asked to for an exercise in this course, you should follow this convention.

In [0]:
-- 4. Select fields
SELECT c.code, c.name, c.region, p.year, p.fertility_rate
  -- 1. From countries (alias as c)
  FROM countries AS c
  -- 2. Join with populations (as p)
  INNER JOIN populations AS p
    -- 3. Match on country code
    ON c.code = p.country_code;

## INNER JOIN with USING

In [0]:
SELECT left_table.id AS L_id, left_table.val AS L_val, right_table.val AS R_val
FROM left_table
INNER JOIN right_table
ON left_table.id = right_table.id;

When the key field you'd like to join on has the same name in both tables, you can use a `USING` clause instead of the `ON` clause.

In [0]:
SELECT left_table.id AS L_id, left_table.val AS L_val, right_table.val AS R_val
FROM left_table
INNER JOIN right_table
USING (id);
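
As a quick check that the two forms are interchangeable, here is a small sketch using Python's built-in `sqlite3` module. The table contents are hypothetical, and SQLite stands in for whatever engine the notebook was written against.

```python
import sqlite3

# Hypothetical toy tables sharing a key column named "id".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_table  (id INTEGER, val TEXT);
    CREATE TABLE right_table (id INTEGER, val TEXT);
    INSERT INTO left_table  VALUES (1, 'L1'), (2, 'L2'), (4, 'L4');
    INSERT INTO right_table VALUES (1, 'R1'), (4, 'R2'), (5, 'R3');
""")

select_part = """
    SELECT left_table.id AS L_id, left_table.val AS L_val, right_table.val AS R_val
    FROM left_table
    INNER JOIN right_table
"""

# Same join, written once with ON and once with USING.
on_rows = conn.execute(
    select_part + "ON left_table.id = right_table.id ORDER BY L_id"
).fetchall()
using_rows = conn.execute(
    select_part + "USING (id) ORDER BY L_id"
).fetchall()

print(on_rows == using_rows)  # True
```
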

## Self-ish joins, just in CASE
### Join prime_ministers to itself
The `prime_ministers` table is on both the left and the right. The vital step here is setting the key columns by which we match the table to itself. Each country is matched with every country in the "right table" (which is also `prime_ministers`) that is on the same continent.

In [0]:
SELECT p1.country AS country1, p2.country AS country2, p1.continent
FROM prime_ministers AS p1
INNER JOIN prime_ministers AS p2
ON p1.continent = p2.continent
LIMIT 14;

Because the resulting table also pairs each country with itself, we need a way to avoid these redundant rows.

The `AND` operator checks that multiple conditions are met. Here, a row of `prime_ministers` is not matched with itself, because the two country values must differ.

In [0]:
SELECT p1.country AS country1, p2.country AS country2, p1.continent
FROM prime_ministers AS p1
INNER JOIN prime_ministers AS p2
ON p1.continent = p2.continent AND p1.country <> p2.country
LIMIT 13;
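
The effect of the extra `<>` condition can be sketched on a tiny stand-in table using Python's built-in `sqlite3` module. The five rows below are a hypothetical sample, not the actual `prime_ministers` data.

```python
import sqlite3

# Hypothetical sample rows standing in for the prime_ministers table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prime_ministers (country TEXT, continent TEXT);
    INSERT INTO prime_ministers VALUES
        ('Egypt', 'Africa'), ('Portugal', 'Europe'), ('Spain', 'Europe'),
        ('India', 'Asia'), ('Pakistan', 'Asia');
""")

base = """
    SELECT p1.country AS country1, p2.country AS country2, p1.continent
    FROM prime_ministers AS p1
    INNER JOIN prime_ministers AS p2
    ON p1.continent = p2.continent
"""

# Without the extra condition, every country is also paired with itself.
all_pairs = conn.execute(base).fetchall()

# With it, only pairs of different countries remain.
distinct_pairs = conn.execute(
    base + " AND p1.country <> p2.country ORDER BY country1, country2"
).fetchall()

print(len(all_pairs), len(distinct_pairs))  # 9 4
```
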

# Outer joins and cross joins
## CASE WHEN and THEN
`CASE` is SQL's way of writing multiple if-then-else branches in a single expression. The first `WHEN` condition that is true determines the result.

In [0]:
-- This is the basic layout for creating a new field containing the groupings.
SELECT name, continent, indep_year,
  CASE WHEN _____ < _____ THEN 'before 1900'
    WHEN indep_year <= 1930 THEN '____'
    ELSE '____' END
    AS indep_year_group
FROM states
ORDER BY indep_year_group;

In [0]:
SELECT name, continent, indep_year,
  CASE WHEN indep_year < 1900 THEN 'before 1900'
    WHEN indep_year <= 1930 THEN 'between 1900 and 1930'
    ELSE 'after 1930' END
    AS indep_year_group
FROM states
ORDER BY indep_year_group;
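
To see how the branches are evaluated, here is a minimal sketch using Python's built-in `sqlite3` module. The state names and years are hypothetical placeholders chosen to hit each branch, including the boundary values 1900 and 1930.

```python
import sqlite3

# Hypothetical placeholder rows, one or two per CASE branch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE states (name TEXT, indep_year INTEGER);
    INSERT INTO states VALUES
        ('State A', 1850), ('State B', 1900), ('State C', 1930), ('State D', 1975);
""")

# WHEN branches are tested top to bottom; the first true one wins.
groups = conn.execute("""
    SELECT name,
      CASE WHEN indep_year < 1900 THEN 'before 1900'
        WHEN indep_year <= 1930 THEN 'between 1900 and 1930'
        ELSE 'after 1930' END
        AS indep_year_group
    FROM states
    ORDER BY indep_year;
""").fetchall()

for row in groups:
    print(row)
```

The years 1900 and 1930 both land in the middle group: 1900 fails the strict `<` test of the first branch, and 1930 passes the `<=` test of the second.
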

In [0]:
SELECT p1.country_code,
       p1.size AS size2010, 
       p2.size AS size2015,
       -- 1. calculate growth_perc
       ((p2.size - p1.size)/p1.size * 100.0) AS growth_perc
-- 2. From populations (alias as p1)
FROM populations AS p1
  -- 3. Join to itself (alias as p2)
  INNER JOIN populations AS p2
    -- 4. Match on country code
    ON p1.country_code = p2.country_code
        -- 5. and year (with calculation)
        AND p1.year = p2.year - 5;

In [0]:
SELECT name, continent, code, surface_area,
    CASE WHEN surface_area > 2000000
            THEN 'large'
       WHEN surface_area > 350000
            THEN 'medium'
       ELSE 'small' END
       AS geosize_group
INTO countries_plus
FROM countries;

In [0]:
SELECT country_code, size,
  CASE WHEN size > 50000000
            THEN 'large'
       WHEN size > 1000000
            THEN 'medium'
       ELSE 'small' END
       AS popsize_group
INTO pop_plus       
FROM populations
WHERE year = 2015;

-- 5. Select fields
SELECT name, continent, geosize_group, popsize_group
-- 1. From countries_plus (alias as c)
FROM countries_plus AS c
  -- 2. Join to pop_plus (alias as p)
  INNER JOIN pop_plus AS p
    -- 3. Match on country code
    ON c.code = p.country_code
-- 4. Order the table    
ORDER BY geosize_group;

## LEFT and RIGHT JOINs
An `INNER JOIN` keeps only the records found in both tables. `OUTER JOIN`s also keep unmatched records, and there are three types:
* `LEFT JOIN`
* `RIGHT JOIN`
* `FULL JOIN`

## The syntax of `LEFT JOIN`

In [0]:
SELECT p1.country, prime_minister, president
FROM prime_ministers AS p1
LEFT JOIN presidents AS p2
ON p1.country = p2.country;
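
The behaviour of a `LEFT JOIN` on unmatched rows can be sketched with Python's built-in `sqlite3` module. The leader names below are hypothetical placeholders, not real data.

```python
import sqlite3

# Hypothetical placeholder data; names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prime_ministers (country TEXT, prime_minister TEXT);
    CREATE TABLE presidents      (country TEXT, president TEXT);
    INSERT INTO prime_ministers VALUES
        ('Egypt', 'PM Egypt'), ('Portugal', 'PM Portugal'), ('Spain', 'PM Spain');
    INSERT INTO presidents VALUES
        ('Egypt', 'Pres Egypt'), ('Portugal', 'Pres Portugal'), ('Chile', 'Pres Chile');
""")

# Every left-table row is kept; missing right-side values come back as NULL.
left_rows = conn.execute("""
    SELECT p1.country, prime_minister, president
    FROM prime_ministers AS p1
    LEFT JOIN presidents AS p2
    ON p1.country = p2.country
    ORDER BY p1.country;
""").fetchall()

for row in left_rows:
    print(row)
```

Spain has no row in `presidents`, so its `president` value is `NULL` (`None` in Python). Chile, which appears only in the right table, is dropped.
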

In [0]:
SELECT c1.name AS city, code, c2.name AS country,
       region, city_proper_pop
FROM cities AS c1
  -- 1. Join right table (with alias)
  LEFT JOIN countries AS c2
    -- 2. Match on country code
    ON c1.country_code = c2.code
-- 3. Order by descending country code
ORDER BY code DESC;

In [0]:
/*
5. Select country name AS country, the country's local name,
the language name AS language, and
the percent of the language spoken in the country
*/
SELECT c.name AS country, local_name, l.name AS language, percent
-- 1. From left table (alias as c)
FROM countries AS c
  -- 2. Join to right table (alias as l)
  LEFT JOIN languages AS l
    -- 3. Match on fields
    ON c.code = l.code
-- 4. Order by descending country
ORDER BY country DESC;

## The syntax of `RIGHT JOIN`


In [0]:
SELECT right_table.id AS R_id, left_table.val AS L_val, right_table.val AS R_val
FROM left_table
RIGHT JOIN right_table
ON left_table.id = right_table.id;

In [0]:
-- convert this code to use RIGHT JOINs instead of LEFT JOINs
/*
SELECT cities.name AS city, urbanarea_pop, countries.name AS country,
       indep_year, languages.name AS language, percent
FROM cities
  LEFT JOIN countries
    ON cities.country_code = countries.code
  LEFT JOIN languages
    ON countries.code = languages.code
ORDER BY city, language;
*/

SELECT cities.name AS city, urbanarea_pop, countries.name AS country,
       indep_year, languages.name AS language, percent
FROM languages
  RIGHT JOIN countries
    ON countries.code = languages.code
  RIGHT JOIN cities
    ON cities.country_code = countries.code
ORDER BY city, language;

## `FULL JOIN`s

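A `FULL JOIN` combines a `LEFT JOIN` and a `RIGHT JOIN`: it keeps every record from both tables and returns `NULL` for the fields of the side that has no match.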

In [0]:
SELECT left_table.id AS L_id, right_table.id AS R_id, left_table.val AS L_val, right_table.val AS R_val
FROM left_table
FULL JOIN right_table
USING(id);
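
One way to see that a `FULL JOIN` is the combination of both outer joins is to build it by hand. The sketch below uses Python's built-in `sqlite3` module and deliberately avoids the `FULL JOIN` keyword, since SQLite only gained it in relatively recent releases. The emulation (a `LEFT JOIN` plus the unmatched right-side rows) is an illustration under that assumption, and the table contents are hypothetical.

```python
import sqlite3

# Hypothetical toy tables; ids and values are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_table  (id INTEGER, val TEXT);
    CREATE TABLE right_table (id INTEGER, val TEXT);
    INSERT INTO left_table  VALUES (1, 'L1'), (2, 'L2'), (3, 'L3'), (4, 'L4');
    INSERT INTO right_table VALUES (1, 'R1'), (4, 'R2'), (5, 'R3'), (6, 'R4');
""")

full_rows = conn.execute("""
    -- Part 1: everything a LEFT JOIN returns.
    SELECT l.id AS L_id, r.id AS R_id, l.val AS L_val, r.val AS R_val
    FROM left_table AS l
    LEFT JOIN right_table AS r ON l.id = r.id
    UNION ALL
    -- Part 2: right-table rows with no match on the left.
    SELECT l.id, r.id, l.val, r.val
    FROM right_table AS r
    LEFT JOIN left_table AS l ON l.id = r.id
    WHERE l.id IS NULL;
""").fetchall()

print(len(full_rows))  # 6: two matches, two left-only, two right-only
```
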

In [0]:
-- Example case of using FULL JOIN

SELECT p1.country AS pm_co, p2.country AS pres_co, prime_minister, president
FROM prime_ministers AS p1
FULL JOIN presidents AS p2
ON p1.country = p2.country;

## `CROSS JOIN`

`CROSS JOIN`s create all possible combinations of the records of two tables.

In [0]:
SELECT prime_minister, president
FROM prime_ministers AS p1
CROSS JOIN presidents AS p2
WHERE p1.continent IN ('North America', 'Oceania');
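
The row count of a `CROSS JOIN` is the product of the two table sizes, which the following sketch checks using Python's built-in `sqlite3` module. The names are hypothetical placeholders.

```python
import sqlite3

# Hypothetical placeholder data: 3 prime ministers, 2 presidents.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prime_ministers (prime_minister TEXT, continent TEXT);
    CREATE TABLE presidents      (president TEXT);
    INSERT INTO prime_ministers VALUES
        ('PM A', 'Oceania'), ('PM B', 'North America'), ('PM C', 'Europe');
    INSERT INTO presidents VALUES ('Pres X'), ('Pres Y');
""")

cross = """
    SELECT prime_minister, president
    FROM prime_ministers AS p1
    CROSS JOIN presidents AS p2
"""

# No join condition: every left row is paired with every right row.
all_combos = conn.execute(cross).fetchall()

# The WHERE filter is applied to the combinations afterwards.
filtered = conn.execute(
    cross + "WHERE p1.continent IN ('North America', 'Oceania')"
).fetchall()

print(len(all_combos), len(filtered))  # 6 4
```
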

# Set Theory Clauses
## `UNION` and `UNION ALL`
* `UNION` includes every record in both tables but DOES NOT double count those that are in both tables.
* `UNION ALL` includes every record in both tables and DOES replicate those that are in both tables.
* `INTERSECT` results in only those records found in both of the two tables.
* `EXCEPT` results in only those records in one table BUT NOT the other.

The fields included in the operation must be of the same data type since they come back as just a single field. You can't stack a number on top of a character field, in other words.
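
The four operations listed above can be compared side by side on two small tables. This is a sketch using Python's built-in `sqlite3` module; the tables `left_one` and `right_one` and their ids are hypothetical, and `ids` is a helper defined only for this example.

```python
import sqlite3

# Hypothetical single-column tables; ids 1 and 4 appear in both.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_one  (id INTEGER);
    CREATE TABLE right_one (id INTEGER);
    INSERT INTO left_one  VALUES (1), (2), (3), (4);
    INSERT INTO right_one VALUES (1), (4), (5), (6);
""")

def ids(operator):
    """Run `left_one <operator> right_one` and return the ids as a list."""
    sql = f"SELECT id FROM left_one {operator} SELECT id FROM right_one ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

union_ids = ids("UNION")          # shared ids counted once
union_all_ids = ids("UNION ALL")  # shared ids repeated
intersect_ids = ids("INTERSECT")  # only ids in both tables
except_ids = ids("EXCEPT")        # ids in left_one but not right_one

print(union_ids)      # [1, 2, 3, 4, 5, 6]
print(union_all_ids)  # [1, 1, 2, 3, 4, 4, 5, 6]
print(intersect_ids)  # [1, 4]
print(except_ids)     # [2, 3]
```
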


In [0]:
SELECT prime_minister AS leader, country
FROM prime_ministers
UNION 
SELECT monarch, country
FROM monarchs
ORDER BY country;

In [0]:
-- Select fields from 2010 table
SELECT *
  -- From 2010 table
  FROM economies2010
  -- Set theory clause
  UNION
-- Select fields from 2015 table
SELECT *
  -- From 2015 table
  FROM economies2015
-- Order by code and year
ORDER BY code, year;

## `INTERSECT`
`INTERSECT` includes only the records that appear in both tables, compared on the selected fields.

In [0]:
SELECT id
FROM left_one
INTERSECT
SELECT id
FROM right_one;

### `INTERSECT` on two fields
The following code yields no results because `INTERSECT` compares whole records, not just a key field the way a `JOIN` does: a row is returned only if every selected field matches.

In [0]:
SELECT country, prime_minister AS leader
FROM prime_ministers
INTERSECT
SELECT country, president
FROM presidents;
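
The difference between intersecting on one field and on two can be sketched with Python's built-in `sqlite3` module. The leader names are hypothetical placeholders; the only point is that the countries overlap while the leader names do not.

```python
import sqlite3

# Hypothetical placeholder data: two shared countries, no shared leader names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prime_ministers (country TEXT, prime_minister TEXT);
    CREATE TABLE presidents      (country TEXT, president TEXT);
    INSERT INTO prime_ministers VALUES
        ('Egypt', 'PM Egypt'), ('Portugal', 'PM Portugal'), ('Spain', 'PM Spain');
    INSERT INTO presidents VALUES
        ('Egypt', 'Pres Egypt'), ('Portugal', 'Pres Portugal'), ('Chile', 'Pres Chile');
""")

# One field: countries that appear in both tables.
one_field = conn.execute("""
    SELECT country FROM prime_ministers
    INTERSECT
    SELECT country FROM presidents
    ORDER BY country;
""").fetchall()

# Two fields: the whole (country, leader) record must match, and none do.
two_fields = conn.execute("""
    SELECT country, prime_minister AS leader FROM prime_ministers
    INTERSECT
    SELECT country, president FROM presidents;
""").fetchall()

print(one_field)   # [('Egypt',), ('Portugal',)]
print(two_fields)  # []
```
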

## `EXCEPT`
`EXCEPT` allows you to include only the records that are in one table, but not the other.

In [0]:
SELECT monarch, country
FROM monarchs
EXCEPT
SELECT prime_minister, country
FROM prime_ministers;

## Semi-joins and anti-joins
### Intro to subqueries
A subquery is just a query that sits inside of another query.

The following is a semi-join, which keeps the records in the first table that have a match in the second:

In [0]:
SELECT president, country, continent
FROM presidents
WHERE country IN
  (SELECT name
   FROM states
   WHERE indep_year < 1800);

In [0]:
-- Select distinct fields
SELECT DISTINCT name
  -- From languages
  FROM languages
-- Where in statement
WHERE code IN
  -- Subquery
  (SELECT code
   FROM countries
   WHERE region = 'Middle East')
-- Order by name
ORDER BY name;

The following is an anti-join, which keeps the records in the first table that have no match in the second:

In [0]:
SELECT president, country, continent
FROM presidents
WHERE continent LIKE '%America'
  AND country NOT IN
    (SELECT name
     FROM states
     WHERE indep_year < 1800);
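
A semi-join and an anti-join split the first table into two complementary parts. The sketch below shows this using Python's built-in `sqlite3` module; the countries, presidents and independence years are hypothetical placeholders.

```python
import sqlite3

# Hypothetical placeholder data; only Country A predates 1800.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE presidents (president TEXT, country TEXT);
    CREATE TABLE states     (name TEXT, indep_year INTEGER);
    INSERT INTO presidents VALUES
        ('Pres A', 'Country A'), ('Pres B', 'Country B'), ('Pres C', 'Country C');
    INSERT INTO states VALUES
        ('Country A', 1750), ('Country B', 1850), ('Country C', 1950);
""")

template = """
    SELECT president, country
    FROM presidents
    WHERE country {operator}
      (SELECT name
       FROM states
       WHERE indep_year < 1800)
    ORDER BY country;
"""

# Semi-join: keep rows whose country IS in the subquery result.
semi = conn.execute(template.format(operator="IN")).fetchall()
# Anti-join: keep rows whose country is NOT in the subquery result.
anti = conn.execute(template.format(operator="NOT IN")).fetchall()

print(semi)  # [('Pres A', 'Country A')]
print(anti)  # [('Pres B', 'Country B'), ('Pres C', 'Country C')]
```
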

In [0]:
-- Select the city name
SELECT name
  -- Alias the table where city name resides
  FROM cities AS c1
  -- Choose only records matching the result of multiple set theory clauses
  WHERE country_code IN
(
    -- Select appropriate field from economies AS e
    SELECT e.code
    FROM economies AS e
    -- Get all additional (unique) values of the field from currencies AS c2  
    UNION
    SELECT c2.code
    FROM currencies AS c2
    -- Exclude those appearing in populations AS p
    EXCEPT
    SELECT p.country_code
    FROM populations AS p
);

# Subqueries
## Subqueries inside `WHERE` and `SELECT` clauses



In [0]:
SELECT name, fert_rate
FROM states
WHERE continent = 'Asia'
  AND fert_rate < 
    (SELECT AVG(fert_rate)
     FROM states);
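
A subquery in `WHERE` is evaluated to a value that the outer query then compares against. Here is a minimal sketch using Python's built-in `sqlite3` module, with hypothetical state names and fertility rates chosen so the overall average is exactly 2.5.

```python
import sqlite3

# Hypothetical placeholder data; the four rates average to 2.5.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE states (name TEXT, continent TEXT, fert_rate REAL);
    INSERT INTO states VALUES
        ('State A', 'Asia', 1.5), ('State B', 'Asia', 3.0),
        ('State C', 'Europe', 1.0), ('State D', 'Africa', 4.5);
""")

# The subquery on its own: the average over ALL states, not just Asia.
avg_rate = conn.execute("SELECT AVG(fert_rate) FROM states").fetchone()[0]

# Asian states below that overall average.
below_avg = conn.execute("""
    SELECT name, fert_rate
    FROM states
    WHERE continent = 'Asia'
      AND fert_rate <
        (SELECT AVG(fert_rate)
         FROM states);
""").fetchall()

print(avg_rate)   # 2.5
print(below_avg)  # [('State A', 1.5)]
```
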

## Subqueries inside `SELECT` clauses

When you use a subquery inside a `SELECT` clause like this, give it an alias, such as `countries_num` here, so that the resulting column has a usable name.

In [0]:
SELECT DISTINCT continent,
  (SELECT COUNT(*)
   FROM states
   WHERE prime_ministers.continent = states.continent) AS countries_num
FROM prime_ministers;
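
The subquery here refers to `prime_ministers.continent` from the outer query, so it is re-evaluated for each outer row. The sketch below illustrates this using Python's built-in `sqlite3` module; the rows are hypothetical placeholders, and an `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

# Hypothetical placeholder data: 3 Asian, 1 European and 2 African states.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prime_ministers (country TEXT, continent TEXT);
    CREATE TABLE states          (name TEXT, continent TEXT);
    INSERT INTO prime_ministers VALUES
        ('Country X', 'Asia'), ('Country Y', 'Asia'), ('Country Z', 'Europe');
    INSERT INTO states VALUES
        ('S1', 'Asia'), ('S2', 'Asia'), ('S3', 'Asia'),
        ('S4', 'Europe'), ('S5', 'Africa'), ('S6', 'Africa');
""")

# For each continent in prime_ministers, count the matching rows in states.
counts = conn.execute("""
    SELECT DISTINCT continent,
      (SELECT COUNT(*)
       FROM states
       WHERE prime_ministers.continent = states.continent) AS countries_num
    FROM prime_ministers
    ORDER BY continent;
""").fetchall()

print(counts)  # [('Asia', 3), ('Europe', 1)]
```

Africa is absent from the result because no row of `prime_ministers` has that continent, even though `states` does.
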

In [0]:
-- Select fields
SELECT *
  -- From populations
  FROM populations
-- Where life_expectancy is greater than
WHERE life_expectancy >
  -- 1.15 * subquery
  1.15 * 
   (SELECT AVG(life_expectancy)
    FROM populations
    WHERE year = 2015)
  AND year = 2015;

## Subquery inside `FROM` clause
### Build-up

In [0]:
SELECT continent, MAX(women_parli_perc) AS max_perc
FROM states
GROUP BY continent
ORDER BY continent;

### Focusing on records in monarchs
You can include multiple tables in a `FROM` clause by adding a comma between them.

In [0]:
SELECT monarchs.continent
FROM monarchs, states
WHERE monarchs.continent = states.continent
ORDER BY continent;

### Finishing off the subquery

In [0]:
SELECT DISTINCT monarchs.continent, subquery.max_perc
FROM monarchs, 
  (SELECT continent, MAX(women_parli_perc) AS max_perc
   FROM states
   GROUP BY continent) AS subquery
WHERE monarchs.continent = subquery.continent
ORDER BY continent;
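
The finished query can be tried on a small stand-in dataset using Python's built-in `sqlite3` module. All values below are hypothetical, and the `ORDER BY` column is qualified with the table name as a precaution against SQLite treating the bare `continent` as ambiguous.

```python
import sqlite3

# Hypothetical placeholder data; percentages are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monarchs (country TEXT, continent TEXT);
    CREATE TABLE states   (name TEXT, continent TEXT, women_parli_perc REAL);
    INSERT INTO monarchs VALUES
        ('Country X', 'Asia'), ('Country Y', 'Asia'), ('Country Z', 'Europe');
    INSERT INTO states VALUES
        ('S1', 'Asia', 10.0), ('S2', 'Asia', 20.0),
        ('S3', 'Europe', 30.5), ('S4', 'Africa', 40.0);
""")

# The subquery in FROM acts as a temporary table of per-continent maxima.
max_by_continent = conn.execute("""
    SELECT DISTINCT monarchs.continent, subquery.max_perc
    FROM monarchs,
      (SELECT continent, MAX(women_parli_perc) AS max_perc
       FROM states
       GROUP BY continent) AS subquery
    WHERE monarchs.continent = subquery.continent
    ORDER BY monarchs.continent;
""").fetchall()

print(max_by_continent)  # [('Asia', 20.0), ('Europe', 30.5)]
```
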

In [0]:
-- Select fields
SELECT name, continent, inflation_rate
  -- From countries
  FROM countries
    -- Join to economies
    INNER JOIN economies
      -- Match on code
      ON countries.code = economies.code
  -- Where year is 2015
  WHERE year = 2015
    -- And inflation rate in subquery (alias as subquery)
    AND inflation_rate IN (
        SELECT MAX(inflation_rate) AS max_inf
        FROM (
             SELECT name, continent, inflation_rate
             FROM countries
             INNER JOIN economies
             ON countries.code = economies.code
             WHERE year = 2015) AS subquery
        GROUP BY continent);

In [0]:
-- Select fields
SELECT DISTINCT name, total_investment, imports
  -- From table (with alias)
  FROM countries AS c
    -- Join with table (with alias)
    LEFT JOIN economies AS e
      -- Match on code
      ON (c.code = e.code
      -- and code in Subquery
        AND c.code IN (
          SELECT l.code
          FROM languages AS l
          WHERE official = 'true'
        ) )
  -- Where region and year are correct
  WHERE region = 'Central America' AND year = 2015
-- Order by field
ORDER BY name;

In [0]:
-- Select fields
SELECT region, continent, AVG(fertility_rate) AS avg_fert_rate
  -- From left table
  FROM countries AS c
    -- Join to right table
    INNER JOIN populations AS p
      -- Match on join condition
      ON c.code = p.country_code
  -- Where specific records matching some condition
  WHERE year = 2015
-- Group appropriately
GROUP BY region, continent
-- Order appropriately
ORDER BY avg_fert_rate;