# **Outline**

- [**1. INNER JOIN**](#1.-INNER-JOIN)
- [**2. Other types of JOINs**](#2.-Other-types-of-JOINs)
- [**3. Set Theory for SQL Joins**](#3.-Set-Theory-for-SQL-Joins)
- [**4. Subquerying**](#4.-Subquerying)

In [1]:
import duckdb
import pandas as pd

First, connect to DuckDB and create a table from each CSV file.

In [2]:
conn = duckdb.connect(database='data/database.duckdb', read_only=False)


# Creating and loading tables from CSV files
conn.sql("CREATE OR REPLACE TABLE cities AS SELECT * FROM read_csv_auto('data/countries/cities.csv');")
conn.sql("CREATE OR REPLACE TABLE countries AS SELECT * FROM read_csv_auto('data/countries/countries.csv');")
conn.sql("CREATE OR REPLACE TABLE economies AS SELECT * FROM read_csv_auto('data/countries/economies.csv');")
conn.sql("CREATE OR REPLACE TABLE languages AS SELECT * FROM read_csv_auto('data/countries/languages.csv');")
conn.sql("CREATE OR REPLACE TABLE populations AS SELECT * FROM read_csv_auto('data/countries/populations.csv');")
conn.sql("CREATE OR REPLACE TABLE currencies AS SELECT * FROM read_csv_auto('data/countries/currencies.csv');")
conn.sql("CREATE OR REPLACE TABLE economies2015 AS SELECT * FROM read_csv_auto('data/countries/economies2015.csv');")
conn.sql("CREATE OR REPLACE TABLE economies2019 AS SELECT * FROM read_csv_auto('data/countries/economies2019.csv');")


conn.sql("SHOW TABLES")

┌───────────────┐
│     name      │
│    varchar    │
├───────────────┤
│ cities        │
│ countries     │
│ currencies    │
│ economies     │
│ economies2015 │
│ economies2019 │
│ languages     │
│ populations   │
└───────────────┘

In [3]:
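# Helper: run a query on the DuckDB connection and display the result as a pandas DataFrame
from IPython.display import display

def sql(query):
    display(conn.sql(query).df())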

# **1. INNER JOIN**

- `INNER JOIN`: returns records that have matching values in both tables.
    - `ON`: keyword used to specify the condition that matches rows between the tables.
    - `USING()`: an alternative to `ON`, valid only when the join column has the same name in both tables.

```sql
SELECT left_table.id, left_val, right_val
FROM left_table
INNER JOIN right_table
ON left_table.id = right_table.id;
```
To shorten the code, we can alias the table names and use the `USING()` clause to specify the join column:

```sql
SELECT id, l.left_val, r.right_val
FROM left_table AS l
INNER JOIN right_table AS r
USING(id);
```

The result would be the following:

<center>
<img src="figures/diagrams/inner_join2.png" alt="drawing" width = 600/>
</center>


The clauses are processed in the following logical order, which differs from the order in which they are written:

1. `FROM left_table`
2. `INNER JOIN right_table`
3. `ON left_table.id = right_table.id`
4. `SELECT left_table.id, left_val, right_val`

The `table.column_name` format must be used when selecting columns that exist in both
tables to avoid a SQL error.
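Both forms, and the ambiguity error, can be tried outside the course database. This is a minimal sketch that assumes Python's built-in `sqlite3` module as a stand-in for DuckDB (both accept this SQL); the tables `left_table` and `right_table` and their values are hypothetical. It names the connection `db` so it does not overwrite the notebook's DuckDB `conn`.

```python
import sqlite3

# Hypothetical toy tables in an in-memory SQLite database (stand-in for DuckDB).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE left_table  (id INTEGER, left_val TEXT);
    CREATE TABLE right_table (id INTEGER, right_val TEXT);
    INSERT INTO left_table  VALUES (1, 'L1'), (2, 'L2'), (3, 'L3');
    INSERT INTO right_table VALUES (1, 'R1'), (3, 'R3'), (4, 'R4');
""")

# ON: id exists in both tables, so it must be qualified in SELECT.
on_rows = db.execute("""
    SELECT left_table.id, left_val, right_val
    FROM left_table
    INNER JOIN right_table
    ON left_table.id = right_table.id
    ORDER BY left_table.id
""").fetchall()

# USING(id): the shared column is merged, so it can be selected unqualified.
using_rows = db.execute("""
    SELECT id, l.left_val, r.right_val
    FROM left_table AS l
    INNER JOIN right_table AS r
    USING(id)
    ORDER BY id
""").fetchall()

# With ON, an unqualified id is ambiguous and the query is rejected.
ambiguous_error = ""
try:
    db.execute("""
        SELECT id FROM left_table
        INNER JOIN right_table ON left_table.id = right_table.id
    """)
except sqlite3.Error as exc:
    ambiguous_error = str(exc)

print(on_rows)                # [(1, 'L1', 'R1'), (3, 'L3', 'R3')]
print(using_rows == on_rows)  # True: only ids 1 and 3 exist in both tables
print(ambiguous_error)        # SQLite's "ambiguous column name" message
```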


In [4]:

# Inner join cities (left) with countries (right), matching cities.country_code to countries.code.
sql(
"""
SELECT *
FROM cities
INNER JOIN countries
ON cities.country_code = countries.code
LIMIT 3;
"""
)


# Complete the SELECT statement to keep only the name of the city, the name of the country, and the region the country is located in.
# Alias the name of the city AS city and the name of the country AS country.

sql(
"""
SELECT  
    cities.name AS city,          
    countries.country_name AS country,  
    countries.region
FROM cities
INNER JOIN countries
ON cities.country_code = countries.code
LIMIT 3;
"""
)


|  | name | country_code | city_proper_pop | metroarea_pop | urbanarea_pop | code | country_name | continent | region | surface_area | indep_year | local_name | gov_form | capital | cap_long | cap_lat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abidjan | CIV | 4765000 |  | 4765000 | CIV | Cote d'Ivoire | Africa | Western Africa | 322463.0 | 1960 | Cote dIvoire | Republic | Yamoussoukro | -4.0305 | 5.332 |
| 1 | Abu Dhabi | ARE | 1145000 |  | 1145000 | ARE | United Arab Emirates | Asia | Middle East | 83600.0 | 1971 | Al-Imarat al-´Arabiya al-Muttahida | Emirate Federation | Abu Dhabi | 54.3705 | 24.4764 |
| 2 | Abuja | NGA | 1235880 | 6000000.0 | 1235880 | NGA | Nigeria | Africa | Western Africa | 923768.0 | 1960 | Nigeria | Federal Republic | Abuja | 7.48906 | 9.05804 |


|  | city | country | region |
|---|---|---|---|
| 0 | Abidjan | Cote d'Ivoire | Western Africa |
| 1 | Abu Dhabi | United Arab Emirates | Middle East |
| 2 | Abuja | Nigeria | Western Africa |


In [5]:
# Join countries AS c (left) with economies AS e (right) on code.
# Select code from countries (aliased as country_code), country_name, year, and inflation_rate.
sql(
"""
SELECT  
    c.code AS country_code,          
    c.country_name,
    e.year,
    e.inflation_rate
FROM countries AS c
INNER JOIN economies AS e 
ON c.code = e.code
LIMIT 3;
"""
)

|  | country_code | country_name | year | inflation_rate |
|---|---|---|---|---|
| 0 | AFG | Afghanistan | 2010 | 2.179 |
| 1 | AFG | Afghanistan | 2015 | -1.549 |
| 2 | AGO | Angola | 2010 | 14.48 |


In [6]:
# Inner join countries AS c (left) with languages AS l (right).
# Select country_name aliased as country, the name column from languages aliased as language, and the official column from languages.
# Use USING(code) instead of ON.

sql(
"""
SELECT 
    c.country_name AS country, 
    l.name AS language,
    official
FROM countries AS c
INNER JOIN languages AS l
USING(code)
LIMIT 3;
"""
)

|  | country | language | official |
|---|---|---|---|
| 0 | Afghanistan | Dari | True |
| 1 | Afghanistan | Pashto | True |
| 2 | Afghanistan | Turkic | False |


## 1.1 Multiple Joins

We can combine multiple joins in a single query. Joins are evaluated from left to right: the second join is applied to the result of the first. For inner joins the order does not change the final rows, but with the outer joins covered in section 2 it can.

```sql
SELECT *
FROM left_table
INNER JOIN right_table
ON left_table.id = right_table.id
INNER JOIN another_table
ON left_table.id = another_table.id;
```

Also, if the join key is not unique, every row in the first table is paired with every matching row in the second table, so the number of rows multiplies (for example, 2 matching rows on each side produce 2 × 2 = 4 rows).



```sql
SELECT *
FROM left_table
INNER JOIN right_table
ON left_table.id = right_table.id
```

<center>
<img src="figures/diagrams/multiple_keys1.png" alt="drawing" width = 600/>
</center>

To avoid this, add more conditions to `ON` with the `AND` operator so that the tables are joined on multiple columns.

```sql
SELECT *
FROM left_table
INNER JOIN right_table
ON left_table.id = right_table.id
    AND left_table.date = right_table.date;
```


<center>
<img src="figures/diagrams/multiple_keys2.png" alt="drawing" width = 600/>
</center>
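The row multiplication and its fix can be reproduced on a toy example. This is a sketch that assumes `sqlite3` in place of DuckDB, with hypothetical tables in which `id` 1 appears once per date on each side.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE left_table  (id INTEGER, date TEXT, left_val TEXT);
    CREATE TABLE right_table (id INTEGER, date TEXT, right_val TEXT);
    -- id 1 appears twice on each side, once per date
    INSERT INTO left_table  VALUES (1, '2010', 'L2010'), (1, '2015', 'L2015');
    INSERT INTO right_table VALUES (1, '2010', 'R2010'), (1, '2015', 'R2015');
""")

# Joining on id alone pairs every left row with every right row sharing that id: 2 x 2 = 4 rows.
id_only = db.execute("""
    SELECT left_val, right_val
    FROM left_table
    INNER JOIN right_table
    ON left_table.id = right_table.id
    ORDER BY left_val, right_val
""").fetchall()

# Adding the date to the condition keeps only rows that agree on both columns.
id_and_date = db.execute("""
    SELECT left_val, right_val
    FROM left_table
    INNER JOIN right_table
    ON left_table.id = right_table.id
        AND left_table.date = right_table.date
    ORDER BY left_val
""").fetchall()

print(len(id_only))  # 4
print(id_and_date)   # [('L2010', 'R2010'), ('L2015', 'R2015')]
```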

In [7]:

sql("""
    SELECT *
    FROM populations
    """)

|  | pop_id | country_code | year | fertility_rate | life_expectancy | size |
|---|---|---|---|---|---|---|
| 0 | 20 | ABW | 2010 | 1.704 | 74.953537 | 101597.0 |
| 1 | 19 | ABW | 2015 | 1.647 | 75.573585 | 103889.0 |
| 2 | 2 | AFG | 2010 | 5.746 | 58.970829 | 27962207.0 |
| 3 | 1 | AFG | 2015 | 4.653 | 60.717171 | 32526562.0 |
| 4 | 12 | AGO | 2010 | 6.416 | 50.654171 | 21219954.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 429 | 352 | ZAF | 2015 | 2.339 | 57.440902 | 55011977.0 |
| 430 | 431 | ZMB | 2010 | 5.687 | 56.383854 | 13917439.0 |
| 431 | 430 | ZMB | 2015 | 5.284 | 60.785683 | 16211767.0 |
| 432 | 433 | ZWE | 2010 | 4.048 | 49.574659 | 13973897.0 |


In [8]:
sql("""
    SELECT *
    FROM countries
    """)

|  | code | country_name | continent | region | surface_area | indep_year | local_name | gov_form | capital | cap_long | cap_lat |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | Afghanistan | Asia | Southern and Central Asia | 652090.0 | 1919.0 | Afganistan/Afqanestan | Islamic Emirate | Kabul | 69.17610 | 34.5228 |
| 1 | NLD | Netherlands | Europe | Western Europe | 41526.0 | 1581.0 | Nederland | Constitutional Monarchy | Amsterdam | 4.89095 | 52.3738 |
| 2 | ALB | Albania | Europe | Southern Europe | 28748.0 | 1912.0 | Shqiperia | Republic | Tirane | 19.81720 | 41.3317 |
| 3 | DZA | Algeria | Africa | Northern Africa | 2381740.0 | 1962.0 | Al-Jazair/Algerie | Republic | Algiers | 3.05097 | 36.7397 |
| 4 | ASM | American Samoa | Oceania | Polynesia | 199.0 |  | Amerika Samoa | US Territory | Pago Pago | -170.69100 | -14.2846 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 200 | EST | Estonia | Europe | Baltic Countries | 45227.0 | 1991.0 | Eesti | Republic | Tallinn | 24.75860 | 59.4392 |
| 201 | USA | United States | North America | North America | 9363520.0 | 1776.0 | United States | Federal Republic | Washington D.C. | -77.03200 | 38.8895 |
| 202 | VIR | Virgin Islands, U.S. | North America | Caribbean | 347.0 |  | Virgin Islands of the United States | US Territory | Charlotte Amalie | -64.89630 | 18.3358 |
| 203 | ZWE | Zimbabwe | Africa | Eastern Africa | 390757.0 | 1980.0 | Zimbabwe | Republic | Harare | 31.06720 | -17.8312 |


In [9]:
# Inner join countries AS c (left) with populations AS p (right), matching c.code to p.country_code.
# Select country_name, year and fertility_rate.
sql(
"""
SELECT 
    c.country_name, 
    p.year,
    p.fertility_rate
FROM countries AS c
INNER JOIN populations AS p
ON c.code = p.country_code
LIMIT 3;
"""
)


# Chain another inner join with the economies table AS e, matching on code.
# Select country_name and, using table aliases, year and unemployment_rate from economies.
# Filter the records to show only Albania.
sql(
"""
SELECT 
    c.country_name, 
    p.year,
    p.fertility_rate,
    e.year,
    e.unemployment_rate
FROM countries AS c
INNER JOIN populations AS p
ON c.code = p.country_code
INNER JOIN economies AS e
ON c.code = e.code
WHERE c.country_name = 'Albania';
"""
)

# The last join matched only c.code = e.code, not the year.
# As a result, each of Albania's two population rows is paired with both of its economies rows,
# giving four records instead of one per year (2010 and 2015).
# To fix this, join economies on both code and year so that only two records for Albania remain.

sql(
"""
SELECT 
    c.country_name, 
    p.year,
    p.fertility_rate,
    e.year,
    e.unemployment_rate
FROM countries AS c
INNER JOIN populations AS p
ON c.code = p.country_code
INNER JOIN economies AS e
ON c.code = e.code
    AND e.year = p.year
WHERE c.country_name = 'Albania';
"""
)


|  | country_name | year | fertility_rate |
|---|---|---|---|
| 0 | Aruba | 2010 | 1.704 |
| 1 | Aruba | 2015 | 1.647 |
| 2 | Afghanistan | 2010 | 5.746 |


|  | country_name | year | fertility_rate | year_2 | unemployment_rate |
|---|---|---|---|---|---|
| 0 | Albania | 2010 | 1.663 | 2015 | 17.1 |
| 1 | Albania | 2015 | 1.793 | 2015 | 17.1 |
| 2 | Albania | 2010 | 1.663 | 2010 | 14.0 |
| 3 | Albania | 2015 | 1.793 | 2010 | 14.0 |


|  | country_name | year | fertility_rate | year_2 | unemployment_rate |
|---|---|---|---|---|---|
| 0 | Albania | 2010 | 1.663 | 2010 | 14.0 |
| 1 | Albania | 2015 | 1.793 | 2015 | 17.1 |


# **2. Other types of JOINs**


<center>
<img src="figures/diagrams/all-joins.png" alt="drawing" />
</center>

## **2.1. LEFT and RIGHT JOINs**

- `LEFT JOIN`: returns all records from the left table, plus the matching records from the right table. Where there is no match, the right-hand columns are NULL. It is also known as `LEFT OUTER JOIN`.

```sql
SELECT *
FROM left_table
LEFT JOIN right_table
ON left_table.id = right_table.id
```

<center>
<img src="figures/diagrams/left_join2.png" alt="drawing" width = 600/>
</center>

- `RIGHT JOIN`: returns all records from the right table, plus the matching records from the left table. Where there is no match, the left-hand columns are NULL. It is also known as `RIGHT OUTER JOIN`.

```sql
SELECT *
FROM left_table
RIGHT JOIN right_table
ON left_table.id = right_table.id
```

<center>
<img src="figures/diagrams/right_join2.png" alt="drawing" width = 600/>
</center>


`RIGHT JOIN` is used less often than `LEFT JOIN`, because any `RIGHT JOIN` can be rewritten as a `LEFT JOIN` by swapping the order of the two tables.
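A sketch of the difference, assuming `sqlite3` as a stand-in for DuckDB and hypothetical tables. Because SQLite only gained `RIGHT JOIN` in version 3.39, the right join is written here in its equivalent form: a `LEFT JOIN` with the tables swapped.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE left_table  (id INTEGER, left_val TEXT);
    CREATE TABLE right_table (id INTEGER, right_val TEXT);
    INSERT INTO left_table  VALUES (1, 'L1'), (2, 'L2'), (3, 'L3');
    INSERT INTO right_table VALUES (1, 'R1'), (3, 'R3'), (4, 'R4');
""")

# LEFT JOIN: every left row is kept; id 2 has no match, so right_val is NULL (None).
left_rows = db.execute("""
    SELECT left_table.id, left_val, right_val
    FROM left_table
    LEFT JOIN right_table
    ON left_table.id = right_table.id
    ORDER BY left_table.id
""").fetchall()

# "left_table RIGHT JOIN right_table" rewritten as a LEFT JOIN with the tables swapped:
# every right row is kept; id 4 has no match, so left_val is NULL.
right_rows = db.execute("""
    SELECT right_table.id, left_val, right_val
    FROM right_table
    LEFT JOIN left_table
    ON left_table.id = right_table.id
    ORDER BY right_table.id
""").fetchall()

print(left_rows)   # [(1, 'L1', 'R1'), (2, 'L2', None), (3, 'L3', 'R3')]
print(right_rows)  # [(1, 'L1', 'R1'), (3, 'L3', 'R3'), (4, None, 'R4')]
```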

In [10]:
sql("""
    SELECT * 
    FROM countries
    """)

|  | code | country_name | continent | region | surface_area | indep_year | local_name | gov_form | capital | cap_long | cap_lat |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | Afghanistan | Asia | Southern and Central Asia | 652090.0 | 1919.0 | Afganistan/Afqanestan | Islamic Emirate | Kabul | 69.17610 | 34.5228 |
| 1 | NLD | Netherlands | Europe | Western Europe | 41526.0 | 1581.0 | Nederland | Constitutional Monarchy | Amsterdam | 4.89095 | 52.3738 |
| 2 | ALB | Albania | Europe | Southern Europe | 28748.0 | 1912.0 | Shqiperia | Republic | Tirane | 19.81720 | 41.3317 |
| 3 | DZA | Algeria | Africa | Northern Africa | 2381740.0 | 1962.0 | Al-Jazair/Algerie | Republic | Algiers | 3.05097 | 36.7397 |
| 4 | ASM | American Samoa | Oceania | Polynesia | 199.0 |  | Amerika Samoa | US Territory | Pago Pago | -170.69100 | -14.2846 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 200 | EST | Estonia | Europe | Baltic Countries | 45227.0 | 1991.0 | Eesti | Republic | Tallinn | 24.75860 | 59.4392 |
| 201 | USA | United States | North America | North America | 9363520.0 | 1776.0 | United States | Federal Republic | Washington D.C. | -77.03200 | 38.8895 |
| 202 | VIR | Virgin Islands, U.S. | North America | Caribbean | 347.0 |  | Virgin Islands of the United States | US Territory | Charlotte Amalie | -64.89630 | 18.3358 |
| 203 | ZWE | Zimbabwe | Africa | Eastern Africa | 390757.0 | 1980.0 | Zimbabwe | Republic | Harare | 31.06720 | -17.8312 |


In [11]:
# Perform an inner join with cities AS c1 on the left and countries as c2 on the right.
# Use code as the field to merge your tables on.
sql(
"""
SELECT 
    c1.name AS city,
    code,
    c2.country_name AS country,
    region,
    city_proper_pop
FROM cities AS c1
INNER JOIN countries AS c2
ON c1.country_code = c2.code
ORDER BY code DESC
LIMIT 3;
"""  
)


# Same query, but with a LEFT JOIN instead of an INNER JOIN.
sql(
"""
SELECT 
    c1.name AS city,
    code,
    c2.country_name AS country,
    region,
    city_proper_pop
FROM cities AS c1
LEFT JOIN countries AS c2
ON c1.country_code = c2.code
ORDER BY code DESC
LIMIT 3;
"""  
)



|  | city | code | country | region | city_proper_pop |
|---|---|---|---|---|---|
| 0 | Harare | ZWE | Zimbabwe | Eastern Africa | 1606000 |
| 1 | Lusaka | ZMB | Zambia | Eastern Africa | 1742979 |
| 2 | Cape Town | ZAF | South Africa | Southern Africa | 3740026 |


|  | city | code | country | region | city_proper_pop |
|---|---|---|---|---|---|
| 0 | Harare | ZWE | Zimbabwe | Eastern Africa | 1606000 |
| 1 | Lusaka | ZMB | Zambia | Eastern Africa | 1742979 |
| 2 | Ekurhuleni | ZAF | South Africa | Southern Africa | 3178470 |


In [12]:
# Use countries AS c (left) and economies AS e (right).
# To calculate the average GDP per capita per region, group by region.
# Select region and the average GDP per capita with AVG(), aliased AS avg_gdp.
# Order the result by avg_gdp from highest to lowest and return only the first 10 records.
# Note: filtering on year in WHERE drops countries with no matching economies row,
# so here the LEFT JOIN returns the same rows an INNER JOIN would.

sql(
"""
SELECT 
    region,
    AVG(gdp_percapita) AS avg_gdp
FROM countries AS c
LEFT JOIN economies AS e
USING(code)
WHERE year = 2010
GROUP BY region
ORDER BY avg_gdp DESC
LIMIT 10;
"""  
)

# The query below uses a LEFT JOIN; the one after it returns the same rows using a RIGHT JOIN with the table order swapped.
sql(
"""
SELECT 
    countries.country_name AS country, 
    languages.name AS language,
    percent
FROM countries
LEFT JOIN languages
USING(code)
ORDER BY language
LIMIT 3;
"""  
)

sql(
"""
SELECT 
    countries.country_name AS country, 
    languages.name AS language,
    percent
FROM languages
RIGHT JOIN countries
USING(code)
ORDER BY language
LIMIT 3;
"""  
)

|  | region | avg_gdp |
|---|---|---|
| 0 | Western Europe | 58130.962857 |
| 1 | Nordic Countries | 57073.998 |
| 2 | North America | 47911.51 |
| 3 | Australia and New Zealand | 44792.385 |
| 4 | British Islands | 43588.33 |
| 5 | Eastern Asia | 24962.808 |
| 6 | Southern Europe | 22926.410909 |
| 7 | Middle East | 18204.641765 |
| 8 | Baltic Countries | 12631.03 |
| 9 | Caribbean | 11413.339462 |


|  | country | language | percent |
|---|---|---|---|
| 0 | Ethiopia | Afar | 1.7 |
| 1 | Eritrea | Afar |  |
| 2 | Djibouti | Afar |  |


|  | country | language | percent |
|---|---|---|---|
| 0 | Ethiopia | Afar | 1.7 |
| 1 | Eritrea | Afar |  |
| 2 | Djibouti | Afar |  |


## **2.2. FULL JOIN**

- `FULL JOIN`: returns all records from both tables, matching rows where the join condition holds. Where a row has no match, the columns from the other table are NULL. It is also known as `FULL OUTER JOIN`.

```sql
SELECT *
FROM left_table
FULL JOIN right_table
ON left_table.id = right_table.id
```


<center>
<img src="figures/diagrams/full_join2.png" alt="drawing" width = 600/>
</center>
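Engines without `FULL JOIN` (for example SQLite before 3.39) can emulate it. The sketch below assumes `sqlite3` and hypothetical tables, and combines a `LEFT JOIN` with the right-table rows that found no match. This emulation is a substitute technique for illustration, not how DuckDB executes `FULL JOIN`.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE left_table  (id INTEGER, left_val TEXT);
    CREATE TABLE right_table (id INTEGER, right_val TEXT);
    INSERT INTO left_table  VALUES (1, 'L1'), (2, 'L2'), (3, 'L3');
    INSERT INTO right_table VALUES (1, 'R1'), (3, 'R3'), (4, 'R4');
""")

full_rows = db.execute("""
    -- every left row, with its right-hand match or NULLs
    SELECT left_table.id, right_table.id, left_val, right_val
    FROM left_table
    LEFT JOIN right_table ON left_table.id = right_table.id
    UNION ALL
    -- plus the right rows with no partner on the left
    -- (NOT IN is safe here because left_table.id contains no NULLs)
    SELECT NULL, right_table.id, NULL, right_val
    FROM right_table
    WHERE right_table.id NOT IN (SELECT id FROM left_table)
""").fetchall()

# Sort by whichever id is present, since one side is None for unmatched rows.
for row in sorted(full_rows, key=lambda r: r[0] if r[0] is not None else r[1]):
    print(row)
# (1, 1, 'L1', 'R1')
# (2, None, 'L2', None)
# (3, 3, 'L3', 'R3')
# (None, 4, None, 'R4')
```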

In [13]:
# Select country_name as country, code, region, and basic_unit.
# Perform a full join of countries (left) with currencies (right) using code.
# Filter for the North America region or NULL country names.
print('FULL JOIN\n')
sql(
"""
SELECT  country_name AS country,
        code,
        region,
        basic_unit
FROM countries
FULL JOIN currencies
USING (code)
WHERE region = 'North America'
	OR country_name IS NULL
ORDER BY region;
"""  
)


# Repeat the same query as before, turning your full join into a left join with the currencies table.
print('LEFT JOIN\n')
sql(
"""
SELECT  country_name AS country,
        code,
        region,
        basic_unit
FROM countries
LEFT JOIN currencies
USING (code)
WHERE region = 'North America'
	OR country_name IS NULL
ORDER BY region;
"""  
)


# Repeat the same query again, this time performing an inner join of countries with currencies.
print('INNER JOIN\n')
sql(
"""
SELECT  country_name AS country,
        code,
        region,
        basic_unit
FROM countries
INNER JOIN currencies
USING (code)
WHERE region = 'North America'
	OR country_name IS NULL
ORDER BY region;
"""  
)



FULL JOIN



|  | country | code | region | basic_unit |
|---|---|---|---|---|
| 0 | Bermuda | BMU | North America | Bermudian dollar |
| 1 | Canada | CAN | North America | Canadian dollar |
| 2 | United States | USA | North America | United States dollar |
| 3 | Greenland | GRL | North America |  |
| 4 |  | CCK |  | Australian dollar |
| 5 |  | AIA |  | East Caribbean dollar |
| 6 |  | FLK |  | Falkland Islands pound |
| 7 |  | NIU |  | New Zealand dollar |
| 8 |  | SHN |  | Saint Helena pound |
| 9 |  | IOT |  | United States dollar |


LEFT JOIN



|  | country | code | region | basic_unit |
|---|---|---|---|---|
| 0 | Bermuda | BMU | North America | Bermudian dollar |
| 1 | Canada | CAN | North America | Canadian dollar |
| 2 | United States | USA | North America | United States dollar |
| 3 | Greenland | GRL | North America |  |


INNER JOIN



|  | country | code | region | basic_unit |
|---|---|---|---|---|
| 0 | Bermuda | BMU | North America | Bermudian dollar |
| 1 | Canada | CAN | North America | Canadian dollar |
| 2 | United States | USA | North America | United States dollar |


In [14]:
# Select country_name as country, region, the language name, basic_unit and frac_unit.
# FULL JOIN countries AS c1 (left) with languages AS l (right), using code.
# Chain another FULL JOIN with currencies AS c2 on the right, again using code.
# The LIKE pattern 'M%esia' keeps regions such as Melanesia and Micronesia.

sql(
"""
SELECT 
	c1.country_name AS country, 
    region, 
    l.name AS language,
	basic_unit, 
    frac_unit
FROM countries as c1 
FULL JOIN languages as l
USING(code)
FULL JOIN currencies as c2
USING(code)
WHERE region LIKE 'M%esia';
"""  
)


|  | country | region | language | basic_unit | frac_unit |
|---|---|---|---|---|---|
| 0 | Kiribati | Micronesia | English | Australian dollar | Cent |
| 1 | Marshall Islands | Micronesia | Other | United States dollar | Cent |
| 2 | Nauru | Micronesia | Other | Australian dollar | Cent |
| 3 | Palau | Micronesia | Other | United States dollar | Cent |
| 4 | Papua New Guinea | Melanesia | Other | Papua New Guinean kina | Toea |
| 5 | Solomon Islands | Melanesia | indigenous | Solomon Islands dollar | Cent |
| 6 | New Caledonia | Melanesia | Other | CFP franc | Centime |
| 7 | Vanuatu | Melanesia | Other | Vanuatu vatu |  |
| 8 | Kiribati | Micronesia | Kiribati | Australian dollar | Cent |
| 9 | Marshall Islands | Micronesia | Marshallese | United States dollar | Cent |


## 2.3 CROSS JOIN

- `CROSS JOIN`: returns the Cartesian product of the tables involved: every row of the first table is paired with every row of the second, so joining tables with n and m rows yields n × m rows. No join condition is needed, because every combination is kept.

```sql
SELECT id1, id2
FROM table1
CROSS JOIN table2
```

<center>
<img src="figures/diagrams/cross_join.png" alt="drawing" width = 600/>
</center>
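The n × m row count is easy to check on toy data. A sketch assuming `sqlite3` in place of DuckDB and hypothetical tables `table1` (3 rows) and `table2` (2 rows):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE table1 (id1 TEXT);
    CREATE TABLE table2 (id2 TEXT);
    INSERT INTO table1 VALUES ('A'), ('B'), ('C');
    INSERT INTO table2 VALUES ('x'), ('y');
""")

# No ON/USING clause: every row of table1 is paired with every row of table2.
pairs = db.execute("""
    SELECT id1, id2
    FROM table1
    CROSS JOIN table2
    ORDER BY id1, id2
""").fetchall()

print(len(pairs))  # 6, i.e. 3 rows x 2 rows
print(pairs)       # [('A', 'x'), ('A', 'y'), ('B', 'x'), ('B', 'y'), ('C', 'x'), ('C', 'y')]
```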

In [15]:
print('INNER JOIN\n')
sql(
"""
SELECT 
    c.country_name AS country,
    l.name AS language
FROM countries AS c
INNER JOIN languages AS l
USING(code)
WHERE c.code IN ('PAK','IND')
	AND l.code IN ('PAK','IND');
"""  
)

# Change INNER JOIN to a CROSS JOIN to list every combination of these two countries with every language spoken in either of them.
print('CROSS JOIN\n')
sql(
"""
SELECT 
    c.country_name AS country,
    l.name AS language
FROM countries AS c
CROSS JOIN languages AS l
WHERE c.code IN ('PAK','IND')
	AND l.code IN ('PAK','IND');
"""  
)

INNER JOIN



|  | country | language |
|---|---|---|
| 0 | Pakistan | Punjabi |
| 1 | Pakistan | Sindhi |
| 2 | Pakistan | Saraiki |
| 3 | Pakistan | Pashto |
| 4 | Pakistan | Urdu |
| 5 | Pakistan | Balochi |
| 6 | Pakistan | Hindko |
| 7 | Pakistan | Brahui |
| 8 | Pakistan | English |
| 9 | Pakistan | Burushaski |


CROSS JOIN



|  | country | language |
|---|---|---|
| 0 | Pakistan | Punjabi |
| 1 | Pakistan | Sindhi |
| 2 | Pakistan | Saraiki |
| 3 | Pakistan | Pashto |
| 4 | Pakistan | Urdu |
| 5 | Pakistan | Balochi |
| 6 | Pakistan | Hindko |
| 7 | Pakistan | Brahui |
| 8 | Pakistan | English |
| 9 | Pakistan | Burushaski |


## 2.4 SELF JOIN

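- `SELF JOIN`: not a separate keyword, but a regular join in which a table is joined with itself, using two different aliases to tell the copies apart. This is useful for comparing values in a column with other values in the same column, such as a country's population in 2010 versus 2015.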
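A sketch of a self join, assuming `sqlite3` in place of DuckDB and a toy `populations` table with made-up values (not the course data). The aliases `p1` and `p2` let the same table act as both sides, and the `WHERE` clause keeps only the 2010 row on the left paired with the 2015 row on the right.

```python
import sqlite3

# Toy populations table with made-up sizes for two hypothetical countries.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE populations (country_code TEXT, year INTEGER, size INTEGER);
    INSERT INTO populations VALUES
        ('AAA', 2010, 100), ('AAA', 2015, 110),
        ('BBB', 2010, 200), ('BBB', 2015, 190);
""")

# Without a filter, each country's 2 rows pair with its own 2 rows: 2 x 2 x 2 countries = 8.
unfiltered_count = db.execute("""
    SELECT COUNT(*)
    FROM populations AS p1
    INNER JOIN populations AS p2
    ON p1.country_code = p2.country_code
""").fetchone()[0]

# Keep only the 2010 row (p1) paired with the 2015 row (p2).
growth = db.execute("""
    SELECT p1.country_code, p1.size AS size2010, p2.size AS size2015
    FROM populations AS p1
    INNER JOIN populations AS p2
    ON p1.country_code = p2.country_code
    WHERE p1.year = 2010
        AND p1.year = p2.year - 5
    ORDER BY p1.country_code
""").fetchall()

print(unfiltered_count)  # 8
print(growth)            # [('AAA', 100, 110), ('BBB', 200, 190)]
```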

In [16]:
# Perform an inner join of populations with itself ON country_code, aliased p1 and p2 respectively.
# Select the country_code from p1 and the size field from both p1 and p2, 
# aliasing p1.size as size2010 and p2.size as size2015 (in that order).

sql(
"""
SELECT 
	p1.country_code, 
    p1.size AS size2010, 
    p2.size AS size2015
FROM populations AS p1
INNER JOIN populations AS p2
USING(country_code)
LIMIT 5;
"""  
)

# To compare only 2010 with 2015, remove the unwanted pairs with a WHERE clause
# that keeps rows where p1.year is 2010 and p1.year equals p2.year - 5.

sql(
"""
SELECT 
	p1.country_code, 
    p1.size AS size2010, 
    p2.size AS size2015
FROM populations AS p1
INNER JOIN populations AS p2
USING(country_code)
WHERE p1.year = 2010
    AND p1.year = p2.year - 5
LIMIT 5;
"""  
)

|  | country_code | size2010 | size2015 |
|---|---|---|---|
| 0 | ABW | 101597 | 103889 |
| 1 | ABW | 103889 | 103889 |
| 2 | AFG | 27962207 | 32526562 |
| 3 | AFG | 32526562 | 32526562 |
| 4 | AGO | 21219954 | 25021974 |


|  | country_code | size2010 | size2015 |
|---|---|---|---|
| 0 | ABW | 101597 | 103889 |
| 1 | AFG | 27962207 | 32526562 |
| 2 | AGO | 21219954 | 25021974 |
| 3 | ALB | 2913021 | 2889167 |
| 4 | AND | 84419 | 70473 |


# **3. Set Theory for SQL Joins**

UNION, INTERSECT, and EXCEPT are set operations that combine or compare the results of two or more SELECT statements.


<center>
<img src="figures/diagrams/venn.png" alt="drawing" width = 500/>
</center>


## **3.1 UNION**


- `UNION`: returns all unique rows from both tables. 
    - `UNION ALL`: returns all rows from both tables, including duplicates.


<center>
<img src="figures/diagrams/union.png" alt="drawing" width = 400 style="margin-right: 30px;"/>

<img src="figures/diagrams/union_all.png" alt="drawing" width = 335/>
</center>

The syntax for the `UNION` and `UNION ALL` is the following:

```sql
SELECT *
FROM left_table
UNION  -- or UNION ALL to keep duplicate rows
SELECT *
FROM right_table;
```

To use `UNION`, both `SELECT` statements must return the same number of columns, with compatible data types in the same order:

<center>
<img src="figures/diagrams/union_syntax_2.png" alt="drawing" width = 500/>
</center>

The same requirements apply to `UNION ALL`; it differs from `UNION` only in keeping duplicate rows.
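A sketch contrasting the two operators, assuming `sqlite3` in place of DuckDB and hypothetical tables that share the row `(2, 'B')`. It also shows that `UNION ALL` rejects queries with different numbers of columns.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE left_table  (id INTEGER, val TEXT);
    CREATE TABLE right_table (id INTEGER, val TEXT);
    INSERT INTO left_table  VALUES (1, 'A'), (2, 'B');
    INSERT INTO right_table VALUES (2, 'B'), (3, 'C');
""")

# UNION removes the duplicate (2, 'B') row.
union_rows = db.execute("""
    SELECT id, val FROM left_table
    UNION
    SELECT id, val FROM right_table
    ORDER BY id
""").fetchall()

# UNION ALL keeps it.
union_all_rows = db.execute("""
    SELECT id, val FROM left_table
    UNION ALL
    SELECT id, val FROM right_table
    ORDER BY id
""").fetchall()

# Like UNION, UNION ALL fails when the SELECTs return different numbers of columns.
mismatch_error = ""
try:
    db.execute("SELECT id FROM left_table UNION ALL SELECT id, val FROM right_table")
except sqlite3.Error as exc:
    mismatch_error = str(exc)

print(union_rows)      # [(1, 'A'), (2, 'B'), (3, 'C')]
print(union_all_rows)  # [(1, 'A'), (2, 'B'), (2, 'B'), (3, 'C')]
print(mismatch_error)
```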

In [21]:
# Begin your query by selecting all fields from economies2015.
# Create a second query that selects all fields from economies2019.
# Perform a set operation to combine the two queries you just created, ensuring you do not return duplicates.
# order by code and year

sql(
"""
SELECT *
FROM economies2015
UNION 
SELECT * 
FROM economies2019
ORDER BY code, year
LIMIT 5;
"""  
)

|  | code | year | income_group | gross_savings |
|---|---|---|---|---|
| 0 | ABW | 2015 | High income | 14.867852 |
| 1 | AGO | 2015 | Lower middle income | 25.021327 |
| 2 | AGO | 2019 | Lower middle income | 25.524848 |
| 3 | ALB | 2015 | Upper middle income | 16.863981 |
| 4 | ALB | 2019 | Upper middle income | 14.499826 |


In [24]:
# Perform an appropriate set operation that determines all pairs of country code and year (in that order) 
# from economies and populations, excluding duplicates.
# Order by country code and year.

sql(
"""
SELECT 
    code, 
    year
FROM economies
UNION
SELECT 
    country_code AS code,
    year
FROM populations
ORDER BY code, year
LIMIT 5;
"""  
)


# Amend the query to return all combinations (including duplicates) of country code and year in the economies or the populations tables.

sql(
"""
SELECT 
    code, 
    year
FROM economies
UNION ALL
SELECT 
    country_code AS code,
    year
FROM populations
ORDER BY code, year
LIMIT 5;
"""  
)

|  | code | year |
|---|---|---|
| 0 | ABW | 2010 |
| 1 | ABW | 2015 |
| 2 | AFG | 2010 |
| 3 | AFG | 2015 |
| 4 | AGO | 2010 |


|  | code | year |
|---|---|---|
| 0 | ABW | 2010 |
| 1 | ABW | 2015 |
| 2 | AFG | 2010 |
| 3 | AFG | 2010 |
| 4 | AFG | 2015 |


## 3.2 INTERSECT

- `INTERSECT`: returns all unique rows that are in both tables.

The syntax for the `INTERSECT` is the following:

```sql
SELECT id, val
FROM left_table
INTERSECT
SELECT id, val
FROM right_table;
```

<center>
<img src="figures/diagrams/intersect_two_columns.png" alt="drawing" width = 500/>
</center>

The difference between `INTERSECT` and `INNER JOIN` is that `INTERSECT` returns only the distinct rows that appear in the result sets of both `SELECT` statements. In contrast, `INNER JOIN` combines rows from two or more tables where the join condition is true, resulting in rows that include columns from both tables for which the specified condition matches.

<center>
<img src="figures/diagrams/diff_innerjoin_intersect.png" alt="drawing" width = 700/>
</center>
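The contrast can be checked on toy data. A sketch assuming `sqlite3` in place of DuckDB and hypothetical tables: `left_table` holds a duplicate `(1, 'A')`, and `id` 2 carries a different `val` on each side.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE left_table  (id INTEGER, val TEXT);
    CREATE TABLE right_table (id INTEGER, val TEXT);
    INSERT INTO left_table  VALUES (1, 'A'), (1, 'A'), (2, 'B');
    INSERT INTO right_table VALUES (1, 'A'), (2, 'Z');
""")

# INTERSECT compares whole rows: (2, 'B') and (2, 'Z') differ, and the duplicate (1, 'A') collapses.
intersect_rows = db.execute("""
    SELECT id, val FROM left_table
    INTERSECT
    SELECT id, val FROM right_table
""").fetchall()

# INNER JOIN matches on id only, returns columns from both tables and keeps duplicates.
join_rows = db.execute("""
    SELECT l.id, l.val, r.val
    FROM left_table AS l
    INNER JOIN right_table AS r
    ON l.id = r.id
    ORDER BY l.id
""").fetchall()

print(intersect_rows)  # [(1, 'A')]
print(join_rows)       # [(1, 'A', 'A'), (1, 'A', 'A'), (2, 'B', 'Z')]
```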

In [25]:
# Return all city names that are also country names.
sql(
"""
SELECT 
    name
FROM cities
INTERSECT
SELECT 
    country_name AS name
FROM countries;
"""  
)

|  | name |
|---|---|
| 0 | Singapore |


## 3.3 EXCEPT

- `EXCEPT`: returns all unique rows from the first table (left) that are not in the second table (right).

```sql
SELECT *
FROM left_table
EXCEPT
SELECT *
FROM right_table;
```

<center>
<img src="figures/diagrams/except.png" alt="drawing" width = 600/>
</center>
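A sketch of `EXCEPT` on toy data, assuming `sqlite3` in place of DuckDB and hypothetical tables. It also shows that swapping the two queries changes the result.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE left_table  (id INTEGER, val TEXT);
    CREATE TABLE right_table (id INTEGER, val TEXT);
    INSERT INTO left_table  VALUES (1, 'A'), (2, 'B'), (3, 'C'), (3, 'C');
    INSERT INTO right_table VALUES (2, 'B'), (4, 'D');
""")

# Rows of left_table not present in right_table; the duplicate (3, 'C') appears once.
left_minus_right = db.execute("""
    SELECT id, val FROM left_table
    EXCEPT
    SELECT id, val FROM right_table
    ORDER BY id
""").fetchall()

# Swapping the operands changes the answer: EXCEPT is not symmetric.
right_minus_left = db.execute("""
    SELECT id, val FROM right_table
    EXCEPT
    SELECT id, val FROM left_table
    ORDER BY id
""").fetchall()

print(left_minus_right)  # [(1, 'A'), (3, 'C')]
print(right_minus_left)  # [(4, 'D')]
```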

In [26]:
# Return all cities that do not have the same name as a country.
sql(
"""
SELECT 
    name
FROM cities
EXCEPT
SELECT 
    country_name AS name
FROM countries
ORDER BY name
LIMIT 5;
"""  
)

|  | name |
|---|---|
| 0 | Abidjan |
| 1 | Abu Dhabi |
| 2 | Abuja |
| 3 | Accra |
| 4 | Addis Ababa |
| ... | ... |
| 230 | Yerevan |
| 231 | Yokohama |
| 232 | Zhengzhou |
| 233 | Zhongshan |


# **4. Subquerying**

## 4.1 Semi joins and anti joins

Semi joins and anti joins filter the rows of the first (left) table based on the results of a subquery on the second (right) table. Unlike the joins above, they return only columns from the left table.

- `semi join`: returns rows from the first table (left) that have a match in the second table (right). It can be written with `WHERE ... IN (subquery)`.

```sql
SELECT *
FROM left_table
WHERE col1 IN
    (SELECT col2
    FROM right_table);
```

<center>
<img src="figures/diagrams/semi2.png" alt="drawing" width = 600/>
</center>

- `anti join`: returns rows from the first table that have no match in the second table. It can be written with `WHERE ... NOT IN (subquery)`; note that if the subquery returns any NULL, `NOT IN` matches no rows at all.

```sql
SELECT *
FROM left_table
WHERE col1 NOT IN
    (SELECT col2
    FROM right_table);
```

<center>
<img src="figures/diagrams/anti2.png" alt="drawing" width = 600/>
</center>
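A sketch of both patterns, assuming `sqlite3` in place of DuckDB and hypothetical tables. The last part shows the NULL pitfall of `NOT IN` and uses `NOT EXISTS`, another standard way to write an anti join, which is unaffected.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE left_table  (col1 INTEGER);
    CREATE TABLE right_table (col2 INTEGER);
    INSERT INTO left_table  VALUES (1), (2), (3);
    INSERT INTO right_table VALUES (1), (3);
""")

# Semi join: left rows whose col1 appears in right_table.
semi = db.execute("""
    SELECT * FROM left_table
    WHERE col1 IN (SELECT col2 FROM right_table)
    ORDER BY col1
""").fetchall()

# Anti join: left rows whose col1 does not appear in right_table.
anti = db.execute("""
    SELECT * FROM left_table
    WHERE col1 NOT IN (SELECT col2 FROM right_table)
    ORDER BY col1
""").fetchall()

# Pitfall: once the subquery returns a NULL, `2 NOT IN (1, 3, NULL)` is NULL rather than true,
# so NOT IN keeps no rows. NOT EXISTS still behaves as an anti join.
db.execute("INSERT INTO right_table VALUES (NULL)")
anti_with_null = db.execute("""
    SELECT * FROM left_table
    WHERE col1 NOT IN (SELECT col2 FROM right_table)
""").fetchall()
anti_not_exists = db.execute("""
    SELECT * FROM left_table AS l
    WHERE NOT EXISTS (SELECT 1 FROM right_table AS r WHERE r.col2 = l.col1)
    ORDER BY col1
""").fetchall()

print(semi)             # [(1,), (3,)]
print(anti)             # [(2,)]
print(anti_with_null)   # []
print(anti_not_exists)  # [(2,)]
```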

In [None]:
# Select country code as a single field from the countries table, filtering for countries in the 'Middle East' region.
sql(
"""
SELECT code
FROM countries
WHERE region = 'Middle East';
"""
)