# 3. Set theory clauses
**In this chapter, you'll learn more about set theory using Venn diagrams and get an introduction to union, union all, intersect, and except clauses. You'll finish by investigating semi joins and anti joins, which provide a nice introduction to subqueries.**

In [1]:
%load_ext sql
%sql sqlite://

## State of the UNION
Next are set theory clauses. We'll focus first on the operations of `UNION` and `UNION ALL`. In addition to the joining diagrams you have already seen, this chapter uses Venn diagrams to represent set operations. Let's begin with these Venn diagrams now.

![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTWsR4KMIchYhAmiAfrw4Tx8q_2JDYQl1aF4w&usqp=CAU)

### Set Theory Venn Diagrams
You can think of each circle as representing a table of data. The shading represents what is included in the result of the set operation from each table.

- `UNION` includes every record in both tables but **DOES NOT double count** those that are in both tables. 
- `UNION ALL` includes every record in both tables and **DOES replicate** those that are in both tables. This is why the center is shaded black. 

The two diagrams on the bottom represent only subsets of data being selected. 

- `INTERSECT` results in only those records found in both of the two tables. 
- `EXCEPT` results in only those records in one table **BUT NOT** the other. 
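
The four operations can be compared side by side in a small sketch. It uses Python's built-in `sqlite3` module with an in-memory database; the two tables and their `id` values are hypothetical stand-ins for `left_one` and `right_one`, not the contents of `diagrams.sqlite`.

```python
import sqlite3

# Hypothetical stand-ins for left_one and right_one (assumed ids, in-memory only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_one (id INTEGER);
    CREATE TABLE right_one (id INTEGER);
    INSERT INTO left_one VALUES (1), (2), (3), (4);
    INSERT INTO right_one VALUES (1), (4), (5), (6);
""")

def run_set_op(op):
    """Apply one set operator to the two id columns and return the ids, sorted."""
    query = f"SELECT id FROM left_one {op} SELECT id FROM right_one ORDER BY id"
    return [row[0] for row in conn.execute(query)]

results = {op: run_set_op(op) for op in ("UNION", "UNION ALL", "INTERSECT", "EXCEPT")}
for op, ids in results.items():
    print(f"{op:<10} {ids}")
```

With these assumed values, `UNION` returns each id once, `UNION ALL` repeats 1 and 4, `INTERSECT` keeps only 1 and 4, and `EXCEPT` keeps only 2 and 3.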

Let's investigate what `UNION` looks like as a joining diagram.

### UNION diagram
You have two tables with names `left_one` and `right_one`. The "one" here corresponds to each table only having one field. If you run a `UNION` on these two fields you get each record appearing in either table, but notice that the id values of 1 and 4 in `right_one` are not included again in the `UNION` since they were already found in the `left_one` table.


In [2]:
%sql sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite

In [3]:
%sql SELECT * FROM left_one;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
Done.


id
1
2
3
4
1
4
5
6
1
2


In [4]:
%sql SELECT * FROM right_one;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
Done.


id,val
1,L1
2,L2
3,L3
4,L4
1,R1
4,R2
5,R3
6,R4
1,R1
1,R2


In [5]:
%%sql
SELECT id FROM left_one
UNION
SELECT id FROM right_one;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
Done.


id
1
2
3
4
5
6
A
B
C


### UNION ALL diagram
By contrast (with the same two tables `left_one` and `right_one`), `UNION ALL` includes all duplicates in its result. 

In [6]:
%%sql
SELECT id FROM left_one
UNION ALL
SELECT id FROM right_one;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
Done.


id
1
2
3
4
1
4
5
6
1
2


If it were the case that `right_one` had these same four values and also one more value of 1 for id, you'd see three entries for 1 in the resulting `UNION ALL`. Let's check out the SQL syntax using the leaders database for both `UNION` and `UNION ALL`, but first you'll see one more table in the leaders database.
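
As a rough check of that claim, the sketch below counts how often each id appears after a `UNION ALL`. The table contents are assumed for illustration: `left_one` holds 1 to 4, and `right_one` holds 1, 4, 5, 6 plus one extra 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_one (id INTEGER);
    CREATE TABLE right_one (id INTEGER);
    INSERT INTO left_one VALUES (1), (2), (3), (4);
    -- Assumed contents: the four original ids plus one extra 1.
    INSERT INTO right_one VALUES (1), (4), (5), (6), (1);
""")

# Stack both tables with UNION ALL, then count the copies of each id.
id_counts = conn.execute("""
    SELECT id, COUNT(*) AS copies
    FROM (SELECT id FROM left_one
          UNION ALL
          SELECT id FROM right_one)
    GROUP BY id
    ORDER BY id
""").fetchall()

for id_value, copies in id_counts:
    print(id_value, copies)
```

The id 1 shows up three times (once from `left_one`, twice from `right_one`), while 4 shows up twice and every other id once.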

In [7]:
%sql sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite

In [8]:
%%sql
SELECT *
FROM monarchs;

   sqlite://
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


country,continent,monarch
Brunei,Asia,Hassanal Bolkiah
Oman,Asia,Qaboos bin Said al Said
Norway,Europe,Harald V
Spain,Europe,Felipe VI


### monarchs table
Check out the `monarchs` table in the `leaders` database that we will use in examples here. The table lists the `country`, `continent`, and the name of the `monarch` for that country. Do some of these names look familiar based on the other tables you've seen? They should! We'll come back to this.

### All prime ministers and monarchs
You can use `UNION` on the `prime_ministers` and `monarchs` tables to show all of the different prime ministers and monarchs in these two tables. The `country` field is also included here for reference. Note that the `prime_minister` field has been aliased `AS leader`. In fact, the resulting field from the `UNION` will have the name `leader`. That's an important property of the set theory clauses you will see in this chapter. The fields included in the operation must be of the same data type since they come back as just a single field. In other words, you can't stack a number on top of a character field.
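
The naming behavior can be checked with a short sketch. It builds a two-row subset of each table in an in-memory SQLite database (an assumption made for illustration; the real `leaders` database has more rows and columns) and reads the column names from `cursor.description`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed two-row subsets of the leaders tables.
    CREATE TABLE prime_ministers (country TEXT, prime_minister TEXT);
    CREATE TABLE monarchs (country TEXT, monarch TEXT);
    INSERT INTO prime_ministers VALUES
        ('Norway', 'Erna Solberg'), ('Brunei', 'Hassanal Bolkiah');
    INSERT INTO monarchs VALUES
        ('Norway', 'Harald V'), ('Brunei', 'Hassanal Bolkiah');
""")

cursor = conn.execute("""
    SELECT prime_minister AS leader, country FROM prime_ministers
    UNION
    SELECT monarch, country FROM monarchs
    ORDER BY country, leader
""")

# The result columns are named after the first SELECT, not the second.
column_names = [column[0] for column in cursor.description]
rows = cursor.fetchall()
print(column_names)
print(rows)
```

The `monarch` name from the second `SELECT` never appears in the result, and the Brunei row is returned only once because `UNION` removes the duplicate.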


In [9]:
%%sql
SELECT prime_minister AS leader, country
FROM prime_ministers
UNION
SELECT monarch, country
FROM monarchs
ORDER BY country;

   sqlite://
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


leader,country
Malcolm Turnbull,Australia
Hassanal Bolkiah,Brunei
Sherif Ismail,Egypt
Jack Guy Lafontant,Haiti
Narendra Modi,India
Erna Solberg,Norway
Harald V,Norway
Qaboos bin Said al Said,Oman
Antonio Costa,Portugal
Felipe VI,Spain


### Resulting table from UNION
Our resulting table from the `UNION` gives all the leaders and their corresponding country. Does something stand out to you here?

### UNION ALL with leaders
The countries of Brunei and Oman were listed only once in the `UNION` table. These countries have monarchs that also act as prime ministers. This can be seen in the `UNION ALL` results.

In [10]:
%%sql
SELECT prime_minister AS leader, country
FROM prime_ministers
UNION ALL
SELECT monarch, country
FROM monarchs
ORDER BY country;

   sqlite://
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


leader,country
Malcolm Turnbull,Australia
Hassanal Bolkiah,Brunei
Hassanal Bolkiah,Brunei
Sherif Ismail,Egypt
Jack Guy Lafontant,Haiti
Narendra Modi,India
Erna Solberg,Norway
Harald V,Norway
Qaboos bin Said al Said,Oman
Qaboos bin Said al Said,Oman


## Union
You have two new tables, `economies2010` and `economies2015`, available to you. The `economies` table is also included for reference.

In [11]:
%sql sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite

- Combine the two new tables into one table containing all of the fields in `economies2010`.
- Sort this resulting single table by country code and then by year, both in ascending order.

In [12]:
%%sql
SELECT * FROM economies2010
UNION
SELECT * FROM economies2015
ORDER BY code, year
LIMIT 10;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


code,year,income_group,gross_savings
AFG,2010,Low income,37.133
AFG,2015,Low income,21.466
AGO,2010,Upper middle income,23.534
AGO,2015,Upper middle income,-0.425
ALB,2010,Upper middle income,20.011
ALB,2015,Upper middle income,13.84
ARE,2010,High income,27.073
ARE,2015,High income,34.106
ARG,2010,Upper middle income,17.361
ARG,2015,Upper middle income,14.111


```
Showing 10 out of 380 rows
```

## Union (2)
`UNION` can also be used to determine all occurrences of a field across multiple tables. Try out this exercise with no starter code.

- Determine all (non-duplicated) country codes in either the `cities` or the `currencies` table. The result should be a table with only one field called `country_code`.
- Sort by `country_code` in alphabetical order.

In [13]:
%%sql
SELECT country_code FROM cities
UNION
SELECT code FROM currencies
ORDER BY country_code
LIMIT 10;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


country_code
ABW
AFG
AGO
AIA
ALB
AND
ARE
ARG
ARM
ATG


```
Showing 10 out of 205 rows
```

## Union all
As you saw, duplicates were removed from the previous two exercises by using `UNION`.

To include duplicates, you can use `UNION ALL`.

- Determine all combinations (include duplicates) of country code and year that exist in either the `economies` or the `populations` tables. Order by `code` then `year`.
- The result of the query should only have two columns/fields. Think about how many records this query should result in.

In [14]:
%%sql
SELECT code, year FROM economies
UNION ALL
SELECT country_code, year FROM populations
ORDER BY code, year
LIMIT 10;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


code,year
ABW,2010
ABW,2015
AFG,2010
AFG,2010
AFG,2015
AFG,2015
AGO,2010
AGO,2010
AGO,2015
AGO,2015


```
Showing 10 out of 814 rows
```

*Can you spot some duplicates in the query result?* **YES**

---
## INTERSECTional data science
You saw through examples that `UNION` and `UNION ALL` do not do quite the same thing as a join. They only stack the selected fields of the two tables on top of one another. The set theory clause `INTERSECT` works in a similar fashion to `UNION` and `UNION ALL`, but remember from the Venn diagram that `INTERSECT` only includes those records in common to both tables and fields selected. Let's investigate the diagram for `INTERSECT` and the corresponding SQL code to achieve it.

### INTERSECT diagram and SQL code
The result of the `INTERSECT` on `left_one` and `right_one` is only the records in common to both `left_one` and `right_one`. 

In [15]:
%sql sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite

In [16]:
%%sql
SELECT id FROM left_one
INTERSECT
SELECT id FROM right_one;

   sqlite://
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


id
1
2
3
4
5
6


Let's next see how you could use `INTERSECT` to determine all countries having both a prime minister and a president.

### Prime minister and president countries
The code for each of these set operations has a similar layout. You first select which fields you'd like to include in your first table, and then you specify the name of the first table. Next you specify the set operation to perform. Lastly, you denote which fields in the second table you'd like to include and then the name of the second table. The result of the query is the four countries with both a prime minister and a president in the leaders database.

In [17]:
%sql sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite

In [18]:
%%sql
SELECT country FROM prime_ministers
INTERSECT
SELECT country FROM presidents;

   sqlite://
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


country
Egypt
Haiti
Portugal
Vietnam


### INTERSECT on two fields
Next, let's think about what would happen if we tried to select two columns instead of one from our previous example. The code shown does just that. What will be the result of this query? Will this also give you the names of the countries that have both a prime minister and a president? 

In [19]:
%%sql
SELECT country, prime_minister AS leader
FROM prime_ministers
INTERSECT
SELECT country, president
FROM presidents;

   sqlite://
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


country,leader


The actual result is an empty table. Why is that? When `INTERSECT` is given two columns, it compares both columns together, so a row is returned only if the country and the leader's name both match. It didn't find any countries whose prime minister and president have the same name. `INTERSECT` looks for whole RECORDS in common, whereas a join matches on individual key fields. This is an important distinction.
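
The difference between one-column and two-column `INTERSECT` can be reproduced in a small sketch. The rows below are an assumed subset of the `prime_ministers` and `presidents` tables, loaded into an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed subsets of the leaders tables.
    CREATE TABLE prime_ministers (country TEXT, prime_minister TEXT);
    CREATE TABLE presidents (country TEXT, president TEXT);
    INSERT INTO prime_ministers VALUES
        ('Egypt', 'Sherif Ismail'),
        ('Portugal', 'Antonio Costa'),
        ('India', 'Narendra Modi');
    INSERT INTO presidents VALUES
        ('Egypt', 'Abdel Fattah el-Sisi'),
        ('Portugal', 'Marcelo Rebelo de Sousa'),
        ('Chile', 'Michelle Bachelet');
""")

# One column: rows match whenever the country alone matches.
one_column = conn.execute("""
    SELECT country FROM prime_ministers
    INTERSECT
    SELECT country FROM presidents
    ORDER BY country
""").fetchall()

# Two columns: a row matches only if country AND leader name are both equal.
two_columns = conn.execute("""
    SELECT country, prime_minister AS leader FROM prime_ministers
    INTERSECT
    SELECT country, president FROM presidents
""").fetchall()

print(one_column)
print(two_columns)
```

Egypt and Portugal appear in the one-column result, but the two-column result is empty because no (country, name) pair exists in both tables.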

## Intersect
`UNION ALL` will extract all records from two tables, while `INTERSECT` will only return records that both tables have in common. In this exercise, you will create a query similar to the earlier `UNION ALL` one; this time, however, you will look at the records in common for country code and year for the `economies` and `populations` tables.

Note the number of records from the result of this query compared to the similar `UNION ALL` query result of 814 records.

- Use `INTERSECT` to determine the records in common for country code and year for the `economies` and `populations` tables.
- Again, order by `code` and then by `year`, both in ascending order.

In [20]:
%sql sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite

In [21]:
%%sql
SELECT code, year FROM economies
INTERSECT
SELECT country_code, year FROM populations
ORDER BY code, year
LIMIT 10;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


code,year
AFG,2010
AFG,2015
AGO,2010
AGO,2015
ALB,2010
ALB,2015
ARE,2010
ARE,2015
ARG,2010
ARG,2015


```
Showing 10 out of 434 rows
```

## Intersect (2)
As you think about major world cities and their corresponding country, you may ask *which countries also have a city with the same name as their country name?*

- Use `INTERSECT` to answer this question with `countries` and `cities`.

In [22]:
%%sql
SELECT country_name FROM countries
INTERSECT
SELECT name FROM cities;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


country_name
Hong Kong
Singapore


---
## EXCEPTional
You've made it to the last of the four set theory clauses in this course. `EXCEPT` allows you to include only the records that are in one table, but not the other. Let's mix things up and look into the SQL code and result first and then dive into the diagram.

### Monarchs that aren't prime ministers
You saw earlier that there are some monarchs that also act as the prime minister for their country. One way to determine those monarchs in the `monarchs` table that do not also hold the title of prime minister is to use the `EXCEPT` clause. 

This SQL query selects the `monarch` field from `monarchs` and then removes any entries that also appear in the `prime_minister` field of `prime_ministers`, while also keeping track of the country for each leader.

In [23]:
%sql sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite

In [25]:
%%sql
SELECT monarch, country FROM monarchs
EXCEPT
SELECT prime_minister, country
FROM prime_ministers;

   sqlite://
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


monarch,country
Felipe VI,Spain
Harald V,Norway


You can see in the query result that only the two European monarchs are not also prime ministers in the leaders database.

### EXCEPT diagram
This diagram gives the structure of `EXCEPT` clauses. 

![](https://www.sqlservertutorial.net/wp-content/uploads/SQL-Server-EXCEPT-illustration.png)

Only the records that appear in the left table **BUT DO NOT** appear in the right table are included.
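
Because `EXCEPT` keeps only the left table's unmatched records, swapping the two tables changes the answer. The sketch below shows this on an assumed two-row subset of `monarchs` and `prime_ministers` in an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed two-row subsets of the leaders tables.
    CREATE TABLE monarchs (country TEXT, monarch TEXT);
    CREATE TABLE prime_ministers (country TEXT, prime_minister TEXT);
    INSERT INTO monarchs VALUES
        ('Brunei', 'Hassanal Bolkiah'), ('Norway', 'Harald V');
    INSERT INTO prime_ministers VALUES
        ('Brunei', 'Hassanal Bolkiah'), ('Norway', 'Erna Solberg');
""")

# Monarchs who are not also listed as prime ministers.
monarchs_only = conn.execute("""
    SELECT monarch, country FROM monarchs
    EXCEPT
    SELECT prime_minister, country FROM prime_ministers
""").fetchall()

# Same tables with the order swapped: prime ministers who are not monarchs.
prime_ministers_only = conn.execute("""
    SELECT prime_minister, country FROM prime_ministers
    EXCEPT
    SELECT monarch, country FROM monarchs
""").fetchall()

print(monarchs_only)
print(prime_ministers_only)
```

The Brunei record is in both tables, so it is dropped either way; only the Norway record from whichever table is on the left survives.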

## Except
Get the names of cities in `cities` which are not noted as capital cities in `countries` as a single field result.

Note that there are some countries in the world that are not included in the `countries` table, which will result in some cities not being labeled as capital cities when in fact they are.

- Order the resulting field in ascending order.
- Can you spot the city/cities that are actually capital cities which this query misses?

In [34]:
%sql sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite

In [27]:
%%sql
SELECT name FROM cities
EXCEPT
SELECT capital FROM countries
ORDER BY name
LIMIT 10;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


name
Abidjan
Ahmedabad
Alexandria
Almaty
Auckland
Bandung
Barcelona
Barranquilla
Basra
Belo Horizonte


```
Showing 10 out of 170 rows
```

## Except (2)
Now you will complete the previous query in reverse!

Determine the names of capital cities that are **not** listed in the `cities` table.

- Order by `capital` in ascending order.
- The `cities` table contains information about 236 of the world's most populous cities. The result of your query may surprise you in terms of the number of capital cities that **do not** appear in this list.

In [31]:
%%sql
SELECT capital FROM countries
WHERE capital IS NOT NULL
EXCEPT
SELECT name FROM cities
ORDER BY capital
LIMIT 10;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


capital
Agana
Amman
Amsterdam
Andorra la Vella
Antananarivo
Apia
Ashgabat
Asmara
Astana
Asuncion


---
## Semi-joins and Anti-joins
The six joins you've worked with so far are all additive joins in that they add columns to the original "left" table. The six are: 1. `INNER JOIN`, 2. self-join, 3. `LEFT JOIN`, 4. `RIGHT JOIN`, 5. `FULL JOIN`, and 6. `CROSS JOIN`.

### Building up to a semi-join
The last two joins we will cover use a right table to determine which records to keep in the left table. In other words, you use these last two joins (semi-join and anti-join) in a way similar to a `WHERE` clause dependent on the values of a second table. Let's try out some examples of semi-joins and anti-joins and then return to the diagrams for each. Suppose that you are interested in determining the presidents of countries that gained independence before 1800. Let's first determine which countries this corresponds to in the states table.

In [37]:
%sql sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite

In [38]:
%%sql
SELECT name
FROM states
WHERE indep_year < 1800;

   sqlite://
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


name
Portugal
Spain


Recall how this could be done with the SQL you knew before learning anything about `JOIN`s: to get only the countries meeting this condition, you can use a `WHERE` clause, as in the query above.

### Another step towards the semi-join
We'll next set up the other part of the query to get the presidents we want. What code is needed to retrieve the `president`, `country`, and `continent` columns from the `presidents` table in that order?


In [39]:
%%sql
SELECT president, country, continent
FROM presidents

   sqlite://
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


president,country,continent
Abdel Fattah el-Sisi,Egypt,Africa
Marcelo Rebelo de Sousa,Portugal,Europe
Jovenel Moise,Haiti,North America
Jose Mujica,Uruguay,South America
Ellen Johnson Sirleaf,Liberia,Africa
Michelle Bachelet,Chile,South America
Tran Dai Quang,Vietnam,Asia


Now we need to use this result with the one from the previous query to further filter the `country` field in the `presidents` table and give us the correct result. Let's see how this might be done next.

### Finish the semi-join (an intro to subqueries)
In the first query of this example, we determined that Portugal and Spain were both independent before 1800. In the second query, we determined how to display the table in a nice form to answer our question. In order to combine the two tables together we will again use a `WHERE` clause and then use the first query as the condition to check in the `WHERE` clause. Check it out.


In [40]:
%%sql
SELECT president, country, continent
FROM presidents
WHERE country IN
    (SELECT name
    FROM states
    WHERE indep_year < 1800);

   sqlite://
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


president,country,continent
Marcelo Rebelo de Sousa,Portugal,Europe


This is your first example of a subquery: a query that sits inside of another query. 

What does this give as a result? Is it the presidents of Spain and of Portugal? Since Spain does not have a president, it is not included here and only the Portuguese president is listed. The semi-join chooses records in the first table where a condition **IS** met in a second table. An anti-join chooses records in the first table where a condition **IS NOT** met in the second table. How might you determine countries in the Americas founded after 1800?

### An anti-join
```sql
SELECT president, country, continent
FROM presidents
WHERE ___ LIKE '___'
    AND country ___ IN
        (SELECT name
         FROM states
         WHERE indep_year < 1800);
```
Using the code from the previous example, you only need to add a few pieces of code. So what goes in the blanks? 

Fill in the `WHERE` clause by choosing only those continents ending in America, and then fill in the other space with a **NOT** to exclude those countries in the subquery.

In [42]:
%%sql
SELECT president, country, continent
FROM presidents
WHERE continent LIKE '%America'
    AND country NOT IN
        (SELECT name
         FROM states
         WHERE indep_year < 1800);

   sqlite://
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


president,country,continent
Jovenel Moise,Haiti,North America
Jose Mujica,Uruguay,South America
Michelle Bachelet,Chile,South America


### The result of the anti-join
The presidents of countries in the Americas founded after 1800 are given in the table.

### Semi-join and anti-join diagrams
![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTWMJdmwvrnjUtkAUD-ZAGFgByMyalJp5odvA&usqp=CAU)

The semi-join matches records by key field in the right table with those in the left. It then picks out only the rows in the left table that match that condition. The anti-join picks out those rows in the left table that do not match the condition on the right table. Semi-joins and anti-joins don't have the same built-in SQL syntax that `INNER JOIN` and `LEFT JOIN` have. They are useful tools in filtering one table's records on the records of another table.
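
A minimal sketch of both filters side by side is given here. The rows are an assumed subset of the `leaders` database (the `indep_year` values in particular are assumptions), and `filter_presidents` is a hypothetical helper that switches between `IN` and `NOT IN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed subset of the leaders database; indep_year values are assumptions.
    CREATE TABLE states (name TEXT, indep_year INTEGER);
    CREATE TABLE presidents (president TEXT, country TEXT);
    INSERT INTO states VALUES ('Portugal', 1143), ('Spain', 1492), ('Chile', 1810);
    INSERT INTO presidents VALUES
        ('Marcelo Rebelo de Sousa', 'Portugal'),
        ('Michelle Bachelet', 'Chile'),
        ('Abdel Fattah el-Sisi', 'Egypt');
""")

def filter_presidents(keep_matching):
    """Semi-join when keep_matching is True, anti-join when it is False."""
    operator = "IN" if keep_matching else "NOT IN"
    query = f"""
        SELECT country FROM presidents
        WHERE country {operator}
            (SELECT name FROM states WHERE indep_year < 1800)
        ORDER BY country
    """
    return [row[0] for row in conn.execute(query)]

semi_join = filter_presidents(True)
anti_join = filter_presidents(False)
print(semi_join)
print(anti_join)
```

Spain satisfies the subquery but has no row in `presidents`, so it never appears. Egypt has no row in `states` at all, yet the anti-join keeps it, since it only requires that no match exists. In both cases the result contains only columns from the left table.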

## Semi-join
You are now going to use the concept of a semi-join to identify languages spoken in the Middle East.

- Begin by selecting all country codes in the Middle East as a single field result using `SELECT`, `FROM`, and `WHERE`.

In [44]:
%sql sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite

In [45]:
%%sql
SELECT code FROM countries
WHERE region = 'Middle East';

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


code
ARE
ARM
AZE
BHR
GEO
IRQ
ISR
YEM
JOR
KWT


- Select only unique languages by name appearing in the `languages` table.
- Order the resulting single field table by `name` in ascending order.

In [47]:
%%sql
SELECT DISTINCT name
FROM languages
ORDER BY name
LIMIT 10;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


name
Afar
Afrikaans
Akyem
Albanian
Alsatian
Amerindian
Amharic
Angolar
Antiguan creole
Arabic


```
Showing 10 out of 396 rows
```

- Combine the previous two queries into one query by adding a `WHERE IN` statement to the `SELECT DISTINCT` query to determine the unique languages spoken in the Middle East.
- Order the result by `name` in ascending order.

In [49]:
%%sql
SELECT DISTINCT name
FROM languages
WHERE code IN
    (SELECT code
    FROM countries
    WHERE region = 'Middle East')
ORDER BY name
LIMIT 10;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


name
Arabic
Aramaic
Armenian
Azerbaijani
Azeri
Baluchi
Bulgarian
Circassian
English
Farsi


```
Showing 10 out of 27 rows
```

## Relating semi-join to a tweaked inner join
Let's revisit the code from the previous exercise, which retrieves languages spoken in the Middle East.
```sql
SELECT DISTINCT name
FROM languages
WHERE code IN
  (SELECT code
   FROM countries
   WHERE region = 'Middle East')
ORDER BY name;
```
Sometimes problems solved with semi-joins can also be solved using an inner join.
```sql
SELECT languages.name AS language
FROM languages
INNER JOIN countries
ON languages.code = countries.code
WHERE region = 'Middle East'
ORDER BY language;
```
This inner join isn't quite right. 

What is missing from this second code block to get it to match with the correct answer produced by the first block?

1. ~`HAVING` instead of `WHERE`~
2. **`DISTINCT`**
3. ~`UNIQUE`~

**Answer: 2** *There's no use in retrieving 'Arabic' multiple times; you only care about* `DISTINCT` *languages here.*

```sql
SELECT DISTINCT languages.name AS language
FROM languages
INNER JOIN countries
ON languages.code = countries.code
WHERE region = 'Middle East'
ORDER BY language;
```
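
To see the effect of the missing `DISTINCT`, here is a sketch on a few assumed rows in an in-memory SQLite database (the `FRA` rows are hypothetical and are there only to be filtered out).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed sample rows; Arabic is listed for two Middle East countries.
    CREATE TABLE countries (code TEXT, region TEXT);
    CREATE TABLE languages (code TEXT, name TEXT);
    INSERT INTO countries VALUES
        ('ARE', 'Middle East'), ('IRQ', 'Middle East'),
        ('ARM', 'Middle East'), ('FRA', 'Western Europe');
    INSERT INTO languages VALUES
        ('ARE', 'Arabic'), ('IRQ', 'Arabic'),
        ('ARM', 'Armenian'), ('FRA', 'French');
""")

SEMI_JOIN = """
    SELECT DISTINCT name FROM languages
    WHERE code IN (SELECT code FROM countries WHERE region = 'Middle East')
    ORDER BY name
"""
INNER_JOIN = """
    SELECT languages.name AS language
    FROM languages
    INNER JOIN countries ON languages.code = countries.code
    WHERE region = 'Middle East'
    ORDER BY language
"""

def names(query):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(query)]

semi_result = names(SEMI_JOIN)
join_without_distinct = names(INNER_JOIN)
join_with_distinct = names(INNER_JOIN.replace("SELECT", "SELECT DISTINCT", 1))
print(semi_result, join_without_distinct, join_with_distinct)
```

Without `DISTINCT`, the join returns Arabic once per country that lists it; adding `DISTINCT` brings the join in line with the semi-join result.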

## Diagnosing problems using anti-join
Another powerful join in SQL is the anti-join. It is particularly useful in identifying which records are causing an incorrect number of records to appear in join queries.

You will also see another example of a subquery here, as you saw in the first exercise on semi-joins. Your goal is to identify the currencies used in Oceanian countries!

- Begin by determining the number of countries in `countries` that are listed in Oceania using `SELECT`, `FROM`, and `WHERE`.

In [51]:
%%sql
SELECT COUNT(*) AS num_of_countries
FROM countries
WHERE continent = 'Oceania';

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


num_of_countries
19


- Complete an inner join with `countries AS c1` on the left and `currencies AS c2` on the right to get the different currencies used in the countries of Oceania.
- Match `ON` the `code` field in the two tables.
- Include the country `code`, the country name (`country_name`), and `basic_unit AS currency`.

Observe the query result and make note of how many *different* countries are listed here.

In [55]:
%%sql
SELECT c1.code, country_name, basic_unit AS currency
FROM countries AS c1
INNER JOIN currencies AS c2
    ON c1.code = c2.code
WHERE continent = 'Oceania'
LIMIT 10;

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


code,country_name,currency
AUS,Australia,Australian dollar
KIR,Kiribati,Australian dollar
MHL,Marshall Islands,United States dollar
NRU,Nauru,Australian dollar
PLW,Palau,United States dollar
PNG,Papua New Guinea,Papua New Guinean kina
PYF,French Polynesia,CFP franc
SLB,Solomon Islands,Solomon Islands dollar
WSM,Samoa,Samoan tala
TON,Tonga,Tongan paʻanga


```
Showing 10 out of 15 rows
```

Note that not all countries in Oceania were listed in the resulting inner join with `currencies`. Use an anti-join to determine which countries were not included.

- Use `NOT IN` and (`SELECT code FROM currencies`) as a subquery to get the country code and country name for the Oceanian countries that are not included in the `currencies` table.

In [57]:
%%sql
SELECT code, country_name
FROM countries
WHERE continent = 'Oceania'
    AND code NOT IN
        (SELECT code
        FROM currencies);

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


code,country_name
ASM,American Samoa
FJI,Fiji Islands
GUM,Guam
FSM,"Micronesia, Federated States of"
MNP,Northern Mariana Islands


## Set theory challenge
You've now made your way to the challenge problem for this third chapter. Your task here will be to incorporate two of `UNION`/`UNION ALL`/`INTERSECT`/`EXCEPT` to solve a challenge involving three tables.

In addition, you will use a subquery as you have in the last two exercises.

- Identify the country codes that are included in either `economies` or `currencies` but not in `populations`.
- Use that result to determine the names of cities in the countries that match the specification in the previous instruction.

In [58]:
%%sql
SELECT name
FROM cities AS c1
WHERE country_code IN
    (SELECT e.code FROM economies AS e
    UNION ALL
    SELECT c2.code FROM currencies AS c2
    EXCEPT
    SELECT p.country_code FROM populations AS p
    );

   sqlite://
 * sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/countries.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/diagrams.sqlite
   sqlite:////Users/sj501/Documents/Jupyter/Jupyter_lab/17_Joining_Data_in_SQL/leaders.sqlite
Done.


name
Bucharest
Kaohsiung
New Taipei City
Taichung
Tainan
Taipei
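
The subquery above chains two set operators. SQLite groups chained set operators from left to right, so the `EXCEPT` is applied to the combined result of the first two `SELECT`s. The sketch below checks this on hypothetical rows (the codes and the city Kabul are assumptions, not taken from `countries.sqlite`) and compares `UNION` with `UNION ALL` inside the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical rows chosen so that AFG is removed by the EXCEPT step.
    CREATE TABLE economies (code TEXT);
    CREATE TABLE currencies (code TEXT);
    CREATE TABLE populations (country_code TEXT);
    CREATE TABLE cities (name TEXT, country_code TEXT);
    INSERT INTO economies VALUES ('AFG'), ('TWN');
    INSERT INTO currencies VALUES ('AFG'), ('TWN'), ('ROM');
    INSERT INTO populations VALUES ('AFG');
    INSERT INTO cities VALUES
        ('Kabul', 'AFG'), ('Taipei', 'TWN'), ('Bucharest', 'ROM');
""")

def city_names(union_op):
    """Return cities whose code is in (economies <union_op> currencies) EXCEPT populations."""
    query = f"""
        SELECT name FROM cities
        WHERE country_code IN
            (SELECT code FROM economies
             {union_op}
             SELECT code FROM currencies
             EXCEPT
             SELECT country_code FROM populations)
        ORDER BY name
    """
    return [row[0] for row in conn.execute(query)]

with_union = city_names("UNION")
with_union_all = city_names("UNION ALL")
print(with_union, with_union_all)
```

Both variants return the same cities here: `IN` only tests membership, and `EXCEPT` removes duplicates anyway, so the extra copies produced by `UNION ALL` do not change the outcome.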
