# UNION vs. UNION ALL

Nice work learning all about UNION and UNION ALL!

Two tables, `languages` and `currencies`, are provided. Run the queries provided in the console and select the correct answer for the multiple-choice questions in this exercise.

In [1]:
from pandasql import sqldf
import pandas as pd

# Create helper function for easier query execution
execute = lambda q: sqldf(q, globals())

In [8]:
import pandas as pd
economies2015 = pd.read_csv("dataset/countries/economies2015.csv")
economies2019 = pd.read_csv("dataset/countries/economies2019.csv")
currencies = pd.read_csv("dataset/countries/currencies.csv")
cities = pd.read_csv("dataset/countries/cities.csv")
populations = pd.read_csv("dataset/countries/populations.csv")
languages = pd.read_csv("dataset/countries/languages.csv")
countries = pd.read_csv("dataset/countries/countries.csv")
economies = pd.read_csv("dataset/countries/economies.csv")
countries.rename(columns={'country_name':'name'}, inplace=True)
# populations = reviews.reset_index()
# reviews.columns = ['id',	'film_id',	'num_user',	'num_critic',	'imdb_score',	'num_votes',	'facebook_likes']
# print(reviews.columns)
economies2019.head()


Unnamed: 0,code,year,income_group,gross_savings
0,AGO,2019,Lower middle income,25.524848
1,ALB,2019,Upper middle income,14.499826
2,ARG,2019,Upper middle income,14.285295
3,ARM,2019,Upper middle income,9.815574
4,ATG,2019,High income,26.383427


In [4]:
# query = """
# SELECT * 
# FROM languages
# UNION
# SELECT * 
# FROM currencies;
# """
# result_df = execute(query)

# # Show results
# result_df.head()

In [5]:
query = """
SELECT code FROM
languages
UNION ALL
SELECT code FROM 
currencies;
"""
result_df = execute(query)

# Show results
result_df.head()

Unnamed: 0,code
0,AFG
1,AFG
2,AFG
3,AFG
4,ALB


In [6]:
query = """
SELECT code 
FROM languages
UNION
SELECT curr_id 
FROM currencies;
"""
result_df = execute(query)

# Show results
result_df.head()

Unnamed: 0,code
0,1
1,2
2,3
3,4
4,5


# Comparing global economies

Are you ready to perform your first set operation?

In this exercise, you have two tables, `economies2015` and `economies2019`, available to you under the tabs in the console. You'll perform a set operation to stack all records in these two tables on top of each other, excluding duplicates.

When drafting queries containing set operations, it is often helpful to write the queries on either side of the operation first, and then call the set operator. The instructions are ordered accordingly.

In [9]:
query = """
-- Select all fields from economies2015
SELECT *
FROM economies2015    
-- Set operation
UNION
-- Select all fields from economies2019
SELECT *
FROM economies2019 
ORDER BY code, year;
"""
result_df = execute(query)

# Show results
result_df.head()

Unnamed: 0,code,year,income_group,gross_savings
0,ABW,2015,High income,14.867852
1,AGO,2015,Lower middle income,25.021327
2,AGO,2019,Lower middle income,25.524848
3,ALB,2015,Upper middle income,16.863981
4,ALB,2019,Upper middle income,14.499826


# Comparing two set operations

You learned in the video exercise that `UNION ALL` returns duplicates, whereas `UNION` does not. In this exercise, you will dive deeper into this, looking at cases for when `UNION` is appropriate compared to `UNION ALL`.

You will be looking at combinations of country code and year from the `economies` and `populations` tables.

In [10]:
query = """
-- Query that determines all pairs of code and year from economies and populations, without duplicates
SELECT code, year
FROM economies
UNION
SELECT country_code, year
FROM populations
"""
result_df = execute(query)

# Show results
result_df.head()

Unnamed: 0,code,year
0,ABW,2010
1,ABW,2015
2,AFG,2010
3,AFG,2015
4,AGO,2010


In [12]:
query = """
-- Query that determines all pairs of code and year from economies and populations, with duplicates
SELECT code, year
FROM economies
-- Set theory clause
UNION ALL
SELECT country_code, year
FROM populations
ORDER BY code, year;
"""
result_df = execute(query)

# Show results
result_df.head()

Unnamed: 0,code,year
0,ABW,2010
1,ABW,2015
2,AFG,2010
3,AFG,2010
4,AFG,2015


# INTERSECT

Well done getting through the material on `INTERSECT`!

Let's say you are interested in those countries that share names with cities. Use this task as an opportunity to show off your knowledge of set theory in SQL!

In [13]:
query = """
-- Return all cities with the same name as a country
SELECT name
FROM cities
INTERSECT
SELECT name
FROM countries
"""
result_df = execute(query)

# Show results
result_df.head()

Unnamed: 0,name
0,Singapore


# Review UNION and INTERSECT

Which of the following definitions of set operations is correct?

- `INTERSECT`: returns only records appearing in both tables
- `UNION`: returns all records (NO duplicates) in both tables
- `UNION ALL`: returns all records (WITH duplicates) in both tables


# You've got it, EXCEPT...

Just as you were able to leverage `INTERSECT` to find the names of `cities` with the same names as `countries`, you can also do the reverse, using `EXCEPT`.

In this exercise, you will find the names of `cities` that do not have the same names as their `countries`

In [14]:
query = """
-- Return all cities that do not have the same name as a country
SELECT name
FROM cities
EXCEPT
SELECT name
FROM countries
ORDER BY name;
"""
result_df = execute(query)

# Show results
result_df.head()

Unnamed: 0,name
0,Abidjan
1,Abu Dhabi
2,Abuja
3,Accra
4,Addis Ababa


# Calling all set operators

Test your knowledge of set operators in SQL by classifying the below use cases into the correct buckets.

Think of how the information in each use case could be stored as tables, and recall the Venn diagrams you have learned, shown below!

<center><img src="images/03.07.png"  style="width: 400px, height: 300px;"/></center>
