# SQL Part 3: Non-JOIN Ways to Combine Data

<img src="https://media3.giphy.com/media/UqeH2KKx0U65oETdDR/source.gif" width="300" height="300" />

**What makes a database 'relational'?**

*There are relationships between the data in different tables*

**What is a primary key?**

*The column in a table that has a unique value for every record/row*

**What is a foreign key?**

*A column in a table that has values from the primary key column in another table*

**Which type of JOIN is illustrated here?**

<img src="Inner Join Quiz Pic.png" width="700" height="700" />

*INNER JOIN : returns a table that includes only the records/rows that share the same values for the columns you are joining on*

**Which type of JOIN is illustrated here?**

<img src="Left Join Quiz Pic.png" width="700" height="700" />

*LEFT JOIN : returns a table that includes all of the records/rows from one table (the 'left' one), and only the records from the other table (the 'right' one) that match up to the rows on the first table*

**Which type of JOIN is illustrated here?**

<img src="Full Join Quiz Pic.png" width="700" height="700" />

*FULL JOIN : returns a table that includes all of the records/rows from both tables, matched up when possible*

So, in general, JOINs are used in queries where the resulting table has columns from both of the tables being joined. But what if we:
* Want to take information from a second table into account, but don't plan on including any of its columns in the resulting table? 
* Want to make a query that can't be done with a JOIN? 
* Want to combine the *rows* of two different tables, instead of the *columns*? 

This is where we need different ways to combine and relate data, and that is where subqueries, UNION, INTERSECT, and EXCEPT come into play.

---

# Subqueries

**Subquery** : a query that is contained inside of another query, most often in the WHERE clause; also known as an 'inner query'

There are two types of Subqueries:
* Self-Contained
* Correlated

Subqueries can also return two types of information:
* a single value
* multiple values

**<span style="color:purple">REMEMBER</span>** : whatever is in the SELECT clause is what is 'returned' by a query. That is the only thing that is going to be used by the outer query

Depending on the type of information that is returned, the subquery can be used in different ways. 

<img src="Types of Subqueries.png" width="700" height="700" />

*(image from https://visualizeright.com/2019/03/15/subqueries/)*

# For Any Type of Subquery...

**When you encounter a question where there seems to be a question within the question, that is probably a situation where you'll want to use a subquery (as long as you don't need to return columns from different tables)**

The question *'Which books have more than the average number of versions?'* requires you to first get the answer to the question *'What is the average number of versions for a book to have?'*. You could go find that answer first with a separate query and then manually plug the answer into a second query, but that leaves room for copying error and you won't have the right answer if later on the information in the table changes and that value for the average is no longer correct
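As a rough sketch of the book question above, the snippet below runs it against an in-memory SQLite database through Python's built-in `sqlite3` module. The `books` table, its columns, and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: three books with different numbers of versions
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (title TEXT, versions INTEGER);
    INSERT INTO books VALUES ('Dune', 6), ('Emma', 2), ('Ulysses', 4);
""")

# The inner query answers 'what is the average number of versions?' (4.0 here),
# and the outer query uses that answer as its filter
rows = conn.execute("""
    SELECT title
    FROM books
    WHERE versions > (SELECT AVG(versions) FROM books)
    ORDER BY title
""").fetchall()

print(rows)  # [('Dune',)]
```

Because the average is computed inside the query, the result stays correct if rows are later added to or removed from `books`.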

**Wherever it is, the subquery must be surrounded by parentheses:**

In [None]:
SELECT 
    column1
FROM 
    table1
WHERE 
    column2 IN
    (subquery)

**A subquery can go in a lot of different places (SELECT clause, FROM clause, WHERE clause, HAVING clause, etc.). You can even have a subquery inside another subquery! It is usually used in the WHERE clause.**

In [None]:
-- In the SELECT clause
SELECT 
    (SELECT Aggregate_function(column_name)
     FROM table_name) 
    AS column_alias
FROM
    table_name

In [None]:
-- In the FROM clause (this is also called an inline view or derived table)
SELECT 
    column_name(s)
FROM 
    (SELECT column_name(s) 
     FROM table_name) 
    AS table_alias
WHERE condition

In [None]:
-- In the WHERE clause
SELECT 
    column_name(s)
FROM 
    table_name_1
WHERE 
    column_name expression_operator{=,NOT IN,IN, <,>, etc}
    (SELECT column_name(s) 
     FROM table_name_2)

In [None]:
-- In the HAVING clause
SELECT 
    column_name(s)
FROM 
    table_name_1
WHERE 
    condition
GROUP BY 
    column_name(s)
HAVING 
    Aggregate_function(column_name) expression_operator{=,<,>}
    (SELECT Aggregate_function(column_name) 
     FROM table_name_2)
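
To make the placement templates above a little more concrete, here is a minimal sketch that puts a subquery in the SELECT, FROM, and HAVING clauses. It runs on an in-memory SQLite database via Python's `sqlite3` module; the `orders` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES ('ann', 10), ('ann', 30), ('bob', 5), ('cat', 50);
""")

# SELECT clause: attach the overall maximum amount to every row
in_select = conn.execute("""
    SELECT customer, amount, (SELECT MAX(amount) FROM orders) AS max_amount
    FROM orders
    ORDER BY amount
""").fetchall()

# FROM clause: query a derived table of per-customer totals
in_from = conn.execute("""
    SELECT t.customer
    FROM (SELECT customer, SUM(amount) AS total
          FROM orders
          GROUP BY customer) AS t
    WHERE t.total >= 40
    ORDER BY t.customer
""").fetchall()

# HAVING clause: keep customers whose total exceeds the average order amount
in_having = conn.execute("""
    SELECT customer
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > (SELECT AVG(amount) FROM orders)
    ORDER BY customer
""").fetchall()

print(in_select)
print(in_from)
print(in_having)
```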

**If you are using a subquery in a WHERE clause, the column name in the WHERE clause must be join-compatible with the column being returned by the subquery (in the inner SELECT statement)**

You don't want to end up with something like 'WHERE datetime IN \[1, 2, 3, 4, 5\]', where a datetime column is compared against a list of integers.

**There should only be one column in the SELECT clause of a subquery, unless you are comparing against multiple columns (or with EXISTS)**

This is similar to the previous statement. If you are returning two columns, you should be comparing them to two columns like below:

In [None]:
SELECT 
    column_name(s)
FROM
    table_name_1 AS a
WHERE
    (a.column1, a.column2) IN
    (SELECT b.column1, b.column2
     FROM table_name_2 AS b)
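
A small sketch of the two-column comparison, run on SQLite through Python's `sqlite3` module. The `shifts` and `holidays` tables are invented for this example. Note that this row-value syntax is not accepted by every database engine; SQLite supports it from version 3.15 onward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shifts (employee TEXT, day TEXT);
    CREATE TABLE holidays (employee TEXT, day TEXT);
    INSERT INTO shifts VALUES ('ann', 'mon'), ('ann', 'tue'), ('bob', 'mon');
    INSERT INTO holidays VALUES ('ann', 'tue'), ('bob', 'fri');
""")

# A shift is returned only when BOTH its employee and its day
# match the same row of the subquery
rows = conn.execute("""
    SELECT s.employee, s.day
    FROM shifts AS s
    WHERE (s.employee, s.day) IN
          (SELECT h.employee, h.day
           FROM holidays AS h)
""").fetchall()

print(rows)  # [('ann', 'tue')]
```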

**It is always a good idea to use table aliases to keep track of which column belongs to which table, especially if columns have the same name in different tables**

In [None]:
SELECT
    a.column1
FROM
    table1 AS a
WHERE 
    a.column2 IN
    (SELECT b.column1
     FROM table2 AS b)

Without aliases, a column is first looked for in the tables at the same query level as the expression that uses it. If it is not found there, the database looks for it in the next level up (the outer query).

**You can't reference columns in the outer query from a table that only exists in an inner query**

If a table appears only in a subquery and not in the outer query, you can't reference that table in the outer query. This is why you can't include columns from the inner query table in the outer query SELECT statement. For instance the following query would return an error:

In [None]:
SELECT
    a.column1, b.column3
FROM 
    table1 AS a
WHERE
    a.column2 IN
    (SELECT b.column1
     FROM table2 AS b)

If you want to be able to SELECT columns from multiple tables, you'll have to use JOINs.
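The sketch below reproduces the failing query on an in-memory SQLite database (via Python's `sqlite3` module) and then rewrites it as a JOIN. The column types and sample rows are assumptions made for the demonstration, and the exact error text varies between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 TEXT, column2 INTEGER);
    CREATE TABLE table2 (column1 INTEGER, column3 TEXT);
    INSERT INTO table1 VALUES ('x', 1), ('y', 2);
    INSERT INTO table2 VALUES (1, 'one'), (3, 'three');
""")

# The alias b only exists inside the subquery, so the outer SELECT cannot see it
try:
    conn.execute("""
        SELECT a.column1, b.column3
        FROM table1 AS a
        WHERE a.column2 IN
              (SELECT b.column1
               FROM table2 AS b)
    """)
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

# With a JOIN, both tables are in scope for the SELECT clause
joined = conn.execute("""
    SELECT a.column1, b.column3
    FROM table1 AS a
    INNER JOIN table2 AS b ON a.column2 = b.column1
    ORDER BY a.column1
""").fetchall()

print(error_message)
print(joined)  # [('x', 'one')]
```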

**A JOIN can ALWAYS be expressed as a subquery. A subquery can often, but not always, be expressed as a JOIN.**

When switching back and forth between a JOIN and a subquery, the column that is compared to the subquery in the WHERE clause is usually the column you would join on if you wrote the query as a JOIN.
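Here is one way the same question ("which authors have at least one book?") might be written both ways, using SQLite through Python's `sqlite3` module. The `authors` and `books` tables are hypothetical. Notice that `author_id` is both the column compared to the subquery and the column joined on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (title TEXT, author_id INTEGER);
    INSERT INTO authors VALUES (1, 'Austen'), (2, 'Herbert'), (3, 'Joyce');
    INSERT INTO books VALUES ('Emma', 1), ('Persuasion', 1), ('Dune', 2);
""")

# Subquery version: author_id is compared to the subquery result
with_subquery = conn.execute("""
    SELECT a.name
    FROM authors AS a
    WHERE a.author_id IN (SELECT b.author_id FROM books AS b)
    ORDER BY a.name
""").fetchall()

# JOIN version: author_id becomes the join column. DISTINCT is needed
# because Austen matches two book rows
with_join = conn.execute("""
    SELECT DISTINCT a.name
    FROM authors AS a
    INNER JOIN books AS b ON a.author_id = b.author_id
    ORDER BY a.name
""").fetchall()

print(with_subquery)  # [('Austen',), ('Herbert',)]
print(with_join)      # [('Austen',), ('Herbert',)]
```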

# Subqueries That Return One Value

A subquery that returns one value is used with an unmodified comparison operator.

**Unmodified Comparison Operator** : compares two single values (e.g. '=' , '>', '<', '<>' etc.)

**<span style="color:red">WARNING</span>** : If the subquery returns more than one value when used with an unmodified comparison operator, most database systems will raise an error (SQLite is an exception: it silently uses the first row returned)

These queries often use aggregate functions (like AVG or MAX) because, without a GROUP BY, an aggregate is guaranteed to return only one answer. If you don't use an aggregate function, you must be familiar enough with your data and the nature of the problem to know that the subquery will return exactly one value. 

# Subqueries That Return Multiple Values

A subquery that can return multiple values (a list) is used with a modified comparison operator, IN/NOT IN, or EXISTS.

* **Modified Comparison Operator** : One that is followed by the keyword ANY or ALL (e.g. ' > ALL \[list\] ')
    - **ALL** requires that the comparison operator is true for every value of the following list (e.g. '3 > ALL \[1, 2, 5\]' would evaluate to False because 3 is not greater than 5)
    - **ANY** requires that the comparison operator is true for at least one value of the following list (e.g. '3 > ANY \[1, 2, 5\]' would evaluate to True because 3 is greater than at least one value in the list)
        
* **(NOT) IN** : Checks to see whether the preceding value is (not) in the following list (e.g. 'a IN \[a, b, c, d\]' would evaluate to True)
* **EXISTS** : If anything is returned by the subquery, the expression evaluates to True. If nothing is returned by the subquery, the expression evaluates to False
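
The operators above can be tried out with the sketch below, which uses SQLite through Python's `sqlite3` module. SQLite does not implement the ANY and ALL keywords, so this example mirrors their meaning with Python's `any()` and `all()` and with MIN/MAX rewrites (which only match ANY/ALL when the list is non-empty and contains no NULLs). The `nums` table and the `scalar` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nums (n INTEGER);
    INSERT INTO nums VALUES (1), (2), (5);
""")

def scalar(sql):
    """Run a query and return the single value in its first row."""
    return conn.execute(sql).fetchone()[0]

# ANY / ALL semantics, mirrored in Python because SQLite lacks the keywords
values = [row[0] for row in conn.execute("SELECT n FROM nums")]
gt_all = all(3 > v for v in values)   # like 3 > ALL [1, 2, 5]
gt_any = any(3 > v for v in values)   # like 3 > ANY [1, 2, 5]

# Equivalent rewrites for a non-empty, NULL-free list
gt_max = scalar("SELECT 3 > (SELECT MAX(n) FROM nums)")  # stands in for > ALL
gt_min = scalar("SELECT 3 > (SELECT MIN(n) FROM nums)")  # stands in for > ANY

# IN, NOT IN and EXISTS work directly (SQLite returns 1 for True, 0 for False)
is_in = scalar("SELECT 2 IN (SELECT n FROM nums)")
not_in = scalar("SELECT 3 NOT IN (SELECT n FROM nums)")
exists_big = scalar("SELECT EXISTS (SELECT 1 FROM nums WHERE n > 4)")
exists_huge = scalar("SELECT EXISTS (SELECT 1 FROM nums WHERE n > 100)")

print(gt_all, gt_any, gt_max, gt_min)
print(is_in, not_in, exists_big, exists_huge)
```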

# Self-Contained Subqueries

**Self-Contained** : The subquery has all the information it needs within itself

If the 'question within the question' has just one answer that is always the same, then the query will likely be a self-contained query. With this type, you can think of the subquery as being a completely independent query that is executed once and then the result is plugged into the outer query

# Correlated Subqueries

**Correlated** : the subquery needs some kind of information from the outer query; also known as 'repeating subqueries'

If the 'question within the question' is different depending on the situation, the query will likely be a correlated query. This type of subquery gets evaluated again and again for every row of the outer query, because the result of the subquery depends on some value from each row of the outer query (i.e. it is *'correlated'* to the outer query)
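To contrast the two types, the sketch below asks a self-contained question ("more versions than the overall average?") and a correlated one ("more versions than the average for that book's genre?"). It runs on SQLite through Python's `sqlite3` module, and the `books` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (title TEXT, genre TEXT, versions INTEGER);
    INSERT INTO books VALUES
        ('Dune', 'scifi', 6),
        ('Neuromancer', 'scifi', 4),
        ('Emma', 'classic', 1),
        ('Ulysses', 'classic', 2);
""")

# Self-contained: the inner query never mentions the outer query,
# so it has one answer (3.25) that is reused for every row
self_contained = conn.execute("""
    SELECT b.title
    FROM books AS b
    WHERE b.versions > (SELECT AVG(b2.versions) FROM books AS b2)
    ORDER BY b.title
""").fetchall()

# Correlated: the inner query refers to b.genre from the outer query,
# so its answer depends on which outer row is being checked
correlated = conn.execute("""
    SELECT b.title
    FROM books AS b
    WHERE b.versions > (SELECT AVG(b2.versions)
                        FROM books AS b2
                        WHERE b2.genre = b.genre)
    ORDER BY b.title
""").fetchall()

print(self_contained)  # [('Dune',), ('Neuromancer',)]
print(correlated)      # [('Dune',), ('Ulysses',)]
```

The only difference between the two queries is the `WHERE b2.genre = b.genre` line, which is what ties the inner query to the current outer row.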

---

<img src="https://studentlife.dal.ca/article/2019/how-to-schedule-the-perfect-finals-week/_jcr_content/root/maincontent/main/article-body/center/contentfragment/par10/image.coreimg.gif/1575042792843/take-a-break.gif" width="500" height="500" />

# Combining Rows from Different Tables

We know that JOINs can combine columns from two different tables. What if we want to combine rows from two different tables? Row combinations are done using UNION, INTERSECT, and EXCEPT, which are called Set Operators (because you are combining two groups of rows to create a single set of rows)

<img src="Row Combinations Visual.png" width="700" height="700" />

*(set operation images from https://essentialsql.com/sql-union-intersect-except/)*

<img src="Vertical Venn Diagram.jpeg" width="300" height="300" />

**<span style="color:red">REQUIREMENTS for all row combination operations:</span>**
* the number of columns must be the same for both SELECT statements
* the columns, in order, must be of the same data type (e.g. don't try to create a column where some values are strings and some are integers)

**For all set operations:**
* Think of each query creating its own intermediary table, and then those tables being combined according to the set operation
* NULL does match NULL (unlike when you explicitly compare NULL = NULL)
* Just like for JOINs, the top query is called the 'left' and the bottom is called the 'right'
* Because the rows from the different queries need to have the same type of columns, they are often queries on the same table with different filters added
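
Two of the points above can be checked quickly with SQLite through Python's `sqlite3` module: NULL matching NULL inside a set operation, and the requirement that both SELECT statements have the same number of columns. This is only a sketch; SQLite is loosely typed, so it will not demonstrate the data type requirement, and error messages differ between engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# An explicit comparison of NULL with NULL is neither true nor false: it is NULL
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# Inside a set operation, the two NULL rows are treated as matching
intersect_rows = conn.execute("SELECT NULL INTERSECT SELECT NULL").fetchall()

# Two columns on the left but one on the right breaks the first requirement
try:
    conn.execute("SELECT 1, 2 UNION SELECT 3")
    mismatch_error = None
except sqlite3.OperationalError as exc:
    mismatch_error = str(exc)

print(null_equals_null)  # None
print(intersect_rows)    # [(None,)]
print(mismatch_error)
```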

# UNION

A UNION between two queries returns a combination of all of the rows from the results of both queries. Duplicate rows are removed unless you use the ALL keyword (ALL is like the opposite of DISTINCT)

You can think of UNION as the row equivalent of a FULL JOIN 

<img src="Anatomy of a Union.png" width="700" height="700" />

### Syntax

In [None]:
SELECT 
    columnlist1
FROM 
    table1
WHERE 
    conditions1
UNION (ALL) -- ALL not required
SELECT 
    columnlist2
FROM 
    table2
WHERE
    conditions2
ORDER BY 
    column -- Applies to the combined result

Which you can think of as:

In [None]:
(
Query 1
UNION
Query 2
)
ORDER BY

Each query creates its own intermediary table, and then those tables are combined according to the union.
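As an illustration, the sketch below compares UNION with UNION ALL on SQLite through Python's `sqlite3` module. The `customers` and `suppliers` tables are hypothetical; the city 'Oslo' appears in both so the effect of ALL is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (city TEXT);
    CREATE TABLE suppliers (city TEXT);
    INSERT INTO customers VALUES ('Lima'), ('Oslo');
    INSERT INTO suppliers VALUES ('Oslo'), ('Rome');
""")

# UNION removes the duplicate 'Oslo' row
union_rows = conn.execute("""
    SELECT city FROM customers
    UNION
    SELECT city FROM suppliers
    ORDER BY city
""").fetchall()

# UNION ALL keeps every row from both queries
union_all_rows = conn.execute("""
    SELECT city FROM customers
    UNION ALL
    SELECT city FROM suppliers
    ORDER BY city
""").fetchall()

print(union_rows)      # [('Lima',), ('Oslo',), ('Rome',)]
print(union_all_rows)  # [('Lima',), ('Oslo',), ('Oslo',), ('Rome',)]
```

In both queries the ORDER BY sorts the combined result, not just the second SELECT.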

# INTERSECT

When using INTERSECT between two queries it returns the rows that the two queries share in common.

You can think of INTERSECT as the row equivalent of an INNER JOIN

<img src="Intersect Visual.png" width="700" height="700" />

### Syntax

In [None]:
SELECT 
    columnlist1
FROM 
    table1
WHERE 
    conditions1
INTERSECT
SELECT
    columnlist2
FROM 
    table2
WHERE
    conditions2
ORDER BY
    column
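
A minimal INTERSECT sketch on SQLite through Python's `sqlite3` module, assuming two made-up tables, `customers` and `suppliers`, that share one city:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (city TEXT);
    CREATE TABLE suppliers (city TEXT);
    INSERT INTO customers VALUES ('Lima'), ('Oslo');
    INSERT INTO suppliers VALUES ('Oslo'), ('Rome');
""")

# Only rows that appear in the results of BOTH queries survive
intersect_rows = conn.execute("""
    SELECT city FROM customers
    INTERSECT
    SELECT city FROM suppliers
    ORDER BY city
""").fetchall()

print(intersect_rows)  # [('Oslo',)]
```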

# EXCEPT

When using EXCEPT between two queries it returns the rows that exist in one query (the left one) but don't exist in the other query (the right one).

You can think of EXCEPT as the row equivalent of a LEFT EXCLUSIVE JOIN (a LEFT JOIN that keeps only the left rows with no match on the right).

<img src="Except Visual.png" width="700" height="700" />
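The syntax follows the same pattern as UNION and INTERSECT, with EXCEPT placed between the two queries. The sketch below tries it on SQLite through Python's `sqlite3` module, using hypothetical `customers` and `suppliers` tables, and also shows that the order of the two queries matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (city TEXT);
    CREATE TABLE suppliers (city TEXT);
    INSERT INTO customers VALUES ('Lima'), ('Oslo');
    INSERT INTO suppliers VALUES ('Oslo'), ('Rome');
""")

# Rows in the left query (customers) that are not in the right query (suppliers)
left_only = conn.execute("""
    SELECT city FROM customers
    EXCEPT
    SELECT city FROM suppliers
    ORDER BY city
""").fetchall()

# Swapping the queries changes the answer
right_only = conn.execute("""
    SELECT city FROM suppliers
    EXCEPT
    SELECT city FROM customers
    ORDER BY city
""").fetchall()

# The LEFT JOIN version of the first query: keep left rows with no match.
# It gives the same rows here because the sample data has no duplicates or NULLs
anti_join = conn.execute("""
    SELECT c.city
    FROM customers AS c
    LEFT JOIN suppliers AS s ON c.city = s.city
    WHERE s.city IS NULL
    ORDER BY c.city
""").fetchall()

print(left_only)   # [('Lima',)]
print(right_only)  # [('Rome',)]
print(anti_join)   # [('Lima',)]
```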

---

# So...which type of query do I use to answer my question??

<img src="https://media.giphy.com/media/MdRc6qXSukWsDJfnp9/giphy.gif" width="700" height="700" />

You'll find that there is often more than one way to answer a particular question in SQL. You get to pick your approach based on what makes most logical sense to you, what is the most straightforward, what is the easiest to understand, etc.