# Uncorrelated sub-query

A sub-query is another query within a query. The sub-query returns its results to an outer query to be processed.

You want a query that returns the regions and countries that have experienced earthquakes centered at a depth of 400 km or deeper. Your query will use the `Earthquakes` table in the sub-query and the `Nations` table in the outer query.

```
SELECT UNStatisticalRegion,
       CountryName 
FROM Nations
WHERE Code2 -- Country code for outer query 
         IN (SELECT Country -- Country code for sub-query
             FROM Earthquakes
             WHERE depth >= 400 ) -- Depth filter
ORDER BY UNStatisticalRegion;
```

- The sub-query does not reference the outer query.
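
Because the sub-query is uncorrelated, it can be run on its own before being placed inside the outer query. The sketch below illustrates this with Python's `sqlite3` module and a few made-up rows; the country codes, names, and depths are hypothetical and are not taken from the `Earthquakes` database.

```python
import sqlite3

# In-memory database with made-up rows (illustrative only)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Nations (Code2 TEXT, CountryName TEXT);
    CREATE TABLE Earthquakes (Country TEXT, depth REAL);
    INSERT INTO Nations VALUES ('AA', 'Alphaland'), ('BB', 'Betaland'), ('CC', 'Gammaland');
    INSERT INTO Earthquakes VALUES ('AA', 450), ('BB', 10), ('CC', 600), ('CC', 30);
""")

# The sub-query has no reference to the outer query, so it runs by itself
inner_sql = "SELECT Country FROM Earthquakes WHERE depth >= 400"
inner_rows = sorted(conn.execute(inner_sql).fetchall())

# The same text is then used as the IN list of the outer query
outer_rows = conn.execute(f"""
    SELECT CountryName
    FROM Nations
    WHERE Code2 IN ({inner_sql})
    ORDER BY CountryName
""").fetchall()

print(inner_rows)
print(outer_rows)
```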

# Correlated sub-query

Sub-queries are used to retrieve information from another table, or query, that is separate from the main query.

A friend is working on a project looking at earthquake hazards around the world. She requires a table that lists all countries, their continent, and the average earthquake magnitude for each country. This query will need to access data from the `Nations` and `Earthquakes` tables.

```
SELECT UNContinentRegion,
       CountryName, 
       (SELECT AVG(magnitude) -- Add average magnitude
        FROM Earthquakes e
        -- Add country code reference
        WHERE n.Code2 = e.Country) AS AverageMagnitude
FROM Nations n
ORDER BY UNContinentRegion DESC, 
         AverageMagnitude DESC;
```

- The sub-query references the outer query.

# Sub-query vs INNER JOIN

Often the results from a correlated sub-query can be replicated using an `INNER JOIN`. Depending on your requirements, an `INNER JOIN` may be more efficient because it makes only one pass through the data, whereas the correlated sub-query must execute for each row in the outer query.

You want to find out the 2017 population of the biggest city for every country in the world. You can get this information from the `Earthquakes` database with the `Nations` table as the outer query and `Cities` table in the sub-query.

You will first create this query as a correlated sub-query then rewrite it using an `INNER JOIN`.

```
SELECT n.CountryName,
       (SELECT MAX(c.Pop2017) -- Add 2017 population column
        FROM Cities AS c
        -- Outer query country code column
        WHERE n.Code2 = c.CountryCode) AS BiggestCity
FROM Nations AS n; -- Outer query table
```

```
SELECT n.CountryName, 
       c.BiggestCity 
FROM Nations AS n
INNER JOIN -- Join the Nations table and sub-query
    (SELECT CountryCode,
            MAX(Pop2017) AS BiggestCity
     FROM Cities
     GROUP BY CountryCode) AS c
ON n.Code2 = c.CountryCode; -- Add the joining columns
```
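
The two forms are not always identical. A minimal sketch with Python's `sqlite3` module and made-up rows (the countries and populations are hypothetical) suggests where they can differ: the correlated sub-query keeps a country that has no cities and reports `NULL` for it, while the `INNER JOIN` drops that country.

```python
import sqlite3

# In-memory database with made-up rows; Betaland has no cities
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Nations (Code2 TEXT, CountryName TEXT);
    CREATE TABLE Cities (CountryCode TEXT, CityName TEXT, Pop2017 INTEGER);
    INSERT INTO Nations VALUES ('AA', 'Alphaland'), ('BB', 'Betaland');
    INSERT INTO Cities VALUES ('AA', 'Alpha City', 500), ('AA', 'Alpha Town', 200);
""")

# Correlated sub-query: evaluated once per row of Nations
correlated = conn.execute("""
    SELECT n.CountryName,
           (SELECT MAX(c.Pop2017)
            FROM Cities AS c
            WHERE n.Code2 = c.CountryCode) AS BiggestCity
    FROM Nations AS n
    ORDER BY n.CountryName
""").fetchall()

# INNER JOIN against a grouped sub-query: only matching countries survive
joined = conn.execute("""
    SELECT n.CountryName, c.BiggestCity
    FROM Nations AS n
    INNER JOIN (SELECT CountryCode, MAX(Pop2017) AS BiggestCity
                FROM Cities
                GROUP BY CountryCode) AS c
        ON n.Code2 = c.CountryCode
    ORDER BY n.CountryName
""").fetchall()

print(correlated)  # includes Betaland with None (NULL)
print(joined)      # Betaland is absent
```

If every country has at least one city, both queries return the same rows.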

# INTERSECT

`INTERSECT` is one of the easier and more intuitive methods used to check if data in one table is present in another.

You want to know which, if any, country capitals are listed as the nearest city to recorded earthquakes. You can get this information by comparing the `Nations` table with the `Earthquakes` table.

```
SELECT Capital
FROM Nations -- Table with capital cities

INTERSECT -- Add the operator to compare the two queries

SELECT NearestPop -- Add the city name column
FROM Earthquakes;
```

# EXCEPT

`EXCEPT` does the opposite of `INTERSECT`. It is used to check whether data present in one table is absent from another.

You want to know which countries have no recorded earthquakes. You can get this information by comparing the `Nations` table with the `Earthquakes` table.

```
SELECT Code2 -- Add the country code column
FROM Nations

EXCEPT -- Add the operator to compare the two queries

SELECT Country 
FROM Earthquakes; -- Table with country codes
```
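
The behavior of the two set operators can be compared side by side on a small example. This is a sketch using Python's `sqlite3` module with made-up country codes, not data from the `Earthquakes` database; the results are sorted in Python because set operators do not guarantee an order.

```python
import sqlite3

# In-memory database with made-up country codes
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Nations (Code2 TEXT);
    CREATE TABLE Earthquakes (Country TEXT);
    INSERT INTO Nations VALUES ('AA'), ('BB'), ('CC');
    INSERT INTO Earthquakes VALUES ('AA'), ('AA'), ('CC'), ('ZZ');
""")

# INTERSECT: codes present in both tables (duplicates are removed)
in_both = sorted(conn.execute("""
    SELECT Code2 FROM Nations
    INTERSECT
    SELECT Country FROM Earthquakes
""").fetchall())

# EXCEPT: codes in Nations that never appear in Earthquakes
only_in_nations = sorted(conn.execute("""
    SELECT Code2 FROM Nations
    EXCEPT
    SELECT Country FROM Earthquakes
""").fetchall())

print(in_both)
print(only_in_nations)
```

Note that `EXCEPT` depends on the order of the queries: swapping them would return `ZZ`, the code that appears only in the toy `Earthquakes` table.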

# Interrogating with INTERSECT

`INTERSECT` and `EXCEPT` are very useful for data interrogation.

The `Earthquakes` and `NBA Season 2017-2018` databases both contain information on countries and cities. You are interested to know which countries are represented by players in the 2017-2018 NBA season and you believe you can get the results you require by querying the relevant tables across these two databases.

Use the `INTERSECT` operator between queries, but be careful and think about the results. Although both tables contain a country name column to compare, these are separate databases and the data may be stored differently.

```
SELECT CountryName 
FROM Nations -- Table from Earthquakes database

INTERSECT -- Operator for the intersect between tables

SELECT Country
FROM Players; -- Table from NBA Season 2017-2018 database
```

With one exception, all NBA teams are USA based, so why does USA not appear in the results? Are there no Americans playing in the NBA?

To help find the answer, use the two queries below:

- Delete the query in the query console.
- Copy and paste one of the queries into the query console.
- Select Run Code to view the results.
- Repeat the steps above for the other query.

```
SELECT * 
FROM Nations
WHERE CountryName LIKE 'U%';

SELECT *
FROM Players
WHERE Country LIKE 'U%';
```

- The values do not match. `INTERSECT` is being used correctly, but although both tables contain country names, the values are stored differently: as `United States of America` in the `Nations` table and as `USA` in the `Players` table. There is therefore no match, which is a good reason to interrogate all data sets thoroughly before working with them.
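
The mismatch can be reproduced on a small scale. The sketch below uses Python's `sqlite3` module with a handful of made-up rows that mimic the two spellings described above; the tables are simplified to a single column each.

```python
import sqlite3

# In-memory database; one table per "database", one column each
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Nations (CountryName TEXT);
    CREATE TABLE Players (Country TEXT);
    INSERT INTO Nations VALUES ('United States of America'), ('France'), ('Canada');
    INSERT INTO Players VALUES ('USA'), ('France'), ('Canada');
""")

# INTERSECT compares the stored strings exactly, so the two spellings never match
in_both = sorted(conn.execute("""
    SELECT CountryName FROM Nations
    INTERSECT
    SELECT Country FROM Players
""").fetchall())

# Inspecting each table separately reveals how the value is stored
nations_u = conn.execute(
    "SELECT CountryName FROM Nations WHERE CountryName LIKE 'U%'").fetchall()
players_u = conn.execute(
    "SELECT Country FROM Players WHERE Country LIKE 'U%'").fetchall()

print(in_both)
print(nations_u, players_u)
```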

# IN and EXISTS

You want to know which, if any, country capitals are listed as the nearest city to recorded earthquakes. You can get this information using `INTERSECT` to compare the `Nations` table with the `Earthquakes` table. However, `INTERSECT` requires the number and order of columns in the `SELECT` statements to be the same in both queries, and you would like to include additional columns from the outer query in the results.

You attempt two queries, each with a different operator that gives you the results you require.

```
-- First attempt
SELECT CountryName,
       Pop2017, -- 2017 country population
       Capital, -- Capital city
       WorldBankRegion
FROM Nations
WHERE Capital IN -- Add the operator to compare queries
      (SELECT NearestPop
       FROM Earthquakes);
```

```
-- Second attempt
SELECT CountryName,
       Capital,
       Pop2016, -- 2016 country population
       WorldBankRegion
FROM Nations AS n
WHERE EXISTS -- Add the operator to compare queries
      (SELECT 1
       FROM Earthquakes AS e
       WHERE n.Capital = e.NearestPop); -- Columns being compared
```
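
On a small example the two operators select the same rows from the outer table. The following is a sketch using Python's `sqlite3` module with made-up capitals; the simplified column list is an assumption made to keep the example short.

```python
import sqlite3

# In-memory database; 'Alpha City' appears twice in Earthquakes
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Nations (CountryName TEXT, Capital TEXT);
    CREATE TABLE Earthquakes (NearestPop TEXT);
    INSERT INTO Nations VALUES ('Alphaland', 'Alpha City'), ('Betaland', 'Beta City');
    INSERT INTO Earthquakes VALUES ('Alpha City'), ('Alpha City'), ('Somewhere Else');
""")

# IN with an uncorrelated sub-query
with_in = conn.execute("""
    SELECT CountryName, Capital
    FROM Nations
    WHERE Capital IN (SELECT NearestPop FROM Earthquakes)
    ORDER BY CountryName
""").fetchall()

# EXISTS with a correlated sub-query
with_exists = conn.execute("""
    SELECT CountryName, Capital
    FROM Nations AS n
    WHERE EXISTS (SELECT 1
                  FROM Earthquakes AS e
                  WHERE n.Capital = e.NearestPop)
    ORDER BY CountryName
""").fetchall()

print(with_in)
print(with_exists)
```

Both queries return each matching country once, even though the toy `Earthquakes` table lists its capital twice.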

# NOT IN and NOT EXISTS

`NOT IN` and `NOT EXISTS` do the opposite of `IN` and `EXISTS` respectively. They are used to check if the data present in one table is absent in another.

You are interested to know if there are some countries in the `Nations` table that do not appear in the `Cities` table. There may be many reasons for this. For example, all the city populations from a country may be too small to be listed, or there may be no city data for a particular country at the time the data was compiled.

You will compare the queries using country codes.

```
SELECT WorldBankRegion,
       CountryName
FROM Nations
WHERE Code2 NOT IN -- Add the operator to compare queries
	(SELECT CountryCode -- Country code column
	 FROM Cities);
```

```
SELECT WorldBankRegion,
       CountryName,
       Code2,
       Capital, -- Country capital column
       Pop2017
FROM Nations AS n
WHERE NOT EXISTS -- Add the operator to compare queries
      (SELECT 1
       FROM Cities AS c
       WHERE n.Code2 = c.CountryCode); -- Columns being compared
```

# NOT IN with IS NOT NULL

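You want to know which country capitals have never been the closest city to recorded earthquakes. You decide to use `NOT IN` to compare `Capital` from the `Nations` table, in the outer query, with `NearestPop` from the `Earthquakes` table, in a sub-query. If the sub-query returns any `NULL` values, `NOT IN` returns no rows, because a comparison with `NULL` is never true. The second query therefore filters them out with `IS NOT NULL`.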

```
SELECT WorldBankRegion,
       CountryName,
       Capital -- Capital city name column
FROM Nations
WHERE Capital NOT IN
      (SELECT NearestPop -- City name column
       FROM Earthquakes);
```

```
SELECT WorldBankRegion,
       CountryName,
       Capital
FROM Nations
WHERE Capital NOT IN
      (SELECT NearestPop
       FROM Earthquakes
       WHERE NearestPop IS NOT NULL); -- filter condition
```
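
The effect of a `NULL` in the sub-query can be seen on a small example. This sketch uses Python's `sqlite3` module with made-up rows, one of which has a `NULL` nearest city; the unfiltered `NOT IN` returns nothing, while the filtered version returns the capital that never appears.

```python
import sqlite3

# In-memory database; one earthquake row has a NULL NearestPop
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Nations (CountryName TEXT, Capital TEXT);
    CREATE TABLE Earthquakes (NearestPop TEXT);
    INSERT INTO Nations VALUES ('Alphaland', 'Alpha City'), ('Betaland', 'Beta City');
    INSERT INTO Earthquakes VALUES ('Alpha City'), (NULL);
""")

# Without the filter: 'Beta City' NOT IN ('Alpha City', NULL) is unknown, not true
unfiltered = conn.execute("""
    SELECT Capital
    FROM Nations
    WHERE Capital NOT IN (SELECT NearestPop FROM Earthquakes)
""").fetchall()

# With the filter: NULLs are removed before the comparison
filtered = conn.execute("""
    SELECT Capital
    FROM Nations
    WHERE Capital NOT IN (SELECT NearestPop
                          FROM Earthquakes
                          WHERE NearestPop IS NOT NULL)
""").fetchall()

print(unfiltered)
print(filtered)
```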

# INNER JOIN

An insurance company that specializes in sports franchises has asked you to assess the geological hazards of cities hosting NBA teams. You believe you can get this information by querying the `Teams` and `Earthquakes` tables across the `NBA Season 2017-2018` and `Earthquakes` databases respectively. Your initial query will use `EXISTS` to compare tables. The second query will use a more appropriate operator.

```
-- Initial query
SELECT TeamName,
       TeamCode,
       City
FROM Teams AS t -- Add table
WHERE EXISTS -- Operator to compare queries
      (SELECT 1
       FROM Earthquakes AS e -- Add table
       WHERE t.City = e.NearestPop);
```

```
-- Second query
SELECT t.TeamName,
       t.TeamCode,
       t.City,
       e.Date,
       e.place, -- Place description
       e.Country -- Country code
FROM Teams AS t
INNER JOIN Earthquakes AS e -- Operator to compare tables
    ON t.City = e.NearestPop;

In this exercise, what does the `INNER JOIN` help you to determine that `EXISTS` could not?

- The earthquakes occurred in `San Antonio`, `Chile`, not `San Antonio`, `USA`.
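
The difference can be sketched on a small example. The rows below are made up for illustration (they only mimic the `San Antonio` case described above) and are run through Python's `sqlite3` module: `EXISTS` reports that a match exists, while the `INNER JOIN` returns one row per matching earthquake along with its columns.

```python
import sqlite3

# In-memory database with made-up rows; country codes are illustrative
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Teams (TeamName TEXT, City TEXT);
    CREATE TABLE Earthquakes (NearestPop TEXT, Country TEXT);
    INSERT INTO Teams VALUES ('Spurs', 'San Antonio'), ('Celtics', 'Boston');
    INSERT INTO Earthquakes VALUES ('San Antonio', 'CL'), ('San Antonio', 'CL'), ('Tokyo', 'JP');
""")

# EXISTS: one row per team, no access to earthquake columns
with_exists = conn.execute("""
    SELECT t.TeamName, t.City
    FROM Teams AS t
    WHERE EXISTS (SELECT 1
                  FROM Earthquakes AS e
                  WHERE t.City = e.NearestPop)
""").fetchall()

# INNER JOIN: one row per matching earthquake, including its country code
with_join = conn.execute("""
    SELECT t.TeamName, t.City, e.Country
    FROM Teams AS t
    INNER JOIN Earthquakes AS e
        ON t.City = e.NearestPop
""").fetchall()

print(with_exists)
print(with_join)
```

The country code in the joined rows is what exposes a city-name match that refers to a different country.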

# Exclusive LEFT OUTER JOIN


An exclusive `LEFT OUTER JOIN` can be used to check for the presence of data in one table that is absent in another table. To create an exclusive `LEFT OUTER JOIN`, the query requires an `IS NULL` filter condition on the joining column of the right table.

Your sales manager is concerned that orders from French customers are declining. He wants you to compile a list of French customers that have not placed any orders so he can contact them.

```
-- French customers with no matching order
SELECT c.CustomerID,
       c.CompanyName,
       c.ContactName,
       c.ContactTitle,
       c.Phone
FROM Customers c
LEFT OUTER JOIN Orders o
    ON c.CustomerID = o.CustomerID
WHERE c.Country = 'France'
    AND o.CustomerID IS NULL; -- Filter condition
```
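
The pattern can be tried on a small example. This is a sketch using Python's `sqlite3` module with made-up customers and orders; the table layouts are simplified assumptions, not the real schema.

```python
import sqlite3

# In-memory database with made-up customers and orders
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER, CompanyName TEXT, Country TEXT);
    CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER);
    INSERT INTO Customers VALUES (1, 'Alpha SA', 'France'),
                                 (2, 'Beta SARL', 'France'),
                                 (3, 'Gamma GmbH', 'Germany');
    INSERT INTO Orders VALUES (10, 1), (11, 1);
""")

# Customers without orders get NULL in every Orders column after the LEFT JOIN;
# the IS NULL filter keeps only those unmatched rows
no_orders = conn.execute("""
    SELECT c.CustomerID, c.CompanyName
    FROM Customers c
    LEFT OUTER JOIN Orders o
        ON c.CustomerID = o.CustomerID
    WHERE c.Country = 'France'
      AND o.CustomerID IS NULL
""").fetchall()

print(no_orders)
```

The German customer also has no orders in this toy data, but the `Country` filter excludes it.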

# Test your knowledge


Which method, used to check whether the data in one table is present or absent in a related table, does the Venn diagram below describe?

The `Earthquakes` database is available for you to test scenarios in the query console.

<center><img src="images/03.16.bmp" style="width: 400px; height: 300px;"/></center>


- exclusive `LEFT OUTER JOIN`