# Column does not exist

When using `WHERE` as a filter condition, it is important to think about the processing order in the query: `WHERE` is processed before `SELECT`, so it cannot reference a column alias that is only defined in `SELECT`. In this exercise, you want a query that returns NBA players with average total rebounds of 12 or more per game. The following formula calculates average total rebounds from the `PlayerStats` table:

 `Average Total Rebounds = (Defensive Rebounds + Offensive Rebounds)/Games Played`

The first query in Step 1 returns an error. Select Run Code to view the error. The second query, in Step 2, will give you the results you want, without error, by using a sub-query.

Note that `GamesPlayed` is cast to `numeric` with `CAST` so that the division returns a decimal result; dividing one whole number by another would otherwise drop the fractional part.

```
-- Second query

-- Add the new column to the select statement
SELECT PlayerName, 
       Team, 
       Position, 
       AvgRebounds -- Add the new column
FROM
     -- Sub-query starts here
    (SELECT PlayerName,
            Team,
            Position,
            -- Calculate average total rebounds
            (ORebound + DRebound) / CAST(GamesPlayed AS numeric) AS AvgRebounds
     FROM PlayerStats) tr
WHERE AvgRebounds >= 12; -- Filter rows
```
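The effect of the cast and the sub-query filter can be checked on a toy table. The sketch below is only an illustration: it uses Python's built-in `sqlite3` module as a stand-in for SQL Server, so the cast target is `REAL` rather than `numeric`, and the three player rows are invented. It shows the sub-query pattern and the integer division, not the Step 1 error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PlayerStats ("
    "PlayerName TEXT, ORebound INTEGER, DRebound INTEGER, GamesPlayed INTEGER)"
)
# Hypothetical rows, not taken from the real NBA data
conn.executemany(
    "INSERT INTO PlayerStats VALUES (?, ?, ?, ?)",
    [("Player A", 300, 700, 80),   # 1000 / 80 = 12.5
     ("Player B", 200, 500, 70),   # 700 / 70 = 10.0
     ("Player C", 250, 745, 82)],  # 995 / 82 is roughly 12.13
)

# Integer division truncates: 1000 / 80 gives 12, not 12.5
truncated = conn.execute(
    "SELECT (ORebound + DRebound) / GamesPlayed FROM PlayerStats "
    "WHERE PlayerName = 'Player A'"
).fetchone()[0]

# Casting the divisor keeps the fraction; the outer WHERE then filters
# on the column that was computed inside the sub-query
rows = conn.execute(
    "SELECT PlayerName, AvgRebounds FROM "
    "(SELECT PlayerName, "
    "        (ORebound + DRebound) / CAST(GamesPlayed AS REAL) AS AvgRebounds "
    " FROM PlayerStats) tr "
    "WHERE AvgRebounds >= 12 ORDER BY PlayerName"
).fetchall()

print(truncated)
print(rows)
```

Without the cast, Player A would be reported at exactly 12 rebounds per game instead of 12.5, so values just under a threshold could be rounded down past it.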

# Functions in WHERE

You want to know which players from the 2017-2018 NBA season went to college in Louisiana. You ask a friend to write the query for you. He overcomplicated the `WHERE` filter condition by applying string functions unnecessarily and, because he forgot how to spell Louisiana, his filter does not return precisely what you want. You will simplify his query to return exactly what you require.

```
SELECT PlayerName, 
      Country,
      College, 
      DraftYear, 
      DraftNumber 
FROM Players 
-- WHERE UPPER(LEFT(College,5)) LIKE 'LOU%'; -- Friend's original filter
WHERE College LIKE 'Louisiana%'; -- Add the new wildcard filter
```
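To see why the friend's filter returns too much, the two conditions can be compared on a few made-up rows. This is a hedged sketch using Python's built-in `sqlite3` module rather than SQL Server: SQLite has no `LEFT` function, so `substr(College, 1, 5)` stands in for `LEFT(College, 5)`, and the college values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Players (PlayerName TEXT, College TEXT)")
# Hypothetical rows for illustration only
conn.executemany(
    "INSERT INTO Players VALUES (?, ?)",
    [("Player A", "Louisiana State"),
     ("Player B", "Louisville"),
     ("Player C", "Louisiana Tech"),
     ("Player D", "Duke")],
)

# Friend's filter: functions wrapped around the column, pattern too short
friend = [r[0] for r in conn.execute(
    "SELECT College FROM Players "
    "WHERE UPPER(substr(College, 1, 5)) LIKE 'LOU%' ORDER BY College"
)]

# Simplified filter on the bare column with the full prefix
simple = [r[0] for r in conn.execute(
    "SELECT College FROM Players "
    "WHERE College LIKE 'Louisiana%' ORDER BY College"
)]

print(friend)
print(simple)
```

The short pattern also matches Louisville, which is not in Louisiana. Leaving the column unwrapped additionally gives the database a chance to use an index on `College`, which a function applied to the column usually prevents.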

# Test your knowledge of WHERE

Which of the following statements regarding `WHERE` is FALSE?

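- `WHERE` is processed before `SELECT` and `FROM`. (This is the false statement: `FROM` is processed first, then `WHERE`, and `SELECT` only after that.)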

# Row filtering with HAVING

In some cases, using `HAVING` instead of `WHERE` as a filter condition will produce the same results. When filtering individual, ungrouped rows, however, `WHERE` is more efficient because the rows are removed before any grouping takes place.

In this exercise, you want to know the number of players from Latin American countries playing in the 2017-2018 NBA season.

```
SELECT Country, COUNT(*) CountOfPlayers 
FROM Players
GROUP BY Country
HAVING Country 
    IN ('Argentina','Brazil','Dominican Republic'
        ,'Puerto Rico');
```

- The filter is on individual rows. Using `HAVING` to filter here could increase the time the query takes to run.

```
SELECT Country, COUNT(*) CountOfPlayers
FROM Players
-- Add the filter condition
WHERE Country
-- Fill in the missing countries
    IN ('Argentina', 'Brazil', 'Dominican Republic',
        'Puerto Rico')
GROUP BY Country;
```
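That both forms return the same rows can be confirmed on a small example. The sketch below assumes Python's built-in `sqlite3` module as a stand-in for SQL Server and uses invented player rows; it checks only that the results match, not how long either query takes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Players (PlayerName TEXT, Country TEXT)")
# Hypothetical rows for illustration only
conn.executemany(
    "INSERT INTO Players VALUES (?, ?)",
    [("Player A", "Argentina"), ("Player B", "Brazil"),
     ("Player C", "Brazil"), ("Player D", "USA"), ("Player E", "USA")],
)

countries = "('Argentina', 'Brazil', 'Dominican Republic', 'Puerto Rico')"

# HAVING: logically, every country is grouped and counted first,
# and the unwanted groups are discarded afterwards
with_having = conn.execute(
    "SELECT Country, COUNT(*) FROM Players GROUP BY Country "
    f"HAVING Country IN {countries} ORDER BY Country"
).fetchall()

# WHERE: unwanted rows are discarded before any grouping happens
with_where = conn.execute(
    "SELECT Country, COUNT(*) FROM Players "
    f"WHERE Country IN {countries} GROUP BY Country ORDER BY Country"
).fetchall()

print(with_having)
print(with_where)
```

In the `HAVING` version the `USA` rows are still grouped and counted before being thrown away, which is the extra work the `WHERE` version avoids.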

# Filtering with WHERE and HAVING

`WHERE` and `HAVING` can be used as filters in the same query, but they do different jobs: `WHERE` filters individual rows before grouping, and `HAVING` filters groups after aggregation.

You want a query that returns the total points contribution of each team's Power Forwards, where that total is greater than 3000.

```
SELECT Team, 
	SUM(TotalPoints) AS TotalPFPoints
FROM PlayerStats
-- Filter for only rows with power forwards
WHERE Position = 'PF'
GROUP BY Team
-- Filter for total points greater than 3000
HAVING SUM(TotalPoints) > 3000;
```
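The division of labour between the two filters shows up clearly when the `WHERE` clause is removed. The following is a minimal sketch on invented data, run through Python's built-in `sqlite3` module as a stand-in for SQL Server; the team codes `AAA` and `BBB` and all point totals are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PlayerStats (Team TEXT, Position TEXT, TotalPoints INTEGER)"
)
# Hypothetical rows for illustration only
conn.executemany(
    "INSERT INTO PlayerStats VALUES (?, ?, ?)",
    [("AAA", "PF", 2000), ("AAA", "PF", 1500), ("AAA", "C", 4000),
     ("BBB", "PF", 1000), ("BBB", "PF", 1200), ("BBB", "C", 5000)],
)

# WHERE keeps only power forwards, HAVING then filters the team totals
both_filters = conn.execute(
    "SELECT Team, SUM(TotalPoints) FROM PlayerStats "
    "WHERE Position = 'PF' GROUP BY Team "
    "HAVING SUM(TotalPoints) > 3000 ORDER BY Team"
).fetchall()

# Without WHERE, the totals include every position
having_only = conn.execute(
    "SELECT Team, SUM(TotalPoints) FROM PlayerStats "
    "GROUP BY Team HAVING SUM(TotalPoints) > 3000 ORDER BY Team"
).fetchall()

print(both_filters)
print(having_only)
```

Team `BBB` passes the `HAVING` test only when its center's points are counted, so dropping the row-level `WHERE` filter changes the answer rather than just the speed.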

# Test your knowledge of HAVING

The following query from the `NBA Season 2017-2018` database returns the total points contribution of each team's Centers, where total points are greater than 2500.
```
SELECT Team, 
    SUM(TotalPoints) AS TotalCPoints
FROM PlayerStats
WHERE Position = 'C'
GROUP BY Team
HAVING SUM(TotalPoints) > 2500;
```
Copy and paste the above query into the query console and select Run Code to check the results.

When using `HAVING` in a query, which one of the following statements is FALSE?

- `HAVING` and `WHERE` produce the same output, so it doesn't matter which one you use.

# SELECT what you need

Your friend is a seismologist, and she is doing a study on earthquakes in South East Asia. She asks you for a query that returns coordinate locations, strength, depth and nearest city of all earthquakes in Papua New Guinea and Indonesia.

All the information you need is in the `Earthquakes` table, and your initial interrogation of the data tells you that the column for the country code is `Country` and that the codes for Papua New Guinea and Indonesia are `PG` and `ID` respectively.

```
SELECT * -- Select all rows and columns
FROM Earthquakes;
```

```
SELECT latitude,   -- Y location coordinate column
       longitude,  -- X location coordinate column
       magnitude,  -- Earthquake strength column
       depth,      -- Earthquake depth column
       NearestPop  -- Nearest city column
FROM Earthquakes
WHERE Country = 'PG'   -- Papua New Guinea country code
    OR Country = 'ID'; -- Indonesia country code
```

# Limit the rows with TOP

Your seismologist friend, who is studying earthquakes in South East Asia, has asked you to subset the query you provided her. She wants two additional queries for earthquakes recorded in Indonesia and Papua New Guinea: the first returns the ten shallowest earthquakes, and the second returns the upper quartile of earthquakes by strength.

```
SELECT TOP 10 -- Limit the number of rows to ten
       Latitude,
       Longitude,
       Magnitude,
       Depth,
       NearestPop
FROM Earthquakes
WHERE Country = 'PG'
    OR Country = 'ID'
ORDER BY Depth; -- Order results from shallowest to deepest
```

```
SELECT TOP 25 PERCENT -- Limit rows to the upper quartile
       Latitude,
       Longitude,
       Magnitude,
       Depth,
       NearestPop
FROM Earthquakes
WHERE Country = 'PG'
    OR Country = 'ID'
ORDER BY Magnitude DESC; -- Order the results
```
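Which rows `TOP` returns depends entirely on the `ORDER BY`. The sketch below illustrates this on invented data using Python's built-in `sqlite3` module, which is an assumption made for runnability: SQLite has no `TOP`, so `LIMIT` stands in for it, and the 25 percent row count is computed by hand, rounding up as SQL Server does for fractional row counts. Town names, magnitudes and depths are hypothetical.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Earthquakes ("
    "NearestPop TEXT, Country TEXT, Magnitude REAL, Depth REAL)"
)
# Hypothetical rows for illustration only
conn.executemany(
    "INSERT INTO Earthquakes VALUES (?, ?, ?, ?)",
    [("Town A", "ID", 6.1, 10.0), ("Town B", "ID", 7.4, 35.0),
     ("Town C", "PG", 5.2, 5.0), ("Town D", "PG", 8.0, 60.0),
     ("Town E", "ID", 6.8, 20.0), ("Town F", "PG", 7.0, 15.0)],
)

# Counterpart of TOP 2 ... ORDER BY Depth: the two shallowest earthquakes
shallowest = [r[0] for r in conn.execute(
    "SELECT NearestPop FROM Earthquakes ORDER BY Depth LIMIT 2"
)]

# Counterpart of TOP 25 PERCENT ... ORDER BY Magnitude DESC:
# 25% of 6 rows is 1.5, rounded up to 2 rows
total = conn.execute("SELECT COUNT(*) FROM Earthquakes").fetchone()[0]
quartile_rows = math.ceil(total * 25 / 100)
strongest = [r[0] for r in conn.execute(
    "SELECT NearestPop FROM Earthquakes ORDER BY Magnitude DESC LIMIT ?",
    (quartile_rows,),
)]

print(shallowest)
print(strongest)
```

The same row limit yields two different sets of earthquakes because only the sort column changed.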

# Should I use ORDER BY?

When considering `ORDER BY` in a query, which of the following statements is FALSE?

- `ORDER BY` is only supported by Microsoft SQL Server and none of the other major database vendors.

# Remove duplicates with DISTINCT()

You want to know the closest city to earthquakes with a magnitude of 8 or higher. You can get this information from the `Earthquakes` table. However, a simple query returns duplicate rows because some cities have experienced more than one magnitude 8 or higher earthquake.

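You can remove duplicates with `DISTINCT`. Although it is written here as `DISTINCT()`, it is a keyword that applies to the whole selected row, not a function applied to a single column. Once you have your results, you would like to know how many times each city has experienced an earthquake of magnitude 8 or higher.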

Note that `IS NOT NULL` is being used because many earthquakes do not occur near any populated area, thankfully.

```
SELECT DISTINCT(NearestPop), -- Remove duplicate city
       Country
FROM Earthquakes
WHERE magnitude >= 8 -- Add filter condition
    AND NearestPop IS NOT NULL
ORDER BY NearestPop;
```

```
SELECT NearestPop, 
       Country, 
       COUNT(NearestPop) NumEarthquakes -- Number of earthquakes per city
FROM Earthquakes
WHERE Magnitude >= 8
	AND Country IS NOT NULL
GROUP BY NearestPop, Country -- Group columns
ORDER BY NumEarthquakes DESC;
```
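The difference between removing duplicates and counting them can be seen side by side. This is a hedged sketch on invented rows using Python's built-in `sqlite3` module as a stand-in for SQL Server; for simplicity it filters on `NearestPop IS NOT NULL` in both queries, and the city names are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Earthquakes (NearestPop TEXT, Country TEXT, Magnitude REAL)"
)
# Hypothetical rows for illustration only
conn.executemany(
    "INSERT INTO Earthquakes VALUES (?, ?, ?)",
    [("City A", "ID", 8.1), ("City A", "ID", 8.4), ("City B", "CL", 8.8),
     (None, "PG", 8.2), ("City C", "JP", 7.5)],
)

# DISTINCT collapses the two City A rows into one
distinct_rows = conn.execute(
    "SELECT DISTINCT NearestPop, Country FROM Earthquakes "
    "WHERE Magnitude >= 8 AND NearestPop IS NOT NULL "
    "ORDER BY NearestPop"
).fetchall()

# GROUP BY also collapses them, but keeps how many rows were merged
counted_rows = conn.execute(
    "SELECT NearestPop, Country, COUNT(NearestPop) AS NumEarthquakes "
    "FROM Earthquakes "
    "WHERE Magnitude >= 8 AND NearestPop IS NOT NULL "
    "GROUP BY NearestPop, Country "
    "ORDER BY NumEarthquakes DESC, NearestPop"
).fetchall()

print(distinct_rows)
print(counted_rows)
```

Both queries return one row per city and country pair; only the grouped version can report that City A appears twice.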

# UNION and UNION ALL

You want a query that returns all cities listed in the `Earthquakes` database. It should be an easy query on the `Cities` table. However, to be sure you get all cities in the database, you will append a query on the `Nations` table to include capital cities as well. You will use `UNION` to remove any duplicate rows.

Out of curiosity, you want to know whether there were any duplicate rows. Run the same query with `UNION ALL` instead and compare the number of rows returned by each query: `UNION ALL` keeps duplicates, so it will return more rows than `UNION` if there are any.

```
SELECT CityName AS NearCityName, -- City name column
	   CountryCode
FROM Cities

UNION -- Append queries

SELECT Capital AS NearCityName, -- Nation capital column
       Code2 AS CountryCode
FROM Nations;
```

```
SELECT CityName AS NearCityName,
	   CountryCode
FROM Cities

UNION ALL -- Append queries

SELECT Capital AS NearCityName,
       Code2 AS CountryCode  -- Country code column
FROM Nations;
```

- More rows are returned with `UNION ALL`; therefore, `UNION` must be removing duplicates.
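The row-count comparison can be reproduced on two tiny tables. The sketch below assumes Python's built-in `sqlite3` module as a stand-in for SQL Server, and the table contents are a toy sample chosen so that two capitals also appear in `Cities`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cities (CityName TEXT, CountryCode TEXT)")
conn.execute("CREATE TABLE Nations (Capital TEXT, Code2 TEXT)")
# Toy sample rows: Jakarta and Port Moresby appear in both tables
conn.executemany(
    "INSERT INTO Cities VALUES (?, ?)",
    [("Jakarta", "ID"), ("Surabaya", "ID"), ("Port Moresby", "PG")],
)
conn.executemany(
    "INSERT INTO Nations VALUES (?, ?)",
    [("Jakarta", "ID"), ("Port Moresby", "PG"), ("Tokyo", "JP")],
)

query = (
    "SELECT CityName AS NearCityName, CountryCode FROM Cities "
    "{op} "
    "SELECT Capital AS NearCityName, Code2 AS CountryCode FROM Nations"
)

# UNION removes rows that are identical across the combined result
union_rows = conn.execute(query.format(op="UNION")).fetchall()

# UNION ALL simply appends the second result to the first
union_all_rows = conn.execute(query.format(op="UNION ALL")).fetchall()

print(len(union_rows), len(union_all_rows))
```

The difference between the two counts equals the number of duplicate rows that `UNION` removed, here the two capitals that are also listed as cities.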

# UNION or DISTINCT()?

When deciding whether to use `DISTINCT()` or `UNION` in a query to remove duplicate rows, which of the following questions would you NOT ask yourself?

- Should I be thinking about duplicate rows because my queries never produce duplicate rows?