# Order of operations impact on query structure

The World Bank has data on many countries, but some years are missing values. You decide to look for countries that have more than one year of missing population data. You write the following query, but it does NOT run.
```
SELECT olympic_cc, COUNT(*)
FROM demographics
WHERE population IS NULL
AND COUNT(*) > 1
```
You reference your notes about the SQL order of operations to understand why.

Which of the following explains the logical error?

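- Aggregations occur after the WHERE clause, so WHERE cannot filter on COUNT(*). A filter on an aggregate belongs in a HAVING clause, which is evaluated after GROUP BY.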
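To see the order of operations at work, the sketch below runs the failing query and a corrected version against a tiny in-memory SQLite table. SQLite stands in for the course database here (its error text differs from PostgreSQL's), and the rows are made up for illustration. The corrected query filters rows in WHERE, groups them, and only then filters groups in HAVING.

```python
import sqlite3

# Toy stand-in for the demographics table (made-up rows; NULL = missing population)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demographics (olympic_cc TEXT, year INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO demographics VALUES (?, ?, ?)",
    [("AAA", 2014, None), ("AAA", 2015, None),
     ("BBB", 2014, None), ("BBB", 2015, 100),
     ("CCC", 2014, 50)],
)

# Original query: WHERE runs before aggregation, so COUNT(*) is not available yet
try:
    conn.execute(
        "SELECT olympic_cc, COUNT(*) FROM demographics "
        "WHERE population IS NULL AND COUNT(*) > 1"
    ).fetchall()
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)  # SQLite rejects the aggregate inside WHERE

# Corrected query: WHERE filters rows, GROUP BY builds groups, HAVING filters groups
missing = conn.execute(
    """
    SELECT olympic_cc, COUNT(*)
    FROM demographics
    WHERE population IS NULL
    GROUP BY olympic_cc
    HAVING COUNT(*) > 1
    """
).fetchall()

print(where_error)
print(missing)  # [('AAA', 2)]
```

Because the WHERE clause already restricts each group to missing years, every country that survives grouping has at least one missing year; HAVING COUNT(*) > 1 narrows this to countries with two or more.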


# Group by and aggregations

You join the World Bank demographics data to the Olympic regions data (the oregions table).

You want to look at trends, so you group countries regionally to see, by region, how many countries have population data (and therefore how many are missing it). You want to view the results by Olympic region and count the countries with population data using the country code, olympic_cc.

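Running this query without a GROUP BY clause returns an error: reg.region is selected alongside the aggregate COUNT(DISTINCT dem.olympic_cc), so it must either be grouped or be aggregated itself.

The corrected query adds reg.region to the GROUP BY clause: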

```
SELECT reg.region, COUNT(DISTINCT dem.olympic_cc)
FROM oregions reg -- Olympic region data
LEFT JOIN demographics dem -- World Bank population data
  ON dem.olympic_cc = reg.olympic_cc
GROUP BY reg.region;
```
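The sketch below runs the corrected query on made-up rows in an in-memory SQLite database. Note that SQLite, unlike PostgreSQL, tolerates a non-grouped column next to an aggregate, so it would not reproduce the error above; the sketch only shows how the fixed query behaves. Because COUNT(DISTINCT ...) ignores the NULLs produced by unmatched LEFT JOIN rows, a region with no demographics rows reports 0 rather than disappearing.

```python
import sqlite3

# Toy versions of oregions and demographics (made-up codes and values)
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE oregions (olympic_cc TEXT, region TEXT);
    CREATE TABLE demographics (olympic_cc TEXT, year INTEGER, population INTEGER);
    INSERT INTO oregions VALUES ('AAA', 'Africa'), ('BBB', 'Africa'),
                                ('CCC', 'Europe'), ('DDD', 'Oceania');
    INSERT INTO demographics VALUES ('AAA', 2014, 10), ('AAA', 2015, 11),
                                    ('CCC', 2014, 20);
    """
)

rows = conn.execute(
    """
    SELECT reg.region, COUNT(DISTINCT dem.olympic_cc) AS countries_with_data
    FROM oregions reg
    LEFT JOIN demographics dem
      ON dem.olympic_cc = reg.olympic_cc
    GROUP BY reg.region
    ORDER BY reg.region
    """
).fetchall()

print(rows)  # [('Africa', 1), ('Europe', 1), ('Oceania', 0)]
```

Comparing each region's count with its total number of countries in oregions gives the number of countries missing population data.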

# Count and count distinct

By pure numbers, the greater a country's population, the greater the pool of athletes and thus Olympians. You decide to look at Olympic medals based purely on population, or volume of available athletes.

The World Bank data shows that in 2016, the countries with the highest populations were China (1,378,665,000), India (1,324,171,354), and the United States of America (323,405,935).

Now you will look at the Olympic athletes' data from the 2016 Summer Olympics to see if China, India, and the USA did indeed win the most medals.

```
SELECT country_code
  , COUNT(athlete_id) as medals_count
FROM athletes_recent
WHERE medal IS NOT NULL
AND year = 2016
GROUP BY country_code
ORDER BY medals_count DESC;
```

```
SELECT country_code
  , COUNT(DISTINCT athlete_id) as medals_count
FROM athletes_recent
WHERE medal IS NOT NULL
AND year = 2016
GROUP BY country_code
ORDER BY medals_count DESC;
```
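The two queries above differ only in COUNT(athlete_id) versus COUNT(DISTINCT athlete_id). The sketch below, on made-up rows in an in-memory SQLite table, shows the difference: the first counts medal rows, so an athlete who medals twice counts twice, while the second counts athletes who won at least one medal. Whether either matches an official medal table depends on how the source data records team events, which is an assumption not checked here.

```python
import sqlite3

# Toy athletes_recent table (made-up rows)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE athletes_recent (athlete_id INTEGER, country_code TEXT, year INTEGER, medal TEXT)"
)
conn.executemany(
    "INSERT INTO athletes_recent VALUES (?, ?, ?, ?)",
    [
        (1, "AAA", 2016, "Gold"),    # athlete 1 wins two medals
        (1, "AAA", 2016, "Silver"),
        (2, "AAA", 2016, None),      # no medal, filtered out
        (3, "BBB", 2016, "Gold"),
        (4, "BBB", 2016, "Bronze"),
        (5, "BBB", 2012, "Gold"),    # different games, filtered out
    ],
)

def medal_counts(count_expr):
    # count_expr is either "athlete_id" or "DISTINCT athlete_id"
    return conn.execute(
        f"""
        SELECT country_code, COUNT({count_expr}) AS medals_count
        FROM athletes_recent
        WHERE medal IS NOT NULL AND year = 2016
        GROUP BY country_code
        ORDER BY medals_count DESC, country_code
        """
    ).fetchall()

medal_rows = medal_counts("athlete_id")
medalists = medal_counts("DISTINCT athlete_id")
print(medal_rows)  # [('AAA', 2), ('BBB', 2)]
print(medalists)   # [('BBB', 2), ('AAA', 1)]
```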

# OR versus IN with athletes

The Olympics has no age restrictions for competitors. The youngest-ever Winter Olympics competitor was 11-year-old Cecilia Colledge, a figure skater from Great Britain.

Africa has also had an 11-year-old figure skating competitor, Marcelle Matthews. Focusing on Africa and the Winter Olympics, find how many young African athletes have competed since the Olympics' inception.

```
SELECT COUNT(*)
FROM athletes_wint 
WHERE age = 11 
OR age = 12 ;
```

```
SELECT *
FROM athletes_wint 
WHERE age IN (11,12);
```
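The OR filter and the IN filter above are logically equivalent; IN is simply shorter and easier to extend as the list of values grows. A quick check on made-up rows in an in-memory SQLite table (a NULL age matches neither form):

```python
import sqlite3

# Toy athletes_wint table (made-up names and ages)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE athletes_wint (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO athletes_wint VALUES (?, ?)",
    [("A", 11), ("B", 12), ("C", 15), ("D", 20), ("E", None)],
)

or_count = conn.execute(
    "SELECT COUNT(*) FROM athletes_wint WHERE age = 11 OR age = 12"
).fetchone()[0]
in_count = conn.execute(
    "SELECT COUNT(*) FROM athletes_wint WHERE age IN (11, 12)"
).fetchone()[0]

print(or_count, in_count)  # 2 2
```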

# Data type filters

Since the first African country participated in the Winter Olympics in 1960, only two preteen athletes (aged 11 and 12) have participated. Both athletes competed in the 1960 games.

Perhaps athletes in the 21st century are older when they compete. Take a quick look at the earliest Winter Games in the data (1960) and compare them with the Winter Games 50 years later (2010) to test this hypothesis.

```
SELECT games
  , name
  , age
FROM athletes_wint
WHERE games IN ('1960 Winter', '2010 Winter')
ORDER BY games;
```

```
SELECT games
  , name
  , age
FROM athletes_wint
WHERE year IN (1960, 2010)
ORDER BY games;
```
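Both queries above return the same rows, but the second compares integers instead of strings, which is generally cheaper for a database to evaluate. The sketch below checks the equivalence on made-up rows in an in-memory SQLite table. It relies on the table holding only Winter games, as athletes_wint does; a table mixing Summer and Winter rows would also need a season filter alongside year.

```python
import sqlite3

# Toy athletes_wint table: Winter games only (made-up athletes)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE athletes_wint (games TEXT, year INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO athletes_wint VALUES (?, ?, ?, ?)",
    [("1960 Winter", 1960, "A", 11),
     ("1988 Winter", 1988, "B", 19),
     ("2010 Winter", 2010, "C", 25)],
)

text_filter = conn.execute(
    "SELECT games, name, age FROM athletes_wint "
    "WHERE games IN ('1960 Winter', '2010 Winter') ORDER BY games"
).fetchall()
int_filter = conn.execute(
    "SELECT games, name, age FROM athletes_wint "
    "WHERE year IN (1960, 2010) ORDER BY games"
).fetchall()

print(text_filter == int_filter)  # True
```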

# EXPLAIN the filter query plan step

All Olympic athletes are impressive, but athletes who are not yet adults (i.e., too young to drive or vote) are particularly impressive. You will look at these young athletes, those under 16 years old, among African athletes across all Winter Olympics.

Before proceeding, feel free to explore the athletes_wint table and see whether any athletes are your age.

Once done exploring, you will use the EXPLAIN command to see how the database plans to execute your WHERE clause.

```
EXPLAIN
SELECT *
FROM athletes_wint
WHERE age < 16;
```
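EXPLAIN output is database-specific. In PostgreSQL, this query on a table without a usable index would typically show a sequential scan ("Seq Scan") with a Filter line for the age condition. The sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in, on made-up rows, and adds a hypothetical index (idx_athletes_wint_age, not part of the course schema) to show how the plan changes from a full scan to an index search.

```python
import sqlite3

# Toy athletes_wint table (made-up rows)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE athletes_wint (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO athletes_wint VALUES (?, ?)",
    [("A", 11), ("B", 15), ("C", 22), ("D", 30)],
)

def plan(sql):
    # Each EXPLAIN QUERY PLAN row has 4 columns; the last one is the readable detail
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM athletes_wint WHERE age < 16"
plan_before = plan(query)  # a full table scan

# Hypothetical index, added only to show the plan changing
conn.execute("CREATE INDEX idx_athletes_wint_age ON athletes_wint (age)")
plan_after = plan(query)   # a search using the index on age

print(plan_before)
print(plan_after)
```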

# Where to place a region filter

Population alone does not predict the number of Olympians. Researchers propose that gross domestic product (GDP) together with population can predict the number of Olympians. Luckily, the World Bank also collects GDP per capita (GDP divided by population) for many countries.

Focusing again on Africa, the most populous countries in 2014 were Nigeria, Ethiopia, and Egypt; however, the only African countries sending athletes to the 2014 Winter Olympics were Morocco, Togo, and Zimbabwe.

Determine which African countries have 2014 GDP data and whether Morocco, Togo, and Zimbabwe have a high GDP per capita.

Use region as a non-linking join condition

```
SELECT dem.olympic_cc, reg.country, dem.gdp, dem.population
FROM demographics dem
LEFT JOIN oregions reg
  ON dem.olympic_cc = reg.olympic_cc
  AND reg.region = 'Africa'
WHERE dem.year = 2014
AND dem.gdp IS NOT NULL
ORDER BY dem.gdp DESC; 
```
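In a LEFT JOIN, a condition in the ON clause only decides which right-side rows match; it never removes left-side rows. The sketch below runs the query above on made-up rows in an in-memory SQLite database (codes such as AF1 and EU1 are hypothetical): the non-African country is still returned, just with a NULL country name.

```python
import sqlite3

# Toy demographics and oregions tables (made-up codes and values)
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE demographics (olympic_cc TEXT, year INTEGER, gdp REAL, population INTEGER);
    CREATE TABLE oregions (olympic_cc TEXT, country TEXT, region TEXT);
    INSERT INTO demographics VALUES
        ('AF1', 2014, 3000, 30), ('AF2', 2014, 600, 7), ('EU1', 2014, 40000, 60);
    INSERT INTO oregions VALUES
        ('AF1', 'Country A', 'Africa'), ('AF2', 'Country B', 'Africa'),
        ('EU1', 'Country C', 'Europe');
    """
)

rows = conn.execute(
    """
    SELECT dem.olympic_cc, reg.country, dem.gdp
    FROM demographics dem
    LEFT JOIN oregions reg
      ON dem.olympic_cc = reg.olympic_cc
      AND reg.region = 'Africa'   -- only limits which oregions rows match
    WHERE dem.year = 2014 AND dem.gdp IS NOT NULL
    ORDER BY dem.gdp DESC
    """
).fetchall()

print(rows)
# [('EU1', None, 40000.0), ('AF1', 'Country A', 3000.0), ('AF2', 'Country B', 600.0)]
```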

Move region to the WHERE clause

```
SELECT dem.olympic_cc, reg.country, dem.gdp, dem.population
FROM demographics dem
LEFT JOIN oregions reg
  ON dem.olympic_cc = reg.olympic_cc
WHERE dem.year = 2014
AND reg.region = 'Africa'
AND dem.gdp IS NOT NULL
ORDER BY dem.gdp DESC;
```

Limit the demographics table with INNER JOIN

```
SELECT dem.olympic_cc, reg.country, dem.gdp, dem.population
FROM demographics dem
INNER JOIN oregions reg
  ON dem.olympic_cc = reg.olympic_cc
  AND reg.region = 'Africa'
WHERE dem.year = 2014
AND dem.gdp IS NOT NULL
ORDER BY dem.gdp DESC;
```
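A WHERE condition on a column of the right-hand table discards the NULL-extended rows a LEFT JOIN produces, so the LEFT JOIN then behaves like an INNER JOIN. The sketch below checks that the WHERE version and the INNER JOIN version return the same rows, using made-up data in an in-memory SQLite database (XX1 has no region row and AF3 has no GDP; both are hypothetical).

```python
import sqlite3

# Toy demographics and oregions tables (made-up codes and values)
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE demographics (olympic_cc TEXT, year INTEGER, gdp REAL, population INTEGER);
    CREATE TABLE oregions (olympic_cc TEXT, country TEXT, region TEXT);
    INSERT INTO demographics VALUES
        ('AF1', 2014, 3000, 30), ('AF2', 2014, 600, 7), ('AF3', 2014, NULL, 9),
        ('EU1', 2014, 40000, 60), ('XX1', 2014, 500, 5);
    INSERT INTO oregions VALUES
        ('AF1', 'Country A', 'Africa'), ('AF2', 'Country B', 'Africa'),
        ('AF3', 'Country D', 'Africa'), ('EU1', 'Country C', 'Europe');
    """
)

left_where = conn.execute(
    """
    SELECT dem.olympic_cc, reg.country, dem.gdp
    FROM demographics dem
    LEFT JOIN oregions reg
      ON dem.olympic_cc = reg.olympic_cc
    WHERE dem.year = 2014 AND reg.region = 'Africa' AND dem.gdp IS NOT NULL
    ORDER BY dem.gdp DESC
    """
).fetchall()

inner_on = conn.execute(
    """
    SELECT dem.olympic_cc, reg.country, dem.gdp
    FROM demographics dem
    INNER JOIN oregions reg
      ON dem.olympic_cc = reg.olympic_cc
      AND reg.region = 'Africa'
    WHERE dem.year = 2014 AND dem.gdp IS NOT NULL
    ORDER BY dem.gdp DESC
    """
).fetchall()

print(left_where == inner_on)  # True
print(inner_on)  # [('AF1', 'Country A', 3000.0), ('AF2', 'Country B', 600.0)]
```

Writing the INNER JOIN makes the intent explicit: only demographics rows with an African region match are wanted.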

# Filtering in the join, where, and select

Across all countries with GDP data in 2014, the average GDP per capita was $19,342. For African countries, the average GDP per capita was $5,879.

Perhaps the African countries that sent athletes to the Olympics (Morocco, Togo, Zimbabwe) did not have the highest GDP per capita in Africa but were above the African average.

Test this theory. Find the per capita GDP for the African countries with athletes at the 2014 Winter Games.

```
SELECT DISTINCT ath.name, dem.country, dem.gdp
FROM athletes_wint ath
INNER JOIN odemographics dem
  ON ath.country_code = dem.olympic_cc
  AND ath.year = dem.year -- match GDP to the year of the games
WHERE ath.year = 2014
```
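If the demographics data holds one row per country per year, joining athletes on country alone pairs each athlete with every year's GDP, and DISTINCT cannot collapse those rows because the GDP values differ. The sketch below shows this on made-up rows in an in-memory SQLite database; the year column on odemographics is an assumption made for illustration.

```python
import sqlite3

# Toy tables; odemographics is assumed to have one row per country per year
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE athletes_wint (name TEXT, country_code TEXT, year INTEGER);
    CREATE TABLE odemographics (olympic_cc TEXT, country TEXT, year INTEGER, gdp REAL);
    INSERT INTO athletes_wint VALUES ('Athlete 1', 'AF1', 2014), ('Athlete 2', 'AF1', 2010);
    INSERT INTO odemographics VALUES
        ('AF1', 'Country A', 2010, 2500), ('AF1', 'Country A', 2014, 3000);
    """
)

# Join on country only: the 2014 athlete picks up both years' GDP
country_only = conn.execute(
    """
    SELECT DISTINCT ath.name, dem.country, dem.gdp
    FROM athletes_wint ath
    INNER JOIN odemographics dem
      ON ath.country_code = dem.olympic_cc
    WHERE ath.year = 2014
    """
).fetchall()

# Join on country and year: one row per athlete, with the matching year's GDP
country_and_year = conn.execute(
    """
    SELECT DISTINCT ath.name, dem.country, dem.gdp
    FROM athletes_wint ath
    INNER JOIN odemographics dem
      ON ath.country_code = dem.olympic_cc
      AND ath.year = dem.year
    WHERE ath.year = 2014
    """
).fetchall()

print(len(country_only))  # 2
print(country_and_year)   # [('Athlete 1', 'Country A', 3000.0)]
```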

# Aggregate before joining tables

You have the following data:

- African athletes participating in past Olympics
- Country GDP per capita
- Population by year

For simplicity, the annual demographics (GDP and population) have been grouped into low, medium, and high categories. Your job is to compare each African country's GDP, population, and athlete count.

You want the final answer to have one row per country, per year. Because the athletes table is on a different grain (athlete-event) than the demographics_rank table (country-year), you will first aggregate the athletes table before joining it to the GDP and population data.

```
-- Number of competing athletes per country and year
WITH athletes_cte AS (
  SELECT country_code, year, COUNT(DISTINCT athlete_id) AS no_athletes
  FROM athletes
  GROUP BY country_code, year
)

SELECT demos.country, ath.year, ath.no_athletes
    , demos.gdp_rank
    , demos.population_rank
FROM athletes_cte ath
INNER JOIN demographics_rank demos  
  ON ath.country_code = demos.olympic_cc -- Country
  AND ath.year = demos.year -- Year
ORDER BY ath.no_athletes DESC;
```
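The sketch below illustrates aggregating to the country-year grain before joining, using made-up rows in an in-memory SQLite database (the event column is hypothetical). Athlete 1 appears in two events, so the athlete-event table has three AF1 rows, but COUNT(DISTINCT athlete_id) reports two athletes, and the join then yields exactly one row per country and year.

```python
import sqlite3

# Toy athletes (athlete-event grain) and demographics_rank (country-year grain) tables
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE athletes (athlete_id INTEGER, country_code TEXT, year INTEGER, event TEXT);
    CREATE TABLE demographics_rank (olympic_cc TEXT, country TEXT, year INTEGER,
                                    gdp_rank TEXT, population_rank TEXT);
    INSERT INTO athletes VALUES
        (1, 'AF1', 2014, 'Slalom'), (1, 'AF1', 2014, 'Giant Slalom'),
        (2, 'AF1', 2014, 'Slalom'), (3, 'AF2', 2014, 'Luge');
    INSERT INTO demographics_rank VALUES
        ('AF1', 'Country A', 2014, 'medium', 'high'),
        ('AF2', 'Country B', 2014, 'low', 'low');
    """
)

rows = conn.execute(
    """
    WITH athletes_cte AS (
      -- Collapse athlete-event rows to one row per country-year
      SELECT country_code, year, COUNT(DISTINCT athlete_id) AS no_athletes
      FROM athletes
      GROUP BY country_code, year
    )
    SELECT demos.country, ath.year, ath.no_athletes, demos.gdp_rank, demos.population_rank
    FROM athletes_cte ath
    INNER JOIN demographics_rank demos
      ON ath.country_code = demos.olympic_cc
      AND ath.year = demos.year
    ORDER BY ath.no_athletes DESC
    """
).fetchall()

print(rows)
# [('Country A', 2014, 2, 'medium', 'high'), ('Country B', 2014, 1, 'low', 'low')]
```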

# South African trends

You decide to zero in on one country to look at demographic trends over time. Since 1990 (the first year of the demographics data), South Africa's population has increased from 37.5 million to 56.7 million people, and its GDP per capita has increased from $6,267 to $13,497.

You want to determine if the number of Olympic athletes from South Africa has increased during this time.

The athletes table is at an athlete-event level grain. You aggregate it to the year-country grain before joining it to the demographics_rank table. Additionally, the demographics_rank table is large, so you want to filter it to only South Africa before joining it to the athletes data.

```
-- South African athletes by year
WITH athletes_cte AS 
(
    SELECT year
      , season
      , COUNT(DISTINCT athlete_id) AS no_athletes
    FROM athletes
    WHERE country_code = 'RSA' -- South Africa filter
    GROUP BY year, season
)

SELECT ath.year
  , ath.season
  , ath.no_athletes
  , demos.gdp_rounded
  , demos.gdp_rank
  , demos.population_rounded
  , demos.population_rank
FROM athletes_cte ath
INNER JOIN (
    SELECT *
    FROM demographics_rank
    WHERE olympic_cc = 'RSA' -- Filter to South Africa before joining
) demos
  ON ath.year = demos.year
ORDER BY ath.season, ath.year;
```
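Because the join matches on year alone, the South Africa filter on demographics_rank matters: without it, each year would match every country's row for that year. For an INNER JOIN, filtering in a subquery before the join or in WHERE after it returns the same rows; planners such as PostgreSQL's typically push such predicates down on their own, so the difference is mainly in the logical order. The sketch below checks this on made-up rows in an in-memory SQLite database (AF2 is a hypothetical second country).

```python
import sqlite3

# Toy athletes and demographics_rank tables (made-up rows)
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE athletes (athlete_id INTEGER, country_code TEXT, year INTEGER, season TEXT);
    CREATE TABLE demographics_rank (olympic_cc TEXT, year INTEGER, gdp_rank TEXT);
    INSERT INTO athletes VALUES
        (1, 'RSA', 1992, 'Summer'), (2, 'RSA', 1992, 'Summer'),
        (3, 'RSA', 1994, 'Winter'), (4, 'AF2', 1992, 'Summer');
    INSERT INTO demographics_rank VALUES
        ('RSA', 1992, 'medium'), ('RSA', 1994, 'medium'), ('AF2', 1992, 'low');
    """
)

# South African athletes per year and season (athlete-event rows aggregated first)
cte = """
    WITH athletes_cte AS (
      SELECT year, season, COUNT(DISTINCT athlete_id) AS no_athletes
      FROM athletes
      WHERE country_code = 'RSA'
      GROUP BY year, season
    )
"""

# Filter demographics_rank before the join
filter_first = conn.execute(cte + """
    SELECT ath.year, ath.season, ath.no_athletes, demos.gdp_rank
    FROM athletes_cte ath
    INNER JOIN (SELECT * FROM demographics_rank WHERE olympic_cc = 'RSA') demos
      ON ath.year = demos.year
    ORDER BY ath.season, ath.year
""").fetchall()

# Filter after the join; the 1992 AF2 row is joined first, then removed
filter_after = conn.execute(cte + """
    SELECT ath.year, ath.season, ath.no_athletes, demos.gdp_rank
    FROM athletes_cte ath
    INNER JOIN demographics_rank demos
      ON ath.year = demos.year
    WHERE demos.olympic_cc = 'RSA'
    ORDER BY ath.season, ath.year
""").fetchall()

print(filter_first)  # [(1992, 'Summer', 2, 'medium'), (1994, 'Winter', 1, 'medium')]
print(filter_first == filter_after)  # True
```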