# Identifying the base report

You are tasked with building the following visualization:

<center><img src="images/01.02.png"  style="width: 400px, height: 300px;"/></center>


A base report is the underlying report that a visualization is built from. From a SQL point of view, you build the base report first and then create the visualization on top of it.

Which of the following statements about the base report for this visualization is false?

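- The query needs a `WHERE` clause. (False: the report keeps only the top sports by using `ORDER BY` and `LIMIT`, so no row filter is required.)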

# Building the base report

Now, build the base report for this visualization:

<center><img src="images/01.02.png"  style="width: 400px, height: 300px;"/></center>

This should be built by querying the `summer_games` table.

```
-- Query the sport and distinct number of athletes
SELECT
    sport,
    COUNT(DISTINCT athlete_id) AS athletes
FROM summer_games
GROUP BY sport
-- Only include the 3 sports with the most athletes
ORDER BY athletes DESC
LIMIT 3;
```
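A top-N report depends on the sort direction: `ORDER BY ... DESC` puts the largest counts first, so `LIMIT` keeps them. The sketch below checks this on a few made-up rows; `sample_games` and its values are hypothetical stand-ins for `summer_games`, not real data.

```sql
-- Hypothetical sample rows standing in for summer_games (not real data)
WITH sample_games (sport, athlete_id) AS (
    VALUES
        ('Gymnastics', 1),
        ('Gymnastics', 2),
        ('Gymnastics', 3),
        ('Swimming', 4),
        ('Swimming', 5),
        ('Swimming', 5),  -- same athlete entered in a second event
        ('Fencing', 6),
        ('Judo', 7),
        ('Judo', 8),
        ('Judo', 9),
        ('Judo', 10)
)
SELECT
    sport,
    COUNT(DISTINCT athlete_id) AS athletes
FROM sample_games
GROUP BY sport
-- DESC sorts the largest counts first, so LIMIT keeps the top 3
ORDER BY athletes DESC
LIMIT 3;
```

On this sample the result is Judo (4), Gymnastics (3), and Swimming (2). Without `DESC` the sort is ascending, and `LIMIT 3` would return the three sports with the fewest athletes instead.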

# Athletes vs events by sport

Now consider the following visualization:

<center><img src="images/01.04.png"  style="width: 400px, height: 300px;"/></center>


Using the `summer_games` table, write a query that creates the base report for this visualization.

```
-- Query sport, events, and athletes from summer_games
SELECT
    sport,
    COUNT(DISTINCT event) AS events,
    COUNT(DISTINCT athlete_id) AS athletes
FROM summer_games
GROUP BY sport;
```
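To see why `COUNT(DISTINCT ...)` matters here, the following sketch compares it with a plain row count. The table `sample_games` and its rows are invented for illustration and only assume that one row represents one athlete competing in one event.

```sql
-- Hypothetical sample: one row per athlete per event (not real data)
WITH sample_games (sport, event, athlete_id) AS (
    VALUES
        ('Swimming', '100m Freestyle', 1),
        ('Swimming', '200m Freestyle', 1),
        ('Swimming', '100m Freestyle', 2),
        ('Fencing', 'Foil', 3)
)
SELECT
    sport,
    COUNT(*) AS row_count,                       -- every row in the group
    COUNT(DISTINCT event) AS events,             -- each event counted once
    COUNT(DISTINCT athlete_id) AS athletes       -- each athlete counted once
FROM sample_games
GROUP BY sport
ORDER BY sport;
```

For Swimming this returns 3 rows but only 2 events and 2 athletes, because one athlete swims two events and one event has two athletes. Fencing returns 1 for all three columns.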

# Planning queries with an E:R diagram

An E:R diagram visually shows all tables, fields, and relationships in a database. You are given the following E:R diagram:

<center><img src="images/01.06.png"  style="width: 400px, height: 300px;"/></center>

You are tasked with building a report that shows Age of Oldest Athlete by Region. Which tables need to be included in the query?

- `athletes`, `summer_games`, `countries`

# Age of oldest athlete by region

You are given the following E:R diagram:

<center><img src="images/01.06.png"  style="width: 400px, height: 300px;"/></center>

In the previous exercise, you identified which tables are needed to create a report that shows Age of Oldest Athlete by Region. Now, set up the query to create this report.


```
-- Select the age of the oldest athlete for each region
SELECT
    region,
    MAX(age) AS age_of_oldest_athlete
FROM athletes AS a
-- First JOIN statement
JOIN summer_games AS s
ON a.id = s.athlete_id
-- Second JOIN statement
JOIN countries AS c
ON c.id = s.country_id
GROUP BY region;
```
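The join path can be tried out on a tiny data set. In this sketch the three `sample_*` tables and their rows are hypothetical; only the column names (`id`, `age`, `athlete_id`, `country_id`, `region`) are taken from the query in this exercise.

```sql
-- Hypothetical miniature versions of the three tables (not real data)
WITH sample_athletes (id, age) AS (
    VALUES (1, 24), (2, 31), (3, 19)
),
sample_countries (id, region) AS (
    VALUES (10, 'EUROPE'), (20, 'ASIA')
),
sample_games (athlete_id, country_id) AS (
    -- athlete 3 appears twice, e.g. two events at the same games
    VALUES (1, 10), (2, 10), (3, 20), (3, 20)
)
SELECT
    c.region,
    MAX(a.age) AS age_of_oldest_athlete
FROM sample_athletes AS a
-- The games table links each athlete to a country
JOIN sample_games AS s
    ON a.id = s.athlete_id
JOIN sample_countries AS c
    ON c.id = s.country_id
GROUP BY c.region
ORDER BY c.region;
```

This returns ASIA with 19 and EUROPE with 31. The repeated row for athlete 3 does not change the result, because `MAX` is unaffected by duplicates; an aggregate such as `SUM` or `COUNT(*)` would be.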

# Number of events in each sport

The full E:R diagram for the database is shown below:

<center><img src="images/01.08.png"  style="width: 400px, height: 300px;"/></center>


Since the company will be involved in both summer sports and winter sports, it is beneficial to look at all sports in one centralized report.

Your task is to create a query that shows the number of unique events held for each sport. Because no relationship exists between the `summer_games` and `winter_games` tables, you will need to combine them with a `UNION` instead of a `JOIN`.

```
-- Select sport and events for summer sports
SELECT
    sport,
    COUNT(DISTINCT event) AS events
FROM summer_games
GROUP BY sport
UNION
-- Select sport and events for winter sports
SELECT
    sport,
    COUNT(DISTINCT event) AS events
FROM winter_games
GROUP BY sport
-- Show the most events at the top of the report
ORDER BY events DESC;
```
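A small sketch of how `UNION` stacks two independent result sets and how the final `ORDER BY` applies to the combined rows. The `sample_summer` and `sample_winter` tables and their contents are hypothetical.

```sql
-- Hypothetical stand-ins for summer_games and winter_games (not real data)
WITH sample_summer (sport, event) AS (
    VALUES
        ('Swimming', '100m Freestyle'),
        ('Swimming', '200m Freestyle'),
        ('Fencing', 'Foil')
),
sample_winter (sport, event) AS (
    VALUES
        ('Biathlon', 'Sprint'),
        ('Curling', 'Mixed Doubles'),
        ('Curling', 'Team')
)
SELECT sport, COUNT(DISTINCT event) AS events
FROM sample_summer
GROUP BY sport
UNION
SELECT sport, COUNT(DISTINCT event) AS events
FROM sample_winter
GROUP BY sport
-- ORDER BY sorts the combined result, not just the second SELECT
ORDER BY events DESC, sport;
```

The result lists Curling (2), Swimming (2), Biathlon (1), and Fencing (1). Note that `UNION` also removes duplicate rows, whereas `UNION ALL` keeps every row. The two behave the same in this sample because each sport appears in only one table.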

# Exploring summer_games

Exploring the data in a table can provide further insights into the database as a whole. In this exercise, you will try out a series of different techniques to explore the `summer_games` table.

```
-- Update query to explore the unique bronze field values
SELECT DISTINCT bronze
FROM summer_games;
```

```
-- Recreate the query by using GROUP BY 
SELECT bronze
FROM summer_games
GROUP BY bronze;
```

```
-- Add the rows column to your query
SELECT 
	bronze, 
	COUNT(*) AS rows
FROM summer_games
GROUP BY bronze;
```
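When exploring a field this way, it helps to remember that `GROUP BY` places all `NULL` values in a single group and that `COUNT(*)` counts those rows while `COUNT(column)` does not. The sketch below assumes, purely for illustration, that `bronze` holds either 1 or `NULL`; the `sample_games` rows are made up.

```sql
-- Hypothetical rows; assumes bronze is 1 for a medal and NULL otherwise
WITH sample_games (athlete_id, bronze) AS (
    VALUES (1, 1), (2, NULL), (3, NULL), (4, 1), (5, NULL)
)
SELECT
    bronze,
    COUNT(*) AS row_count,             -- counts every row, NULLs included
    COUNT(bronze) AS non_null_values   -- skips rows where bronze is NULL
FROM sample_games
GROUP BY bronze;
```

The group where `bronze` is 1 has 2 rows and 2 non-null values. The `NULL` group has 3 rows and 0 non-null values.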

# Validating our query

The same techniques we use to explore the data can be used to validate queries. By using the query as a subquery, you can run exploratory techniques to confirm the query results are as expected.

In this exercise, you will create a query that shows Bronze Medals by Country and then validate it using the subquery technique.

Feel free to reference the E:R diagram as needed.

```
-- Pull total_bronze_medals from summer_games below
SELECT SUM(bronze) AS total_bronze_medals
FROM summer_games;
```

```
-- Set up a query that shows bronze_medals by country
SELECT
    country,
    SUM(bronze) AS bronze_medals
FROM summer_games AS s
JOIN countries AS c
ON c.id = s.country_id
GROUP BY country;
```

```
/* Pull total_bronze_medals below
SELECT SUM(bronze) AS total_bronze_medals
FROM summer_games;
>> OUTPUT = 141 total_bronze_medals */

-- Select the total bronze_medals from your query
SELECT SUM(temp.bronze_medals)
FROM (
  -- Previous query is shown below. Alias this AS temp
  SELECT
      country,
      SUM(bronze) AS bronze_medals
  FROM summer_games AS s
  JOIN countries AS c
  ON s.country_id = c.id
  GROUP BY country
) AS temp;
```
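The two totals can also be placed side by side in one query, which makes a mismatch easy to spot. This is a sketch on invented data: the `sample_games` and `sample_countries` tables are hypothetical, and country 30 is deliberately left out of `sample_countries` to show what the check catches.

```sql
-- Hypothetical rows (not real data); country 30 has no match on purpose
WITH sample_games (country_id, bronze) AS (
    VALUES (10, 1), (10, NULL), (20, 1), (20, 1), (30, 1)
),
sample_countries (id, country) AS (
    VALUES (10, 'FRA'), (20, 'JPN')
)
SELECT
    -- Total taken straight from the source table
    (SELECT SUM(bronze) FROM sample_games) AS total_from_table,
    -- Total taken from the report query, used as a subquery
    (SELECT SUM(report.bronze_medals)
     FROM (
         SELECT
             c.country,
             SUM(s.bronze) AS bronze_medals
         FROM sample_games AS s
         JOIN sample_countries AS c
             ON s.country_id = c.id
         GROUP BY c.country
     ) AS report) AS total_from_report;
```

Here `total_from_table` is 4 but `total_from_report` is 3, because the inner join drops the row whose `country_id` has no match. When the two totals agree, as with the 141 bronze medals in the exercise, the join has not lost or duplicated any medal rows.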

# Report 1: Most decorated summer athletes

Now that you have a good understanding of the data, let's get back to our case study and build out the first element for the dashboard, Most Decorated Summer Athletes:

<center><img src="images/01.12.png"  style="width: 400px, height: 300px;"/></center>


Your job is to create the base report for this element. Base report details:

- Column 1 should be `athlete_name`.
- Column 2 should be `gold_medals`.
- The report should only include athletes with at least 3 gold medals.
- The report should be ordered by gold medals won, with the most medals at the top.

```
-- Pull athlete_name and gold_medals for summer games
SELECT
    name AS athlete_name,
    SUM(gold) AS gold_medals
FROM summer_games AS s
JOIN athletes AS a
ON a.id = s.athlete_id
GROUP BY athlete_name
-- Filter for only athletes with 3 gold medals or more
HAVING SUM(gold) >= 3
-- Sort to show the most gold medals at the top
ORDER BY gold_medals DESC;
```
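The filter on an aggregated value has to go in `HAVING`, since `WHERE` is evaluated before the rows are grouped. The sketch below runs the same pattern on made-up rows; the `sample_*` tables and athlete names are hypothetical.

```sql
-- Hypothetical rows (not real data); gold is 1 for a win, NULL otherwise
WITH sample_athletes (id, name) AS (
    VALUES (1, 'Athlete A'), (2, 'Athlete B'), (3, 'Athlete C')
),
sample_games (athlete_id, gold) AS (
    VALUES
        (1, 1), (1, 1), (1, 1), (1, NULL),
        (2, 1), (2, 1),
        (3, 1), (3, 1), (3, 1), (3, 1)
)
SELECT
    a.name AS athlete_name,
    SUM(s.gold) AS gold_medals
FROM sample_games AS s
JOIN sample_athletes AS a
    ON a.id = s.athlete_id
GROUP BY a.name
-- HAVING filters groups after aggregation; WHERE cannot see SUM(gold)
HAVING SUM(s.gold) >= 3
ORDER BY gold_medals DESC;
```

This returns Athlete C (4) followed by Athlete A (3). Athlete B is excluded with only 2 gold medals, and the `NULL` row for Athlete A is ignored by `SUM`.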