# Recursive Queries - An introduction to the CTE

We're going to learn the basics of CTEs. 

This part may not present any new SQL possibilities for some learners, but we suggest they treat it as an introduction to more advanced topics. 

For now, it's important that you get the basics right.

Let's start off by introducing the tables we're going to work with.

Go here to access the fiddle: https://www.db-fiddle.com/f/nn6fUyjjMovPY8NGpnZTgY/0

### Supporter table

In this part, we're going to deal with a crowdfunding website similar to Kickstarter. Let's get to know the first table.

### Exercise

Select all the information from the supporter table.

On our crowdfunding site, people who have donated money at least once automatically have an account created. From that moment on, they are called supporters. Each supporter has an id, a first_name and a last_name. Once someone is a supporter, they can donate more money under the same account or start their own project.

```
SELECT *
FROM supporter;
```


### Project table

Let's get to know the second table.

### Exercise
Select all the information from the project table.

A project is a creative initiative that requires donations from supporters. Each project has the following columns:

- id – A unique identifier for the project.

- category – The project's category, like traveling, games, etc.

- author_id – The ID of the person who created the project.

- minimal_amount – The least amount of money required to complete the project. This is a well-known crowdfunding concept. If you raise more than that amount – fantastic, your project can be even better. If you do not meet the minimal amount, the fundraising fails and donations go back to the relevant supporters.

```
SELECT *
FROM project;
```

### Donation table

And finally, the last table.

### Exercise
Select all information from the donation table.

Each donation has an id, is associated with a specific project (project_id), was donated by a specific supporter (supporter_id) in a given amount (amount) on a given day (donated). Additionally, the column amount_eur shows the donation amount when converted from US dollars to euros.

```
SELECT *
FROM donation;
```


# Basic syntax

Time to get down to work.

What is a `Common Table Expression`? 

You can think of it as a temporary set of rows that you name and then use in the same query.

In principle, CTEs are similar to subqueries.

Let's take a look at the most basic syntax of any Common Table Expression:

```
WITH some_name AS (your_cte)
  SELECT ... 
  FROM some_name
```

In the most basic version, you need to give your CTE a name (e.g., some_name) and define the query within the parentheses.

Then, once you close the parentheses, you can select columns from this CTE as if it were a table.

In this course, we will refer to the CTE as the inner query and the part after it as the outer query. 

Note that you need to define your CTE first, i.e. before the SELECT clause of the outer query.

Enough of theory. Take a look at this example:

```
WITH project_revenue AS (
  SELECT
    project.id,
    SUM(amount) AS sum_amount
  FROM project 
  JOIN donation 
    ON donation.project_id = project.id
  GROUP BY project.id
)

SELECT
  id,
  sum_amount
FROM project_revenue;
```

In the query above, we want to show each project with the amount it collected. 

For this reason, we created a CTE where we selected the project's id and the sum of its donations.

Once we had our CTE defined, we could retrieve its columns in the outer query.

Note that we gave an alias (sum_amount) to the computed column. 

This way, we can refer to that column in the outer query.

# Exercise
Let's do one together! Show the number of projects that reached their minimal_amount.

```
WITH temporary AS (
  SELECT
    project_id,
    SUM(amount) AS sum_amount
  FROM project
  JOIN donation
    ON donation.project_id = project.id
  GROUP BY project_id, minimal_amount
  HAVING SUM(amount) >= minimal_amount
)

SELECT COUNT(project_id)
FROM temporary;
```

Let's do one more exercise.

Remember, a CTE query starts with WITH, as in the example:

```
WITH project_revenue AS (
  SELECT
    project.id, 
    SUM(amount) AS sum_amount
  FROM project
  JOIN donation
    ON donation.project_id = project.id
  GROUP BY project.id
)

SELECT
  id,
  sum_amount
FROM project_revenue;

```


# Exercise
Show the first and last names of authors along with the number of not-yet-funded projects they've created.

Name this column projects_count and show it as the third column. 

Show the authors in descending order based on projects_count.

```
WITH temporary AS (
  SELECT
    author_id,
    project_id,
    minimal_amount,
    SUM(amount) AS donations
  FROM donation
  JOIN project
    ON donation.project_id = project.id
  GROUP BY author_id, project_id, minimal_amount
)

SELECT
  first_name,
  last_name,
  COUNT(project_id) AS projects_count
FROM temporary
JOIN supporter
  ON author_id = supporter.id
WHERE donations < minimal_amount
GROUP BY first_name, last_name
ORDER BY projects_count DESC
```

# Syntax with columns

There is an alternative CTE syntax, where we define the columns explicitly:

```
WITH some_name (cte_columns) AS ( your_cte )
  SELECT ... 
  FROM some_name
```

In other words, we now have two pairs of parentheses. 

First, we provide the names of the columns that our CTE will have. 

Second, we define the actual CTE, based on the columns we provided.

```
WITH project_revenue (id, sum_amount) AS (
  SELECT 
    project.id, 
    SUM(amount)
  FROM project
  JOIN donation 
    ON donation.project_id = project.id
  GROUP BY project.id
)

SELECT 
  id, 
  sum_amount 
FROM project_revenue;
```

The column definition is not required. 

Why would we use it? 

It increases the readability of your query. 

Also, while simple columns inside CTEs don't require aliases, aggregates and other function results should always be given names.

Columns like `SUM(amount)` or `COUNT(project_id)` need names so that you can refer to them outside the CTE.

One way to do this is to use the keyword AS, just as we did previously. Another way is to provide a list of columns.

Either method is fine. If you use neither, some databases (SQL Server, for example) raise an error, while PostgreSQL assigns a default name such as `sum`, so a reference to sum_amount in the outer query fails.
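
The two naming options can be compared side by side in a minimal sketch. It does not use the fiddle schema: the CTE sample_donation and its rows are made up for illustration, so the query should run on its own in PostgreSQL.

```sql
-- Hypothetical inline rows standing in for the donation table.
WITH sample_donation (project_id, amount) AS (
  VALUES (1, 100), (1, 50), (2, 30)
),

-- Option 1: name the aggregate with AS.
revenue_alias AS (
  SELECT
    project_id,
    SUM(amount) AS sum_amount
  FROM sample_donation
  GROUP BY project_id
),

-- Option 2: name the columns in the CTE header.
revenue_list (project_id, sum_amount) AS (
  SELECT
    project_id,
    SUM(amount)
  FROM sample_donation
  GROUP BY project_id
)

-- Both CTEs expose the same column names to the outer query.
SELECT
  a.project_id,
  a.sum_amount AS via_alias,
  l.sum_amount AS via_column_list
FROM revenue_alias a
JOIN revenue_list l
  ON l.project_id = a.project_id
ORDER BY a.project_id;
```

With these sample rows, both columns show 150 for project 1 and 30 for project 2.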

# Exercise

Check it out for yourself: try to run the template query without the column definition or aliases and see what happens. 

Then, correct it and run the code again.

# Exercise

Count the number of projects for which the total sum of donations exceeds 50% of the minimal_amount.


```
WITH temporary (my_sum) AS (
  SELECT
    SUM(amount)
  FROM donation
  JOIN project
    ON donation.project_id = project.id
  GROUP BY project.id, minimal_amount
  HAVING SUM(amount) > 0.5 * minimal_amount
)
SELECT
  COUNT(my_sum)
FROM temporary;
```

# Multiple CTEs

You can have as many CTEs in a single query as you need. 

Each of them should be separated with a comma, and the WITH keyword should only appear once, at the beginning.

```
WITH some_name1 AS ( your_CTE1 ),
some_name2 AS ( your_CTE2 ),
...
SELECT ...
```

Do not put a comma after the last CTE; the outer query follows it directly.

Using multiple CTEs usually makes sense when they refer to each other. 

We'll get to know such CTEs in the next part. 

For now, we may think of other usages; for instance, you can use set operations like UNION to show results from two CTEs. 

Suppose we want to show the most successful projects from two separate categories, with a different success threshold for each category.

To do so, we could use a query like this:
```
WITH succ_traveling AS (
  SELECT
    project_id,
    category,
    SUM(amount) AS sum_amount
  FROM project
  JOIN donation
    ON donation.project_id = project.id
  WHERE category = 'traveling'
  GROUP BY project_id, category, minimal_amount
  HAVING SUM(amount) >= 1.25 * minimal_amount),

succ_games AS (
  SELECT
    project_id,
    category,
    SUM(amount) AS sum_amount
  FROM project 
  JOIN donation 
    ON donation.project_id = project.id
  WHERE category = 'games'
  GROUP BY project_id, category, minimal_amount
  HAVING SUM(amount) >= 2 * minimal_amount)

SELECT 
  project_id, 
  category, 
  sum_amount
FROM succ_traveling
UNION
SELECT 
  project_id, 
  category, 
  sum_amount
FROM succ_games
ORDER BY sum_amount DESC;
```

In this example, we want to show projects from the traveling category that collected at least 25% more than their minimal_amount. 

We also want to show projects from the games category that collected at least twice as much as their minimal_amount. 

Once we defined both CTEs, we could show all results by using UNION in the outer query.
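
One detail worth keeping in mind is that UNION removes duplicate rows, while UNION ALL keeps them. The sketch below illustrates this with two made-up CTEs (group_a and group_b are hypothetical and not part of the fiddle schema).

```sql
-- Hypothetical inline data: supporter 2 appears in both groups.
WITH group_a (supporter_id) AS (
  VALUES (1), (2)
),
group_b (supporter_id) AS (
  VALUES (2), (3)
)

-- UNION returns each distinct row once: 1, 2, 3.
-- Replacing UNION with UNION ALL would return 1, 2, 2, 3.
SELECT supporter_id
FROM group_a
UNION
SELECT supporter_id
FROM group_b
ORDER BY supporter_id;
```

In the traveling and games example, the two CTEs select different categories, so no row can appear in both and either operator gives the same rows.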



# CTE vs. subquery

The last thing we'll discuss in this part is the difference between subqueries and CTEs. 

In fact, all of the examples and exercises in this part could be rewritten with subqueries. 

For now, CTEs simply increase the readability of your query.

A query with a subquery will look similar to a query with a CTE, but the CTE version makes your query look more structured and easier to read. 

However, a CTE can't be correlated: it is defined before the outer query, so it can't refer to the outer query's columns the way a correlated subquery does.

Don't worry, though – you will soon find out that there are also certain things that CTEs can do that subqueries can't.
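
As a reminder of what "correlated" means, here is a minimal sketch of a correlated subquery. The tables sample_project and sample_donation are made-up inline data (defined with WITH only to keep the example self-contained), not the fiddle tables.

```sql
-- Hypothetical inline data standing in for project and donation.
WITH sample_project (id, minimal_amount) AS (
  VALUES (1, 100), (2, 500)
),
sample_donation (project_id, amount) AS (
  VALUES (1, 80), (1, 40), (2, 200)
)

SELECT
  p.id,
  p.minimal_amount,
  -- Correlated subquery: it references p.id from the outer query,
  -- so it is evaluated for each row of sample_project.
  (SELECT SUM(d.amount)
   FROM sample_donation d
   WHERE d.project_id = p.id) AS sum_amount
FROM sample_project p
ORDER BY p.id;
```

With these rows, project 1 gets a sum_amount of 120 and project 2 gets 200. A CTE could compute the sums for all projects at once, but it could not reference p.id in this way.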

```
SELECT
  MAX(count_donations) AS max_donations,
  MIN(count_donations) AS min_donations
FROM (
  SELECT
    project.id,
    SUM(amount) AS sum_amount,
    minimal_amount,
    COUNT(DISTINCT donation.id) AS count_donations
  FROM project
  JOIN donation
    ON donation.project_id = project.id 
  GROUP BY project.id, minimal_amount 
  HAVING SUM(amount) > minimal_amount) temp;
```
# Exercise

The template query shows the maximum and minimum number of donations for successful projects. 

Change the subquery version to a CTE version.

```
WITH temp AS (
  SELECT
    project.id,
    SUM(amount) AS sum_amount,
    minimal_amount,
    COUNT(DISTINCT donation.id) AS count_donations
  FROM project
  JOIN donation
    ON donation.project_id = project.id
  GROUP BY project.id, minimal_amount
  HAVING SUM(amount) > minimal_amount
) 

SELECT
  MAX(count_donations) AS max_donations,
  MIN(count_donations) AS min_donations
FROM temp;
```

# Summary

It's time to wrap things up.

We've discussed the following CTE syntax:
```
WITH cte_name1 (cte1_columns) AS (your_cte1),
cte_name2 (cte2_columns) AS (your_cte2),
...
SELECT ...
```

- CTEs are placed at the start of the query. They are introduced with the WITH keyword.
- You can skip column definitions if you provide aliases to columns that contain aggregates and other functions.
- CTEs are temporary sets of rows. They are similar to subqueries. Thanks to them, your query becomes better organized and easier to read.


# Simple nested CTEs – explanation

Here's the fiddle for this part of the course: https://www.db-fiddle.com/f/jjTVh2hR8osG7Jav6tjGM8/0


We're going to tackle nested CTEs.

Some databases don't let you put one CTE inside the parentheses of another, and even where it is allowed (PostgreSQL accepts a WITH clause inside a CTE), the result is hard to read.

Luckily, we can still use a simple technique to create nested CTEs. 

Take a look at the following query:

```
WITH total_salesman AS (
  SELECT
    s.id AS s_id,
    c.id AS c_id,
    SUM(distance) AS sum_kilometers
  FROM salesman s 
  JOIN daily_sales ds 
    ON s.id = ds.salesman_id 
  JOIN city c 
    ON s.city_id = c.id
  GROUP BY s.id, c.id
),

max_salesman AS (
  SELECT
    c_id,
    MAX(sum_kilometers) AS max_kilometers
  FROM total_salesman
  GROUP BY c_id
)

SELECT
  AVG(max_kilometers)
FROM max_salesman;
```


The trick is that the second CTE, max_salesman, uses the first CTE, total_salesman, in the FROM clause (FROM total_salesman).

That means that once we define the first CTE, we can freely use it in subsequent CTEs.

Now, what does this query do? 

In the first CTE, it finds the total distance (in kilometers) traveled by each salesman on all days, together with the salesman's city.

In the second CTE, it takes those sums and finds the largest total distance among the salesmen of each city.

Finally, in the outer query, we find the average maximal distance across all cities.

As we learned, we can't put one aggregate function inside another. 

Nested CTEs are a simple way to get around that restriction.
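
The restriction and the workaround can be seen in isolation in the following sketch. It assumes a few made-up rows (sample_total is hypothetical and already holds one total per salesman), so it runs without the fiddle tables.

```sql
-- Hypothetical inline data: one total distance per salesman, with the salesman's city.
WITH sample_total (city_id, salesman_id, sum_kilometers) AS (
  VALUES (1, 10, 300), (1, 11, 500), (2, 20, 100), (2, 21, 200)
),

-- First aggregation level: the largest total in each city.
max_per_city AS (
  SELECT
    city_id,
    MAX(sum_kilometers) AS max_kilometers
  FROM sample_total
  GROUP BY city_id
)

-- Second aggregation level: the average of those maxima.
-- Writing AVG(MAX(sum_kilometers)) in a single SELECT would be rejected,
-- so each level gets its own step.
SELECT
  AVG(max_kilometers) AS avg_max_kilometers
FROM max_per_city;
```

Here the maxima are 500 and 200, so the outer query returns an average of 350.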

# Exercise

Now it's your turn! We'll clearly state each step you need so you don't get confused.

First, find the daily sum of amount_earned in each city. 


```
WITH daily_sum_city AS (
  SELECT
    c.id,
    c.name,
    ds.day,
    SUM(amount_earned) AS sum_amount
  FROM salesman s
  JOIN daily_sales ds
    ON s.id = ds.salesman_id
  JOIN city c
    ON s.city_id = c.id
  GROUP BY c.id, ds.day, c.name
),
```

Then, find the average daily amount for all cities for all days. 

```
avg_all_cities AS (
  SELECT
    AVG(sum_amount) AS avg_amount
  FROM daily_sum_city
)
```


Finally, show the id and name of each city and the number of daily sums that exceeded the average daily amount across all cities.

```
SELECT
  id,
  name,
  COUNT(*)
FROM daily_sum_city, avg_all_cities
WHERE sum_amount > avg_amount
GROUP BY id, name;
```

# Exercise

Find the maximal amount_earned on each day in each city. Then, calculate the average of those maximal amounts across all cities for each day.

Finally, count the number of days on which that average maximal amount exceeded $1700.

```
WITH daily_max AS (
  SELECT
    day,
    c.name,
    MAX(amount_earned) AS max_amount
  FROM salesman s
  JOIN daily_sales ds
    ON s.id = ds.salesman_id
  JOIN city c
    ON s.city_id = c.id
  GROUP BY c.name, day
),
avg_daily_max AS (
  SELECT
    day,
    AVG(max_amount) AS avg_max
  FROM daily_max
  GROUP BY day
)
SELECT COUNT(avg_max)
FROM avg_daily_max
WHERE avg_max > 1700;
```

# Nested CTE with subqueries – explanation

We will now add a small piece to our queries: a subquery. 

Why would we do that?

Let's say we have the following task: we first count the number of salesmen who sold more than 5 items on a given day in a given city. 

Then, we find the average number of such salesmen across all cities for each day. 

Finally, we find the highest of those daily averages.

Well, nothing new so far. 

But what if we want to show that maximal average together with the day on which it occurred?

Take a look:
```
WITH count_good_salesmen AS (
  SELECT
    day,
    name,
    COUNT(DISTINCT s.id) AS count_good 
  FROM salesman s 
  JOIN daily_sales ds 
    ON s.id = ds.salesman_id
  JOIN city c
    ON s.city_id = c.id
  WHERE items_sold > 5 
  GROUP BY day, name
),

avg_salesmen_daily AS (
  SELECT
    day,
    AVG(count_good) AS avg_count
  FROM count_good_salesmen
  GROUP BY day
)

SELECT
  day,
  avg_count
FROM avg_salesmen_daily
WHERE avg_count = (
  SELECT
    MAX(avg_count)
  FROM avg_salesmen_daily);
```

Look how we used a small subquery in the WHERE clause of the outer query to find the MAX(avg_count). 

Once we had the right row, we could select both columns (day and avg_count).


# Exercise

Your turn! First, find the total number of customers in each region on each day. Then, calculate the average number of customers across all regions on each day.

Finally, show the day and the avg_region_customers for the day with the lowest average number of customers across regions.

```
WITH sum_region AS (
  SELECT
    day,
    region,
    SUM(customers) AS sum_customers
  FROM salesman s
  JOIN daily_sales ds
    ON s.id = ds.salesman_id
  JOIN city c
    ON s.city_id = c.id
  GROUP BY day, region
),
avg_region AS (
  SELECT
    day,
    AVG(sum_customers) AS avg_region_customers
  FROM sum_region
  GROUP BY day
)
SELECT
  day,
  avg_region_customers
FROM avg_region
WHERE avg_region_customers = (SELECT
    MIN(avg_region_customers)
  FROM avg_region);
```

# Nested CTEs in complex queries – exercise 1

Nested CTEs are very handy when you want to compute complex statistics in one query. 

For example, you can use nested CTEs to compute the average of some averages. 

Look at the query below:


The first CTE, salesman_sold_items, computes the total number of items sold by each salesman. 

```
WITH salesman_sold_items AS (
  SELECT 
    salesman_id,
    city_id,
    SUM(items_sold) AS total_items_sold
  FROM daily_sales d
  JOIN salesman s
    ON d.salesman_id = s.id
  GROUP BY salesman_id, city_id
),
```

The second CTE, city_average, computes the city-level average. It's a very important performance metric. 

```
city_average AS (
  SELECT 
    city_id,
    AVG(total_items_sold) AS city_avg
  FROM salesman_sold_items
  GROUP BY city_id
)
```
Finally, the outer query computes the overall average value for all city averages.

```
SELECT AVG(city_avg)
FROM city_average
```


Notice that averaging total_items_sold directly over all salesmen would generally give a different result, because cities with more salesmen would then carry more weight.
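
A small sketch with made-up numbers shows how far the two figures can drift apart. The CTE sample_items is hypothetical inline data that plays the role of salesman_sold_items: city 1 has three salesmen and city 2 has one.

```sql
-- Hypothetical inline data: total items sold per salesman.
WITH sample_items (salesman_id, city_id, total_items_sold) AS (
  VALUES (1, 1, 10), (2, 1, 20), (3, 1, 30), (4, 2, 100)
),

city_average AS (
  SELECT
    city_id,
    AVG(total_items_sold) AS city_avg
  FROM sample_items
  GROUP BY city_id
)

SELECT
  -- Average of the city averages: (20 + 100) / 2
  (SELECT AVG(city_avg) FROM city_average) AS avg_of_city_averages,
  -- Average over all salesmen: (10 + 20 + 30 + 100) / 4
  (SELECT AVG(total_items_sold) FROM sample_items) AS avg_per_salesman;
```

With these rows, the average of city averages is 60, while the average over all salesmen is 40.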


Another situation where nested CTEs can be useful is comparing two groups of items. 

Check it out:

Let's say we want to compare the average earnings of salespeople from Europe with those from other parts of the world.



In the first CTE, we define the groups: cities from Europe are labeled 'Europe', while cities outside Europe are labeled 'Other'.
```
WITH grouping AS (
  SELECT
    id AS city_id,
    CASE WHEN region = 'Europe' THEN region ELSE 'Other' END AS group_name
  FROM city
),
```


In the second CTE, we compute the total amount earned for each salesperson and combine this information with the group definition (i.e. 'Europe' or 'Other') for that salesperson.
```
total_salesman_earnings AS (
  SELECT
    salesman_id,
    group_name,
    SUM(amount_earned) AS total_amount
  FROM daily_sales d
  JOIN salesman s
    ON d.salesman_id = s.id
  JOIN grouping g
    ON g.city_id = s.city_id
  GROUP BY salesman_id, group_name
) 
```
In the outer query, we compute the group-level average.
```
SELECT
  group_name,
  AVG(total_amount)
FROM total_salesman_earnings s
GROUP BY group_name
```

# Exercise

Compare the average number of items sold by salespeople from the USA (country = 'USA') with those from all other countries.

Name the group column group_name. In your query, use the values 'USA' and 'Other' to label the groups.

```
WITH a AS (
  SELECT
    id AS city_id,
    CASE
      WHEN country = 'USA' THEN country
      ELSE 'Other'
    END AS group_name
  FROM city
),

b AS (
  SELECT
    group_name,
    salesman_id,
    SUM(items_sold) sis
  FROM daily_sales ds
  JOIN salesman s
    ON s.id = salesman_id
  JOIN a
    ON a.city_id = s.city_id
  GROUP BY 1,2
)

SELECT
  group_name,
  AVG(sis)
FROM b
GROUP BY group_name;
```


# Summary

It's time to wrap things up. What have we learned in this part?

- Instead of putting one CTE inside another (which not every database allows), each CTE can use previously defined CTEs in the FROM clause.

- Nested CTEs can be used to compute aggregate functions on a few levels.

- If you need more than just the aggregate result in the outer query, you can use a small subquery.

- CTEs can also be used to compare two (or more) groups of rows.

# Homework

Use the same schema: https://www.db-fiddle.com/f/nn6fUyjjMovPY8NGpnZTgY/0

# Exercise

In January 2016, supporters who donated at least 10% of a project's minimal_amount in one donation received a gift. 

In February 2016, they had to donate 20% of the minimal_amount to get the same gift. 

Show the columns amount and donated for all donations occurring in January and February for which a gift was awarded.

```
WITH gift_jan AS (
  SELECT
    amount,
    donated
  FROM donation d
  JOIN project p
    ON d.project_id = p.id
  WHERE donated BETWEEN '2016-01-01' AND '2016-01-31'
    AND amount >= 0.1 * minimal_amount
), 
gift_feb AS (
  SELECT
    amount,
    donated
  FROM donation d
  JOIN project p
    ON d.project_id = p.id
  WHERE donated BETWEEN '2016-02-01' AND '2016-02-29'
    AND amount >= 0.2 * minimal_amount
)
SELECT
  amount,
  donated
FROM gift_jan
UNION ALL
SELECT
  amount,
  donated
FROM gift_feb;
```

# Exercise
Show two groups of users:

- Those who donated at least 200 (per person, in total).

- Those who donated at least twice.

For each group, show each supporter’s ID and first and last names.

```
WITH rich AS (
  SELECT
    s.id,
    first_name,
    last_name
  FROM supporter s
  JOIN donation d
    ON d.supporter_id = s.id
  GROUP BY s.id, first_name, last_name
  HAVING SUM(amount) >= 200
),
frequent AS (
  SELECT
    s.id,
    first_name,
    last_name
  FROM supporter s
  JOIN donation d
    ON d.supporter_id = s.id
  GROUP BY s.id, first_name, last_name
  HAVING COUNT(d.id) > 1
)
SELECT
  id,
  first_name,
  last_name
FROM rich
UNION ALL
SELECT
  id,
  first_name,
  last_name
FROM frequent;
```

# Exercise

For each person who made donations in the 'music' or 'traveling' categories, show three columns:

- supporter_id
- min_music – That person's minimum donation amount in the music category.
- max_traveling – That person's maximum donation amount in the traveling category.

```
WITH total_number AS (
  SELECT DISTINCT
    supporter_id
  FROM donation d 
  JOIN project p
    ON d.project_id = p.id 
  WHERE category IN ('music', 'traveling')
),

music AS (
  SELECT
    supporter_id,
    MIN(amount) AS min_music
  FROM donation d
  JOIN project p
    ON d.project_id = p.id
  WHERE category = 'music'
  GROUP BY supporter_id
),

traveling AS (
  SELECT
    supporter_id,
    MAX(amount) AS max_traveling
  FROM donation d
  JOIN project p
    ON d.project_id = p.id
  WHERE category = 'traveling'
  GROUP BY supporter_id
)

SELECT
  t.supporter_id,
  m.min_music,
  tr.max_traveling
FROM total_number t
LEFT JOIN music m
  ON m.supporter_id = t.supporter_id
LEFT JOIN traveling tr
  ON tr.supporter_id = t.supporter_id;
```


# Exercise
Show the average total amount raised in successful projects that had more than 10 donations.



```
WITH temp AS (
  SELECT
    project.id,
    SUM(amount) AS sum_amount,
    minimal_amount,
    COUNT(DISTINCT donation.id) AS count_donations
  FROM project
  JOIN donation
    ON donation.project_id = project.id
  GROUP BY project.id, minimal_amount
  HAVING SUM(amount) > minimal_amount
)

SELECT
  AVG(sum_amount)
FROM temp
WHERE count_donations > 10;
```

# Exercise

Among successful projects, those that raised 100% to 150% of the minimum amount are good projects, whereas those that raised more than 150% are great projects. 

Show the number of projects along with a string representing how good the projects are (good projects or great projects). Name this column tag.

```
WITH temp AS (
  SELECT
    project.id,
    SUM(amount) AS sum_amount,
    minimal_amount,
    COUNT(DISTINCT donation.id) AS count_donations
  FROM project
  JOIN donation
    ON donation.project_id = project.id
  GROUP BY project.id, minimal_amount
  HAVING SUM(amount) > minimal_amount
    AND SUM(amount) <= 1.5 * minimal_amount
),
temp2 AS (
  SELECT
    project.id,
    SUM(amount) AS sum_amount,
    minimal_amount,
    COUNT(DISTINCT donation.id) AS count_donations
  FROM project
  JOIN donation
    ON donation.project_id = project.id
  GROUP BY project.id, minimal_amount
  HAVING SUM(amount) > 1.5 * minimal_amount
) 
SELECT
  COUNT(*),
  'good projects' AS tag
FROM temp
UNION ALL
SELECT
  COUNT(*),
  'great projects' AS tag
FROM temp2;
```

Use this fiddle for the following exercises: https://www.db-fiddle.com/f/jjTVh2hR8osG7Jav6tjGM8/0

# Exercise

Compute the city-level average of the total distance travelled by each salesman. Then, compute the company-level average. Use the example provided in the explanation.

```
WITH salesman_distance AS (
  SELECT
    salesman_id,
    city_id,
    SUM(distance) AS total_distance
  FROM daily_sales d
  JOIN salesman s
    ON d.salesman_id = s.id
  GROUP BY salesman_id, city_id
),

city_average AS (
  SELECT
    city_id,
    AVG(total_distance) AS city_avg
  FROM salesman_distance
  GROUP BY city_id
)

SELECT AVG(city_avg)
FROM city_average
```

# Exercise

A salesperson performs well if their total amount earned is above the average amount earned in their city. We want to show which salespeople perform well.

For each salesperson, show their first_name, last_name, and a third column named label. This column will display either 'Above average' or 'Below average', based on the total amount earned by that person.

```
WITH total_salesman_earnings AS (
  SELECT 
    city_id,
    first_name,
    last_name,
    salesman_id,
    SUM(amount_earned) AS total_amount
  FROM daily_sales d
  JOIN salesman s
    ON d.salesman_id = s.id
  GROUP BY city_id, salesman_id, first_name, last_name
),
city_average AS (
  SELECT
    city_id,
    AVG(total_amount) AS city_avg_amount
  FROM total_salesman_earnings
  GROUP BY city_id
)

SELECT 
  first_name,
  last_name,
  CASE
    WHEN total_amount > city_avg_amount THEN 'Above average'
    ELSE 'Below average'
  END AS label
FROM total_salesman_earnings s
JOIN city_average c
  ON s.city_id = c.city_id
```

# Exercise

We define 'Good' salespeople as those whose total amount earned is above the average amount earned in their city.

We want to compare the average number of items sold between two groups: the 'Good' salespeople and the 'Bad' salespeople.



```
WITH total_earnings AS (
  SELECT
    sa.id AS salesman_id,
    city_id,
    SUM(amount_earned) s
  FROM salesman sa
  LEFT JOIN daily_sales ds
    ON salesman_id = sa.id
  GROUP BY 1, 2
),

b AS (
  SELECT
    city_id,
    AVG(s) av
  FROM total_earnings
  GROUP BY 1
),

salesman_label AS (
  SELECT
    salesman_id,
    CASE
      WHEN s > av THEN 'Good'
      ELSE 'Bad'
    END AS label
  FROM total_earnings
  JOIN b
    ON total_earnings.city_id = b.city_id
),

total_items_sold AS (
  SELECT
    sl.salesman_id,
    label,
    SUM(items_sold) total_items
  FROM salesman_label sl 
  LEFT JOIN daily_sales ds
    ON ds.salesman_id = sl.salesman_id
  GROUP BY 1, 2
  )

SELECT
  label,
  AVG(total_items) average
FROM total_items_sold
GROUP BY label
```

# Exercise

Find the largest number of items sold in each city on each day. Then, calculate the average maximum number of items sold in each city across all days. Finally, count the number of cities where the average maximal number of items sold is greater than 18.

```
WITH daily_max AS (
  SELECT
    day,
    c.name,
    MAX(items_sold) AS max_items
  FROM salesman s
  JOIN daily_sales ds
    ON s.id = ds.salesman_id
  JOIN city c
    ON s.city_id = c.id
  GROUP BY c.name, day
),
avg_daily_max AS (
  SELECT
    name,
    AVG(max_items) AS avg_max
  FROM daily_max
  GROUP BY name
)
SELECT COUNT(avg_max)
FROM avg_daily_max
WHERE avg_max > 18;
```

# Exercise

A city is performing well if its total number of items sold is above the average for its region.

For each city, show its name and label (either 'Above average' or 'Below average', depending on how well the city performs).

```
WITH total_number AS (
  SELECT
    name,
    region,
    SUM(items_sold) s
  FROM daily_sales ds
  JOIN salesman s
    ON ds.salesman_id = s.id
  JOIN city c
    ON c.id = city_id
  GROUP BY 1, 2
),
region_average AS (
  SELECT
    region,
    AVG(s) av
  FROM total_number
  GROUP BY 1
)

SELECT
  name,
  CASE WHEN s > av THEN 'Above average' ELSE 'Below average' END AS label
FROM total_number ts
JOIN region_average ra
  ON ra.region = ts.region;
```