# Subqueries and CTEs

### 1. List movies with a rating (or revenue) higher than the average rating (or revenue) of all movies.

Without a subquery this takes two queries: one to compute the average and a second to filter the rows above it.

```sql
select avg(rating) from movies;

select count(*)
from movies
where rating > 5.73; -- value copied from the first query
```
Instead, use a subquery.
Wrap the subquery in parentheses `()` inside the main query.
```sql
select
  count(*)
from
  movies
where
  rating > (
    select
      avg(rating)
    from
      movies
  );

```
It may look as if the subquery runs once per row, which would slow the query down. However, its result does not depend on the current row, so the query planner runs it once, saves the result, and reuses it when filtering in the `where` clause. This type of subquery is called an independent subquery.
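
One way to check how a subquery is treated is to ask the database for its plan. The sketch below assumes SQLite and uses a small hypothetical `demo_movies` table (a stand-in, not the real `movies` table) so it can run on its own. In SQLite, `explain query plan` usually labels a subquery that is evaluated once as a scalar subquery and one that depends on the outer row as a correlated scalar subquery; the exact wording depends on the version.

```sql
-- hypothetical stand-in for the movies table, for illustration only
drop table if exists demo_movies;
create temp table demo_movies (title text, rating real);

insert into demo_movies (title, rating)
values ('A', 4.0), ('B', 6.0), ('C', 8.0);

-- ask SQLite how it plans to run the query with the independent subquery
explain query plan
select
  count(*)
from
  demo_movies
where
  rating > (
    select
      avg(rating)
    from
      demo_movies
  );
```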


* CTE = common table expression. It lets you give a subquery a name and reuse it.
```sql
with -- start with 'with' 
  avg_revenue_cte as ( -- first cte
    select
      avg(revenue) -- use 'as' if nickname needed
    from
      movies
  ), --comma to separate each cte and no with needed for the second
  avg_rating_cte as ( -- second cte
    select
      avg(rating)
    from
      movies
  ) -- no semi-colon
select
  title,
  director,
  revenue,
  round(
    ( -- use parentheses
      select
        * -- use a name of the column from cte
      from
        avg_revenue_cte
    ),
    0
  ) as avg_revenue,
  rating,
  round(
    (
      select
        *
      from
        avg_rating_cte
    ),
    0
  ) as avg_rating
from
  movies
where
  revenue > (
    select
      *
    from
      avg_revenue_cte
  )
  and rating > (
    select
      *
    from
      avg_rating_cte
  );
```
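
The repeated `(select * from ...)` lookups can be avoided by naming the CTE columns with `as` and joining the one-row CTE to the table. This is a minimal sketch of that alternative, not the only way to write it. The `demo_movies` table, the CTE name `averages`, and the aliases `m` and `a` are made up for illustration.

```sql
-- hypothetical stand-in for the movies table, for illustration only
drop table if exists demo_movies;
create temp table demo_movies (title text, revenue real, rating real);

insert into demo_movies (title, revenue, rating)
values ('A', 100, 4.0), ('B', 300, 6.0), ('C', 500, 8.0);

with
  averages as ( -- one CTE that returns one row with two named columns
    select
      avg(revenue) as avg_revenue,
      avg(rating) as avg_rating
    from
      demo_movies
  )
select
  m.title,
  m.revenue,
  round(a.avg_revenue, 0) as avg_revenue,
  m.rating,
  round(a.avg_rating, 2) as avg_rating
from
  demo_movies as m
  cross join averages as a -- attach the single averages row to every movie
where
  m.revenue > a.avg_revenue
  and m.rating > a.avg_rating;
```

Because `averages` has exactly one row, the cross join does not multiply the number of movies; it only makes the two averages available as ordinary columns.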

### 2. List movies with a rating higher than the average rating of movies in their genre.

```sql
with
  rating_per_genre as (
    select
      avg(rating)
    from
      movies as m2
    where
      rating is not null
      and genres is not null
      and m2.genres = m1.genres
  )

select
  title,
  rating,
  genres,
  (
    select
      *
    from
      rating_per_genre
  ) as rating_per_genre
from
  movies as m1
where
  release_date > 2020 and
  m1.rating > (
    select
      *
    from
      rating_per_genre
  );
```

### 3. Find the movies with a rating higher than the average rating of movies released in the same year.

### Correlated subqueries

```sql
select
  m1.title,
  m1.director,
  m1.rating
from
  movies as m1
where
  m1.rating > (
    select
      avg(m2.rating)
    from
      movies as m2 
    where
      m2.release_date = m1.release_date -- reference the outer query
  );
```
The subquery runs once for every row of the outer query, using that row's value. If each run has to scan the table again, the total work can grow roughly quadratically with the number of rows.
You can reduce the time by limiting the rows the outer query considers:
```sql
select
  m1.title,
  m1.director,
  m1.rating
from
  movies as m1
where
  m1.release_date > 2022 and
  m1.rating > (
    select
      avg(m2.rating)
    from
      movies as m2 
    where
      m2.release_date = m1.release_date -- reference the outer query
  );
```
* The order of the conditions in the `where` clause does not matter, because the query optimizer typically evaluates the cheaper condition first.
* This query is still not fully optimized: the average is recomputed for every remaining row.

### Correlated CTEs

```sql
with
  movie_avg_per_year as (
    select
      avg(m2.rating)
    from
      movies as m2
    where
      m2.release_date = m1.release_date
  )
select
  m1.title,
  m1.director,
  m1.rating,
  (
    select
      *
    from
      movie_avg_per_year
  ) as year_avg
from
  movies as m1
where
  m1.release_date > 2022 -- to limit the results
  and m1.rating > (
    select
      *
    from
      movie_avg_per_year
  );
```
SQLite allows a CTE to refer to an alias (here `m1`) that is defined in the query below it; other databases do not. Execution goes from top to bottom, so the alias is created after the CTE, and the CTE is not supposed to refer to it.
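
For databases that reject the reference to `m1` inside the CTE, one common alternative is to compute the average for every year once with `group by` and then join the result to the movies. The sketch below assumes `release_date` holds a year as an integer, as the filters above suggest. The `demo_movies` table and the alias `y` are hypothetical.

```sql
-- hypothetical stand-in for the movies table, for illustration only
drop table if exists demo_movies;
create temp table demo_movies (title text, release_date integer, rating real);

insert into demo_movies (title, release_date, rating)
values ('A', 2023, 5.0), ('B', 2023, 7.0), ('C', 2024, 6.0), ('D', 2024, 9.0);

with
  movie_avg_per_year as ( -- computed once: one row per release year
    select
      release_date,
      avg(rating) as year_avg
    from
      demo_movies
    group by
      release_date
  )
select
  m1.title,
  m1.rating,
  y.year_avg
from
  demo_movies as m1
  join movie_avg_per_year as y on y.release_date = m1.release_date
where
  m1.release_date > 2022 -- to limit the results
  and m1.rating > y.year_avg;
```

The CTE no longer depends on the outer row, so it does not rely on the SQLite-specific behavior described above.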


### 4. Find the directors with a career revenue higher than the average revenue of all directors.


```sql
with
  directors_rev as (
    select
      director,
      sum(revenue) as career_rev
    from
      movies
    where
      revenue is not null
      and director is not null
    group by
      director
  )
select
  director,
  sum(revenue) as total_rev,
  (
    select
      avg(career_rev)
    from
      directors_rev
  ) as peers_avg
from
  movies
where
  director is not null
  and revenue is not null
group by
  director
having
  total_rev > (
    select
      avg(career_rev)
    from
      directors_rev
  );
```
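
Since `directors_rev` already holds one row per director with the summed revenue, the main query could also read from the CTE directly instead of grouping `movies` a second time. This is a sketch of that variant under the same assumptions as the query above, using a hypothetical `demo_movies` table so it can run on its own.

```sql
-- hypothetical stand-in for the movies table, for illustration only
drop table if exists demo_movies;
create temp table demo_movies (title text, director text, revenue real);

insert into demo_movies (title, director, revenue)
values ('A', 'X', 100), ('B', 'X', 200), ('C', 'Y', 50), ('D', 'Z', 700);

with
  directors_rev as ( -- one row per director with career revenue
    select
      director,
      sum(revenue) as career_rev
    from
      demo_movies
    where
      revenue is not null
      and director is not null
    group by
      director
  )
select
  director,
  career_rev as total_rev,
  (
    select
      avg(career_rev)
    from
      directors_rev
  ) as peers_avg
from
  directors_rev -- reuse the CTE instead of aggregating again
where
  career_rev > (
    select
      avg(career_rev)
    from
      directors_rev
  );
```

Filtering moves from `having` to `where` here because the aggregation has already happened inside the CTE.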