# Group by Practice

### Aggregate functions
https://www.sqlite.org/lang_aggfunc.html

### DB Source
https://pub-f13217639d6446309ebabc652f18d0ad.r2.dev/movies_download.db

### 1.1 What is the average rating of each director?

```sql
select
  director,
  round(avg(rating),2) as avg_rating
from
  movies
where rating is not null and director is not null
group by director
order by avg_rating desc;
```


### 1.2 What is the average rating of each director who made more than 5 movies?

```sql
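select -- 5. compute output columns and aggregates for each group
  director,
  count(*) as total_movies,
  round(avg(rating), 2) as avg_rating
from -- 1. read the source table
  movies
where
  rating is not null
  and director is not null -- 2. filter individual rows
group by
  director -- 3. form one group per director
having
  count(*) > 5 -- 4. filter groups (SQLite also accepts the alias total_movies here)
order by
  avg_rating desc; -- 6. sort the final result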
```
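`WHERE` and `HAVING` both filter, but at different stages: `WHERE` removes individual rows before they are grouped, and `HAVING` removes whole groups after the aggregates are computed. The sketch below uses an inline `movies` CTE with hypothetical rows as a stand-in for the real table (drop the `WITH` clause to run the query against the downloaded database):

```sql
-- Hypothetical toy rows; drop this WITH clause to query the real movies table.
with movies(director, rating) as (
  values
    ('A', 8.0), ('A', 6.0), ('A', null),
    ('B', 7.0)
)
select
  director,
  count(*) as total_movies,             -- rows that survived WHERE
  round(avg(rating), 2) as avg_rating
from movies
where rating is not null                -- row filter: runs before GROUP BY
group by director
having count(*) > 1;                    -- group filter: runs after aggregation
```

Director A keeps two of its three rows after `WHERE` (the NULL rating is dropped), so its group passes `HAVING`; director B has only one row and is filtered out, leaving the single row `A | 2 | 7.0`.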

### 2. How many movies are in each genre?

```sql
select
  genres,
  count(*) as genre_count
from
  movies
where
  genres is not null
group by
  genres
order by
  genre_count desc;
```

### 3. How many movies have a rating greater than 6, and which of those ratings is the most common?

```sql
select
  rating,
  count(*) as total_movies
from
  movies
where
  rating > 6
group by
  rating
order by
  total_movies desc;
```
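The query above returns one row per rating above 6; its first row answers the second question, but the total number of movies above 6 still has to be added up across the groups. One way to get both answers in a single row, assuming SQLite 3.25 or later (the version that added window functions), is to wrap the group counts in `sum(...) over ()`. The inline `movies` CTE holds hypothetical rows:

```sql
-- Hypothetical toy rows; drop this WITH clause to query the real movies table.
with movies(title, rating) as (
  values ('m1', 7.0), ('m2', 7.0), ('m3', 8.0), ('m4', 5.0)
)
select
  rating as most_common_rating,
  count(*) as movies_with_that_rating,
  sum(count(*)) over () as movies_above_6  -- window runs after GROUP BY, summing every group's count
from movies
where rating > 6
group by rating
order by movies_with_that_rating desc
limit 1;
```

With these rows the result is `7.0 | 2 | 3`. If two ratings tie for the highest count, `limit 1` keeps either one of them.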

### 4. Find the number of movies released each year


```sql
select
  release_date,
  count(*) as total_movies
from
  movies
where
  release_date is not null
group by release_date
order by total_movies desc;
```


### 5. List the top 10 years with the highest average movie runtime.

```sql
select
  release_date,
  avg(runtime) as avg_runtime
from
  movies
where
  release_date is not null
  and runtime is not null
group by
  release_date
order by
  avg_runtime desc
limit
  10;
```

### 6. Calculate the average rating for movies released in the 21st century.

```sql
select
  round(avg(rating),3) as avg_rating
from
  movies
where
  release_date >= 2000; -- counts 2000 as the first year; use > 2000 for the strict 2001 start
```
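The query above does not exclude NULL ratings, and for `avg()` it does not need to: aggregate functions skip NULL inputs, whereas `count(*)` counts every row. A minimal sketch with hypothetical rows in an inline `movies` CTE:

```sql
-- Hypothetical toy rows; drop this WITH clause to query the real movies table.
with movies(release_date, rating) as (
  values (2001, 8.0), (2005, 6.0), (2010, null), (1999, 9.0)
)
select
  count(*) as all_rows,                  -- includes the NULL-rated row
  count(rating) as rated_rows,           -- skips the NULL rating
  round(avg(rating), 3) as avg_rating    -- averages only the non-NULL ratings
from movies
where release_date >= 2000;
```

Three rows pass the year filter, two of them have a rating, and the average is `(8.0 + 6.0) / 2 = 7.0`.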

### 7. Find the director with the highest average movie runtime (to reduce outliers, only count directors with 3 or more movies).

```sql
select
  director,
  avg(runtime) as avg_runtime,
  count(*) as total_movies
from
  movies
where
  director is not null
  and runtime is not null
group by
  director
having
  total_movies >= 3
order by
  avg_runtime desc
limit
  1;
```

### 8. List the top 5 most prolific directors (those who have directed the most movies), counting only movies longer than 60 minutes to exclude outliers.

```sql
select
  director,
  count(*) as total_movies
from
  movies
where
  director is not null
  and runtime is not null
  and runtime > 60
group by
  director
order by
  total_movies desc
limit
  5;
```

### 9. Find the highest and lowest rating of each director (directors with more than 5 movies)

```sql
select
  director,
  max(rating) as max_rating,
  min(rating) as min_rating,
  count(*) as total_movies
from
  movies
where
  director is not null
  and rating is not null
group by
  director
having total_movies > 5;
```

### 10. Find the director who has made the most money (revenue minus budget)

```sql
select
  director,
  sum(revenue) - sum(budget) as profit
from
  movies
where
  director is not null
  and revenue is not null
  and budget is not null
group by
  director
order by profit desc
limit 1;
```
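The `is not null` filters on both `revenue` and `budget` matter because `sum()` skips NULLs column by column: without them, a movie with a known revenue but an unknown budget would add its revenue with no matching cost. A sketch on hypothetical rows (inline `movies` CTE):

```sql
-- Hypothetical toy rows; drop this WITH clause to query the real movies table.
with movies(director, revenue, budget) as (
  values ('A', 100, 40), ('A', 50, null)
)
select
  director,
  sum(revenue) - sum(budget) as unfiltered_profit,   -- 150 - 40: the NULL budget is silently skipped
  sum(case
        when revenue is not null and budget is not null
        then revenue - budget
      end) as filtered_profit                        -- only the complete row: 100 - 40
from movies
group by director;
```

Director A shows a profit of 110 without the filter but 60 when only movies with both values are counted.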

### 11. Calculate the average rating for movies longer than 2 hours

```sql
select avg(rating) as avg_rating
from movies
where rating is not null and runtime is not null and runtime > 120;
```

### 12. Find the year with the most movies released

```sql
select
  release_date,
  count(*) as total_movies
from
  movies
where
  release_date is not null -- otherwise movies with no year are counted as one NULL group
group by
  release_date
order by
  total_movies desc
limit
  1;
```

### 13. Find the average runtime of movies for each decade

```sql
select
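  (release_date / 10) * 10 as decade, -- integer division: SQLite truncates when both operands are integers (1994 / 10 = 199)
  avg(runtime) as avg_runtime
from
  movies
where
  release_date is not null
  and runtime > 30 -- excludes NULL runtimes and very short entries
group by
  decade
order by
  decade;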
```
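The decade expression relies on integer division. Assuming `release_date` stores the year as an integer (as the `>= 2000` comparison in question 6 suggests), `release_date / 10` drops the last digit; a REAL value would keep the fraction instead, and every year would become its own bucket. A quick check with literal values:

```sql
select
  1994 / 10 as int_div,                              -- 199: both operands are integers, so SQLite truncates
  1994.0 / 10 as real_div,                           -- 199.4: a REAL operand keeps the fraction
  (cast(1994.0 as integer) / 10) * 10 as decade;     -- 1990: cast first to force integer division
```

If the column ever held REAL years, casting to `integer` first, as in the last column, would restore the bucketing.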

### How SQL Handles Computed Columns

1. **Pre-Evaluation of Expressions**: After the `WHERE` clause has filtered the rows, the SQL engine evaluates the expressions needed for grouping. In the query above, the expression `(release_date / 10) * 10 AS decade` defines how the rows are grouped.

2. **Creating Computed Columns**: The SQL engine computes the `decade` values based on the expression `(release_date / 10) * 10` for each row of the filtered data. This creates a temporary column (or a virtual column) called `decade` that holds these computed values.

3. **Grouping by Computed Values**: With the computed `decade` values in place, the `GROUP BY` clause then groups all rows that have the same `decade` value.

4. **Aggregation and Selection**: After the rows are grouped, the SQL engine moves to the `SELECT` clause, where it computes aggregate functions like `AVG(runtime)` for each group and prepares the final output.
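Grouping by the `decade` alias works in SQLite, but not every database accepts output-column aliases in `GROUP BY` (SQL Server, for example, does not); repeating the expression is the portable spelling and produces the same groups. The sketch below, on hypothetical rows in an inline `movies` CTE, also shows that `WHERE` removes rows before any group is formed:

```sql
-- Hypothetical toy rows; drop this WITH clause to query the real movies table.
with movies(release_date, runtime) as (
  values (1994, 100), (1999, 120), (2003, 90), (2005, 25)
)
select
  (release_date / 10) * 10 as decade,
  avg(runtime) as avg_runtime
from movies
where release_date is not null
  and runtime > 30                  -- removes the 25-minute row before grouping
group by (release_date / 10) * 10   -- same groups as "group by decade"
order by decade;
```

The 25-minute row never reaches the 2000s group, so the result is `1990 | 110.0` and `2000 | 90.0`.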


### 14. List the top 5 years where the difference between the highest and lowest rated movie was the greatest.

```sql
select
  release_date,
  max(rating) - min(rating) as rating_gap
from
  movies
where
  release_date is not null
  and rating between 0 and 10
group by
  release_date
order by rating_gap desc
limit 5;
```

### 15. List directors who have never made a movie shorter than 2 hours.

My initial query (incorrect):

```sql
select
  director
from
  movies
where
  director is not null
  and runtime > 120;
```
Correct answer:

```sql
select
  director
from
  movies
where
  director is not null
group by
  director
having
  min(runtime) > 120;
```

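* The initial query returns every director who has made at least one movie longer than 120 minutes (once per such movie, so names repeat). It does not exclude directors who have also made shorter movies; `having min(runtime) > 120` checks each director's movies as a group, so a single short movie removes that director.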
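The difference is easiest to see on a few hypothetical rows: director A has one long and one short movie, director B only long ones. The sketch runs both versions side by side with `union all` (inline `movies` CTE as a stand-in for the real table):

```sql
-- Hypothetical toy rows; drop this WITH clause to query the real movies table.
with movies(director, runtime) as (
  values ('A', 150), ('A', 90), ('B', 130), ('B', 140)
)
select 'row filter' as approach, director
from movies
where runtime > 120                 -- A's 150-minute movie passes, so A is listed
union all
select 'group filter', director
from movies
group by director
having min(runtime) > 120;          -- A's shortest movie is 90 minutes, so only B remains
```

The row filter returns A once and B twice; the group filter returns only B.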

### 16. Calculate the percentage of movies with a rating above 8.0.

```sql
select
  count(
    case
      when rating > 8.0 then 1
    end
  ) * 100.0 / count(*) as percentage
from
  movies;
```
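The numerator works because a `case` with no `else` yields NULL for rows that do not match, and `count(expr)` skips NULLs. In SQLite a comparison evaluates to 1 or 0 (or NULL when `rating` is NULL), so `sum(rating > 8.0)` is a shorter, SQLite-specific way to count the same rows. A sketch on hypothetical rows in an inline `movies` CTE:

```sql
-- Hypothetical toy rows; drop this WITH clause to query the real movies table.
with movies(rating) as (
  values (9.0), (8.5), (7.0), (null)
)
select
  count(case when rating > 8.0 then 1 end) * 100.0 / count(*) as pct_case,
  sum(rating > 8.0) * 100.0 / count(*) as pct_sum  -- SQLite-specific: true = 1, false = 0
from movies;
```

Both columns give 50.0 here. Note that `count(*)` in the denominator also counts the NULL-rated row; add `where rating is not null` if the percentage should cover rated movies only.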

### 17. Find the director with the highest ratio of movies rated above 7.0.

```sql
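select
  director,
  count(case when rating > 7.0 then 1 end) * 100.0 / count(*) as pct_above_7 -- percentage of rated movies above 7.0
from movies
where director is not null and rating is not null
group by director
having count(*) >= 5 -- skip directors with too few movies for a meaningful ratio
order by pct_above_7 desc
limit 1;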
```


### 18. Categorize movies by length (runtime) and count each category

```sql
select
  case
    when runtime < 60 then 'short'
    when runtime between 60 and 120 then 'medium'
    else 'long'
  end as length_category, -- avoids reusing the name of the built-in length() function
  count(*) as total_movies
from
  movies
where
  runtime is not null
group by
  length_category;
```
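`between` is inclusive on both ends, so runtimes of exactly 60 and 120 minutes both land in 'medium', and the `else` branch catches everything else, including NULL runtimes, which is why the query above filters them out. A sketch on hypothetical boundary values (inline `movies` CTE):

```sql
-- Hypothetical boundary values; drop this WITH clause to query the real movies table.
with movies(runtime) as (
  values (59), (60), (120), (121), (null)
)
select
  runtime,
  case
    when runtime < 60 then 'short'
    when runtime between 60 and 120 then 'medium'  -- inclusive: 60 and 120 are both 'medium'
    else 'long'                                    -- also catches NULL, since both tests are NULL
  end as length_category
from movies
order by runtime;
```

59 is 'short', 60 and 120 are 'medium', 121 is 'long', and the NULL runtime is also labeled 'long'.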

### 19. Categorize movies as flops or not and count each category (flop: revenue < budget)

```sql
select
  case
    when revenue < budget then 'flop'
    else 'not flop'
  end as isFlop,
  count(*) as total_movies
from movies
where revenue is not null and budget is not null
group by isFlop;
```