## 🔄 Chapter 04: OLAP Operations in SQL

### 📊 What is OLAP?

**OLAP** stands for *Online Analytical Processing*. It is used for:

* Aggregating data for multidimensional insights
* Generating pivot-style summaries
* Supporting business decisions

## 🧮 GROUP BY CUBE

```sql
SELECT country, genre, COUNT(*) 
FROM renting_extended 
GROUP BY CUBE (country, genre);
```

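**Breaks down ALL combinations** of country and genre, including the per-country subtotals, the per-genre subtotals, and the grand total. A `null` in the result marks a column that was aggregated over in that row.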

<table>
<thead><tr><th>country</th><th>genre</th><th>count</th></tr></thead>
<tbody>
<tr><td>Austria</td><td>Comedy</td><td>2</td></tr>
<tr><td>Belgium</td><td>Drama</td><td>15</td></tr>
<tr><td>Austria</td><td>Drama</td><td>4</td></tr>
<tr><td>Belgium</td><td>Comedy</td><td>1</td></tr>
<tr><td>Belgium</td><td>null</td><td>16</td></tr>
<tr><td>Austria</td><td>null</td><td>6</td></tr>
<tr><td>null</td><td>Comedy</td><td>3</td></tr>
<tr><td>null</td><td>Drama</td><td>19</td></tr>
<tr><td>null</td><td>null</td><td>22</td></tr>
</tbody>
</table>
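
The `null` values in the subtotal rows are easy to mistake for missing data. As a minimal sketch, assuming a PostgreSQL-style dialect (`GROUPING()` is not used in the original notes, and the two `*_aggregated` aliases are illustrative names), the subtotal rows can be flagged explicitly:

```sql
-- GROUPING(col) returns 1 when col is aggregated away in that row
-- (a subtotal or grand-total row) and 0 when the row is grouped by col.
SELECT country,
       genre,
       GROUPING(country) AS country_aggregated,  -- illustrative alias
       GROUPING(genre)   AS genre_aggregated,    -- illustrative alias
       COUNT(*)          AS count
FROM renting_extended
GROUP BY CUBE (country, genre);
```

In the grand-total row both flags would be 1; in a per-country subtotal such as Austria's, only `genre_aggregated` would be 1.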



## 🎢 GROUP BY ROLLUP

```sql
SELECT country, genre, COUNT(*) 
FROM renting_extended 
GROUP BY ROLLUP (country, genre);
```

**ROLLUP** aggregates in hierarchical order:

* Country + Genre
* Country
* Total

<table>
<thead><tr><th>country</th><th>genre</th><th>count</th></tr></thead>
<tbody>
<tr><td>Austria</td><td>Comedy</td><td>2</td></tr>
<tr><td>Austria</td><td>Drama</td><td>4</td></tr>
<tr><td>Austria</td><td>null</td><td>6</td></tr>
<tr><td>Belgium</td><td>Comedy</td><td>1</td></tr>
<tr><td>Belgium</td><td>Drama</td><td>15</td></tr>
<tr><td>Belgium</td><td>null</td><td>16</td></tr>
<tr><td>null</td><td>null</td><td>22</td></tr>
</tbody>
</table>
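
The hierarchy listed above can also be spelled out level by level. The following sketch, assuming the same `renting_extended` table, writes `ROLLUP (country, genre)` as the equivalent explicit list of grouping sets:

```sql
-- Same levels as GROUP BY ROLLUP (country, genre):
--   (country, genre) -> detail rows
--   (country)        -> subtotal per country
--   ()               -> grand total
SELECT country, genre, COUNT(*)
FROM renting_extended
GROUP BY GROUPING SETS (
  (country, genre),
  (country),
  ()
);
```

Note that there is no `(genre)` level, which is why the ROLLUP result has no per-genre subtotal rows.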

### 👁‍🗨 Order Matters in ROLLUP!

```sql
GROUP BY ROLLUP (genre, country)
```

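With `ROLLUP (genre, country)` the hierarchy flips: the subtotal rows are now per genre (with `country` null) instead of per country, and the per-country subtotals disappear. The grand total row stays the same.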
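
As an illustration, here is the reversed rollup as a full query. The expected rows in the comments are derived from the counts in the CUBE table earlier and assume `renting_extended` holds exactly that data; the `ORDER BY` is added only to make the row order predictable (in PostgreSQL, nulls sort last in ascending order by default).

```sql
SELECT genre, country, COUNT(*)
FROM renting_extended
GROUP BY ROLLUP (genre, country)
ORDER BY genre, country;

-- Expected rows, given the counts shown in the CUBE table:
--   Comedy | Austria | 2
--   Comedy | Belgium | 1
--   Comedy | null    | 3    <- subtotal per genre
--   Drama  | Austria | 4
--   Drama  | Belgium | 15
--   Drama  | null    | 19   <- subtotal per genre
--   null   | null    | 22   <- grand total
```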



## 🔀 GROUPING SETS

Flexible: you list exactly the grouping levels you want.

```sql
SELECT country, genre, COUNT(*) 
FROM renting_extended 
GROUP BY GROUPING SETS (
  (country, genre), 
  (country), 
  (genre), 
  ()
);
```

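Effectively a **manual `UNION ALL`** of several `GROUP BY` queries, one per grouping set. The four sets listed here are exactly what `CUBE (country, genre)` generates, so the result matches the CUBE table above.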

<table>
<thead><tr><th>country</th><th>genre</th><th>count</th></tr></thead>
<tbody>
<tr><td>Austria</td><td>Comedy</td><td>2</td></tr>
<tr><td>Austria</td><td>Drama</td><td>4</td></tr>
<tr><td>Austria</td><td>null</td><td>6</td></tr>
<tr><td>Belgium</td><td>Comedy</td><td>1</td></tr>
<tr><td>Belgium</td><td>Drama</td><td>15</td></tr>
<tr><td>Belgium</td><td>null</td><td>16</td></tr>
<tr><td>null</td><td>Comedy</td><td>3</td></tr>
<tr><td>null</td><td>Drama</td><td>19</td></tr>
<tr><td>null</td><td>null</td><td>22</td></tr>
</tbody>
</table>
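
To make the `UNION` comparison concrete, here is a sketch of the long-hand version, assuming a PostgreSQL-style dialect. Each branch is one grouping set, and the columns that are not grouped in a branch are filled with `NULL`:

```sql
-- (country, genre): detail rows
SELECT country, genre, COUNT(*)
FROM renting_extended
GROUP BY country, genre
UNION ALL
-- (country): subtotal per country
SELECT country, NULL, COUNT(*)
FROM renting_extended
GROUP BY country
UNION ALL
-- (genre): subtotal per genre
SELECT NULL, genre, COUNT(*)
FROM renting_extended
GROUP BY genre
UNION ALL
-- (): grand total
SELECT NULL, NULL, COUNT(*)
FROM renting_extended;
```

`UNION ALL` is used rather than `UNION` because the branches should simply be stacked; plain `UNION` would add an unnecessary duplicate-elimination step.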



## 📏 Metrics: Rentals + Ratings with GROUPING SETS

```sql
SELECT country, genre, COUNT(*) AS count, AVG(rating) AS avg_rating
FROM renting_extended
GROUP BY GROUPING SETS ((country, genre), (genre));
```

<table>
<thead><tr><th>country</th><th>genre</th><th>count</th><th>avg_rating</th></tr></thead>
<tbody>
<tr><td>Austria</td><td>Comedy</td><td>2</td><td>8.00</td></tr>
<tr><td>Austria</td><td>Drama</td><td>4</td><td>6.00</td></tr>
<tr><td>Belgium</td><td>Comedy</td><td>1</td><td>null</td></tr>
<tr><td>Belgium</td><td>Drama</td><td>15</td><td>9.17</td></tr>
<tr><td>null</td><td>Comedy</td><td>3</td><td>8.00</td></tr>
<tr><td>null</td><td>Drama</td><td>19</td><td>8.38</td></tr>
</tbody>
</table>
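
The Belgium/Comedy row has a count of 1 but a null average because `COUNT(*)` counts rentals while `AVG(rating)` skips rentals that have no rating. The self-contained sketch below reproduces that pattern with three hypothetical rows (the `sample_rentals` data is made up for illustration and is not the course data) and adds `COUNT(rating)` to show how many ratings each average is based on:

```sql
-- Hypothetical sample rows: only one of the three rentals has a rating.
WITH sample_rentals (country, genre, rating) AS (
    VALUES ('Austria', 'Comedy', 8),
           ('Austria', 'Comedy', NULL),
           ('Belgium', 'Comedy', NULL)
)
SELECT country,
       genre,
       COUNT(*)      AS n_rentals,  -- counts every rental
       COUNT(rating) AS n_ratings,  -- counts only non-null ratings
       AVG(rating)   AS avg_rating  -- ignores null ratings
FROM sample_rentals
GROUP BY GROUPING SETS ((country, genre), (genre))
ORDER BY country, genre;
```

With this sample, Austria/Comedy has 2 rentals, 1 rating and an average of 8; Belgium/Comedy has 1 rental, 0 ratings and a null average; the Comedy subtotal has 3 rentals, 1 rating and an average of 8.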



## 💼 Final Business Case: Do newer movies get better ratings?

### Step 1: Join Data

```sql
SELECT * 
FROM renting AS r 
LEFT JOIN customers AS c ON r.customer_id = c.customer_id 
LEFT JOIN movies AS m ON m.movie_id = r.movie_id;
```



### Step 2: Filter

* Only movies with at least **4 ratings**
* Only rentals **since 2018-04-01**

```sql
SELECT * 
FROM renting AS r 
LEFT JOIN customers AS c ON r.customer_id = c.customer_id 
LEFT JOIN movies AS m ON m.movie_id = r.movie_id 
WHERE r.movie_id IN (
    SELECT movie_id 
    FROM renting 
    GROUP BY movie_id 
    HAVING COUNT(rating) >= 4
) 
AND r.date_renting >= '2018-04-01';
```
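
The subquery can be run on its own to check which movies pass the first filter. This sketch uses only the `renting` columns that appear above; the alias `n_ratings` is an illustrative name:

```sql
-- COUNT(rating) counts non-null ratings only, so a movie that was rented
-- often but rarely rated can still be excluded.
SELECT movie_id,
       COUNT(rating) AS n_ratings  -- illustrative alias
FROM renting
GROUP BY movie_id
HAVING COUNT(rating) >= 4
ORDER BY n_ratings DESC;
```

As written, the subquery counts ratings over all rentals of a movie, not only those since 2018-04-01; the date filter applies only to the outer query.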



### Step 3: Aggregate with ROLLUP

```sql
SELECT m.year_of_release, 
       c.country, 
       COUNT(*) AS n_rentals, 
       COUNT(DISTINCT r.movie_id) AS n_movies, 
       AVG(r.rating) AS avg_rating 
FROM renting AS r 
LEFT JOIN customers AS c ON c.customer_id = r.customer_id 
LEFT JOIN movies AS m ON m.movie_id = r.movie_id 
WHERE r.movie_id IN (
    SELECT movie_id 
    FROM renting 
    GROUP BY movie_id 
    HAVING COUNT(rating) >= 4
) 
AND r.date_renting >= '2018-04-01' 
GROUP BY ROLLUP (m.year_of_release, c.country) 
ORDER BY c.country, m.year_of_release;
```

Excerpt of the result, showing per-year subtotal rows (`country` is null) and the grand total:

<table>
<thead><tr><th>year_of_release</th><th>country</th><th>n_rentals</th><th>n_movies</th><th>avg_rating</th></tr></thead>
<tbody>
<tr><td>2009</td><td>null</td><td>10</td><td>1</td><td>8.75</td></tr>
<tr><td>2010</td><td>null</td><td>41</td><td>5</td><td>7.96</td></tr>
<tr><td>2011</td><td>null</td><td>14</td><td>2</td><td>8.22</td></tr>
<tr><td>2012</td><td>null</td><td>28</td><td>5</td><td>8.11</td></tr>
<tr><td>2013</td><td>null</td><td>10</td><td>2</td><td>7.60</td></tr>
<tr><td>2014</td><td>null</td><td>5</td><td>1</td><td>8.00</td></tr>
<tr><td>null</td><td>null</td><td>333</td><td>50</td><td>7.90</td></tr>
</tbody>
</table>
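
If only the per-year picture is needed for the business question, the country level can be dropped. The following is a sketch under that assumption, not the original course query: it keeps the same filters, omits the `customers` join because no customer column is used, and rolls up over `year_of_release` alone.

```sql
SELECT m.year_of_release,
       COUNT(*) AS n_rentals,
       COUNT(DISTINCT r.movie_id) AS n_movies,
       AVG(r.rating) AS avg_rating
FROM renting AS r
LEFT JOIN movies AS m ON m.movie_id = r.movie_id
WHERE r.movie_id IN (
    SELECT movie_id
    FROM renting
    GROUP BY movie_id
    HAVING COUNT(rating) >= 4
)
AND r.date_renting >= '2018-04-01'
-- One row per release year plus a grand-total row (year_of_release is null).
GROUP BY ROLLUP (m.year_of_release)
ORDER BY m.year_of_release;
```

Comparing `avg_rating` across the per-year rows, while keeping an eye on `n_rentals` and `n_movies` for small groups, is one way to judge whether newer movies are rated better.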



## 🎯 Chapter 4 Summary

| OLAP Tool         | Purpose                                                 |
| ----------------- | ------------------------------------------------------- |
| `CUBE`            | All combinations of columns, subtotals, grand totals    |
| `ROLLUP`          | Hierarchical aggregation (e.g., by country, then total) |
| `GROUPING SETS`   | Fully custom aggregation levels                         |
| `ROLLUP` + filtering | Subtotals over a filtered subset (e.g., ratings by release year) to spot trends |
