# Aggregate Functions in PostgreSQL

## What are Aggregate Functions in SQL?

**Aggregate functions** in SQL are special functions that perform a calculation on a set of values and return a single value. These functions are commonly used in conjunction with the `GROUP BY` clause to group rows that share a common attribute and to calculate a summary statistic for each group. Aggregate functions can also be used without `GROUP BY` to calculate a summary statistic for the entire result set.
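As a minimal sketch of the two modes described above, the queries below aggregate an inline `VALUES` list instead of a real table. The `orders` name and its columns are hypothetical sample data, chosen only so the statements can run in PostgreSQL without any schema.

```sql
-- Hypothetical sample data supplied inline, so no table is required.
-- Without GROUP BY: one summary row for the entire result set.
WITH orders (customer_id, order_amount) AS (
    VALUES (1, 20.00), (1, 35.50), (2, 12.25)
)
SELECT COUNT(*)          AS order_count,
       SUM(order_amount) AS total_amount
FROM orders;

-- With GROUP BY: one summary row per customer_id.
WITH orders (customer_id, order_amount) AS (
    VALUES (1, 20.00), (1, 35.50), (2, 12.25)
)
SELECT customer_id,
       COUNT(*)          AS order_count,
       SUM(order_amount) AS total_amount
FROM orders
GROUP BY customer_id
ORDER BY customer_id;
```

The first statement returns a single row covering all three orders; the second returns one row for each of the two customers.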

## Common Aggregate Functions

Here are the most commonly used aggregate functions in SQL:

1. **`COUNT()`**: Returns the number of rows in a group (`COUNT(*)`) or the number of non-NULL values of an expression (`COUNT(column)`).
2. **`SUM()`**: Returns the sum of all values in a group.
3. **`AVG()`**: Returns the average of all values in a group.
4. **`MIN()`**: Returns the smallest value in a group.
5. **`MAX()`**: Returns the largest value in a group.
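6. **`VARIANCE()`**: Returns the sample variance of the values in a group (in PostgreSQL, an alias for `VAR_SAMP()`).
7. **`STDDEV()`**: Returns the sample standard deviation of the values in a group (in PostgreSQL, an alias for `STDDEV_SAMP()`).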
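The functions listed above can be tried side by side in one query. This is only a sketch over hypothetical inline data (`scores`, `team`, and `score` are made-up names); it also shows that aggregates other than `COUNT(*)` skip NULL inputs.

```sql
-- Hypothetical inline data; one score is NULL on purpose.
WITH scores (team, score) AS (
    VALUES ('a', 10), ('a', 20), ('b', 30), ('b', NULL)
)
SELECT
    COUNT(*)        AS row_count,        -- counts every row, including the NULL score
    COUNT(score)    AS non_null_count,   -- counts only rows where score is not NULL
    SUM(score)      AS total,            -- NULL inputs are ignored
    AVG(score)      AS average,          -- averaged over the non-NULL scores only
    MIN(score)      AS smallest,
    MAX(score)      AS largest,
    VARIANCE(score) AS sample_variance,  -- same as VAR_SAMP(score)
    STDDEV(score)   AS sample_stddev     -- same as STDDEV_SAMP(score)
FROM scores;
```

Because the NULL score is ignored, `row_count` is 4 while `non_null_count` is 3, and the average is computed over three values rather than four.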

## Why are Aggregate Functions Used?

### 1\. Data Summarization

- **Summary Statistics**: Aggregate functions allow you to compute summary statistics such as totals, averages, minimums, and maximums, providing insights into the data.
- **Reporting**: They are essential for creating reports that require summarized data, such as monthly sales totals, average customer satisfaction scores, and more.

### 2\. Data Analysis

- **Trends and Patterns**: Aggregate functions help identify trends and patterns in data by summarizing it across different dimensions.
- **Comparative Analysis**: They allow for comparative analysis by summarizing data for different groups or time periods.
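One common way to summarize data across a time dimension is to group by a truncated timestamp. The sketch below assumes a hypothetical `sales` data set with `sold_at` and `amount` columns, supplied inline so that it runs on its own.

```sql
-- Hypothetical inline data: three sales across two months.
WITH sales (sold_at, amount) AS (
    VALUES
        (TIMESTAMP '2024-01-05 10:00', 100.00),
        (TIMESTAMP '2024-01-20 15:30',  50.00),
        (TIMESTAMP '2024-02-03 09:15',  75.00)
)
SELECT date_trunc('month', sold_at) AS sales_month,   -- collapse each timestamp to its month
       SUM(amount)                  AS monthly_total
FROM sales
GROUP BY 1
ORDER BY 1;
```

Each output row is one month, so totals for different periods can be compared directly.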

### 3\. Data Filtering and Segmentation

- **Filtering**: Aggregate functions can be used with the `HAVING` clause to filter groups based on the results of aggregate calculations.
- **Segmentation**: They help in segmenting data into different categories based on aggregate values, such as high, medium, and low performers.
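Filtering and segmentation can be combined in a single query. The following sketch uses hypothetical inline data (`sales`, `rep`, `amount`) and arbitrary thresholds; real cutoffs would depend on the data being analyzed.

```sql
-- Hypothetical inline data: sales amounts per representative.
WITH sales (rep, amount) AS (
    VALUES ('ana', 900), ('ana', 400), ('ben', 500), ('cy', 100)
)
SELECT rep,
       SUM(amount) AS total_sales,
       CASE                                  -- segment each group by its aggregate value
           WHEN SUM(amount) >= 1000 THEN 'high'
           WHEN SUM(amount) >= 500  THEN 'medium'
           ELSE 'low'
       END AS performance_tier
FROM sales
GROUP BY rep
HAVING SUM(amount) >= 200                    -- filter out groups below an assumed cutoff
ORDER BY total_sales DESC;
```

Here `HAVING` removes the group whose total is under 200, and the `CASE` expression labels the remaining groups by their totals.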

### 4\. Performance Optimization

- **Efficiency**: Aggregate functions enable you to perform complex calculations within the database, reducing the amount of data transferred to the application and improving performance.
- **Data Reduction**: They reduce the amount of data returned by a query, making it easier to manage and analyze large datasets.

## How to Use Aggregate Functions in SQL

### Example 1: Using `COUNT()`

To count the number of orders placed by each customer:

```
SELECT customer_id, COUNT(order_id) AS order_count
FROM orders
GROUP BY customer_id;
```

### Example 2: Using `SUM()`

To calculate the total sales for each product:

```
SELECT product_id, SUM(quantity * price) AS total_sales
FROM sales
GROUP BY product_id;
```

### Example 3: Using `AVG()`

To find the average salary of employees in each department:

```
SELECT department_id, AVG(salary) AS average_salary
FROM employees
GROUP BY department_id;
```

### Example 4: Using `MIN()` and `MAX()`

To find the minimum and maximum order amounts:

```
SELECT MIN(order_amount) AS min_order, MAX(order_amount) AS max_order
FROM orders;
```

### Example 5: Using `HAVING` with Aggregate Functions

To find departments with an average salary greater than $50,000:

```
SELECT department_id, AVG(salary) AS average_salary
FROM employees
GROUP BY department_id
HAVING AVG(salary) > 50000;
```

## Best Practices for Using Aggregate Functions

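1. **Use with `GROUP BY`**: Every column in the `SELECT` list that is not inside an aggregate function must appear in the `GROUP BY` clause (or be functionally dependent on a grouped primary key).
2. **Filter with `HAVING`**: Use the `HAVING` clause to filter groups based on aggregate values, and use `WHERE` to filter individual rows before they are grouped.
3. **Optimize with Indexes**: Consider indexing the columns used in `GROUP BY` and in join or filter conditions; this can improve query performance, depending on the plan the optimizer chooses.
4. **Avoid Nested Aggregates**: PostgreSQL does not allow one aggregate function to be called directly inside another. When you need an aggregate of an aggregate, compute the inner one in a subquery or CTE first.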
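To illustrate the last point, the sketch below computes an average of per-customer counts in two steps. Writing `AVG(COUNT(order_id))` directly is rejected by PostgreSQL, so the inner aggregate goes into a CTE. The `orders` data here is hypothetical and supplied inline.

```sql
-- Hypothetical inline data: customer 1 has three orders, customer 2 has one.
WITH orders (customer_id, order_id) AS (
    VALUES (1, 101), (1, 102), (1, 103), (2, 104)
),
orders_per_customer AS (
    -- Step 1: the inner aggregate, one row per customer.
    SELECT customer_id, COUNT(order_id) AS order_count
    FROM orders
    GROUP BY customer_id
)
-- Step 2: the outer aggregate over the per-customer counts.
SELECT AVG(order_count) AS avg_orders_per_customer
FROM orders_per_customer;
```

Splitting the calculation this way also keeps each step readable and easy to check on its own.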

## Summary

Aggregate functions in SQL are powerful tools for summarizing and analyzing data. They allow you to compute summary statistics, perform data segmentation, and optimize query performance. By using aggregate functions effectively, you can gain valuable insights into your data and create meaningful reports and analyses.

Using the above knowledge, let us practice aggregate operations on the DVD Rental database by answering the following questions:

**Aggregate Functions:**

- Calculate total revenue generated by each category.
- Determine the average rental duration for different customer segments.
- Find the most rented film.

## Calculate total revenue generated by each category.

```
-- Sum payments per film category by walking from payment back to category.
WITH RevenueByCategory AS (
    SELECT 
        c.name,
        c.category_id,
        SUM(p.amount) AS total_revenue
    FROM 
        film f 
        INNER JOIN inventory i ON f.film_id = i.film_id
        INNER JOIN rental r ON r.inventory_id = i.inventory_id
        INNER JOIN payment p ON p.rental_id = r.rental_id
        INNER JOIN film_category fc ON fc.film_id = f.film_id
        INNER JOIN category c ON c.category_id = fc.category_id
    -- Group by both selected columns so the query does not rely on
    -- category_id being the primary key of category.
    GROUP BY c.category_id, c.name
)
SELECT rbc.name,
    rbc.total_revenue
FROM 
    RevenueByCategory rbc
ORDER BY 
    rbc.total_revenue DESC;
```

name,total_revenue
Sports,4892.19
Sci-Fi,4336.01
Animation,4245.31
Drama,4118.46
Comedy,4002.48
New,3966.38
Action,3951.84
Foreign,3934.47
Games,3922.18
Family,3830.15


## Determine the average rental duration for different customer segments.

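```
-- Customers are segmented by the store they belong to (store_id).
WITH RentalDurationByStore AS (
    -- Rentals not yet returned have a NULL return_date, so their duration
    -- is NULL and AVG() ignores them.
    SELECT (r.return_date - r.rental_date) AS rental_duration,
        c.store_id
    FROM rental r 
        INNER JOIN customer c ON r.customer_id = c.customer_id
)
SELECT AVG(rdc.rental_duration) AS avg_rental_duration,
    rdc.store_id AS customer_segment
FROM RentalDurationByStore AS rdc
GROUP BY 2
ORDER BY 1 DESC;
```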

avg_rental_duration,customer_segment
4 days 25:25:30.659112,1
4 days 23:37:41.10079,2
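The first row above shows an hours field larger than 24 (`4 days 25:25:30.659112`). This can happen because PostgreSQL does not automatically carry hours over into days when it divides an interval. One possible way to normalize the display is the built-in `justify_hours()` function; the sketch below applies it to the literal from the output and to a small hypothetical data set (`rentals` and its columns are made up for illustration).

```sql
-- Normalize the interval shown in the output above:
-- each full 24 hours is folded into an extra day.
SELECT justify_hours(INTERVAL '4 days 25:25:30.659112') AS normalized_duration;

-- The same idea applied to an average, using hypothetical inline data.
WITH rentals (store_id, rental_date, return_date) AS (
    VALUES
        (1, TIMESTAMP '2005-05-24 10:00', TIMESTAMP '2005-05-26 06:00'),
        (1, TIMESTAMP '2005-05-25 08:00', TIMESTAMP '2005-05-28 06:00')
)
SELECT store_id,
       AVG(return_date - rental_date)                AS raw_avg,        -- may show more than 24 hours
       justify_hours(AVG(return_date - rental_date)) AS normalized_avg  -- hours carried into days
FROM rentals
GROUP BY store_id;
```

Both columns represent the same length of time; only the split between days and hours differs.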


## Find the most rented film.

```
-- Count rentals per film title and keep the title with the highest count.
SELECT f.title,
    COUNT(r.rental_id) AS rental_count
FROM rental r 
    INNER JOIN inventory i ON i.inventory_id = r.inventory_id
    INNER JOIN film f ON i.film_id = f.film_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 1;
```

title,rental_count
Bucket Brotherhood,34
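Note that `LIMIT 1` returns a single row even if several films share the highest rental count. If ties matter, one option is to rank the counts with a window function and keep every film ranked first. The sketch below uses hypothetical inline data (`rentals`, `Film A`, and so on) rather than the DVD Rental tables.

```sql
-- Hypothetical inline data: Film A and Film B are tied with two rentals each.
WITH rentals (title, rental_id) AS (
    VALUES ('Film A', 1), ('Film A', 2),
           ('Film B', 3), ('Film B', 4),
           ('Film C', 5)
),
rental_counts AS (
    SELECT title, COUNT(rental_id) AS rental_count
    FROM rentals
    GROUP BY title
),
ranked AS (
    -- RANK() gives tied counts the same rank.
    SELECT title,
           rental_count,
           RANK() OVER (ORDER BY rental_count DESC) AS rental_rank
    FROM rental_counts
)
SELECT title, rental_count
FROM ranked
WHERE rental_rank = 1
ORDER BY title;
```

With this sample data the query returns both tied films instead of an arbitrary one of them.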
