# GROUP BY in PostgreSQL

## What is Grouping in SQL?

**Grouping** in SQL refers to the process of combining rows that share the same values in one or more columns into a single group. This is done using the `GROUP BY` clause in a SQL query. When you group rows, you can perform aggregate calculations on the grouped data, such as calculating the sum, average, count, minimum, or maximum of a specific column.

### How Grouping Works

- **Group by Columns**: The columns specified in the `GROUP BY` clause are used to determine how the rows are grouped. Rows that have the same values in these columns are placed in the same group.
- **Aggregate Functions**: Once the data is grouped, aggregate functions can be applied to the grouped data to perform calculations and return a single value for each group.

### Example of Grouping

Suppose you have a table named `sales` with the following columns: `product_id`, `quantity`, and `price`. To calculate the total sales for each product, you can use the `GROUP BY` clause as follows:

```sql
SELECT product_id, SUM(quantity * price) AS total_sales
FROM sales
GROUP BY product_id;
```

In this example, the rows in the `sales` table are grouped by the `product_id` column. The `SUM()` function is then used to calculate the total sales for each group.
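
To make the grouping concrete, here is a minimal self-contained sketch of the same query. The table name `sales_demo` and its four rows are made up for illustration; they are not part of any real dataset.

```sql
-- Hypothetical sample data: sales_demo and its rows exist only for this sketch.
CREATE TEMP TABLE sales_demo (
    product_id integer,
    quantity   integer,
    price      numeric(10, 2)
);

INSERT INTO sales_demo (product_id, quantity, price) VALUES
    (1, 2, 10.00),
    (1, 1, 10.00),
    (2, 5,  4.00),
    (3, 1, 99.50);

-- Four input rows collapse into one output row per distinct product_id.
SELECT product_id,
       SUM(quantity * price) AS total_sales
FROM sales_demo
GROUP BY product_id
ORDER BY product_id;
```

With these rows, the two rows for product 1 collapse into one group with a total of 30.00, while products 2 and 3 yield 20.00 and 99.50.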

## Why Do We Need Grouping in SQL?

Grouping in SQL is essential for several reasons:

### 1\. Data Summarization

- **Aggregate Information**: Grouping allows you to aggregate information from multiple rows into a single result, making it easier to analyze and understand the data.
- **Summarized Reports**: It is used to create summarized reports, such as total sales by product, average salary by department, or count of orders by customer.

### 2\. Data Analysis

- **Identify Patterns and Trends**: Grouping helps in identifying patterns and trends in the data by summarizing it across different dimensions.
- **Comparative Analysis**: It enables comparative analysis by summarizing data for different groups or categories.

### 3\. Data Segmentation

- **Segment Data**: Grouping allows you to segment data into different categories based on specific criteria, such as customer segments, product categories, or geographic regions.
- **Targeted Insights**: It helps in deriving targeted insights for each segment, which can be used for decision-making and strategy formulation.

### 4\. Performance Optimization

- **Efficient Data Retrieval**: Aggregating in the database means only one row per group is sent to the application instead of every underlying row. The database still has to read the rows that feed each group.
- **Reduced Data Volume**: By summarizing data, grouping reduces the volume of data returned by a query, making it easier to manage and analyze large datasets.

### 5\. Data Filtering

- **Filter Groups**: Grouping can be combined with the `HAVING` clause to filter groups based on specific conditions. This allows you to focus on groups that meet certain criteria.

## How to Use Grouping in SQL?

### Example 1: Grouping by a Single Column

To count the number of orders placed by each customer:

```sql
SELECT customer_id, COUNT(order_id) AS order_count
FROM orders
GROUP BY customer_id;
```
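
One detail worth seeing in action: `COUNT(column)` counts only non-NULL values, while `COUNT(*)` counts rows. The difference shows up as soon as a `LEFT JOIN` is involved. The sketch below uses two hypothetical tables, `customers_demo` and `orders_demo`, with made-up rows.

```sql
-- Hypothetical sample data for illustration only.
CREATE TEMP TABLE customers_demo (customer_id integer PRIMARY KEY);
CREATE TEMP TABLE orders_demo (
    order_id    integer PRIMARY KEY,
    customer_id integer
);

INSERT INTO customers_demo (customer_id) VALUES (1), (2), (3);
INSERT INTO orders_demo (order_id, customer_id) VALUES
    (101, 1),
    (102, 1),
    (103, 2);

-- Customer 3 has no orders, so the LEFT JOIN produces one row with a NULL order_id.
SELECT c.customer_id,
       COUNT(o.order_id) AS order_count,  -- ignores the NULL order_id
       COUNT(*)          AS row_count     -- counts the NULL-extended row too
FROM customers_demo AS c
LEFT JOIN orders_demo AS o ON o.customer_id = c.customer_id
GROUP BY c.customer_id
ORDER BY c.customer_id;
```

For customer 3 this returns an `order_count` of 0 but a `row_count` of 1, which is why `COUNT(order_id)` is usually the safer choice when counting related rows.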

### Example 2: Grouping by Multiple Columns

To calculate the total sales for each product in each region:

```sql
SELECT product_id, region, SUM(quantity * price) AS total_sales
FROM sales
GROUP BY product_id, region;
```

### Example 3: Grouping with the `HAVING` Clause

To find products with total sales greater than $10,000:

```sql
SELECT product_id, SUM(quantity * price) AS total_sales
FROM sales
GROUP BY product_id
HAVING SUM(quantity * price) > 10000;
```
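
It can help to see `WHERE` and `HAVING` side by side: `WHERE` filters individual rows before grouping, and `HAVING` filters whole groups after aggregation. The following sketch assumes a hypothetical table `sales_having_demo` with made-up rows, and the `quantity >= 50` condition is an arbitrary filter chosen for illustration.

```sql
-- Hypothetical sample data for illustration only.
CREATE TEMP TABLE sales_having_demo (
    product_id integer,
    quantity   integer,
    price      numeric(10, 2)
);

INSERT INTO sales_having_demo (product_id, quantity, price) VALUES
    (1, 100,  60.00),  -- 6000.00
    (1, 100,  50.00),  -- 5000.00
    (1,  10,  30.00),  -- removed by WHERE (quantity < 50)
    (2,  10,  20.00),  -- removed by WHERE, so product 2 forms no group at all
    (3, 300,  40.00),  -- 12000.00
    (4,  60, 100.00);  -- 6000.00, passes WHERE but its group fails HAVING

SELECT product_id,
       SUM(quantity * price) AS total_sales
FROM sales_having_demo
WHERE quantity >= 50                   -- row-level filter, applied before grouping
GROUP BY product_id
HAVING SUM(quantity * price) > 10000   -- group-level filter, applied after aggregation
ORDER BY product_id;
```

Only products 1 (11000.00) and 3 (12000.00) remain. Note that the aggregate expression is repeated in `HAVING`: PostgreSQL does not allow an output alias such as `total_sales` to be referenced there.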

### Example 4: Grouping and Sorting

To find the average salary of employees in each department, sorted by the average salary in descending order:

```sql
SELECT department_id, AVG(salary) AS average_salary
FROM employees
GROUP BY department_id
ORDER BY average_salary DESC;
```
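
Averages often come back with many decimal places, so a report query may round them. This is a small sketch under assumed data: the `employees_demo` table and its rows are hypothetical, and the second sort key is added only to make the order deterministic when two averages tie.

```sql
-- Hypothetical sample data for illustration only.
CREATE TEMP TABLE employees_demo (
    employee_id   integer PRIMARY KEY,
    department_id integer,
    salary        numeric(10, 2)
);

INSERT INTO employees_demo (employee_id, department_id, salary) VALUES
    (1, 10, 5000.00),
    (2, 10, 6000.00),
    (3, 10, 6500.00),
    (4, 20, 4000.00),
    (5, 20, 4500.00);

SELECT department_id,
       ROUND(AVG(salary), 2) AS average_salary,  -- round to two decimal places
       COUNT(*)              AS employee_count
FROM employees_demo
GROUP BY department_id
ORDER BY average_salary DESC, department_id;     -- tie-breaker keeps the order stable
```

Department 10 comes first with an average of 5833.33 across three employees, followed by department 20 with 4250.00 across two.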

## Best Practices for Using Grouping in SQL

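1. **Select Columns**: Only include columns in the `SELECT` clause that appear in the `GROUP BY` clause or are wrapped in an aggregate function. PostgreSQL rejects any other column, except one that is functionally dependent on a grouped primary key.
2. **Use with Aggregates**: Pair grouping with aggregate functions to perform meaningful calculations. A `GROUP BY` with no aggregate only removes duplicate rows, which `SELECT DISTINCT` states more directly.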
3. **Optimize with Indexes**: Consider indexing the columns used in the `GROUP BY` clause to improve query performance.
4. **Filter with `HAVING`**: Use the `HAVING` clause to filter groups based on aggregate values.
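
The first practice can be illustrated with a short sketch. The tables `departments_demo` and `staff_demo` and their rows are hypothetical. The query shows the one case where PostgreSQL accepts a selected column that is neither grouped nor aggregated: the column belongs to a table whose primary key is in the `GROUP BY` clause.

```sql
-- Hypothetical sample data for illustration only.
CREATE TEMP TABLE departments_demo (
    department_id   integer PRIMARY KEY,
    department_name text NOT NULL
);
CREATE TEMP TABLE staff_demo (
    staff_id      integer PRIMARY KEY,
    department_id integer,
    salary        numeric(10, 2)
);

INSERT INTO departments_demo (department_id, department_name) VALUES
    (10, 'Engineering'),
    (20, 'Support');
INSERT INTO staff_demo (staff_id, department_id, salary) VALUES
    (1, 10, 5000.00),
    (2, 10, 7000.00),
    (3, 20, 4000.00);

-- Valid: department_name is not listed in GROUP BY, but it is functionally
-- dependent on d.department_id, the grouped primary key of departments_demo.
SELECT d.department_id,
       d.department_name,
       AVG(s.salary) AS average_salary
FROM departments_demo AS d
JOIN staff_demo AS s ON s.department_id = d.department_id
GROUP BY d.department_id
ORDER BY d.department_id;

-- Invalid (left commented out): s.staff_id is neither grouped nor aggregated,
-- and it does not depend on the grouped key, so PostgreSQL rejects the query.
-- SELECT d.department_id, s.staff_id
-- FROM departments_demo AS d
-- JOIN staff_demo AS s ON s.department_id = d.department_id
-- GROUP BY d.department_id;
```

Grouping by the primary key keeps the `GROUP BY` list short; listing `d.department_name` there as well would be equally valid and is more portable to other database systems.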

## Summary

Grouping in SQL is a powerful feature that allows you to organize and summarize data based on specific criteria. It is essential for data summarization, analysis, segmentation, and performance optimization. By using the `GROUP BY` clause effectively, you can gain valuable insights from your data and create meaningful reports and analyses.

Let's work through some grouping exercises on the DVD Rental database:

**Grouping Data:**

- Group customers by city and count the number of customers in each city.
- Calculate total sales for each film category.
- Find the top 3 rental stores by total revenue.

## Group customers by city and count the number of customers in each city.

```sql
SELECT c.city,
    COUNT(cu.customer_id) AS customer_count
FROM
    city c
    INNER JOIN address a ON c.city_id = a.city_id
    INNER JOIN customer cu ON cu.address_id = a.address_id
GROUP BY c.city
ORDER BY customer_count DESC;
```

| city | customer_count |
| --- | --- |
| London | 2 |
| Aurora | 2 |
| Tokat | 1 |
| Atlixco | 1 |
| Mukateve | 1 |
| Pontianak | 1 |
| Gatineau | 1 |
| Saint-Denis | 1 |
| Molodetno | 1 |
| Yingkou | 1 |


## Calculate total sales for each film category.

```sql
SELECT
    c.name AS category_name,
    SUM(p.amount) AS total_sales
FROM
    payment p
    INNER JOIN rental r ON p.rental_id = r.rental_id
    INNER JOIN inventory i ON r.inventory_id = i.inventory_id
    INNER JOIN film f ON i.film_id = f.film_id
    INNER JOIN film_category fc ON f.film_id = fc.film_id
    INNER JOIN category c ON fc.category_id = c.category_id
GROUP BY
    c.name
ORDER BY
    total_sales DESC;
```

| category_name | total_sales |
| --- | --- |
| Sports | 4892.19 |
| Sci-Fi | 4336.01 |
| Animation | 4245.31 |
| Drama | 4118.46 |
| Comedy | 4002.48 |
| New | 3966.38 |
| Action | 3951.84 |
| Foreign | 3934.47 |
| Games | 3922.18 |
| Family | 3830.15 |


## Find the top 3 rental stores by total revenue.

```sql
SELECT
    s.store_id,
    SUM(p.amount) AS total_revenue
FROM
    payment p
    INNER JOIN rental r ON p.rental_id = r.rental_id
    INNER JOIN customer c ON r.customer_id = c.customer_id
    INNER JOIN store s ON c.store_id = s.store_id
GROUP BY
    s.store_id
ORDER BY
    total_revenue DESC
LIMIT 3;
```

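| store_id | total_revenue |
| --- | --- |
| 1 | 33626.39 |
| 2 | 27685.65 |

Only two rows are returned even though the query asks for three, because only two stores appear in the result of these joins.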
