# **Data Science Learners Hub**

**Module : SQL**

**Topic :** **GROUP BY Clause**

**email** : [datasciencelearnershub@gmail.com](mailto:datasciencelearnershub@gmail.com)

### **GROUP BY Clause:**

- The GROUP BY clause collects rows that share the same values in one or more columns into groups. It is most often combined with aggregate functions such as SUM, COUNT, AVG, MAX, or MIN, which then return one result per group instead of one result for the whole table.
- This makes it central to summarizing and analyzing large datasets.

In [1]:
USE DataScienceLearnersHub

**Create a simple table:**

In [2]:
CREATE TABLE Sales (
    SaleID INT PRIMARY KEY,
    ProductName VARCHAR(50),
    Category VARCHAR(50),
    SaleAmount DECIMAL(10, 2),
    SaleDate DATE
);

**Insert sample rows:**

In [3]:
INSERT INTO Sales VALUES (1, 'Product A', 'Electronics', 500.00, '2023-01-01');
INSERT INTO Sales VALUES (2, 'Product B', 'Clothing', 300.00, '2023-01-02');
INSERT INTO Sales VALUES (3, 'Product C', 'Electronics', 800.00, '2023-01-03');
INSERT INTO Sales VALUES (4, 'Product D', 'Clothing', 450.00, '2023-01-04');
INSERT INTO Sales VALUES (5, 'Product E', 'Electronics', 700.00, '2023-01-05');

In [4]:
SELECT * FROM Sales

SaleID,ProductName,Category,SaleAmount,SaleDate
1,Product A,Electronics,500.0,2023-01-01
2,Product B,Clothing,300.0,2023-01-02
3,Product C,Electronics,800.0,2023-01-03
4,Product D,Clothing,450.0,2023-01-04
5,Product E,Electronics,700.0,2023-01-05


**Common uses of the GROUP BY clause:**

**a. Simple GROUP BY:**

- This will give you the total sales amount for each category.

In [5]:
SELECT Category, SUM(SaleAmount) AS TotalSales
FROM Sales
GROUP BY Category;

Category,TotalSales
Clothing,750.0
Electronics,2000.0
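The same grouping can feed several aggregate functions at once. The query below is a small sketch against the `Sales` table created earlier; the column aliases (`SaleCount`, `SmallestSale`, `LargestSale`, `AvgSale`) are illustrative names, not part of the original notebook.

```sql
-- Several aggregates computed per category in a single query.
SELECT Category,
       COUNT(*)        AS SaleCount,     -- number of rows in the group
       MIN(SaleAmount) AS SmallestSale,  -- lowest sale in the group
       MAX(SaleAmount) AS LargestSale,   -- highest sale in the group
       AVG(SaleAmount) AS AvgSale        -- mean sale in the group
FROM Sales
GROUP BY Category;
```

With the five sample rows, Clothing has 2 sales ranging from 300 to 450, and Electronics has 3 sales ranging from 500 to 800.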


**b. GROUP BY with HAVING:**

- This will display categories with an average sale amount greater than 400.
- Note: The HAVING clause is discussed in detail separately. The **HAVING** clause in SQL is used in conjunction with the **GROUP BY** clause to filter the results of a query based on aggregate conditions. While the **WHERE** clause filters individual rows before they are grouped and aggregated, the **HAVING** clause filters the results after the grouping has occurred.

In [6]:
SELECT Category, AVG(SaleAmount) AS AvgSales
FROM Sales
GROUP BY Category
HAVING AVG(SaleAmount) > 400;

Category,AvgSales
Electronics,666.666666
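For contrast, here is a sketch of the same query with the condition moved into WHERE. It is not equivalent to the HAVING version: WHERE discards individual rows before the groups are formed, so the averages are computed over whatever rows remain.

```sql
-- WHERE filters rows first; grouping and AVG then run on the surviving rows.
SELECT Category, AVG(SaleAmount) AS AvgSales
FROM Sales
WHERE SaleAmount > 400
GROUP BY Category;
```

With the sample data, the 300.00 Clothing sale is removed before grouping, so Clothing still appears, with an average of 450 taken over its single remaining row. Electronics is unchanged because all of its sales exceed 400.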


**c. GROUP BY with Multiple Columns:**

- This will show the count of sales for each combination of category and product. In the sample data every product appears exactly once, so each count is 1.

In [7]:
SELECT Category, ProductName, COUNT(*) AS SaleCount
FROM Sales
GROUP BY Category, ProductName;

Category,ProductName,SaleCount
Electronics,Product A,1
Clothing,Product B,1
Electronics,Product C,1
Clothing,Product D,1
Electronics,Product E,1
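GROUP BY by itself does not promise any particular row order in the result. When the order matters, for a report for instance, an explicit ORDER BY can be added. A minimal sketch using the same query:

```sql
-- Same grouping as above, with an explicit sort on the grouping columns.
SELECT Category, ProductName, COUNT(*) AS SaleCount
FROM Sales
GROUP BY Category, ProductName
ORDER BY Category, ProductName;
```

With the sample rows, this lists the two Clothing products (Product B, Product D) first, followed by the three Electronics products (Product A, Product C, Product E).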


### **Practical application of GROUP BY**

- In real-world scenarios, GROUP BY is often used in business intelligence, reporting, and data analysis. For example, you might use it to analyze sales performance by region, product category, or time period.
- **Real-World Applications:**
    - **E-commerce**: Analyzing customer purchase trends, identifying top-selling products.
    - **Finance**: Calculating average expense per category, tracking revenue by region.
    - **Marketing**: Segmenting customers based on demographics, analyzing campaign effectiveness.
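As an illustration of grouping by time period, the sketch below totals the `Sales` table by year and month. It assumes a dialect that provides the `YEAR()` and `MONTH()` functions (SQL Server and MySQL do; other databases may need `EXTRACT` instead), and the aliases `SaleYear`, `SaleMonth`, and `SaleCount` are illustrative.

```sql
-- Monthly sales summary: one row per (year, month) found in SaleDate.
SELECT YEAR(SaleDate)  AS SaleYear,
       MONTH(SaleDate) AS SaleMonth,
       SUM(SaleAmount) AS TotalSales,
       COUNT(*)        AS SaleCount
FROM Sales
GROUP BY YEAR(SaleDate), MONTH(SaleDate)
ORDER BY SaleYear, SaleMonth;
```

All five sample rows fall in January 2023, so this returns a single row with a total of 2750.00 across 5 sales.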

### **Peculiarities and Considerations:**

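- Every column in the SELECT list that is not wrapped in an aggregate function must also appear in the GROUP BY clause.
- In a plain GROUP BY, the order of the columns does not change which groups are formed or the aggregate values returned. Column order does matter with extensions such as ROLLUP, where it determines the subtotal hierarchy. GROUP BY also does not guarantee the order of the output rows; use ORDER BY for that.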
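The rule about non-aggregated columns can be seen with the `Sales` table. The sketch below shows a query that SQL Server rejects (left commented out so the block still runs) and two possible ways to repair it; the alias `ProductCount` is illustrative.

```sql
-- Rejected: ProductName is neither listed in GROUP BY nor inside an aggregate.
-- SELECT Category, ProductName, SUM(SaleAmount) AS TotalSales
-- FROM Sales
-- GROUP BY Category;

-- Option 1: add the column to GROUP BY (one row per category/product pair).
SELECT Category, ProductName, SUM(SaleAmount) AS TotalSales
FROM Sales
GROUP BY Category, ProductName;

-- Option 2: keep one row per category and aggregate the extra column instead.
SELECT Category,
       COUNT(DISTINCT ProductName) AS ProductCount,
       SUM(SaleAmount)             AS TotalSales
FROM Sales
GROUP BY Category;
```

Which option fits depends on the question being asked: Option 1 changes the level of detail of the result, while Option 2 keeps one row per category.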

### **Common mistakes:**

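- Forgetting to include all non-aggregated SELECT columns in the GROUP BY clause.
- Misusing aggregate functions, for example placing one in a WHERE clause instead of a HAVING clause.
- Relying on GROUP BY to sort the output, or expecting the column order in a plain GROUP BY to change the results. Use ORDER BY when a specific row order is needed.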
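One frequent misuse of aggregate functions is placing them in a WHERE clause. A short sketch against the `Sales` table, with the invalid form commented out so the block still runs:

```sql
-- Rejected: aggregates cannot be used in WHERE, which runs before grouping.
-- SELECT Category, SUM(SaleAmount) AS TotalSales
-- FROM Sales
-- WHERE SUM(SaleAmount) > 1000
-- GROUP BY Category;

-- Valid: the aggregate condition belongs in HAVING, which runs after grouping.
SELECT Category, SUM(SaleAmount) AS TotalSales
FROM Sales
GROUP BY Category
HAVING SUM(SaleAmount) > 1000;
```

With the sample data, only Electronics (total 2000.00) passes the condition; Clothing (total 750.00) is filtered out.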