# Groupby-aggregations With Having
By the end of this lecture you will be able to:
- do a group-by aggregation with a having-style filter
- group by multiple columns with a having-style filter


In [1]:
import polars as pl

In [2]:
df = pl.read_csv("../Files/Sample_Superstore.csv")

In [3]:
df.head(3)

Row_ID,Order_ID,Order_Date,Ship_Date,Ship_Mode,Customer_ID,Customer_Name,Segment,Country,City,State,Postal_Code,Region,Product_ID,Category,Sub_Category,Product_Name,Sales,Quantity,Discount,Profit
i64,str,str,str,str,str,str,str,str,str,str,i64,str,str,str,str,str,f64,i64,f64,f64
1,,,"""11-11-2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-BO-10001798""","""Furniture""","""Bookcases""","""Bush Somerset Collection Bookc…",261.96,2,0.0,41.9136
2,"""CA-2016-152156""","""08-11-2016""","""11-11-2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-CH-10000454""","""Furniture""","""Chairs""","""Hon Deluxe Fabric Upholstered …",731.94,3,0.0,219.582
3,"""CA-2016-138688""","""12-06-2016""",,,"""DV-13045""","""Darrin Van Huff""","""Corporate""",,"""Los Angeles""","""California""",90036,"""West""","""OFF-LA-10000240""","""Office Supplies""","""Labels""","""Self-Adhesive Address Labels f…",14.62,2,0.0,6.8714


## Group-by and aggregation
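In Polars we can group by a column and aggregate the data in other columns by chaining `group_by` and `agg`. To keep only the groups whose aggregated value meets a condition, as SQL's `HAVING` clause does, we apply `filter` to the aggregated result.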

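In this example we group by `Category`, take the maximum of the `Profit` column in each group, and keep only the groups whose maximum exceeds 2000. Note that the code aliases the result as `total_profit`, although it holds a maximum rather than a sum.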

In [4]:
(
    df.group_by("Category")
    .agg(pl.col("Profit").max().alias("total_profit"))
    .filter(pl.col("total_profit") > 2000)
)

Category,total_profit
str,f64
"""Office Supplies""",4946.37
"""Technology""",8399.976


In this example we group by `Category`, take the sum of the `Profit` column in each group, and keep only the groups whose total exceeds 2000.

In [7]:
(
    df.group_by("Category")
    .agg(pl.col("Profit").sum().alias("total_profit"))
    .filter(pl.col("total_profit") > 2000)
)

Category,total_profit
str,f64
"""Technology""",145125.1584
"""Furniture""",18210.2309
"""Office Supplies""",122469.5792


## Grouping by multiple columns with having

In this example we group by two columns, `Category` and `Region`, take the maximum of the `Profit` column in each group, and keep only the groups whose maximum exceeds 2000.

In [15]:
(
    df.group_by("Category", "Region")
    .agg(pl.col("Profit").max().alias("total_profit"))
    .filter(pl.col("total_profit") > 2000)
)

Category,Region,total_profit
str,str,f64
"""Office Supplies""","""South""",3177.475
"""Technology""","""East""",5039.9856
"""Technology""","""Central""",8399.976
"""Office Supplies""","""Central""",4946.37
"""Technology""","""West""",6719.9808
"""Technology""","""South""",2799.984


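In this example we group by three columns, `Customer_Name`, `Region` and `Category`, take the sum of the `Profit` column in each group, and keep only the groups whose total exceeds 2000.

In [16]: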
(
    df.group_by("Customer_Name", "Region", "Category")
    .agg(pl.col("Profit").sum().alias("total_profit"))
    .filter(pl.col("total_profit") > 2000)
)

Customer_Name,Region,Category,total_profit
str,str,str,f64
"""Jane Waco""","""West""","""Office Supplies""",2087.2509
"""Andy Reiter""","""Central""","""Office Supplies""",2529.4531
"""Tamara Chand""","""Central""","""Technology""",8399.976
"""Adrian Barton""","""Central""","""Office Supplies""",5353.0892
"""Sanjit Engle""","""South""","""Technology""",2799.984
…,…,…,…
"""Karen Daniels""","""East""","""Technology""",2400.9657
"""Bill Shonely""","""East""","""Technology""",2365.9818
"""Tom Boeckenhauer""","""East""","""Technology""",2239.9872
"""Nathan Mautz""","""East""","""Technology""",2205.6126
