# Group-by Aggregations With Having
By the end of this lecture you will be able to:
- perform a group-by aggregation and filter the aggregated groups (the equivalent of SQL `HAVING`)
- group by multiple columns and filter the aggregated groups


In [9]:
import polars as pl

In [11]:
df = pl.read_csv("../Files/Sample_Superstore.csv")

In [12]:
df.head()

Row_ID,Order_ID,Order_Date,Ship_Date,Ship_Mode,Customer_ID,Customer_Name,Segment,Country,City,State,Postal_Code,Region,Product_ID,Category,Sub_Category,Product_Name,Sales,Quantity,Discount,Profit
i64,str,str,str,str,str,str,str,str,str,str,i64,str,str,str,str,str,f64,i64,f64,f64
1,,,"""11-11-2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-BO-10001798""","""Furniture""","""Bookcases""","""Bush Somerset Collection Bookc…",261.96,2,0.0,41.9136
2,"""CA-2016-152156""","""08-11-2016""","""11-11-2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-CH-10000454""","""Furniture""","""Chairs""","""Hon Deluxe Fabric Upholstered …",731.94,3,0.0,219.582
3,"""CA-2016-138688""","""12-06-2016""",,,"""DV-13045""","""Darrin Van Huff""","""Corporate""",,"""Los Angeles""","""California""",90036,"""West""","""OFF-LA-10000240""","""Office Supplies""","""Labels""","""Self-Adhesive Address Labels f…",14.62,2,0.0,6.8714
4,,"""11-10-2015""",,"""Standard Class""","""SO-20335""","""Sean O'Donnell""","""Consumer""","""United States""","""Fort Lauderdale""","""Florida""",33311,"""South""","""FUR-TA-10000577""","""Furniture""","""Tables""","""Bretford CR4500 Series Slim Re…",957.5775,5,0.45,-383.031
5,"""US-2015-108966""","""11-10-2015""","""18-10-2015""","""Standard Class""","""SO-20335""","""Sean O'Donnell""","""Consumer""","""United States""",,"""Florida""",33311,"""South""","""OFF-ST-10000760""","""Office Supplies""","""Storage""","""Eldon Fold 'N Roll Cart System""",22.368,2,0.2,2.5164


## Group-by and aggregation
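In Polars we can group by a column and aggregate the data in other columns by chaining `group_by` and `agg`. Applying `filter` to the aggregated result then keeps only the groups that meet a condition, which is the role the `HAVING` clause plays in SQL.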

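In this example we group by `Category`, take the maximum of the `Profit` column in each group, and keep only the categories whose maximum profit exceeds 2000. Note that the code stores this value under the alias `total_profit`, although it holds the per-group maximum rather than a total.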

In [13]:
(
    df.group_by("Category")
    .agg(pl.col("Profit").max().alias("total_profit"))
    .filter(pl.col("total_profit") > 2000)
)

Category,total_profit
str,f64
"""Technology""",8399.976
"""Office Supplies""",4946.37


In this example we group by `Category`, take the sum of the `Profit` column in each group, and keep only the categories whose total profit exceeds 2000.

In [15]:
(
    df.group_by("Category")
    .agg(pl.col("Profit").sum().alias("total_profit"))
    .filter(pl.col("total_profit") > 2000)
)

Category,total_profit
str,f64
"""Technology""",145454.9481
"""Office Supplies""",122490.8008
"""Furniture""",18451.2728


## Grouping by multiple columns with having

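In this example we group by multiple columns (`Customer_Name`, `Category`, and `Region`), take the maximum of the `Profit` column for each combination, and keep only the combinations whose maximum profit exceeds 2000. The value is again stored under the alias `total_profit`.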

In [17]:
(
    df.group_by("Customer_Name", "Category", "Region")
    .agg(pl.col("Profit").max().alias("total_profit"))
    .filter(pl.col("total_profit") > 2000)
)

Customer_Name,Category,Region,total_profit
str,str,str,f64
"""Keith Dawkins""","""Technology""","""East""",2229.024
"""Adrian Barton""","""Office Supplies""","""Central""",4946.37
"""Raymond Buch""","""Technology""","""West""",6719.9808
"""Sanjit Engle""","""Technology""","""South""",2799.984
"""Tom Ashbrook""","""Technology""","""East""",3919.9888
…,…,…,…
"""Andy Reiter""","""Office Supplies""","""Central""",2504.2216
"""Harry Marie""","""Technology""","""Central""",2302.9671
"""Christopher Martinez""","""Office Supplies""","""South""",3177.475
"""Sanjit Chand""","""Office Supplies""","""Central""",4630.4755
