### **`groupby()` and Aggregation**
Grouping and aggregation are the core pandas tools for summarizing, transforming, and analyzing large datasets: split the rows into groups by key, apply a function to each group, and combine the results.

#### **1. Basic `groupby()` Syntax**

```python
# Basic grouping by one column
grouped = df.groupby('Category')

# Grouping by multiple columns
grouped = df.groupby(['Region', 'Year'])
```

- Returns a `DataFrameGroupBy` object.
- Nothing is computed until an aggregation, transformation, or iteration is applied to it (lazy evaluation).
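To make the lazy behaviour concrete, here is a minimal sketch on a small made-up DataFrame (the data and the `Sales` values are assumptions for illustration, not part of the original dataset). It inspects the grouping object before any aggregation runs.

```python
import pandas as pd

# Hypothetical sample data; only the column names mirror the snippets above.
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'C', 'B'],
    'Sales': [10, 20, 30, 40, 50],
})

grouped = df.groupby('Category')

# The object describes how to split the rows; it is not a result table yet.
print(type(grouped).__name__)

n_groups = grouped.ngroups          # number of distinct keys
group_a = grouped.get_group('A')    # rows that belong to a single key

# The actual work happens once an aggregation is requested.
totals = grouped['Sales'].sum()
print(totals)
```

`ngroups` and `get_group()` are handy for sanity-checking a grouping before running a heavier aggregation.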

#### **2. Common Aggregation Methods**

| Method         | Description                                     |
|----------------|-------------------------------------------------|
| `sum()`        | Sum of values                                  |
| `mean()`       | Average of values                              |
| `median()`     | Median of values                               |
| `min()`        | Minimum value                                  |
| `max()`        | Maximum value                                  |
| `count()`      | Number of non-null observations                |
| `size()`       | Number of rows in each group (including nulls)  |
| `std()`        | Sample standard deviation (`ddof=1` by default) |
| `var()`        | Sample variance (`ddof=1` by default)           |

```python
# Example: sum of Sales by Region
df.groupby('Region')['Sales'].sum()

# Multiple aggregations
df.groupby('Region')['Sales'].agg(['sum', 'mean', 'count'])
``` 
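The following sketch runs the same aggregations on a small made-up DataFrame (the values are assumptions chosen so the results are easy to check by hand). It also shows that `std()` is the sample standard deviation unless `ddof=0` is passed.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'West'],
    'Sales': [100.0, 200.0, 50.0, 150.0, 100.0],
})

# One row per Region, one column per aggregation.
summary = df.groupby('Region')['Sales'].agg(['sum', 'mean', 'count'])
print(summary)

# Sample (ddof=1, the default) vs population (ddof=0) standard deviation.
sample_std = df.groupby('Region')['Sales'].std()
population_std = df.groupby('Region')['Sales'].std(ddof=0)
print(sample_std['East'], population_std['East'])
```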

#### **3. Using `.agg()` for Custom Aggregations**

```python
# Dictionary mapping columns to functions
df.groupby('Region').agg({
    'Sales': 'sum',
    'Profit': 'mean',
    'Quantity': 'max'
})

# A lambda or a named function also works: range of Sales per Region
df.groupby('Region')['Sales'].agg(lambda x: x.max() - x.min())
``` 
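When the output column names matter, pandas' named aggregation (keyword arguments of the form `new_name=(column, function)`) is often easier to read than a dictionary. The sketch below assumes a small made-up DataFrame; `sales_range` is a hypothetical helper introduced only for this example.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West'],
    'Sales': [100, 250, 80, 120],
    'Profit': [10.0, 30.0, 8.0, 12.0],
})


def sales_range(s: pd.Series):
    """Hypothetical helper: spread between the largest and smallest value."""
    return s.max() - s.min()


# Each keyword becomes an output column: name=(source column, function).
result = df.groupby('Region').agg(
    total_sales=('Sales', 'sum'),
    avg_profit=('Profit', 'mean'),
    sales_range=('Sales', sales_range),
)
print(result)
```

A named function also gives the computation a readable label, which a lambda does not.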

#### **4. Applying Multiple Aggregations to Multiple Columns**

```python
agg_funcs = {
    'Sales': ['sum', 'mean'],
    'Profit': ['mean', 'std'],
    'Quantity': 'count'
}
df.groupby(['Region', 'Category']).agg(agg_funcs)
``` 

- The result has **MultiIndex** columns: the outer level is the source column, the inner level is the aggregation function.
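MultiIndex columns can be awkward downstream. The sketch below, on made-up data, shows one way to read a single cell with an `(outer, inner)` tuple and one common way to flatten the two levels into plain names. The `Sales_sum` naming pattern is an arbitrary choice for this example, not a pandas convention.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West'],
    'Category': ['A', 'A', 'A', 'B'],
    'Sales': [100, 200, 50, 70],
    'Profit': [10.0, 20.0, 5.0, 7.0],
})

result = df.groupby(['Region', 'Category']).agg(
    {'Sales': ['sum', 'mean'], 'Profit': ['mean']}
)

# Address one cell: row key tuple, then (column, function) tuple.
east_a_total = result.loc[('East', 'A'), ('Sales', 'sum')]

# Flatten the two column levels into single strings.
flat = result.copy()
flat.columns = ['_'.join(col) for col in flat.columns]
print(flat)
```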


#### **5. Transformations with `.transform()`**

- Returns an object with the **same shape** (and the same index) as the original.
- Useful for adding group-level metrics back to the DataFrame.

```python
# Z-score of Sales within each Region
df['Sales_zscore'] = df.groupby('Region')['Sales'].transform(
    lambda x: (x - x.mean()) / x.std()
)
``` 
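As a simpler illustration of the same-shape property, the sketch below broadcasts each group's mean and total back onto every row. The data is made up, and the `Region_mean` and `Share_of_region` column names are assumptions for this example.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'West'],
    'Sales': [100.0, 200.0, 50.0, 150.0, 100.0],
})

# One value per original row, aligned on the original index.
df['Region_mean'] = df.groupby('Region')['Sales'].transform('mean')

# Each row's share of its region's total.
region_total = df.groupby('Region')['Sales'].transform('sum')
df['Share_of_region'] = df['Sales'] / region_total
print(df)
```

Passing a string such as `'mean'` instead of a lambda lets pandas use its built-in implementation, which is usually faster on large data.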

#### **6. Filtering Groups with `.filter()`**

- Keeps every row of the groups that satisfy a condition; groups that fail it are dropped entirely.

```python
# Keep regions where total Sales > 1e6
df_filtered = df.groupby('Region').filter(
    lambda x: x['Sales'].sum() > 1e6
)
``` 
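The sketch below applies the same idea to made-up data with a smaller threshold (1000 rather than 1e6, purely so the toy numbers qualify). It also shows an equivalent boolean mask built with `.transform()`, an alternative that avoids calling a Python function once per group.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'North'],
    'Sales': [600, 500, 300, 1200],
})

# East totals 1100 and North 1200, so both stay; West (300) is dropped.
kept = df.groupby('Region').filter(lambda g: g['Sales'].sum() > 1000)

# Same rows selected through a row-aligned mask.
mask = df.groupby('Region')['Sales'].transform('sum') > 1000
kept_via_mask = df[mask]
print(kept)
```

Both versions keep the original row order and index labels.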

#### **7. Looping Over Groups**

```python
for name, group in df.groupby('Category'):
    print(f"Group: {name}")
    print(group.head())
``` 

- `name`: the group key (a tuple of keys when grouping by a list of columns)
- `group`: the subset of the DataFrame that belongs to that group
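When grouping by several columns, the key can be unpacked directly in the `for` statement. A minimal sketch on made-up data, collecting one total per `(Region, Year)` pair:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    'Region': ['East', 'East', 'East', 'West'],
    'Year': [2023, 2023, 2024, 2023],
    'Sales': [100, 200, 300, 50],
})

totals = {}
for (region, year), group in df.groupby(['Region', 'Year']):
    # Convert to plain Python ints so the dict is easy to inspect or serialize.
    totals[(region, int(year))] = int(group['Sales'].sum())

print(totals)
```

For a plain total like this, `.sum()` on the grouped object is simpler; explicit loops are mostly useful for per-group side effects such as plotting or writing files.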

#### **8. `as_index` Parameter**

- The default `as_index=True` makes the group keys the index of the result.
- Use `as_index=False` to keep the keys as regular columns.

```python
# Keys remain as columns
df.groupby('Region', as_index=False)['Sales'].sum()
``` 
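The difference is easiest to see side by side. A minimal sketch on made-up data:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    'Region': ['East', 'East', 'West'],
    'Sales': [100, 200, 50],
})

# Default: a Series whose index holds the group keys.
by_index = df.groupby('Region')['Sales'].sum()

# as_index=False: a DataFrame with Region as an ordinary column.
by_column = df.groupby('Region', as_index=False)['Sales'].sum()

print(by_index)
print(by_column)
```

The second form is convenient when the result feeds straight into a merge or a plotting call that expects the keys as columns.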

#### **9. `.size()` vs `.count()`**

- `.size()`: counts **all** rows in each group (including NaNs).
- `.count()`: counts **non-null** values for each column.

```python
df.groupby('Region').size()
# vs
df.groupby('Region')['Sales'].count()
``` 
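The two only differ when values are missing. The sketch below uses made-up data with one `NaN` in `Sales` to show the gap.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; East has one missing Sales value.
df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West'],
    'Sales': [100.0, np.nan, 50.0, 70.0],
})

sizes = df.groupby('Region').size()             # rows per group
counts = df.groupby('Region')['Sales'].count()  # non-null Sales per group

# Number of missing Sales values per Region.
missing = sizes - counts
print(missing)
```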

#### **Tips**

- Use `.agg()` for **flexible** summaries.
- Use `.transform()` to **annotate** the original DataFrame.
- Use `.filter()` to **prune** groups.
- Chain `.reset_index()` after `.agg()` (`.groupby().agg().reset_index()`) to turn the group keys back into regular columns.
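A chained pipeline along the lines of the last tip might look like the sketch below. The data and the `total_sales` and `n_orders` column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'North'],
    'Sales': [100, 300, 50, 150, 40],
})

summary = (
    df.groupby('Region')
      .agg(total_sales=('Sales', 'sum'), n_orders=('Sales', 'count'))
      .reset_index()  # Region moves from the index to a column
      .sort_values('total_sales', ascending=False, ignore_index=True)
)
print(summary)
```

Wrapping the chain in parentheses keeps each step on its own line without backslashes.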

```python
import pandas as pd
```