# 1. Grouping Data

The `groupby` method in pandas is the main tool for aggregating and analyzing data by group.
It follows a split-apply-combine pattern: split the rows into groups based on one or more keys, apply an operation to each group independently,
and then combine the per-group results into a new DataFrame or Series.
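
To make the three steps concrete, here is a minimal sketch that performs the split, apply, and combine stages by hand and checks the result against `groupby`. It uses the same sample data as the first example; the names `pieces`, `sums`, and `manual` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'Values': [10, 20, 30, 40, 50, 60]})

# Split: one sub-DataFrame per distinct category
pieces = {key: df[df['Category'] == key] for key in df['Category'].unique()}

# Apply: reduce each piece to a single number
sums = {key: piece['Values'].sum() for key, piece in pieces.items()}

# Combine: gather the per-group results into one Series
manual = pd.Series(sums, name='Values')
manual.index.name = 'Category'

# groupby performs the same three steps in one call
builtin = df.groupby('Category')['Values'].sum()
print(manual.equals(builtin))  # True
```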

In [1]:
import pandas as pd

# Sample DataFrame
data = {'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
        'Values': [10, 20, 30, 40, 50, 60]}
df = pd.DataFrame(data)

# Group by 'Category'
grouped = df.groupby('Category')

In this example, `grouped` is a `DataFrameGroupBy` object. Nothing has been aggregated yet: it records how the rows are split into groups, and the computation happens only when you call a method such as `sum()` or `agg()` on it.
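
A `GroupBy` object can be inspected before any aggregation runs. The sketch below assumes the same sample data and uses `ngroups`, `size()`, and `get_group()` to look at how the rows were split.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'Values': [10, 20, 30, 40, 50, 60]})
grouped = df.groupby('Category')

# These calls inspect the grouping rather than aggregate the values
print(grouped.ngroups)          # 3
print(grouped.size())           # rows per group: A 2, B 2, C 2
print(grouped.get_group('B'))   # the sub-DataFrame for category B (rows 2 and 3)
```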

## a. Aggregating Data

After grouping, you can call aggregation methods such as `sum()`, `mean()`, and `count()`; each one reduces every group to a single row.

In [2]:
# Sum of 'Values' for each 'Category'
sum_df = grouped.sum()
print(sum_df)

          Values
Category        
A             30
B             70
C            110
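
`count()` and `size()` look similar but differ when data is missing: `count()` counts the non-missing values in each group, while `size()` counts rows. The sketch below assumes a hypothetical copy of the sample data (`df_nan`) with one value replaced by `NaN`.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample data with one missing value in group B
df_nan = pd.DataFrame({'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
                       'Values': [10, 20, np.nan, 40, 50, 60]})
g = df_nan.groupby('Category')['Values']

print(g.count())  # non-missing values per group: A 2, B 1, C 2
print(g.size())   # all rows per group:          A 2, B 2, C 2
print(g.sum())    # NaN is skipped by default:   A 30.0, B 40.0, C 110.0
```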


### Applying Custom Aggregations

You can also use `agg()` to apply multiple aggregation functions at once or to apply custom functions.

In [3]:
# Multiple aggregations
agg_df = grouped.agg({'Values': ['sum', 'mean', 'max']})
print(agg_df)

         Values          
            sum  mean max
Category                 
A            30  15.0  20
B            70  35.0  40
C           110  55.0  60
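
The example above only uses built-in function names. As a sketch of the custom-function case, the following uses named aggregation (keyword arguments of the form `output_name=(column, function)`) together with a hypothetical helper `value_range`; it assumes the same sample data.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'Values': [10, 20, 30, 40, 50, 60]})
grouped = df.groupby('Category')

def value_range(s):
    """Hypothetical helper: spread between the largest and smallest value in one group."""
    return s.max() - s.min()

# Named aggregation: each keyword becomes an output column built from a
# (column, function) pair; the function may be a string name or a callable
summary = grouped.agg(total=('Values', 'sum'),
                      spread=('Values', value_range))
print(summary)
```

Named aggregation avoids the two-level column header that the list form produces, which is often easier to work with downstream.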


## b. Filtering Groups

You can filter groups based on some condition using `filter()`.

In [4]:
# Filter groups where the sum of 'Values' is greater than 50
filtered = grouped.filter(lambda x: x['Values'].sum() > 50)
print(filtered)

  Category  Values
2        B      30
3        B      40
4        C      50
5        C      60
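
`filter()` keeps or drops entire groups and returns the surviving rows with their original index, not one row per group. A minimal sketch of an equivalent boolean-mask approach, assuming the same sample data, uses `transform('sum')` to attach each group's total to every row:

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'Values': [10, 20, 30, 40, 50, 60]})
grouped = df.groupby('Category')

# filter() keeps or drops whole groups and returns the original rows
via_filter = grouped.filter(lambda x: x['Values'].sum() > 50)

# Same result with a boolean mask: transform('sum') repeats each
# group's total on every row of that group
mask = grouped['Values'].transform('sum') > 50
via_mask = df[mask]

print(via_filter.equals(via_mask))  # True
```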


## c. Iterating Over Groups

In [5]:
for name, group in grouped:
    print(f"Group name: {name}")
    print(group)

Group name: A
  Category  Values
0        A      10
1        A      20
Group name: B
  Category  Values
2        B      30
3        B      40
Group name: C
  Category  Values
4        C      50
5        C      60
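
Because each iteration yields a `(name, group)` pair where `group` is an ordinary DataFrame, you can collect per-group results with plain Python. A small sketch, assuming the same sample data; `values_by_group` is a hypothetical name:

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'Values': [10, 20, 30, 40, 50, 60]})

# Each iteration yields (group key, sub-DataFrame); keep only the
# list of values from each sub-DataFrame
values_by_group = {name: group['Values'].tolist()
                   for name, group in df.groupby('Category')}
print(values_by_group)  # {'A': [10, 20], 'B': [30, 40], 'C': [50, 60]}
```

For anything a built-in aggregation can express, the vectorized methods are usually faster than a Python loop; iteration is most useful for inspection or for side effects such as writing each group to its own file.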


## d. Grouping by Multiple Columns

In [6]:
# Sample DataFrame with multiple columns
data_multi = {'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
              'Type': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
              'Values': [10, 20, 30, 40, 50, 60]}
df_multi = pd.DataFrame(data_multi)

# Group by 'Category' and 'Type'
grouped_multi = df_multi.groupby(['Category', 'Type'])
print(grouped_multi.sum())

               Values
Category Type        
A        X         10
         Y         20
B        X         30
         Y         40
C        X         50
         Y         60
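
Grouping by several columns produces a `MultiIndex`. If you prefer the keys as regular columns, or a wide table with one column per `Type`, the sketch below shows two common options, `as_index=False` and `unstack()`, on the same `df_multi` data.

```python
import pandas as pd

df_multi = pd.DataFrame({'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
                         'Type': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
                         'Values': [10, 20, 30, 40, 50, 60]})

# as_index=False keeps the group keys as ordinary columns instead of a MultiIndex
flat = df_multi.groupby(['Category', 'Type'], as_index=False)['Values'].sum()
print(flat)

# unstack() pivots the inner index level ('Type') into columns
wide = df_multi.groupby(['Category', 'Type'])['Values'].sum().unstack()
print(wide)
```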


## e. Using `groupby()` with `apply()`

You can apply custom functions to each group using `apply()`.

In [7]:
# Custom function that returns the first row of each group
def custom_func(group):
    return group.head(1)

result = grouped.apply(custom_func)
print(result)

           Category  Values
Category                   
A        0        A      10
B        2        B      30
C        4        C      50
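
For this particular function, `apply()` is not needed: `GroupBy.head(n)` returns the first `n` rows of each group directly and keeps the original row labels. It also avoids the `DeprecationWarning` that pandas 2.2 and later emit when `apply()` passes the grouping columns to the function. A sketch, assuming the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'Values': [10, 20, 30, 40, 50, 60]})

# GroupBy.head(n) returns the first n rows of each group and keeps
# the original row labels, so no extra group level is added to the index
first_rows = df.groupby('Category').head(1)
print(first_rows)
```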


## f. Transforming Data

Use `transform()` when you want to compute something per group but get back a result aligned with the original rows (same index and length) rather than one row per group.

In [8]:
# Standardize values within each group (z-score using the group's sample standard deviation)
standardized = grouped.transform(lambda x: (x - x.mean()) / x.std())
print(standardized)

     Values
0 -0.707107
1  0.707107
2 -0.707107
3  0.707107
4 -0.707107
5  0.707107
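
Because the output of `transform()` is aligned with the original rows, it can be assigned back as a new column. A minimal sketch, assuming the same sample data, that attaches each group's mean and each row's deviation from it; the column names `GroupMean` and `Deviation` are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'Values': [10, 20, 30, 40, 50, 60]})

# transform() returns one value per original row, so the result can be
# assigned straight back as a new column
df['GroupMean'] = df.groupby('Category')['Values'].transform('mean')
df['Deviation'] = df['Values'] - df['GroupMean']
print(df)
```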


# 2. Aggregating Data

The `aggregate()` method (also available under the alias `agg()`) applies one or more aggregation functions to the columns of a DataFrame or Series, which makes it a flexible way to summarize data.

Here's the basic signature:

```python
DataFrame.aggregate(func=None, axis=0, *args, **kwargs)
```

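- `func`: The aggregation to apply: a function, a function name as a string (e.g. `'mean'`), a list of these, or a dict mapping column labels to any of the above.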

- `axis`: Which direction to apply `func`. `0` or `'index'` (default) applies it to each column, giving one result per column; `1` or `'columns'` applies it to each row, giving one result per row.

- `*args` and `**kwargs`: Additional arguments and keyword arguments to pass to the aggregation functions.
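
To illustrate the last bullet: extra keyword arguments given to `aggregate()` are passed on to `func`. The helper `scaled_sum` and its `factor` parameter below are hypothetical, chosen only to show the pass-through.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

# Hypothetical helper: pandas knows nothing about 'factor';
# aggregate() simply forwards it to the function via **kwargs
def scaled_sum(col, factor=1):
    return col.sum() * factor

result = df.aggregate(scaled_sum, factor=10)
print(result)  # A 100, B 260

# agg is an alias of aggregate, so both spellings give the same result
print(df.agg('sum').equals(df.aggregate('sum')))  # True
```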

## a. Aggregating with a Single Function

Passing a single function (or its name as a string) applies it to every column and returns a Series with one value per column.

In [9]:
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})

# Mean of each column
result = df.aggregate('mean')
print(result)

A    2.5
B    6.5
dtype: float64
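
The form of `func` affects the result type: a single function name returns a Series, while a list, even with one element, returns a DataFrame whose rows are labelled by function. A quick check with the same `df`:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

single = df.aggregate('mean')     # Series indexed by column name
as_list = df.aggregate(['mean'])  # DataFrame with one row labelled 'mean'

print(single.equals(df.mean()))   # True: same as calling the method directly
print(as_list.shape)              # (1, 2)
```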


## b. Aggregating with Multiple Functions

In [10]:
result = df.aggregate(['mean', 'std'])
print(result)

             A         B
mean  2.500000  6.500000
std   1.290994  1.290994


## c. Aggregating with Different Functions for Different Columns

In [11]:
result = df.aggregate({
    'A': 'sum',
    'B': 'mean'
})
print(result)

A    10.0
B     6.5
dtype: float64
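
The dict values can also be lists, in which case the result is a DataFrame with one row per function used anywhere in the dict. A sketch with the same `df`; cells are `NaN` where a function was not requested for that column:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

# Lists inside the dict produce a DataFrame; a cell is NaN where that
# function was not requested for that column
result = df.aggregate({'A': ['sum', 'min'], 'B': ['min', 'max']})
print(result)
```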


## d. Aggregating Rows

To compute one result per row instead (aggregating across the columns of each row), set `axis=1`:

In [12]:
result = df.aggregate('sum', axis=1)
print(result)

0     6
1     8
2    10
3    12
dtype: int64
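
Lists of functions work with `axis=1` too: each row is reduced by every function, and the function names become the result's columns. A sketch with the same `df`:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

# With axis=1 and a list, each row is reduced by each function and the
# function names become the columns of the result
per_row = df.aggregate(['min', 'max'], axis=1)
print(per_row)
```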


## e. Aggregating with Custom Functions

In [13]:
def range_func(x):
    return x.max() - x.min()

result = df.aggregate(range_func)
print(result)

A    3
B    3
dtype: int64
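
The same custom function can be passed to a `GroupBy` object, tying this section back to section 1. A sketch, assuming a hypothetical DataFrame `df_groups` shaped like the earlier `Category`/`Values` sample:

```python
import pandas as pd

def range_func(x):
    # Difference between the largest and smallest value
    return x.max() - x.min()

# Hypothetical data with the same shape as the section 1 sample
df_groups = pd.DataFrame({'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
                          'Values': [10, 20, 30, 40, 50, 60]})

# The callable is applied to the 'Values' of each group separately
group_ranges = df_groups.groupby('Category')['Values'].agg(range_func)
print(group_ranges)  # A 10, B 10, C 10
```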
