```{contents}
```


## Data Aggregation

In [5]:
import pandas as pd

# Sample loan applications dataset
data1 = {
    'application_id': ['A1','A2','A3','A4'],
    'customer_id': ['C1','C2','C3','C4'],
    'loan_type': ['Business Loan', 'Car Loan', 'Education Loan', 'Car Loan'],
    'loan_amount_requested': [604000, 100000, 431000, 324000],
    'loan_status': ['Approved','Approved','Approved','Declined']
}

df1 = pd.DataFrame(data1)

# Sample customer info dataset
data2 = {
    'customer_id': ['C1','C2','C3','C5'],
    'monthly_income': [34700, 51600, 14800, 40000],
    'employment_status': ['Retired','Unemployed','Self-Employed','Employed']
}

df2 = pd.DataFrame(data2)



---

### Aggregation



In [6]:

# Sum, mean, max of loan amount
df1['loan_amount_requested'].agg(['sum','mean','max'])

# Group by loan type
df1.groupby('loan_type')['loan_amount_requested'].agg(['mean','sum','max'])

# Count of loans by status
df1.groupby('loan_status')['application_id'].count()


loan_status
Approved    3
Declined    1
Name: application_id, dtype: int64
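
The grouped summaries above can also be written with named aggregation, which lets each output column get its own name. This is a minimal sketch, assuming pandas 0.25 or later; the output column names (`n_applications`, `total_requested`, `mean_requested`) are illustrative and not part of the original notebook.

```python
import pandas as pd

# Same four applications as in the sample data above (only the columns needed here)
df1 = pd.DataFrame({
    'application_id': ['A1', 'A2', 'A3', 'A4'],
    'loan_type': ['Business Loan', 'Car Loan', 'Education Loan', 'Car Loan'],
    'loan_amount_requested': [604000, 100000, 431000, 324000],
})

# Named aggregation: keyword = (source column, aggregation function)
summary = df1.groupby('loan_type').agg(
    n_applications=('application_id', 'count'),
    total_requested=('loan_amount_requested', 'sum'),
    mean_requested=('loan_amount_requested', 'mean'),
)

print(summary)
```

With this data, the two car loans fall into one group, so that row should report 2 applications and a total of 424000.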



---

### Joining / Concatenation

In [7]:

# Concatenate two dataframes vertically (stack rows)
df_vertical = pd.concat([df1, df1], ignore_index=True)

# Concatenate horizontally (add columns); rows are paired by index, not by customer_id
df_horizontal = pd.concat([df1, df2], axis=1)
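
Horizontal concatenation pairs rows by index label rather than by any key column. The sketch below uses cut-down copies of the two sample frames (the names `left`, `right` and `side_by_side` are illustrative) to show what that means for this data.

```python
import pandas as pd

# Cut-down copies of the two sample frames
left = pd.DataFrame({
    'application_id': ['A1', 'A2', 'A3', 'A4'],
    'customer_id': ['C1', 'C2', 'C3', 'C4'],
})
right = pd.DataFrame({
    'customer_id': ['C1', 'C2', 'C3', 'C5'],
    'monthly_income': [34700, 51600, 14800, 40000],
})

# axis=1 places the frames side by side, matching rows on the index (0..3)
side_by_side = pd.concat([left, right], axis=1)

# The result carries two columns named 'customer_id'
print(side_by_side.columns.tolist())

# The last row pairs application A4 (customer C4) with the income of customer C5
print(side_by_side.iloc[3].tolist())
```

When rows should be matched on `customer_id`, `pd.merge()` is usually the better fit.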




---

### Merging / Join on key column



In [8]:

# Merge on customer_id (like SQL JOIN)
df_merged = pd.merge(df1, df2, on='customer_id', how='inner')  # inner join
df_merged_left = pd.merge(df1, df2, on='customer_id', how='left')  # left join
df_merged_outer = pd.merge(df1, df2, on='customer_id', how='outer')  # outer join

df_merged


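|   | application_id | customer_id | loan_type      | loan_amount_requested | loan_status | monthly_income | employment_status |
| - | -------------- | ----------- | -------------- | --------------------- | ----------- | -------------- | ----------------- |
| 0 | A1             | C1          | Business Loan  | 604000                | Approved    | 34700          | Retired           |
| 1 | A2             | C2          | Car Loan       | 100000                | Approved    | 51600          | Unemployed        |
| 2 | A3             | C3          | Education Loan | 431000                | Approved    | 14800          | Self-Employed     |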
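
The inner join keeps only C1, C2 and C3, because C4 appears only in the applications and C5 only in the customer info. One way to see which side each key came from is the `indicator` parameter of `pd.merge()`. This is a minimal sketch on cut-down copies of the sample frames; the helper dictionary `status` is illustrative.

```python
import pandas as pd

# Cut-down copies of the two sample frames
df1 = pd.DataFrame({
    'application_id': ['A1', 'A2', 'A3', 'A4'],
    'customer_id': ['C1', 'C2', 'C3', 'C4'],
})
df2 = pd.DataFrame({
    'customer_id': ['C1', 'C2', 'C3', 'C5'],
    'monthly_income': [34700, 51600, 14800, 40000],
})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
outer = pd.merge(df1, df2, on='customer_id', how='outer', indicator=True)

# Map each customer_id to the side(s) it was found on
status = dict(zip(outer['customer_id'], outer['_merge'].astype(str)))

print(outer)
print(status)
```

Rows marked `left_only` or `right_only` are filled with `NaN` in the columns that come from the other frame.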




---

### Windowing / Rolling & Expanding

- Windowing applies a calculation over **a moving, fixed-size set of rows** (rolling) or over **a set that grows from the first row onward** (expanding).



In [9]:

# Window-style calculations on loan_amount_requested, taken in row order
df1['loan_amount_cumsum'] = df1['loan_amount_requested'].cumsum()  # cumulative sum
df1['loan_amount_rolling_mean'] = df1['loan_amount_requested'].rolling(window=2).mean()  # rolling mean
df1['loan_amount_expanding_mean'] = df1['loan_amount_requested'].expanding().mean()  # expanding mean
df1['loan_amount_rank'] = df1['loan_amount_requested'].rank()  # ranking
df1['loan_amount_pct_change'] = df1['loan_amount_requested'].pct_change()  # percent change
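
To make the difference between rolling and expanding windows concrete, the sketch below runs both on the four sample loan amounts. The `min_periods=1` variant is an addition for illustration, not part of the original cell; it lets the first row produce a value instead of `NaN`.

```python
import pandas as pd

# The four loan amounts from the sample data, in row order
amounts = pd.Series([604000, 100000, 431000, 324000], name='loan_amount_requested')

# Rolling: mean of the current row and the one before it (window of 2 rows)
rolling_mean = amounts.rolling(window=2).mean()

# Same window, but a result is emitted as soon as 1 row is available
rolling_mean_partial = amounts.rolling(window=2, min_periods=1).mean()

# Expanding: mean of all rows from the first row up to the current one
expanding_mean = amounts.expanding().mean()

print(pd.DataFrame({
    'amount': amounts,
    'rolling_mean': rolling_mean,
    'rolling_mean_partial': rolling_mean_partial,
    'expanding_mean': expanding_mean,
}))
```

The first `rolling_mean` value is `NaN` because a window of 2 is not yet full on the first row, while the last `expanding_mean` value equals the mean of the whole column.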


---

**Summary**

| Operation               | Method / Function                                                     | Example                                                                |
| ----------------------- | --------------------------------------------------------------------- | ---------------------------------------------------------------------- |
| Aggregation             | `sum()`, `mean()`, `min()`, `max()`, `.agg()`, `.groupby()`           | `df.groupby('loan_type')['loan_amount_requested'].agg(['mean','sum'])` |
| Joining / Concatenation | `pd.concat()`                                                         | `pd.concat([df1, df2], axis=0)` (rows; `axis=1` for columns)           |
| Merging / Join          | `pd.merge()`                                                          | `pd.merge(df1, df2, on='customer_id', how='inner')`                    |
| Windowing / Rolling     | `.rolling()`, `.expanding()`, `.cumsum()`, `.pct_change()`, `.rank()` | See code above                                                         |

