# **Guided Lab 343.4.4 - Pandas Aggregate Function**

## **Lab Overview**

This lab explores the `aggregate()` function (also known as `agg()`) in Pandas, which is a powerful tool for performing summary computations on data. The lab covers the following topics:

1. **Introduction to Aggregate Function**: Explains the purpose and basic syntax of the `aggregate()` function in Pandas.
2. **Applying Single Aggregate Function**: Demonstrates how to apply a single aggregate function (e.g., `sum`, `mean`, `max`, `count`) to a Pandas Series or DataFrame column.
3. **Applying Multiple Aggregate Functions**: Shows how to apply multiple aggregate functions to one or more columns using the `aggregate()` function.
4. **Grouping and Aggregation**: Introduces the concept of grouping data using the `groupby()` function and then applying aggregate functions to the groups.

## **Introduction**
The aggregate function in Pandas performs summary computations on data. It is most often used on grouped data, but it also works directly on Series and DataFrame objects.

This can be really useful for tasks such as calculating mean, sum, count, and other statistics for different groups within our data.


## **Lab Objectives:**

By the end of this lab, you should be able to:

* Describe the purpose and syntax of the `aggregate()` function in Pandas.
* Apply single and multiple aggregate functions to data.
* Group data using the `groupby()` function.
* Perform aggregation on grouped data to calculate summary statistics.

**Syntax**

Here's the basic syntax of the aggregate function:

`df.aggregate(func, axis=0, *args, **kwargs)`

Here,

- `func` - the aggregation to apply: a function name such as `'sum'` or `'mean'`, a callable, a list of these, or a dictionary mapping column labels to them.
- `axis` - the axis along which to aggregate: `0` (the default) applies the function to each column, and `1` applies it to each row.
- `*args` and `**kwargs` - additional positional and keyword arguments that are passed on to `func`.
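As a minimal sketch of the parameters described above, the following block calls `aggregate()` with different kinds of `func` and with both values of `axis`. The small DataFrame `nums` and the variable names are made up for illustration only and are not part of the lab data.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the syntax
nums = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# axis=0 (default): one result per column
col_totals = nums.aggregate("sum")

# axis=1: one result per row
row_totals = nums.aggregate("sum", axis=1)

# a list of function names returns one row per function
summary = nums.agg(["min", "max"])

# a dictionary applies a different function to each column
per_col = nums.agg({"x": "sum", "y": "mean"})

print(col_totals)
print(row_totals)
print(summary)
print(per_col)
```

Note that `agg()` is simply a shorter alias for `aggregate()`, so the two spellings can be used interchangeably.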

---
# **Begin**

To walk through several examples of the `aggregate()` function (or `agg()` for short), let's create a very simple DataFrame.

In [1]:
import pandas as pd

data = {
    'Category': ['A', 'A', 'B', 'B', 'A', 'B'],
    'Value': [10, 15, 20, 25, 30, 35],
    'Fee' :[20000,25000,26000,22000,24000,35000],
    'Duration':['30day','40days','35days','40days','60days','60days'],
    'Discount':[1000,2300,1200,2500,2000,2000]
}
df = pd.DataFrame(data)
df

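|   | Category | Value | Fee | Duration | Discount |
|---|---|---|---|---|---|
| 0 | A | 10 | 20000 | 30day | 1000 |
| 1 | A | 15 | 25000 | 40days | 2300 |
| 2 | B | 20 | 26000 | 35days | 1200 |
| 3 | B | 25 | 22000 | 40days | 2500 |
| 4 | A | 30 | 24000 | 60days | 2000 |
| 5 | B | 35 | 35000 | 60days | 2000 |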


## **Example: Apply Single Aggregate Function**

Here's how we can apply a single aggregate function with `aggregate()` in Pandas.

In [2]:
# calculate total sum of the Value column
total_sum = df['Value'].aggregate('sum')
print("Total Sum:", total_sum)

# Alternate

total_sum2 = df[['Value']].sum()
print("Total Sum:", total_sum2)



Total Sum: 135
Total Sum: Value    135
dtype: int64


**Let's find cumulative sum:**
- The cumulative sum (or running total) at each row is the sum of all values from the first row up to and including that row.
- It is not the single total returned by `sum()`: `cumsum()` returns one value per row, and the last value equals the overall total.
- When all the values being added are positive, the cumulative sum increases steadily.
- Cumulative sums are used to display the total sum of values as it grows with time (or any other series or progression). This lets you view the total contribution so far of a given measure against time.
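To make the difference between a running total and a plain total concrete, here is a small sketch that uses the same numbers as the `Value` column. The variable names `values` and `running` are chosen here for illustration.

```python
import pandas as pd

# Same numbers as the Value column of the lab DataFrame
values = pd.Series([10, 15, 20, 25, 30, 35])

# cumsum() keeps one entry per row: each entry is the total so far
running = values.cumsum()
print(running.tolist())

# sum() collapses the whole Series into a single number
print(values.sum())

# The last running total matches the overall sum
print(running.iloc[-1] == values.sum())
```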

In [3]:
df[['Value']].cumsum()

|   | Value |
|---|---|
| 0 | 10 |
| 1 | 25 |
| 2 | 45 |
| 3 | 70 |
| 4 | 100 |
| 5 | 135 |


In [4]:
# calculate the mean of the Value column
average_value = df['Value'].aggregate('mean')
print("Average Value:", average_value)

# Alternate

average_value2 = df[['Value']].mean()
print("Average Value:", average_value2)



Average Value: 22.5
Average Value: Value    22.5
dtype: float64


In [5]:
# calculate the maximum value in the Value column
max_value = df['Value'].aggregate('max')
print("Maximum Value:", max_value)

# Alternate

max_value2 = df[['Value']].max()
print("Maximum Value:", max_value2)

Maximum Value: 35
Maximum Value: Value    35
dtype: int64


In [6]:
# count the number of non-missing entries in the Value column
count_value = df['Value'].aggregate('count')
print("Total count:", count_value)

# Alternate


count_value2 = df[['Value']].count()
print("Total count:", count_value2)

Total count: 6
Total count: Value    6
dtype: int64
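One detail worth keeping in mind is that `count` only counts non-missing entries. The lab DataFrame has no missing values, so the sketch below uses a separate, made-up Series (`with_gap`) that contains one `NaN` to show the difference between `count` and the length of the Series.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value, for illustration only
with_gap = pd.Series([10, np.nan, 30])

non_missing = with_gap.aggregate('count')  # skips the NaN entry
total_rows = len(with_gap)                 # counts every row

print("Non-missing entries:", non_missing)
print("Total rows:", total_rows)
```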


## **Example: Apply Multiple Aggregate Functions in Pandas**
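We can also use `aggregate()` on more than one column at a time, and we can pass it a list of function names such as `['sum', 'mean']` to apply multiple aggregate functions at once. The example below starts with the simpler case of a single function applied to two columns.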

In [7]:
# applying one aggregation function to multiple columns
result = df[['Fee','Discount']].aggregate('sum')
print(result)

# Alternate

result2 = df[['Fee','Discount']].sum()
print(result2)

Fee         152000
Discount     11000
dtype: int64
Fee         152000
Discount     11000
dtype: int64
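To apply several functions in one call, `aggregate()` also accepts a list of function names or a dictionary that maps each column to its own list of functions. The block below is a minimal sketch of both forms on the lab data; the variable names `multi` and `per_column` are chosen here for illustration.

```python
import pandas as pd

# Rebuild the lab DataFrame so the block runs on its own
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'A', 'B'],
    'Value': [10, 15, 20, 25, 30, 35],
    'Fee': [20000, 25000, 26000, 22000, 24000, 35000],
    'Duration': ['30day', '40days', '35days', '40days', '60days', '60days'],
    'Discount': [1000, 2300, 1200, 2500, 2000, 2000]
})

# A list of functions: every function is applied to every selected column
multi = df[['Fee', 'Discount']].aggregate(['sum', 'mean', 'max'])
print(multi)

# A dictionary: each column gets its own set of functions
per_column = df.aggregate({'Fee': ['min', 'max'], 'Discount': ['mean']})
print(per_column)
```

In the dictionary form, a function that was not requested for a column shows up as `NaN` in that column of the result.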


**To determine the total for each group, we can group the data with the `groupby()` function and then aggregate each group. Next, we will explore the concept of grouping.**

In [8]:
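# Use the DataFrame.groupby() function
# Select multiple columns with a list (double brackets); a bare tuple such as
# ['Fee','Discount'] raises "ValueError: Cannot subset columns with a tuple
# with more than one element. Use a list instead."
result_Group = df.groupby('Category')[['Fee','Discount']].aggregate('sum')
print(result_Group)

            Fee  Discount
Category
A         69000      5300
B         83000      5700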
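Grouped data can also be summarized with several functions at once. The sketch below selects the columns with a list and then shows two options: a list of function names, and named aggregation, where each output column is given as `new_name=(column, function)`. The output names `total_fee`, `avg_discount`, and `rows` are made up for this example.

```python
import pandas as pd

# Rebuild the lab DataFrame so the block runs on its own
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'A', 'B'],
    'Value': [10, 15, 20, 25, 30, 35],
    'Fee': [20000, 25000, 26000, 22000, 24000, 35000],
    'Duration': ['30day', '40days', '35days', '40days', '60days', '60days'],
    'Discount': [1000, 2300, 1200, 2500, 2000, 2000]
})

# Several functions per group: columns are selected with a list
grouped = df.groupby('Category')[['Fee', 'Discount']].aggregate(['sum', 'mean'])
print(grouped)

# Named aggregation: choose the output column names explicitly
named = df.groupby('Category').agg(
    total_fee=('Fee', 'sum'),
    avg_discount=('Discount', 'mean'),
    rows=('Value', 'count'),
)
print(named)
```

The first result has two-level column labels such as `('Fee', 'sum')`, while named aggregation produces flat column names, which is often easier to work with afterwards.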


## **Submission Instructions**
- Submit your completed lab using the Start Assignment button on the assignment page in Canvas.
- Your submission can be either of the following:
  - If you are using a notebook, all tasks should be written and submitted in a single notebook file, for example: (**your_name_labname.ipynb**).
  - If you are using a Python script file, all tasks should be written and submitted in a single Python script file, for example: (**your_name_labname.py**).
- Add appropriate comments and any additional instructions if required.
