# Aggregation Functions


##### Reduction functions take an array and reduce it to a single value or a smaller array.

## Essential NumPy Statistical Functions for ML/DS

| Function | Description | Example Output | Application in ML/DS |
| :--- | :--- | :--- | :--- |
| `np.sum(arr)` | Calculates the **total sum** of all elements. | $15$ | Calculating total loss or the sum of weighted inputs. |
| `np.mean(arr)` | Calculates the **arithmetic mean** (average). | $3.0$ | Finding the **average value of a feature** (for normalization). |
| `np.median(arr)` | Calculates the **median** (middle value) of the array. | $3.0$ | Useful for statistics **less sensitive to outliers**. |
| `np.std(arr)` | Calculates the **standard deviation**. | $1.41...$ | Key for **standard scaling** of data during pre-processing. |
| `np.var(arr)` | Calculates the **variance** (standard deviation squared). | $2.0$ | **Measuring the spread** of the data. |
| `np.min(arr)` | Finds the **minimum value**. | $1$ | Useful for **min-max scaling** and finding range. |
| `np.max(arr)` | Finds the **maximum value**. | $5$ | Useful for **min-max scaling** and finding range. |
| `np.argmin(arr)` | Returns the **index** of the minimum value. | $0$ | Finding the **index of the entry with the lowest error or loss**. |
| `np.argmax(arr)` | Returns the **index** of the maximum value. | $4$ | Essential for **finding the predicted class in classification models**. |
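
The example outputs in the table are consistent with `arr = np.array([1, 2, 3, 4, 5])`. The table does not name its input, so that array is an assumption. A minimal sketch that reproduces each value under that assumption:

```python
import numpy as np

# Assumed input: the table does not state it, but this array matches its outputs.
arr = np.array([1, 2, 3, 4, 5])

results = {
    "sum": np.sum(arr),
    "mean": np.mean(arr),
    "median": np.median(arr),
    "std": np.std(arr),      # population standard deviation (ddof=0 by default)
    "var": np.var(arr),      # population variance (ddof=0 by default)
    "min": np.min(arr),
    "max": np.max(arr),
    "argmin": np.argmin(arr),
    "argmax": np.argmax(arr),
}

for name, value in results.items():
    print(f"{name:>6}: {value}")
```

`np.std` and `np.var` divide by `N` by default. Passing `ddof=1` gives the sample estimates instead, which for this array would be a variance of 2.5 rather than 2.0.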

In [1]:
import numpy as np

In [6]:
data_id = np.array([1, 5, 2, 4, 5])

print(f"Total Sum: {np.sum(data_id)}")
print(f"Average: {np.mean(data_id)}")
print(f"Max Value: {np.max(data_id)}")
print(f"Index of Max: {np.argmax(data_id)}")


Total Sum: 17
Average: 3.4
Max Value: 5
Index of Max: 1
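
The array above contains the maximum value 5 twice (at indices 1 and 4), and `np.argmax` reports only the first position. If every position is needed, one option is a boolean comparison combined with `np.flatnonzero`. This is an addition for illustration, not part of the original example:

```python
import numpy as np

data_id = np.array([1, 5, 2, 4, 5])

# np.argmax stops at the first occurrence of the maximum.
first_max = np.argmax(data_id)

# Compare against the maximum, then collect every index where the mask is True.
all_max = np.flatnonzero(data_id == np.max(data_id))

print(f"First index of max: {first_max}")
print(f"All indices of max: {all_max}")
```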


In [8]:
matrix = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])

print(matrix)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


In [None]:
# Aggregation over the entire array
print(f"Overall Mean: {np.mean(matrix)}")  # Mean of all 12 numbers

Overall Mean: 6.5


## Axis Parameter

In [15]:
matrix = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])

print(matrix)

# Aggregation along axis = 0 (down the rows, giving one value per column)
mean_by_feature = np.mean(matrix, axis = 0)
print(f"\nMean by feature (axis = 0): {mean_by_feature}")

# Aggregation along axis = 1 (across the columns, giving one value per row)
sum_by_sample = np.sum(matrix, axis = 1)
print(f"\nSum by sample (axis = 1): {sum_by_sample}")

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Mean by feature (axis = 0): [5. 6. 7. 8.]

Sum by sample (axis = 1): [10 26 42]
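
The table earlier in this section lists standard scaling and predicted-class lookup as applications, and both depend on choosing the right axis. The sketch below reuses `matrix` and adds a made-up `scores` array (hypothetical, not from the original) to show one way each might look:

```python
import numpy as np

matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]])

# Standard scaling per feature: statistics are taken down each column (axis=0).
col_mean = np.mean(matrix, axis=0)      # shape (4,)
col_std = np.std(matrix, axis=0)        # shape (4,)
scaled = (matrix - col_mean) / col_std  # broadcasting applies the stats to every row

print(np.mean(scaled, axis=0))  # each column mean is now approximately 0
print(np.std(scaled, axis=0))   # each column std is now approximately 1

# Hypothetical class scores: 3 samples (rows) x 3 classes (columns).
scores = np.array([[0.1, 0.7, 0.2],
                   [0.8, 0.1, 0.1],
                   [0.3, 0.3, 0.4]])

# Predicted class per sample: index of the largest score within each row (axis=1).
predicted = np.argmax(scores, axis=1)
print(predicted)
```

A quick way to keep the axes straight: the axis you pass is the one that disappears from the result. Reducing a `(3, 4)` array along `axis=0` leaves shape `(4,)`, and along `axis=1` leaves shape `(3,)`.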


## Cumulative Aggregation

In [16]:
# Cumulative versions of some functions
# Each element of the output is the result of the operation applied to all elements up to that position
data = np.array([1, 2, 3, 5])

1. `np.cumsum(arr)`: Cumulative sum.
2. `np.cumprod(arr)`: Cumulative product.

In [18]:
cumulative_sum = np.cumsum(data)
print(f"Cumulative Sum: {cumulative_sum}")

Cumulative Sum: [ 1  3  6 11]
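
`np.cumprod` is listed above but not demonstrated. A minimal sketch on the same `data` array, which also checks that the last element of each cumulative result matches the corresponding full reduction:

```python
import numpy as np

data = np.array([1, 2, 3, 5])

# Running product: each element is the product of all elements up to that position.
cumulative_product = np.cumprod(data)
print(f"Cumulative Product: {cumulative_product}")

# The final element of a cumulative result equals the full reduction.
print(np.cumsum(data)[-1] == np.sum(data))
print(cumulative_product[-1] == np.prod(data))
```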
