# NumPy Stats Toolkit — Examples

This notebook demonstrates how to use each function in the `stats_toolkit.py` module for basic statistical analysis using only NumPy.


In [None]:
import numpy as np
import stats_toolkit as st

In [None]:
# 100 integer observations between 0 and 50
data = np.array([4, 8, 6, 5, 3, 9, 7, 6, 5, 27, 11, 34, 25, 49, 18, 41, 37, 39, 23, 20, 29, 47, 2, 17, 22,
                 1, 33, 30, 44, 36, 46, 10, 0, 38, 50, 13, 14, 43, 8, 26, 19, 24, 21, 35, 31, 9, 45, 42, 28, 32,
                 40, 6, 7, 16, 3, 5, 12, 4, 15, 1, 17, 48, 20, 10, 2, 25, 34, 49, 11, 14, 6, 22, 13, 26, 24,
                 27, 7, 0, 32, 21, 16, 12, 15, 28, 35, 30, 19, 8, 18, 23, 29, 31, 38, 36, 43, 33, 45, 41, 39, 40])

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])


## Mean

The **mean** (arithmetic average) is the sum of all values divided by the number of values. It is a common summary of the "typical" value in a distribution.

$$
\mu = \frac{1}{n} \sum_{i=1}^{n} x_i
$$



In [None]:
print(f'Mean: {st.mean(data)}') 
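
The formula can be checked directly against NumPy's built-in `np.mean`. The sketch below is not how `stats_toolkit.mean` is implemented (the notebook does not show that); it uses a hypothetical helper, `mean_from_formula`, and a small illustrative array to show what a mean computed straight from the formula should return.

```python
import numpy as np


def mean_from_formula(values):
    """Hypothetical helper: sum the values and divide by their count."""
    values = np.asarray(values, dtype=float)
    if values.size == 0:
        raise ValueError("the mean of an empty array is undefined")
    return float(values.sum() / values.size)


sample = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # illustrative values, not the notebook's data
print(mean_from_formula(sample))  # 5.0
print(np.mean(sample))            # 5.0
```

Here the eight values sum to 40, so the mean is 40 / 8 = 5.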

## Population variance

The **population variance** measures how widely the values spread around the mean, using **all** values in the population. It is the average of the squared deviations from the mean.

$$
\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2
$$

Use this when you are working with the entire population.


In [None]:
print(f'Population variance: {st.population_variance(data)}')
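
As a sketch of the formula (not the toolkit's actual code), the hypothetical helper below averages the squared deviations and compares the result with `np.var`, whose default `ddof=0` divides by $n$.

```python
import numpy as np


def population_variance_from_formula(values):
    """Hypothetical helper: average squared deviation from the mean (divide by n)."""
    values = np.asarray(values, dtype=float)
    mu = values.mean()
    return float(np.mean((values - mu) ** 2))


sample = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # mean is 5
print(population_variance_from_formula(sample))  # 4.0
print(np.var(sample))                            # 4.0, since NumPy defaults to ddof=0
```

The squared deviations are 9, 1, 1, 1, 0, 0, 4 and 16, which sum to 32; dividing by $n = 8$ gives 4.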

## Sample variance

The **sample variance** is used when you only have a subset (sample) of the full population. Because the sample mean $\bar{x}$ is estimated from the same data, dividing by $n$ tends to underestimate the population variance. Dividing by $n - 1$ instead (Bessel's correction) removes this bias.

$$
s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2
$$

Use this when analyzing or modeling based on a sample, not the whole population.

In [None]:
print(f'Sample variance: {st.sample_variance(data)}')
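
The only change from the population formula is the divisor. This sketch uses a hypothetical helper and checks it against `np.var(..., ddof=1)`, which divides by $n - 1$; it assumes, without confirmation from the notebook, that `st.sample_variance` follows the same definition.

```python
import numpy as np


def sample_variance_from_formula(values):
    """Hypothetical helper: sum of squared deviations divided by n - 1."""
    values = np.asarray(values, dtype=float)
    n = values.size
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = values.mean()
    return float(np.sum((values - x_bar) ** 2) / (n - 1))


sample = np.array([2, 4, 4, 4, 5, 5, 7, 9])
print(sample_variance_from_formula(sample))  # 32 / 7, roughly 4.571
print(np.var(sample, ddof=1))                # same value: ddof=1 divides by n - 1
```

For these eight values the population variance is 32 / 8 = 4, so the $n - 1$ divisor gives a slightly larger estimate.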

## Standard deviation

**Standard deviation** measures how spread out the values in a dataset are around the mean. Informally it describes a typical distance between a data point and the mean; strictly, it is the square root of the average squared deviation, not the average absolute distance.

It is the square root of the variance, so it is expressed in the same units as the original data, which makes it easier to interpret. The formula below is the population form; the sample form is the square root of the sample variance $s^2$.

$$
\sigma = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2 }
$$

Use standard deviation when you want to understand how consistent or variable the values in your data are. A smaller value indicates that the data points are close to the mean, while a larger value indicates more spread.


In [None]:
print("Standard deviation:", st.standard_deviation(data))
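
The notebook does not say whether `st.standard_deviation` divides by $n$ or $n - 1$, so this sketch computes both forms from the formulas and checks them against `np.std` with `ddof=0` (the default) and `ddof=1`. The array is illustrative only.

```python
import numpy as np

sample = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)
deviations = sample - sample.mean()

# Population form (the formula above): square root of the mean squared deviation.
population_std = float(np.sqrt(np.mean(deviations ** 2)))

# Sample form: square root of the variance computed with an n - 1 divisor.
sample_std = float(np.sqrt(np.sum(deviations ** 2) / (sample.size - 1)))

print(population_std)                                  # 2.0
print(np.isclose(population_std, np.std(sample)))      # True (ddof=0 by default)
print(np.isclose(sample_std, np.std(sample, ddof=1)))  # True
```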


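## Z-scores

A **z-score** expresses how many standard deviations a value lies above or below the mean.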

In [None]:
print("Z-scores:", st.z_scores(data))
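
A z-score is $z_i = (x_i - \mu) / \sigma$. The hypothetical helper below implements that with the population standard deviation by default; whether `st.z_scores` uses the population or sample standard deviation is an assumption the notebook does not settle, so the `ddof` parameter makes the choice explicit.

```python
import numpy as np


def z_scores_sketch(values, ddof=0):
    """Hypothetical helper: (x - mean) / std; ddof=0 uses the population std."""
    values = np.asarray(values, dtype=float)
    std = values.std(ddof=ddof)
    if std == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return (values - values.mean()) / std


sample = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # mean 5, population std 2
z = z_scores_sketch(sample)
print(z.tolist())  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(np.isclose(z.mean(), 0.0), np.isclose(z.std(), 1.0))  # True True
```

After standardizing, the values have mean 0 and (population) standard deviation 1.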


In [None]:
print("Min-Max Normalization:", st.min_max_normalize(data))
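
Min-max normalization rescales values linearly into $[0, 1]$ with $x_i' = (x_i - \min x) / (\max x - \min x)$. The sketch below is a hypothetical helper, not the toolkit's code; it assumes `st.min_max_normalize` targets the same $[0, 1]$ range and guards against a constant array, where the denominator would be zero.

```python
import numpy as np


def min_max_sketch(values):
    """Hypothetical helper: map the minimum to 0, the maximum to 1, and scale the rest linearly."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        raise ValueError("min-max scaling is undefined for a constant array")
    return (values - lo) / (hi - lo)


scaled = min_max_sketch([10, 20, 30, 50])
print(scaled.tolist())  # [0.0, 0.25, 0.5, 1.0]
```

Because the scale depends only on the extremes, a single outlier compresses all other values toward one end of the range.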


In [None]:
print("Correlation Matrix (with itself):\n", st.correlation_matrix(data, data))
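
The cell above correlates `data` with itself. If `st.correlation_matrix` returns a Pearson correlation matrix (an assumption; the notebook does not say), every entry should be 1. The sketch below computes Pearson's $r$ from centered values with a hypothetical helper and compares it with NumPy's `np.corrcoef` on small illustrative arrays.

```python
import numpy as np


def pearson_sketch(a, b):
    """Hypothetical helper: Pearson correlation of two equal-length 1-D arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_dev = a - a.mean()
    b_dev = b - b.mean()
    return float(np.sum(a_dev * b_dev) / np.sqrt(np.sum(a_dev ** 2) * np.sum(b_dev ** 2)))


a = np.array([1, 2, 3, 4, 5])
b = 2 * a     # perfectly positively related to a
c = a[::-1]   # perfectly negatively related to a

print(np.corrcoef(a, b))                    # 2x2 matrix, every entry 1 (up to rounding)
print(round(pearson_sketch(a, c), 10))      # -1.0
print(np.allclose(np.corrcoef(a, a), 1.0))  # True: a non-constant variable correlates perfectly with itself
```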


In [None]:
print("Quantiles (25%, 50%, 75%):", st.quantiles(data))
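
Quantiles can be computed in several ways when the requested position falls between two data points. This sketch shows the linear-interpolation rule, which is the default `method` of `np.quantile`; whether `st.quantiles` uses the same rule is an assumption. The helper name is hypothetical.

```python
import numpy as np


def linear_quantile_sketch(values, q):
    """Hypothetical helper: quantile by linear interpolation between sorted values (0 <= q <= 1)."""
    ordered = np.sort(np.asarray(values, dtype=float))
    position = (ordered.size - 1) * q        # fractional index into the sorted data
    lower = int(np.floor(position))
    upper = min(lower + 1, ordered.size - 1)
    weight = position - lower                # how far to move from ordered[lower] toward ordered[upper]
    return float(ordered[lower] + weight * (ordered[upper] - ordered[lower]))


values = np.array([4, 1, 3, 2])
print([linear_quantile_sketch(values, q) for q in (0.25, 0.5, 0.75)])  # [1.75, 2.5, 3.25]
print(np.quantile(values, [0.25, 0.5, 0.75]).tolist())                 # same three values
```

For the 25% quantile of four sorted values the position is $3 \times 0.25 = 0.75$, three quarters of the way from 1 to 2, which gives 1.75. The 50% quantile is the median.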


In [None]:
print("Skewness:", st.skewness(data))
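
Skewness measures asymmetry: positive values mean a longer right tail, negative values a longer left tail. There are several definitions; this sketch uses the moment coefficient $m_3 / m_2^{3/2}$ without a small-sample correction, and it is an assumption that `st.skewness` does the same. The helper name and arrays are hypothetical.

```python
import numpy as np


def skewness_sketch(values):
    """Hypothetical helper: moment coefficient of skewness, m3 / m2**1.5 (no small-sample correction)."""
    values = np.asarray(values, dtype=float)
    deviations = values - values.mean()
    m2 = np.mean(deviations ** 2)   # second central moment (population variance)
    m3 = np.mean(deviations ** 3)   # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return float(m3 / np.sqrt(m2) ** 3)


right_tailed = np.array([2, 4, 4, 4, 5, 5, 7, 9])
symmetric = np.array([1, 2, 3, 4, 5])
print(skewness_sketch(right_tailed))  # 0.65625: positive, the longer tail is on the right
print(skewness_sketch(symmetric))     # 0.0
```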


In [None]:
print("Kurtosis:", st.kurtosis(data))
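
Kurtosis describes how heavy the tails of a distribution are. The moment form is $m_4 / m_2^2$, which equals 3 for a normal distribution; "excess" kurtosis subtracts 3 so that a normal distribution scores 0. The notebook does not say which convention `st.kurtosis` reports, so this hypothetical helper exposes both through an `excess` flag.

```python
import numpy as np


def kurtosis_sketch(values, excess=True):
    """Hypothetical helper: m4 / m2**2, minus 3 when excess=True."""
    values = np.asarray(values, dtype=float)
    deviations = values - values.mean()
    m2 = np.mean(deviations ** 2)   # second central moment
    m4 = np.mean(deviations ** 4)   # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    kurt = m4 / m2 ** 2
    return float(kurt - 3.0) if excess else float(kurt)


sample = np.array([2, 4, 4, 4, 5, 5, 7, 9])
print(kurtosis_sketch(sample, excess=False))  # 2.78125
print(kurtosis_sketch(sample))                # -0.21875
```

A negative excess value, as here, indicates lighter tails than a normal distribution with the same variance.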