# NumPy Stats Toolkit — Examples

This notebook demonstrates how to use each function in the `stats_toolkit.py` module for basic statistical analysis using only NumPy.


In [1]:
import numpy as np
import stats_toolkit as st



In [2]:
data = np.array([4, 8, 6, 5, 3, 9, 7, 6, 5])
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])


## Mean

The **mean** (or arithmetic average) is a measure of the center of a dataset. It is calculated by summing all the values and dividing by the number of elements $n$. It is often read as the "typical" value of a distribution, although extreme values can pull it away from the bulk of the data.

$$
\mu = \frac{1}{n} \sum_{i=1}^{n} x_i
$$



In [3]:
print(f'Mean: {st.mean(data)}') 

Mean: 5.888888888888889
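As a cross-check that does not depend on `stats_toolkit`, the formula above can be applied directly and compared with NumPy's built-in `np.mean`. This is only a sketch of the definition; the variable names `manual_mean` and `builtin_mean` are illustrative and are not part of the module.

```python
import numpy as np

data = np.array([4, 8, 6, 5, 3, 9, 7, 6, 5])

# Apply the definition directly: sum the values, then divide by the count n.
manual_mean = data.sum() / data.size

# NumPy's built-in mean should agree with the manual computation.
builtin_mean = np.mean(data)

print(f'Manual mean: {manual_mean}')
print(f'np.mean:     {builtin_mean}')
```

Both values should match the result printed by `st.mean(data)` up to floating-point rounding.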


## Variance

The **population variance** measures how far the values in a dataset are spread around the mean when the dataset contains **all** values in the population. It is the average of the squared differences from the mean, so its units are the square of the data's units.

$$
\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2
$$

Use this when your data covers the entire population rather than a sample drawn from it.


In [6]:
print(f'Population variance: {st.population_variance(data)}')




Population variance: 3.209876543209876
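The population variance formula can be reproduced step by step with plain NumPy and compared against `np.var`, whose default `ddof=0` divides by $n$. The sketch below assumes nothing about how `st.population_variance` is implemented; `manual_pop_var` and `builtin_pop_var` are illustrative names.

```python
import numpy as np

data = np.array([4, 8, 6, 5, 3, 9, 7, 6, 5])

# Step 1: the population mean.
mu = data.mean()

# Step 2: squared deviations from the mean, averaged over all n values.
manual_pop_var = np.sum((data - mu) ** 2) / data.size

# np.var uses ddof=0 by default, i.e. it divides by n.
builtin_pop_var = np.var(data)

print(f'Manual population variance: {manual_pop_var}')
print(f'np.var (ddof=0):            {builtin_pop_var}')
```

For this dataset the sum of squared deviations is $260/9$, so dividing by $n = 9$ gives $260/81$, which agrees with the printed result above.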


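The **sample variance** is used when you only have a subset (sample) of the full population. Because the deviations are measured from the sample mean $\bar{x}$ rather than the unknown population mean, dividing by $n$ would tend to underestimate the true variance. Dividing by $n - 1$ instead of $n$ (Bessel's correction) removes this bias.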

$$
s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2
$$

Use this when analyzing or modeling based on a sample, not the whole population.

In [7]:
print(f'Sample variance: {st.sample_variance(data)}')

Sample variance: 3.6111111111111107
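To make the $n - 1$ divisor concrete, the sketch below computes the sample variance in three equivalent ways: from the formula, with `np.var(data, ddof=1)`, and by rescaling the population variance by $n / (n - 1)$. It is independent of `st.sample_variance`; all variable names are illustrative.

```python
import numpy as np

data = np.array([4, 8, 6, 5, 3, 9, 7, 6, 5])
n = data.size

# Direct application of the formula: divide by n - 1 instead of n.
x_bar = data.mean()
manual_sample_var = np.sum((data - x_bar) ** 2) / (n - 1)

# ddof=1 tells NumPy to use n - 1 as the divisor.
builtin_sample_var = np.var(data, ddof=1)

# The two variances differ only by the factor n / (n - 1).
rescaled_pop_var = np.var(data) * n / (n - 1)

print(f'Manual sample variance:       {manual_sample_var}')
print(f'np.var (ddof=1):              {builtin_sample_var}')
print(f'Population variance * n/(n-1): {rescaled_pop_var}')
```

The rescaling factor approaches 1 as $n$ grows, so the distinction between the two variances matters most for small samples such as this one.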
