# 1-D Statistical Tools

When we are presented with raw data from some observation,
we need to be able to systematically quantify its various properties.

**This is the purpose of most 1-D descriptive statistics:**

- Quantify the _"average"_ value present (**mean**)

$$ \bar{x} := \frac{1}{N} \sum_{i=1}^{N} x_i $$
 
- The "center" of the data set (**median**)
- Values that appear more or less frequently (**mode**)
- Extremes of the data (**Min/Max**)
- Measures of "spread", how varied is it? (**Std. Deviation**)

$$ std(X) := \sqrt{\sigma^2_x} $$

- Variance is: $$ \sigma^2_x := \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2 $$

- Covariance is similar to the variance, but instead of the average squared distance to the mean, it's the average product of the two variables' differences from their respective means: $$ cov(X,Y) := \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x}) (y_i - \bar{y}) $$

_Note: the covariance generalizes the variance in the sense that_

$$ var(X) = cov(X,X) $$

- Correlation between two sample populations/events/processes is a measure of their relationship. It is typically given by a '**correlation coefficient**' in the range [-1, 1]. A positive correlation means that when the value of the first process/observation is higher, the other tends to be higher too (i.e. they increase together); a negative correlation means they have an inverse relationship. It is defined by dividing the '**covariance**' between the samples by the product of their standard deviations:

$$ \rho_{x,y} := \frac{cov(X,Y)}{\sigma_x \sigma_y} $$
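
The definitions above translate almost line for line into plain Python. The following is only a minimal sketch under the formulas as written here (sample versions with the $N-1$ divisor); the function names are our own choice and are not taken from any particular library.

```python
import math
from collections import Counter


def mean(xs):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(xs) / len(xs)


def median(xs):
    """Middle value of the sorted data (average of the two middle values if N is even)."""
    s = sorted(xs)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def mode(xs):
    """A most frequently occurring value."""
    return Counter(xs).most_common(1)[0][0]


def cov(xs, ys):
    """Sample covariance: average product of deviations, with the N-1 divisor."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def var(xs):
    """Sample variance, written as cov(X, X) to mirror the note above."""
    return cov(xs, xs)


def std(xs):
    """Standard deviation: square root of the variance."""
    return math.sqrt(var(xs))


def corr(xs, ys):
    """Correlation coefficient: covariance scaled by both standard deviations."""
    return cov(xs, ys) / (std(xs) * std(ys))


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]   # increases with xs
zs = [10, 8, 6, 4, 2]   # decreases as xs increases

print(mean(xs), median(xs), min(xs), max(xs))
print(var(xs), std(xs))
print(corr(xs, ys), corr(xs, zs))
```

Since `ys` is an exact increasing linear function of `xs` and `zs` an exact decreasing one, the two correlation values should come out at (or within floating-point error of) the two ends of the range.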


## First, some discussion on Python Modules

In Python, we want to be able to import not only the nice built-in libraries, but also code we wrote ourselves! This is similar to being able to '#include' other '.cpp' files in 'C++'.

We do this with Python 'modules', which for us will just mean single files with definitions of functions/utility variables inside.

### Idea:
- Write your Python tools, functions, etc. in some file that ends in '.py', e.g. 'stats.py'
- If the file is in the same directory as your current Python file or notebook, you can simply 'import' that file by name (without the .py), e.g. import stats
- This then loads all the constituent definitions into a scope named by the import, e.g. a function called 'mean' defined in 'stats.py' will be accessible via 'stats.mean'
- NB: if using a notebook or kernel-based environment, you either have to reload the module to refresh its contents or restart your kernel
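
To make the reload point concrete, here is a small self-contained sketch. It writes a throwaway module file (the name `mystats_demo` and the function `spread` are made up for this illustration), imports it, edits the file on disk, and then shows that a plain second `import` does not pick up the edit while `importlib.reload` from the standard library does.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway directory containing a tiny module file (hypothetical name).
module_dir = Path(tempfile.mkdtemp())
module_path = module_dir / "mystats_demo.py"
module_path.write_text("def mean(xs):\n    return sum(xs) / len(xs)\n")

# Make that directory importable, then import the file by name (without .py).
sys.path.insert(0, str(module_dir))
import mystats_demo

first = mystats_demo.mean([1, 2, 3])

# Now "edit" the file on disk by adding a second function.
module_path.write_text(
    "def mean(xs):\n    return sum(xs) / len(xs)\n\n"
    "def spread(xs):\n    return max(xs) - min(xs)\n"
)

# Importing again reuses the already-loaded module, so the edit is not visible.
import mystats_demo
has_before = hasattr(mystats_demo, "spread")

# Reloading re-executes the file and refreshes the module's contents.
importlib.invalidate_caches()
mystats_demo = importlib.reload(mystats_demo)
has_after = hasattr(mystats_demo, "spread")

print(first, has_before, has_after)
```

In a notebook, the same `importlib.reload(stats)` call can be run in a cell after editing 'stats.py', as an alternative to restarting the kernel.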

In [1]:
# Every variable, function, etc.
# in the top-level scope of stats.py is now available under the "stats" scope.
# When you import, it essentially does 'python stats.py' and stores the definitions/variables.
# IF YOU CHANGE THE .py, YOU MUST RELOAD THE MODULE (OR RESTART THE KERNEL);
# RE-RUNNING THIS IMPORT ALONE WILL NOT PICK UP THE CHANGES
import stats 
import numpy as np

In [2]:
stats.mean([1,2,3,5,6,7])

4.0

In [3]:
stats.median([1,2])

1.5

In [5]:
stats.var([2,3,4,5,])

1.25
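
As a sanity check, the outputs above can be compared against Python's standard-library `statistics` module. The sketch below assumes nothing about how 'stats.py' is implemented (its source is not shown here); it only reproduces the same three inputs. Note that `statistics.pvariance` divides by $N$, while `statistics.variance` uses the $N-1$ divisor from the variance formula given earlier, so the two give different values for the same data.

```python
import statistics

data = [2, 3, 4, 5]

# Same inputs as the cells above.
m = statistics.mean([1, 2, 3, 5, 6, 7])
med = statistics.median([1, 2])

# Two conventions for the variance of the same data.
pop_var = statistics.pvariance(data)   # divides by N
samp_var = statistics.variance(data)   # divides by N - 1

print(m, med)
print(pop_var, samp_var)
```

For this data the sum of squared deviations is 5, so the $N$ divisor gives $5/4 = 1.25$ and the $N-1$ divisor gives $5/3 \approx 1.67$. The `1.25` shown above therefore matches the divide-by-$N$ convention, which is worth keeping in mind when comparing results against the $N-1$ formula.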