In [6]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

# The Very Minimum of Statistics and Probability - An Informal Introduction

## Random Variables

"Normal" (deterministic) variables are usually assigned a fixed value somewhere in an algorithm or a mathematical expression, for instance `x = 1`. Every time such a variable is used in a calculation, its value stays the same: `x + 1 = 1 + 1`, `5*x = 5*1`.

Some processes, however, are inherently stochastic, and we cannot expect to observe the same value every time. If we toss a coin, sometimes the result will be heads and sometimes tails. We can view the coin itself as a random variable `C` that can take the values `{C=heads, C=tails}`. If we use it in a calculation, the final result will also be random, since `C` will sometimes be `heads` and sometimes `tails`: `x + C = 1 + heads or tails`. The process of a random variable taking on a concrete value is also called 'sampling'.

Although the concrete value of `C` cannot be predicted with certainty, the random variable behaves according to certain rules. The values that a random variable *can* take on are called 'events'. For a balanced coin, we expect the values `C=heads` and `C=tails` to occur with about the same frequency, that is, each about 50% of the time. If we toss the coin 100 times, we expect about 50 `heads` and 50 `tails`, and we say that the *probability* of the event `tails` is 0.5 and the probability of the event `heads` is also 0.5. A probability of 0 means that an event never occurs, and a probability of 1 means that an event always occurs (a deterministic event). We have thus derived the *probability distribution* of the random variable `C`, which defines its behavior. A fundamental axiom of probability is that the probabilities of all possible events sum to 1: `P(C=heads) + P(C=tails) = 1`.
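The coin-toss experiment can be tried out numerically. The following is a minimal sketch, assuming NumPy's `default_rng` generator and an arbitrary seed; the names `tosses`, `freq_heads` and `freq_tails` are introduced here only for illustration. It samples the coin 100 times and uses the relative frequencies as rough estimates of the two probabilities.

```python
import numpy as np

# Seeded generator so the run is reproducible (the seed value is arbitrary).
rng = np.random.default_rng(seed=0)

# Sample the random variable C 100 times; each toss is 'heads' or 'tails'
# with equal probability (balanced coin).
tosses = rng.choice(["heads", "tails"], size=100)

n_heads = int(np.sum(tosses == "heads"))
n_tails = int(np.sum(tosses == "tails"))

# Relative frequencies are empirical estimates of the probabilities.
freq_heads = n_heads / len(tosses)
freq_tails = n_tails / len(tosses)

print(f"heads: {n_heads}, tails: {n_tails}")
print(f"estimated P(heads) = {freq_heads:.2f}, estimated P(tails) = {freq_tails:.2f}")
```

The counts will usually not be exactly 50 and 50, but the two estimated probabilities always add up to 1, and they tend to move closer to 0.5 as the number of tosses grows.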

In general, there are two types of random variables: discrete and continuous. Discrete random variables take on values from a set of distinct events: a coin has two, `heads` and `tails`, and a die has six faces and can take the values `{1,2,3,4,5,6}`. There are also random variables with countably infinitely many events: `{0,1,2,3,...}`. In either case, we can always build a table (finite or infinitely long) that assigns a probability to each event:

|Event|Probability|
|---|---|
|`C=heads`|0.5|
|`C=tails`|0.5|
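A table like this maps naturally onto a dictionary from events to probabilities. The sketch below is only an illustration: it assumes a fair die (so every face gets probability 1/6), and the names `die_distribution` and `rolls` are made up for this example. It checks that the probabilities sum to 1 and then samples from the table.

```python
import numpy as np

# Tabular distribution of a six-sided die: event -> probability.
# The uniform 1/6 values assume a fair die.
die_distribution = {face: 1 / 6 for face in range(1, 7)}

# The probabilities of all events must sum to 1.
total = sum(die_distribution.values())
print(f"sum of probabilities = {total:.6f}")

# Draw samples according to the table (the seed value is arbitrary).
rng = np.random.default_rng(seed=1)
events = list(die_distribution.keys())
probabilities = list(die_distribution.values())
rolls = rng.choice(events, size=6000, p=probabilities)

# Compare the observed frequencies with the table.
for face in events:
    freq = np.mean(rolls == face)
    print(f"face {face}: table {die_distribution[face]:.3f}, observed {freq:.3f}")
```

Changing the values in the dictionary (while keeping their sum at 1) would describe a loaded die, and the observed frequencies would follow the new table.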

Most of the random variables that we are concerned with are discrete random variables.

In some cases, a stochastic process produces continuous values, so that its events can be mapped to the real numbers. In this case we don't have discrete events: no matter how small the 'gap' (interval) between two given events, we can always find another event in between them. We cannot tabulate such a probability distribution; instead, it is described by a continuous probability density function that is usually specified analytically. The most fundamental continuous probability distribution is the Gaussian distribution, whose density is the famous "bell curve".
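To make the bell curve concrete, here is a small sketch of the Gaussian density `p(x) = exp(-(x - mu)^2 / (2*sigma^2)) / (sigma*sqrt(2*pi))`. The symbols `mu` (mean) and `sigma` (standard deviation) and the helper name `gaussian_pdf` are not from the text above; they are assumptions introduced for this example. The areas are approximated with a simple Riemann sum on a grid.

```python
import numpy as np

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a Gaussian with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Evaluate the density on a grid wide enough to cover almost all of the mass.
x = np.linspace(-6.0, 6.0, 1201)
dx = x[1] - x[0]
pdf = gaussian_pdf(x)

# A density value is not a probability; probabilities are areas under the curve.
area_total = np.sum(pdf) * dx                        # close to 1
area_one_sigma = np.sum(pdf[np.abs(x) <= 1.0]) * dx  # roughly 0.68

print(f"peak value at x=0: {gaussian_pdf(0.0):.4f}")
print(f"total area ~ {area_total:.4f}")
print(f"area within one sigma ~ {area_one_sigma:.4f}")
```

In the notebook, `plt.plot(x, pdf)` should draw the curve itself. The total area plays the same role as the sum of the probabilities in the discrete table: it has to equal 1.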

## Statistics of Probability Distributions

Imagine that you would like to measure the width of a table. Your measuring tape has limited precision, probably several millimeters. Every time you measure the width, you will record a slightly different result, to the best precision you are capable of. The whole process can be abstracted as sampling from the continuous random variable `table width` (there are some other details that we are not going to work out here) and recording a series of events: the measured width values. In many cases we don't need to know the exact shape of the distribution of a random variable, but only a simplified summary called a *statistic*. Here, we would like to know the width of the table, which we can estimate by averaging all the measured width values (the sampled events).
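The measuring experiment can be imitated in code. This is only a sketch under stated assumptions: the true width (1200 mm), the measurement error (Gaussian noise with a standard deviation of 3 mm) and the number of measurements (50) are hypothetical values chosen for illustration, not data from a real table.

```python
import numpy as np

rng = np.random.default_rng(seed=2)  # arbitrary seed for reproducibility

# Hypothetical values chosen only for illustration.
true_width_mm = 1200.0   # the unknown quantity we try to estimate
noise_std_mm = 3.0       # spread caused by the limited precision of the tape

# Each measurement is one sample: the true width plus a random error
# (assumed Gaussian here), rounded to the nearest millimeter.
measurements = np.round(true_width_mm + rng.normal(0.0, noise_std_mm, size=50))

# The sample mean is a statistic that estimates the width.
estimated_width = measurements.mean()
# The sample standard deviation summarizes how much the measurements scatter.
spread = measurements.std(ddof=1)

print(f"estimated width = {estimated_width:.1f} mm, spread = {spread:.1f} mm")
```

Individual measurements differ by several millimeters, while their average typically lands much closer to the assumed true width.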












