## Probability and Statistics Refresher

Today we'll review the essentials of probability and statistics. Given the prerequisites for this course, I assume that you have learned all of this before. What I want to do today is bring the material back to mind.

## Probability

What is probability?  

Historically, the notion of probability has been slippery and hard to pin down exactly. In practice, there are at least three ways to view probability:

1. as a mathematical exercise involving nonnegative functions that integrate (or sum) to 1 (i.e., distributions).
2. as an encoding of natural rules for reasoning under uncertainty. 
3. as an idealization of properties of data and processes.

For our purposes, we use probability as an abstraction that hides details we don't want to deal with.  This is a time-honored use of probability.

>Any simple idea is approximate; as an illustration, consider an object ... what is an object? Philosophers are always 
>saying, “Well, just take a chair for example.” The moment they say that, you know that they do not know what they are 
>talking about any more. What is a chair? ... every object is a mixture of a lot of things, so we can deal with it 
> only as a series of approximations and idealizations.

>The trick is the idealizations.

Richard Feynman, _The Feynman Lectures on Physics, 12-2_

Here is an illustration of this principle applied to probability:

>In a serious work ... an expression such as “this phenomenon is due to chance” constitutes simply, 
>an elliptic form of speech. ... It really means “everything occurs as if this phenomenon were due to chance,” 
>or, to be more precise: “To describe, or interpret or formalize this phenomenon, 
>only probabilistic models have so far given good results.”

Georges Matheron, _Estimating and Choosing: An Essay on Probability in Practice_

### Probability and Conditioning

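__Definition.__ Consider a set $\Omega$, referred to as the _sample space_. A _probability measure_ on $\Omega$ is a function $P[\cdot]$ defined on the subsets of $\Omega$ (the _events_; for continuous $\Omega$, only on a suitable collection of subsets rather than all of them) such that:
1. $P[\Omega] = 1$.
2. For any event $A \subseteq \Omega$, $P[A] \geq 0$.
3. For any events $A, B \subseteq \Omega$ where $A \cap B = \emptyset$, $P[A \cup B] = P[A] + P[B]$. More generally, for any countable collection of pairwise disjoint events $A_1, A_2, \ldots$, $P[\bigcup_i A_i] = \sum_i P[A_i]$.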
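For a finite sample space, a probability measure is fully determined by a weight on each outcome, and the axioms can be checked by brute force over every event. A minimal sketch, assuming a fair six-sided die as a hypothetical example (the names `OMEGA`, `WEIGHTS`, `prob`, and `all_events` are illustrative, not from the notes):

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical example: a fair six-sided die, each outcome weighted 1/6.
OMEGA = frozenset(range(1, 7))
WEIGHTS = {w: Fraction(1, 6) for w in OMEGA}

def prob(event):
    """P[A] for an event A (a subset of OMEGA): sum the outcome weights."""
    assert event <= OMEGA, "events must be subsets of the sample space"
    return sum((WEIGHTS[w] for w in event), Fraction(0))

def all_events(omega):
    """Every subset of a finite sample space (its power set)."""
    items = sorted(omega)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

events = list(all_events(OMEGA))

# Axiom 1: P[Omega] = 1
assert prob(OMEGA) == 1
# Axiom 2: P[A] >= 0 for every event A
assert all(prob(a) >= 0 for a in events)
# Axiom 3: additivity for every pair of disjoint events
assert all(prob(a | b) == prob(a) + prob(b)
           for a in events for b in events if not (a & b))
```

Exact `Fraction` arithmetic is used so the additivity check is an equality test rather than a floating-point comparison.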



Often we want to ask how a probability measure changes if we restrict the sample space to be some subset of $\Omega$.  

This is called __conditioning.__

__Definition.__ The _conditional probability_ of an event $A$, given that an event $B$ with positive probability is known to occur, is

$$ P[A|B] = \frac{P[A \cap B]}{P[B]}, \quad \text{where } P[B] > 0. $$

The function $P[\cdot|B]$ is a probability measure over the sample space
$B$.  

Note that in the expression $P[A|B]$, $A$ varies while $B$ is held fixed.

Since $P[B] \leq 1$ (and $P[B] < 1$ whenever $B$ excludes outcomes of positive probability), $P[\cdot|B]$ rescales $P[A \cap B]$ by the factor $1/P[B]$ so that $P[B|B] = 1.$
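When all outcomes are equally likely, conditioning amounts to counting inside $B$. A minimal sketch, assuming one roll of a fair die as a hypothetical example: let $A$ be "the roll is even" and $B$ be "the roll is greater than 3", so $P[A \cap B] = P[\{4, 6\}] = 1/3$ and $P[B] = 1/2$ (the helper names `prob` and `cond_prob` are illustrative):

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair die, every outcome equally likely.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Uniform probability measure: |A| / |Omega|."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def cond_prob(a, b):
    """P[A|B] = P[A & B] / P[B], defined only when P[B] > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("conditioning event must have positive probability")
    return prob(a & b) / p_b

even = frozenset({2, 4, 6})
high = frozenset({4, 5, 6})    # "the roll is greater than 3"

print(prob(even))              # 1/2
print(cond_prob(even, high))   # 2/3
print(cond_prob(high, high))   # 1
```

Knowing that the roll is high raises the probability of an even roll from 1/2 to 2/3, and $P[B|B] = 1$ as the rescaling requires.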

The sample space $\Omega$ may be continuous or discrete, and bounded or unbounded.

## Random Variables

We are usually interested in numeric values associated with events.  

A _random variable_ assigns a numeric value to each outcome in the sample space; formally, it is a function $X: \Omega \to \mathbb{R}$.

Notationally, we use CAPITAL LETTERS for random variables and lowercase for non-random quantities.

To describe which values of a random variable are more probable than others, we need a few more definitions.

__Definition.__ The _cumulative distribution function_ (CDF) $F$ of a random variable $X$ gives, for each real number $x$, the probability of the event consisting of all outcomes for which the value of $X$ is less than or equal to $x$; that is, $F(x) = P[X \leq x].$

__Example.__  Consider the roll of a single die.  The random variable here is the number of points showing.  What is the CDF?
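Assuming the die is fair, each face has probability $1/6$, so the CDF is a step function: $F(x) = 0$ for $x < 1$, $F(x) = \lfloor x \rfloor / 6$ for $1 \leq x < 6$, and $F(x) = 1$ for $x \geq 6$. A minimal sketch of that function (the name `die_cdf` is illustrative):

```python
import math
from fractions import Fraction

def die_cdf(x):
    """F(x) = P[X <= x] for one roll of a fair six-sided die (fairness assumed)."""
    if x < 1:
        return Fraction(0)
    if x >= 6:
        return Fraction(1)
    # For 1 <= x < 6, the event X <= x contains the faces 1, ..., floor(x).
    return Fraction(math.floor(x), 6)

for x in [0.5, 1, 2.7, 3, 6, 10]:
    print(x, die_cdf(x))
```

Note that $F$ is defined for every real $x$, not just at the faces: it is flat between integers and jumps by $1/6$ at each of $1, \ldots, 6$.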

Random variables: continuous and discrete. PDF, CDF, hazard function. Long tails, short tails.

Variability.  Mean, Variance, Covariance.
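On a finite sample space, mean, variance, and covariance can be computed straight from their definitions as expectations. A minimal sketch, assuming two independent fair dice as a hypothetical example, with exact arithmetic via Python's `fractions` module (`expect`, `var`, and `cov` are illustrative names):

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: two independent fair dice; each of the 36 pairs has probability 1/36.
OUTCOMES = list(product(range(1, 7), repeat=2))
P = Fraction(1, len(OUTCOMES))

def expect(g):
    """E[G] for a random variable G given as a function of the outcome."""
    return sum((P * g(w) for w in OUTCOMES), Fraction(0))

def var(g):
    """Var[G] = E[(G - E[G])^2]."""
    mu = expect(g)
    return expect(lambda w: (g(w) - mu) ** 2)

def cov(g, h):
    """Cov[G, H] = E[(G - E[G])(H - E[H])]."""
    mu_g, mu_h = expect(g), expect(h)
    return expect(lambda w: (g(w) - mu_g) * (h(w) - mu_h))

X = lambda w: w[0]            # first die
Y = lambda w: w[1]            # second die
S = lambda w: w[0] + w[1]     # their sum

print(expect(X))   # 7/2
print(var(X))      # 35/12
print(cov(X, Y))   # 0, since the dice are independent
print(cov(X, S))   # 35/12, since Cov[X, X + Y] = Var[X] + Cov[X, Y]
```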

Important Discrete RVs.   Geometric, Binomial, Poisson, Uniform.
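Each of these discrete distributions is specified by its probability mass function (PMF). A minimal sketch, with two assumptions worth flagging: the geometric here counts trials up to and including the first success (support $1, 2, 3, \ldots$), and the uniform is over the integers $a, \ldots, b$. The sums below check that each PMF totals about 1 and that the means match the familiar formulas $1/p$, $np$, $\lambda$, and $(a+b)/2$:

```python
import math

def geometric_pmf(k, p):
    # Number of trials up to and including the first success (assumed convention).
    return (1 - p) ** (k - 1) * p if k >= 1 else 0.0

def binomial_pmf(k, n, p):
    # Number of successes in n independent trials with success probability p.
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k) if 0 <= k <= n else 0.0

def poisson_pmf(k, lam):
    # Count of events with rate lam.
    return math.exp(-lam) * lam ** k / math.factorial(k) if k >= 0 else 0.0

def uniform_pmf(k, a, b):
    # Discrete uniform on the integers a, a+1, ..., b.
    return 1 / (b - a + 1) if a <= k <= b else 0.0

def mean_from_pmf(pmf, support):
    # E[X] = sum of k * P[X = k] over the (possibly truncated) support.
    return sum(k * pmf(k) for k in support)

print(sum(geometric_pmf(k, 0.3) for k in range(1, 200)))                # ~ 1
print(mean_from_pmf(lambda k: geometric_pmf(k, 0.3), range(1, 200)))    # ~ 1/0.3
print(mean_from_pmf(lambda k: binomial_pmf(k, 10, 0.3), range(0, 11)))  # ~ 3.0
print(mean_from_pmf(lambda k: poisson_pmf(k, 4.0), range(0, 100)))      # ~ 4.0
print(mean_from_pmf(lambda k: uniform_pmf(k, 1, 6), range(1, 7)))       # ~ 3.5
```

The geometric and Poisson supports are infinite, so the sums are truncated; the neglected tail mass is far below floating-point precision at these parameters.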

Important Continuous RVs.  Exponential, Gaussian.
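For continuous random variables we work with a PDF $f$ and CDF $F$, and the hazard function $h(x) = f(x) / (1 - F(x))$ gives the rate of "failure" at $x$ given survival up to $x$. A minimal sketch of the exponential (rate $\lambda$) and Gaussian (mean $\mu$, standard deviation $\sigma$), using the standard closed forms; the function names are illustrative:

```python
import math

def exp_pdf(x, lam):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exp_cdf(x, lam):
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

def gauss_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def gauss_cdf(x, mu=0.0, sigma=1.0):
    # The Gaussian CDF has no elementary closed form; it is expressed via erf.
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def hazard(pdf, cdf, x):
    # h(x) = f(x) / (1 - F(x))
    return pdf(x) / (1 - cdf(x))

lam = 2.0
for x in [0.5, 1.0, 2.0]:
    # The exponential hazard stays at the constant lam (the memoryless property);
    # the Gaussian hazard increases with x.
    print(x,
          hazard(lambda t: exp_pdf(t, lam), lambda t: exp_cdf(t, lam), x),
          hazard(gauss_pdf, gauss_cdf, x))
```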

Look at CS 470 notes for best approach here.


Some examples of manipulating data to compute probabilities.
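With data in hand, the simplest estimate of a probability is the fraction of observations in the event, and conditioning becomes "filter, then count." A minimal sketch; the ten numbers below are made up purely to illustrate the computations, and the helper names are illustrative:

```python
# Hypothetical data: made-up observations, used only to illustrate the computations.
data = [12, 45, 7, 130, 88, 250, 33, 19, 410, 60]

def empirical_prob(data, predicate):
    """Fraction of observations for which predicate holds: an estimate of P[event]."""
    return sum(1 for x in data if predicate(x)) / len(data)

def empirical_cdf(data, x):
    """Estimate of F(x) = P[X <= x]."""
    return empirical_prob(data, lambda v: v <= x)

def empirical_cond_prob(data, pred_a, pred_b):
    """Estimate of P[A | B]: restrict to observations where B holds, then count A."""
    subset = [x for x in data if pred_b(x)]
    if not subset:
        raise ValueError("no observations satisfy the conditioning event")
    return empirical_prob(subset, pred_a)

print(empirical_cdf(data, 50))                                          # 0.5
print(empirical_prob(data, lambda x: x > 100))                          # 0.3
print(empirical_cond_prob(data, lambda x: x > 200, lambda x: x > 100))  # 0.666...
```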

## Statistics

Estimating the mean. Distinguishing between models and data. A sample average can always be computed from data, even when the model generating the data has no mean; estimating the mean in that case is a good way to see the difference between models and data.
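One way to see this is to compare running sample means for a standard Gaussian, whose mean is 0, and a standard Cauchy, whose mean does not exist. A minimal sketch, assuming a fixed seed for repeatability and generating Cauchy samples by the inverse-CDF method $\tan(\pi(U - 1/2))$ with $U$ uniform on $[0, 1)$; the exact printed values depend on the seed:

```python
import math
import random

def running_means(samples):
    """Sample mean after each new observation."""
    total = 0.0
    means = []
    for i, x in enumerate(samples, start=1):
        total += x
        means.append(total / i)
    return means

rng = random.Random(0)   # fixed seed so runs are repeatable
n = 100_000

# Standard Gaussian: the mean exists (it is 0), so the sample means settle down.
gauss = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Standard Cauchy via the inverse CDF; the model has no mean.
cauchy = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

g_means = running_means(gauss)
c_means = running_means(cauchy)
for k in [100, 1_000, 10_000, 100_000]:
    print(k, g_means[k - 1], c_means[k - 1])
```

The Gaussian column should drift toward 0 as $k$ grows, while the Cauchy column keeps jumping whenever an occasional huge observation arrives: the data always yield a number, but that number is not estimating any property of the model.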


Bayes' Rule.
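Bayes' rule follows from applying the definition of conditional probability twice: $P[A \cap B] = P[A|B]\,P[B] = P[B|A]\,P[A]$, so $P[A|B] = P[B|A]\,P[A] / P[B]$ whenever $P[B] > 0$. A minimal sketch applied to a hypothetical screening test, where the prevalence and error rates are made-up numbers chosen for illustration and $P[B]$ is expanded by the law of total probability:

```python
from fractions import Fraction

def bayes(prior, likelihood, likelihood_if_not):
    """P[A|B] from P[A], P[B|A], and P[B|not A], using
    P[B] = P[B|A] P[A] + P[B|not A] P[not A]."""
    p_b = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / p_b

# Hypothetical numbers: 1% of people have the condition (A); the test is positive (B)
# for 90% of those who have it and for 5% of those who do not.
posterior = bayes(Fraction(1, 100), Fraction(9, 10), Fraction(1, 20))
print(posterior)   # 2/13
```

Even with a fairly accurate test, the probability of having the condition given a positive result is only about 15%, because the condition is rare to begin with.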