# Probability

These notes contain excerpts from:

* **The Data Science Design Manual**, by Steven Skiena, 2017, Springer;
* Python notebooks available at [http://data-manual.com/data](http://data-manual.com/data);
* Lecture slides available at [http://www3.cs.stonybrook.edu/~skiena/data-manual/lectures/](http://www3.cs.stonybrook.edu/~skiena/data-manual/lectures/).

In [None]:
from scipy import stats
import matplotlib.pyplot as plt
import numpy as np
from math import log
# %matplotlib notebook
%matplotlib qt5

*Probability theory provides a formal framework for reasoning about the likelihood of events.*

### Random experiments

**Experiment** : a procedure which yields one of a set of possible **outcomes**
 * **rolling a die** (for example, *Outcome* : 4)
 * **measuring the time to reach home** (for example, *Outcome* : 42 minutes)
 * **tomorrow's weather** (for example, *Outcome* : Partly Cloudy)

**Sample Space** : the set of possible outcomes (usually denoted $S$ or $\Omega$)

Consider the experiment of tossing two six-sided dice :

In [None]:
blue_die_faces = [1, 2, 3, 4, 5, 6]
red_die_faces = [1, 2, 3, 4, 5, 6]

sample_space = [(f1, f2) for f1 in blue_die_faces for f2 in red_die_faces]

print(sample_space)

**Event** : a subset of the sample space

In [None]:
def sum_equals_7_or_11(pair):
    # Use `total` rather than `sum` to avoid shadowing the built-in function
    total = pair[0] + pair[1]
    return total == 7 or total == 11

E = list(filter(sum_equals_7_or_11, sample_space))
print(E)

The **probability** $p(s)$ of an outcome $s \in S$ satisfies:

* $0 \leqslant p(s) \leqslant 1$
* $\sum_{s \in S} p(s) = 1$

What is the probability $p(s)$ of the outcome $s = (2, 5)$?

The **probability of an event** $E$ is defined as :

$$
P(E) = \sum_{s \in E} p(s).
$$
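
As a minimal sketch of this definition, the snippet below sums the outcome probabilities over an event. The loaded die and its probabilities are made up for illustration, and `prob_event` is a hypothetical helper name that does not come from the book.

```python
from fractions import Fraction

# Hypothetical loaded die: face 6 comes up half of the time (illustrative values).
p = {1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
     4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 2)}

# The outcome probabilities must sum to 1.
assert sum(p.values()) == 1

def prob_event(event, p):
    """P(E): the sum of p(s) over the outcomes s in the event E."""
    return sum(p[s] for s in event)

even = {2, 4, 6}
print(prob_event(even, p))  # 7/10
```

Because the outcomes are not equally likely here, counting outcomes would not be enough; each outcome has to be weighted by its own probability.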

The complement of $E$ is denoted $\bar{E}$. What is the probability of $\bar{E}$ ?

#### The classical interpretation 

$$
\text{The probability of an event} = \frac{\text{Number of favourable outcomes}}{\text{Number of possible outcomes}}
$$

where all the possible **outcomes are equally likely**.

For example, consider the experiment "rolling a die". The probability of getting an odd number is $3/6$, because three of the six equally likely outcomes are odd.

**Exercise** - Complete the following function definition :

In [None]:
from fractions import Fraction

def P(event, sample_space): 
    """The probability of an event,
    given a sample space of equiprobable outcomes.
    """
    return

p = P(event = {2, 4, 6}, sample_space = {1, 2, 3, 4, 5, 6})
print(p, float(p))

#### The frequency interpretation

$$
\text{The relative frequency of an event} = \frac{\text{Number of times the event has occurred}}{\text{Number of observed cases}}
$$

Let $N$ denote the number of times the random experiment is repeated and $N_E$ the number of times that event $E$ has occurred.

$$
P(E) = \lim_{N\to\infty} \frac{N_E}{N}
$$
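
The limit can be illustrated, though not proved, by simulation. The sketch below is an assumption-laden example: it uses Python's standard `random` module with a fixed seed, and `relative_frequency` is a hypothetical helper name. It rolls a fair die $N$ times for growing $N$ and prints $N_E / N$ for the event "the value is odd".

```python
import random

rng = random.Random(0)  # fixed seed so that the run is reproducible

def relative_frequency(event, n):
    """Roll a fair six-sided die n times and return N_E / N."""
    hits = sum(1 for _ in range(n) if rng.randint(1, 6) in event)
    return hits / n

odd = {1, 3, 5}
for n in (10, 100, 1000, 10000, 100000):
    print(n, relative_frequency(odd, n))
```

For small $N$ the relative frequency can be far from $1/2$; it should settle close to $1/2$ as $N$ grows.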

In [None]:
import numpy as np
sample_space = {1, 2, 3, 4, 5, 6}
# The die is rolled n times
n = 30
sample = np.random.choice(list(sample_space), n)
print('samples length : {}'.format(len(sample)))
print(sample)

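**Exercise** - Compute the absolute and relative frequency of each outcome from the sample space. The original hint, [scipy.stats.itemfreq](http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.itemfreq.html), was deprecated in SciPy 1.0 and later removed; its deprecation notice recommends `numpy.unique(sample, return_counts=True)` instead.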

In [None]:
from scipy import stats

# ...

**Exercise** - Same question with [pandas.Series.value_counts](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html).

In [None]:
# ...

## Compound Events and Independence

Events $A$ and $B$ are independent iff

$$
P(A \cap B) = P(A) \times P(B)
$$
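
The definition can be checked by counting when all outcomes are equally likely. The sketch below uses the two-dice experiment rather than the single die, so that the question that follows stays open. The names `S` and `independent` are hypothetical. With equiprobable outcomes, $P(A \cap B) = P(A) \times P(B)$ is equivalent to $|A \cap B| \cdot |S| = |A| \cdot |B|$, which avoids any rounding.

```python
# Sample space for two fair dice: (blue, red) pairs, all equally likely.
S = {(b, r) for b in range(1, 7) for r in range(1, 7)}

def independent(A, B, S):
    """True when P(A ∩ B) == P(A) * P(B), assuming equiprobable outcomes.

    Cross-multiplying by |S|^2 gives the integer test below.
    """
    return len(A & B) * len(S) == len(A) * len(B)

sum_is_7 = {s for s in S if s[0] + s[1] == 7}
sum_is_8 = {s for s in S if s[0] + s[1] == 8}
blue_is_1 = {s for s in S if s[0] == 1}

print(independent(sum_is_7, blue_is_1, S))  # True: 1/36 == 1/6 * 1/6
print(independent(sum_is_8, blue_is_1, S))  # False: a total of 8 is impossible when the blue die shows 1
```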

Consider the following events for the experiment "rolling a die" :

* $E_1 = \{1, 3, 5\}$
* $E_2 = \{2, 4, 6\}$
* $E_3 = \{1, 2\}$ = "the value is less than or equal to 2"

Which pair of events is independent ?

Assuming independence simplifies calculations, but it is not good for **prediction**. Why?


## Conditional probability

The *conditional probability* of $A$ given $B$ is defined as:

$$
P(A | B) = \frac{P(A \cap B)}{P(B)}
$$

Rearranging this definition (with the roles of $A$ and $B$ exchanged) gives the product rule:

$$
P(A \cap B) = P(A) \times P(B | A)
$$
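
Both formulas can be checked numerically. The following sketch again uses the two-dice experiment with equally likely outcomes; the events chosen here ("the sum is 8" and "the blue die is even") are illustrative assumptions, not taken from the book.

```python
from fractions import Fraction

# Sample space for two fair dice: (blue, red) pairs, all equally likely.
S = {(b, r) for b in range(1, 7) for r in range(1, 7)}
n = len(S)

A = {s for s in S if s[0] + s[1] == 8}   # the sum is 8
B = {s for s in S if s[0] % 2 == 0}      # the blue die is even

p_A = Fraction(len(A), n)
p_B = Fraction(len(B), n)
p_A_and_B = Fraction(len(A & B), n)

p_A_given_B = p_A_and_B / p_B   # P(A | B) = P(A ∩ B) / P(B)
p_B_given_A = p_A_and_B / p_A   # P(B | A) = P(A ∩ B) / P(A)

print(p_A_given_B)  # 1/6
print(p_B_given_A)  # 3/5

# Product rule: P(A ∩ B) = P(A) * P(B | A)
assert p_A_and_B == p_A * p_B_given_A
```

Note that $P(A | B)$ and $P(B | A)$ differ: conditioning is not symmetric.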

Consider the following events for the experiment "rolling a die" :

* $E_1 = \{1, 3, 5\}$
* $E_2 = \{2, 4, 6\}$
* $E_3 = \{4, 5, 6\}$ = "the value is greater than or equal to 4"

Questions :

* $P(E_1 | E_3) =$ ?
* $P(E_2 | E_3) =$ ?


The computation of the conditional probability is typically used in **classification problems**. For example, in *spam* filtering, $A$ would be the event that an email is spam and $B$ the occurrence of particular words in that email.

We will be able to compute conditional probabilities with the help of *Bayes' theorem*, derived from the previous definitions:

$$
P(A | B) = \frac{P(A) \times P(B | A)}{P(B)}
$$
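
To make the formula concrete, here is a small sketch for the spam example. All the numbers are invented for illustration, and $P(B)$ is obtained with the law of total probability, a standard step that is not spelled out in these notes.

```python
from fractions import Fraction

# Made-up numbers, for illustration only.
p_spam = Fraction(1, 5)              # P(A): prior probability that an email is spam
p_word_given_spam = Fraction(1, 2)   # P(B | A): the word appears in a spam email
p_word_given_ham = Fraction(1, 20)   # P(B | not A): the word appears in a legitimate email

# P(B) by the law of total probability (an assumption added here).
p_word = p_spam * p_word_given_spam + (1 - p_spam) * p_word_given_ham

# Bayes' theorem: P(A | B) = P(A) * P(B | A) / P(B)
p_spam_given_word = p_spam * p_word_given_spam / p_word

print(p_word, p_spam_given_word)  # 7/50 5/7
```

With these hypothetical values, seeing the word raises the probability of spam from $1/5$ to $5/7$.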

## Probability distributions

A **random variable** $V$ is a numerical function on the outcomes of a probability space. For example :

$$
V((a, b)) = a + b
$$

What is the value of $P(V(s) = 7)$ ?

The **expected value** of $V$ is defined as:

$$
E(V) = \sum_{s \in S} p(s) \cdot V(s)
$$
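
As a sketch of this formula, the snippet below computes $E(V)$ for the total of two fair dice, assuming every outcome has probability $1/36$. The names `S`, `p` and `V` mirror the notation above but are otherwise chosen for this example.

```python
from fractions import Fraction

# Two fair dice: 36 equally likely outcomes.
S = [(a, b) for a in range(1, 7) for b in range(1, 7)]
p = {s: Fraction(1, len(S)) for s in S}

def V(s):
    """Random variable: the total shown on the two dice."""
    return s[0] + s[1]

# E(V) = sum over s in S of p(s) * V(s)
expected = sum(p[s] * V(s) for s in S)
print(expected)  # 7
```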

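Random variables are represented by their **probability density function** (pdf); for a discrete variable such as the total on two dice, this is also called the probability mass function: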

In [None]:
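# Rebuild the two-dice totals explicitly: sample_space was reassigned
# to a single die in an earlier cell.
results = [f1 + f2 for f1 in blue_die_faces for f2 in red_die_faces]

fig, ax = plt.subplots()

# density=True normalises the histogram (it replaces the `normed`
# argument, which was removed from recent Matplotlib releases).
ax.hist(results, bins=[2,3,4,5,6,7,8,9,10,11,12,13],
        align='left', density=True, histtype='step', color='red', linewidth=1.2)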
  
#The x and y axis and the title
ax.set_title("pdf")
ax.set_xticks(np.arange(2, 13, 1.0))
ax.set_xlim(1.55,12.45)
ax.set_xlabel("Total on dice")
ax.set_ylabel("Probability")

Random variables are also represented by their **cumulative distribution function** (cdf), sometimes called the cumulative density function:

In [None]:
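# Rebuild the two-dice totals explicitly: sample_space was reassigned
# to a single die in an earlier cell.
results = [f1 + f2 for f1 in blue_die_faces for f2 in red_die_faces]

fig, ax = plt.subplots()

# In this case: cumulative=True. density=True replaces the removed `normed` argument.
ax.hist(results, bins=[2,3,4,5,6,7,8,9,10,11,12,13],
        align='left', density=True, cumulative=True, histtype='step',
        color='red', linewidth=1.2)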
  
ax.set_title("cdf")
ax.set_xticks(np.arange(2, 13, 1.0))
ax.set_xlim(1.55,12.45)
ax.set_xlabel("Total on dice")
ax.set_ylabel("Probability")

The **cdf** is computed as follows :

$$
C(X \leqslant k) = \sum_{x \leqslant k} P(X = x)
$$
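
The running sum in this formula can be sketched without plotting. The example below takes $X$ to be the total of two fair dice and accumulates the exact probabilities; the variable names are chosen for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate

# Two fair dice: 36 equally likely outcomes.
S = [(a, b) for a in range(1, 7) for b in range(1, 7)]

# P(X = x) for every possible total x, in increasing order of x.
counts = Counter(a + b for a, b in S)
values = sorted(counts)
pmf = [Fraction(counts[v], len(S)) for v in values]

# C(X <= k): running sum of P(X = x) for x <= k.
cdf = dict(zip(values, accumulate(pmf)))

for k in values:
    print(k, cdf[k])
```

The last value printed is 1, since every outcome has a total of at most 12.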

Pay attention to which distribution is being plotted. A cumulative curve can never decrease, so it makes almost any series look like steady growth. For example, in the case of iPhone sales:

In [None]:
sales =[0.27,1.12,2.32,1.7,0.72,6.89,4.36,3.79,\
        5.21,7.37,8.74,8.75,8.4,14.1,16.24,\
        18.65,20.34,17.07,37.04,35.06,26.03,
        26.91,47.79,37.43,31.24]
x = range(25)
xlabels = ['2007 Q3','Q4','2008 Q1','Q2','Q3','Q4','2009 Q1','Q2','Q3','Q4',\
          '2010 Q1','Q2','Q3','Q4','2011 Q1','Q2','Q3','Q4','2012 Q1','Q2','Q3','Q4',\
          '2013 Q1','Q2','Q3']

fig, ax = plt.subplots()

# ax.plot(x,sales, 'bo-', label='iPhone Sales Per Quarter')
ax.plot(x,[sum(sales[:j]) for j in range(1,26)], 'ro-', label='iPhone Total Sales')

ax.set_xticks(x)
ax.set_xticklabels(xlabels, rotation='vertical')
ax.set_ylabel('iPhone Unit Sales in Millions')
ax.set_xlim(-1,25)
ax.set_ylim(-10,400)
ax.legend(loc='upper left')