# Probability

<a href="#Sample">Sample</a>

<a href="#Sample-Space">Sample Space</a>

<a href="#Event">Event</a>

<a href="#Probability-Measure">Probability Measure</a>

<a href="#Equally-Likely-Probability-Measure">Equally Likely Probability Measure</a>

<a href="#Urn-Problems">Urn Problems</a>

<a href="#Probability-of-full-house">Probability of full house</a>

<a href="#Fermat-and-Pascal:-Gambling,-Triangles,-and-the-Birth-of-Probability">Fermat and Pascal: Gambling, Triangles, and the Birth of Probability</a>

<a href="#Newton-Pepys-problem-(1693)">Newton-Pepys problem (1693)</a>

<a href="#Bertrand's-ballot-theorem-(1887)">Bertrand's ballot theorem (1887)</a>

Norvig [ipynb](http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability.ipynb)

In 1814, Pierre-Simon Laplace [wrote](https://en.wikipedia.org/wiki/Classical_definition_of_probability):

>*Probability ... is thus simply a fraction whose numerator is the number of favorable cases and whose denominator is the number of all the cases possible ... when nothing leads us to expect that any one of these cases should occur more than any other.*

![Laplace](https://upload.wikimedia.org/wikipedia/commons/thumb/3/30/AduC_197_Laplace_%28P.S.%2C_marquis_de%2C_1749-1827%29.JPG/180px-AduC_197_Laplace_%28P.S.%2C_marquis_de%2C_1749-1827%29.JPG)
<center><a href="https://en.wikipedia.org/wiki/Pierre-Simon_Laplace">Pierre-Simon Laplace</a><br>1814</center>

# Sample

A possible outcome $\omega$ of an experiment is a sample.

[<a href="#Probability">Back to top</a>]

# Sample Space

Collect all samples.
The set $\Omega$ of all samples is a sample space.

[<a href="#Probability">Back to top</a>]

# Event

Collect all samples of interest.
Technically this is a subset of $\Omega$.  
Any subset $A$ of $\Omega$ is an event.

[<a href="#Probability">Back to top</a>]

# Probability Measure

To each $\omega$ in $\Omega$
we attach a brick.
The bricks may have different weights, but
their total weight is 1.
This weight distribution over the sample space $\Omega$ is
a probability measure.
\begin{eqnarray}
P(\omega)&=&\mbox{Weight of the brick attached to $\omega$}\nonumber\\
P(A)&=&\sum_{\omega\in A}P(\omega)=\mbox{Weight of the bricks attached to $A$}\nonumber
\end{eqnarray}
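
The brick picture can be sketched in a few lines of Python. This is only an illustration: the loaded four-sided die, its weights, and the helper name `prob` are assumptions made up for the example, not part of the notes above.

```python
from fractions import Fraction

# Hypothetical example: a loaded four-sided die. The weights are invented
# for illustration; any non-negative weights that sum to 1 would do.
weights = {
    1: Fraction(1, 2),
    2: Fraction(1, 4),
    3: Fraction(1, 8),
    4: Fraction(1, 8),
}

def prob(event, weights):
    """P(A): total weight of the bricks attached to the samples in A."""
    return sum((weights[w] for w in event if w in weights), Fraction(0))

omega = set(weights)   # the sample space
even = {2, 4}          # an event

print(prob(omega, weights))             # total weight of all bricks
print(prob(even, weights))              # weight of the event "even"
print(1 - prob(omega - even, weights))  # complement rule gives the same value
```

Exact fractions are used so that the total weight is exactly 1 rather than 1 up to rounding.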

### Properties of probability measure

In a nutshell, a probability measure $P$ is a real-valued function defined on events $A$:
$$
A\ \ \stackrel{P}{\rightarrow}\ \ P(A)
$$
More precisely, a probability measure $P$ is a real-valued function defined on events $A$ which satisfies the following three conditions:
$$\begin{array}{llll}
(1)&&\displaystyle P(\Omega)=1,\ P(\emptyset)=0\nonumber\\
\\
(2)&&\displaystyle 0\le P(A)\le 1\nonumber\\
\\
(3)&&\displaystyle P\left(\cup_{i=1}^{\infty}A_i\right)=\sum_{i=1}^{\infty}P\left(A_i\right)\quad\mbox{for any {\color{red}disjoint events} $A_i$}\nonumber
\end{array}$$

Three further properties follow from (1)-(3):

$$\begin{array}{llll}
(4)&&\displaystyle P\left(\cup_{i=1}^{n}A_i\right)=\sum_{i=1}^{n}P\left(A_i\right)\quad\mbox{for any {\color{red}disjoint events}  $A_i$}\nonumber\\
\\
(5)&&\displaystyle P(A)\le P(B)\quad\mbox{for $A\subset B$}\nonumber\\
\\
(6)&&\displaystyle P(A)=1-P(A^c)\nonumber
\end{array}$$

### Inclusion-exclusion principle

##### Two events
\begin{eqnarray}
(7)\quad P(A\cup B)&\le& P(A)+P(B)\nonumber\\
(7)\quad P(A\cup B)&=&P(A)+P(B)-P(A\cap B)\nonumber
\end{eqnarray}

##### Three events
\begin{eqnarray}
(7)\ P(A\cup B\cup C)&\le& P(A)+P(B)+P(C)\nonumber\\
(7)\ P(A\cup B\cup C)&\ge& P(A)+P(B)+P(C)-P(AB)-P(BC)-P(CA)\nonumber\\
(7)\ P(A\cup B\cup C)&=&P(A)+\cdots-P(AB)-\cdots+P(ABC)\nonumber
\end{eqnarray}

##### Many events
\begin{eqnarray}
(7)\quad P(\cup_{i=1}^nA_i)&\le& \sum_{i=1}^nP(A_i)\nonumber\\
(7)\quad P(\cup_{i=1}^nA_i)&\ge& \sum_{i=1}^nP(A_i)-\sum_{1\le i<j\le n}P(A_iA_j)\nonumber\\
(7)\quad P(\cup_{i=1}^nA_i)&\le& \sum_{i=1}^nP(A_i)-\sum_{1\le i<j\le n}P(A_iA_j)+\sum_{1\le i<j<k\le n}P(A_iA_jA_k)\nonumber\\
&&\cdots\nonumber\\
(7)\quad P(\cup_{i=1}^nA_i)&=&\sum_{i=1}^nP(A_i)-\sum_{1\le i<j\le n}P(A_iA_j)+\cdots+(-1)^{n+1}P(A_1A_2\cdots A_n)\nonumber
\end{eqnarray}

##### Names
$$\begin{array}{ll}
\mbox{First inequalities}&\mbox{Boole's inequality}\\
\mbox{All inequalities}&\mbox{Bonferroni's inequality}\\
\mbox{Last equalities}&\mbox{Inclusion-exclusion principle}
\end{array}$$
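
As a sanity check, the alternating bounds can be verified numerically on a small example. The sketch below assumes one roll of a fair 12-sided die and three arbitrarily chosen overlapping events; the names `P` and `partial_sums` are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

# Assumed setup: one roll of a fair 12-sided die, three overlapping events.
omega = set(range(1, 13))
events = [
    {w for w in omega if w % 2 == 0},   # even
    {w for w in omega if w % 3 == 0},   # multiple of 3
    {w for w in omega if w > 8},        # greater than 8
]

def P(event):
    """Equally likely measure: |A| / |Omega|."""
    return Fraction(len(event), len(omega))

def partial_sums(events):
    """Successive truncations of the inclusion-exclusion sum."""
    total, sums = Fraction(0), []
    for k in range(1, len(events) + 1):
        # Sum of P(intersection) over all k-element sub-families.
        term = sum((P(set.intersection(*c)) for c in combinations(events, k)),
                   Fraction(0))
        total += (-1) ** (k + 1) * term
        sums.append(total)
    return sums

union = P(set.union(*events))
sums = partial_sums(events)
print(union)   # 3/4
print(sums)    # upper bound, lower bound, exact value
```

The first partial sum overshoots $P(\cup A_i)$ (Boole), the second undershoots it, and the last one equals it (inclusion-exclusion).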

[<a href="#Probability">Back to top</a>]

# Equally Likely Probability Measure

\begin{eqnarray}
P(\omega)&=&\frac{1}{|\Omega|}\nonumber\\
P(A)&=&\frac{|A|}{|\Omega|}\nonumber
\end{eqnarray}

### Example - Flip a fair coin three times

$$
P(HHH)=P(HHT)=\cdots=P(TTT)=\frac{1}{8}
$$
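
The same example can be enumerated directly. This is a minimal sketch; the helper `P` and the event `exactly_two_heads` are names chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: all sequences of three flips, e.g. 'HHT'.
omega = {''.join(flips) for flips in product('HT', repeat=3)}

def P(event, space):
    """Equally likely measure: number of samples in the event / |Omega|."""
    return Fraction(len(event & space), len(space))

exactly_two_heads = {w for w in omega if w.count('H') == 2}

print(len(omega))                    # 8
print(P({'HHH'}, omega))             # 1/8
print(P(exactly_two_heads, omega))   # 3/8
```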

[<a href="#Probability">Back to top</a>]

# Urn Problems

Around 1700, Jacob Bernoulli wrote about removing colored balls from an urn in his landmark treatise *[Ars Conjectandi](https://en.wikipedia.org/wiki/Ars_Conjectandi)*, and ever since then, explanations of probability have relied on [urn problems](https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=probability%20ball%20urn). (You'd think the urns would be empty by now.) 

![Jacob Bernoulli](http://www2.stetson.edu/~efriedma/periodictable/jpg/Bernoulli-Jacob.jpg)
<center><a href="https://en.wikipedia.org/wiki/Jacob_Bernoulli">Jacob Bernoulli</a><br>1700</center>

For example, here is a three-part problem [adapted](http://mathforum.org/library/drmath/view/69151.html)  from mathforum.org:

> An urn contains 23 balls: 8 white, 6 blue, and 9 red.  We select six balls at random (each possible selection is equally likely). What is the probability of each of these possible outcomes:

A: all balls are red

B: 3 are blue, 2 are white, and 1 is red

C: exactly 4 balls are white

Norvig
[ipynb](http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability.ipynb)

$$\begin{array}{lll}
P(A)&=&\frac{{9\choose 6}}{{23\choose 6}}=0.000832119825255\\
P(B)&=&\frac{{8\choose 2}{6\choose 3}{9\choose 1}}{{23\choose 6}}=0.0499271895153\\
P(C)&=&\frac{{8\choose 4}{15\choose 2}}{{23\choose 6}}=0.0728104847098\\
\end{array}$$
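
The three values can be reproduced with `math.comb` (available in Python 3.8 and later). This is a direct transcription of the formulas above, not Norvig's code; the variable names are chosen here.

```python
from math import comb

total = comb(23, 6)  # number of ways to pick 6 of the 23 balls

p_a = comb(9, 6) / total                            # all six red
p_b = comb(6, 3) * comb(8, 2) * comb(9, 1) / total  # 3 blue, 2 white, 1 red
p_c = comb(8, 4) * comb(15, 2) / total              # 4 white, 2 non-white

print(p_a, p_b, p_c)
```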

[<a href="#Probability">Back to top</a>]

# Probability of full house

The number $|\Omega|$ of ways of choosing 5 cards from a 52-card deck is
$$|\Omega|={52 \choose 5}$$
To choose a particular full house:
$$\begin{array}{ll}
\mbox{decide the rank of the three equal-rank cards}&\mbox{13 choices}\\
\mbox{pick the suits of the three equal-rank cards}&\mbox{${4\choose 3}$ choices}\\
\mbox{determine the rank of the two equal-rank cards}&\mbox{12 choices}\\
\mbox{choose the suits of the two equal-rank cards}&\mbox{${4\choose 2}$ choices}\\
\end{array}$$
So, the number $|A|$ of ways of choosing a full house is   
$$|A|=13\cdot {4\choose 3}\cdot 12 \cdot {4\choose 2}$$
and the probability $P(A)$ of being dealt a full house is
$$
P(A)=\frac{|A|}{|\Omega|}=\frac{13\cdot {4\choose 3}\cdot 12 \cdot {4\choose 2}}{{52 \choose 5}}=\frac{3744}{2598960}\approx 0.00144
$$
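
The count can be checked with a short computation. This is a sketch of the formula above; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

hands = comb(52, 5)                              # |Omega|
full_houses = 13 * comb(4, 3) * 12 * comb(4, 2)  # |A|

p = Fraction(full_houses, hands)
print(full_houses, hands)   # 3744 2598960
print(p, float(p))          # exact fraction and its decimal value
```

So a full house turns up in roughly 0.14% of five-card hands.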

[<a href="#Probability">Back to top</a>]

# Fermat and Pascal: Gambling, Triangles, and the Birth of Probability

<table>
<tr><td><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/9/98/Pierre_de_Fermat2.png/140px-Pierre_de_Fermat2.png"><center><a href="https://en.wikipedia.org/wiki/Pierre_de_Fermat">Pierre de Fermat</a><br>1654
<td><img src="https://www.umass.edu/wsp/images/pascal.jpg"><center><a href="https://en.wikipedia.org/wiki/Blaise_Pascal">Blaise Pascal</a><br>1654
</table>

Consider a gambling game consisting of tossing a coin. Player H wins the game if 10 heads come up, and T wins if 10 tails come up. If the game is interrupted when H has 8 heads and T has 7 tails, how should the pot of money (which happens to be 100 Francs) be split?
In 1654, Blaise Pascal and Pierre de Fermat corresponded on this problem, with Fermat [writing](http://mathforum.org/isaac/problems/prob1.html):

>Dearest Blaise,

>As to the problem of how to divide the 100 Francs, I think I have found a solution that you will find to be fair. Seeing as I needed only two points to win the game, and you needed 3, I think we can establish that after four more tosses of the coin, the game would have been over. For, in those four tosses, if you did not get the necessary 3 points for your victory, this would imply that I had in fact gained the necessary 2 points for my victory. In a similar manner, if I had not achieved the necessary 2 points for my victory, this would imply that you had in fact achieved at least 3 points and had therefore won the game. Thus, I believe the following list of possible endings to the game is exhaustive. I have denoted 'heads' by an 'h', and tails by a 't.' I have starred the outcomes that indicate a win for myself.

    h h h h *       h h h t *       h h t h *       h h t t *
    h t h h *       h t h t *       h t t h *       h t t t
    t h h h *       t h h t *       t h t h *       t h t t
    t t h h *       t t h t         t t t h         t t t t

>I think you will agree that all of these outcomes are equally likely. Thus I believe that we should divide the stakes by the ratio 11:5 in my favor, that is, I should receive (11/16)*100 = 68.75 Francs, while you should receive 31.25 Francs.

>I hope all is well in Paris,

>Your friend and colleague,

>Pierre

Pascal agreed with this solution, and [replied](http://mathforum.org/isaac/problems/prob2.html) with a generalization that made use of his previous invention, Pascal's Triangle. There's even [a book](https://smile.amazon.com/Unfinished-Game-Pascal-Fermat-Seventeenth-Century/dp/0465018963?sa-no-redirect=1) about it.
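
Fermat's enumeration is easy to mechanize. The sketch below follows the argument in the letter (play out enough extra tosses to settle the game, then count); the function name `share_for_h` and its parameters are assumptions made for this example, and it is not Pascal's triangle-based generalization.

```python
from fractions import Fraction
from itertools import product

def share_for_h(h_needs, t_needs):
    """Fraction of the pot owed to H, by Fermat's enumeration argument.

    Play out h_needs + t_needs - 1 further tosses (enough to settle the
    game) and count the equally likely sequences in which H gets at least
    the number of heads still needed.
    """
    tosses = h_needs + t_needs - 1
    endings = list(product('ht', repeat=tosses))
    h_wins = sum(1 for e in endings if e.count('h') >= h_needs)
    return Fraction(h_wins, len(endings))

share = share_for_h(2, 3)    # H needs 2 more heads, T needs 3 more tails
print(share)                 # 11/16
print(float(share * 100))    # 68.75
```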

Norvig
[ipynb](http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability.ipynb)

[<a href="#Probability">Back to top</a>]

# Newton-Pepys problem (1693)

Which of the following three events has the greatest chance of occurring?
\begin{eqnarray}
A&&\mbox{Six fair dice are tossed independently and at least one ``6'' appears.}\nonumber\\
B&&\mbox{Twelve fair dice are tossed independently and at least two ``6''s appear.}\nonumber\\
C&&\mbox{Eighteen fair dice are tossed independently and at least three ``6''s appear.}\nonumber
\end{eqnarray}
Pepys initially thought $C$ had the highest probability,
but Newton showed that $A$ does.

##### $P(A)$
$$
|\Omega_A|=6^6,\quad |A^c|=5^6
\quad\Rightarrow\quad
P(A)=1-P(A^c)=1-\frac{5^6}{6^6}=0.6651
$$

##### $P(B)$
$$\begin{array}{ll}
\mbox{Number of outcomes when twelve fair dice are tossed}&\displaystyle |\Omega_B|=6^{12}\nonumber\\
\mbox{Number of outcomes with no ``6''}&\displaystyle |B_0|=5^{12}\nonumber\\
\mbox{Number of outcomes with exactly one ``6''}&\displaystyle |B_1|={12\choose 1}\times 1\times 5^{11}\nonumber
\end{array}$$
$$
\Rightarrow\quad
P(B)=1-P(B_0)-P(B_1)=1-\frac{5^{12}}{6^{12}}-\frac{{12\choose 1}5^{11}}{6^{12}}
=0.6187
$$

##### $P(C)$
$$\begin{array}{ll}
\mbox{Number of outcomes when eighteen fair dice are tossed}&\displaystyle |\Omega_C|=6^{18}\nonumber\\
\mbox{Number of outcomes with no ``6''}&\displaystyle |C_0|=5^{18}\nonumber\\
\mbox{Number of outcomes with exactly one ``6''}&\displaystyle |C_1|={18\choose 1}\times 1\times 5^{17}\nonumber\\
\mbox{Number of outcomes with exactly two ``6''s}&\displaystyle |C_2|={18\choose 2}\times 1\times 1\times 5^{16}\nonumber
\end{array}$$
$$
\Rightarrow\quad
P(C)=1-P(C_0)-P(C_1)-P(C_2)
=
1-\frac{5^{18}}{6^{18}}-\frac{{18\choose 1}5^{17}}{6^{18}}
-\frac{{18\choose 2}5^{16}}{6^{18}}
=0.5973
$$
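
All three probabilities are binomial tail sums, so one small function covers them. This is a sketch under the stated assumption of independent fair dice; the name `at_least` is chosen here for illustration.

```python
from math import comb

def at_least(k, n, p=1/6):
    """P(at least k successes in n independent trials with success chance p)."""
    # Subtract the probability of seeing fewer than k successes.
    fewer = sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k))
    return 1 - fewer

for k in (1, 2, 3):
    # Columns: number of dice, required sixes, probability.
    print(6 * k, k, round(at_least(k, 6 * k), 4))
```

The printed probabilities decrease from $A$ to $C$, matching Newton's answer.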

[<a href="#Probability">Back to top</a>]

# Bertrand's ballot theorem (1887)

In an election,
candidate $A$ receives $a$ votes and candidate $B$ receives $b$ votes with $a > b$,
so $A$ wins.
If the votes are counted in a uniformly random order,
the probability $P(A)$ that $A$ is strictly ahead of $B$ throughout the count is
$$\frac{a-b}{a+b}$$

##### Count pattern as a path from $(0,0)$ to $(b,a)$

Starting from $(0,0)$,
whenever we have a new vote for $A$, we move one unit up ($U$). 
Whenever we have a new vote for $B$, we move one unit right ($R$). 
$$
AABABBABAAABAAA
\quad\stackrel{A\leftrightarrow U, B\leftrightarrow R}{\Longleftrightarrow}\quad
UURURRURUUURUUU
$$

##### Reflection principle
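A count pattern that starts with $A$ but is not strictly ahead throughout must reach a tie at some point. Reflecting the part of the path up to the first tie (swapping $U$ and $R$) gives a pattern that starts with $B$, and this map is a bijection, so the two sets of patterns have the same size.
$$\begin{array}{ll}
\mbox{Number of count patterns}&\displaystyle |\Omega|={a+b \choose b}\nonumber\\
\mbox{Number of count patterns starting with $B$}&\displaystyle |B_1|={a+b-1 \choose b-1}\nonumber\\
\mbox{Number of count patterns starting with $A$ but}\nonumber\\
\quad\mbox{failing to be strictly ahead of $B$ all the time}&\displaystyle |B_2|=|B_1|={a+b-1 \choose b-1}\nonumber\\
\mbox{Number of count patterns starting with $A$ and}\nonumber\\
\quad\mbox{being strictly ahead of $B$ all the time}&\displaystyle |A|=|\Omega|-|B_1|-|B_2|\nonumber
\end{array}$$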
$$
\Rightarrow\quad
P(A)=1-P(B_1)-P(B_2)
=1-\frac{{a+b-1 \choose b-1}}{{a+b \choose b}}-\frac{{a+b-1 \choose b-1}}{{a+b \choose b}}
=\frac{a-b}{a+b}
$$
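
For small $a$ and $b$ the formula can be checked by brute force over all count patterns. This is a sketch only: the function name `ballot_probability` and the test pairs are chosen for illustration, and the enumeration grows too quickly to be useful for large vote counts.

```python
from fractions import Fraction
from itertools import combinations

def ballot_probability(a, b):
    """Brute-force P(A strictly ahead throughout) over all count patterns."""
    n = a + b
    good = total = 0
    # A count pattern is determined by the positions of B's votes.
    for b_positions in combinations(range(n), b):
        b_set = set(b_positions)
        lead, ahead = 0, True
        for i in range(n):
            lead += -1 if i in b_set else 1
            if lead <= 0:        # tied or behind: A is not strictly ahead
                ahead = False
                break
        good += ahead
        total += 1
    return Fraction(good, total)

for a, b in [(2, 1), (3, 2), (5, 3), (7, 4)]:
    assert ballot_probability(a, b) == Fraction(a - b, a + b)
    print(a, b, ballot_probability(a, b))
```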

[<a href="#Probability">Back to top</a>]