# Probability Models and Axioms

Many events can't be predicted with total certainty. The best we can say is how **likely** they are to happen, using the idea of probability. Probability can be thought of as a mathematical framework for reasoning about uncertainty and for developing approaches to inference problems.

## Probability Model

A probability model is a mathematical description of a random phenomenon. It is defined by its **sample space**, the **events** within the sample space, and the **probabilities** associated with each event.

The sample space ($S$ or $\Omega$) of a probability model is the set of all possible outcomes.

An event $A$ is a subset of the sample space $S$.

A probability is a numerical value assigned to a given event $A$. The probability of an event is written $P(A)$, and describes the long-run relative frequency of the event. Probability is assigned using a probability law, which is defined according to the experiment.

Every probabilistic model involves an underlying process, called the **experiment**. Each time the experiment is performed, it produces exactly one of several possible outcomes.
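As a minimal sketch, assuming a finite sample space, a probability model can be held in a Python dictionary that maps each outcome to its probability. An event is then simply a set of outcomes, and the probability law adds up the probabilities of the outcomes in it. The names `fair_coin` and `prob` are illustrative, not from any library.

```python
from fractions import Fraction

# Probability law for one flip of a fair coin: outcome -> probability.
fair_coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

def prob(model, event):
    """Return P(event), where event is a set of outcomes from the model's sample space."""
    # An event must be a subset of the sample space.
    assert event <= set(model), "event must be a subset of the sample space"
    return sum((model[outcome] for outcome in event), Fraction(0))

sample_space = set(fair_coin)          # {'H', 'T'}
print(prob(fair_coin, {"H"}))          # 1/2
print(prob(fair_coin, sample_space))   # 1
```

Exact `Fraction` values are used instead of floats so that probabilities such as $\frac{1}{2}$ add up exactly.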

## Sample Space

The sample space is the set of all possible outcomes of an experiment. It is denoted by a capital $S$ or $\Omega$. For example, the sample space for flipping a coin once is $\{H, T\}$, which contains two possible outcomes: the flip results in either heads ($H$) or tails ($T$).

### Rules for sample space

Not every set can serve as a sample space. To be considered a sample space, a set must obey the following rules.

#### Mutually Exclusive

The outcomes (elements) of the sample space must be mutually exclusive. That is, when the experiment is performed, the result must correspond to one and only one element of the set; no two outcomes in the set can occur at the same time.

#### Collectively Exhaustive

The sample space must contain every possible outcome of the experiment. No result of the experiment may fall outside the sample space.
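To make the two rules concrete, the sketch below describes each candidate outcome by the set of raw die faces it covers, then checks both conditions. The function name `is_valid_sample_space` and this set-based encoding are our own illustration, assuming a single roll of a six-sided die.

```python
# Raw results of rolling a six-sided die.
faces = {1, 2, 3, 4, 5, 6}

def is_valid_sample_space(candidate, results):
    """candidate: list of sets, each holding the raw results one 'outcome' covers."""
    # Mutually exclusive: no raw result is covered by two different outcomes.
    exclusive = all(
        candidate[i].isdisjoint(candidate[j])
        for i in range(len(candidate))
        for j in range(i + 1, len(candidate))
    )
    # Collectively exhaustive: together the outcomes cover every raw result.
    exhaustive = set().union(*candidate) == results
    return exclusive and exhaustive

print(is_valid_sample_space([{2, 4, 6}, {1, 3, 5}], faces))     # True: even / odd
print(is_valid_sample_space([{1, 2, 3}, {3, 4, 5, 6}], faces))  # False: 3 is in both
print(is_valid_sample_space([{1, 2}, {3, 4}], faces))           # False: 5 and 6 missing
```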

### Representation

There are multiple ways to represent sample space.

#### List of outcomes

One way of representing a sample space is to list all possible outcomes of the experiment.

For example, suppose our experiment involves flipping a coin twice. Then our sample space will be as follows.

$$
S = \Omega = \{ HH, HT, TH, TT \}
$$
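Listing outcomes by hand becomes tedious as the number of flips grows. In Python, `itertools.product` can generate the same list mechanically; the ordering of the outcomes follows the order of the symbols passed in.

```python
from itertools import product

# Each outcome of two flips is an ordered pair of H/T results, joined into a string.
sample_space = ["".join(flips) for flips in product("HT", repeat=2)]
print(sample_space)  # ['HH', 'HT', 'TH', 'TT']
```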

#### Tree based sequential description

Another way of representing a sample space is as a tree that describes the experiment sequentially. Each level of the tree corresponds to one stage of the experiment, each branch to a possible result at that stage, and each path from the root to a leaf to one possible outcome. The leaves together make up the sample space.

For example, suppose our experiment involves flipping a coin twice. Then our sample space will be as follows.

![Tree Sample Space](static/img/notes/mathematics/intro-probability/sample_space_tree.png)
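The tree description can be mirrored by a recursive walk: each level branches on the result of one stage, and reaching a leaf yields one complete outcome. The helper `leaves` below is a hypothetical illustration, assuming each stage is given as a string of its possible single-character results.

```python
def leaves(stages, path=""):
    """Walk a tree whose level k branches on the results of stage k.

    Each root-to-leaf path is one outcome, so the leaves together form the sample space.
    """
    if not stages:               # reached a leaf: the path is a complete outcome
        return [path]
    first, rest = stages[0], stages[1:]
    outcomes = []
    for result in first:         # one branch per possible result of this stage
        outcomes.extend(leaves(rest, path + result))
    return outcomes

print(leaves(["HT", "HT"]))  # ['HH', 'HT', 'TH', 'TT']
```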

### Discrete Sample Space

A sample space $\Omega$ is said to be discrete if it is a finite set or a countably infinite set.

Examples of discrete sample spaces include those for flipping a coin and rolling a die.

### Continuous Sample Space

A sample space $\Omega$ is called continuous if it contains an uncountably infinite set of possible outcomes.

Consider the example below, in which our experiment is to throw a dart and observe where it lands inside the unit square bounded by $x=1$ and $y=1$.

![Continuous Sample Space](static/img/notes/mathematics/intro-probability/continous_sample_space.png)

Here the sample space is the infinite set of all points (with real coordinates) in the square region, so it is defined as follows:

$$
S = \Omega = \{ (x, y) | 0 \leqslant x, y \leqslant 1 \}
$$

**Note:**

Continuous sample spaces are interesting because they contain infinitely many outcomes. In the above example, the probability of hitting a particular point, say $(\frac{1}{2}, \frac{1}{3})$, with infinite precision is **zero**. Yet the probability of the whole sample space is $1$, because the dart will definitely land somewhere in the 1 by 1 square. To get around this, we calculate the probability of an **event**, which is a collection of many individual points (a subset of the sample space), instead of the probability of an individual point. For consistency, we work with events in the discrete case as well.
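A short argument shows why the single point has probability zero, assuming the dart's landing point is uniform so that the probability of a region equals its area. For any small $\epsilon > 0$, the point $(\frac{1}{2}, \frac{1}{3})$ lies inside a small square $Q_\epsilon$ of side $\epsilon$ contained in the unit square, and an event contained in another event cannot have a larger probability:

$$
0 \leqslant P\left(\left\{\left(\tfrac{1}{2}, \tfrac{1}{3}\right)\right\}\right) \leqslant P(Q_\epsilon) = \epsilon^2
$$

Since this holds for every $\epsilon > 0$, the probability of the single point must be $0$.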

## Probability Axioms

Every probabilistic model must follow these three rules called the axioms of probability.

### Nonnegativity

This axiom states that the probability of an event cannot be negative.

Let $A$ be any event,

$$
P(A) \geqslant 0
$$

### Normalization

The normalization axiom states that the probability of the entire sample space is always $1$.

$$
P(\Omega) = 1
$$

### Additivity

The additivity axiom states that if two events are disjoint (they have no outcomes in common), then the probability that one or the other occurs is the sum of their individual probabilities.

Let $A$ and $B$ be two events,

$$
\text{If } A \cap B = \emptyset, \; \text{then} \; P(A \cup B) = P(A) + P(B)
$$
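For a finite sample space, the axioms can be checked directly. The sketch below, assuming a fair six-sided die and a dictionary-based model of our own design, checks nonnegativity and normalization, then shows why additivity requires disjoint events: for overlapping events, adding the probabilities counts the shared outcome twice.

```python
from fractions import Fraction

# A finite probability law for one roll of a fair die: outcome -> probability.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def satisfies_axioms(model):
    """Check nonnegativity and normalization for a finite model."""
    nonnegative = all(p >= 0 for p in model.values())
    normalized = sum(model.values()) == 1
    return nonnegative and normalized

def prob(model, event):
    """P(event) as the sum of its single-outcome probabilities (finite additivity)."""
    return sum((model[o] for o in event), Fraction(0))

A, B, C = {1, 2}, {5}, {2, 3}
print(satisfies_axioms(die))                             # True
# A and B are disjoint, so additivity applies.
print(prob(die, A | B) == prob(die, A) + prob(die, B))   # True
# A and C share outcome 2, so adding their probabilities counts it twice.
print(prob(die, A | C), prob(die, A) + prob(die, C))     # 1/2 2/3
```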


## Uniform Law

### Discrete Case

Consider rolling a fair die twice, and let $X$ and $Y$ be the results of the first and second rolls. We want the probability that $X = 1 \; \text{and} \; Y = 1$ or $X = 1 \; \text{and} \; Y = 2$.

Let $A$ be the event where $X=1 \; \text{and} \; Y=1$ and $B$ be the event where $X=1 \; \text{and} \; Y=2$. These events are disjoint, so

$$
\begin{align}
P(A \cup B) &= P(A) + P(B) \tag{Additivity} \\
&= \frac{1}{36} + \frac{1}{36} \tag{Each outcome is equally likely} \\
&= \frac{2}{36} \\
&= \frac{1}{18} \\
\end{align}
$$

The probability of $X = 1 \; \text{and} \; Y = 1$ or $X = 1 \; \text{and} \; Y = 2$ is $\frac{1}{18}$.

Another way of looking at the problem is to list all the possible outcomes (the sample space) and then count the outcomes we are interested in.

So our sample space would be,

$$
S = \Omega = \{ (1,1), (1,2), (1,3), \dots, (6,6) \}
$$

Of these, we are interested in two outcomes, $(1, 1)$ and $(1, 2)$. So the probability is the number of outcomes we are interested in divided by the total number of possible outcomes, which is $\frac{2}{36}$ or $\frac{1}{18}$. This is called the **discrete uniform law**.

It states that if all outcomes are equally likely and $A$ is some event, then

$$
P(A) = \frac{\text{number of elements of } A}{\text{total number of elements in the sample space}}
$$
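The counting argument can be reproduced by enumerating the sample space. This sketch, assuming all 36 ordered pairs from two rolls of a fair die are equally likely, builds the pairs with `itertools.product` and applies the discrete uniform law with exact fractions.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling a fair die twice: all ordered pairs (X, Y).
omega = list(product(range(1, 7), repeat=2))

# Event of interest: (X, Y) is (1, 1) or (1, 2).
event = [(x, y) for (x, y) in omega if x == 1 and y in (1, 2)]

# Discrete uniform law: outcomes in the event divided by total outcomes.
p = Fraction(len(event), len(omega))
print(len(omega), len(event), p)  # 36 2 1/18
```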


### Continuous Case

For a continuous sample space such as the unit square, the uniform law says that the probability of an event is its area (the whole square has area $1$).

For example.

![Continuous Sample Space Example](static/img/notes/mathematics/intro-probability/continous_sample_space_example.png)

To calculate the probability of the shaded region where $X + Y \leqslant \frac{1}{2}$, we calculate the area of the shaded region, generally using integrals. Here, since the region is a simple triangle with base $\frac{1}{2}$ and height $\frac{1}{2}$, we can use the formula for the area of a triangle. So the probability of the event is

$$
P(A) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{8}
$$
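When a region is awkward to integrate, its area can be estimated by sampling uniform points in the unit square and counting the fraction that land in the event. This is a Monte Carlo sketch, not part of the uniform law itself: the helper `estimate_area`, the sample size, and the seed are arbitrary choices, and the triangle $x + y \leqslant \frac{1}{2}$ (both legs of length $\frac{1}{2}$) is assumed to be the shaded region whose area of $\frac{1}{8}$ is computed above.

```python
import random

def estimate_area(in_event, n=200_000, seed=0):
    """Monte Carlo estimate of P(A) for a uniform point in the unit square.

    in_event(x, y) returns True when the point (x, y) belongs to the event A.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    hits = sum(in_event(rng.random(), rng.random()) for _ in range(n))
    return hits / n

# Assumed shaded event: the triangle x + y <= 1/2 in the corner at the origin.
estimate = estimate_area(lambda x, y: x + y <= 0.5)
print(round(estimate, 3))  # approximately 0.125
```

The estimate only approximates $\frac{1}{8}$; its error shrinks roughly like $1/\sqrt{n}$ as the number of samples grows.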


## Countable Additivity Axiom

The additivity axiom extends to countably infinite sequences of disjoint events, as follows.

Let $A_1, A_2, A_3, \dots$ be an infinite sequence of disjoint events. Then

$$
P(A_1 \cup A_2 \cup A_3 \cup ... ) = P(A_1) + P(A_2) + P(A_3) + ...
$$

For example.

Let our experiment be flipping a fair coin until we get a head, and record the number of flips needed. The sample space is then the set of positive integers, which is countably infinite.

$$
S = \Omega = \{1, 2, 3, ... \}
$$

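The first head appears on flip $n$ exactly when the first $n - 1$ flips are tails and flip $n$ is a head, so $P(n) = \frac{1}{2^n}$ and the probabilities form a geometric sequence: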

$$
P(1) = \frac{1}{2}, P(2) = \frac{1}{4}, P(3) = \frac{1}{8}, ...
$$
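Normalization can be checked for this model: by countable additivity, $P(\Omega)$ is the sum of the whole series, and its partial sums are $1 - \frac{1}{2^N}$, which approach $1$. The short sketch below, assuming a fair coin, confirms the partial-sum formula with exact fractions; the function name `p_first_head` is our own.

```python
from fractions import Fraction

def p_first_head(n):
    """P(first head on flip n) for a fair coin: n - 1 tails followed by a head."""
    return Fraction(1, 2 ** n)

# Partial sums P(1) + P(2) + ... + P(N) equal 1 - 1/2**N, which tends to 1.
for N in (1, 5, 10, 20):
    partial = sum(p_first_head(n) for n in range(1, N + 1))
    assert partial == 1 - Fraction(1, 2 ** N)
    print(N, float(partial))
```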

Now suppose we want the probability that the first head appears on an even-numbered flip, that is, the event $\{2, 4, 6, \dots\}$. Then

$$
\begin{align}
P(\{2, 4, 6, \dots\}) &= P(2) + P(4) + P(6) + \dots \tag{Countable Additivity Axiom} \\
&= \frac{1}{2^2} + \frac{1}{2^4} + \frac{1}{2^6} + \dots \\
&= \frac{1/4}{1 - 1/4} \tag{Sum of geometric progression} \\
&= \frac{1}{3}
\end{align}
$$

Thus the probability that the first head appears on an even-numbered flip is $\frac{1}{3}$, or about 33%.
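Both the series and the final answer can be checked numerically. The sketch below, assuming a fair coin simulated with Python's `random` module (the seed and trial count are arbitrary), compares an exact partial sum of the even-indexed terms with the fraction of simulated runs whose first head appears on an even-numbered flip. The helper `flips_until_head` is a hypothetical name.

```python
import random
from fractions import Fraction

def flips_until_head(rng):
    """Flip a fair coin until the first head; return how many flips it took."""
    flips = 1
    while rng.random() >= 0.5:   # values in [0.5, 1) count as tails
        flips += 1
    return flips

# Exact partial sum over even n up to 2K: 1/4 + 1/16 + ... + 1/4**K = (1 - 1/4**K) / 3.
K = 10
exact_partial = sum(Fraction(1, 4 ** k) for k in range(1, K + 1))
assert exact_partial == (1 - Fraction(1, 4 ** K)) / 3

# Simulation: fraction of trials whose first head lands on an even-numbered flip.
rng = random.Random(1)
trials = 100_000
even = sum(flips_until_head(rng) % 2 == 0 for _ in range(trials))
print(float(exact_partial), even / trials)  # both close to 1/3
```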
