## Introduction.

The mathematical theory of probability gains practical value and an intuitive meaning in connection with real or conceptual experiments such as tossing a coin once, tossing a coin one hundred times, throwing three dice, arranging a deck of cards, matching two decks of cards, playing roulette, or observing the life span of a radioactive atom or a person, and with phenomena such as the number of busy trunklines in a telephone exchange, random noise in an electrical communication system, routine quality control of a production process, the frequency of accidents, and the position of a particle under diffusion.

All these descriptions are rather vague, and, in order to render the theory meaningful, we have to agree on what we mean by possible results of the experiment or observation in question.

---
**Definition.** (Event) The results of experiments or observations are called *events*.

---

We speak of the event that (i) rolling a die resulted in face value $6$, (ii) tossing a coin resulted in heads, (iii) of five coins tossed, more than three fell heads.

We shall distinguish between compound (decomposable) and simple (elementary) events. For example, saying that a throw with two dice resulted in "sum six" amounts to saying that it resulted in one of the outcomes

$$\{(1,5),(2,4),(3,3),(4,2),(5,1)\}$$

and this enumeration decomposes the event "sum six" into five simple events. Similarly, the event "two odd faces" admits of the decomposition 

$$\begin{align*}\{&(1,1),(1,3),(1,5),\\&(3,1),(3,3),(3,5),\\&(5,1),(5,3),(5,5)\}\end{align*}$$

into nine simple events. Note that if a throw results in $(3,3)$, then the same throw also results in the events "sum six" and "two odd faces"; these events are not mutually exclusive and hence may occur simultaneously.
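
The decomposition of compound events into simple events can be checked by brute-force enumeration. The following is a minimal sketch, assuming a throw of two dice is modelled as an ordered pair of face values; the names `two_dice`, `sum_six` and `two_odd_faces` are illustrative and not part of the text.

```python
from itertools import product

# Sample space for a throw of two dice: ordered pairs (first die, second die).
two_dice = set(product(range(1, 7), repeat=2))

# Compound events written as sets of simple events.
sum_six = {(i, j) for (i, j) in two_dice if i + j == 6}
two_odd_faces = {(i, j) for (i, j) in two_dice if i % 2 == 1 and j % 2 == 1}

print(sorted(sum_six))                  # the five pairs of "sum six"
print(len(two_odd_faces))               # 9
print(sorted(sum_six & two_odd_faces))  # outcomes belonging to both events
```

The last line shows that the two events share more than one simple event: besides $(3,3)$, the outcomes $(1,5)$ and $(5,1)$ also belong to both.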

## The Sample Space.

(a) Distribution of three balls in three cells.

$$\begin{bmatrix}
abc & - & - \\
- & abc & - \\
- & - & abc \\
ab & c & -\\
ac & b & - \\
bc & a & - \\
ab & - & c\\
ac & - & b \\
bc & - & a
\end{bmatrix}\quad\begin{bmatrix}
a & bc & - \\
b & ac & - \\
c & ab & - \\
a & - & bc\\
b & - & ca \\
c & - & ab \\
- & a & bc\\
- & b & ca \\
- & c & ab
\end{bmatrix}\quad\begin{bmatrix}
- & ab & c \\
- & ac & b \\
- & bc & a \\
a & b & c\\
a & c & b \\
b & a & c \\
b & c & a\\
c & a & b \\
c & b & a
\end{bmatrix}
$$

If we want to speak about experiments or observations in a theoretical way and without ambiguity, we must first agree on the simple events representing the thinkable outcomes; they define the idealized experiment. It is usual to refer to simple events such as $\{a|b|c\}$ as **sample points**, or **points** for short.

---
**Definition.** (Sample point) Every indecomposable result of an experiment is a simple event, and is called a sample point.

---

When distributing $3$ balls to $3$ cells, each of the balls $a$, $b$, $c$ can be assigned to one of three cells, so there are $3 \times 3 \times 3 = 3^3 = 27$ sample points. All events connected with this experiment can be described in terms of the sample points. For example, the event "each cell is singly occupied" can be described by the collection of the following $6$ sample points:

$$\{(a,b,c),(a,c,b),(b,a,c),(b,c,a),(c,a,b),(c,b,a)\}$$

The experiment of distributing $10$ balls to $10$ cells has $10^{10}$ outcomes or sample points.
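
The count of $27$ sample points and the $6$ points of the event "each cell is singly occupied" can be reproduced by enumeration. This is only a sketch under one assumed encoding: a sample point is stored as a tuple giving the cell of ball $a$, the cell of ball $b$ and the cell of ball $c$, which differs from the cell-by-cell notation used in the table above but describes the same outcomes.

```python
from itertools import product

balls = ["a", "b", "c"]
cells = [1, 2, 3]

# A sample point assigns a cell to each ball: (cell of a, cell of b, cell of c).
sample_space = list(product(cells, repeat=len(balls)))

# "Each cell is singly occupied": the three assigned cells are all different.
singly_occupied = [s for s in sample_space if len(set(s)) == len(cells)]

print(len(sample_space))     # 27
print(len(singly_occupied))  # 6
```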

---
**Definition.** (Sample space) The sample space $S$ of an experiment is the collection of all possible outcomes of the experiment, i.e. all the sample points.

---

The sample space of an experiment can be finite, countably infinite (similar to the set $\mathbf{N}$) or uncountable (similar to the set $\mathbf{R}$). When the sample space is finite, we can visualize and enumerate it, as in the example above of the distribution of $3$ balls in $3$ cells.

(b) Distribution of $r$ balls in $n$ cells. The more general case of distributing $r$ balls to $n$ cells can be studied in the same manner. Each ball $i=1,2,\ldots,r$ can be assigned to one of the cells $1,2,\ldots,n$. So, this experiment has $\underbrace{n \times n \times \cdots \times n}_{r\text{ factors}} = n^r$ outcomes or sample points.
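
The $n^r$ count can be checked numerically for small values. The helper below is a hypothetical illustration (the name `occupancy_sample_space` is not from the text); it treats the balls as distinguishable and numbers the cells $1,\ldots,n$.

```python
from itertools import product

def occupancy_sample_space(r, n):
    """Return an iterator over all assignments of r distinguishable balls to n cells.

    Each sample point is a tuple of length r whose i-th entry is the
    cell (numbered 1..n) that receives ball i.
    """
    return product(range(1, n + 1), repeat=r)

# Compare the enumerated count with n**r for a few small (r, n) pairs.
for r, n in [(3, 3), (2, 6), (4, 2), (5, 7)]:
    count = sum(1 for _ in occupancy_sample_space(r, n))
    print(r, n, count, n ** r)
```

Enumeration is only practical for small $r$ and $n$; already for $r = n = 10$ the sample space has $10^{10}$ points.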

To us, the sample space defines the idealized experiment. We use the *picturesque language* of balls and cells, but the same sample space admits a great variety of different practical interpretations. To clarify this point, we list here a number of situations in which the intuitive background varies; all are, however, abstractly equivalent to the scheme of placing $r$ balls into $n$ cells, in the sense that the outcomes differ only in their verbal description.

(b,1) *Birthdays.* The possible configurations of the birthdays of $r$ people correspond to the different arrangements of $r$ balls in $n=365$ cells (assuming one calendar year to have $365$ days).

(b,2) *Accidents.* Classifying $r$ accidents according to the week days when they occurred is equivalent to placing $r$ balls into $n=7$ cells.

(b,3) In *firing* at $n$ targets, the hits correspond to the balls, and the targets to the cells.

(b,4) *Sampling*. Let a group of $r$ people be classified according to, say, age or profession. The classes play the role of our cells, the people that of balls.

(b,5) *Irradiation in biology.* When the cells of the retina of the eyes are exposed to light, the light particles play the role of balls, and the actual cells are the "cells" of our model.

(b,6) In *cosmic ray experiments* the particles hitting the Geiger counters represent the balls, and the counters function as cells.

(b,7) An elevator starts with $r$ passengers and stops at $n$ floors. The different arrangements of discharging the passengers are replicas of the different distributions of $r$ balls in $n$ cells. 

(b,8) *Dice*. The possible outcomes of a throw of $r$ dice correspond to placing $r$ balls into $n=6$ cells. When tossing a coin, we are in effect dealing with only $n=2$ cells.

(b,9) *Random digits.* The possible orderings of a sequence of $r$ digits correspond to the distribution of $r$ balls into ten cells called $0,1,2,\ldots,9$.

(b,10) The *sex distribution* of $r$ persons. Here we have $n=2$ cells and $r$ balls.

(b,11) *Theory of photographic emulsions.* A photographic plate is covered with grains sensitive to light quanta: a grain reacts if it is hit by a certain number, $r$, of quanta. For the theory of black-white contrast, we must know how many cells are likely to be hit by $r$ quanta. We have here an occupancy problem where the grains correspond to the cells, and the light quanta to the balls.

We can now make the following statement about sample space and events.

Every thinkable outcome of a random experiment can be described by a sample point. We define the term event to mean the same as any aggregate of sample points.

---
Example. (Coin Flips). A coin is flipped $10$ times. Writing Heads as $H$ and Tails as $T$, a possible outcome is $HHHTHHTTHT$, and the sample space is the set of all possible strings of length $10$ of $H$s and $T$s. We can (and will) encode $H$ as $1$ and $T$ as $0$, so that an outcome is a sequence $(s_1,\ldots,s_{10})$ with $s_j \in \{0,1\}$ and the sample space $S$ is the set of all such sequences. Now, let's look at some events:

---

Let $A_1$ be the event that the first flip is Heads. As a set,

$$A_1 = \{(1,s_2,\ldots,s_{10}) : s_j \in \{0,1\} \text{ for }2 \leq j \leq 10 \}$$

This is a subset of the sample space, so it is indeed an event; saying that $A_1$ occurs is the same thing as saying that the first flip is Heads. Similarly, let $A_j$ be the event that the $j$th flip is Heads for $j=2,3,4,\ldots,10$.

Let $B$ be the event that at least one flip was Heads. As a set,

$$B = \bigcup_{j=1}^{10} A_j$$

Let $C$ be the event that all the flips were Heads. As a set, 

$$C = \bigcap_{j=1}^{10} A_j$$

Let $D$ be the event that there were at least two consecutive heads. As a set,

$$D = \bigcup_{j=1}^{9}(A_j \cap A_{j+1})$$
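
These events can also be built explicitly as sets, which is a useful sanity check on the set notation: "at least one" translates into a union and "all" into an intersection. The sketch below is illustrative only; `S`, `A`, `B`, `C`, `D` mirror the symbols in the text, while `N_FLIPS` is an assumed helper constant.

```python
from itertools import product

N_FLIPS = 10
S = set(product((0, 1), repeat=N_FLIPS))  # 1 = Heads, 0 = Tails

# A[j] is the event "the j-th flip is Heads", for j = 1, ..., 10.
A = {j: {s for s in S if s[j - 1] == 1} for j in range(1, N_FLIPS + 1)}

B = set.union(*A.values())         # at least one flip is Heads
C = set.intersection(*A.values())  # all flips are Heads
# At least two consecutive Heads: flips j and j+1 are both Heads for some j.
D = set.union(*(A[j] & A[j + 1] for j in range(1, N_FLIPS)))

print(len(S), len(A[1]), len(B), len(C), len(D))
```

For instance, `B` contains every sequence except the all-Tails one, and `C` contains only the all-Heads sequence.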

---
Example. (Pick a card, any card). A standard deck of playing cards has $13$ ranks, $\{2,3,4,5,6,7,8,9,10,\text{Jack},\text{Queen}, \text{King}, \text{Ace}\}$, and $4$ suits, $\{\text{Heart}, \text{Spade}, \text{Club}, \text{Diamond}\}$, giving $13 \times 4 = 52$ cards. Pick a card from a standard deck of $52$ cards. The sample space $S$ is the set of all $52$ cards, so there are $52$ sample points, one for each card. Consider the following four events:

- $A$: card is an ace.
- $B$: card has a black suit.
- $D$: card is a diamond.
- $H$: card is a heart.
---

As a set, $H$ consists of $13$ cards: 

$\{\text{Ace of Hearts},\text{Two of Hearts},\ldots,\text{King of Hearts}\}$

We can create various other events in terms of $A,B,D,H$. For example, $A \cap H$ is the event that the card is the Ace of Hearts, $A \cap B$ is the event $\{\text{Ace of Spades},\text{Ace of Clubs}\}$ and $A \cup D \cup H$ is the event that the card is red or an ace.

There are *many* other events that could be defined using this sample space. In fact, the counting methods introduced later in this chapter show that there are $2^{52} \approx 4.5 \times 10^{15}$  events in this problem, even though there are only $52$ sample points.
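
The card events lend themselves to the same kind of check. The following is a sketch, assuming each card is encoded as a `(rank, suit)` tuple; this encoding is an illustrative choice, not something fixed by the example.

```python
from itertools import product

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10",
         "Jack", "Queen", "King", "Ace"]
suits = ["Heart", "Spade", "Club", "Diamond"]

S = set(product(ranks, suits))  # 52 sample points, one per card

A = {card for card in S if card[0] == "Ace"}              # card is an ace
B = {card for card in S if card[1] in ("Spade", "Club")}  # card has a black suit
D = {card for card in S if card[1] == "Diamond"}          # card is a diamond
H = {card for card in S if card[1] == "Heart"}            # card is a heart

print(A & H)           # only the Ace of Hearts
print(sorted(A & B))   # the two black aces
print(len(A | D | H))  # red cards together with the black aces
print(2 ** len(S))     # number of subsets of S, i.e. number of events
```

The last line counts events as subsets of $S$: each of the $52$ sample points is either in a given event or not, which gives $2^{52}$ possibilities.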


## Relation amongst events.

---
**Definition**. We use the notation $A = 0$ or $A = \emptyset$ to express that the event $A$ is a null event and contains no sample points (is impossible). The zero must be interpreted in a symbolic sense and not as a numeral. 

---

---
**Definition.** The event consisting of all points not contained in the event $A$ will be called the complementary event (or negation of $A$) and will be denoted by $A^C$. In particular, $S^C = \emptyset$.

---

With any two events $A$ and $B$ one can associate two new events defined by the conditions "both $A$ and $B$ occur" and "at least one of $A$ and $B$ occurs". These events are denoted by $A \cap B$ and $A \cup B$ respectively. The event $A \cap B$ contains all the sample points which are common to $A$ and $B$. If $A$ and $B$ exclude each other, then there are no points common to $A$ and $B$ and the event $A \cap B$ is impossible. Analytically, this situation is described by $A \cap B = \emptyset$, which should be read as "$A$ and $B$ are mutually exclusive".

Mathematically,

$$A \cup B = \{x : x \in A \text{ or } x \in B\}$$

and

$$A \cap B = \{x : x \in A \text{ and } x \in B\}$$

The event $A \cap B^C$ means that both $A$ and $B^C$ occur; in other words, $A$ but not $B$ occurs. This is also known as the set difference and is denoted by $A - B$.

Similarly, $A^C \cap B^C$ means that neither $A$ nor $B$ occurs. The event $A \cup B$ means that at least one of the events $A$ and $B$ occurs; it contains all sample points except those that belong neither to $A$ nor to $B$. Thus,

$$A \cup B = (A^C \cap B^C)^C$$

### De Morgan's Laws



---
**Theorem.** Let $A$ and $B$ be any two events. Then:

$$\begin{align*}
(A \cup B)^C &= A^C \cap B^C\\
(A \cap B)^C &= A^C \cup B^C
\end{align*}
$$

---

*Proof.*

(I) We prove the first identity by showing that each side is contained in the other.

$\subseteq$ direction.

Let $x$ be a point in $(A \cup B)^C$. Then $x \notin A \cup B$, so $x$ belongs to neither $A$ nor $B$. Consequently, $x \in A^C$ and $x \in B^C$, that is, $x \in A^C \cap B^C$. So, $(A \cup B)^C \subseteq A^C \cap B^C$.

$\supseteq$ direction.

Let $x$ be a point in $A^C \cap B^C$. By definition, $x$ belongs to both the sets $A^C$ and $B^C$. So, $x \notin A$ and $x \notin B$, which means that $x \notin A \cup B$. Thus, $x \in (A \cup B)^C$, and so $A^C \cap B^C \subseteq (A \cup B)^C$. The two inclusions together give $(A \cup B)^C = A^C \cap B^C$.

(II) The second identity can be proved in the same way; it is an easy exercise.
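
As a hint for (II), one possible route (a sketch, not necessarily the intended solution) is to apply (I) to the complementary events $A^C$ and $B^C$, and to use the fact that $(E^C)^C = E$ for any event $E$:

$$\begin{align*}
(A^C \cup B^C)^C &= (A^C)^C \cap (B^C)^C = A \cap B\\
(A \cap B)^C &= \left((A^C \cup B^C)^C\right)^C = A^C \cup B^C
\end{align*}
$$

The first line is (I) with $A^C$ and $B^C$ in place of $A$ and $B$; the second line takes complements of both sides of the first.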

## Discrete Sample Space.

---
**Definition.** (Discrete Sample Space) A sample space is called discrete if it contains only finitely many points or countably infinitely many points. A countably infinite set is one whose elements can be put in one-to-one correspondence with the natural numbers.

---

(a) Let us toss a coin as often as necessary to turn up one head. The points of the sample space can then be enumerated as:

$$\{H,TH,TTH,TTTH,TTTTH,\ldots\}$$

This set is countably infinite, and hence it is a discrete sample space.

(b) Let us pick a real number at random from the interval $[0,1]$. The points of this sample space cannot be enumerated: there is no way to list them all, because no one-to-one correspondence exists between the points in $[0,1]$ and the natural numbers. Hence, this sample space

$$S = \{s : s\in [0,1]\}$$

is not discrete.

## Basic definitions and conventions.

---
**Fundamental Convention.** Given a discrete sample space $S$ with sample points $\{s_1,s_2,\ldots\}$, we shall assume that with each point $s_j$ there is associated a number, called the probability of $s_j$ and denoted by $P(\{s_j\})$. This number is non-negative and at most unity. We have the following axioms:

(1) The probability of any event $A$ is the sum of the probabilities of all sample points in it. 

$$P(A) = \sum_{s \in A}P(\{s\})$$

(2) By convention, the probability of the entire sample space $S$ is taken to be unity.

$$P(S) = \sum_{s \in S} P(\{s\}) = 1$$

Intuitively, this holds because, when a random experiment is performed, no matter what the outcome, it always belongs to $S$. So, the event $S$ is a certain event. 

(3) The probability of an event $A$ lies between $0$ and $1$.

Since $A \subseteq S$, the sample points in $A$ are a subset of the sample points in $S$. Their total contribution $\sum_{s \in A}P(\{s\})$ must be non-negative and at most $\sum_{s \in S}P(\{s\}) = 1$. Hence,

$$0 \leq P(A) \leq 1$$

---
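
The convention above translates directly into a small computation: store one probability per sample point and sum over the points of an event. The sketch below uses one roll of a fair die as an assumed example (it is not taken from the text), and the names `point_prob` and `prob` are hypothetical.

```python
from fractions import Fraction

# Assumed example: point probabilities for one roll of a fair die.
point_prob = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event, point_prob):
    """Return P(event) as the sum of the probabilities of its sample points."""
    return sum((point_prob[s] for s in event), Fraction(0))

S = set(point_prob)  # the whole sample space
even = {2, 4, 6}     # the event "the face is even"

print(prob(even, point_prob))   # 1/2
print(prob(S, point_prob))      # 1
print(prob(set(), point_prob))  # 0, the impossible event
```

Exact fractions are used instead of floats so that the check $P(S) = 1$ is not affected by rounding.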

### Boole's Inequality.

Consider now two arbitrary events $A_1$ and $A_2$. To compute the probability $P(A_1 \cup A_2)$ that at least one of $A_1$ and $A_2$ occurs, we have to add the probabilities of the sample points contained in either $A_1$ or $A_2$, but each point is to be counted only once. We have the following inequality:

$$P(A_1 \cup A_2) \leq P(A_1) + P(A_2)$$

Now, if $s$ is any point contained in both $A_1$ and $A_2$, then $P(\{s\})$ occurs twice on the right-hand side but only once on the left-hand side. Therefore, the right side exceeds the left side by the amount $P(A_1 \cap A_2)$.

### Inclusion-Exclusion Principle.

---
**Theorem.** For any two events $A_1$ and $A_2$, the probability that at least one of $A_1$ and $A_2$ occurs is given by

$$P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2)$$

If $A_1$ and $A_2$ are mutually exclusive, that is, if $A_1 \cap A_2 = \emptyset$, then $P(A_1 \cap A_2) = 0$ and the above equation reduces to:

$$P(A_1 \cup A_2) = P(A_1) + P(A_2)$$

---

---
Example. A coin is tossed twice. For the sample space, we take the four points $HH$, $HT$, $TH$ and $TT$ and associate with each the probability $\frac{1}{4}$. Let $A_1$ and $A_2$ be respectively the events, "head at first" and "head at second" trial.

---
Then, $A_1$ consists of $\{HH,HT\}$ and $A_2$ consists of $\{HH,TH\}$. Furthermore, $A_1 \cup A_2$ consists of $\{HH,HT,TH\}$ and $A_1 \cap A_2$ consists of the single point $HH$. Thus,

$$P(A_1 \cup A_2) = \frac{1}{2} + \frac{1}{2} - \frac{1}{4} = \frac{3}{4}$$
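
The same numbers can be verified mechanically. This is a minimal sketch of the two-toss example, with outcomes encoded as two-letter strings; the helper name `prob` is illustrative.

```python
from fractions import Fraction

# Sample space of two tosses, each point with probability 1/4 as in the example.
S = {"HH", "HT", "TH", "TT"}
p = {s: Fraction(1, 4) for s in S}

def prob(event):
    """Return P(event) as the sum of the point probabilities."""
    return sum((p[s] for s in event), Fraction(0))

A1 = {s for s in S if s[0] == "H"}  # head at first trial
A2 = {s for s in S if s[1] == "H"}  # head at second trial

lhs = prob(A1 | A2)
rhs = prob(A1) + prob(A2) - prob(A1 & A2)
print(lhs, rhs)                    # 3/4 3/4
print(lhs <= prob(A1) + prob(A2))  # True, in line with Boole's inequality
```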


Historically, the earliest definition of the probability of an event was to count the number of ways the event could happen and divide by the total number of outcomes of the experiment. We call this the naive definition of probability; it relies on strong assumptions. Nevertheless, it is important to understand, and it is extremely useful when not misused.

---
**Definition.** (Naive Definition of Probability). Let $A$ be an event for an experiment with a finite sample space $S$. The probability of the event $A$ is

$$P(A) = \frac{\text{number of sample points in } A}{\text{number of sample points in } S} = \frac{n(A)}{n(S)}$$

provided that all sample points are equally likely (equiprobable).

---
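
Under the naive definition, computing a probability reduces to counting. The following sketch uses a hypothetical helper `naive_probability` and, as an assumed example, two fair dice whose $36$ ordered outcomes are taken to be equally likely.

```python
from fractions import Fraction
from itertools import product

def naive_probability(event, sample_space):
    """Return n(A) / n(S), assuming every sample point is equally likely."""
    sample_space = set(sample_space)
    event = set(event) & sample_space  # ignore points outside S
    return Fraction(len(event), len(sample_space))

# Two fair dice: 36 equally likely ordered pairs (an assumption of this example).
S = set(product(range(1, 7), repeat=2))
sum_six = {(i, j) for (i, j) in S if i + j == 6}

print(naive_probability(sum_six, S))  # 5/36
```

The helper is only meaningful when the equal-likelihood assumption holds; applied to a sample space whose points are not equiprobable, it would return a wrong answer without any warning.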
