# Lesson 1
## Probability theory
### Random events. Conditional probability. Bayes formula.  Independent trials.

A **random event** is one that may or may not happen under certain conditions.

Examples of a random event:
1. When two dice are rolled, the number 1 appears on one and the number 2 appears on the other.
2. A bank customer has not paid back a loan.
3. The temperature in Moscow over the last ten days has not exceeded 29 degrees Celsius.
4. A coin is flipped a hundred times and heads comes up 55 times.

An event is called **certain** if it is bound to happen as a result of the trial.

An **impossible** event never happens.

Examples of a certain event:
1. A number not exceeding 6 is rolled on a die.
2. A coin is flipped and lands on either heads or tails.
3. A coin is flipped a hundred times and tails comes up no more than 100 times.

Examples of an impossible event:
1. Two dice are rolled once and the sum of the numbers is 15.
2. A coin was tossed a hundred times and tails came up 55 times and heads came up 56 times.
3. Three dice were rolled once and the sum of the numbers was 2.

For a random event, the **relative frequency** is the ratio of the number of trials in which the event occurred to the total number of trials:

$$W(A) = \frac{m}{n}$$

Here $W(A)$ is the relative frequency of event $A$;

$m$ is the number of trials in which event $A$ occurred;

$n$ is the total number of trials.

Let's look at examples of random events.

**Example 1**

Simulate 60 rolls of a die with the function [random.randint](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.randint.html) from the **numpy** library, so $n$ = 60.

Let event $A$ be rolling the number 3, and find its relative frequency.

In [None]:
import numpy as np

n = 60

b = np.random.randint(1, 7, size=n)
b

array([6, 4, 5, 2, 1, 1, 3, 4, 5, 3, 2, 5, 2, 4, 6, 1, 5, 3, 6, 3, 6, 1,
       1, 1, 4, 4, 5, 2, 5, 1, 6, 4, 2, 5, 3, 3, 2, 2, 3, 2, 3, 3, 2, 3,
       1, 2, 5, 6, 4, 2, 2, 1, 4, 3, 3, 2, 3, 4, 5, 3])

Count the number of trials in which the number 3 was rolled, that is, in which event $A$ was observed:

In [None]:
m = len(b[b==3])
m

14

Now we can calculate the relative frequency of the event $A$:

In [None]:
W = m / n
W

0.23333333333333334

**Example 2**

Let's look at a more complicated example: simulate rolling two dice.

We will find the relative frequency of a random event $B$ in which the first die shows 1 and the second die shows 2.

We will run 360 trials for this purpose. First set the number $n$:

In [None]:
n = 360

In [None]:
c = np.random.randint(1, 7, size=n)
d = np.random.randint(1, 7, size=n)

In [None]:
c

array([1, 1, 6, 3, 3, 5, 3, 6, 2, 4, 1, 1, 5, 3, 5, 3, 4, 6, 5, 5, 6, 3,
       5, 4, 3, 6, 3, 4, 5, 4, 3, 1, 4, 5, 1, 6, 5, 6, 3, 4, 3, 3, 3, 4,
       5, 2, 4, 2, 1, 6, 6, 5, 6, 3, 5, 3, 4, 2, 6, 6, 1, 1, 5, 5, 6, 5,
       3, 5, 4, 6, 3, 1, 3, 6, 1, 1, 6, 5, 2, 1, 2, 4, 6, 6, 4, 1, 6, 6,
       5, 2, 1, 4, 3, 4, 1, 1, 3, 4, 4, 2, 4, 1, 6, 6, 5, 2, 6, 2, 3, 5,
       1, 3, 3, 6, 3, 2, 1, 6, 3, 5, 4, 2, 2, 4, 5, 6, 4, 5, 5, 6, 2, 5,
       3, 6, 1, 3, 6, 3, 4, 5, 2, 4, 2, 2, 2, 3, 1, 3, 4, 4, 1, 4, 1, 6,
       3, 6, 3, 6, 6, 4, 6, 3, 3, 5, 4, 2, 3, 2, 6, 6, 1, 5, 2, 3, 5, 1,
       5, 3, 3, 5, 5, 1, 3, 2, 6, 2, 2, 5, 3, 5, 2, 6, 2, 3, 5, 3, 5, 1,
       2, 6, 5, 6, 4, 2, 2, 2, 2, 3, 2, 6, 1, 1, 1, 5, 6, 5, 5, 5, 1, 6,
       3, 5, 1, 5, 2, 5, 3, 6, 1, 3, 5, 4, 2, 3, 1, 4, 2, 3, 6, 3, 6, 5,
       4, 6, 3, 1, 4, 6, 4, 3, 2, 1, 6, 2, 4, 4, 6, 1, 5, 2, 4, 6, 6, 6,
       6, 4, 4, 4, 3, 5, 2, 2, 1, 1, 4, 5, 1, 4, 5, 2, 6, 3, 2, 2, 5, 1,
       3, 6, 4, 4, 1, 3, 3, 1, 6, 3, 1, 5, 4, 1, 5,

In [None]:
d

array([2, 5, 5, 5, 6, 6, 5, 3, 4, 4, 5, 2, 3, 5, 4, 5, 6, 4, 5, 3, 3, 3,
       1, 6, 2, 4, 4, 2, 3, 3, 6, 5, 1, 1, 1, 4, 4, 5, 3, 5, 2, 6, 1, 3,
       5, 4, 1, 1, 2, 6, 5, 3, 3, 5, 5, 2, 2, 2, 6, 1, 1, 3, 1, 5, 2, 4,
       5, 2, 6, 4, 1, 2, 1, 4, 1, 4, 3, 4, 5, 2, 3, 6, 6, 1, 1, 2, 6, 6,
       4, 2, 6, 3, 5, 1, 4, 1, 5, 2, 5, 6, 3, 4, 3, 5, 5, 6, 5, 2, 1, 3,
       3, 5, 1, 5, 5, 3, 3, 3, 1, 4, 3, 5, 5, 5, 2, 1, 1, 4, 4, 3, 3, 6,
       1, 5, 3, 4, 1, 2, 6, 5, 5, 1, 2, 5, 3, 1, 4, 1, 2, 3, 6, 5, 1, 1,
       4, 6, 6, 2, 3, 2, 2, 6, 1, 5, 2, 2, 6, 5, 2, 3, 1, 6, 2, 3, 2, 4,
       2, 1, 3, 1, 5, 5, 1, 6, 2, 5, 4, 4, 4, 2, 5, 3, 1, 5, 3, 3, 2, 6,
       6, 1, 1, 1, 1, 6, 2, 5, 6, 1, 4, 3, 1, 1, 2, 5, 3, 5, 3, 5, 1, 3,
       3, 1, 1, 5, 2, 1, 4, 6, 3, 5, 6, 4, 5, 2, 5, 2, 4, 3, 4, 2, 4, 4,
       6, 2, 6, 4, 4, 3, 5, 3, 4, 4, 6, 1, 3, 3, 3, 5, 1, 5, 2, 2, 1, 2,
       6, 2, 5, 3, 2, 3, 2, 4, 5, 2, 5, 5, 4, 3, 4, 4, 2, 1, 2, 3, 1, 5,
       2, 2, 2, 2, 3, 5, 5, 1, 1, 2, 1, 1, 5, 2, 3,

Numbers at the same position in the arrays `c` and `d` are treated as the results of one trial.

In the first trial, die #1 showed `c[0]` and die #2 showed `d[0]`.

In [None]:
c[0]

1

In [None]:
d[0]

2

Count the number of trials in which the first die shows 1 and the second shows 2.

In [None]:
a = c[(c==1) & (d==2)]
m = len(a)
m

11

Calculate the relative frequency of event $B$:

In [None]:
W = m / n
W

0.030555555555555555

#### Statistical probability

As the number of trials $n$ grows, the relative frequency $W$ tends to settle near a particular number. That number is called the **statistical probability** and is denoted $P(A)$:

$$P(A) = \frac{m}{n}$$

The statistical probability can be calculated on the basis of data from multiple trials.
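A minimal sketch of this convergence, assuming a fair six-sided die simulated with NumPy's `default_rng` generator. The helper name `relative_frequency`, the seed and the sample sizes are illustrative choices, not part of the lesson:

```python
import numpy as np

# Seeded generator so the run is reproducible; the seed value is arbitrary.
rng = np.random.default_rng(42)

def relative_frequency(n, face=3):
    """Roll a fair die n times and return the share of rolls equal to `face`."""
    rolls = rng.integers(1, 7, size=n)  # the upper bound 7 is exclusive
    return np.count_nonzero(rolls == face) / n

# The relative frequency should drift towards 1/6 (about 0.1667) as n grows.
for n in (60, 600, 60_000, 600_000):
    print(n, relative_frequency(n))
```

With small $n$ the value can be noticeably off from 1/6, while for large $n$ the deviations shrink.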

#### The classical definition of probability

If all possible outcomes are known in advance and are equally likely to occur (for example, in a coin flip or dice roll), the **classical definition of probability** can be used:

The probability of an event is the ratio of the number of elementary outcomes favourable to the event to the number of all equally possible outcomes of the experiment.


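The formula for this definition looks the same as for statistical probability, but here $m$ is the number of favourable outcomes and $n$ is the number of all equally possible outcomes: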

$$P(A) = \frac{m}{n}$$

**Example 3**

Calculate the probability that the number 3 is rolled on a die. A die has six faces numbered 1 to 6, and each is equally likely to come up. Using the classical probability formula, we get:

$$P(A) = \frac{m}{n} = \frac{1}{6}$$

**Example 4**

Calculate the probability that a die shows a 2 or a 4. As follows from the previous example, each of these events has probability 1/6. The events are incompatible, i.e. they cannot occur simultaneously, so we can add their probabilities:

$$P(A \lor B) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}$$

($\lor$ means "or").
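Examples 3 and 4 can be reproduced by listing the outcomes explicitly. This is a small sketch, assuming a fair die; the helper `classical_probability` is a hypothetical name introduced here for illustration:

```python
from fractions import Fraction

outcomes = range(1, 7)  # faces of a fair six-sided die

def classical_probability(event):
    """Favourable outcomes divided by all equally likely outcomes."""
    favourable = [x for x in outcomes if event(x)]
    return Fraction(len(favourable), len(outcomes))

p_three = classical_probability(lambda x: x == 3)            # Example 3
p_two_or_four = classical_probability(lambda x: x in (2, 4))  # Example 4
print(p_three, p_two_or_four)  # 1/6 1/3
```

Using `Fraction` keeps the results exact instead of rounding them to floats.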

For more complex cases, the number of outcomes favourable to the event ($m$) or the number of all elementary outcomes ($n$) can be calculated using combinatorics formulas.

#### Combinatorial formulas

The number of **combinations** of $k$ elements chosen from $n$ elements (order does not matter in combinations):

$$C_n^k = \frac{n!}{k!(n - k)!}$$

Let's write a function to calculate the number of combinations:

In [None]:
from math import factorial

In [None]:
def combinations(n, k):
    # Integer division keeps the result exact even for large n
    return factorial(n) // (factorial(k) * factorial(n - k))

In [None]:
combinations(5, 1)

5

The number of **arrangements** (also called placements) of $k$ elements chosen from $n$ elements. In arrangements the order matters, so for given $k$ and $n$ there can be more arrangements than combinations.

$$A_n^k = \frac{n!}{(n - k)!}$$

In [None]:
def arrangements(n, k):
    # Integer division keeps the result exact even for large n
    return factorial(n) // factorial(n - k)

In [None]:
arrangements(5, 5)

120

The number of **permutations** of $n$ elements. In permutations the order matters too, but unlike arrangements, all $n$ available elements are used:

$$P_n = n!$$

In [None]:
def permutations(n):
    return factorial(n)
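The same three counts are also available in Python's standard library. The sketch below assumes Python 3.8 or newer, where `math.comb` and `math.perm` were added; these functions are not used in the lesson itself and are shown only as a cross-check:

```python
import math

# math.comb and math.perm work in exact integer arithmetic.
print(math.comb(5, 1))    # combinations of 1 out of 5: 5
print(math.perm(5, 5))    # arrangements of 5 out of 5: 120
print(math.perm(5))       # permutations of 5 elements: 120
print(math.factorial(5))  # same value as math.perm(5): 120
```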

Let's look at examples of how combinatorics formulas are applied.

**Example 5**

How many ways can 4 cards be chosen from a deck of 52 cards?

To answer this question, use the formula for calculating the number of combinations:

$$C_{52}^4 = \frac{52!}{4!(52 - 4)!} = 270725$$

or

In [None]:
combinations(52, 4)

270725

**Example 6**

There are 20 shoppers in a shop. How many ways can they form a queue of 5 people?

In this example the order of the shoppers in the queue matters, so apply the formula for the number of arrangements:

In [None]:
arrangements(20, 5)

1860480

**Example 7**

How many ways can 5 shoppers form a queue?

This example is similar to the previous one, but there is an important difference: you don't have to choose 5 shoppers out of 20. There are only 5 shoppers and they all have to be in the queue. Apply the formula to find the number of permutations:

In [None]:
permutations(5)

120

**Example 8**

From a deck of 36 cards, 5 cards are chosen at random. In how many ways can these cards be chosen so that exactly 2 or 3 of them are aces?

To solve this problem, first consider the case where two of the four aces are chosen.

The number of such combinations is:

In [None]:
combinations(4, 2)

6

The remaining three cards are chosen from the 32 cards that are not aces, since the aces in the hand have already been chosen:

In [None]:
combinations(32, 3)

4960

So the number of ways to choose five cards from a deck of 36 such that exactly two of them are aces is:

In [None]:
6 * 4960

29760

Now consider choosing three aces out of four. The number of combinations is:

In [None]:
combinations(4, 3)

4

The remaining two cards are chosen from the 32 cards that are not aces. The number of combinations is:

In [None]:
combinations(32, 2)

496

So the number of ways to choose five cards from a deck of 36 such that exactly three of them are aces is:

In [None]:
4 * 496

1984

It remains to add up the resulting numbers of combinations:

In [None]:
29760 + 1984

31744

The numbers of combinations obtained with combinatorics formulas can be added and multiplied. In the example above, addition corresponds to a logical OR and multiplication to a logical AND.
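The count from Example 8 can be cross-checked by brute force. The sketch below enumerates every 5-card hand from a 36-card deck; the card encoding (numbers 0 to 3 standing for the aces) is an assumption made only for this illustration, and `itertools.combinations` is imported under an alias so it does not shadow the `combinations` function defined earlier:

```python
from itertools import combinations as iter_combinations

# Hypothetical encoding of a 36-card deck: cards 0..3 are the aces,
# cards 4..35 are all the other cards.
deck = range(36)
aces = {0, 1, 2, 3}

count = 0
for hand in iter_combinations(deck, 5):
    n_aces = sum(1 for card in hand if card in aces)
    if n_aces in (2, 3):  # exactly two OR exactly three aces
        count += 1

print(count)  # 31744
```

The loop visits 376,992 hands, which is still small enough to enumerate directly.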

**Joint and incompatible events**

Joint events can happen in the same trial, but incompatible events cannot. For example, when a die is rolled once, the numbers 3 and 4 cannot both come up: they are incompatible events. If two dice are rolled, one can show a 4 and the other a 5, so these are joint events. The probabilities of incompatible events can be added:

$$P(A + B) = P(A) + P(B)$$

In the case of joint events, the formula for the sum of events is different: the probability of the events occurring together is subtracted from the sum of their individual probabilities.

$$P(A + B) = P(A) + P(B) - P(AB)$$
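A small check of the formula for joint events, assuming one roll of a fair die. The events chosen here ($A$: an even number, $B$: a number greater than 3) are illustrative and do not come from the lesson:

```python
from fractions import Fraction

outcomes = set(range(1, 7))               # one roll of a fair die
A = {x for x in outcomes if x % 2 == 0}   # even number: {2, 4, 6}
B = {x for x in outcomes if x > 3}        # greater than 3: {4, 5, 6}

def prob(event):
    """Classical probability of an event given as a set of outcomes."""
    return Fraction(len(event), len(outcomes))

p_union_direct = prob(A | B)                       # count the union directly
p_union_formula = prob(A) + prob(B) - prob(A & B)  # P(A) + P(B) - P(AB)
print(p_union_direct, p_union_formula)  # 2/3 2/3
```

Without subtracting $P(AB)$, the outcomes 4 and 6 would be counted twice.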

**Probability of dependent and independent events**

Two events are *independent* if the occurrence of one does not affect the probability of the other. Otherwise the events are dependent.

The probability of two independent events occurring together is calculated using the formula:

$$P(AB) = P(A) \cdot P(B)$$

The probability of two dependent events occurring together:

$$P(AB) = P(A) \cdot P(B\:|\:A) = P(B) \cdot P(A\:|\:B)$$
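As an illustration of the rule for dependent events, consider a setup that is assumed here and not taken from the lesson: two cards are drawn without replacement from a 36-card deck with 4 aces, and we want the probability that both are aces.

```python
from fractions import Fraction
from math import comb  # requires Python 3.8 or newer

# Assumed setup: a 36-card deck with 4 aces, two cards drawn without replacement.
p_first_ace = Fraction(4, 36)               # P(A)
p_second_ace_given_first = Fraction(3, 35)  # P(B | A): one ace is already gone
p_both_aces = p_first_ace * p_second_ace_given_first
print(p_both_aces)  # 1/105

# Cross-check by counting: ways to pick 2 aces over ways to pick any 2 cards.
print(Fraction(comb(4, 2), comb(36, 2)))  # 1/105
```

The second draw depends on the first, which is why $P(B\:|\:A)$ is 3/35 rather than 4/36.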

**Example 9**

Find the probability that when two dice are rolled, the first shows an even number and the second shows a multiple of three. First consider the probability of an even number on the first die. There are three such outcomes: the numbers 2, 4 and 6. The probability of each is 1/6.

Since these three events are incompatible, the probability of their sum is equal to the sum of their probabilities:

$$P(\text{even}) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6} = \frac{1}{2}$$

There are two favourable outcomes for the second die: the numbers 3 and 6. These are also incompatible events, so we can add their probabilities:

$$P(\text{multiple of 3}) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}$$

The number on one die does not affect the result on the other, i.e. these events are independent, so we multiply the probabilities to get the final answer:

$$P = \frac{1}{2} \cdot \frac{1}{3} = \frac{1}{6}$$
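The result of Example 9 can be confirmed by enumerating all outcomes of two dice. This is a sketch under the assumption that both dice are fair, so all 36 ordered pairs are equally likely:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

# First die even AND second die a multiple of three.
favourable = [(a, b) for a, b in outcomes if a % 2 == 0 and b % 3 == 0]

p = Fraction(len(favourable), len(outcomes))
print(len(favourable), p)  # 6 1/6
```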

**Total probability formula**

If event $A$ can occur only together with one of the events $B_1, B_2, \dots, B_n$, which form a complete group of incompatible events, then the probability of $A$ is calculated using the formula:

$$P(A) = P(B_1) \cdot P(A\:|\:B_1) + P(B_2) \cdot P(A\:|\:B_2) + \dots + P(B_n) \cdot P(A\:|\:B_n)$$

A **complete group of events** means that one of them is bound to occur in a single trial.

The expression $P(B\;|\;A)$ means the probability of event $B$ occurring, given that event $A$ has occurred.

**Example 10**

There are three identical baskets. The first has three red and five green balls, the second has only red balls, and the third has only green balls. One basket is chosen at random and a ball is drawn from it at random. 

What is the probability that this ball is green?

Solve this problem using the total probability formula. Let event $A$ be that a green ball is drawn, and let events $B_1$, $B_2$ and $B_3$ be that the first, second or third basket is chosen. The probability of choosing each of the three baskets is 1/3.

The probability of drawing a green ball if the first basket is chosen is 5/(3 + 5), i.e. 5/8. If the second basket is chosen, the probability of drawing a green ball is 0. In the case of the third basket, the probability is 1, since all the balls in it are green.

Substitute these values into the total probability formula to get the final answer:

$$P(A) = \frac{1}{3} \cdot \frac{5}{8} + \frac{1}{3} \cdot 0 + \frac{1}{3} \cdot 1 = \frac{5}{24} + 0 + \frac{1}{3} = \frac{13}{24}$$
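The same calculation as a short sketch in exact fractions. The list names are illustrative; the numbers are those of Example 10:

```python
from fractions import Fraction

# P(B_i): each of the three baskets is equally likely to be chosen.
p_basket = [Fraction(1, 3)] * 3
# P(A | B_i): probability of drawing a green ball from each basket.
p_green_given_basket = [Fraction(5, 8), Fraction(0), Fraction(1)]

# Sum of P(B_i) * P(A | B_i) over all baskets.
p_green = sum(pb * pg for pb, pg in zip(p_basket, p_green_given_basket))
print(p_green)  # 13/24
```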

#### Bayes' formula

To determine the probability of event $B$, given that event $A$ has already occurred, use Bayes' formula:

$$P(B\:|\:A) = \frac{P(B) \cdot P(A\:|\:B)}{P(A)}$$

In this formula, the probability $P(B\;|\;A)$ is called the posterior (a posteriori) probability, that is, it is determined after event $A$ has occurred.

The probability $P(B)$ is the prior (a priori) probability, determined before the trial.

**Example 11**

In a biathlon competition, one of three athletes shoots and hits the target.

The probability of hitting the target is 0.2 for the first athlete, 0.4 for the second and 0.7 for the third.

Problem: find the probability that the shot was fired by the third athlete.

Before we knew the result of the shot, we had only the prior probability that a given athlete fired it, which equals 1/3 if we assume that all the athletes are equally likely to shoot.

Knowing the result of the shot, we can re-estimate the probability for each athlete, because their accuracy differs. These probabilities are posterior, because they are obtained after we learn that the shot was successful.

Event $A$ is a hit on the target, and events $B_1, B_2$ and $B_3$ are that the first, second or third athlete fired:

$$P(A\:|\:B_1) = 0.2$$

$$P(A\:|\:B_2) = 0.4$$

$$P(A\:|\:B_3) = 0.7$$

The probability that the third athlete fired, given that the shot was successful, is found using the formula:

$$P(B_3\:|\:A) = \frac{P(B_3) \cdot P(A\:|\:B_3)}{P(A)}$$

Substitute all the values from the problem statement and expand the denominator using the total probability formula:

$$P(B_3\:|\:A) = \frac{\frac{1}{3} \cdot 0.7}{\frac{1}{3} \cdot 0.2 + \frac{1}{3} \cdot 0.4 + \frac{1}{3} \cdot 0.7} = \frac{7}{13}$$
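A sketch of the same Bayes update for all three athletes at once, assuming equal prior probabilities of 1/3 as in Example 11. The variable names are illustrative:

```python
from fractions import Fraction

# P(B_i): prior probability that each athlete fired the shot.
prior = [Fraction(1, 3)] * 3
# P(A | B_i): probability of a hit for each athlete (0.2, 0.4, 0.7).
hit_given_athlete = [Fraction(2, 10), Fraction(4, 10), Fraction(7, 10)]

# P(A) by the total probability formula.
p_hit = sum(p * h for p, h in zip(prior, hit_given_athlete))
# P(B_i | A) by Bayes' formula.
posterior = [p * h / p_hit for p, h in zip(prior, hit_given_athlete)]

print(p_hit)      # 13/30
print(posterior)  # [Fraction(2, 13), Fraction(4, 13), Fraction(7, 13)]
```

The posterior probabilities sum to 1, and the most accurate athlete ends up with the largest share, 7/13.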