These are my notes on Chapter 6 of the book "Principles of Data Science".

This chapter covers some of the more advanced topics in probability theory: 
- Exhaustive events 
- Bayes' theorem 
- Basic prediction rules 
- Random variables 

# Collectively exhaustive events
When given a set of two or more events, if at least one of the events must occur, then such a set of events is said to be collectively exhaustive. 

Side note: There is a so-called MECE principle (Mutually Exclusive, Collectively Exhaustive). The principle is often used to organize the points of an article when describing something. 

Example: 
- Given the set of events $\{\text{temperature} < 60,\ \text{temperature} > 90\}$: it is not collectively exhaustive, because $60 \le \text{temperature} \le 90$ is also possible. However, the events are mutually exclusive, because both cannot happen at the same time. 
- In a dice roll, the set of events of rolling $\{1, 2, 3, 4, 5, 6\}$ is collectively exhaustive. 
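
The two properties can be checked mechanically when events are written as sets of outcomes. The following is a minimal sketch, not something from the book: the helper names are made up for illustration, and the temperature scale is restricted to whole degrees from 0 to 120 purely so that the sample space is finite.

```python
def is_collectively_exhaustive(events, sample_space):
    """True if every outcome in the sample space belongs to at least one event."""
    covered = set().union(*events)
    return set(sample_space) <= covered


def is_mutually_exclusive(events):
    """True if no outcome belongs to more than one event."""
    seen = set()
    for event in events:
        if seen & set(event):
            return False
        seen |= set(event)
    return True


# Dice roll: six single-outcome events cover the whole sample space.
dice_space = {1, 2, 3, 4, 5, 6}
dice_events = [{1}, {2}, {3}, {4}, {5}, {6}]

# Temperature: whole degrees from 0 to 120 (an assumption for illustration).
temp_space = set(range(0, 121))
temp_events = [
    {t for t in temp_space if t < 60},
    {t for t in temp_space if t > 90},
]

print(is_collectively_exhaustive(dice_events, dice_space))  # True
print(is_collectively_exhaustive(temp_events, temp_space))  # False
print(is_mutually_exclusive(temp_events))                   # True
```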

# Bayesian ideas revisited
When speaking about Bayes, you are speaking about the following 3 things and how they all interact with each other: 
- A prior distribution 
- A posterior distribution 
- A likelihood

Basically, we are concerned with finding the posterior. That's the thing we want to know. 

Another way to phrase the Bayesian way of thinking is that data shapes and updates our belief. We have a prior probability, or what we naively think about a hypothesis, and then we have a posterior probability, which is what we think about a hypothesis, given some data. 

## Bayes' theorem 
 
$$
P(A|B)=\frac{P(A)*P(B|A)}{P(B)}
$$

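The theorem follows from the fact that both products equal the joint probability of A and B: $P(B)*P(A|B) = P(A \cap B) = P(A)*P(B|A)$. Dividing both sides by $P(B)$ (which must be greater than 0) gives the formula above. 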

You can think of Bayes' theorem as follows: 
- It is a way to get from P(B|A) to P(A|B) (if you only have one of them) 
- It is a way to update P(A), what you believe before knowing whether B happened, into P(A|B), what you believe once you know that B happened
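
As a sanity check, the identity can be verified by counting outcomes. The sketch below uses a fair six-sided die with two events chosen only for illustration (A = the roll is even, B = the roll is greater than 3); the helper names `prob` and `cond_prob` are hypothetical. It compares P(A|B) computed directly with P(A|B) computed through Bayes' theorem.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}   # fair six-sided die
A = {2, 4, 6}                   # event: the roll is even
B = {4, 5, 6}                   # event: the roll is greater than 3


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


def cond_prob(event, given):
    """P(event | given), computed directly by counting outcomes."""
    return Fraction(len(event & given), len(given))


p_a_given_b_direct = cond_prob(A, B)
p_a_given_b_bayes = prob(A) * cond_prob(B, A) / prob(B)

print(p_a_given_b_direct)  # 2/3
print(p_a_given_b_bayes)   # 2/3
```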

Let's try thinking about Bayes using the terms hypothesis and data. Suppose H = your hypothesis about the given data, and D = the data that you are given. 

Bayes' theorem can be interpreted as trying to figure out P(H|D) (the probability that our hypothesis is correct, given the data at hand). 

$$
P(H|D)=\frac{P(H)*P(D|H)}{P(D)}
$$

- P(H) is the probability of the hypothesis before we observe the data, called the prior probability or just prior. 
- P(H|D) is what we want to compute, the probability of the hypothesis after we observe the data, called the posterior. 
- P(D|H) is the probability of the data under the given hypothesis, called the likelihood. 
- P(D) is the probability of the data under any hypothesis, called the normalizing constant. 



## An example 
Consider that you have two people in charge of writing blog posts for your company - Lucy and Avinash. From past performances, you have liked 80% of Lucy's work and only 50% of Avinash's work. A new blog post comes to your desk in the morning, but the author isn't mentioned. You love the article. What is the probability that it came from Avinash? Each blogger blogs at a very similar rate. 

- H=hypothesis=the blog came from Avinash 
- D=data=you loved the blog post 

- P(H|D) is what we want to compute. It means the probability that it comes from Avinash, given that I loved it. 
- P(H) is the probability that an article comes from Avinash. 
- P(D) is the probability that I love an article. 
- P(D|H) is the probability that I loved it, given that it came from Avinash. 

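P(D|H) is 0.5, because you have liked 50% of Avinash's work. 

P(H) is 0.5, because each blogger writes at a similar rate. 

$P(D) = P(H)P(D|H) + P(\overline{H})P(D|\overline{H}) = 0.5 * 0.5 + 0.5 * 0.8 = 0.65$ 

Here $\overline{H}$ means that the post came from Lucy, so $P(D|\overline{H}) = 0.8$. 

Putting these together, $P(H|D) = \frac{0.5 * 0.5}{0.65} = \frac{0.25}{0.65} \approx 0.38$. 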
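
The same arithmetic can be written out in a few lines of Python. This is only a sketch of the calculation for this example; the variable names are my own.

```python
# Priors: each blogger writes at a similar rate.
p_avinash = 0.5
p_lucy = 0.5

# Likelihoods: how often I have liked each author's work.
p_love_given_avinash = 0.5
p_love_given_lucy = 0.8

# Normalizing constant P(D), via the law of total probability.
p_love = p_avinash * p_love_given_avinash + p_lucy * p_love_given_lucy

# Posterior P(H|D), via Bayes' theorem.
p_avinash_given_love = p_avinash * p_love_given_avinash / p_love

print(round(p_love, 2))                # 0.65
print(round(p_avinash_given_love, 4))  # 0.3846
```

The posterior for Lucy is the complement, about 0.62, so the loved post is more likely hers even though both authors are equally likely before reading it.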

# Random variables 
A random variable uses real numerical values to describe a probabilistic event. 

In math and programming, we are used to a variable holding one definite value. A random variable, by contrast, is subject to randomness: its value can differ from one observation to the next, depending on the outcome of the underlying experiment. 

We generally use single capital letters (mostly the letter X) to denote random variables. For example, we might have: 
- X = the outcome of a dice roll 
- Y = the revenue earned by a company this year 
- Z = the score of an applicant on an interview coding quiz (0-100%) 

Strictly speaking, a random variable is a function that maps each outcome in the sample space (the set of all possible outcomes) to a real number, and its probability distribution then assigns each of those values a probability (between 0 and 1). The book summarizes the combined idea loosely as follows: 

$$
f(event) = probability 
$$

There are 2 main types of random variables: discrete and continuous. 

## Discrete random variables 
A discrete random variable only takes on a countable number of possible values. For example, the outcome of a dice roll, as shown here: 

X = the outcome of a single dice roll 

<table>
<tr><td>Value</td><td>X = 1</td><td>X = 2</td><td>X = 3</td><td>X = 4</td><td>X = 5</td><td>X = 6</td></tr>
<tr><td>Probability</td><td>1/6</td><td>1/6</td><td>1/6</td><td>1/6</td><td>1/6</td><td>1/6</td></tr>
</table>

We'll use a probability mass function (PMF) to describe a discrete random variable. 

The PMF gives the probability of each possible value x: $p(x) = P(X = x)$. 

So, for a dice roll, P(X=1)=1/6, and P(X=5)=1/6. 
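
In code, a PMF for a finite discrete variable can be stored as a simple lookup table. The sketch below is one possible representation, with helper names of my own choosing; it uses `Fraction` so that the probabilities stay exact.

```python
from fractions import Fraction

# PMF of a fair six-sided die, stored as {value: probability}.
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def pmf(x):
    """Return P(X = x); values outside the support have probability 0."""
    return die_pmf.get(x, Fraction(0))


def is_valid_pmf(table):
    """A PMF must be non-negative everywhere and sum to exactly 1."""
    return all(p >= 0 for p in table.values()) and sum(table.values()) == 1


print(pmf(1), pmf(5), pmf(7))  # 1/6 1/6 0
print(is_valid_pmf(die_pmf))   # True
```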

Random variables have many properties, two of which are their *expected value* and the *variance*. 

For a discrete random variable with possible values $x_i$ and probabilities $p_i = P(X = x_i)$, the expected value is the probability-weighted average of the values: 

$$
Expected\ value = E[X] = \mu_x = \sum{x_i p_i}
$$

The formula for the variance of a discrete random variable is expressed as follows: 

$$
Variance = V[X] = \sigma_x^2=\sum{(x_i - \mu_x)^2 p_i}
$$
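
Applying both formulas to the dice roll gives a concrete feel for them. This is a minimal sketch with function names of my own; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# PMF of a fair six-sided die, stored as {value: probability}.
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def expected_value(pmf):
    """E[X] = sum of x_i * p_i over all possible values."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """V[X] = sum of (x_i - mu)^2 * p_i over all possible values."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


print(expected_value(die_pmf))  # 7/2
print(variance(die_pmf))        # 35/12
```

So a fair die has an expected value of 3.5 and a variance of about 2.92.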

## Types of discrete random variables 
### Binomial random variables 
A binomial setting has the following four conditions:
- Each trial has exactly two possible outcomes: success or failure 
- The outcome of one trial cannot affect the outcome of another (the trials are independent) 
- The number of trials is fixed in advance (a fixed sample size) 
- The chance of success is the same value, p, on every trial 

A binomial random variable is a discrete random variable, X, that counts the number of successes in a binomial setting. The parameters are n = the number of trials and p = the chance of success of each trial. 

The PMF for a binomial random variable is as follows: 

$$
P(X=k) = \binom{n}{k} p^k(1-p)^{n-k}
$$

Here, $\binom{n}{k} = \text{the binomial coefficient} = \frac{n!}{(n-k)!\,k!}$

Shortcuts for the binomial expected value and variance: 

$$
\begin{aligned}
E[X] &= np \\
V[X] &= np(1-p)
\end{aligned}
$$
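
The PMF and the two shortcuts can be checked against each other numerically. In the sketch below, n = 10 and p = 0.3 are arbitrary values picked for illustration, and `binomial_pmf` is my own helper built on the standard library's `math.comb`.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3  # illustrative values, not from the book

pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(round(sum(pmf), 6))  # 1.0 (the probabilities sum to 1)
print(round(mean, 6))      # 3.0, matches n * p
print(round(var, 6))       # 2.1, matches n * p * (1 - p)
```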

### Geometric random variable 
It is actually quite similar to the binomial random variable in that we are concerned with a setting in which a single event is occurring over and over. However, in the case of a geometric setting, the major difference is that we are not fixing the sample size. 

Specifically, a geometric setting has the following four conditions: 
- The possible outcomes are either success or failure 
- The outcome of one trial cannot affect the outcome of another (the trials are independent) 
- The number of trials is not fixed in advance 
- The chance of success of each trial must always be p. 

Note that these are the exact same conditions as a binomial variable, except the 3rd condition. 

A geometric random variable is a discrete random variable, X, that counts the number of trials needed to obtain the first success. The parameters are p = the chance of success of each trial, and (1-p) = the chance of failure of each trial. 

The formula for the PMF is as follows: 
$$
P(X=x) = (1-p)^{x-1}p
$$

Example - weather 

There is a 34% chance of rain on any given day in April. Find the probability that the first rainy day in April is April 4. That means no rain on April 1 through 3 (each with probability 1 - 0.34 = 0.66), followed by rain on April 4. 

$$
P(X=4) = 0.66^3 * 0.34 \approx 0.098 
$$
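
The rain calculation is easy to reproduce in code. This is a small sketch; `geometric_pmf` is a helper name of my own, not a library function.

```python
def geometric_pmf(x, p):
    """P(X = x): the first success happens on trial x (x = 1, 2, 3, ...)."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    return (1 - p) ** (x - 1) * p


p_rain = 0.34

# Three dry days followed by the first rainy day on April 4.
print(round(geometric_pmf(4, p_rain), 4))  # 0.0977
```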

Shortcuts for the geometric expected value and variance: 

$$
E[X] = \frac{1}{p}
$$

$$
V[X] = \frac{1-p}{p^2}
$$
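
A numerical check helps confirm the form of these shortcuts, in particular that the variance is $(1-p)/p^2$, with the square on p in the denominator. The sketch below sums the defining series for the mean and variance over the first 1000 trials, which is an arbitrary cutoff chosen because the remaining tail is negligible for p = 0.34.

```python
def geometric_pmf(x, p):
    """P(X = x): the first success happens on trial x."""
    return (1 - p) ** (x - 1) * p


p = 0.34
xs = range(1, 1001)  # truncated support; the tail beyond this is negligible

mean = sum(x * geometric_pmf(x, p) for x in xs)
var = sum((x - mean) ** 2 * geometric_pmf(x, p) for x in xs)

print(round(mean, 4), round(1 / p, 4))           # 2.9412 2.9412
print(round(var, 4), round((1 - p) / p**2, 4))   # 5.7093 5.7093
```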

### Poisson random variable 
To understand why we would need this random variable, imagine that an event that we wish to model has a small probability of happening and that we wish to count the number of times that the event occurs in a certain time frame. If we have an idea of the average number of occurrences, $\lambda$, over a specific period of time, given from past instances, then the Poisson random variable, denoted by $X \sim Poi(\lambda)$, counts the total number of occurrences of the event during that given time period. 

In other words, the Poisson distribution is a discrete probability distribution that counts the number of events that occur in a given interval of time. 

If we let $X = \text{the number of events in a given interval}$, and the average number of events per interval is $\lambda$, then the probability of observing x events in a given interval is given by the following formula: 

$$
P(X=x)=\frac{e^{-\lambda}\lambda^x}{x!}
$$

Here, e = Euler's number (approximately 2.718)

Example - call center 

The number of calls arriving at your call center follows a Poisson distribution at the rate of 5 calls/hour. What is the probability that exactly 6 calls will come in between 10 and 11 p.m.? 

$$
P(X=6) = \frac{e^{-\lambda}\lambda^x}{x!} = \frac{e^{-5}5^6}{6!} = 0.146 
$$
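
The call center figure can be reproduced with the standard library. This is a minimal sketch; `poisson_pmf` is my own helper name.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with average rate lam."""
    return exp(-lam) * lam**x / factorial(x)


# Rate of 5 calls per hour, exactly 6 calls in a one-hour window.
print(round(poisson_pmf(6, 5), 3))  # 0.146
```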

Shortcuts for the Poisson expected value and variance: 

$$
\begin{aligned}
E[X] &= \lambda \\
V[X] &= \lambda
\end{aligned}
$$

## Continuous random variables 
A continuous random variable can take on an uncountably infinite number of possible values, such as any real number in an interval, rather than a countable list of values. We describe it with a probability density function (PDF) instead of a PMF.

If X is a continuous random variable, then there is a function, f(x), such that for any constants a and b with $a \le b$: 
$$
P(a \le X \le b) = \int\limits_a^b f(x)\,dx
$$

The preceding f(x) function is known as the probability density function (PDF). 

The most important continuous distribution is the normal distribution; the standard normal distribution is the special case with $\mu = 0$ and $\sigma = 1$. The PDF of the normal distribution is as follows: 

$$
f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}
$$

Here, $\mu$ is the mean of the variable and $\sigma$ is the standard deviation. 
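
To connect the PDF with the integral definition of $P(a \le X \le b)$, the sketch below integrates the normal PDF numerically with a simple midpoint rule and compares the result with the closed form based on the error function `math.erf`. The function names and the number of integration steps are my own choices for illustration.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(x, mu=0.0, sigma=1.0):
    """PDF of the normal distribution with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)


def prob_between(a, b, mu=0.0, sigma=1.0, steps=10_000):
    """Approximate P(a <= X <= b) by integrating the PDF with the midpoint rule."""
    width = (b - a) / steps
    total = sum(normal_pdf(a + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return total * width


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal variable, written in terms of the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


numeric = prob_between(-1, 1)
exact = normal_cdf(1) - normal_cdf(-1)

print(round(numeric, 4), round(exact, 4))  # 0.6827 0.6827
```

For the standard normal distribution, about 68% of the probability lies within one standard deviation of the mean.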