### IDs:
Insert your IDs in the cell below

ID #1:

ID #2:


## Read the following instructions carefully:

1. This Jupyter notebook contains all the step-by-step instructions needed for this exercise.
1. You are free to add cells.
1. Write your functions and your answers in this Jupyter notebook only.
1. Answers to theoretical questions should be written in **markdown cells (with $\LaTeX$ support)**.
1. Submit only this Jupyter notebook, using your IDs as the filename. Do not use ZIP or RAR. For example, your Moodle submission file name should look like this (two ID numbers): `123456789_987654321.ipynb`.

### Question 1 - Defective products

In a manufacturing pipeline, 3% of the products are defective. We are interested in examining a defective product to see what goes wrong on the belt. We need to ask the facility manager to send us a set of independent samples for examination.

#### 1.A

How many independent samples should we ask for in order to have an 85% probability of having at least one defective product in the batch sent? You should write a function.

In [26]:
import numpy as np
from scipy.stats import binom, nbinom


In [20]:
def math_calc(prob_of_defect=0.03, level_of_certainty=0.85):
    return int(np.ceil(np.log(1-level_of_certainty)/np.log(1-prob_of_defect)))
math_calc()

63
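The closed form used in `math_calc` can be derived as follows. The symbols $p$ (defect probability), $c$ (required certainty) and $n$ (number of samples) are introduced here for illustration only:

```latex
\begin{align*}
P(\text{at least one defective}) = 1 - (1-p)^n &\ge c \\
(1-p)^n &\le 1 - c \\
n \ln(1-p) &\le \ln(1-c) \\
n &\ge \frac{\ln(1-c)}{\ln(1-p)} \qquad (\text{dividing by } \ln(1-p) < 0 \text{ flips the inequality})
\end{align*}
```

For $p = 0.03$ and $c = 0.85$ the bound is about 62.3, which rounds up to 63.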

In [16]:
# we want P(X >= r) for r = number_of_defect, computed as 1 - P(X <= r - 1); for r = 1 this is 1 - P(X = 0)
def iterative_calc_binom(number_of_defect=1, prob_of_defect=0.03, level_of_certainty=0.85):
    sample_size = 1
    while (1 - binom.cdf(number_of_defect-1, sample_size, prob_of_defect)) < level_of_certainty:
        sample_size += 1
    return sample_size
iterative_calc_binom()

63

In [27]:
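def iterative_calc_nbinom(number_of_defect=1, prob_of_defect=0.03, level_of_certainty=0.85):
    # scipy's nbinom counts the failures before the r-th success; loc=r shifts it
    # so that the variable is the total number of samples drawn.
    sample_size = 1
    while nbinom.cdf(sample_size, number_of_defect, prob_of_defect, loc=number_of_defect) < level_of_certainty:
        sample_size += 1
    return sample_size
iterative_calc_nbinom()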

63

#### 1.B
Answer this part again with the following changes: products are 4% defective and we want a 95% probability of at least one defective product in the batch.

In [22]:
print(math_calc(0.04, 0.95))
print(iterative_calc_binom(prob_of_defect=0.04, level_of_certainty=0.95))

74
74


#### 1.C 

Consider the following cases and calculate how many independent samples are required: 

1. Products are 10% defective and we want a 90% probability of at least 5 defective products in the batch.
1. Products are 30% defective and we want a 90% probability of at least 15 defective products in the batch.

Explain the difference between the two results. You should use mathematical reasoning based on the properties of distributions you saw in class and visualizations in your answer.

In [29]:
print(f"case1, using binom: {iterative_calc_binom(5,0.1,0.9)}, using nbinom: {iterative_calc_nbinom(5,0.1,0.9)}")
print(f"case2, using binom: {iterative_calc_binom(15,0.3,0.9)}, using nbinom: {iterative_calc_nbinom(15,0.3,0.9)}")


case1, using binom: 78, using nbinom: 78
case2, using binom: 64, using nbinom: 64


### Explaining the difference:

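Define two random variables, one for each case: $Y_1$ for the first case and $Y_2$ for the second, where each counts the number of samples drawn until the required number of defective products is reached.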

$Y_1 \sim \text{NBin}(5, 0.1)$<br />
$Y_2 \sim \text{NBin}(15, 0.3)$
<br /><br />

Let's calculate their expectations, using the NBin expectation formula $E[Y] = {r \over p}$:<br />
$E[Y_1] = {5 \over 0.1} = 50$<br />
$E[Y_2] = {15 \over 0.3} = 50$

Let's calculate their variances, using the NBin variance formula $Var[Y] = {r(1-p) \over p^2}$:<br />
$Var[Y_1] = {5(1-0.1) \over 0.1^2} = 450$<br />
$Var[Y_2] = {15(1-0.3) \over 0.3^2} \approx 116.67$

So although both variables have the same expectation (50 samples), $Y_1$ has a much larger variance. Its distribution is more spread out around the mean, so the sample size that captures 90% of the probability mass lies further from 50: 78 samples in the first case versus 64 in the second.
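As a rough cross-check of the variance argument, the 90% point can be estimated with a normal approximation, mean plus $z_{0.9}$ standard deviations. This is an assumption made for illustration: the negative binomial distribution is skewed, so the approximation is not exact. The helper name `approx_quantile` is hypothetical:

```python
from math import sqrt
from statistics import NormalDist


def approx_quantile(r, p, level):
    """Normal approximation to the `level` quantile of the number of samples
    needed to see r defectives (mean r/p, variance r(1-p)/p^2)."""
    mean = r / p
    sd = sqrt(r * (1 - p)) / p
    return mean + NormalDist().inv_cdf(level) * sd


q1 = approx_quantile(5, 0.1, 0.9)
q2 = approx_quantile(15, 0.3, 0.9)
print(round(q1, 1), round(q2, 1))  # roughly 77.2 and 63.8
```

Rounded up, these estimates land on the same sample sizes as the exact computation (78 and 64), and the gap between them comes entirely from the different standard deviations.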


### Question 2 - Rent distributions in Randomistan

The state of Randomistan conducted a survey to study the distribution of rent paid in two neighboring towns, Stochastic Heights and Random Grove, to be denoted SH and RG.<br> 

Here are some findings of the survey:
* The population of SH and RG is 16,000 and 22,000 respectively. <br>
* The mean rent in SH and RG is 6300RCU and 4200RCU respectively.
* The median rent is 4600RCU in both towns.
* The IQR of the rent is smaller in SH than in RG.

All data generated in this question needs to be consistent with these findings.
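Any generated rent data can be validated against these findings before plotting. The sketch below is one possible checker, not a required part of the solution. The function names and the toy samples are hypothetical, the quartile rule is the default of `statistics.quantiles`, and the population sizes are not checked:

```python
from statistics import mean, median, quantiles


def iqr(values):
    """Interquartile range using the default quartile rule of statistics.quantiles."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


def consistent_with_survey(sh, rg, tol=1e-6):
    """Check rent samples for SH and RG against the survey findings."""
    return {
        "mean_sh": abs(mean(sh) - 6300) <= tol,
        "mean_rg": abs(mean(rg) - 4200) <= tol,
        "median_sh": abs(median(sh) - 4600) <= tol,
        "median_rg": abs(median(rg) - 4600) <= tol,
        "iqr_sh_smaller": iqr(sh) < iqr(rg),
    }


# Hypothetical toy samples (10 rents per town), only to exercise the checker.
toy_sh = [4400, 4500, 4550, 4600, 4600, 4600, 4650, 4700, 13200, 13200]
toy_rg = [500, 1500, 2500, 4000, 4600, 4600, 5000, 5500, 6500, 7300]
print(consistent_with_survey(toy_sh, toy_rg))
```

The toy samples show the shape the findings force: SH needs a tight core around the median plus a heavy right tail to pull the mean up to 6300, while RG needs more mass below the median.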

#### 2.A
Draw histograms that describe 2 different scenarios of possible distributions of rent in the two towns. Your histograms should:<br>
* Use bins of 100RCU each.
* Have at least 10 non-zero bins.

#### 2.B
Draw a histogram of a third scenario with the same properties. <br>
In addition, in this scenario the rent in SH should have a higher variance than the rent in RG.

The survey also examined the per household income (PHI) in these two places.<br>

It found that:<br>
* The mean of PHI in SH is 12500 and in RG is 8500.
* The median is 12000 in SH and 8000 in RG.
* The covariance of the rent and the PHI was observed to be as in the formula below with $\alpha=97\%$ and $\alpha=89\%$ in SH and in RG respectively.<br><br>
$$Cov(rent, PHI) = \alpha * \sqrt{Var(rent)} * \sqrt{Var(PHI)}$$

#### 2.C
Produce rent and PHI data for the two cities that is consistent with these findings. The covariances in your data may deviate by up to 1% from the values implied by the given $\alpha$.
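In the covariance formula, $\alpha$ is the Pearson correlation coefficient between rent and PHI. One way to hit a target $\alpha$ is to mix the centered rent with noise that has been made orthogonal to it. The sketch below covers only the correlation constraint; it does not match the required PHI mean and median. The function names and the lognormal toy rents are hypothetical:

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Sample Pearson correlation: Cov(x, y) / (sd(x) * sd(y))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def with_correlation(xs, alpha, seed=0):
    """Return values whose sample correlation with xs equals alpha (up to rounding)."""
    rng = random.Random(seed)
    n = len(xs)
    mx = sum(xs) / n
    xc = [x - mx for x in xs]                      # centered xs
    noise = [rng.gauss(0.0, 1.0) for _ in xs]
    mn = sum(noise) / n
    nc = [e - mn for e in noise]                   # centered noise
    # Remove the part of the noise that is correlated with xs.
    beta = sum(a * b for a, b in zip(nc, xc)) / sum(a * a for a in xc)
    resid = [e - beta * a for e, a in zip(nc, xc)]
    sx = sqrt(sum(a * a for a in xc))
    sr = sqrt(sum(e * e for e in resid))
    # Mix the two unit-norm, orthogonal directions with weights alpha and sqrt(1 - alpha^2).
    return [alpha * a / sx + sqrt(1 - alpha**2) * e / sr for a, e in zip(xc, resid)]


rng = random.Random(1)
rents = [rng.lognormvariate(8.4, 0.4) for _ in range(1000)]  # hypothetical rents
phi_like = with_correlation(rents, 0.97)
print(round(pearson(rents, phi_like), 6))  # 0.97
```

Scaling the result by a positive constant and adding an offset leaves the correlation unchanged, which helps set the PHI mean. Matching the mean and the median at the same time needs a skewed construction that this sketch does not attempt.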

#### 2.D
Produce two heatmaps that describe these two bivariate joint distributions. Make sure you carefully consider the selected binning resolution.

### Question 3 - Covariance and independence

#### a) What is the variance of the sum X + Y + Z of three random variables in terms of the variances of X, Y and Z and the covariances between each pair of random variables?


\begin{align*}
Var(X+Y+Z) & \stackrel{*}{=} Var(X+Y) + Var(Z) + 2Cov(X+Y,Z) \\
& = Var(X) + Var(Y) + 2Cov(X,Y) + Var(Z) + 2Cov(X+Y,Z) \\
& \stackrel{**}{=} Var(X) + Var(Y) + Var(Z) + 2Cov(X,Y) + 2Cov(Y,Z) + 2Cov(X,Z)
\end{align*}

\begin{align*}
* \quad & Var(X + Y) = Var(X) + Var(Y) + 2Cov(X,Y) \Rightarrow Var(X + Y + Z) = Var((X + Y) + Z) = Var(X + Y) + Var(Z) + 2Cov(X + Y,Z) \\
** \quad & Cov(X + Y,Z) = E((X + Y)Z) - E(X + Y)E(Z) = E(XZ) + E(YZ) - E(X)E(Z) - E(Y)E(Z) \stackrel{***}{=} Cov(X,Z) + Cov(Y,Z) \\
*** \quad & Cov(X,Y) = E(XY) - E(X)E(Y)
\end{align*}

#### b) What happens if X,Y,Z are pairwise independent?

If X,Y,Z are pairwise independent, then every pairwise covariance vanishes:
$$Cov(X,Y)=Cov(X,Z)=Cov(Y,Z) = 0$$
Plugging this result into the equation from part (a), we get:
$$Var(X+Y+Z) = Var(X) + Var(Y) + Var(Z)$$

#### c) If X,Y,Z are pairwise independent, are they necessarily collectively independent? Prove your answer.

No. Take, for example, three tosses of a fair coin.
We define three events A, B, C as follows:
* A - the first two tosses returned the same value.
* B - the last two tosses returned the same value.
* C - the first and last tosses returned the same value.

Let X, Y, Z be the indicator random variables of A, B, C respectively, so that independence of the variables is equivalent to independence of the events.

To prove that they are pairwise independent, we need to show that:
<br/>
$P(A|B) = P(A) $
<br/>
$P(A|C) = P(A)$
<br/>
$P(B|A) = P(B)$
<br/>
$P(B|C) = P(B)$
<br/>
$P(C|A) = P(C)$
<br/>
$P(C|B) = P(C)$
<br/>
<br/>
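proof:
$P(A) = P(B) = P(C) = \frac{1}{2}$
<br/>
Conditioning on one event fixes the relation between two of the tosses, but the remaining toss is still equally likely to match them or not, so:
<br/>
$P(A|B) = P(A|C) = P(B|A) = P(B|C) = P(C|A) = P(C|B) = \frac{1}{2}$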
<br/>
**Q.E.D.**

Now we show, in the same way, that they are not mutually independent, meaning:
<br/>
$P(A|BC) \neq P(A) $
<br/>
$P(B|AC) \neq P(B)$
<br/>
$P(C|AB) \neq P(C)$
<br/>

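If any two of the events occur, all three tosses are equal, so the third event occurs as well: $P(A|BC) = P(B|AC) = P(C|AB) = 1$, but as we have seen, $P(A) = P(B) = P(C) = \frac{1}{2}$
<br/>
And so, they are not mutually independent.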
<br/>
**Q.E.D.**
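The counterexample can also be checked by enumerating the 8 equally likely outcomes. This is a small sketch using exact fractions; the helper name `prob` is hypothetical:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))  # 8 equally likely outcomes


def prob(event):
    """Exact probability of an event given as a predicate on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


A = lambda o: o[0] == o[1]  # first two tosses agree
B = lambda o: o[1] == o[2]  # last two tosses agree
C = lambda o: o[0] == o[2]  # first and last tosses agree

# Pairwise independence: P(E and F) == P(E) * P(F) for every pair.
for E, F in [(A, B), (A, C), (B, C)]:
    assert prob(lambda o: E(o) and F(o)) == prob(E) * prob(F)

# Mutual independence fails: P(A and B and C) is 1/4, not 1/8.
p_all = prob(lambda o: A(o) and B(o) and C(o))
print(p_all, prob(A) * prob(B) * prob(C))  # 1/4 1/8
```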

### Question 4 - Convolutions

#### 4.A
Write a program, `Q = NFoldConv(P, n)`, that takes as input:
* A distribution, P, of a random variable that takes finitely many integer values
* An integer n

and produces the distribution, Q, of the sum of n independent repeats of random variables, each of which has the distribution P.
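A minimal sketch of one possible implementation follows. It assumes the distribution is passed as a dict mapping each integer value to its probability. This representation is an assumption made here; the exercise does not prescribe one:

```python
def NFoldConv(P, n):
    """Distribution of the sum of n independent draws from P.

    P is assumed to be a dict {integer value: probability}; the result Q
    uses the same representation.
    """
    Q = {0: 1.0}  # the sum of zero draws is 0 with probability 1
    for _ in range(n):
        nxt = {}
        for s, ps in Q.items():
            for v, pv in P.items():
                # Independence: P(sum = s + v) accumulates P(s) * P(v).
                nxt[s + v] = nxt.get(s + v, 0.0) + ps * pv
        Q = nxt
    return Q


coin = {0: 0.5, 1: 0.5}
print(NFoldConv(coin, 2))  # {0: 0.25, 1: 0.5, 2: 0.25}
```

Each pass convolves the running distribution with P once, so the work grows with n times the product of the two support sizes. That is sufficient for the small supports in this exercise.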

#### 4.B
Compute the distribution of the sum of the results of rolling a fair octahedron 17 times.

<img src="https://upload.wikimedia.org/wikipedia/commons/2/27/Octahedron.jpg" width="200">
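A computed distribution can be sanity-checked against a few closed-form values. These assume the faces are numbered 1 to 8, which the question does not state explicitly, and write $S_{17}$ (a symbol introduced here) for the sum of the 17 rolls:

```latex
\begin{align*}
\text{support of } S_{17} &= \{17, 18, \dots, 136\} \\
E[S_{17}] &= 17 \cdot \frac{1 + 8}{2} = 76.5 \\
Var[S_{17}] &= 17 \cdot \frac{8^2 - 1}{12} = 89.25
\end{align*}
```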


#### 4.C
Assume that the price of a stock changes in any given day according to (in NIS):

$$P=\begin{pmatrix}
-1 & 0 & 1 & 2 & 3 \\
0.3 & 0.15 & 0.15 & 0.15 & 0.25
\end{pmatrix}$$

1. What is the distribution of the change in stock after 2 consecutive days of (independent) changes? After 5 consecutive days? 

2. What is the probability that the stock has gained strictly more than 7NIS after 5 days? Has lost strictly more than 4NIS? Explain your answers.
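The two tail probabilities can be read off the 5-day distribution. The sketch below repeats a compact convolution step so that it runs on its own, and uses exact fractions to avoid rounding. The names `convolve`, `Q5`, `p_gain` and `p_loss` are hypothetical:

```python
from fractions import Fraction as F

# Daily change distribution P from the question, as {change in NIS: probability}.
P = {-1: F(30, 100), 0: F(15, 100), 1: F(15, 100), 2: F(15, 100), 3: F(25, 100)}


def convolve(Q, P):
    """Distribution of the sum of one draw from Q and one independent draw from P."""
    out = {}
    for s, ps in Q.items():
        for v, pv in P.items():
            out[s + v] = out.get(s + v, F(0)) + ps * pv
    return out


Q5 = {0: F(1)}
for _ in range(5):
    Q5 = convolve(Q5, P)

p_gain = sum(pr for change, pr in Q5.items() if change > 7)   # gained strictly more than 7
p_loss = sum(pr for change, pr in Q5.items() if change < -4)  # lost strictly more than 4
print(float(p_gain), float(p_loss))
```

A loss of strictly more than 4 NIS in 5 days requires a drop of 1 NIS on every single day, so that probability is $0.3^5 = 0.00243$. This gives a quick check on the computed value.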