1. **Restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart).
2. **Run all cells** (in the menubar, select Cell$\rightarrow$Run All).
3. __Use the__ `Validate` __button in the Assignments tab before submitting__.

__Include comments, derivations, explanations, graphs, etc.__ 

You __work in groups__ of three people. __Write the full name and S/U-number of all team members!__

---

# Assignment 1 (Statistical Machine Learning 2024)
# **Deadline: 27 September 2024**

## Instructions
* Fill in any place that says `YOUR CODE HERE` or `YOUR ANSWER HERE` __including comments, derivations, explanations, graphs, etc.__ 
Elements and/or intermediate steps required to derive the answer have to be in the report. If an exercise requires coding, explain briefly what the code does (in comments). All figures should have titles (descriptions), axis labels, and legends.
* Please do __not add new cells__ to the notebook; write your answers only in the provided cells. Before you turn the assignment in, make sure everything runs as expected.
* __Use the variable names given in the exercises__, do not assign your own variable names. 
* __Only one team member needs to upload the solutions__. This can be done under the Assignments tab, where you fetched the assignments, and where you can also validate your submissions. Please do not change the filenames of the individual Jupyter notebooks.

For any problems or questions regarding the assignments, ask during the tutorial or send an email to charlotte.cambiervannooten@ru.nl and janneke.verbeek@ru.nl.

## Introduction
Assignment 1 consists of:
1. Polynomial curve fitting (50 points);
2. Gradient descent (25 points);
3. __Fruit boxes (25 points);__
4. Probability factorization (BONUS 10 points).

## Libraries

Please __avoid installing new packages__, unless really necessary.

In [None]:
import IPython
assert IPython.version_info[0] >= 3, "Your version of IPython is too old, please update it to at least version 3."

# Necessary imports (for solutions)
import math
import numpy as np
import matplotlib.pyplot as plt
from collections import namedtuple

# Set fixed random seed for reproducibility
np.random.seed(2022)

## Fruit boxes (weight 25)
Suppose we have two healthy but curiously mixed boxes of fruit, with one box containing 8 apples and 4 grapefruit and the other containing 15 apples and 3 grapefruit. One of the boxes is selected at random and a piece of fruit is picked (but not eaten) from the chosen box, with equal probability for each item in the box. The piece of fruit is returned and then once again from the *same* box a second piece is chosen at random. This is known as sampling with replacement. Model the chosen box with the random variable $B$, the first piece of fruit with the variable $F_1$, and the second piece with $F_2$.
### Exercise 3.1
What is the probability that the first piece of fruit is an apple given that the second piece of fruit was a grapefruit? How can the result of the second pick affect the probability of the first pick?

$P(box1)=P(box2)=\frac{1}{2}$

box1 = apple:8, grapefruit:4

box2 = apple:15, grapefruit:3

#### Apple
$P(apple) = P(apple|box1)P(box1) + P(apple|box2)P(box2) = \frac{8}{12} * \frac{1}{2} + \frac{15}{18} * \frac{1}{2} = \frac{8}{24} + \frac{15}{36} = \frac{1}{3} + \frac{5}{12} = \frac{4}{12} + \frac{5}{12} = \frac{9}{12} = \frac{3}{4}$

$P(apple|box1) = \frac{8}{12} = \frac{4}{6}$

$P(apple|box2) = \frac{15}{18} = \frac{5}{6}$

$P(box1|apple) = \frac{P(apple|box1)P(box1)}{P(apple)} = \frac{\frac{8}{12} * \frac{1}{2}}{\frac{3}{4}} = \frac{1}{3} * \frac{4}{3} = \frac{4}{9}$

$P(box2|apple) = 1 - P(box1|apple) = 1 - \frac{4}{9} = \frac{5}{9}$

#### Grapefruit
$P(gf) = P(gf|box1)P(box1) + P(gf|box2)P(box2) = \frac{4}{12} * \frac{1}{2} + \frac{3}{18} * \frac{1}{2} = \frac{4}{24} + \frac{3}{36} = \frac{1}{6} + \frac{1}{12} = \frac{3}{12} = \frac{1}{4}$

$P(gf|box1) = \frac{4}{12} = \frac{1}{3}$

$P(gf|box2) = \frac{3}{18} = \frac{1}{6}$

$P(box1|gf) = \frac{P(gf|box1)P(box1)}{P(gf)} = \frac{\frac{4}{12} * \frac{1}{2}}{\frac{1}{4}} = \frac{\frac{1}{3} * \frac{1}{2}}{\frac{1}{4}} = \frac{1}{6} * \frac{4}{1} = \frac{4}{6} = \frac{2}{3}$

$P(box2|gf) = 1 - P(box1|gf) = 1 - \frac{2}{3} = \frac{1}{3}$
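The Bayes'-rule posteriors above can be double-checked with exact rational arithmetic. The following is a minimal Python sketch, assuming the box contents are stored in a dictionary; the names `BOXES`, `likelihood` and `posterior_box` are illustrative and not part of the assignment template:

```python
from fractions import Fraction

# Box contents from the exercise: fruit -> count
BOXES = {
    "box1": {"apple": 8, "grapefruit": 4},
    "box2": {"apple": 15, "grapefruit": 3},
}
# Each box is selected with probability 1/2
prior = {box: Fraction(1, 2) for box in BOXES}


def likelihood(fruit, box):
    """P(fruit | box) for a single random draw from the box."""
    contents = BOXES[box]
    return Fraction(contents.get(fruit, 0), sum(contents.values()))


def posterior_box(fruit):
    """P(box | fruit) via Bayes' rule: P(fruit | box) P(box) / P(fruit)."""
    joint = {box: likelihood(fruit, box) * prior[box] for box in BOXES}
    evidence = sum(joint.values())  # P(fruit), by the sum rule
    return {box: p / evidence for box, p in joint.items()}


for fruit in ("apple", "grapefruit"):
    print(fruit, posterior_box(fruit))
```

Using `fractions.Fraction` instead of floats keeps results such as $\frac{4}{9}$ exact, which matches the assignment's request to report fractions rather than rounded decimals.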

#### Main
$P(pick1=apple|pick2=gf)$

= $P(pick1=apple|box1) * P(box1|pick2=gf) + P(pick1=apple|box2) * P(box2|pick2=gf)$

= $P(apple|box1) * P(box1|gf) + P(apple|box2) * P(box2|gf)$

= $\frac{4}{6} * \frac{2}{3} + \frac{5}{6} * \frac{1}{3}$

= $\frac{8}{18} + \frac{5}{18}$

= $\frac{13}{18}$

≈ $0.7222$

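###### *With replacement, both picks are drawn at random from the same, unchanged box, so $P(pick1|box) = P(pick2|box) = P(fruit|box)$ and the picks are independent given the box, $P(pick1|box, pick2) = P(pick1|box)$; this justifies the first step of the derivation above.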
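The same result follows from brute-force enumeration of the joint distribution $P(B, F_1, F_2) = P(B)P(F_1|B)P(F_2|B)$, which is how sampling with replacement factorizes. A sketch using exact fractions; `joint_with_replacement` and `p_first_given_second` are hypothetical helper names:

```python
from fractions import Fraction
from itertools import product

# Box contents from the exercise: fruit -> count
BOXES = {
    "box1": {"apple": 8, "grapefruit": 4},
    "box2": {"apple": 15, "grapefruit": 3},
}
FRUITS = ("apple", "grapefruit")


def joint_with_replacement():
    """Enumerate P(B, F1, F2) = P(B) P(F1 | B) P(F2 | B) for sampling with replacement."""
    joint = {}
    for box, contents in BOXES.items():
        total = sum(contents.values())
        for f1, f2 in product(FRUITS, repeat=2):
            joint[(box, f1, f2)] = (
                Fraction(1, 2)
                * Fraction(contents[f1], total)
                * Fraction(contents[f2], total)
            )
    return joint


def p_first_given_second(first, second, joint):
    """P(F1 = first | F2 = second) = P(F1 = first, F2 = second) / P(F2 = second)."""
    numerator = sum(p for (_, f1, f2), p in joint.items() if f1 == first and f2 == second)
    evidence = sum(p for (_, _, f2), p in joint.items() if f2 == second)
    return numerator / evidence


joint = joint_with_replacement()
print(p_first_given_second("apple", "grapefruit", joint))  # prints 13/18
print(p_first_given_second("apple", "apple", joint))       # prints 41/54
```

The second call reproduces the alternative case computed next. Enumeration sidesteps the intermediate posteriors entirely, which makes it a useful cross-check on the hand derivation.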

#### Alternative second pick
$P(pick1=apple|pick2=apple)$

= $P(apple|box1) * P(box1|apple) + P(apple|box2) * P(box2|apple)$

= $\frac{4}{6}$ * $\frac{4}{9}$ + $\frac{5}{6}$ * $\frac{5}{9}$

= $\frac{16}{54}$ + $\frac{25}{54}$

= $\frac{41}{54}$

= $0.7593$

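Observing an apple instead of a grapefruit on the second pick changes the probability that the first pick was an apple from $\frac{13}{18} \approx 0.722$ to $\frac{41}{54} \approx 0.759$, a difference of $\frac{2}{54} = \frac{1}{27} \approx 3.7$ percentage points (the prior is $P(apple) = \frac{3}{4} = 0.75$). The second pick can affect the first because both depend on the same unobserved box: a grapefruit is twice as likely to come from box 1 ($\frac{1}{3}$) as from box 2 ($\frac{1}{6}$), so observing one raises the probability of box 1 from $\frac{1}{2}$ to $\frac{2}{3}$, and box 1 has the lower proportion of apples. The picks are independent given the box, but not independent overall. The effect is small because the apple proportions of the two boxes ($\frac{2}{3}$ and $\frac{5}{6}$) are fairly close.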
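The dependence through the box can also be seen empirically. A Monte Carlo sketch, assuming NumPy's `Generator` API (`np.random.default_rng`), an arbitrary seed of 0 and 200,000 simulated experiments, with fruits encoded as 0 (apple) and 1 (grapefruit); `simulate_with_replacement` is a hypothetical helper:

```python
import numpy as np

# Box contents as (apples, grapefruit), taken from the exercise
BOXES = [(8, 4), (15, 3)]
APPLE, GRAPEFRUIT = 0, 1


def simulate_with_replacement(n_trials, seed=0):
    """Pick a box uniformly at random, then draw two fruits from it with replacement."""
    rng = np.random.default_rng(seed)
    box = rng.integers(0, 2, size=n_trials)           # B: box index 0 or 1, each with prob. 1/2
    counts = np.array(BOXES)[box]                      # shape (n_trials, 2)
    p_grapefruit = counts[:, GRAPEFRUIT] / counts.sum(axis=1)
    # Given the box, the two picks are independent Bernoulli draws (replacement)
    f1 = (rng.random(n_trials) < p_grapefruit).astype(int)  # 1 = grapefruit, 0 = apple
    f2 = (rng.random(n_trials) < p_grapefruit).astype(int)
    return f1, f2


f1, f2 = simulate_with_replacement(200_000)
est_gf = np.mean(f1[f2 == GRAPEFRUIT] == APPLE)   # estimate of P(F1=apple | F2=grapefruit)
est_apple = np.mean(f1[f2 == APPLE] == APPLE)     # estimate of P(F1=apple | F2=apple)
print(f"P(F1=apple | F2=grapefruit) ~ {est_gf:.4f}  (exact 13/18 = {13/18:.4f})")
print(f"P(F1=apple | F2=apple)      ~ {est_apple:.4f}  (exact 41/54 = {41/54:.4f})")
```

Conditioning is done by filtering the simulated trials on the value of the second pick; the estimates should land within about 0.01 of the exact fractions, with the printed digits depending on the seed.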

Please add the final result you got in the cell below! (Add it as a fraction, not an estimate. For example, write __1/3__, do not round to a number of decimals.)

In [None]:
"""
The variable p is probability of the first piece of fruit being
an apple given that the second piece of fruit was a grapefruit.
"""
# YOUR CODE HERE
p = 13/18  # exact result derived above (approx. 0.7222)

In [None]:
"""
Hidden check for value of variable p.
"""

### Exercise 3.2
Imagine now that after we remove a piece of fruit, it is not returned to the box. This is known as sampling without replacement. In this situation, recompute the probability that the first piece of fruit is an apple given that the second piece of fruit was a grapefruit. Explain the difference.

$P(box1)=P(box2)=\frac{1}{2}$

box1 = apple:8, grapefruit:4, total:12

box2 = apple:15, grapefruit:3, total:18


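Without replacement, the second pick depends on the first within a box, so we first compute $P(pick2=gf|box)$ by summing over the first pick:

$P(pick2=gf|box1) = \frac{8}{12} * \frac{4}{11} + \frac{4}{12} * \frac{3}{11} = \frac{44}{132} = \frac{1}{3}$

$P(pick2=gf|box2) = \frac{15}{18} * \frac{3}{17} + \frac{3}{18} * \frac{2}{17} = \frac{51}{306} = \frac{1}{6}$

These equal the with-replacement values, so as in Exercise 3.1, $P(box1|pick2=gf) = \frac{2}{3}$ and $P(box2|pick2=gf) = \frac{1}{3}$. Within each box, Bayes' rule gives

$P(pick1=apple|pick2=gf,box1) = \frac{P(pick2=gf|pick1=apple,box1)P(pick1=apple|box1)}{P(pick2=gf|box1)} = \frac{\frac{4}{11} * \frac{8}{12}}{\frac{1}{3}} = \frac{8}{11}$

$P(pick1=apple|pick2=gf,box2) = \frac{P(pick2=gf|pick1=apple,box2)P(pick1=apple|box2)}{P(pick2=gf|box2)} = \frac{\frac{3}{17} * \frac{15}{18}}{\frac{1}{6}} = \frac{15}{17}$

Combining both boxes:

$P(pick1=apple|pick2=gf)$

= $P(pick1=apple|pick2=gf,box1)P(box1|pick2=gf) + P(pick1=apple|pick2=gf,box2)P(box2|pick2=gf)$

= $\frac{8}{11} * \frac{2}{3} + \frac{15}{17} * \frac{1}{3}$

= $\frac{16}{33} + \frac{5}{17}$

= $\frac{272}{561} + \frac{165}{561}$

= $\frac{437}{561}$

≈ $0.7790$

Because the sampling is without replacement, the contents of the box change after the first pick, so $P(pick2|pick1,box) \neq P(pick2|box)$: the picks are no longer independent given the box. Removing an apple first leaves relatively more grapefruit ($\frac{4}{11} > \frac{4}{12}$ in box 1, $\frac{3}{17} > \frac{3}{18}$ in box 2), while removing a grapefruit first leaves fewer.

A grapefruit on the second pick therefore makes an apple on the first pick more likely within each box ($\frac{8}{11} > \frac{2}{3}$ and $\frac{15}{17} > \frac{5}{6}$), which raises the overall probability from $\frac{13}{18} \approx 0.722$ (with replacement) to $\frac{437}{561} \approx 0.779$.

Please add the final result you got in the cell below! (Add it as a fraction, not an estimate. For example, write __1/3__, do not round to a number of decimals.)

In [2]:
"""
The variable p is probability of the first piece of fruit being
an apple given that the second piece of fruit was a grapefruit
when the sampling was done without replacement.
"""
# YOUR CODE HERE
p = 437/561  # exact result derived above (approx. 0.7790)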

In [3]:
"""
Hidden check for value of variable p.
"""

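The result for sampling without replacement can be checked by enumerating $P(B, F_1, F_2) = P(B)P(F_1|B)P(F_2|F_1,B)$, where the first fruit is removed from the box before the second draw. A minimal sketch with exact fractions; `joint_without_replacement` and `per_box` are hypothetical names:

```python
from fractions import Fraction

# Box contents from the exercise: fruit -> count
BOXES = {
    "box1": {"apple": 8, "grapefruit": 4},
    "box2": {"apple": 15, "grapefruit": 3},
}


def joint_without_replacement():
    """Enumerate P(B, F1, F2) = P(B) P(F1 | B) P(F2 | F1, B) when the first fruit is not returned."""
    joint = {}
    for box, contents in BOXES.items():
        total = sum(contents.values())
        for f1, n1 in contents.items():
            # Remove the first fruit from a copy of the box before the second draw
            remaining = dict(contents)
            remaining[f1] -= 1
            for f2, n2 in remaining.items():
                joint[(box, f1, f2)] = (
                    Fraction(1, 2) * Fraction(n1, total) * Fraction(n2, total - 1)
                )
    return joint


joint = joint_without_replacement()

# Overall: P(F1 = apple | F2 = grapefruit)
numerator = sum(p for (_, f1, f2), p in joint.items() if f1 == "apple" and f2 == "grapefruit")
evidence = sum(p for (_, _, f2), p in joint.items() if f2 == "grapefruit")
print(numerator / evidence)  # prints 437/561

# Per box: P(F1 = apple | F2 = grapefruit, B)
per_box = {}
for box in BOXES:
    num = joint[(box, "apple", "grapefruit")]
    den = sum(p for (b, _, f2), p in joint.items() if b == box and f2 == "grapefruit")
    per_box[box] = num / den
print(per_box)
```

Within each box the conditional probability equals $\frac{a}{n-1}$ (with $a$ apples and $n$ fruits in total): once the second pick is known to be a grapefruit, the first pick is equally likely to be any of the other $n-1$ fruits.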

### Exercise 3.3
Starting from the initial situation (i.e., sampling with replacement), we add a dozen oranges to the first box and repeat the experiment. Show that now the outcome of the first pick has no impact on the probability that the second pick is a grapefruit. Are the two picks now dependent or independent? Explain your answer.

$P(box1)=P(box2)=\frac{1}{2}$

box1 = apple:8, grapefruit:4, orange:12 = 24 total

box2 = apple:15, grapefruit:3 = 18 total

#### Apple
$P(apple|box1) = \frac{8}{24} = \frac{1}{3}$

$P(apple|box2) = \frac{15}{18} = \frac{5}{6}$

$P(apple) = P(apple|box1)P(box1) + P(apple|box2)P(box2) = \frac{1}{3} * \frac{1}{2} + \frac{5}{6} * \frac{1}{2} = \frac{1}{6} + \frac{5}{12} = \frac{7}{12}$

$P(box1|apple) = \frac{P(apple|box1)P(box1)}{P(apple)} = \frac{\frac{1}{3} * \frac{1}{2} }{ \frac{7}{12}} = \frac{1}{6} * \frac{12}{7} = \frac{2}{7}$

$P(box2|apple) = 1 - P(box1|apple) = 1 - \frac{2}{7} = \frac{5}{7}$

#### Grapefruit
$P(gf|box1) = \frac{4}{24} = \frac{1}{6}$

$P(gf|box2) = \frac{3}{18} = \frac{1}{6}$

$P(gf) = P(gf|box1)P(box1) + P(gf|box2)P(box2) = \frac{1}{6} * \frac{1}{2} + \frac{1}{6} * \frac{1}{2} = \frac{1}{12} + \frac{1}{12} = \frac{2}{12} = \frac{1}{6}$

$P(box1|gf) = \frac{P(gf|box1)P(box1)}{P(gf)} = \frac{\frac{1}{6} * \frac{1}{2} }{ \frac{1}{6}} = \frac{1}{12} * 6 = \frac{1}{2}$

$P(box2|gf) = 1 - P(box1|gf) = 1 - \frac{1}{2} = \frac{1}{2}$

#### Orange
$P(orange|box1) = \frac{12}{24} = \frac{1}{2}$

$P(orange|box2) = 0$

$P(orange) = P(orange|box1)P(box1) + P(orange|box2)P(box2) = \frac{1}{2} * \frac{1}{2} + 0 * \frac{1}{2} = \frac{1}{4}$

$P(box1|orange) = \frac{P(orange|box1)P(box1)}{P(orange)} = \frac{\frac{1}{2} * \frac{1}{2} }{ \frac{1}{4}} = \frac{1}{4} * 4 = 1$

$P(box2|orange) = 1 - P(box1|orange) = 0$

#### Main
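With replacement, pick 1 can only influence pick 2 through what it reveals about the box. Since $P(gf|box1) = P(gf|box2) = \frac{1}{6}$, the probability of a grapefruit on the second pick is $\frac{1}{6}$ whichever box was chosen, so updating the box probabilities after the first pick does not change it: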

$P(pick2=gf|pick1=apple)
= P(gf|box1) * P(box1|apple) + P(gf|box2) * P(box2|apple)
= \frac{1}{6} * \frac{2}{7} + \frac{1}{6} * \frac{5}{7}
= \frac{2}{42} + \frac{5}{42}
= \frac{7}{42}
= \frac{1}{6}$

$P(pick2=gf|pick1=orange)
= P(gf|box1) * P(box1|orange) + P(gf|box2) * P(box2|orange)
= \frac{1}{6} * 1 + \frac{1}{6} * 0
= \frac{1}{6}$

$P(pick2=gf|pick1=gf)
= P(gf|box1) * P(box1|gf) + P(gf|box2) * P(box2|gf)
= \frac{1}{6} * \frac{1}{2} + \frac{1}{6} * \frac{1}{2}
= \frac{1}{12} + \frac{1}{12}
= \frac{1}{6}$

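###### *With replacement, $P(pick1|box) = P(pick2|box) = P(fruit|box)$ because both picks are drawn at random from the same, unchanged box.

So the event that the second pick is a grapefruit is independent of the first pick: $P(pick2=gf|pick1) = P(pick2=gf) = \frac{1}{6}$ for every outcome of pick 1. The two picks are still not independent as random variables, because the first pick still carries information about the box: for example, $P(pick2=orange|pick1=orange) = \frac{1}{2} * 1 + 0 * 0 = \frac{1}{2}$, whereas $P(pick2=orange) = \frac{1}{4}$. Only the grapefruit outcome has become independent, because its probability is the same in both boxes.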
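A sketch that enumerates the with-replacement model for the enlarged first box, checking that $P(F_2=gf|F_1=x) = \frac{1}{6}$ for every fruit $x$ and comparing $P(F_2=orange|F_1=orange)$ with $P(F_2=orange)$. The helper names are illustrative, and box 2 is given an explicit orange count of 0:

```python
from fractions import Fraction

# Box contents after adding a dozen oranges to box 1 (sampling with replacement)
BOXES = {
    "box1": {"apple": 8, "grapefruit": 4, "orange": 12},
    "box2": {"apple": 15, "grapefruit": 3, "orange": 0},
}
FRUITS = ("apple", "grapefruit", "orange")


def p_fruit(fruit, box):
    """P(fruit | box) for one random draw."""
    contents = BOXES[box]
    return Fraction(contents[fruit], sum(contents.values()))


def p_marginal(fruit):
    """P(F = fruit) for a single pick, each box chosen with probability 1/2."""
    return sum(Fraction(1, 2) * p_fruit(fruit, b) for b in BOXES)


def p_second_given_first(second, first):
    """P(F2 = second | F1 = first), using independence of the picks given the box."""
    joint = sum(Fraction(1, 2) * p_fruit(first, b) * p_fruit(second, b) for b in BOXES)
    return joint / p_marginal(first)


for first in FRUITS:
    print(first, p_second_given_first("grapefruit", first))  # prints "<fruit> 1/6" each time

# Whole-distribution check: P(F2=orange | F1=orange) versus P(F2=orange)
print(p_second_given_first("orange", "orange"), p_marginal("orange"))  # prints 1/2 1/4
```

The grapefruit probabilities agree for every first pick, while the orange comparison prints two different values, so knowing the first pick still changes the distribution of the second pick as a whole.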
