1. **Restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart)
2. **Run all cells** (in the menubar, select Cell$\rightarrow$Run All).
3. __Use the__ `Validate` __button in the Assignments tab before submitting__.

__Include comments, derivations, explanations, graphs, etc.__ 

You __work in groups__ (= 3 people). __Write the full name and S/U-number of all team members!__

GROUP NUMBER (brightspace): 43
* Student 1 Stian Grønlund, s1122151:
* Student 2 name, S/U-number:
* Student 3 name, S/U-number:

---

# Assignment 1 (Statistical Machine Learning 2024)
# **Deadline: 27 September 2024**

## Instructions
* Fill in any place that says `YOUR CODE HERE` or `YOUR ANSWER HERE` __including comments, derivations, explanations, graphs, etc.__ 
Elements and/or intermediate steps required to derive the answer have to be in the report. If an exercise requires coding, explain briefly what the code does (in comments). All figures should have titles (descriptions), axis labels, and legends.
* Please do __not add new cells__ to the notebook, try to write the answers only in the provided cells. Before you turn the assignment in, make sure everything runs as expected.
* __Use the variable names given in the exercises__, do not assign your own variable names. 
* __Only one team member needs to upload the solutions__. This can be done under the Assignments tab, where you fetched the assignments, and where you can also validate your submissions. Please do not change the filenames of the individual Jupyter notebooks.

For any problems or questions regarding the assignments, ask during the tutorial or send an email to charlotte.cambiervannooten@ru.nl and janneke.verbeek@ru.nl.

## Introduction
Assignment 1 consists of:
1. Polynomial curve fitting (50 points);
2. Gradient descent (25 points);
3. __Fruit boxes (25 points);__
4. Probability factorization (BONUS 10 points);

## Libraries

Please __avoid installing new packages__, unless really necessary.

In [16]:
import IPython
assert IPython.version_info[0] >= 3, "Your version of IPython is too old, please update it to at least version 3."

# Necessary imports (for solutions)
import math
import numpy as np
import matplotlib.pyplot as plt
from collections import namedtuple

# Set fixed random seed for reproducibility
np.random.seed(2022)

## Fruit boxes (weight 25)
Suppose we have two healthy but curiously mixed boxes of fruit, with one box containing 8 apples and 4 grapefruit and the other containing 15 apples and 3 grapefruit. One of the boxes is selected at random and a piece of fruit is picked (but not eaten) from the chosen box, with equal probability for each item in the box. The piece of fruit is returned and then once again from the *same* box a second piece is chosen at random. This is known as sampling with replacement. Model the chosen box with the random variable $B$, the first piece of fruit with the variable $F_1$, and the second piece with $F_2$.
### Exercise 3.1
What is the probability that the first piece of fruit is an apple given that the second piece of fruit was a grapefruit? How can the result of the second pick affect the probability of the first pick?

The two picks are linked because both come from the same box. Although the composition of each box does not change between picks (we sample with replacement), observing a particular fruit on either pick tells us which box was more likely chosen, and that in turn changes the probabilities for the other pick.
We can calculate this using Bayes' Theorem as:

$$
\begin{align}
    P(F_1 = A \mid F_2 = G) &= \frac{P(F_2 = G \mid F_1 = A) P(F_1 = A)}{P(F_2 = G)}
\end{align}
$$


### 1. Finding $P(F_2 = G \mid F_1 = A)$
We find it by marginalizing over the box. Given the box, the second pick does not depend on the first, because the fruit is returned:
$$
\begin{align}
    P(F_2 = G \mid F_1 = A) &= P(F_2 = G \mid B = 1) \cdot P(B = 1 \mid F_1 = A) + P(F_2 = G \mid B = 2) \cdot P(B = 2 \mid F_1 = A)
\end{align}
$$

First, we calculate $P(B = 1 \mid F_1 = A)$ using Bayes' theorem:

$$
\begin{align}
    P(B = 1 \mid F_1 = A) &= \frac{P(F_1 = A \mid B = 1) P(B = 1)}{P(F_1 = A)} = \frac{\left(\frac{8}{12}\right) \left(\frac{1}{2}\right)}{(\frac{1}{2} \cdot \frac{8}{12}) + (\frac{1}{2} \cdot \frac{15}{18}) } = \frac{\left(\frac{2}{3}\right) \left(\frac{1}{2}\right)}{\frac{3}{4}} = \frac{\frac{2}{6}}{\frac{3}{4}} = \frac{\frac{1}{3}}{\frac{3}{4}} = \frac{4}{9}
\end{align}
$$

Then $P(B = 2 \mid F_1 = A) = 1 - \frac{4}{9} = \frac{5}{9}$.
From the box contents, $P(F_2 = G \mid B = 1) = \frac{4}{12}$ and $P(F_2 = G \mid B = 2) = \frac{3}{18}$.
Putting everything together, we get:

$$
\begin{align}
    P(F_2 = G \mid F_1 = A)&= \frac{4}{12}  \cdot \frac{4}{9} + \frac{3}{18} \cdot \frac{5}{9} = \frac{13}{54}
\end{align}
$$
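The two steps above (posterior over the box, then the predictive probability of a grapefruit) can be checked with exact rational arithmetic. This is only a minimal sketch: the names `boxes`, `prior`, `p_fruit`, `posterior`, and `p_g_given_a` are illustrative and not part of the assignment.

```python
from fractions import Fraction

# Box contents as stated in the exercise: (apples, grapefruit)
boxes = {1: (8, 4), 2: (15, 3)}
prior = Fraction(1, 2)  # each box is equally likely


def p_fruit(box, fruit):
    """P(fruit | box) for a single pick; fruit is 'A' or 'G'."""
    apples, grapefruit = boxes[box]
    count = apples if fruit == "A" else grapefruit
    return Fraction(count, apples + grapefruit)


# Posterior over the box after seeing an apple on the first pick
evidence = sum(prior * p_fruit(b, "A") for b in boxes)
posterior = {b: prior * p_fruit(b, "A") / evidence for b in boxes}

# Predictive probability of a grapefruit on the second pick (with replacement)
p_g_given_a = sum(p_fruit(b, "G") * posterior[b] for b in boxes)

print(posterior[1], posterior[2], p_g_given_a)  # 4/9 5/9 13/54
```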

### 2. Finding $P(F_1 = A)$

The probability of getting an apple from either box:

$$
\begin{align}
    P(F_1 = A) &= P(B = 1) \cdot P(F_1 = A \mid B=1) + P(B=2) \cdot P(F_1 = A \mid B=2) = \frac{1}{2} \cdot \frac{8}{12} + \frac{1}{2} \cdot \frac{15}{18} = \frac{3}{4}
\end{align}
$$

### 3. Finding $P(F_2 = G)$
Analogously to the calculation of $P(F_1 = A)$, we get:
$$
\begin{align}
    P(F_2 = G) &= P(B = 1) \cdot P(F_2 = G \mid B=1) + P(B=2) \cdot P(F_2 = G \mid B=2) = \frac{1}{2} \cdot \frac{4}{12} + \frac{1}{2} \cdot \frac{3}{18} = \frac{3}{12} = \frac{1}{4}
\end{align}
$$
### 4. Putting everything together:

$$
\begin{align}
   P(F_1 = A \mid F_2 = G) &= \frac{\frac{13}{54} \cdot \frac{3}{4}}{\frac{1}{4}} = \frac{13}{18}
\end{align}
$$
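As an independent sanity check, the experiment can also be simulated. The sketch below is a Monte Carlo estimate, not an exact computation; the sample size `n`, the seed, and all variable names are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2022)  # seed chosen arbitrarily
n = 200_000                        # number of simulated experiments

# Probability of drawing an apple from box 1 and box 2
p_apple = np.array([8 / 12, 15 / 18])

# Choose a box per experiment: index 0 is box 1, index 1 is box 2
box = rng.integers(0, 2, size=n)

# Two independent picks from the same box (sampling with replacement)
first_is_apple = rng.random(n) < p_apple[box]
second_is_grapefruit = ~(rng.random(n) < p_apple[box])

# Fraction of apple-first outcomes among experiments with grapefruit second
estimate = first_is_apple[second_is_grapefruit].mean()
print(estimate, 13 / 18)
```

The estimate should land close to $13/18 \approx 0.722$; small deviations are expected because of sampling noise.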


Please add the final result you got in the cell below! (Add it as a fraction, not an estimate. For example, write __1/3__, do not round to a number of decimals.)

In [15]:
"""
The variable p is probability of the first piece of fruit being
an apple given that the second piece of fruit was a grapefruit.
"""
p=13/18

In [12]:
"""
Hidden check for value of variable p.
"""

'\nHidden check for value of variable p.\n'

### Exercise 3.2
Imagine now that after we remove a piece of fruit, it is not returned to the box. This is known as sampling without replacement. In this situation, recompute the probability that the first piece of fruit is an apple given that the second piece of fruit was a grapefruit. Explain the difference.

We proceed as in 3.1, except that the probabilities of the second draw now depend on which fruit was removed by the first draw.
We again start from Bayes' Theorem:

$$
\begin{align}
    P(F_1 = A \mid F_2 = G) &= \frac{P(F_2 = G \mid F_1 = A) P(F_1 = A)}{P(F_2 = G)}
\end{align}
$$


### 1. Finding $P(F_2 = G \mid F_1 = A)$
We find it by marginalizing over the box, now also conditioning the second pick on the first:
$$
\begin{align}
    P(F_2 = G \mid F_1 = A) &= P(F_2 = G \mid B = 1, F_1 = A) \cdot P(B = 1 \mid F_1 = A) + P(F_2 = G \mid B = 2, F_1 = A) \cdot P(B = 2 \mid F_1 = A)
\end{align}
$$

From 3.1 we have $P(B = 1 \mid F_1 = A) = \frac{4}{9}$ and $P(B = 2 \mid F_1 = A) = \frac{5}{9}$; these do not depend on the second draw.

Next, since the first draw was an apple and is not returned, box 1 holds 7 apples and 4 grapefruit for the second draw:
$$
\begin{align}
    P(F_2 = G \mid B = 1, F_1=A)&= \frac{4}{11}
\end{align}
$$
Analogously, box 2 then holds 14 apples and 3 grapefruit:

$$
\begin{align}
    P(F_2 = G \mid  B=2, F_1=A)&= \frac{3}{17}
\end{align}
$$

Then putting everything together:
$$
\begin{align}
    P(F_2 = G \mid F_1 = A)&= \frac{4}{11}  \cdot \frac{4}{9} + \frac{3}{17} \cdot \frac{5}{9} = \frac{437}{1683}
\end{align}
$$

### 2. Finding $P(F_1 = A)$

The first pick is unaffected by whether the fruit is returned afterwards, so we can reuse the result from 3.1: $P(F_1 = A) = \frac{3}{4}$.


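### 3. Finding $P(F_2 = G)$
We obtain the marginal probability by summing over the first pick and the box. Each term must be weighted by the probability of the corresponding first pick. For box 1:
$$
\begin{align}
    P(F_2 = G \mid B = 1) &= P(F_1 = A \mid B = 1) \cdot P(F_2 = G \mid B = 1, F_1 = A) + P(F_1 = G \mid B = 1) \cdot P(F_2 = G \mid B = 1, F_1 = G) \\
    &= \frac{8}{12} \cdot \frac{4}{11} + \frac{4}{12} \cdot \frac{3}{11} = \frac{8}{33} + \frac{3}{33} = \frac{1}{3}
\end{align}
$$
Analogously, for box 2:
$$
\begin{align}
    P(F_2 = G \mid B = 2) &= P(F_1 = A \mid B = 2) \cdot P(F_2 = G \mid B = 2, F_1 = A) + P(F_1 = G \mid B = 2) \cdot P(F_2 = G \mid B = 2, F_1 = G) \\
    &= \frac{15}{18} \cdot \frac{3}{17} + \frac{3}{18} \cdot \frac{2}{17} = \frac{15}{102} + \frac{2}{102} = \frac{1}{6}
\end{align}
$$
Then, weighting by the probability of each box:
$$
\begin{align}
    P(F_2 = G) &= \frac{1}{2} \cdot \frac{1}{3} + \frac{1}{2} \cdot \frac{1}{6} = \frac{1}{4}
\end{align}
$$
### 4. Putting everything together:

$$
\begin{align}
   P(F_1 = A \mid F_2 = G) &= \frac{\frac{437}{1683} \cdot \frac{3}{4}}{\frac{1}{4}} = \frac{437}{561}
\end{align}
$$

This is larger than the $\frac{13}{18}$ obtained with replacement. The reason is that, within a box, knowing that the second pick is a grapefruit makes an apple on the first pick more likely: the probability is $\frac{8}{11}$ (box 1) or $\frac{15}{17}$ (box 2) instead of $\frac{8}{12}$ or $\frac{15}{18}$.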
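The whole computation can be cross-checked by enumerating the joint distribution over box, first pick, and second pick. The following is a minimal sketch with exact fractions; the function `p_first_given_second` and its `replace` flag are hypothetical helpers introduced here, not names required by the assignment.

```python
from fractions import Fraction

# Box contents as stated in the exercise
boxes = {1: {"A": 8, "G": 4}, 2: {"A": 15, "G": 3}}


def p_first_given_second(first, second, replace):
    """P(F1 = first | F2 = second), summing the joint over boxes and first picks."""
    numerator = Fraction(0)
    denominator = Fraction(0)
    for contents in boxes.values():
        total = sum(contents.values())
        for f1, n1 in contents.items():
            remaining = dict(contents)
            if not replace:
                remaining[f1] -= 1  # the first fruit is not returned
            weight = (
                Fraction(1, 2)                 # P(B)
                * Fraction(n1, total)          # P(F1 | B)
                * Fraction(remaining[second], sum(remaining.values()))  # P(F2 | B, F1)
            )
            denominator += weight
            if f1 == first:
                numerator += weight
    return numerator / denominator


print(p_first_given_second("A", "G", replace=True))   # 13/18
print(p_first_given_second("A", "G", replace=False))  # 437/561
```

Using one function for both sampling schemes makes the only difference explicit: the count of the removed fruit is decremented before the second pick.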



In more detail, the marginal $P(F_2 = G)$ is obtained by considering all possible ways the second fruit can be a grapefruit, accounting for both boxes and the outcomes of the first draw.

We have:

$$
P(F_2 = G) = P(B = 1) \cdot P(F_2 = G \mid B = 1) + P(B = 2) \cdot P(F_2 = G \mid B = 2)
$$

For box 1:
$$
\begin{align*}
P(F_2 = G \mid B = 1) &= P(F_1 = A \mid B = 1) \cdot P(F_2 = G \mid B = 1, F_1 = A) + P(F_1 = G \mid B = 1) \cdot P(F_2 = G \mid B = 1, F_1 = G)
\end{align*}
$$

$$
\begin{align*}
 P(F_1 = A \mid B = 1) \cdot P(F_2 = G \mid B = 1, F_1 = A) = \frac{8}{12} \cdot \frac{4}{11} = \frac{8}{33}
\end{align*}
$$

$$
\begin{align*}
P(F_1 = G \mid B = 1) \cdot P(F_2 = G \mid B = 1, F_1 = G) = \frac{4}{12} \cdot \frac{3}{11} = \frac{1}{11}
\end{align*}
$$

Adding the two terms:
$$
\begin{align*}
P(F_2 = G \mid B = 1) &= \frac{8}{33} + \frac{1}{11} = \frac{1}{3}
\end{align*}
$$

For box 2, similarly:
$$
\begin{align*}
P(F_2 = G \mid B = 2) &= P(F_1 = A \mid B = 2) \cdot P(F_2 = G \mid B = 2, F_1 = A) + P(F_1 = G \mid B = 2) \cdot P(F_2 = G \mid B = 2, F_1 = G)
\end{align*}
$$

$$
\begin{align*}
 P(F_1 = A \mid B = 2) \cdot P(F_2 = G \mid B = 2, F_1 = A) = \frac{15}{18} \cdot \frac{3}{17} = \frac{15}{102}
\end{align*}
$$

$$
\begin{align*}
 P(F_1 = G \mid B = 2) \cdot P(F_2 = G \mid B = 2, F_1 = G) = \frac{3}{18} \cdot \frac{2}{17} = \frac{2}{102}
\end{align*}
$$

Adding the two terms:
$$
\begin{align*}
P(F_2 = G \mid B = 2) &= \frac{15}{102} + \frac{2}{102} = \frac{17}{102} = \frac{1}{6}
\end{align*}
$$



**Now, compute $P(F_2 = G)$:**

$$
\begin{align*}
P(F_2 = G) &= P(B = 1) \cdot P(F_2 = G \mid B = 1) + P(B = 2) \cdot P(F_2 = G \mid B = 2) \\
&= \left( \frac{1}{2} \cdot \frac{1}{3} \right) + \left( \frac{1}{2} \cdot \frac{17}{102} \right) \\
&= \frac{1}{6} + \frac{17}{204}
\end{align*}
$$

Simplify the fractions:

1. **Find a common denominator:**

   - The least common denominator (LCD) of 6 and 204 is 204.
   - Convert fractions:
     $$
     \frac{1}{6} = \frac{34}{204}, \quad \frac{17}{204} = \frac{17}{204}
     $$

2. **Add the fractions:**

   $$
   \begin{align*}
   P(F_2 = G) &= \frac{34}{204} + \frac{17}{204} \\
   &= \frac{51}{204}
   \end{align*}
   $$

3. **Simplify the result:**

   - Divide numerator and denominator by their greatest common divisor (GCD), which is 51:
     $$
     \frac{51 \div 51}{204 \div 51} = \frac{1}{4}
     $$

Therefore, the probability that the second fruit drawn is a grapefruit is

$$
P(F_2 = G) = \frac{1}{4}
$$
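The per-box sums over the first pick can be reproduced in a few lines. This is a sketch under the exercise's box contents; `second_pick_grapefruit`, `per_box`, and `marginal` are illustrative names.

```python
from fractions import Fraction


def second_pick_grapefruit(apples, grapefruit):
    """P(F2 = G | box) without replacement, summed over the first pick."""
    total = apples + grapefruit
    # First pick is an apple, second a grapefruit
    first_apple = Fraction(apples, total) * Fraction(grapefruit, total - 1)
    # First pick is a grapefruit, second another grapefruit
    first_grapefruit = Fraction(grapefruit, total) * Fraction(grapefruit - 1, total - 1)
    return first_apple + first_grapefruit


per_box = [second_pick_grapefruit(8, 4), second_pick_grapefruit(15, 3)]
marginal = sum(Fraction(1, 2) * q for q in per_box)

print(per_box[0], per_box[1], marginal)  # 1/3 1/6 1/4
```

Each per-box value equals the plain fraction of grapefruit in that box ($\frac{4}{12}$ and $\frac{3}{18}$), so the marginal of the second pick is the same as with replacement.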


We use the following notation:

- $B$ is the chosen box. There are two possibilities: $B_1$ represents Box 1 (8 apples and 4 grapefruit) and $B_2$ represents Box 2 (15 apples and 3 grapefruit).
- $F_1$ represents the first piece of fruit chosen.
- $F_2$ represents the second piece of fruit chosen.

We are asked to find the probability of $F_1$ being an apple given that $F_2$ is a grapefruit:

$$
\begin{align}
    P(F_1 = A \mid F_2 = G)
\end{align}
$$

This can be expressed using Bayes' Theorem as:

$$
\begin{align}
    P(F_1 = A \mid F_2 = G) &= \frac{P(F_2 = G \mid F_1 = A) P(F_1 = A)}{P(F_2 = G)}
\end{align}
$$

### 1. Finding $P(F_2 = G \mid F_1 = A)$
We find it by marginalizing over the box. Because the first fruit is not returned, the second pick must also be conditioned on the first:
$$
\begin{align}
    P(F_2 = G \mid F_1 = A) &= P(F_2 = G \mid B = 1, F_1 = A) \cdot P(B = 1 \mid F_1 = A) + P(F_2 = G \mid B = 2, F_1 = A) \cdot P(B = 2 \mid F_1 = A)
\end{align}
$$

First, we calculate $P(B = 1 \mid F_1 = A)$ using Bayes' theorem:

$$
\begin{align}
    P(B = 1 \mid F_1 = A) &= \frac{P(F_1 = A \mid B = 1) P(B = 1)}{P(F_1 = A)} = \frac{\left(\frac{8}{12}\right) \left(\frac{1}{2}\right)}{(\frac{1}{2} \cdot \frac{8}{12}) + (\frac{1}{2} \cdot \frac{15}{18})} = \frac{\left(\frac{2}{3}\right) \left(\frac{1}{2}\right)}{\frac{3}{4}} = \frac{\frac{1}{3}}{\frac{3}{4}} = \frac{4}{9}
\end{align}
$$

Then $P(B = 2 \mid F_1 = A) = 1 - \frac{4}{9} = \frac{5}{9}$.

Next, we consider the second pick from box 1:
If $F_1$ is an apple, the box now has 7 apples and 4 grapefruit. The probability that the second fruit is a grapefruit is then $\frac{4}{11}$.

If $F_1$ is a grapefruit, the box now has 8 apples and 3 grapefruit. The probability that the second fruit is a grapefruit is then $\frac{3}{11}$.

Weighting each case by the probability of the first pick gives the marginal for box 1:

$$
\begin{align}
    P(F_2 = G \mid B = 1) &= P(F_1 = G \mid B = 1) \cdot P(F_2 = G \mid B = 1, F_1 = G) + P(F_1 = A \mid B = 1) \cdot P(F_2 = G \mid B = 1, F_1 = A) \\
    &= \frac{4}{12} \cdot \frac{3}{11} + \frac{8}{12} \cdot \frac{4}{11} = \frac{1}{11} + \frac{8}{33} = \frac{1}{3}
\end{align}
$$

Next, we consider the second pick from box 2:
If $F_1$ is an apple, the box now has 14 apples and 3 grapefruit. The probability that the second fruit is a grapefruit is then $\frac{3}{17}$.

If $F_1$ is a grapefruit, the box now has 15 apples and 2 grapefruit. The probability that the second fruit is a grapefruit is then $\frac{2}{17}$.

$$
\begin{align}
    P(F_2 = G \mid B = 2) &= P(F_1 = G \mid B = 2) \cdot P(F_2 = G \mid B = 2, F_1 = G) + P(F_1 = A \mid B = 2) \cdot P(F_2 = G \mid B = 2, F_1 = A) \\
    &= \frac{3}{18} \cdot \frac{2}{17} + \frac{15}{18} \cdot \frac{3}{17} = \frac{17}{102} = \frac{1}{6}
\end{align}
$$

Now, combining the posterior over the box with the probabilities of a grapefruit given that the first pick was an apple:

$$
\begin{align}
    P(F_2 = G \mid F_1 = A) &= \frac{4}{9} \cdot \frac{4}{11} + \frac{5}{9} \cdot \frac{3}{17} = \frac{437}{1683}
\end{align}
$$

### 2. Finding $P(F_1 = A)$

The probability of getting an apple from either box:

$$
\begin{align}
    P(F_1 = A) &= P(B_1) \cdot P(F_1 = A \mid B_1) + P(B_2) \cdot P(F_1 = A \mid B_2) \\
    &= \frac{1}{2} \cdot \frac{8}{12} + \frac{1}{2} \cdot \frac{15}{18} \\
    &= \frac{1}{2} \cdot \frac{2}{3} + \frac{1}{2} \cdot \frac{5}{6} \\
    &= \frac{1}{3} + \frac{5}{12} \\
    &= \frac{4}{12} + \frac{5}{12} \\
    &= \frac{9}{12} \\
    &= \frac{3}{4}
\end{align}
$$

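### 3. Finding $P(F_2 = G)$

This requires considering all possible outcomes of the first pick, each weighted by its probability:

$$
\begin{align}
    P(F_2 = G) &= P(F_2 = G \mid F_1 = A) \cdot P(F_1 = A) + P(F_2 = G \mid F_1 = G) \cdot P(F_1 = G)
\end{align}
$$
We already have $P(F_2 = G \mid F_1 = A)$ and $P(F_1 = A)$, so we have to find $P(F_1 = G) = 1 - \frac{3}{4} = \frac{1}{4}$ and, analogously, $P(F_2 = G \mid F_1 = G)$. With $P(B = 1 \mid F_1 = G) = \frac{2}{3}$ and $P(B = 2 \mid F_1 = G) = \frac{1}{3}$ we get $P(F_2 = G \mid F_1 = G) = \frac{2}{3} \cdot \frac{3}{11} + \frac{1}{3} \cdot \frac{2}{17} = \frac{124}{561}$, and therefore $P(F_2 = G) = \frac{437}{1683} \cdot \frac{3}{4} + \frac{124}{561} \cdot \frac{1}{4} = \frac{1}{4}$.

So the quantity we need to find is

$$
\begin{align}
    P(F_1 = A \mid F_2 = G) &= \frac{P(F_2 = G \mid F_1 = A) P(F_1 = A)}{P(F_2 = G)} = \frac{\frac{437}{1683} \cdot \frac{3}{4}}{\frac{1}{4}} = \frac{437}{561}
\end{align}
$$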



Please add the final result you got in the cell below! (Add it as a fraction, not an estimate. For example, write __1/3__, do not round to a number of decimals.)

In [13]:
"""
The variable p is probability of the first piece of fruit being
an apple given that the second piece of fruit was a grapefruit
when the sampling was done without replacement.
"""
# Exact value: P(F1 = A, F2 = G) / P(F2 = G) = (437/2244) / (1/4)
p = 437/561

In [14]:
"""
Hidden check for value of variable p.
"""

'\nHidden check for value of variable p.\n'

### Exercise 3.3
Starting from the initial situation (i.e., sampling with replacement), we add a dozen oranges to the first box and repeat the experiment. Show that now the outcome of the first pick has no impact on the probability that the second pick is a grapefruit. Are the two picks now dependent or independent? Explain your answer.

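With a dozen oranges added, box 1 contains 8 apples, 4 grapefruit, and 12 oranges (24 pieces in total). The probability of picking a grapefruit is now identical for both boxes:
$$
\begin{align}
    P(F_2 = G \mid B = 1) &= \frac{4}{24} = \frac{1}{6}, \\
    P(F_2 = G \mid B = 2) &= \frac{3}{18} = \frac{1}{6}
\end{align}
$$
In the previous scenario, the first pick was informative about $F_2 = G$ only because it told us which box was more likely. That information no longer matters here: for any outcome $f$ of the first pick,
$$
\begin{align}
    P(F_2 = G \mid F_1 = f) &= \sum_{b} P(F_2 = G \mid B = b) \cdot P(B = b \mid F_1 = f) = \frac{1}{6} \sum_{b} P(B = b \mid F_1 = f) = \frac{1}{6} = P(F_2 = G)
\end{align}
$$
However, this does not make the two picks independent as random variables, since independence would require $P(F_2 = y \mid F_1 = x) = P(F_2 = y)$ for all outcomes $x$ and $y$. For example, an orange on the first pick can only come from box 1, so $P(F_2 = A \mid F_1 = O) = \frac{8}{24} = \frac{1}{3}$, whereas $P(F_2 = A) = \frac{1}{2} \cdot \frac{8}{24} + \frac{1}{2} \cdot \frac{15}{18} = \frac{7}{12}$. The two picks are therefore still dependent; only the event $F_2 = G$ is independent of the first pick.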
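The argument can be verified numerically with exact fractions. The sketch below assumes box 1 holds 8 apples, 4 grapefruit, and 12 oranges; the label `"O"` for oranges and the helper names `likelihood`, `p_second_given_first`, and `p_marginal` are illustrative.

```python
from fractions import Fraction

# Box 1 with a dozen oranges added; box 2 unchanged
boxes = {1: {"A": 8, "G": 4, "O": 12}, 2: {"A": 15, "G": 3}}
prior = Fraction(1, 2)


def likelihood(box, fruit):
    """P(fruit | box) for a single pick with replacement."""
    contents = boxes[box]
    return Fraction(contents.get(fruit, 0), sum(contents.values()))


def p_second_given_first(second, first):
    """P(F2 = second | F1 = first), marginalizing over the box."""
    joint = sum(prior * likelihood(b, first) * likelihood(b, second) for b in boxes)
    evidence = sum(prior * likelihood(b, first) for b in boxes)
    return joint / evidence


def p_marginal(fruit):
    """P(F = fruit) for a single pick, marginalizing over the box."""
    return sum(prior * likelihood(b, fruit) for b in boxes)


# The first pick never changes the probability of a grapefruit on the second pick
for first in "AGO":
    print(first, p_second_given_first("G", first))  # 1/6 in every case

# But it does change the probability of an apple, so the picks are dependent
print(p_second_given_first("A", "O"), p_marginal("A"))  # 1/3 7/12
```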