<table align="left" style="border-style: hidden" class="table"> <tr><td class="col-md-2"><img style="float" src="../icon.png" alt="Prob140 Logo" style="width: 120px;"/></td><td><div align="left"><h3 style="margin-top: 0;">Probability for Data Science</h3><h4 style="margin-top: 20px;">UC Berkeley, Spring 2023</h4><p>Ani Adhikari</p>CC BY-NC-SA 4.0</div></td></tr></table><!-- not in pdf -->

This content is protected and may not be shared, uploaded, or distributed.

In [None]:
# Run this cell to set up your notebook

# These lines make warnings go away
import warnings
warnings.filterwarnings('ignore')

import numpy as np
from scipy import stats
from datascience import *
from prob140 import *

# These lines do some fancy plotting magic
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

# Homework 11

### Instructions

Your homework assignments will generally have two components: a written portion and a portion that also involves code. Written work should be completed on paper, and coding questions should be done in the notebook. Start the work for the written portions of each section on a new page. You are welcome to $\LaTeX$ your answers to the written portions, but staff will not be able to assist you with $\LaTeX$-related issues.

It is your responsibility to ensure that both components of the homework are submitted completely and properly to Gradescope. **Make sure to assign each page of your pdf to the correct question. Refer to the bottom of the notebook for submission instructions.**

Every answer should contain a calculation or reasoning. For example, a calculation such as $(1/3)(0.8) + (2/3)(0.7)$ or `sum([(1/3)*0.8, (2/3)*0.7])` is fine without further explanation or simplification. If we want you to simplify, we'll ask you to. But just ${5 \choose 2}$ by itself is not fine; write "we want any 2 out of the 5 frogs and they can appear in any order" or whatever reasoning you used. Reasoning can be brief and abbreviated, e.g. "product rule" or "not mutually exclusive."

### Preliminary: Line Plots

In Data 8 you used `Table.plot` to draw line plots, but in this class the line plots have typically been drawn using the plot function of matplotlib. Here is the syntax; you are going to need it in the exercises. It's easier than setting up tables first, especially when you want to overlay multiple plots.

The `pyplot` module of matplotlib has been imported for you as `plt`. This is its standard abbreviation.

Let `x` and `y` be two numerical arrays of the same length. Then `plt.plot(x, y)` produces a line plot with values of `x` on the horizontal axis and the corresponding values of `y` on the vertical. The plot "joins the dots" `(x.item(0), y.item(0))`, `(x.item(1), y.item(1))`, `(x.item(2), y.item(2))`, and so on.

The optional argument `lw=n` can be used to set a line width of `n` units. Please use `lw=2` in this homework.

The semicolon at the end of the last line of code in each cell stops matplotlib from blurting out text that we don't need here.

Run these cells to see some examples. Note the overlaid plots: they are straightforward to draw, and matplotlib chooses a different color for each plot.

In [None]:
x = np.arange(0, 1.01, 0.01)
x_squared = x ** 2
x_cubed = x**3
plt.plot(x, x_squared, lw=2)
plt.plot(x, x_cubed, lw=2);

You can add titles, labels, and legends.

In [None]:
plt.plot(x, x_squared, lw=2, label = 'Square')
plt.plot(x, x_cubed, lw=2, label = 'Cube')
plt.legend()
plt.xlabel('x')
plt.title('Powers of $x$');

As another example, the code below was used in Homework 9 Exercise 3d to plot gamma densities.

In [None]:
t = np.arange(0, 25, 0.01)
r_1 = stats.gamma.pdf(t, 1, scale=1/0.25)
r_1_5 = stats.gamma.pdf(t, 1.5, scale=1/0.25)
r_2 = stats.gamma.pdf(t, 2, scale=1/0.25)
r_2_5 = stats.gamma.pdf(t, 2.5, scale=1/0.25)
r_3 = stats.gamma.pdf(t, 3, scale=1/0.25)
plt.plot(t, r_1, lw=2, label='gamma (1, 0.25)')
plt.plot(t, r_1_5, lw=2, label='gamma (1.5, 0.25)')
plt.plot(t, r_2, lw=2, label='gamma (2, 0.25)')
plt.plot(t, r_2_5, lw=2, label='gamma (2.5, 0.25)')
plt.plot(t, r_3, lw=2, label='gamma (3, 0.25)')
plt.xlabel('$t$')
plt.legend()
plt.title('Gamma Densities: Same Rate and Different Shape Parameters');
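
The cell above repeats the same two calls for each shape parameter. As a minimal sketch of an alternative, assuming the same grid, rate, and shape values, the overlay can also be built with a loop. The names `rate`, `shapes`, and `density` are introduced here for illustration only.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

t = np.arange(0, 25, 0.01)
rate = 0.25                      # common rate, as in the cell above
shapes = [1, 1.5, 2, 2.5, 3]     # shape parameters, as in the cell above

for r in shapes:
    # scipy parametrizes the gamma by shape and scale, where scale = 1/rate
    density = stats.gamma.pdf(t, r, scale=1/rate)
    plt.plot(t, density, lw=2, label=f'gamma ({r}, {rate})')

plt.xlabel('$t$')
plt.legend()
plt.title('Gamma Densities: Same Rate and Different Shape Parameters');
```

Each pass through the loop adds one curve to the same axes, so the result should match the plot produced by the longer cell.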

## 1. MLE of the Exponential Rate ##
For $n > 1$, let $X_1, X_2, \ldots , X_n$ be i.i.d. exponential $(\lambda )$ variables. 

**a)** Let $\hat{\lambda}_n$ be the maximum likelihood estimate (MLE) of the parameter $\lambda$. Find $\hat{\lambda}_n$ in terms of the sample mean $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$. The subscript $n$ in $\bar{X}_n$ is there to remind us that we have the average of $n$ values. It doesn't refer to the $n$th sampled value $X_n$.

**b)** Use facts about sums and linear transformations to find the distribution of $\bar{X}_n$ with little or no calculation. Recognize it as one of the famous ones and provide its name and parameters. Use it to find $E(\hat{\lambda}_n)$.

**c)** Is $\hat{\lambda}_n$ an unbiased estimate of $\lambda$? If it is biased, does it overestimate on average, or does it underestimate? Is it asymptotically unbiased? That is, does $E(\hat{\lambda}_n)$ converge to $\lambda$ as $n \to \infty$?
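
To make the notation in Part (a) concrete, here is a small sketch that draws an i.i.d. exponential sample and computes the sample mean $\bar{X}_n$. The rate `lam`, the sample size `n`, and the seed are illustrative assumptions and are not part of the exercise.

```python
import numpy as np
from scipy import stats

lam = 2.0      # illustrative rate, chosen only for this sketch
n = 1000       # illustrative sample size

# scipy parametrizes the exponential by scale = 1/rate
sample = stats.expon.rvs(scale=1/lam, size=n, random_state=0)

x_bar_n = np.mean(sample)    # the sample mean of all n values
x_n = sample.item(n - 1)     # the n-th sampled value, a different quantity
```

The last two lines show the distinction made in Part (a): `x_bar_n` averages the whole sample, while `x_n` is a single draw.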

\newpage

## 2. Discrete and Continuous Random Selections ##
Fix a positive integer $n$, and let $p$ be strictly between 0 and 1.

Suppose Person A picks a number uniformly in the interval $(0, n)$. Let $X$ be the number Person A picks.

Suppose that independently of Person A, Person B picks an integer $Y$ according to the binomial $(n, p)$ distribution, for example by using `stats.binom.rvs(n, p, size=1)` or by tossing a $p$-coin $n$ times and recording the number of heads.

Find $P(X < Y)$.
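
The two selections described above can be sketched in code as follows. The values of `n`, `p`, `reps`, and the seeds are illustrative assumptions; `stats.uniform.rvs(loc=0, scale=n)` draws from the uniform distribution on the interval from `loc` to `loc + scale`.

```python
import numpy as np
from scipy import stats

n, p = 10, 0.3     # illustrative values, chosen only for this sketch
reps = 5           # number of independent repetitions of the experiment

# Person A: uniform on the interval (0, n)
x = stats.uniform.rvs(loc=0, scale=n, size=reps, random_state=1)

# Person B: binomial (n, p), drawn independently of Person A
y = stats.binom.rvs(n, p, size=reps, random_state=2)
```

Comparing `x < y` elementwise over a large number of repetitions is one way to check a calculated answer empirically.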

\newpage

## 3. MLE and MAP Estimates ##

A coin is tossed 10 times and the resulting sequence is HTTHHHTHTH. In the parts below, we refer to this information as "the data".

**a)** Under the assumption that the coin lands heads with a fixed unknown probability $p$, find the MLE of $p$ based on the data. 

**b)** Suppose now that the coin lands heads with a random probability $X$. Let the prior density of $X$ be uniform on the unit interval. Find the MAP estimate of the probability of heads, given the data.

**c)** Show that if $r > 1$ and $s > 1$ then the mode of the beta $(r, s)$  distribution is $(r-1)/(r+s-2)$. Remember to ignore multiplicative constants and take the log before maximizing.

**d)** Suppose instead that the prior density of $X$ is $f(x) = 4x^3$ if $0 < x < 1$ and $0$ otherwise. Find the MAP estimate of the probability of heads, given the data.
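
The formula stated in Part (c) can be checked numerically, though a check is not a proof. The sketch below, with illustrative parameters `r` and `s` chosen only for this example, finds the grid point where the beta density is largest and compares it with $(r-1)/(r+s-2)$.

```python
import numpy as np
from scipy import stats

r, s = 3, 5                          # illustrative values with r > 1 and s > 1
x = np.arange(0.001, 1, 0.001)       # grid strictly inside the unit interval

# Grid point at which the beta (r, s) density is largest
grid_mode = x.item(np.argmax(stats.beta.pdf(x, r, s)))

# Value given by the formula in Part (c)
formula_mode = (r - 1) / (r + s - 2)
```

The two values should agree up to the grid spacing of 0.001.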

\newpage

## 4. Heads in Tosses of a Random Coin ##
Let $X$ be a random proportion with a prior distribution that is beta $(2, 3)$. Given $X=p$, let $I_1, I_2, I_3, \ldots$ be i.i.d. Bernoulli $(p)$.

**a)** Plot the prior density of $X$.

In [None]:
# Answer to a

x = np.arange(0, 1.01, 0.01)
...
plt.title('Beta $(2, 3)$ density');

**b)** Let $H_7 = \sum_{k=1}^7 I_k$. For each $h = 0, 1, \ldots, 7$, plot the posterior density of $X$ given $H_7 = h$. All $8$ plots should be on the same graph. 

Use as many lines of code as you need. You don't have to include labels and a legend, but the title should say which densities you are plotting.

In [None]:
# Answer to b

...

plt.title(r'... (...) Densities for $0 \leq h \leq 7$');

**c)** What is the MAP estimate of the random probability of heads given $H_7 = 2$? Calculate the estimate using the appropriate formula and confirm that your answer agrees with the estimate visible in the appropriate graph above.

**d)** Find $P(I_8 = 1 \mid H_7 = 2)$. Your answer should be a decimal value.

**e)** Find $P(I_8 = 1, I_9 = 1, I_{10} = 1 \mid H_7 = 2)$. Your answer should be a decimal value. Is it equal to the cube of the answer to Part (d)? If not, which is bigger?

\newpage

## 5. Waiting for a Random Coin to Land Heads ##
Let $X$ be a random proportion. Given $X=p$, let $T$ be the number of tosses till a $p$-coin lands heads.

**a)** Let $P(X = 1/10) = 1/4$, $P(X = 1/7) = 1/4$, and $P(X = 1/3) = 1/2$. Find $E(T)$.

**b)** Find $E(T)$ if $X$ has the beta $(r, s)$ density for some $r > 1$. Simplify all integrals and Gamma functions in your answer.

**c)** Let $X$ have the beta $(r, s)$ density. For fixed $k > 0$, find the posterior density of $X$ given $T = k$. Identify it as one of the famous ones and state its name and parameters.

## Submission Instructions ##

Many assignments throughout the course will have a written portion and a code portion. Please follow the directions below to properly submit both portions.

### Written Portion ###
* Scan all the pages into a PDF. You can use any scanner or a phone scanning application such as CamScanner. Please **DO NOT** simply take pictures using your phone.
* Please start a new page for each question. If you have already written multiple questions on the same page, you can crop the image in CamScanner or fold your page over (the old-fashioned way). This helps expedite grading.
* It is your responsibility to check that all the work on all the scanned pages is legible.
* If you used $\LaTeX$ to do the written portions, you do not need to do any scanning; you can just download the whole notebook as a PDF via LaTeX.

### Code Portion ###
* Save your notebook using `File > Save and Checkpoint`.
* Generate a PDF file using `File > Download As > PDF via LaTeX`. This might take a few seconds and will automatically download a PDF version of this notebook.
    * If you have issues, please post a follow-up on the general Homework 11 Ed thread.
    
### Submitting ###
* Combine the PDFs from the written and code portions into one PDF. [Here](https://smallpdf.com/merge-pdf) is a useful tool for doing so. 
* Submit the assignment to Homework 11 on Gradescope. 
* **Make sure to assign each page of your pdf to the correct question.**
* **It is your responsibility to verify that all of your work shows up in your final PDF submission.**

If you are having difficulties scanning, uploading, or submitting your work, please read the [Ed Thread](https://edstem.org/us/courses/35049/discussion/2398718) on this topic and post a follow-up on the general Homework 11 Ed thread.

## **We will not grade assignments which do not have pages selected for each question.** ##