<table align="left" style="border-style: hidden" class="table"> <tr><td class="col-md-2"><img style="float" src="../logo.png" alt="Prob140 Logo" style="width: 120px;"/></td><td><div align="left"><h3 style="margin-top: 0;">Probability for Data Science</h3><h4 style="margin-top: 20px;">UC Berkeley, Fall 2023</h4><p>Ani Adhikari and Alexander Strang</p>CC BY-NC-SA 4.0</div></td></tr></table><!-- not in pdf -->

This content is protected and may not be shared, uploaded, or distributed.

In [None]:
# Run this cell to set up your notebook

# These lines make warnings go away
import warnings
warnings.filterwarnings('ignore')

import numpy as np
from scipy import stats
from datascience import *
from prob140 import *

# These lines do some fancy plotting magic
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

# Homework 11

### Instructions

Your homework will generally have two components: a written portion and a portion that also involves code. Written work should be completed on paper, and coding questions should be done in the notebook. Start the work for the written portions of each section on a new page. You are welcome to $\LaTeX$ your answers to the written portions, but staff will not be able to assist you with $\LaTeX$-related issues.

It is your responsibility to ensure that both components of the homework are submitted completely and properly to Gradescope. **Make sure to assign each page of your PDF to the correct question. Refer to the bottom of the notebook for submission instructions.**

Every answer should contain a calculation or reasoning. For example, a calculation such as $(1/3)(0.8) + (2/3)(0.7)$ or `sum([(1/3)*0.8, (2/3)*0.7])` is fine without further explanation or simplification. If we want you to simplify, we'll ask you to. But just ${5 \choose 2}$ by itself is not fine; write "we want any 2 out of the 5 frogs and they can appear in any order" or whatever reasoning you used. Reasoning can be brief and abbreviated, e.g. "product rule" or "not mutually exclusive."

## 1. Poisson MGF ##
Let $X$ have the Poisson $(\mu)$ distribution, and let $Y$, independent of $X$, have the Poisson $(\lambda)$ distribution.

**a)** Find the mgf of $X$.

**b)** Use the result of (a) to show that the distribution of $X+Y$ is Poisson.
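
A numerical sanity check can complement the mgf argument in part (b), though it does not replace it. The sketch below assumes illustrative values `mu = 2.0` and `lam = 3.5` (not part of the exercise), and it assumes that the parameter of the sum is $\mu + \lambda$, as the additivity of expectation suggests. It computes the pmf of $X+Y$ by discrete convolution and compares it with that Poisson pmf.

```python
import numpy as np
from scipy import stats

# Illustrative parameter values (assumptions, not part of the exercise)
mu, lam = 2.0, 3.5

# Compare the two distributions at the values 0, 1, ..., K-1
K = 30
k = np.arange(K)

# pmf of X + Y by discrete convolution of the two Poisson pmfs.
# Entry j of the convolution only uses terms with indices <= j,
# so the first K entries are exact despite the truncation at K.
pmf_sum = np.convolve(stats.poisson.pmf(k, mu), stats.poisson.pmf(k, lam))[:K]

# pmf of a single Poisson variable with parameter mu + lam
pmf_poisson = stats.poisson.pmf(k, mu + lam)

# Largest pointwise discrepancy between the two pmfs
max_abs_diff = np.max(np.abs(pmf_sum - pmf_poisson))
print(max_abs_diff)
```

A discrepancy at the level of floating-point rounding is consistent with the claim, but only the mgf calculation proves it.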

\newpage

## 2. Gamma Tail Bound ##

Before you do this exercise, carefully study a [relevant example](http://prob140.org/textbook/content/Chapter_19/04_Chernoff_Bound.html#application-to-the-normal-distribution) in the textbook. You will have to follow similar steps.

You will need the [mgf of the gamma distribution](http://prob140.org/textbook/content/Chapter_19/02_Moment_Generating_Functions.html#mgf-of-a-gamma-r-lambda-random-variable). Also remember that you found the gamma mean and variance in Homework 9.

Let $X$ have the gamma $(r, \lambda)$ distribution. 

**a)** Show that $P(X \ge 2E(X)) \le \left(\frac{2}{e}\right)^r$.

**b)** Find Markov's and Chebyshev's bounds on $P(X \ge 2E(X))$. 

**c) [CODE]** Fix $\lambda = 1$. Display overlaid plots of the following four graphs as functions of $r$, for $r$ in the interval $(0.5, 15)$:

- The exact tail probability $P(X \ge 2E(X))$
- The bound in Part **a**: $\left(\frac{2}{e}\right)^r$
- Chebyshev's bound on $P(X \ge 2E(X))$
- Markov's bound on $P(X \ge 2E(X))$

The code uses `plt.plot` which you have used before. The expression `stats.gamma.cdf(x, r, scale=1)` evaluates to the cdf of the gamma $(r, 1)$ distribution at the point $x$.
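
To see how `stats.gamma.cdf` behaves before using it in the plot, here is a minimal sketch at a single point. It assumes the illustrative choice $r = 1$, for which the gamma $(1, 1)$ distribution is exponential $(1)$ and the cdf has the closed form $1 - e^{-x}$. The names `r_example`, `cdf_scipy`, `cdf_closed_form`, and `tail` are made up for this illustration.

```python
import numpy as np
from scipy import stats

# Illustrative choice: gamma (1, 1) is the exponential (1) distribution
r_example = 1
x = 2.0

# cdf from scipy versus the exponential (1) cdf, 1 - e^{-x}
cdf_scipy = stats.gamma.cdf(x, r_example, scale=1)
cdf_closed_form = 1 - np.exp(-x)

# Right-tail probability P(X >= x) as the complement of the cdf
tail = 1 - cdf_scipy

print(cdf_scipy, cdf_closed_form, tail)
```

Both `x` and the shape argument may also be NumPy arrays, in which case the cdf is evaluated elementwise.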

In [None]:
# Answer to c
r = np.arange(0.05, 15, 0.1) 

markov_bound = ...

chebyshev_bound = ...

part_a_bound = ...

# Use as many lines as you need for the exact values
exact = ...
...

plt.plot(r, exact, lw=2, label='Exact Chance')
plt.plot(r, part_a_bound, lw=2, label='Part (a) Bound')
plt.plot(r, chebyshev_bound, lw=2, label='Chebyshev Bound')
plt.plot(r, markov_bound, lw=2, label='Markov Bound')
plt.legend()
plt.xlabel('$r$')
plt.ylim(0, 1)
plt.xlim(0, 15)
plt.title(r'$P(X \geq 2E(X))$ for $X$ gamma $(r, 1)$');

\newpage

## 3. MLE of the Exponential Rate ##
For $n > 1$, let $X_1, X_2, \ldots , X_n$ be i.i.d. exponential $(\lambda )$ variables. 

**a)** Let $\hat{\lambda}_n$ be the maximum likelihood estimate (MLE) of the parameter $\lambda$. Find $\hat{\lambda}_n$ in terms of the sample mean $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$. The subscript $n$ in $\bar{X}_n$ is there to remind us that we have the average of $n$ values. It doesn't refer to the $n$th sampled value $X_n$.

**b)** Use facts about sums and linear transformations to find the distribution of $\bar{X}_n$ with little or no calculation. Recognize it as one of the famous ones and provide its name and parameters. Use it to find $E(\hat{\lambda}_n)$.

**c)** Is $\hat{\lambda}_n$ an unbiased estimate of $\lambda$? If it is biased, does it overestimate on average, or does it underestimate? Is it asymptotically unbiased? That is, does $E(\hat{\lambda}_n)$ converge to $\lambda$ as $n \to \infty$?
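
One way to check a formula for $\hat{\lambda}_n$ is to maximize the log-likelihood numerically on simulated data and compare the result with what your formula gives for the same sample. The sketch below is only an illustration under assumed values: the rate `true_rate = 2.0`, the sample size `n = 50`, the seed, and the search interval are all arbitrary choices, not part of the exercise.

```python
import numpy as np
from scipy import optimize

# Simulated data under assumed values (hypothetical, for illustration only)
rng = np.random.default_rng(0)
true_rate = 2.0
n = 50
sample = rng.exponential(scale=1 / true_rate, size=n)

def neg_log_likelihood(lam):
    # Log-likelihood of i.i.d. exponential (lam) data is
    # n*log(lam) - lam*sum(x_i); negate it because scipy minimizes.
    return -(n * np.log(lam) - lam * np.sum(sample))

# Bounded one-dimensional search over an assumed interval for the rate
result = optimize.minimize_scalar(
    neg_log_likelihood, bounds=(1e-6, 50), method='bounded'
)
numerical_mle = result.x

print(numerical_mle, np.mean(sample))
```

Plugging the printed sample mean into your answer to part (a) should reproduce the numerical maximizer up to the optimizer's tolerance.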

\newpage

## 4. Discrete and Continuous Random Selections ##
Fix a positive integer $n$, and let $p$ be strictly between 0 and 1.

Suppose Person A picks a number uniformly in the interval $(0, n)$. Let $X$ be the number Person A picks.

Suppose that independently of Person A, Person B picks an integer $Y$ according to the binomial $(n, p)$ distribution, for example by using `stats.binom.rvs(n, p, size=1)` or by tossing a $p$-coin $n$ times and recording the number of heads.

Find $P(X < Y)$.
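
A simulation can serve as a rough check on your answer. The sketch below assumes the illustrative values `n = 10` and `p = 0.3`, along with arbitrary seeds and number of repetitions. It draws many independent pairs $(X, Y)$ and reports the proportion of pairs with $X < Y$.

```python
import numpy as np
from scipy import stats

# Illustrative values (assumptions, not part of the exercise)
n, p = 10, 0.3
reps = 100_000

# Person A: uniform on (0, n); Person B: binomial (n, p), drawn independently
x = stats.uniform.rvs(loc=0, scale=n, size=reps, random_state=140)
y = stats.binom.rvs(n, p, size=reps, random_state=141)

# Proportion of simulated pairs in which X < Y
estimate = np.mean(x < y)
print(estimate)
```

The estimate should be close to your exact answer evaluated at the same $n$ and $p$. Trying a few other values of $n$ and $p$ shows how the probability depends on each.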

\newpage

## 5. MLE and MAP Estimates ##

A coin is tossed 10 times and the resulting sequence is HTTHHHTHTH. In the parts below, we refer to this information as "the data".

**a)** Under the assumption that the coin lands heads with a fixed unknown probability $p$, find the MLE of $p$ based on the data. 

**b)** Suppose now that the coin lands heads with a random probability $X$. Let the prior density of $X$ be uniform on the unit interval. Find the MAP estimate of the probability of heads, given the data.

**c)** Show that if $r > 1$ and $s > 1$ then the mode of the beta $(r, s)$  distribution is $(r-1)/(r+s-2)$. Remember to ignore multiplicative constants and take the log before maximizing.

**d)** Suppose instead that the prior density of $X$ is $f(x) = 4x^3$ if $0 < x < 1$ and $0$ otherwise. Find the MAP estimate of the probability of heads, given the data.
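
The mode formula stated in part (c) can be checked numerically for particular parameters, though this is not a proof. The sketch below assumes the illustrative values `r_shape = 3.0` and `s_shape = 5.0` and a grid with spacing 0.0001. It locates the maximum of the beta density on the grid and compares it with $(r-1)/(r+s-2)$.

```python
import numpy as np
from scipy import stats

# Illustrative parameters with r > 1 and s > 1 (assumptions)
r_shape, s_shape = 3.0, 5.0

# Grid on the open unit interval with spacing 0.0001
grid = np.linspace(0.001, 0.999, 9981)

# Beta (r, s) density on the grid, and the grid point where it is largest
density = stats.beta.pdf(grid, r_shape, s_shape)
grid_mode = grid[np.argmax(density)]

# Mode according to the formula stated in part (c)
formula_mode = (r_shape - 1) / (r_shape + s_shape - 2)

print(grid_mode, formula_mode)
```

The two values should agree to within the grid spacing. The same grid idea applies to any density on the unit interval that is known up to a multiplicative constant.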

## Submission Instructions ##

Many assignments throughout the course will have a written portion and a code portion. Please follow the directions below to properly submit both portions.

### Written Portion ###
* Scan all the pages into a PDF. You can use any scanner or a phone using applications such as CamScanner. Please **DO NOT** simply take pictures using your phone.
* Please start a new page for each question. If you have already written multiple questions on the same page, you can crop the image in CamScanner or fold your page over (the old-fashioned way). This helps expedite grading.
* It is your responsibility to check that all the work on all the scanned pages is legible.
* If you used $\LaTeX$ to do the written portions, you do not need to do any scanning; you can just download the whole notebook as a PDF via LaTeX.

### Code Portion ###
* Save your notebook using `File > Save and Checkpoint`.
* Generate a PDF file using `File > Download As > PDF via LaTeX`. This might take a few seconds and will automatically download a PDF version of this notebook.
    * If you have issues, please post a follow-up on the general Homework 11 Ed thread.
    
### Submitting ###
* Combine the PDFs from the written and code portions into one PDF. [Here](https://smallpdf.com/merge-pdf) is a useful tool for doing so. 
* Submit the assignment to Homework 11 on Gradescope. 
* **Make sure to assign each page of your pdf to the correct question.**
* **It is your responsibility to verify that all of your work shows up in your final PDF submission.**

If you are having difficulties scanning, uploading, or submitting your work, please read the [Ed Thread](https://edstem.org/us/courses/43303/discussion/3344107) on this topic and post a follow-up on the general Homework 11 Ed thread.


## **We will not grade assignments which do not have pages selected for each question.** ##