### Background
The goal of this question is to get a better understanding of the Poisson
distribution and conjugate priors. Assume that we are recording the number of cars
crossing an intersection per day for a traffic survey. We have a dataset that consists of
the number of cars that have crossed the intersection collected over a period of n days:
<div align = "center"> $\{k_1, k_2, \dots, k_n\}$
    </div>

A good way to model these counts is to assume they are iid samples from a Poisson
distribution: 
<div align = "center"> $\mathbb{P}(k_i = k) = \frac{\lambda^k\exp(-\lambda)}{k!}$
    </div>
where $\lambda > 0$ is an unknown parameter we wish to determine.

1. **Given a random variable $K\sim Pois(\lambda)$ compute $\mathbb{E}[K]$**<br/><br/>
$\mathbb{E}[K] = \sum_{k=0}^{\infty}k \frac{\lambda^k e^{-\lambda}}{k!} = \sum_{k=1}^{\infty}k \frac{\lambda^k e^{-\lambda}}{k!}$ (the $k=0$ term is zero)<br/> <br/>
$=\sum_{k=1}^{\infty} \frac{\lambda^{k-1}\lambda e^{-\lambda}}{(k-1)!} = \lambda e^{-\lambda}\sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda}\sum_{i=0}^{\infty} \frac{\lambda^{i}}{i!} = \lambda e^{-\lambda}e^{\lambda} = \lambda$
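The identity $\mathbb{E}[K] = \lambda$ can be sanity-checked numerically. The sketch below is only an illustration: the function names, the truncation point `k_max`, and the value `lam = 3.5` are assumptions chosen for the example, not part of the exercise.

```python
import math


def poisson_pmf(k, lam):
    """Poisson pmf, evaluated in log space to avoid overflow for large k."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_mean_truncated(lam, k_max=200):
    """Approximate E[K] by truncating the infinite sum at k_max."""
    return sum(k * poisson_pmf(k, lam) for k in range(k_max + 1))


lam = 3.5  # hypothetical rate, chosen only for illustration
approx_mean = poisson_mean_truncated(lam)
print(approx_mean, lam)
```

For a moderate rate like this one, the terms beyond `k_max = 200` are negligible, so the truncated sum should agree with $\lambda$ to within floating-point error.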

2. **Compute the likelihood of the data $\mathbb{P}(k_1,k_2,...,k_n|\lambda)$**<br/><br/>
$P(k_1,k_2,\dots,k_n|\lambda) = P(k_1|\lambda) P(k_2|\lambda) \cdots P(k_n|\lambda)$ <br/><br/>
$=\frac{\lambda^{k_1}e^{-\lambda}}{k_1!}\frac{\lambda^{k_2}e^{-\lambda}}{k_2!}\cdots\frac{\lambda^{k_n}e^{-\lambda}}{k_n!} = \frac{\lambda^{k_1+k_2+\cdots+k_n}  \exp(-n\lambda)}{k_1!\,k_2!\cdots k_n!}$
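As a quick check that the product of the individual Poisson terms collapses to the closed form above, the following sketch evaluates both expressions. The counts and the value of `lam` are made-up illustration data, and the function names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing exactly k events."""
    return lam**k * math.exp(-lam) / math.factorial(k)


def likelihood_product(counts, lam):
    """Likelihood as a product of independent Poisson terms."""
    return math.prod(poisson_pmf(k, lam) for k in counts)


def likelihood_closed_form(counts, lam):
    """Likelihood as lam^(sum k_i) * exp(-n * lam) / (k_1! ... k_n!)."""
    n = len(counts)
    denom = math.prod(math.factorial(k) for k in counts)
    return lam ** sum(counts) * math.exp(-n * lam) / denom


counts = [3, 0, 2, 5, 1]  # hypothetical daily counts
lam = 2.0                 # hypothetical rate
lik_product = likelihood_product(counts, lam)
lik_closed = likelihood_closed_form(counts, lam)
print(lik_product, lik_closed)
```

For realistic traffic counts the raw likelihood underflows quickly, which is one practical reason to work with the log-likelihood instead.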

3. **The maximum likelihood estimator of $\lambda$ with respect to the collected data.**<br/><br/>
Because $\log$ is strictly increasing, maximizing $P(k_1,\dots,k_n|\lambda)$ over $\lambda$ is equivalent to maximizing its logarithm.<br/><br/>
$\underset{\lambda}{\arg\max} \log\left(\frac{\lambda^{k_1+k_2+\cdots+k_n} \exp(-n\lambda)}{k_1!\,k_2!\cdots k_n!}\right) = \underset{\lambda}{\arg\max} \log\left(\lambda^{k_1+k_2+\cdots+k_n}\exp(-n\lambda)\right)$ (the factorials do not depend on $\lambda$)<br/><br/>
$= \underset{\lambda}{\arg\max} \left[\left(\sum_{i=1}^{n}k_i\right)\log\lambda - n\lambda\right]$<br/><br/>
Let $l(\lambda) = \left(\sum_{i=1}^{n}k_i\right)\log\lambda - n\lambda$ and set its derivative to zero:<br/><br/>
$\frac{\partial l}{\partial \lambda} = \left(\sum_{i=1}^{n}k_i\right) \frac{1}{\lambda} - n = 0$ <br/><br/>
$\hat\lambda_{MLE} = \frac{1}{n}\sum_{i=1}^{n} k_i = \bar{k}$, the sample mean. Since $\frac{\partial^2 l}{\partial \lambda^2} = -\frac{\sum k_i}{\lambda^2} < 0$ whenever at least one $k_i > 0$, this stationary point is a maximum.
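The closed-form maximizer can be compared against a brute-force search over the log-likelihood. This is a minimal sketch under stated assumptions: the counts are invented for illustration, and the grid range and step size are arbitrary choices.

```python
import math


def poisson_log_likelihood(counts, lam):
    """Log-likelihood of iid Poisson counts at rate lam."""
    n = len(counts)
    log_factorials = sum(math.lgamma(k + 1) for k in counts)
    return sum(counts) * math.log(lam) - n * lam - log_factorials


counts = [12, 15, 9, 11, 14, 10]  # hypothetical cars per day

# Closed-form MLE: the sample mean.
lam_mle = sum(counts) / len(counts)

# Brute-force check: evaluate the log-likelihood on a grid with step 0.01.
grid = [0.01 * i for i in range(1, 3001)]
lam_grid = max(grid, key=lambda lam: poisson_log_likelihood(counts, lam))

print(lam_mle, lam_grid)
```

Because the log-likelihood is concave in $\lambda$, the best grid point should lie within one grid step of the sample mean.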

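4. The estimator in part 3 is the sample mean of the observations, which is the empirical counterpart of the expectation $\mathbb{E}[K] = \lambda$ computed in part 1. This makes intuitive sense: since $\lambda$ is the mean of the distribution, the natural estimate of it from $n$ iid observations is their average.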

5. **Now let's put a prior distribution of Gamma on the parameter $\lambda$**<br/><br/>
<div align = "center">$\lambda \sim Gamma(\alpha,\beta)$
    </div>
for $\alpha > 0$ and $\beta>0$ then the pdf of $\lambda$ is given by:<br/>
<div align = "center">$P(\lambda|\alpha, \beta) = \frac{\beta^{\alpha}\lambda^{\alpha-1}\exp(-\beta\lambda)}{\Gamma(\alpha)}$
    </div>
where
<div align = "center">$\Gamma(\alpha)=\int_0^{\infty}x^{\alpha-1}\exp(-x)\,dx$
    </div> <br/>
We'll show that the posterior distribution $P(\lambda|k_1,k_2,\dots,k_n)$ is also a Gamma distribution.

Proof: By Bayes' rule, $P(\lambda|k_1,\dots,k_n) = \frac{P(k_1,\dots,k_n|\lambda)P(\lambda)}{P(k_1,\dots,k_n)} \propto P(k_1,\dots,k_n|\lambda)P(\lambda)$, since the denominator does not depend on $\lambda$. <br/><br/>
$P(k_1,\dots,k_n|\lambda) = \frac{\lambda^{(k_1+\cdots+k_n)}\exp(-n\lambda)}{k_1!\cdots k_n!}$<br/><br/>
$P(\lambda|\alpha, \beta) = \frac{\beta^{\alpha}\lambda^{\alpha-1}\exp(-\beta\lambda)}{\Gamma(\alpha)}$<br/><br/>
Therefore, $P(k_1,\dots,k_n|\lambda)P(\lambda) = \frac{\lambda^{(k_1+\cdots+k_n)}\exp(-n\lambda)}{k_1!\cdots k_n!}\frac{\beta^{\alpha}\lambda^{\alpha-1}\exp(-\beta\lambda)}{\Gamma(\alpha)}$

$=\frac{\beta^{\alpha}}{(k_1!\cdots k_n!)\Gamma(\alpha)}\lambda^{(\alpha+\sum k_i)-1} \exp[-(n+\beta)\lambda] \propto \lambda^{(\alpha+\sum k_i)-1}\exp(-(n+\beta)\lambda)$

The last expression is the kernel of a Gamma density with shape $\alpha+\sum k_i$ and rate $n+\beta$; all dropped factors are constant in $\lambda$. Hence $\lambda|k_1,\dots,k_n \sim Gamma(\alpha+\sum k_i, n+\beta)$.
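The conjugate update can be checked numerically: if the posterior really is $Gamma(\alpha+\sum k_i, n+\beta)$, then the log of likelihood times prior minus the log of that Gamma density must be the same constant for every $\lambda$. The sketch below assumes made-up counts and prior hyperparameters; all names are hypothetical.

```python
import math


def gamma_log_pdf(lam, alpha, beta):
    """Log density of Gamma(alpha, beta) with shape alpha and rate beta."""
    return (alpha * math.log(beta) + (alpha - 1) * math.log(lam)
            - beta * lam - math.lgamma(alpha))


def poisson_log_likelihood(counts, lam):
    """Log-likelihood of iid Poisson counts at rate lam."""
    n = len(counts)
    log_factorials = sum(math.lgamma(k + 1) for k in counts)
    return sum(counts) * math.log(lam) - n * lam - log_factorials


def posterior_params(counts, alpha, beta):
    """Conjugate update: shape gains sum(k_i), rate gains n."""
    return alpha + sum(counts), beta + len(counts)


counts = [12, 15, 9, 11, 14, 10]  # hypothetical cars per day
alpha, beta = 2.0, 0.5            # hypothetical prior hyperparameters
alpha_post, beta_post = posterior_params(counts, alpha, beta)

# log(likelihood * prior) - log(posterior pdf) should not depend on lam.
diffs = [
    poisson_log_likelihood(counts, lam)
    + gamma_log_pdf(lam, alpha, beta)
    - gamma_log_pdf(lam, alpha_post, beta_post)
    for lam in (5.0, 10.0, 15.0, 20.0)
]
print(alpha_post, beta_post, diffs)
```

The common value of `diffs` is the log of the marginal likelihood $P(k_1,\dots,k_n)$, the normalizing constant dropped in the proof.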

6. **Compute the maximum a posteriori (MAP) estimate of $\lambda$**<br/><br/>
<div align = "center">$\underset{\lambda}{\arg\max}P(\lambda|k_1,k_2,\dots,k_n)$
    </div> 
From part 5, $P(\lambda|k_1,\dots,k_n)\propto L = \lambda^{(\alpha+\sum k_i)-1}\exp(-(n+\beta)\lambda)$, where factors that are constant in $\lambda$ have been dropped. <br/><br/>
$\underset{\lambda}{\arg\max}P(\lambda|k_1,k_2,\dots,k_n) = \underset{\lambda}{\arg\max} \log L$ <br/><br/>
$= \underset{\lambda}{\arg\max}\log[\lambda^{(\alpha+\sum k_i)-1}\exp(-(n+\beta)\lambda)]$<br/><br/>
$= \underset{\lambda}{\arg\max}\left[(\alpha+\sum k_i-1)\log\lambda - (n+\beta)\lambda\right]$<br/><br/>
Denote the bracketed expression by $Q(\lambda)$. <br/><br/>
$\Rightarrow \frac{\partial Q}{\partial \lambda}= \frac{\alpha + \sum k_i - 1}{\lambda} - (n+\beta) = 0$ <br/><br/>
Therefore, $\hat \lambda_{MAP} = \frac{\alpha + \sum k_i -1}{n+\beta}$, assuming $\alpha + \sum k_i > 1$ so that the maximizer is positive.
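To see how the prior moves the estimate, the sketch below computes the MAP estimate next to the MLE. The counts and the hyperparameters `alpha` and `beta` are invented for illustration, and `map_estimate` is a hypothetical helper name.

```python
def map_estimate(counts, alpha, beta):
    """Mode of the Gamma(alpha + sum(k_i), beta + n) posterior."""
    shape = alpha + sum(counts)
    rate = beta + len(counts)
    if shape < 1:
        # For shape < 1 the Gamma density is unbounded near 0,
        # so the stationary-point formula does not apply.
        raise ValueError("posterior shape must be at least 1")
    return (shape - 1) / rate


counts = [12, 15, 9, 11, 14, 10]  # hypothetical cars per day
alpha, beta = 2.0, 0.5            # hypothetical prior hyperparameters

lam_mle = sum(counts) / len(counts)
lam_map = map_estimate(counts, alpha, beta)
print(lam_mle, lam_map)
```

With these assumed values the prior mean $\alpha/\beta = 4$ is below the sample mean, so the MAP estimate is pulled slightly below the MLE.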

### Summary
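$\hat \lambda_{MAP} = \frac{\alpha + \sum k_i -1}{n+\beta}$ increases with the observed total $\sum k_i$, which makes sense: the higher the average number of cars per day in the data, the larger the estimated rate $\lambda$. Compared with the MLE $\frac{\sum k_i}{n}$, the prior adds $\alpha - 1$ to the numerator and $\beta$ to the denominator, so as $n$ grows the MAP estimate approaches the sample mean.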