### Background on ML Estimation of Sinusoid in Noise Parameters

For this second part, consider a single sinusoid in white Gaussian noise $v_k$ with variance $\sigma^2$:
$$
y_k = s_k + v_k = A_1 \cos(2\pi f_1 k + \phi_1) + v_k \; , \; k = 0, \ldots , n - 1 .
$$
The Maximum Likelihood estimates of the parameters $\sigma^2, A_1, \phi_1$ and $f_1$ can be shown to be
$$
\begin{split}
\hat{f}_1 &= \operatorname*{arg\,max}_f |\mathcal{Y}(f)|
\\
\hat{A}_1 &= \frac{2}{n} | \mathcal{Y}(\hat{f}_1) |
\\
\hat{\phi}_1 &= \operatorname{arg} \mathcal{Y}(\hat{f}_1)
\\
\widehat{\sigma^2} &= \frac{1}{n} \sum_{k=0}^{n-1} (y_k - \hat{A}_1 \cos(2\pi \hat{f}_1 k + \hat{\phi}_1))^2
\end{split}
$$
where $\mathcal{Y}(f)= \sum_{k=0}^{n-1} y_k e^{-j 2 \pi f k}$ and $\operatorname{arg} \{ \rho e^{j \theta} \} = \theta$. In order to turn the optimization problem for $\hat{f}_1$ into a practical algorithm, we shall use the DFT. First decide on an acceptable “bias” in the ability to resolve the maximum of $|\mathcal{Y}(f)|$, and call it the frequency resolution $\Delta f$:
$$ \Delta f = \frac{1}{m}, \; m = \frac{1}{\Delta f} .$$

In order to have a DFT with such a frequency resolution, we need a signal of length $m$. We assume $m > n$, where $n$ is the number of samples available. Hence zero pad $y_k$ to obtain $y_0, y_1, \ldots, y_{m-1}$, where in fact $y_n = y_{n+1} = \cdots = y_{m-1} = 0$. Take the DFT of the zero-padded sequence (in Matlab, $y_0$ is the first element of a signal vector, $y_1$ the second, etc.):
$$\mathcal{Y}_l = \sum_{k=0}^{m-1} y_k e^{-j 2 \pi \frac{l}{m} k} = \sum_{k=0}^{n-1} y_k e^{-j 2 \pi \frac{l}{m} k} = \mathcal{Y}\left(\tfrac{l}{m}\right), \quad l = 0, \ldots, m-1 .$$
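
The identity above can be checked numerically. The following is a minimal sketch in plain Python, assuming a direct $O(m^2)$ DFT instead of an FFT routine; the helper names `dft` and `dtft` and the sample values are illustrative and not part of the original text.

```python
import cmath
import math


def dft(x):
    """Plain O(m^2) DFT: X_l = sum_k x_k exp(-j 2 pi l k / m), l = 0..m-1."""
    m = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * math.pi * l * k / m) for k in range(m))
        for l in range(m)
    ]


def dtft(y, f):
    """Y(f) = sum_k y_k exp(-j 2 pi f k), evaluated at a single frequency f."""
    return sum(y_k * cmath.exp(-2j * math.pi * f * k) for k, y_k in enumerate(y))


# Illustrative values only (assumptions, not taken from the text).
y = [1.0, 0.5, -0.25, -1.0, 0.0, 0.75]   # n = 6 available samples
m = 16                                    # padded length, i.e. delta_f = 1/16

# Zero padding: y_n = y_{n+1} = ... = y_{m-1} = 0.
y_padded = y + [0.0] * (m - len(y))
Y = dft(y_padded)

# Largest deviation between DFT bin l and Y(l/m) over all bins.
max_diff = max(abs(Y[l] - dtft(y, l / m)) for l in range(m))
print(max_diff)
```

The printed deviation should be at the level of floating-point rounding, which reflects that the DFT of the zero-padded sequence simply samples $\mathcal{Y}(f)$ on the grid $f = l/m$.
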
Then the Maximum Likelihood estimates can be approximately obtained as
$$
\begin{split}
\hat{l} = \operatorname*{arg\,max}_l |\mathcal{Y}_l|, \qquad \hat{f}_1 = \frac{\hat{l}}{m}, \qquad \hat{A}_1 = \frac{2}{n} | \mathcal{Y}_{\hat{l}} |
\\
\hat{\phi}_1 = \operatorname{arg} \mathcal{Y}_{\hat{l}}, \qquad
\widehat{\sigma^2} = \frac{1}{n} \sum_{k=0}^{n-1} (y_k - \hat{A}_1 \cos(2\pi \hat{f}_1 k + \hat{\phi}_1))^2
\end{split}
$$
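
The grid-search estimator above can be sketched as follows. This is a hedged illustration in plain Python, not a reference implementation: the function name `ml_sinusoid_estimates`, the direct evaluation of each DFT bin (an FFT would normally be used) and all simulation values are assumptions made for the example.

```python
import cmath
import math
import random


def ml_sinusoid_estimates(y, m):
    """Approximate ML estimates (f1, A1, phi1, sigma^2) on the grid f = l/m."""
    n = len(y)
    best_l, best_Y = 0, 0j
    # Search only l = 0..m/2, i.e. frequencies in [0, 1/2].
    for l in range(m // 2 + 1):
        Y_l = sum(y[k] * cmath.exp(-2j * math.pi * l * k / m) for k in range(n))
        if abs(Y_l) > abs(best_Y):
            best_l, best_Y = l, Y_l
    f_hat = best_l / m
    A_hat = 2.0 / n * abs(best_Y)
    phi_hat = cmath.phase(best_Y)          # arg of the peak DFT value
    residuals = [
        y[k] - A_hat * math.cos(2 * math.pi * f_hat * k + phi_hat)
        for k in range(n)
    ]
    sigma2_hat = sum(r * r for r in residuals) / n
    return f_hat, A_hat, phi_hat, sigma2_hat


# Simulated data with assumed (illustrative) parameter values.
rng = random.Random(0)
n, m = 64, 1024
A1, f1, phi1, sigma = 1.5, 0.2, 0.7, 0.1
y = [A1 * math.cos(2 * math.pi * f1 * k + phi1) + rng.gauss(0.0, sigma)
     for k in range(n)]

f_hat, A_hat, phi_hat, sigma2_hat = ml_sinusoid_estimates(y, m)
print(f_hat, A_hat, phi_hat, sigma2_hat)
```

Because $\hat{f}_1$ is restricted to the grid, the phase and noise-variance estimates inherit a small error from the residual frequency offset; a finer grid (larger $m$) reduces that offset.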


The search for $f_1$ should be limited to the interval $[0, \frac{1}{2}]$, since for real $y_k$ the DFT shows a symmetrical peak at $1-\hat{f}_1$. Note that zero padding only allows us to get within $\Delta f = \frac{1}{m}$ of the maximum of $|\mathcal{Y}(f)|$. It does not improve the estimation accuracy (variance) of the estimator $\hat{f}_1$. The variance can only be reduced by increasing the number of samples $n$ (see the Cramér-Rao bound).
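
The mirrored peak can be illustrated with a short sketch. It assumes a noiseless real sinusoid and a direct evaluation of the DFT magnitudes; the helper name `dft_magnitudes` and the chosen values of $n$, $m$, $f_1$ and the phase are hypothetical.

```python
import cmath
import math


def dft_magnitudes(y, m):
    """|Y_l| for l = 0..m-1, with y implicitly zero padded to length m."""
    n = len(y)
    return [
        abs(sum(y[k] * cmath.exp(-2j * math.pi * l * k / m) for k in range(n)))
        for l in range(m)
    ]


# Noiseless real sinusoid with illustrative parameters.
n, m = 32, 128
f1 = 0.125
y = [math.cos(2 * math.pi * f1 * k + 0.3) for k in range(n)]

mags = dft_magnitudes(y, m)

# For real y, |Y_l| = |Y_{m-l}|: the spectrum magnitude is mirrored about m/2.
mirror_error = max(abs(mags[l] - mags[m - l]) for l in range(1, m))

# Peak in the lower half [0, 1/2] and its image in the upper half.
l_low = max(range(m // 2 + 1), key=lambda l: mags[l])
l_high = max(range(m // 2, m), key=lambda l: mags[l])
print(l_low / m, l_high / m, mirror_error)
```

The two peak locations add up to one, so searching the upper half of the DFT would only return the image of the frequency already found in $[0, \frac{1}{2}]$.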

The Cramér-Rao bounds for the estimation of the various parameters can be shown to be (for large $n$):

$$
\begin{gather}
\mathrm{CRB}_{\widehat{\sigma^2}} = \frac{2 \sigma^4}{n} \qquad \mathrm{CRB}_{\hat{A}_1} = \frac{2 \sigma^2}{n} \\
\mathrm{CRB}_{\hat{\phi}_1} = \frac{8 \sigma^2}{n A_1^2 } \qquad \mathrm{CRB}_{\hat{f}_1} = \frac{6 \sigma^2}{\pi^2 n^3 A_1^2 }
\end{gather}
$$
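
To get a feeling for these bounds, they can be evaluated for concrete numbers. The sketch below simply codes the four formulas above; the function name `cramer_rao_bounds` and the parameter values are assumptions chosen for illustration.

```python
import math


def cramer_rao_bounds(sigma2, A1, n):
    """Evaluate the four bounds above for noise variance sigma2, amplitude A1, n samples."""
    return {
        "sigma2": 2.0 * sigma2 ** 2 / n,
        "A1": 2.0 * sigma2 / n,
        "phi1": 8.0 * sigma2 / (n * A1 ** 2),
        "f1": 6.0 * sigma2 / (math.pi ** 2 * n ** 3 * A1 ** 2),
    }


# Illustrative values: sigma^2 = 0.01, A1 = 1.5, and n doubled from 64 to 128.
bounds_n = cramer_rao_bounds(sigma2=0.01, A1=1.5, n=64)
bounds_2n = cramer_rao_bounds(sigma2=0.01, A1=1.5, n=128)

# Doubling n divides the frequency bound by 2^3 and the amplitude bound by 2.
ratio_f = bounds_n["f1"] / bounds_2n["f1"]
ratio_A = bounds_n["A1"] / bounds_2n["A1"]

# Standard-deviation bound for the frequency estimate, comparable to 1/m.
std_f = math.sqrt(bounds_n["f1"])
print(ratio_f, ratio_A, std_f)
```

One possible use of `std_f` is as a rough guide for choosing $m$: a grid spacing $\Delta f = \frac{1}{m}$ much coarser than this value means the grid, not the noise, dominates the frequency error.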