**Tutorial 4 - Linear Regression on Galaxy Spectra with Templates**

In this tutorial we will learn how to read data from a FITS file, fit a spectrum with a set of templates, and determine the spectroscopic redshift of a galaxy.

When assigning redshifts to galaxies from a full observed spectrum, one typically fits rest-frame model spectra to the observed spectrum while accounting for the shift of the wavelengths caused by the redshift. In this challenge, we first fit a local spectrum with templates using linear least-squares regression, and then estimate the redshift of a redshifted spectrum.

1) Extract the template spectra from the FITS file k_nmf_derived.newdefault.fits using astropy.io.fits.open(). The function returns an HDUList object (called `templates` below), and you can list its contents with its .info() method. The templates we want are in the second table, templates[1].data. There are 5 templates, each sampled at 27330 wavelengths, so $\pmb{M}$ is a 27330 x 5 array (check the shape of the loaded data; you may need to transpose it). The wavelengths that correspond to the pixels are in templates[11].data. The other tables contain templates with and without smoothing, dust, lines, etc.; you can look at them if you wish. Print the header of the second table.

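import numpy as np
from astropy.io import fits

# fits.open returns an HDUList; list its HDUs and print the template header
templates = fits.open('k_nmf_derived.newdefault.fits')
templates.info()
print(templates[1].header)

M = templates[1].data      # template fluxes (second table)
lam = templates[11].data   # wavelength of each pixel

# astropy reverses the FITS axis order, so check whether M is (27330, 5)
# or (5, 27330); transpose with M = M.T if needed before fitting
print(np.shape(M))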

2) Plot the template spectra with proper axis labels. A log scale on the y-axis makes them easier to compare.
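A minimal plotting sketch for step 2, assuming `M` has already been arranged as 27330 x 5 (one template per column) and that the wavelengths are in Å; the flux units are labeled as arbitrary because they are not stated here. The helper name `plot_templates` is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_templates(lam, M):
    """Plot each column of M (one template per column) against wavelength."""
    fig, ax = plt.subplots(figsize=(8, 5))
    for i in range(M.shape[1]):
        ax.plot(lam, M[:, i], lw=1, label=f"template {i}")
    ax.set_yscale("log")                      # templates span orders of magnitude
    ax.set_xlabel("Rest-frame wavelength [Å]")  # Å is an assumption
    ax.set_ylabel(r"$f_\lambda$ (arbitrary units)")
    ax.legend()
    return fig, ax
```

In the notebook this would be called as `plot_templates(lam, M)`.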

Fitting templates to a spectrum:

The linear model for the spectrum is of the form
\begin{equation}
f_\lambda = \sum_i T_\lambda^i \theta^i
\end{equation}
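where $T_\lambda^i$ is the flux of the $i$th template at wavelength $\lambda$ and $\theta^i$ is the weight given to the $i$th template. Stacking the templates as the columns of a matrix, $M_{\lambda i} = T_\lambda^i$, this can be written in matrix notation as
\begin{equation}
\pmb{f} = \pmb{M}\pmb{\theta}.
\end{equation}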
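A quick numerical check that the matrix form is the same as the explicit sum over templates, using small random stand-ins (8 wavelengths instead of 27330) rather than the real templates:

```python
import numpy as np

rng = np.random.default_rng(0)
n_lambda, n_templates = 8, 5               # tiny stand-ins for 27330 x 5
M = rng.random((n_lambda, n_templates))    # column i is template T^i
theta = rng.random(n_templates)            # template weights theta^i

# explicit sum over templates: f_lambda = sum_i T^i_lambda theta^i
f_sum = sum(M[:, i] * theta[i] for i in range(n_templates))

# matrix form: f = M theta
f_mat = M @ theta
print(np.allclose(f_sum, f_mat))  # True
```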

In the next steps we will formulate and solve the least-squares problem for finding the best-fit amplitudes of the templates from part 1 when fitting them to the data of a local (redshift zero) spectrum in `localspectrum2.csv`. You will implement your own linear least-squares solver and compare it to `numpy`'s implementation.

3) Load the spectrum from the file localspectrum2.csv, using either np.loadtxt() or pandas.read_csv(), and plot it. The file contains both the wavelength and the flux of each pixel.
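A minimal loading sketch with np.loadtxt, assuming the file is comma-separated with the wavelength in the first column and the flux in the second; inspect the file first, and pass `skiprows=1` if it starts with a header line. The function name `load_spectrum_csv` is hypothetical.

```python
import numpy as np


def load_spectrum_csv(path, skiprows=0):
    """Return (wavelength, flux) from a two-column, comma-separated file.

    Assumes column 0 is wavelength and column 1 is flux.
    """
    table = np.loadtxt(path, delimiter=",", skiprows=skiprows, ndmin=2)
    return table[:, 0], table[:, 1]
```

Usage would be `lam_d, d = load_spectrum_csv('localspectrum2.csv')`, followed by a plot of `d` against `lam_d`.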

**Find the least-squares solution for the template coefficients.**

4) Write a function that returns the pseudo-inverse of the $\pmb{M}$ matrix given in the lecture notes,
\begin{equation}
\pmb{M}^+ = \left( \pmb{M}^{T} \pmb{M} \right)^{-1} \pmb{M}^T
\end{equation}
(you will need np.dot() and np.linalg.inv(); remember that the transpose of a matrix `M` is `M.T`. *You should be inverting a small matrix, not a large one*: $\pmb{M}^T\pmb{M}$ is 5 x 5. If you find yourself inverting a 27330 x 27330 matrix, you have not taken the transposes properly.)

Use this pseudo-inverse to find the least-squares solution, $\hat{\pmb{\theta}} = \pmb{M}^+ \pmb{d}$, where $\pmb{d}$ is the vector of observed fluxes, and print it. Which template spectrum is closest to the galaxy's spectrum?
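A minimal sketch of the pseudo-inverse, checked on synthetic data with known coefficients (the 100 x 5 random matrix and `theta_true` are made up for illustration). Forming $\pmb{M}^T\pmb{M}$ explicitly can be poorly conditioned for nearly degenerate templates, which is one reason to compare against np.linalg.lstsq in the next step.

```python
import numpy as np


def pseudo_inverse(M):
    """Return (M^T M)^{-1} M^T for a tall matrix M (n_pixels x n_templates)."""
    MtM = np.dot(M.T, M)                  # small n_templates x n_templates matrix
    return np.dot(np.linalg.inv(MtM), M.T)


# Toy check: noiseless data built from known coefficients are recovered exactly
rng = np.random.default_rng(1)
M = rng.random((100, 5))
theta_true = np.array([1.0, 0.5, 0.0, 2.0, 0.25])
d = M @ theta_true

theta_hat = pseudo_inverse(M) @ d
print(np.allclose(theta_hat, theta_true))  # True
```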

5) The least-squares problem can also be solved with the function numpy.linalg.lstsq(). Use this function to solve the problem and show that it agrees with your solution above.
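A sketch of the comparison on synthetic noisy data (a stand-in for the real templates and spectrum). Besides the coefficients, np.linalg.lstsq also returns the sum of squared residuals when the matrix is tall and of full rank, which becomes useful in the redshift fit later.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.random((200, 5))                  # stand-in for the 27330 x 5 templates
d = M @ np.array([0.3, 1.2, 0.0, 0.7, 2.0]) + 0.01 * rng.standard_normal(200)

# normal-equation solution, (M^T M)^{-1} M^T d
theta_normal = np.linalg.inv(M.T @ M) @ M.T @ d

# NumPy's solver; rcond=None uses the machine-precision-based singular value cutoff
theta_lstsq, residuals, rank, sing_vals = np.linalg.lstsq(M, d, rcond=None)

print(np.allclose(theta_normal, theta_lstsq))  # True
```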

6) Plot the data again with the model spectrum on top of it. The model spectrum is the sum of the templates weighted by the best-fit coefficients, $\pmb{f} = \pmb{M}\hat{\pmb{\theta}}$.
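A minimal overlay sketch, assuming the data `d` share the wavelength grid `lam` of the 27330 x 5 template matrix `M`, with Å as the assumed wavelength unit. The helper name `plot_data_and_model` is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_data_and_model(lam, d, M, theta):
    """Overlay the observed spectrum d and the template model M @ theta."""
    model = M @ theta                         # weighted sum of the templates
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.plot(lam, d, lw=1, label="data")
    ax.plot(lam, model, lw=1, label="best-fit model")
    ax.set_xlabel("Wavelength [Å]")           # Å is an assumption
    ax.set_ylabel(r"$f_\lambda$ (arbitrary units)")
    ax.legend()
    return fig, ax, model
```

Returning `model` lets the same array be reused for the fractional residuals in the next step.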

7) Plot the fractional residuals as a function of $\lambda$,
\begin{equation}
\frac{|f_\lambda - d_\lambda|}{f_\lambda},
\end{equation}
where $d_\lambda$ is the data and $f_\lambda$ is the prediction of your model.
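A minimal sketch of the fractional residual with NumPy. Pixels where the model is exactly zero would give an infinite ratio, so this version returns NaN there instead, a choice made for the example rather than something the tutorial prescribes. The function name is hypothetical.

```python
import numpy as np


def fractional_residuals(model, data):
    """Return |f - d| / f, where f is the model and d the data.

    Pixels where the model is exactly zero are returned as NaN instead of inf.
    """
    model = np.asarray(model, dtype=float)
    data = np.asarray(data, dtype=float)
    out = np.full_like(model, np.nan)
    np.divide(np.abs(model - data), model, out=out, where=model != 0)
    return out


r = fractional_residuals([2.0, 4.0, 0.0], [1.0, 5.0, 1.0])
print(r)
```

Plotting `r` against `lam` on a log y-axis makes the small residuals visible.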

**Fitting for the redshift**

8) Load the data in `redshiftedspectrum.txt` with np.loadtxt(). It contains a redshifted spectrum; use the same wavelengths as before for its pixels. This is not the same spectrum as the one used above.

9) Fitting both the redshift and the template amplitudes is no longer a linear problem. The observed wavelength $\lambda_o$ is related to the rest-frame wavelength $\lambda$ (the wavelength at which the light was emitted) by
\begin{equation}
\lambda = \frac{\lambda_o }{ (1+z)}
\end{equation}
where $z$ is the redshift.
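A small worked example of the relation, using the H-alpha line (rest wavelength about 6562.8 Å) as an illustration. The second part shows why the loop below keeps only part of the template grid: a fixed observed range maps to a narrower, bluer rest-frame range as $z$ grows. The function name is hypothetical.

```python
import numpy as np


def rest_frame_wavelength(lam_obs, z):
    """Rest-frame wavelength lambda = lambda_o / (1 + z)."""
    return np.asarray(lam_obs, dtype=float) / (1.0 + z)


# H-alpha emitted at about 6562.8 Å by a galaxy at z = 0.1
lam_obs = 6562.8 * (1 + 0.1)                 # observed near 7219.1 Å
print(rest_frame_wavelength(lam_obs, 0.1))   # recovers about 6562.8 Å

# An observed range of 4000-8000 Å covers only 2000-4000 Å in the rest frame at z = 1
lam_grid = np.array([4000.0, 8000.0])
print(rest_frame_wavelength(lam_grid, 1.0))
```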

Implement the following steps:

Make an array of 1000 redshifts evenly spaced between 0 and 1 (np.linspace()).

Loop through the redshifts doing the following:

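    # Assumes lam (pixel/template wavelengths), M (27330 x 5 templates)
    # and data (the redshifted spectrum) are already loaded.
    redshifts = np.linspace(0, 1, 1000)
    residuals = np.zeros(len(redshifts))

    for k, z in enumerate(redshifts):
        # rest-frame wavelengths of the observed pixels
        lam_rest = lam / (1 + z)

        # template wavelengths that fall within the rest-frame range of the data
        inds = np.where(lam < lam_rest[-1])[0]

        # interpolate the observed spectrum onto those template wavelengths
        yinterp = np.interp(lam[inds], lam_rest, data)

        # least-squares coefficients, using only the template rows in range
        theta = np.linalg.lstsq(M[inds], yinterp, rcond=None)[0]

        # save the sum of squared residuals for this redshift
        residuals[k] = np.sum((yinterp - M[inds] @ theta) ** 2)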

10) Plot the residuals as a function of redshift.
What is the best redshift? (np.argmin())

11) Convert the body of the loop above into a function that takes a redshift and returns the sum of squared residuals. Use this function with `scipy.optimize.minimize`, starting from an initial guess based on the results of 9) and 10), to find the best-fit redshift.
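A sketch of steps 9) to 11) on synthetic data, so that the recovered redshift can be checked against a known value. The two Gaussian-bump "templates", `z_true` and `theta_true` are made up for illustration, and the function name `sum_of_squares` is hypothetical. Nelder-Mead is used here as a derivative-free choice because the objective is only piecewise smooth (linear interpolation and a pixel selection that changes with $z$); other methods may work too.

```python
import numpy as np
from scipy.optimize import minimize


def sum_of_squares(z, lam, M, data):
    """Sum of squared residuals of the best linear template fit at redshift z.

    lam  : pixel wavelengths, also used as the template wavelength grid
    M    : templates, shape (len(lam), n_templates)
    data : observed (redshifted) spectrum sampled on lam
    """
    z = float(np.atleast_1d(z)[0])           # minimize passes a length-1 array
    lam_rest = lam / (1.0 + z)               # rest-frame wavelengths of the pixels
    inds = np.where(lam < lam_rest[-1])[0]   # template wavelengths in range
    yinterp = np.interp(lam[inds], lam_rest, data)
    theta = np.linalg.lstsq(M[inds], yinterp, rcond=None)[0]
    return np.sum((yinterp - M[inds] @ theta) ** 2)


# Synthetic templates: flat continuum plus one narrow bump each
lam = np.linspace(3000.0, 9000.0, 3000)
T1 = 1.0 + np.exp(-0.5 * ((lam - 4000.0) / 20.0) ** 2)
T2 = 1.0 + np.exp(-0.5 * ((lam - 5000.0) / 20.0) ** 2)
M = np.column_stack([T1, T2])

# Observed flux at lambda_o is the rest-frame model at lambda_o / (1 + z)
z_true = 0.3
theta_true = np.array([1.0, 2.0])
data = np.interp(lam / (1 + z_true), lam, M @ theta_true)

# Grid search (steps 9 and 10), then refine with scipy (step 11)
redshifts = np.linspace(0, 1, 1000)
chi2 = np.array([sum_of_squares(z, lam, M, data) for z in redshifts])
z_grid = redshifts[np.argmin(chi2)]

result = minimize(sum_of_squares, x0=[z_grid], args=(lam, M, data),
                  method="Nelder-Mead")
z_best = result.x[0]
print(z_grid, z_best)
```

Seeding the optimizer with the grid minimum keeps it inside the right basin; started far from the true redshift, a local optimizer can lock onto a spurious alignment of features.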

12) Plot the model spectrum and data with respect to observed wavelength.
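A minimal sketch for step 12, assuming `theta` holds the coefficients from the np.linalg.lstsq fit at the best-fit redshift and that `lam` is both the template grid and the observed pixel grid. The model is moved to observed wavelengths with $\lambda_o = (1+z)\lambda$ and clipped to the observed range. The function name is hypothetical.

```python
import numpy as np


def model_in_observed_frame(z, lam, M, theta):
    """Best-fit template model shifted to observed wavelengths.

    lam   : rest-frame template wavelengths (also the observed pixel grid)
    M     : templates, shape (len(lam), n_templates)
    theta : coefficients fitted at redshift z
    Only wavelengths that land inside the observed range are returned.
    """
    lam_obs = lam * (1.0 + z)          # lambda_o = (1 + z) * lambda
    keep = lam_obs <= lam[-1]          # drop points redder than the spectrum
    return lam_obs[keep], (M @ theta)[keep]


lam_toy = np.array([1.0, 2.0, 3.0, 4.0])
M_toy = np.ones((4, 1))
w, f = model_in_observed_frame(1.0, lam_toy, M_toy, np.array([2.0]))
print(w, f)
```

For the plot, draw `data` against `lam` and the two returned arrays on the same axes.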