# Determine the parameters of the low redshift galaxy luminosity function

In this project you will reproduce the work of Blanton et al. (2003), who determined for the first time the absolute magnitude distribution of galaxies from a significantly large, spectroscopically observed sample, the Early Data Release (EDR) of the Sloan Digital Sky Survey (SDSS). You will obtain the necessary data from the SDSS SkyServer, a relational database system that is available after free registration.

* Link to the Blanton et al. (2003) paper: https://arxiv.org/abs/astro-ph/0210215

## Sloan Digital Sky Survey

The Sloan Digital Sky Survey (sometimes also called the Cosmic Genome Project) is a systematic photographic and spectroscopic mapping of the extragalactic universe. During the first phase of the project, an imaging survey was conducted to photograph the sky in five optical wavelength bands with a 120 megapixel camera. It was followed by a spectroscopic survey in which more than 2 million objects, including more than a million galaxies, were observed with a multi-fiber spectrograph.

The SDSS catalog (numerical parameters of the detected objects extracted from the multi-band images) is available as a relational database via SkyServer.

* Link to SDSS SkyServer: http://skyserver.sdss.org/

## Magnitudes

Luminosity $L$ is a primary intrinsic property of galaxies: it is the amount of energy emitted per second in a given wavelength range. Like power, it can be measured in W or erg s$^{-1}$, but astronomers prefer to express it in units of solar luminosity $L_{\textrm{Sol}}$, i.e. the luminosity of the Sun. The luminosity of the largest galaxies can be as high as $10^{12}$-$10^{14} L_\textrm{Sol}$. When brightness is measured from a distance, we measure flux instead of luminosity:

$$ F = \frac{L}{4\pi D_L^2}, $$

where $D_L$ is the _luminosity distance_, which has a special meaning in the expanding universe. Distances are measured through the spectroscopic redshift which, to first order at very low redshifts, yields a distance $D$ according to Hubble's law:

$$ c z = H_0 D, $$

where $H_0$ is the Hubble constant, $c$ is the speed of light and $z$ is the redshift. Please read on for details on why Hubble's first-order law should not be used in this project.
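As a quick numerical illustration of Hubble's law, the following sketch converts a redshift into a first-order distance. The function name and the value $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$ are assumptions made for this example only.

```python
C_KM_S = 299792.458  # speed of light in km/s


def hubble_distance_mpc(z, h0=70.0):
    """First-order distance in Mpc from Hubble's law, c z = H0 D.

    h0 is the Hubble constant in km/s/Mpc (70 is an assumed value).
    The result is only meaningful for z << 1.
    """
    return C_KM_S * z / h0


# Distance of a galaxy at z = 0.01 under the assumed H0
print(hubble_distance_mpc(0.01))
```

The approximation degrades quickly with increasing redshift, which is why the full cosmological distance is needed for the sample used here.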

Instead of expressing brightness as luminosity or flux, astronomers prefer the negative logarithmic magnitude system. The apparent magnitude $m$ of a galaxy is defined with its flux as

$$ m = -2.5 \log_{10} F + m_0, $$

where $m_0$ is a constant fixed by the _magnitude system_ used. In the case of the _AB_ magnitude system of SDSS, it is set to $m_0 = -48.6$ (for a flux density $F$ expressed in erg s$^{-1}$ cm$^{-2}$ Hz$^{-1}$), regardless of the bandpass filter used for the observation. The absolute magnitude $M$ of a galaxy could be defined similarly from luminosity, but astronomers use the traditional definition

$$ M = m - DM - K - A, $$

where $A$ is the foreground extinction (discussed below) and $DM$ is the _distance modulus_ defined as

$$ DM = 5 \log_{10} \left( \frac{D_L}{10\,\textrm{pc}} \right), $$

and $K$ is the so-called K-correction, which comes from the fact that the spectra of distant galaxies are redshifted with respect to the bandpass filter used for the observation.
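The definitions above can be sketched in a few lines of code. This is a minimal illustration, assuming the flux density is given in erg s$^{-1}$ cm$^{-2}$ Hz$^{-1}$, the luminosity distance is given in parsec, and the distance modulus is normalized to 10 pc as is conventional. All function names are hypothetical.

```python
import math

AB_ZERO_POINT = -48.6  # m_0 of the AB system for flux density in erg/s/cm^2/Hz


def apparent_magnitude(flux_density):
    """Apparent AB magnitude, m = -2.5 log10(F) + m_0."""
    return -2.5 * math.log10(flux_density) + AB_ZERO_POINT


def distance_modulus(d_l_pc):
    """Distance modulus for a luminosity distance given in parsec."""
    return 5.0 * math.log10(d_l_pc / 10.0)


def absolute_magnitude(m, d_l_pc, k_corr=0.0, extinction=0.0):
    """Absolute magnitude, M = m - DM - K - A."""
    return m - distance_modulus(d_l_pc) - k_corr - extinction


# A galaxy with m = 17.77 at a luminosity distance of 100 Mpc,
# neglecting the K-correction and the extinction
print(absolute_magnitude(17.77, 1.0e8))
```

The K-correction and extinction default to zero here only to keep the example short; in the project both terms have to be supplied for each galaxy.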

The luminosity distance $D_L$ can be computed from the redshift $z$ using a formula derived from general relativity for the expanding space-time of the universe. For the exact formula see Hogg (2000) Eq. 21.

* Link to the Hogg (2000) paper, Distance Measures in Cosmology: https://arxiv.org/abs/astro-ph/9905116
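A minimal numerical sketch of the luminosity distance follows. It assumes a flat universe containing only matter and a cosmological constant, and the values $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$ and $\Omega_m = 0.3$ are placeholders rather than values prescribed by this project. The integral of $1/E(z)$ is evaluated with Simpson's rule, and the function names are hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def comoving_distance_mpc(z, h0=70.0, omega_m=0.3, n_steps=1000):
    """Line-of-sight comoving distance in Mpc for a flat universe.

    Integrates 1/E(z') from 0 to z with Simpson's rule, where
    E(z') = sqrt(omega_m (1 + z')^3 + omega_lambda). n_steps must be even.
    """
    omega_l = 1.0 - omega_m  # flatness assumption
    step = z / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        zp = i * step
        inv_e = 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_l)
        if i == 0 or i == n_steps:
            coeff = 1.0
        elif i % 2:
            coeff = 4.0
        else:
            coeff = 2.0
        total += coeff * inv_e
    return (C_KM_S / h0) * total * step / 3.0


def luminosity_distance_mpc(z, h0=70.0, omega_m=0.3):
    """Luminosity distance in Mpc; in a flat universe D_L = (1 + z) D_C."""
    return (1.0 + z) * comoving_distance_mpc(z, h0, omega_m)


print(luminosity_distance_mpc(0.1))
```

For non-flat cosmologies the transverse comoving distance differs from the line-of-sight one, so this shortcut does not apply; see Hogg (2000) for the general case.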

Measuring the apparent magnitude of resolved sources in CCD images is rather arbitrary. Various methods exist, of which the so-called Petrosian magnitudes are the best suited for comparing galaxies that appear different in size in images.

The term $A$ is a correction for foreground extinction (reddening) caused by the dust in our Galaxy. This value can be looked up for each filter of any magnitude system by using a dust map, but it is also available in the SkyServer database for every object.

## The luminosity function

The distribution of intrinsic galaxy brightness is called the luminosity function. It is a primary statistical tool for analyzing the evolution of populations of galaxies of various types. It also carries a lot of information about the formation of galaxies, which we cannot detail here. Schechter (1976) parametrized the luminosity function in the form

$$ \phi(L) dL = \phi_0 \left( \frac{L}{L_*} \right)^{\alpha} \exp \left( - \frac{L}{L_{*}} \right) dL, $$

where $\phi_0$, $L_*$ and $\alpha$ are parameters to be fitted. The value of $\phi(L) dL$ is proportional to the probability of a randomly observed galaxy having a luminosity between $L$ and $L + dL$. The formula is most often rewritten in terms of magnitudes and plotted in the $M$ -- $\log_{10} \phi(M)$ plane.

Fig. 15 of Blanton et al. (2003) illustrates the typical shape of the luminosity function in a log-log plot for various values of its parameters.
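Rewriting the Schechter function in magnitudes uses $L / L_* = 10^{-0.4 (M - M_*)}$, which gives $\phi(M) = 0.4 \ln 10 \; \phi_* \, x^{\alpha + 1} e^{-x}$ with $x = 10^{-0.4 (M - M_*)}$. Here $\phi_* = \phi_0 L_*$ and $M_*$ are notation introduced for this sketch, and the parameter values in the example are arbitrary placeholders, not fitted results.

```python
import math


def schechter_magnitude(m_abs, phi_star, m_star, alpha):
    """Schechter luminosity function per unit absolute magnitude.

    phi_star : normalization (phi_0 * L_* in the luminosity form)
    m_star   : characteristic absolute magnitude corresponding to L_*
    alpha    : faint-end slope
    """
    x = 10.0 ** (-0.4 * (m_abs - m_star))  # this is L / L_*
    return 0.4 * math.log(10.0) * phi_star * x ** (alpha + 1.0) * math.exp(-x)


# Evaluate on a small grid with placeholder parameters
for m_abs in (-22.0, -21.0, -20.0, -19.0, -18.0):
    value = schechter_magnitude(m_abs, phi_star=0.01, m_star=-20.5, alpha=-1.0)
    print(m_abs, math.log10(value))
```

Plotting the logarithm of the returned values against $M$ shows the exponential cut-off at the bright end and the power-law behavior at the faint end.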

* Link to Schechter (1976) paper: http://adsabs.harvard.edu/abs/1976ApJ...203..297S

## The Malmquist bias

Statistics of galaxies are prone to a serious selection effect, the Malmquist bias. Telescopes have a limited light-collecting capability, hence observations at the faint end are limited by flux, i.e. by apparent magnitude. When counting galaxies, this bias has to be corrected for rigorously. A simple way of doing this is to assign each galaxy a weight inversely proportional to the probability that a galaxy with absolute magnitude $M$ is observed at a given magnitude limit $m_\textrm{limit}$. This approach is usually referred to as the $1 / V_\textrm{max}$ method and is described, for example, by Davis & Huchra (1982), from Eq. 5 onwards. The weight of each galaxy is taken as

$$ w = \frac{1}{D_{C, \textrm{max}}^3}, $$

where $D_{C, \textrm{max}}$ is the _comoving distance_ to the redshift $z_\textrm{max}$ at which the galaxy would drop below the magnitude limit. In this formula we assumed a flat cosmology, in which the comoving volume out to $z_\textrm{max}$ scales as $D_{C, \textrm{max}}^3$; see Hogg (2000).
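The weight can be computed by first solving $M + DM(z_\textrm{max}) = m_\textrm{limit}$ for $z_\textrm{max}$. The sketch below does this with bisection, under several simplifying assumptions: a flat cosmology with placeholder values $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$ and $\Omega_m = 0.3$, and no K-correction or extinction term. All function names are hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def comoving_distance_mpc(z, h0=70.0, omega_m=0.3, n_steps=200):
    """Comoving distance in Mpc for a flat universe (Simpson's rule)."""
    omega_l = 1.0 - omega_m
    step = z / n_steps  # n_steps must be even
    total = 0.0
    for i in range(n_steps + 1):
        zp = i * step
        inv_e = 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_l)
        if i == 0 or i == n_steps:
            coeff = 1.0
        elif i % 2:
            coeff = 4.0
        else:
            coeff = 2.0
        total += coeff * inv_e
    return (C_KM_S / h0) * total * step / 3.0


def distance_modulus(z):
    """Distance modulus at redshift z; D_L = (1 + z) D_C in a flat universe."""
    d_l_mpc = (1.0 + z) * comoving_distance_mpc(z)
    return 5.0 * math.log10(d_l_mpc) + 25.0  # +25 converts Mpc to 10 pc units


def z_max(m_abs, m_limit=17.77, z_lo=1e-5, z_hi=2.0, tol=1e-6):
    """Redshift at which a galaxy of absolute magnitude m_abs reaches m_limit.

    Bisection works because the distance modulus increases with z. If the
    root lies outside [z_lo, z_hi], the nearest bracket end is returned.
    """
    while z_hi - z_lo > tol:
        mid = 0.5 * (z_lo + z_hi)
        if m_abs + distance_modulus(mid) > m_limit:
            z_hi = mid  # too faint at mid, so z_max is smaller
        else:
            z_lo = mid
    return 0.5 * (z_lo + z_hi)


def vmax_weight(m_abs, m_limit=17.77):
    """Weight w = 1 / D_C(z_max)^3 in Mpc^-3 (up to a constant factor)."""
    return 1.0 / comoving_distance_mpc(z_max(m_abs, m_limit)) ** 3


print(z_max(-20.0), vmax_weight(-20.0))
```

Intrinsically faint galaxies are visible only in a small volume and therefore receive large weights, while luminous galaxies receive small ones.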

* Link to Davis & Huchra (1982) paper: http://adsabs.harvard.edu/abs/1982ApJ...254..437D

## Tasks

### 1. Data acquisition

Register at SDSS SkyServer and collect the necessary data from the SDSS DR7 main galaxy sample. Preferably use the database to compute the absolute magnitudes. You will need the `PhotoObj` table to get the Petrosian r-band magnitude `petroMag_r` and the `SpecObj` table to get the redshift `z`. The join between the two tables can be made with `PhotoObj.objID = SpecObj.bestObjID`. Functions are provided for cosmological distance calculation.
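The join described above could be written along the following lines. This sketch only builds the SQL text as a string. The column names `petroMag_r`, `petroMagErr_r` and `extinction_r` should be checked against the SkyServer schema browser, the redshift range is an arbitrary assumption, and the cuts that restrict the result to the main galaxy sample are not shown.

```python
QUERY_TEMPLATE = """
SELECT p.objID, s.z, p.petroMag_r, p.petroMagErr_r, p.extinction_r
FROM PhotoObj AS p
JOIN SpecObj AS s ON p.objID = s.bestObjID
WHERE s.z BETWEEN {z_min} AND {z_max}
  AND p.petroMag_r - p.extinction_r < {m_limit}
"""


def build_query(z_min=0.01, z_max=0.2, m_limit=17.77):
    """Return the SQL text for a redshift range and a magnitude limit.

    The default redshift range is an assumption made for this example;
    m_limit is applied to the extinction-corrected Petrosian magnitude.
    """
    return QUERY_TEMPLATE.format(z_min=z_min, z_max=z_max, m_limit=m_limit)


print(build_query())
```

The printed query can be pasted into the SkyServer SQL interface; for large result sets a batch query is usually more practical than the interactive form.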

### 2. Fitting the histogram with the maximum likelihood method

Create a histogram of absolute magnitudes with appropriate weighting of the galaxies to correct for the Malmquist bias. The magnitude limit of the spectroscopic sample is set explicitly to $m_\textrm{limit} = 17.77$ by the targeting algorithm, which selects galaxies for spectroscopic follow-up observations. Pay attention to estimating the errors of the histogram counts correctly. Fit the histogram with the Schechter function using the maximum likelihood method.
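A weighted histogram with per-bin error estimates might be built as sketched below. It assumes that the galaxies are independent, so that the variance of a weighted count can be approximated by the sum of the squared weights in the bin. The function name and the toy input values are hypothetical.

```python
import math


def weighted_histogram(values, weights, lo, hi, n_bins):
    """Weighted histogram on [lo, hi) with approximate 1-sigma errors.

    Returns (bin centers, summed weights, errors), where each error is
    the square root of the sum of squared weights in that bin.
    """
    width = (hi - lo) / n_bins
    sums = [0.0] * n_bins
    sum_sq = [0.0] * n_bins
    for v, w in zip(values, weights):
        if lo <= v < hi:
            i = min(int((v - lo) / width), n_bins - 1)  # guard against rounding
            sums[i] += w
            sum_sq[i] += w * w
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    errors = [math.sqrt(s) for s in sum_sq]
    return centers, sums, errors


# Toy input: three absolute magnitudes with made-up weights
centers, counts, errors = weighted_histogram(
    [-20.1, -20.2, -19.5], [1.0, 2.0, 3.0], lo=-21.0, hi=-19.0, n_bins=2
)
print(centers, counts, errors)
```

Dividing the sums and the errors by the bin width turns the counts into a density per unit magnitude, which is the quantity to compare with the Schechter function.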

### 3. Kernel density estimator method or hierarchical Bayes

Extend the simple histogram method of Task 2 to include the uncertainties of the magnitude measurements. One way to do this is to use a Kernel Density Estimator in place of a simple histogram. Alternatively, you can use hierarchical Bayesian modeling to determine the posterior of the parameters.
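One possible form of such an estimator replaces each galaxy by a Gaussian kernel whose width equals the measurement error of its magnitude. The sketch below shows this choice, which is an assumption made for illustration and not the only option. The function name and the toy values are hypothetical.

```python
import math


def weighted_gaussian_kde(grid, mags, mag_errs, weights):
    """Weighted kernel density estimate of the magnitude distribution.

    Each galaxy contributes a normalized Gaussian centered on its
    magnitude, with a standard deviation equal to its magnitude error
    and an area equal to its weight.
    """
    density = []
    for g in grid:
        total = 0.0
        for m, s, w in zip(mags, mag_errs, weights):
            norm = 1.0 / (s * math.sqrt(2.0 * math.pi))
            total += w * norm * math.exp(-0.5 * ((g - m) / s) ** 2)
        density.append(total)
    return density


# Toy input: two galaxies with different errors and weights
grid = [-21.0 + 0.1 * i for i in range(21)]
print(weighted_gaussian_kde(grid, [-20.0, -19.5], [0.1, 0.2], [1.0, 2.5]))
```

Because every kernel integrates to the weight of its galaxy, the integral of the estimate over magnitude equals the sum of the weights, just like the total of the weighted histogram.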

### 4. Reproduce the Gaussian mixture method of Blanton et al.

Try to fit the luminosity function following Blanton et al. (2003), as described in the explanation of Eq. 5.
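As a starting point, a luminosity function represented by a mixture of Gaussians in absolute magnitude can be evaluated as sketched below. This is only a generic illustration: the component centers, the common width and the amplitudes are hypothetical placeholders, and any further terms of the model in the paper, such as a redshift dependence, are not included.

```python
import math


def gaussian_mixture_lf(m_abs, amplitudes, centers, sigma):
    """Sum of Gaussians of common width sigma evaluated at m_abs.

    amplitudes : area of each component (the parameters to be fitted)
    centers    : fixed central magnitudes of the components
    """
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return sum(
        a * norm * math.exp(-0.5 * ((m_abs - c) / sigma) ** 2)
        for a, c in zip(amplitudes, centers)
    )


# Three placeholder components, evaluated at one magnitude
print(gaussian_mixture_lf(-20.0, [0.5, 1.0, 2.0], [-21.0, -20.0, -19.0], 0.5))
```

In a fit, the centers and the width would typically be held fixed while the amplitudes are varied to maximize the likelihood of the observed sample.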