# Forward Model Prototype

This notebook constructs a forward model, based on a literature review, that predicts the observed data given AGN properties.

## Selection

When looking for correlations later, it will be important to know what *kinds* of galaxies I should expect to have data on.


*SDSS selection: photometric detection, then spectroscopic follow-up*

The details below are for DR7, the data release used for the OSSY catalog that this work relies on.

### SDSS Photometric Detection

We assume that SDSS photometrically detects all galaxies that fulfill the r ≤ 17.77 selection cut below, so photometric detection introduces approximately no incompleteness.


### SDSS Spectroscopic Selection Function

In summary:
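- Every extended source with r-band Petrosian magnitude r ≤ 17.77 is spectroscopically targeted as a **galaxy** (except ~6% that are missed because they have a nearby companion)
- Otherwise, extended sources can be targeted as **low-z quasars** if they pass the ugri color selection (to i = 19.1), are not rejected by (u* - g*) > 0.9 (applied only when the u and g errors are below 0.2 mag), and do not have both l > 0 and k > 0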

TODO: review the QSO/galaxy flags, and ideally the more specific QSO_CAP/QSO_SKIRT (low-z quasar) flags as well. Why were the OSSY galaxies selected?

**For the main galaxy sample:**

Primary [source](http://iopscience.iop.org/article/10.1086/342343/fulltext/) and [online summary](http://classic.sdss.org/dr7/algorithms/target.html#main)

> Galaxy photometric properties are measured using the Petrosian magnitude system, which measures flux in apertures determined by the shape of the surface brightness profile

> The main galaxy sample consists of galaxies with r-band Petrosian magnitudes r ≤ 17.77 and r-band Petrosian half-light surface brightnesses μ<sub>50</sub> ≤ 24.5 mag arcsec<sup>-2</sup> (Ed: surface brightness only rejects 0.1% of galaxies). These cuts select about 90 galaxy targets per square degree, with a median redshift of 0.104

> The completeness of the sample is high, exceeding 99% (...) About 6% of galaxies that satisfy the selection criteria are not observed because they have a companion closer than the 55'' minimum separation of spectroscopic fibers

> The SDSS spectra are of high enough signal-to-noise ratio (S/N > 4 per pixel) that essentially all targeted galaxies (99.9%) yield a reliable redshift (i.e., with statistical error less than 30 km s<sup>-1</sup>)
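
A minimal sketch of the main galaxy sample as a selection function for the forward model. It uses the cuts and loss fractions quoted above, and it assumes μ<sub>50</sub> is the mean surface brightness inside the Petrosian half-light radius, i.e. μ<sub>50</sub> = r + 2.5 log<sub>10</sub>(2π R<sub>50</sub><sup>2</sup>). Fiber collisions and redshift failures are treated as independent, spatially uniform losses, which is a simplification (real fiber collisions depend on local density). The function names are my own, not SDSS ones.

```python
import math

# Main galaxy sample cuts quoted above (r-band Petrosian quantities)
R_PETRO_LIMIT = 17.77         # r <= 17.77
MU50_LIMIT = 24.5             # mu50 <= 24.5 mag arcsec^-2
FIBER_COLLISION_LOSS = 0.06   # ~6% lost to companions closer than 55''
REDSHIFT_SUCCESS = 0.999      # 99.9% of targeted galaxies yield a reliable redshift


def mu50_from_petrosian(r_petro, r50_arcsec):
    """Mean surface brightness inside the Petrosian half-light radius.

    Assumes mu50 = r + 2.5 log10(2 pi R50^2): half the Petrosian flux
    spread over an area of pi R50^2 arcsec^2.
    """
    return r_petro + 2.5 * math.log10(2.0 * math.pi * r50_arcsec ** 2)


def passes_main_galaxy_cuts(r_petro, r50_arcsec):
    """True if a galaxy satisfies the photometric main-sample cuts."""
    mu50 = mu50_from_petrosian(r_petro, r50_arcsec)
    return r_petro <= R_PETRO_LIMIT and mu50 <= MU50_LIMIT


def p_spectroscopic_redshift(r_petro, r50_arcsec):
    """Probability that a galaxy ends up with a reliable SDSS redshift.

    Treats fiber collisions and redshift failures as independent, uniform
    losses; ignores the >99% tiling completeness and spatial variations.
    """
    if not passes_main_galaxy_cuts(r_petro, r50_arcsec):
        return 0.0
    return (1.0 - FIBER_COLLISION_LOSS) * REDSHIFT_SUCCESS
```

For example, an r = 17.0 galaxy with R<sub>50</sub> = 2'' has μ<sub>50</sub> ≈ 20.5 and passes, so it gets a redshift with probability ≈ 0.94; the surface brightness cut only bites for very large, diffuse objects.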

**For the Main Quasar Sample (QSO_CAP, QSO_SKIRT)**

Primary [source](http://iopscience.iop.org/article/10.1086/340187/fulltext/), same online summary as above

> Whereas stars have a spectrum that is roughly blackbody in shape, quasars have spectra that are characterized by featureless blue continua and strong emission lines, causing quasars to have colors quite different from those of stars. As a result of their distinct colors, quasar candidates can be identified as outliers from the stellar locus.

> Quasar candidates are selected via their nonstellar colors in ugriz broadband photometry and (Ed: additionally) by matching unresolved sources to the FIRST radio catalogs

> The ugri-selected objects are targeted to **i = 19.1**, and the griz-selected objects are targeted to **i = 20.2**

The griz selection is aimed at high-z quasars and is not helpful to me. These targets are flagged QSO_HIZ.

> Extended sources are also targeted as low-redshift quasar candidates in order to investigate the evolution of active galactic nuclei (AGNs) at the faint end of the luminosity function

>  Extended objects must have colors that are far from the colors of the main galaxy distribution and that are consistent with the colors of AGNs

> During the ugri color selection process, both extended and point-source objects are targeted as quasar candidates; we do not explicitly differentiate between quasars and their lower luminosity cousins that are typically extended. Note that by using PSF magnitudes throughout, we isolate the colors of any pointlike components of galaxies with active nuclei.

This figure shows how galaxies have quite different colors from stars, and are therefore well separable with a few cuts:

 <img src="http://cdn.iopscience.com/images/1538-3881/123/6/2945/Full/fg4.jpg" alt="From http://iopscience.iop.org/article/10.1086/340187/fulltext/" height="400" width="400">
 
 > We use two color cuts to reject extended objects unlikely to harbor an active nucleus. First, **extended objects that are detected in both u and g, that have errors less than 0.2 mag in each band, and that have (u* - g*) > 0.9 are rejected.** (...) This cut misses the extension of the galaxy locus to somewhat bluer u* - g* colors, and thus we apply a second cut, **rejecting extended objects with l > 0 and k > 0** (in the notation of Newberg & Yanny 1997; see also Appendix A). This second cut effectively removes all extended objects that are "above" and to the "left" of the stellar locus in the (u* - g*), (g* - r*) color-color diagrams
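
A minimal sketch of the two extended-source rejection cuts quoted above. The Newberg & Yanny (1997) coordinates l and k are taken as precomputed inputs, because their definitions are not reproduced in these notes. The ugri outlier selection itself is also not implemented and is passed in as a boolean. The manual inclusion/exclusion regions mentioned below are ignored. Both function names are hypothetical.

```python
def extended_qso_candidate_survives(u, g, u_err, g_err, l, k,
                                    u_detected=True, g_detected=True):
    """Apply the two extended-source rejection cuts (PSF magnitudes).

    l, k: Newberg & Yanny (1997) color-space coordinates, assumed precomputed.
    """
    # Cut 1 only applies to objects detected in u and g with errors < 0.2 mag
    well_measured = u_detected and g_detected and u_err < 0.2 and g_err < 0.2
    if well_measured and (u - g) > 0.9:
        return False
    # Cut 2: reject objects "above" and to the "left" of the stellar locus
    if l > 0 and k > 0:
        return False
    return True


def low_z_qso_targetable(i_psf, passes_ugri_color_selection, **cut_kwargs):
    """Hypothetical combination: ugri candidates are only targeted to i = 19.1."""
    return (i_psf <= 19.1
            and passes_ugri_color_selection
            and extended_qso_candidate_survives(**cut_kwargs))
```

Note that the second cut only rejects objects with *both* l > 0 and k > 0, so an object survives it if either coordinate is non-positive.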
 
 There are various small manual inclusion and exclusion color-space regions aimed at e.g. picking out mid-z quasars, removing white dwarfs, etc. This becomes more detailed than I want to get into for this prototype.


> We would also like to quantify the completeness as a function of redshift and magnitude; we do so with the grid of model quasars discussed above. These simulated quasars have a much more homogeneous distribution in redshift-magnitude space than do real quasars.

> To help determine the survey completeness, we also make use of simulated quasar colors. We calculate the simulated distribution of quasar colors at a given redshift and magnitude, following the procedures described in Fan (1999) and Fan et al. (2001a). The intrinsic quasar spectrum model includes a power-law continuum and a series of broad emission lines. We use the same distributions of the power-law index for the quasar continuum (α = 0.5 ± 0.3, f<sub>ν</sub> ∝ ν<sup>-α</sup>) and the equivalent widths for the emission lines as in Fan (1999; except for Fe II, which now has a larger equivalent width). The synthetic quasar absorption spectrum takes into account intervening H I absorbers along the line of sight, including Lyα forest systems, Lyman-limit systems, and damped Lyα systems using distribution functions similar to those used by Fan (1999). Finally, we calculate the SDSS magnitudes and the associated photometric error from the model spectrum in each band using the filter transmission curves and system efficiency of Stoughton et al. (2002), assuming a median seeing of 1.4″ FWHM. The simulated colors are generated for a uniformly distributed grid of redshift and i magnitude.

The SED of the simulated quasars is very important for this calculation - it defines completeness *of what*.

TODO: read Fan

For simulated quasars with the SED described above at z < 0.5, SDSS spectroscopy [claims to be](http://iopscience.iop.org/article/10.1086/340187/fulltext/201557.tb6.html#tb6fnb) 100% complete for apparent i magnitude < 19.1 (and 0% complete otherwise, due to the i = 19.1 limit on ugri-selected targets).
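
For the prototype, this claim can be encoded as a step-function completeness. This is only a sketch: it is valid only for the simulated SED and for z < 0.5, and it ignores the smooth fall-off visible in the contours below. The function name is hypothetical.

```python
def qso_completeness_low_z(i_mag, z):
    """Step-function completeness for simulated low-z quasars (sketch).

    ~100% complete for i < 19.1 at z < 0.5, and 0% fainter than the ugri
    limit. Not meant to be used outside 0 <= z < 0.5.
    """
    if not 0.0 <= z < 0.5:
        raise ValueError("step model only covers 0 <= z < 0.5")
    return 1.0 if i_mag < 19.1 else 0.0
```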

 <img src="http://cdn.iopscience.com/images/1538-3881/123/6/2945/Full/fg10.gif" alt="From http://iopscience.iop.org/article/10.1086/340187/fulltext/201557.fg10.html" height="400" width="400">
 
 *Contours show 10%, 25%, 50%, 75% and 90% completeness of simulated quasars*
 
Translated to absolute magnitude, this probes a fairly narrow range of i-band absolute magnitude: roughly -22 < M<sub>i</sub> < -19.
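
To check that range, here's a sketch converting the i = 19.1 limit to M<sub>i</sub> as a function of redshift. It assumes a flat ΛCDM cosmology with H<sub>0</sub> = 70 km s<sup>-1</sup> Mpc<sup>-1</sup> and Ω<sub>m</sub> = 0.3, which is not necessarily what Richards et al. used. It also assumes the power-law K-correction K(z) = -2.5(1 - α) log<sub>10</sub>(1 + z) for f<sub>ν</sub> ∝ ν<sup>-α</sup> with α = 0.5, matching the simulated SED above. Emission lines are ignored, and the functions are only meant for z > 0.

```python
import math

C_KM_S = 299792.458  # speed of light [km/s]


def luminosity_distance_mpc(z, h0=70.0, omega_m=0.3, n_steps=1000):
    """Luminosity distance [Mpc] in flat LambdaCDM via Simpson's rule.

    h0 and omega_m are assumed values. Requires z > 0.
    """
    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + (1.0 - omega_m))

    if n_steps % 2:
        n_steps += 1  # Simpson's rule needs an even number of intervals
    dz = z / n_steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * inv_e(i * dz)
    comoving_mpc = (C_KM_S / h0) * total * dz / 3.0
    return (1.0 + z) * comoving_mpc


def absolute_mag_power_law(m_app, z, alpha=0.5, **cosmo):
    """M = m - DM(z) - K(z), with K = -2.5 (1 - alpha) log10(1 + z)."""
    d_l_pc = luminosity_distance_mpc(z, **cosmo) * 1e6
    distance_modulus = 5.0 * math.log10(d_l_pc / 10.0)
    k_corr = -2.5 * (1.0 - alpha) * math.log10(1.0 + z)
    return m_app - distance_modulus - k_corr
```

Under these assumptions, i = 19.1 corresponds to M<sub>i</sub> ≈ -19.2 at z = 0.1 and M<sub>i</sub> ≈ -22.9 at z = 0.5, consistent with the range above. `astropy.cosmology` would do the distance part more robustly if it's available in the notebook environment.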
 
  <img src="http://cdn.iopscience.com/images/1538-3881/123/6/2945/Full/fg11.gif" alt="From http://iopscience.iop.org/article/10.1086/340187/fulltext/201557.fg11.html" height="400" width="400">
  
   *Contours show 10%, 25%, 50%, 75% and 90% completeness of simulated quasars*
   
   > The overall shape of the featureless continua of quasars is well approximated by a power law (Vanden Berk et al. 2001), although the continuum need not be a power law physically. Since a redshifted power law remains a power law with the same spectral index, quasar colors are only a weak function of redshift for z ≤ 2.2 as emission lines move in and out of the filters (Richards et al. 2001)
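
A quick numerical check of the redshift-invariance argument: for a pure power law, the observed AB color does not depend on z. This is a monochromatic sketch. The effective wavelengths (g ≈ 4770 Å, r ≈ 6231 Å) are approximate assumed values. Filter curves and emission lines are ignored, and those are exactly what make real quasar colors vary weakly with z.

```python
import math

# Approximate effective wavelengths [Angstrom] of SDSS g and r (assumed values)
LAMBDA_EFF = {"g": 4770.0, "r": 6231.0}


def power_law_color(band1, band2, alpha=0.5, z=0.0):
    """AB color m1 - m2 of an f_nu ∝ nu^-alpha spectrum observed at redshift z.

    Each band is sampled at its effective wavelength only.
    """
    def observed_fnu(lam_obs):
        nu_obs = 1.0 / lam_obs          # arbitrary units; only ratios matter
        nu_rest = (1.0 + z) * nu_obs    # rest-frame frequency observed here
        # Overall (1+z) and distance factors are common to both bands, so
        # they cancel in the color and are dropped.
        return nu_rest ** (-alpha)

    f1 = observed_fnu(LAMBDA_EFF[band1])
    f2 = observed_fnu(LAMBDA_EFF[band2])
    return -2.5 * math.log10(f1 / f2)
```

Analytically the color is 2.5 α log<sub>10</sub>(λ<sub>r</sub>/λ<sub>g</sub>), about 0.145 mag in g - r for α = 0.5 with these wavelengths, at any redshift. The α = 0.5 ± 0.3 spread in the simulations therefore maps directly into a spread of intrinsic colors.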

A useful explanation of the different magnitudes available from SDSS: http://www.sdss.org/dr12/algorithms/magnitudes/#mag_psf

From Richards et al. (2002):

 > Yasuda et al. (2001) and Scranton et al. (2002) show that the star-galaxy separation is reliable at least to r* ∼ 21—typically much fainter than the limit explored by quasar target selection

### UKIDSS Selection

As described in the `get_ukidss_and_sdss` notebook, UKIDSS LAS has depths of: Y=20.2, J=19.6, H=18.8, K=18 for 5 sigma detection of a point source within a 2 arcsecond aperture. 
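
A minimal sketch of applying these depths as hard detection limits. It applies the point-source depths to every source, which is exactly the open question in the TODO. Magnitudes must be in the same photometric system as the depths. The names are hypothetical.

```python
# UKIDSS LAS 5-sigma point-source depths (2'' aperture) quoted above
UKIDSS_LAS_DEPTH = {"Y": 20.2, "J": 19.6, "H": 18.8, "K": 18.0}


def ukidss_detected_bands(mags):
    """Return the set of bands in which a source is at or brighter than the depth.

    mags: dict of band name -> magnitude; unknown bands are ignored.
    """
    return {band for band, m in mags.items()
            if band in UKIDSS_LAS_DEPTH and m <= UKIDSS_LAS_DEPTH[band]}
```

For example, a flat-spectrum source at magnitude 19 in every band would be detected in Y and J only, so requiring detection in all four bands is effectively a K ≤ 18 cut.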

TODO: is it fair to use these as the limiting depths for galaxies as well, or do I need to calculate them myself/find them elsewhere?

TODO: explicitly plot the selection limits

### Color-Magnitude Bias

See sec. 2.5 of [this paper](https://academic.oup.com/mnras/article/404/3/1215/1049321)


Flux measurements close to the limiting magnitude become less reliable. 

As you approach the magnitude limit in some bands, the sample starts to become biased in color: near the limit of the shallowest band, only sources that are relatively bright in that band are still detected in every band. The conservative approach is to cut the sample at magnitudes where flux measurements in each band are still reliable and sources are confidently detected. The liberal approach is to weight each galaxy by its actual detection volume given the flux that you do see (e.g. something like a 1/V<sub>max</sub> weighting).
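
To make "less reliable" concrete, here is a sketch assuming background-limited photometry. In that regime the noise does not depend on source flux, so S/N = 5 × 10<sup>-0.4(m - m<sub>lim</sub>)</sup> for a 5σ limit m<sub>lim</sub>, and σ<sub>m</sub> ≈ 1.0857/(S/N). The S/N ≥ 10 threshold for the conservative cut is an arbitrary illustrative choice, not one taken from the paper above.

```python
import math

MAG_ERR_PER_INV_SNR = 2.5 / math.log(10.0)  # ~1.0857 mag per unit 1/SNR


def snr_at_mag(m, m_lim, snr_lim=5.0):
    """S/N of a source of magnitude m, given S/N = snr_lim at m_lim.

    Assumes the background-limited regime, so S/N scales linearly with flux.
    """
    return snr_lim * 10.0 ** (-0.4 * (m - m_lim))


def mag_error(m, m_lim, snr_lim=5.0):
    """Approximate 1-sigma magnitude error, sigma_m ≈ 1.0857 / (S/N)."""
    return MAG_ERR_PER_INV_SNR / snr_at_mag(m, m_lim, snr_lim)


def conservative_mag_cut(m_lim, snr_min=10.0, snr_lim=5.0):
    """Faintest magnitude at which S/N is still >= snr_min."""
    return m_lim - 2.5 * math.log10(snr_min / snr_lim)
```

At the 5σ limit itself σ<sub>m</sub> ≈ 0.22 mag, and requiring S/N ≥ 10 moves the cut about 0.75 mag brighter (e.g. K ≲ 17.25 for the UKIDSS LAS depth of K = 18). Applying this per band and taking the most restrictive one is the conservative approach described above.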

## AGN Selection

Oh et al. (2011, [2015](http://gem.yonsei.ac.kr/~ksoh/download/ossy_datafiles/Oh_etal_ApJS_219_1.pdf)) compare the flux ratio in two shoulder regions near the H-alpha line to decide whether a spectrum has broad H-alpha emission. Here's a great diagram: ![oh_2015_blr](oh_2015_type_1_broad_line_selection.png)

