This code repository contains code to completely reproduce the DLA catalog reported in
R Garnett, S Ho, S Bird, and J Schnedier. Detecting Damped Lyman-α Absorbers with Gaussian Processes. arXiv:1605.04460 [astro-ph.CO],
including all intermediate data products including the Gaussian process null model described therein. The provided parameters should exactly reproduce the catalog in that work; however, you may feel free to modify these choices as you see fit.
The pipeline has multiple stages, outlined and documented below.
The first step of the process is to load the DR12Q quasar catalog and available DLA catalogs, extract some basic data such as redshift, coordinates, etc., and apply some basic filtering to the spectra:
- spectra with z < 2.15 are filtered
- spectra identified in a visual survey to have broad absorption line (BAL) features are filtered
Relevant parameters in set_parameters
that can be tweaked if desired:
% preprocessing parameters
z_qso_cut = 2.15; % filter out QSOs with z less than this threshold
This process proceeds in three steps, alternating between the shell and MATLAB.
First we download the raw catalog data:
# in shell
cd data/scripts
./download_catalogs.sh
Then we load these catalogs into MATLAB:
% in MATLAB
set_parameters;
build_catalogs;
The build_catalogs
script will produce a file called file_list
in
the data/dr12q/spectra
directory containing relative paths to
yet-unfiltered SDSS spectra for download. The file_list
output is
available in this repository in the same location. The next step is to
download the coadded "speclite" SDSS spectra for these observations
(warning: total download is 35 GB). The download_spectra.sh
script
requires wget
to be available. On OS X systems, this may be
installed easily with Homebrew.
# in shell
cd data/scripts
./download_spectra.sh
download_spectra.sh
will download the observational data for the yet
unfiltered lines of sight to the data/dr12q/spectra
directory.
Now we load these data, continue applying filters, and do some basic preprocessing. The additional filters are:
- spectra that have no nonmasked pixels in the range [1310, 1325] Angstroms (QSO restframe) are filtered, as they cannot be normalized
- spectra with fewer than 200 nonmasked pixels in the range [911, 1217] Angstroms (QSO restframe) are filtered.
The preprocessing steps are to:
- truncate spectra to only contain pixels in the range [911, 1217] Angstroms QSO rest
- normalize flux and noise variance by dividing by the median flux in the range [1310, 1325] Angstroms QSO rest
Relevant parameters in set_parameters
that can be tweaked if
desired:
% preprocessing parameters
min_num_pixels = 200; % minimum number of non-masked pixels
% normalization parameters
normalization_min_lambda = 1310; % range of rest wavelengths to use Å
normalization_max_lambda = 1325; % for flux normalization
% file loading parameters
loading_min_lambda = 910; % range of rest wavelengths to load Å
loading_max_lambda = 1217;
When ready, the MATLAB code to preload the spectra is:
set_parameters;
release = 'dr12q';
file_loader = @(plate, mjd, fiber_id) ...
(read_spec(sprintf('%s/%i/spec-%i-%i-%04i.fits', ...
spectra_directory(release), ...
plate, ...
plate, ...
mjd, ...
fiber_id)));
preload_qsos;
The result will be a completed catalog data file,
data/[release]/processed/catalog.mat
, with complete filtering
information and a file containing preloaded and preprocessed data for
the 162861 nonfiltered spectra,
data/[release]/processed/preloaded_qsos.mat
.
Now we build our models, including our Gaussian process null model for quasar emission spectra and our model for spectra containing DLAs.
To build the null model for quasar emission spectra, we need to indicate a set of spectra to use for training, which should be nominally DLA-free. Here we select all spectra:
- in DR9
- not removed by our filtering steps during loading
- in the DR9 Lyman-alpha forest catalog, and
- not in the DR9 Lyman-alpha DLA concordance catalog
These particular choices may be accomplished with:
training_release = 'dr12q';
dla_catalog_name = 'dr9q_concordance';
train_ind = ...
[' catalog.in_dr9 & ' ...
'(catalog.filter_flags == 0) & ' ...
' catalog.los_inds(dla_catalog_name) & ' ...
'~catalog.dla_inds(dla_catalog_name)'];
After specifying the spectra to use in training_release
and
train_ind
, we call learn_qso_model
to learn the model.
To learn the model, you will need the MATLAB toolbox
minFunc
available from Mark Schmidt.
You should cd to the directory where you installed minFunc to and run:
addpath(genpath(pwd));
mexAll;
to initialize minFunc before the first time you use it.
Relevant parameters in set_parameters
that can be tweaked if
desired:
% null model parameters
min_lambda = 911.75; % range of rest wavelengths to Å
max_lambda = 1215.75; % model
dlambda = 0.25; % separation of wavelength grid Å
k = 20; % rank of non-diagonal contribution
max_noise_variance = 1^2; % maximum pixel noise allowed during model training
% optimization parameters
initial_c = 0.1; % initial guess for c
initial_tau_0 = 0.0023; % initial guess for τ₀
initial_beta = 3.65; % initial guess for β
minFunc_options = ... % optimization options for model fitting
struct('MaxIter', 2000, ...
'MaxFunEvals', 4000);
When ready, the MATLAB code to learn the null quasar emission model is:
training_set_name = 'dr9q_minus_concordance';
learn_qso_model;
The learned qso model is stored in
data/[training_release]/processed/learned_qso_model_[training_set_name].mat
.
We also need to specify a set of DLA parameter samples for the DLA
model. This is handled by the generate_dla_samples
script.
Relevant parameters in set_parameters
that can be tweaked if
desired:
% DLA model parameters: parameter samples
num_dla_samples = 10000; % number of parameter samples
alpha = 0.9; % weight of KDE component in mixture
uniform_min_log_nhi = 20.0; % range of column density samples [cm⁻²]
uniform_max_log_nhi = 23.0; % from uniform distribution
fit_min_log_nhi = 20.0; % range of column density samples [cm⁻²]
fit_max_log_nhi = 22.0; % from fit to log PDF
When ready, the MATLAB code to generate the DLA model parameter samples is:
training_release = 'dr12q';
generate_dla_samples;
Finally, we may use our built model to compute the posterior probability of containing a DLA along the line of sight as described in the paper above.
The processing code requires the C helper function voigt.c
to be
compiled to compute Voigt profiles quickly from MATLAB. This requires
the libcerf
library to be installed. This is available
from Homebrew-science
for OS X users. The code for this is:
% in MATLAB
mex voigt.c -lcerf
To perform a DLA search, we must specify a few things first. First, we must specify which quasar emission model to use; to select the one learned above, we may use
% specify the learned quasar model to use
training_release = 'dr12q';
training_set_name = 'dr9q_minus_concordance';
(the code will attempt to load the model from a file called
data/[training_release]/processed/learned_qso_model_[training_set_name].mat
.)
Next, we must specify which spectra to use to compute the DLA model prior Pr(M_DLA). Here we select all spectra that are:
- in DR9
- in the DR9 Lyman-alpha forest catalog, and
- not filtered by our filtering steps above.
These choices can be realized with:
% specify the spectra to use for computing the DLA existence prior
dla_catalog_name = 'dr9q_concordance';
prior_ind = ...
[' prior_catalog.in_dr9 & ' ...
' prior_catalog.los_inds(dla_catalog_name) & ' ...
'(prior_catalog.filter_flags == 0)'];
Next, we must specify which spectra to search for DLAs. Here we use all DR12Q spectra that were not filtered:
% specify the spectra to process
release = 'dr12q';
test_set_name = 'dr12q';
test_ind = '(catalog.filter_flags == 0)';
Relevant parameters in set_parameters
that can be tweaked if
desired, including function handles specifying the range of z_DLA to
search:
% model prior parameters
prior_z_qso_increase = kms_to_z(30000); % use QSOs with z < (z_QSO + x) for prior
% instrumental broadening parameters
width = 3; % width of Gaussian broadening (# pixels)
pixel_spacing = 1e-4; % wavelength spacing of pixels in dex
% DLA model parameters: absorber range and model
num_lines = 3; % number of members of the Lyman series to use
max_z_cut = kms_to_z(3000); % max z_DLA = z_QSO - max_z_cut
max_z_dla = @(wavelengths, z_qso) ... % determines maximum z_DLA to search
(max(wavelengths) / lya_wavelength - 1) - max_z_cut;
min_z_cut = kms_to_z(3000); % min z_DLA = z_Ly∞ + min_z_cut
min_z_dla = @(wavelengths, z_qso) ... % determines minimum z_DLA to search
max(min(wavelengths) / lya_wavelength - 1, ...
observed_wavelengths(lyman_limit, z_qso) / lya_wavelength - 1 + ...
min_z_cut);
When ready, the selected spectra can be processed with process_qsos
.
This script will write the results in
data/[release]/processed_qsos_[test_set_name].mat
.
The complete code for processing the spectra in MATLAB is:
% produce catalog searching [Lyoo + 3000 km/s, Lya - 3000 km/s]
set_parameters;
% specify the learned quasar model to use
training_release = 'dr12q';
training_set_name = 'dr9q_minus_concordance';
% specify the spectra to use for computing the DLA existence prior
dla_catalog_name = 'dr9q_concordance';
prior_ind = ...
[' prior_catalog.in_dr9 & ' ...
'(prior_catalog.filter_flags == 0) & ' ...
' prior_catalog.los_inds(dla_catalog_name)'];
% specify the spectra to process
release = 'dr12q';
test_set_name = 'dr12q';
test_ind = '(catalog.filter_flags == 0)';
% process the spectra
process_qsos;
Finally, we may create an ASCII catalog of the results if desired with
generate_ascii_catalog
, e.g.:
set_parameters;
training_release = 'dr12q';
release = 'dr12q';
test_set_name = 'dr12q';
generate_ascii_catalog;