# New sectioned configuration file help

This document explains how to write a `kids_ggl` halo model configuration file with sections and subsections, which allows users to provide their own functions for, e.g., the mass-observable relation and its intrinsic scatter.

**Important Note:**
The document below is a minimalist description of the configuration file, and not all parameters or components are included. The full configuration file required by `halo.model` can be found in the `demo` folder within the pipeline directory tree. Also note that none of this applies to the `esd_production` functionality; the configuration file remains exactly the same as in `v1.x` for `esd_production`.

## 1) Model

The configuration file starts with the model specification. This follows the same syntax as before:

```
model            halo.model
```

## 2) Model ingredients

### 2.1) HOD observables

The first section is the `observables` section:

```
[observables]
logmstar       10,11       11,12.5      log
```

The first column here is an arbitrary (and dummy) name for the observable in question; the second and third columns are the lower and upper bin limits used in the stacking (comma-separated). In this case, we use two bins in the ranges $10 \leq \log_{10} m_\star \leq 11$ and $11 \leq \log_{10} m_\star \leq 12.5$.

The final column is optional, and for now its only possible value is `log`. This tells the pipeline that the observable is given in log-space, which is necessary information for properly applying the HOD (i.e., the mass-observable relation and the mass-observable scatter function, a.k.a. occupation probability). If the column is not present, the observable is understood to be given in linear space. Any value other than `log` will raise an `AssertionError`.

Note that the HOD itself does not care what observable it receives so long as everything is passed consistently. It is possible, for instance, to define an observable $m_\star^{12} \equiv m_\star/10^{12}$:

```
[observables]
mstar12        0.01,0.1       0.1,1.25
```

We will see what this implies for the rest of the configuration file below (note that $m_\star^{12}$ is not in log space, therefore we omit the fourth column).
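The column layout of an `[observables]` line can be sketched in a few lines of Python. `parse_observable` is a hypothetical helper written for this document, not a `kids_ggl` function; it only illustrates how the lower limits, upper limits, and optional `log` flag described above fit together.

```python
def parse_observable(line):
    """Split an [observables] line into (name, bins, is_log).

    Hypothetical helper for illustration; not part of kids_ggl.
    """
    tokens = line.split()
    assert len(tokens) in (3, 4), 'expected 3 or 4 columns'
    name = tokens[0]
    # second column: lower bin limits; third column: upper bin limits
    lower = [float(x) for x in tokens[1].split(',')]
    upper = [float(x) for x in tokens[2].split(',')]
    assert len(lower) == len(upper), 'need one upper limit per lower limit'
    is_log = False
    if len(tokens) == 4:
        # the only value currently accepted in the fourth column
        assert tokens[3] == 'log', 'fourth column must be "log"'
        is_log = True
    return name, list(zip(lower, upper)), is_log


log_obs = parse_observable('logmstar       10,11       11,12.5      log')
lin_obs = parse_observable('mstar12        0.01,0.1       0.1,1.25')
```

Here `log_obs` holds the two bins `(10.0, 11.0)` and `(11.0, 12.5)` with the log flag set, while `lin_obs` holds the linear-space bins with the flag unset.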

### 2.2) Selection function

The second section concerns the selection function:

```
[selection]
selection_file.txt       z,logmstar,completeness      fixed_width
```

The columns here correspond to the file name, the names of the columns to be used, and the `astropy.io.ascii` format to be used when reading the file. Using an inconsistent format, or not providing one (which `kids_ggl` does not allow), would probably result in the column names not being interpreted correctly. We recommend always using `fixed_width` with its default settings to avoid any confusion. A file generated with `ascii.write(*args, format='fixed_width')` is also easily human-readable:

```
|    z | logmstar | completeness |
| 0.10 | 2.00e+09 |     3.36e-01 |
| 0.15 | 2.00e+09 |     3.36e-01 |
| 0.20 | 2.00e+09 |     3.36e-01 |
| 0.30 | 2.00e+09 |     3.36e-01 |
| 0.40 | 2.00e+09 |     3.36e-01 |
| 0.50 | 2.00e+09 |     3.36e-01 |
| 0.60 | 2.00e+09 |     3.36e-01 |
| 0.80 | 2.00e+09 |     3.36e-01 |
| 1.00 | 2.00e+09 |     3.36e-01 |
...
```

The column names listed in the second column of the `[selection]` entry must be given in the order (redshift, observable, completeness). They will be recorded in that order irrespective of the order in which they appear in the file.
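The column selection described above can be illustrated with a standard-library sketch. The pipeline itself reads the file through `astropy.io.ascii`; the `read_fixed_width` function below is a hypothetical stand-in that only shows how columns are returned in the requested order even when the file stores them differently.

```python
def read_fixed_width(text, names):
    """Return the requested columns, in the order given by `names`.

    Hypothetical stand-in for the astropy reader, for illustration only.
    """
    rows = [[cell.strip() for cell in line.strip().strip('|').split('|')]
            for line in text.strip().splitlines()]
    header, body = rows[0], rows[1:]
    # look up each requested name in the file header
    index = [header.index(name) for name in names]
    return [[float(row[i]) for row in body] for i in index]


# the file lists completeness first, but we request (z, observable, completeness)
table = """
| completeness |    z | logmstar |
|     3.36e-01 | 0.10 | 2.00e+09 |
|     3.36e-01 | 0.15 | 2.00e+09 |
"""
z, obs, comp = read_fixed_width(table, 'z,logmstar,completeness'.split(','))
```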

This section is the only section that can be disabled altogether. We've chosen to allow this so the user is not required to produce a dummy selection file if they do not have/want one. To ignore selection effects -- or, equivalently, to assume that the sample is 100% complete -- simply write

```
[selection]
None
```

### 2.3) Halo model ingredients

The third section is the `ingredients` section:

```
[ingredients]
centrals       True
pointmass      True
satellites     False
miscentring    False
twohalo        True
```

The name of each ingredient is now fixed (as `ingredients` is a dictionary in the pipeline), and its value is either `True` (used) or `False` (not used). In the example above, we are asking the pipeline to evaluate a model that includes only central galaxies that are always located at the center of the gravitational potential, plus a point-mass component to account for the stellar mass of the central galaxy, and the large-scale two-halo component.
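As a minimal sketch of how such a block maps onto a dictionary of booleans, consider the hypothetical `parse_ingredients` helper below. It is not the pipeline's own reader, and the strict check on the value is an assumption made for illustration.

```python
def parse_ingredients(lines):
    """Map ingredient names to booleans (hypothetical helper)."""
    ingredients = {}
    for line in lines:
        name, value = line.split()
        # assumption: only the literal strings True and False are accepted
        assert value in ('True', 'False'), f'invalid value for {name}'
        ingredients[name] = (value == 'True')
    return ingredients


ingredients = parse_ingredients([
    'centrals       True',
    'pointmass      True',
    'satellites     False',
    'miscentring    False',
    'twohalo        True',
])
active = [name for name, used in ingredients.items() if used]
```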

## 3) Model parameters

From here on, things change with respect to previous versions. As above, we define model *sections*, each of which refers to one component of the halo model. The order of the sections must be followed for the model to be set up properly, but each component may have a custom number of parameters. *This is the whole idea behind the new configuration file structure: it allows full flexibility in the model used without having to modify the backbone provided by `halo.model`.* The general syntax is:

```
[section1/subsection1]
name1         prior1     [values1 ...]
name2         prior2     [values2 ...]
...
[section1/subsection1/subsubsection1]
name1         prior1     [values1 ...]
name2         prior2     [values2 ...]
...
[section2/subsection1]
name1         prior1     [values1 ...]
name2         prior2     [values2 ...]
...
```

and so on. Empty lines and lines starting with `'#'` are ignored. The first column is the name that will be used in the MCMC output if the parameter is varied during the chain; the second specifies the prior function; and any following columns specify the parameters passed to the prior function. See the [priors](priors.ipynb) section for details.
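The sectioned layout can be read with a short loop. The sketch below uses a hypothetical `read_sections` function (not the reader used by `kids_ggl`) and assumes that section headers sit alone on a line and that inline comments are not present. It relies on Python dictionaries preserving insertion order, which matters here because the order of sections is significant.

```python
def read_sections(text):
    """Group parameter lines by section header (hypothetical reader)."""
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        # empty lines and comment lines are ignored
        if not line or line.startswith('#'):
            continue
        if line.startswith('[') and line.endswith(']'):
            current = line[1:-1]
            sections[current] = []
            continue
        if current is None:
            # e.g. the leading "model" line, which belongs to no section
            continue
        name, *rest = line.split()
        sections[current].append((name, rest))
    return sections


config = """
model            halo.model

[cosmo]
sigma_8         fixed     0.8159
# redshifts of the two bins
z               array     0.188,0.195

[hod/centrals/mor]
name            powerlaw
a               uniform        -5      5
"""
sections = read_sections(config)
```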

In addition, the use of repeat parameters is supported:

```
[section1/subsection1]
name1         prior1     [values1 ...]
name2         prior2     [values2 ...]
...
[section2/subsection1]
section1.subsection1.name1       repeat
name2         prior2     [values2 ...]
...
```

The above syntax means that the first parameter of `section2/subsection1` is always the same as `name1` in `section1/subsection1`. This is useful if there are free parameters required in more than one place (for instance, `h` may be used in the cosmology as well as the mass-concentration or mass-observable relations, or some of the parameters used for satellite galaxies might be based on those obtained for centrals).
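One way to picture how a `repeat` entry is resolved is the sketch below. The `resolve_repeats` function and the dictionary layout are assumptions made for illustration, not the pipeline's internals; the sketch also assumes that the referenced section appears earlier in the file than the section that repeats it.

```python
def resolve_repeats(sections):
    """Replace 'repeat' entries with the value they point to.

    `sections` maps section names to {parameter name: value}.
    Hypothetical helper; the referenced section must come first.
    """
    resolved = {}
    for section, params in sections.items():
        resolved[section] = {}
        for name, value in params.items():
            if value == 'repeat':
                # 'section1.subsection1.name1' -> 'section1/subsection1', 'name1'
                *path, target = name.split('.')
                value = resolved['/'.join(path)][target]
            resolved[section][name] = value
    return resolved


sections = {
    'cosmo': {'z': [0.188, 0.195]},
    'hod/centrals/concentration': {'cosmo.z': 'repeat', 'fc': 1.0},
}
resolved = resolve_repeats(sections)
```

After this call, the `cosmo.z` entry of `hod/centrals/concentration` holds the same redshift list as `z` in `cosmo`.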

### 3.1) Cosmological parameters

The first section including parameters that may be sampled in the MCMC is the section listing cosmological parameters:

```
[cosmo]
sigma_8         fixed     0.8159
H0              fixed     67.74
omegam          fixed     0.3089
omegab_h2       fixed     0.02230
omegav          fixed     0.6911
n               fixed     0.9667
z               array     0.188,0.195
```

They need not all be fixed, but in this example they are. At any rate, this will always be just a list of parameters, so there are no major changes compared to the previous version. *Note that the section name **must** be `cosmo`, and that both the list and the order of cosmological parameters are fixed.*

Note however that we introduced a new "prior" type: `array`. This replaces the old `hm_params` (with an "s" at the end), and refers to values that should always be treated as arrays; the values thus defined are always fixed. Redshift is the prototypical `array` variable in `halo.model`.
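The difference between a `fixed` scalar and an `array` entry can be sketched as follows. `parse_value` is a hypothetical helper that covers only these two prior types; it is meant to show that an `array` entry is a comma-separated list that always stays a sequence.

```python
def parse_value(prior, tokens):
    """Convert the value columns of a 'fixed' or 'array' entry.

    Hypothetical helper covering only these two prior types.
    """
    if prior == 'fixed':
        return float(tokens[0])
    if prior == 'array':
        # comma-separated values, always kept as a list
        return [float(x) for x in tokens[0].split(',')]
    raise ValueError(f'prior type not covered in this sketch: {prior}')


H0 = parse_value('fixed', ['67.74'])
z = parse_value('array', ['0.188,0.195'])
```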

### 3.2) HOD parameters

Now we move on to the HOD proper, and this is where the fun starts. The following are sections that can be modified seamlessly to produce a variety of halo model prescriptions, taking advantage of the backbone established by `halo.model` (i.e., the user need not write their own full-fledged model to do this):

```
[hod/centrals/pointmass]
logmstar        array     10.3,11.5
point_norm      uniform     0.5     5       1
[hod/centrals/concentration]
name            duffy08_crit
cosmo.z         repeat
fc              uniform     0.2     5       1
cosmo.h         repeat
[hod/centrals/mor]
name            powerlaw
logM0           fixed          12.0
a               uniform        -5      5
b               student        1
[hod/centrals/scatter]
name            lognormal
sigma_c         jeffrey
[hod/centrals/miscentring]
name            fiducial
p_off           uniform        0       1
R_off           uniform        0       1.5
```

The whole idea behind this structure is that the HOD may be fully specified by the user, including for instance the complexity of the mass-observable scaling relation. Note that the HOD may also contain a model for satellites and potentially other ingredients, but a simple centrals-only model will serve our purpose here (but note that `halo.model` does require satellite sections to be defined; please refer to `demo/ggl_model_demo.txt` for a full working configuration file). While it is the order of the sections that is required by the halo model, the `hod/` prefix to all HOD sections is required for the configuration file to be read properly. In any case, we recommend that the section names not be modified for consistency and ease of interpretation by other users.

In the example above we've only included mandatory parameters for each prior type, to keep it simple. Note also that we introduced new priors here compared to `v1.x` (and the `name` parameter), in addition to some repeat parameters as described above. For more information see the [priors](priors.ipynb) section.
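To make the idea of a user-specified relation more concrete, here is a sketch of what a power-law mass-observable relation taking the parameters `logM0`, `a`, and `b` might look like. Both the functional form and the call signature are assumptions made for illustration; the `powerlaw` relation shipped with `kids_ggl` may be defined differently.

```python
import math


def powerlaw(M, logM0, a, b):
    """Hypothetical power-law relation in log space.

    Assumed form: log10(m) = a + b * (log10(M) - logM0),
    where M is a sequence of halo masses.
    """
    return [a + b * (math.log10(Mi) - logM0) for Mi in M]


# halo masses one decade apart, pivot mass fixed as in the example above
log_obs = powerlaw([1e12, 1e13], logM0=12.0, a=10.5, b=0.5)
```

Under this assumed form, a halo at the pivot mass has a log-observable equal to `a`, and each decade in halo mass shifts it by `b`.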

***Note:*** *The miscentring implementation has not yet been modified from `v1.x`, and therefore the `name` parameter is silent for now (but still must be defined and given a value). No matter the value given, miscentring will be modelled as in Viola et al. (2015). If anyone should require more flexibility please raise an issue and we will make this a more urgent update.*

## 4) Setup

There is an additional section, `setup`, which includes details on, well, the setup of the model. There are three mandatory parameters in this section:

```
[setup]
delta            200
delta_ref        mean
distances        comoving
```

and other parameters that, if omitted, are assigned their default values. For the time being, these are the $\ln k$ and $\log M$ binning schemes (the former is set to 10,000 bins in the range (-13,17), and the latter to 200 bins over (5,16)).

```
lnk_bins        10000
lnk_min         -13
lnk_max         17
logM_min        10
logM_max        16
logM_bins       200
```

In your own model, the `setup` section should include any parameter in the halo model that would *never* be a free parameter (not even a nuisance parameter); for instance, binning schemes or any precision-setting values. Note that `setup` is a dictionary in `kids_ggl` and therefore the order of its entries is irrelevant.
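The interplay between mandatory entries and defaults can be sketched with a plain dictionary merge. The `build_setup` function is hypothetical, and only the $\ln k$ defaults quoted above are included; it is not how `kids_ggl` necessarily implements this step.

```python
# defaults quoted in the text for the ln k binning scheme
DEFAULTS = {'lnk_bins': 10000, 'lnk_min': -13, 'lnk_max': 17}
MANDATORY = ('delta', 'delta_ref', 'distances')


def build_setup(user):
    """Fill in defaults for omitted entries (hypothetical helper)."""
    missing = [key for key in MANDATORY if key not in user]
    if missing:
        raise KeyError(f'missing mandatory setup entries: {missing}')
    setup = dict(DEFAULTS)
    # user-provided values take precedence; entry order is irrelevant
    setup.update(user)
    return setup


setup = build_setup({
    'distances': 'comoving',
    'delta': 200,
    'delta_ref': 'mean',
    'lnk_bins': 5000,
})
```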

## 5) Model output

In addition, the configuration file should include a section `output`, containing any outputs produced by the model in addition to the free parameters. These are given as a name and the data format to be used in the FITS file, in addition to the number of dimensions, if applicable. The typical FITS format would be `E`, corresponding to single-precision floating point. See the [astropy help](http://docs.astropy.org/en/stable/io/fits/usage/table.html#column-creation) for more details.

You will usually want to have each ESD component here at the very least. `halo.model` outputs the total ESD and the effective halo mass per bin. In our 2-bin example, we would write

```
[output]
esd            2,8E
Mavg           2,E
```

where the first line means to register two separate columns, each with elements corresponding to arrays of length 8 (the 8 R-measurements that make up the ESD profile); and the second line means to create two other columns each containing scalars corresponding to the effective halo masses (this is given by the output of `halo.model`, *not* by the name given in the first column above).

The first column corresponds to the names given to the columns in the output FITS file. When there is more than one "dimension" (there are 2 in this case), columns are labelled e.g., `esd1,esd2,...`.

There is one alternative to the example above:

```
[output]
esd       2,8E
Mavg        2E
```

which, consistent with the description above, will produce a column `Mavg` where each entry contains both masses, rather than making separate columns.
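The two ways of writing the format can be summarised with a small sketch. `output_columns` is a hypothetical helper that follows the naming convention described above (`esd1,esd2,...`); it is not the pipeline's own code.

```python
def output_columns(name, fmt):
    """Expand an [output] entry into (column name, FITS format) pairs.

    Hypothetical helper for illustration.
    """
    if ',' in fmt:
        # 'N,fmt' means N separate columns, each with format fmt
        ncols, colfmt = fmt.split(',')
        return [(f'{name}{i}', colfmt) for i in range(1, int(ncols) + 1)]
    # no comma: a single column whose entries follow fmt
    return [(name, fmt)]


esd = output_columns('esd', '2,8E')
mavg_split = output_columns('Mavg', '2,E')
mavg_joint = output_columns('Mavg', '2E')
```

With these inputs, `esd` describes two columns of format `8E`, `mavg_split` two scalar columns, and `mavg_joint` a single `Mavg` column of format `2E`.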

## 6) Sampler configuration

And finally the `sampler` section, which remains the same:

```
[sampler]
path_data            path/to/data
data                 shearcovariance_bin_*_A.txt     0,1,4
path_covariance      path/to/covariance
covariance           shearcovariance_matrix_A.txt    4,6
```

where `path_data` and `path_covariance` are optional. Note the (optional) use of a wildcard (`*`) in `data`: the pipeline will then select more than one file if available. The file names must be such that, when sorted alphanumerically, their order is consistent with the observable binning defined in the `[observables]` section. (This is properly taken care of by the KiDS-GGL ESD production pipeline.)

The third column in `data` specifies which columns from the data should be used: R-binning column, ESD column, and optionally the multiplicative bias correction column. Similarly, the third column in `covariance` specifies the covariance column and the multiplicative bias correction column. The covariance file should follow the format produced by the ESD production part of this same pipeline. In both cases, the multiplicative bias correction column is *optional* (omit if the correction has already been applied). The numbers used above correspond to those required if the data come from the KiDS-GGL ESD production pipeline.
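The requirement on file name ordering is easy to check before running the sampler. The sketch below assumes that the wildcard is expanded with `glob` and that the result is ordered with a plain string sort; whether the pipeline does exactly this is an assumption, and `find_data_files` is a hypothetical helper.

```python
import glob
import os
import tempfile


def find_data_files(path, pattern):
    """Expand a wildcard and sort the matches as plain strings."""
    return sorted(glob.glob(os.path.join(path, pattern)))


with tempfile.TemporaryDirectory() as tmp:
    # create empty placeholder files for bins 1, 2 and 10
    for i in (1, 2, 10):
        open(os.path.join(tmp, f'shearcovariance_bin_{i}_A.txt'), 'w').close()
    found = find_data_files(tmp, 'shearcovariance_bin_*_A.txt')
    names = [os.path.basename(f) for f in found]
```

Under a plain string sort, `shearcovariance_bin_10_A.txt` comes before `shearcovariance_bin_1_A.txt`, so with ten or more bins it is worth confirming that the sorted list matches the intended bin order (for instance by zero-padding the bin numbers in custom file names).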

The `sampler` section then continues with a few more settings:

```
exclude              11,12              # bins excluded from the analysis (count from 0)
sampler_output       output/model.fits  # output filename (must be .fits)
sampler              emcee              # MCMC sampler (fixed)
nwalkers             100                # number of walkers used by emcee
nsteps               2000               # number of steps per walker
nburn                0                  # size of burn-in sample
thin                 1                  # thinning (every n-th sample will be saved, but values !=1 not fully tested)
threads              3                  # number of threads (i.e., cores)
sampler_type         ensemble           # emcee sampler type (fixed)
update               20000              # frequency with which the output file is written
```

where only `exclude` is optional.

## Future improvements

 * Custom relations should be written in a user-supplied file rather than in the pipeline source code.
 * Adding a `module_path` optional entry to each section would easily allow custom files: the pipeline could simply add that path to `sys.path` and import from there.
     * There is, however, the pickling problem: we need to check whether the above would allow for multi-thread runs.
 * Might want to add a `model` section, in case the above is implemented, but more generally for any future changes.