# New sectioned configuration file help

This document explains how to write a configuration file with sections and subsections, which allows users to provide their own functions for, e.g., the mass-observable relation and its intrinsic scatter.

### 1) Model

Let's start with the model. This follows the same syntax as before:

```
model            halo.model
```

### 2) Model parameters

From here on, things change. We now define *sections*, each of which refers to a
component of the halo model. The sections must appear in the order expected by the model
for it to be set up properly, but each component may have a custom number of parameters
(and this is the whole idea behind the new configuration file structure):

```
[section1/subsection1]
parameter1
parameter2
[section1/subsection1/subsubsection1]
parameter1
parameter2
[section2/subsection1]
parameter1
parameter2
```

and so on. This is enough to explain what's going on. Empty lines and lines starting with `#` are ignored.
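To make the structure concrete, here is a minimal sketch of how such a file could be grouped into sections. It is not the pipeline's own reader: `parse_sections` is a hypothetical name, and stripping inline `#` comments (as used in the `[sampler]` block further down) is an assumption.

```python
def parse_sections(text):
    """Group configuration lines under their most recent [section] header.

    Hypothetical helper, not the pipeline's reader. Returns a dict (which
    keeps insertion order) mapping section names such as 'hod/centrals/mor'
    to lists of whitespace-split parameter lines. Lines before the first
    header, such as the 'model' line, are stored under the key ''.
    """
    sections = {}
    current = ''
    for raw in text.splitlines():
        # Assumption: anything after '#' is a comment, as in the [sampler] block
        line = raw.split('#', 1)[0].strip()
        if not line:
            continue  # empty and comment-only lines are ignored
        if line.startswith('[') and line.endswith(']'):
            current = line[1:-1].strip()
            sections.setdefault(current, [])
        else:
            sections.setdefault(current, []).append(line.split())
    return sections
```

Keeping the sections in file order matters here, because the order of sections is what ties the file to the coded model.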

#### 2.1) Cosmological parameters

So the first section of the actual configuration file defines cosmological parameters:

```
[cosmo]
sigma_8         fixed     0.8159
H0              fixed     67.74
omegam          fixed     0.3089
omegab_h2       fixed     0.02230
omegav          fixed     0.6911
n               fixed     0.9667
z               array     0.188
```

They need not all be fixed, but in this example they are. In any case, this section is always just a list of parameters, so nothing has changed significantly compared to the previous version.

Note, however, that we have introduced a new variable type: `array`. This replaces the old `hm_params` (with an "s" at the end) and refers to values that should always be treated as arrays; values defined this way are always fixed. This is the case for the redshift in the fiducial `halo.model`.
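The difference between `fixed` and `array` can be illustrated with a small hypothetical helper, `parse_parameter`, that interprets one whitespace-split parameter line. Returning a scalar for `fixed`, a 1-d NumPy array for `array`, and a list of floats for any prior are all assumptions made for illustration; accepting comma-separated `array` values is also an assumption.

```python
import numpy as np


def parse_parameter(tokens):
    """Interpret one parameter line, e.g. ['z', 'array', '0.188'].

    Hypothetical sketch: 'fixed' gives a float, 'array' always gives a
    1-d NumPy array (and is never sampled), and any other second column
    is treated as a prior name whose arguments are kept as floats.
    """
    name, kind, *args = tokens
    if kind == 'fixed':
        return name, kind, float(args[0])
    if kind == 'array':
        # Assumption: several values may be given, comma- or space-separated
        values = [float(v) for arg in args for v in arg.split(',')]
        return name, kind, np.array(values)
    return name, kind, [float(a) for a in args]
```

With this convention, downstream code can rely on `z` always having a length, even when only a single redshift is given.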

#### 2.2) HOD parameters

Now we move on to the HOD, and this is where the fun begins.

**2.2.1) HOD observables**

The first section is the `observables` section:

```
[hod/observables]
logmstar       10,12.5       11.315
```

The first column is an arbitrary (dummy) name for the observable in question; the second column gives the comma-separated bin limits used in the stacking, and the third column gives the comma-separated averages in each bin. In this case, we use a single bin in the range $10 \leq \log_{10} m_\star \leq 12.5$, and the average log-stellar mass in our sample happens to be $\langle \log_{10} m_\star \rangle=11.315$. Note that the average may be defined however the user wishes.
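As an illustration, the hypothetical helper below turns an `observables` line into bin edges and averages and checks that there is one average per bin. It assumes that contiguous bins are specified by their shared limits, so that $n$ limits define $n-1$ bins; that convention is not spelled out above.

```python
import numpy as np


def parse_observable(tokens):
    """Parse e.g. ['logmstar', '10,12.5', '11.315'] into edges and averages.

    Hypothetical sketch, assuming n comma-separated limits define n-1
    contiguous bins and that one average is given per bin.
    """
    name, limits, averages = tokens
    edges = np.array([float(x) for x in limits.split(',')])
    means = np.array([float(x) for x in averages.split(',')])
    if means.size != edges.size - 1:
        raise ValueError(f'{name}: {edges.size - 1} bins but {means.size} averages')
    if np.any(np.diff(edges) <= 0):
        raise ValueError(f'{name}: bin limits must be increasing')
    return name, edges, means
```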

**2.2.2) HOD ingredients**

The second section is the `ingredients` section:

```
[hod/ingredients]
centrals       True
pointmass      True
satellites     False
miscentring    False
```

The name of each ingredient is now fixed (as `ingredients` is a dictionary in the pipeline), and its value is either `True` (used) or `False` (not used).
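Since `ingredients` ends up as a dictionary, the conversion can be sketched as follows. `parse_ingredients` is a hypothetical name; rejecting anything other than the literal strings `True` and `False` is a design choice made here so that a typo fails loudly instead of silently switching a component off.

```python
def parse_ingredients(lines):
    """Turn [['centrals', 'True'], ...] into {'centrals': True, ...}.

    Hypothetical sketch: only the literal strings 'True' and 'False'
    are accepted as values.
    """
    flags = {'True': True, 'False': False}
    ingredients = {}
    for name, value in lines:
        if value not in flags:
            raise ValueError(f'ingredient {name!r} must be True or False, got {value!r}')
        ingredients[name] = flags[value]
    return ingredients
```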

**2.2.3) Occupation parameters**

Now for the real fun. The following sections can be modified seamlessly within the context of any given halo model (i.e., the user need not write their own full-fledged model to do this):

```
[hod/centrals]
fc             uniform        0.1     5
bias           fixed          1
[hod/centrals/mor]
name           powerlaw
logM_0         fixed          14.0
a              uniform        -1      1
b              student        1
[hod/centrals/scatter]
name           lognormal
sigma_c        jeffrey
[hod/centrals/miscentring]
name           fiducial
p_off          uniform        0       1
R_off          uniform        0       1.5
```

The whole idea behind this structure is that the HOD may be fully specified by the user, including for instance the complexity of the mass-observable scaling relation. Note that the HOD may also contain a model for satellites and potentially other ingredients, but a simple centrals-only model will serve our purpose here.

To keep things simple, the example above includes only the mandatory parameters for each prior type. Note also that, compared to `v1.x`, we have introduced new priors (as well as the `name` parameter). For more information see `priors.ipynb` in this same folder.
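One practical consequence of this structure is that the number of free parameters follows directly from the file. The sketch below lists them, assuming that `name` lines only select a function, that `fixed` and `array` parameters are never sampled, and that every other second column is a prior; `free_parameters` and `NON_PARAMETER_SECTIONS` are hypothetical names, and the dictionary mirrors the `[cosmo]` and centrals examples above.

```python
# Sections whose lines are not "<name> <prior> <arguments>" definitions
NON_PARAMETER_SECTIONS = {'', 'hod/observables', 'hod/ingredients',
                          'setup', 'output', 'sampler'}


def free_parameters(sections):
    """Return 'section/parameter' labels for every sampled parameter.

    Hypothetical sketch: 'name' lines only select a function, and 'fixed'
    or 'array' parameters are never sampled; any other second column is
    assumed to be a prior (uniform, student, jeffrey, ...).
    """
    free = []
    for section, lines in sections.items():
        if section in NON_PARAMETER_SECTIONS:
            continue
        for tokens in lines:
            if tokens[0] == 'name' or tokens[1] in ('fixed', 'array'):
                continue
            free.append(f'{section}/{tokens[0]}')
    return free


# Whitespace-split lines from the examples in this document
sections = {
    'cosmo': [['sigma_8', 'fixed', '0.8159'], ['z', 'array', '0.188']],
    'hod/centrals': [['fc', 'uniform', '0.1', '5'], ['bias', 'fixed', '1']],
    'hod/centrals/mor': [['name', 'powerlaw'], ['logM_0', 'fixed', '14.0'],
                         ['a', 'uniform', '-1', '1'], ['b', 'student', '1']],
    'hod/centrals/scatter': [['name', 'lognormal'], ['sigma_c', 'jeffrey']],
    'hod/centrals/miscentring': [['name', 'fiducial'],
                                 ['p_off', 'uniform', '0', '1'],
                                 ['R_off', 'uniform', '0', '1.5']],
}
free = free_parameters(sections)
```

For the centrals-only example this gives six free parameters: `fc`, `a`, `b`, `sigma_c`, `p_off` and `R_off`.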

### 3) Setup

We would usually want to have an additional section, `[setup]`, which would include things like the `k`-binning scheme, e.g.,

```
[setup]
lnk_min          -13
lnk_max          13
kbins            1000
```

that is, essentially the `setup` section should include any parameter in the halo model that would *never* be a free parameter (not even a nuisance parameter); for instance, binning schemes or any precision-setting values.
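As an example of such a precision setting, the `[setup]` values above could define the wavenumber grid. The sketch assumes that `lnk_min` and `lnk_max` are natural logarithms of $k$ and that the grid is uniform in $\ln k$ with `kbins` points; `k_grid` is a hypothetical name and the pipeline's actual construction may differ.

```python
import numpy as np


def k_grid(lnk_min, lnk_max, kbins):
    """Return kbins wavenumbers evenly spaced in ln k between the limits.

    Assumption: the [setup] limits are natural logarithms of k and the
    grid is uniform in ln k (hypothetical helper, not the pipeline's).
    """
    lnk = np.linspace(lnk_min, lnk_max, kbins)
    return np.exp(lnk)


k = k_grid(-13, 13, 1000)
```

Under these assumptions the spacing is $\Delta \ln k = 26/999$, which is the kind of value that should never be sampled.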

### 4) Model output

In addition, the configuration file should include an `output` section, listing any quantities produced by the model besides the free parameters. You will usually want to include each ESD component here at the very least. In our case, we'll just output the ESD and the average mass:

```
[output]
esd            8E
Mavg           E
```
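The codes in the second column look like FITS binary-table column formats, in which `E` denotes a single-precision float and a leading integer is a repeat count (so `8E` would be a vector of eight floats). Assuming that is the convention used here, a small hypothetical helper, `parse_output_format`, can translate them into NumPy dtype names:

```python
import re

# A subset of the FITS binary-table type codes, mapped to NumPy dtype names
FITS_TYPES = {'E': 'float32', 'D': 'float64', 'J': 'int32', 'K': 'int64'}


def parse_output_format(fmt):
    """Split a code such as '8E' into (repeat count, NumPy dtype name).

    Assumes the [output] formats follow the FITS column-format convention.
    """
    match = re.fullmatch(r'(\d*)([A-Z])', fmt)
    if match is None or match.group(2) not in FITS_TYPES:
        raise ValueError(f'unsupported format {fmt!r}')
    repeat = int(match.group(1)) if match.group(1) else 1
    return repeat, FITS_TYPES[match.group(2)]
```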

### 5) Sampler parameters

And finally, the sampler section, which remains the same as in previous versions:

```
[sampler]
path_data            path/to/data
data                 shearcovariance_bin_*_A.txt     0,1,4
path_covariance      path/to/covariance
covariance           shearcovariance_matrix_A.txt    4,6
```

where `path_data` and `path_covariance` are optional. Note the (optional) use of a wildcard (`*`) in `data`: the pipeline will then select every matching file. The file names must be such that, when sorted alphanumerically, they are ordered by increasing observable bin (this is taken care of by the KiDS-GGL ESD production pipeline).

The third column in `data` specifies which columns of the data files should be used: the R-binning column, the ESD column, and the multiplicative bias correction column. Similarly, the third column in `covariance` specifies the covariance column and the multiplicative bias correction column. The covariance file should follow the format produced by the ESD production part of this same pipeline. In both cases, the multiplicative bias correction column is *optional*: it can be omitted if the correction has already been applied. The numbers used above are those required for data produced by the KiDS-GGL ESD production pipeline.
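The wildcard and column selection can be sketched with `glob` and `np.loadtxt`. This is an illustration rather than the pipeline's loader: `load_esd` is a hypothetical name, and the column indices are assumed to be 0-based positions as passed to `usecols`.

```python
import glob
import os

import numpy as np


def load_esd(path_data, pattern, columns):
    """Load every file matching pattern, returning one array per bin.

    Hypothetical sketch: columns are 0-based indices such as (0, 1, 4)
    for R, ESD and the bias-correction column. Files are read in plain
    sorted order, so their names must sort by observable bin.
    """
    files = sorted(glob.glob(os.path.join(path_data, pattern)))
    if not files:
        raise FileNotFoundError(f'no files match {pattern!r} in {path_data!r}')
    # unpack=True returns one row per selected column
    return [np.loadtxt(f, usecols=columns, unpack=True) for f in files]
```

Plain string sorting places `bin_10` before `bin_2`, which is why the naming convention matters once there are ten or more bins; zero-padded bin indices avoid the problem.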

The `sampler` section then continues with a few more settings:

```
exclude              11,12              # bins excluded from the analysis (count from 0)
sampler_output       output/model.fits  # output filename (must be .fits)
sampler              emcee              # MCMC sampler (fixed)
nwalkers             100                # number of walkers used by emcee
nsteps               2000               # number of steps per walker
nburn                0                  # size of burn-in sample
thin                 1                  # thinning (every n-th sample will be saved, but values !=1 not fully tested)
threads              3                  # number of threads (i.e., cores)
sampler_type         ensemble           # emcee sampler type (fixed)
update               20000              # frequency with which the output file is written
```

where only `exclude` is optional.
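Two of these settings translate into simple bookkeeping, sketched below with hypothetical helpers. `kept_bins` assumes that `exclude` refers to radial bins, counted from 0 as stated in the comment; `stored_samples` assumes that burn-in steps are discarded for each walker and that every `thin`-th remaining step is kept, which should be checked against the pipeline before relying on it.

```python
import numpy as np


def kept_bins(n_bins, exclude=''):
    """Boolean mask of the bins kept after applying 'exclude' (0-based)."""
    mask = np.ones(n_bins, dtype=bool)
    if exclude:
        mask[[int(i) for i in exclude.split(',')]] = False
    return mask


def stored_samples(nwalkers, nsteps, nburn=0, thin=1):
    """Rough number of samples written to the output file.

    Assumption: nburn steps are discarded per walker and every thin-th
    remaining step is kept.
    """
    return nwalkers * ((nsteps - nburn) // thin)
```

With the values above (100 walkers, 2000 steps, no burn-in, no thinning), this estimate gives 200,000 samples, which is worth keeping in mind when choosing the `update` frequency.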

## Coding up your own model

With all these sections and parameters, the pipeline can interpret any model passed to it, no matter the number of parameters or the ordering of the ingredients, provided that the configuration file is consistent with the model structure.

For instance, we might define a mass-observable relation that is a power law in halo mass with an additional term linear in $(1+z)/(1+z_0)$,
$$
\log_{10} m_\star = A + B\log_{10}\left(\frac{M_\mathrm{h}}{M_0}\right) + C\left(\frac{1+z}{1+z_0}\right)\,,
$$
which, coded up, looks like:

```python
import numpy as np

def powerlaw_mz(M, z, A, B, C, logM0, z0):
    # Returns m_star itself, i.e. 10 to the power of the relation above
    return 10**(A + B*(np.log10(M)-logM0) + C*((1+z)/(1+z0)))
```

Note that these custom functions must take the halo mass as their first argument; all other arguments must be
defined in the configuration file, including (in this case) the redshift. The central MOR section of the configuration file would therefore look like

```
[hod/centrals/mor]
name           powerlaw_mz
z              fixed          0.188
A              uniform        10      16    12
B              uniform        0       5     1
C              uniform        -1      1     0
logM0          fixed          14
z0             fixed          0.1
```
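The convention that the halo mass comes first and every other argument is read from the configuration section can be sketched with `inspect.signature`. `call_relation` is a hypothetical helper, not part of the pipeline, and the values of `A`, `B` and `C` below are purely illustrative.

```python
import inspect

import numpy as np


def powerlaw_mz(M, z, A, B, C, logM0, z0):
    # Same relation as above: returns m_star (not its logarithm)
    return 10**(A + B*(np.log10(M)-logM0) + C*((1+z)/(1+z0)))


def call_relation(func, M, params):
    """Call func(M, ...) taking every argument after M from params by name.

    Hypothetical sketch: fails early if the configuration section does
    not define one of the function's arguments.
    """
    names = list(inspect.signature(func).parameters)[1:]
    missing = [n for n in names if n not in params]
    if missing:
        raise KeyError(f'{func.__name__}: missing parameters {missing}')
    return func(M, **{n: params[n] for n in names})


# Parameter names from the [hod/centrals/mor] example, illustrative values
params = {'z': 0.188, 'A': 12.0, 'B': 1.0, 'C': 0.0, 'logM0': 14.0, 'z0': 0.1}
mstar = call_relation(powerlaw_mz, np.array([1e13, 1e14]), params)
```

Matching by argument name rather than by position means a missing or misspelled entry in the configuration file is reported before any sampling starts.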

For the time being, you must include your model in the file `kids_ggl_pipeline/halomodel/hod/relations.py`, and custom functions for the scatter about this relation must be included in the file `kids_ggl_pipeline/halomodel/hod/scatter.py`. This has the undesirable effect that these files, common to all users, may become cluttered with trial-and-error attempts by different users. We plan to implement the ability to read user-provided files in the future.

*The only condition is that the order of sections **must follow the order defined in the coded model**.*
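A simple consistency check for this condition is sketched below. `follows_order` is a hypothetical helper, and `model_order` stands for whatever section order the coded model expects; sections the model does not know about are simply ignored here.

```python
def follows_order(config_sections, model_order):
    """Check that sections appear in the same relative order as in the model.

    Hypothetical sketch: model_order is the section order the coded model
    expects; unknown sections are ignored.
    """
    positions = [model_order.index(s) for s in config_sections if s in model_order]
    return positions == sorted(positions)


# Hypothetical expected order for the centrals-only example
model_order = ['cosmo', 'hod/observables', 'hod/ingredients', 'hod/centrals',
               'hod/centrals/mor', 'hod/centrals/scatter',
               'hod/centrals/miscentring']
```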

## Future improvements

 * Choosing between mass-concentration relations should be trivial under this scheme: we should add a subsection `[hod/centrals/concentration]`, and possibly an equivalent one for satellites.
 * Custom relations should be written in a user-supplied file rather than in the pipeline source code.
 * Adding a `module_path` optional entry to each section would easily allow custom files: the pipeline could simply add that path to `sys.path` and import from there.
     * However, there is the pickling problem: we need to check whether the above would allow multi-threaded runs.
 * We might want to add a `model` section, both in case the above is implemented and, more generally, to accommodate future changes.
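A possible implementation of the proposed `module_path` entry is sketched below. Instead of appending to `sys.path`, it uses `importlib.util` to load the file directly (a swapped-in alternative, not the approach described above), and registers the module in `sys.modules` so that `pickle` can find the function by name in the current process. `load_from_path` and the `user_` module prefix are hypothetical. Worker processes started with the `spawn` method would still have to repeat the import, so this does not settle the pickling question by itself.

```python
import importlib.util
import os
import sys


def load_from_path(module_path, func_name):
    """Import func_name from a user-supplied Python file.

    Hypothetical sketch of the proposed 'module_path' entry.
    """
    stem = os.path.splitext(os.path.basename(module_path))[0]
    name = f'user_{stem}'  # hypothetical prefix to avoid clashing with pipeline modules
    spec = importlib.util.spec_from_file_location(name, module_path)
    if spec is None or spec.loader is None:
        raise ImportError(f'cannot load {module_path!r}')
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # lets pickle resolve functions by module name
    spec.loader.exec_module(module)
    return getattr(module, func_name)
```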