## Processing Configuration

The processing config ("config") and the kernel dataset are the key inputs to the processing pipeline. The config is based on the `mt_metadata.base.Base` class, which means it is a container with a JSON or dictionary representation.

The purpose of the config is to encapsulate all the parameters required for processing. There are many parameters, and effort has gone into selecting "reasonable" default values so that users do not need to set every one of them.

The ProcessingConfig is expected to evolve with aurora as new functionality becomes available. This is one reason why such a generic data structure was selected.
In this tutorial, we will use the synthetic dataset example to show some of the features of the config object.

Hopefully, it will be fairly easy to add other parameters to the config, such as:
- coherence sorting
- polarization sorting
- ARMA prewhitening 
- Other tools from the community



There are two main ways one would normally build the object:
1. Use the ConfigCreator class
2. Edit existing config.json files

One can also initialize a Processing object directly, which is what is done inside ConfigCreator:



In [1]:
from aurora.config import Processing

In [2]:
p = Processing()

In [3]:
p

{
    "processing": {
        "channel_nomenclature.ex": "ex",
        "channel_nomenclature.ey": "ey",
        "channel_nomenclature.hx": "hx",
        "channel_nomenclature.hy": "hy",
        "channel_nomenclature.hz": "hz",
        "decimations": [],
        "id": null,
        "stations.local.id": null,
        "stations.local.mth5_path": null,
        "stations.local.remote": false,
        "stations.local.runs": [],
        "stations.remote": []
    }
}
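
The printed representation flattens nested attributes into dotted keys such as `stations.local.id`. As a minimal sketch in plain Python (not part of aurora; the helper name `expand_dotted_keys` is hypothetical), the following shows how a flat dictionary of this kind maps onto a nested one:

```python
import json

# Subset of the flat representation copied from the output above.
flat = {
    "channel_nomenclature.ex": "ex",
    "decimations": [],
    "id": None,
    "stations.local.id": None,
    "stations.local.remote": False,
    "stations.local.runs": [],
    "stations.remote": [],
}


def expand_dotted_keys(flat_dict):
    """Turn {"a.b.c": 1} into {"a": {"b": {"c": 1}}} (hypothetical helper)."""
    nested = {}
    for dotted_key, value in flat_dict.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        for name in parents:
            # Create intermediate dictionaries on demand.
            node = node.setdefault(name, {})
        node[leaf] = value
    return nested


nested = expand_dotted_keys(flat)
print(json.dumps(nested, indent=4))
```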

### Using ConfigCreator

In [4]:
from aurora.config import BANDS_DEFAULT_FILE
from aurora.config.config_creator import ConfigCreator


The ConfigCreator does not need any arguments to initialize. It uses a KernelDataset to generate a processing object with station and run information, and applies default settings to the processing parameters.

In [5]:
cc = ConfigCreator()

To generate a processing config with default arguments, the ConfigCreator must be provided with a KernelDataset.

## Example of making a KernelDataset from an mth5

In [6]:
from aurora.pipelines.run_summary import RunSummary
from aurora.test_utils.synthetic.paths import DATA_PATH
from aurora.transfer_function.kernel_dataset import KernelDataset

2022-09-09 03:39:01,054 [line 135] mth5.setup_logger - INFO: Logging file can be found /home/kkappler/software/irismt/mth5/logs/mth5_debug.log


The following file will already exist if the synthetic tests have been run locally. If it doesn't, uncomment and run the cell below.

In [7]:
#from aurora.test_utils.synthetic.make_mth5_from_asc import create_test12rr_h5
#create_test12rr_h5()


In [8]:
mth5_path = DATA_PATH.joinpath("test12rr.h5")

In [9]:
run_summary = RunSummary()
run_summary.from_mth5s([mth5_path,])
run_summary.df

2022-09-09 03:39:02,793 [line 739] mth5.mth5.MTH5.close_mth5 - INFO: Flushing and closing /home/kkappler/software/irismt/aurora/tests/synthetic/data/test12rr.h5


| | survey | station_id | run_id | start | end | sample_rate | input_channels | output_channels | channel_scale_factors | mth5_path |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | none | test1 | 1 | 1980-01-01 00:00:00+00:00 | 1980-01-01 11:06:39+00:00 | 1.0 | [hx, hy] | [ex, ey, hz] | {'ex': 1.0, 'ey': 1.0, 'hx': 1.0, 'hy': 1.0, '... | /home/kkappler/software/irismt/aurora/tests/sy... |
| 1 | none | test2 | 1 | 1980-01-01 00:00:00+00:00 | 1980-01-01 11:06:39+00:00 | 1.0 | [hx, hy] | [ex, ey, hz] | {'ex': 1.0, 'ey': 1.0, 'hx': 1.0, 'hy': 1.0, '... | /home/kkappler/software/irismt/aurora/tests/sy... |


In [10]:
kernel_dataset = KernelDataset()
kernel_dataset.from_run_summary(run_summary, "test1", "test2")
kernel_dataset.df

| | survey | station_id | run_id | start | end | sample_rate | input_channels | output_channels | channel_scale_factors | mth5_path | remote | duration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | none | test1 | 1 | 1980-01-01 00:00:00+00:00 | 1980-01-01 11:06:39+00:00 | 1.0 | [hx, hy] | [ex, ey, hz] | {'ex': 1.0, 'ey': 1.0, 'hx': 1.0, 'hy': 1.0, '... | /home/kkappler/software/irismt/aurora/tests/sy... | False | 39999.0 |
| 1 | none | test2 | 1 | 1980-01-01 00:00:00+00:00 | 1980-01-01 11:06:39+00:00 | 1.0 | [hx, hy] | [ex, ey, hz] | {'ex': 1.0, 'ey': 1.0, 'hx': 1.0, 'hy': 1.0, '... | /home/kkappler/software/irismt/aurora/tests/sy... | True | 39999.0 |
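
The `duration` column appears to be the run length in seconds. A quick check with the standard library, using the start and end times from the output above, reproduces the value 39999.0:

```python
from datetime import datetime

# Start and end times copied from the kernel_dataset.df output above.
start = datetime.fromisoformat("1980-01-01 00:00:00+00:00")
end = datetime.fromisoformat("1980-01-01 11:06:39+00:00")

# Elapsed time in seconds between the first and last timestamps.
duration = (end - start).total_seconds()
print(duration)  # 39999.0
```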


In [11]:
kernel_dataset.mini_summary

| | survey | station_id | run_id | start | end |
|---|---|---|---|---|---|
| 0 | none | test1 | 1 | 1980-01-01 00:00:00+00:00 | 1980-01-01 11:06:39+00:00 |
| 1 | none | test2 | 1 | 1980-01-01 00:00:00+00:00 | 1980-01-01 11:06:39+00:00 |


### Create Config from KernelDataset

In [12]:
config = cc.create_from_kernel_dataset(kernel_dataset)

In [13]:
config

{
    "processing": {
        "channel_nomenclature.ex": "ex",
        "channel_nomenclature.ey": "ey",
        "channel_nomenclature.hx": "hx",
        "channel_nomenclature.hy": "hy",
        "channel_nomenclature.hz": "hz",
        "decimations": [
            {
                "decimation_level": {
                    "anti_alias_filter": "default",
                    "bands": [
                        {
                            "band": {
                                "decimation_level": 0,
                                "frequency_max": 0,
                                "frequency_min": 0,
                                "index_max": 30,
                                "index_min": 25
                            }
                        },
                        {
                            "band": {
                                "decimation_level": 0,
                                "frequency_max": 0,
                                "frequency_min": 0,
             

You can see the entire config by executing the cell below, or you can paste the JSON into a JSON editor (e.g. https://jsoneditoronline.org) and browse it hierarchically.

![CONFIG_0](figures/config_0.png)

You can also transform the processing object to a JSON string:

In [14]:
json_string = config.to_json()

In [15]:
json_string

'{\n    "processing": {\n        "channel_nomenclature.ex": "ex",\n        "channel_nomenclature.ey": "ey",\n        "channel_nomenclature.hx": "hx",\n        "channel_nomenclature.hy": "hy",\n        "channel_nomenclature.hz": "hz",\n        "decimations": [\n            {\n                "decimation_level": {\n                    "anti_alias_filter": "default",\n                    "bands": [\n                        {\n                            "band": {\n                                "decimation_level": 0,\n                                "frequency_max": 0,\n                                "frequency_min": 0,\n                                "index_max": 30,\n                                "index_min": 25\n                            }\n                        },\n                        {\n                            "band": {\n                                "decimation_level": 0,\n                                "frequency_max": 0,\n                                "freque

The string can then be saved to a file:

In [16]:
with open("config.json", "w") as fid:
    fid.write(json_string)
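
A saved config can also be edited as plain JSON, which is the second approach listed at the start of this tutorial. The sketch below uses only the standard library and a small stand-in dictionary rather than a real aurora config. The file name `config_example.json` and the value `"my_run"` are assumptions for illustration:

```python
import json
import tempfile
from pathlib import Path

# Stand-in for a saved config; only keys shown in this tutorial are used.
example_json = json.dumps(
    {"processing": {"id": None, "decimations": [], "stations.remote": []}},
    indent=4,
)

with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "config_example.json"  # hypothetical name
    config_path.write_text(example_json)

    # Load the file, edit one entry, and write it back.
    config_dict = json.loads(config_path.read_text())
    config_dict["processing"]["id"] = "my_run"  # illustrative value
    config_path.write_text(json.dumps(config_dict, indent=4))

    # Read the edited file again to confirm the change was stored.
    reloaded = json.loads(config_path.read_text())

print(reloaded["processing"]["id"])
```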



ConfigCreator can also save the JSON. By default it places the file in a folder named "config", with a self-generated filename.

In [17]:
cc.to_json(config)  # optionally: cc.to_json(config, path="config.json")

### Default Parameters

The default config parameters are listed below:

In [18]:
input_channels = ["hx", "hy"]
output_channels = ["hz", "ex", "ey"]
estimator = None
emtf_band_file = BANDS_DEFAULT_FILE

# What can the user change with the Config?

- Channel Nomenclature
    - For example, ex and ey may be called e1 and e2 in the mth5
        - This is handled by passing a channel_nomenclature keyword argument.
        - Examples of systems that might need this are LEMI and Phoenix
- Windowing Parameters
    - Window shape (family)
    - Window length
    - Sliding window overlap
    - Clock-Zero (optional)
- Choice of Stations
    - This is currently done via KernelDataset
- Scale Factors for Individual Channels
    - Allows simple, frequency-independent gain corrections to be applied
- Frequency Bands 
    - Group the Fourier coefficients into bands to be processed together and averaged for a TF estimate
    - currently via an emtf-style band_setup file
- Number of Decimation Levels
    - currently via an emtf-style band_setup file
- Regression Estimator Engine
    - Currently the only choices are RME (regression M-estimate) and RME_RR (remote reference regression M-estimate)
- Regression Parameters
    - Maximum number of iterations
    - Maximum number of redescending iterations
    - Minimum number of frequency cycles
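
In the config printed earlier, each band carries `index_min` and `index_max`, while `frequency_min` and `frequency_max` are still 0. For a standard DFT, harmonic index `k` corresponds to the frequency `k * sample_rate / window_length`. The sketch below applies that relation to the first band. The window length of 128 samples is an assumption chosen for illustration, not a value read from the config, and `fc_index_to_frequency` is a hypothetical helper:

```python
def fc_index_to_frequency(index, sample_rate, window_length):
    """Frequency in Hz of DFT harmonic `index` for a window of `window_length` samples."""
    return index * sample_rate / window_length


sample_rate = 1.0    # Hz, from the run summary above
window_length = 128  # samples; assumed for this example

# index_min and index_max of the first band in the printed config
index_min, index_max = 25, 30

frequency_min = fc_index_to_frequency(index_min, sample_rate, window_length)
frequency_max = fc_index_to_frequency(index_max, sample_rate, window_length)
print(frequency_min, frequency_max)  # 0.1953125 0.234375
```

These are the frequencies of the first and last harmonics in the band; how aurora defines the band edges themselves may differ.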


### Examples of changing these parameters

Parameters either live at the global level or at the decimation level.  Global parameters include:

- Station (and Run) Information
- Source mth5 archive path
- channel nomenclature


Most other parameters are defined for each _Decimation_ of the data.  

As it currently stands, these are declared by looping over decimation levels. A method to set a parameter for all decimation levels at once should be added.
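
As a sketch of that loop, the example below works on a plain dictionary shaped like the JSON output shown earlier, not on the aurora objects themselves. The helper `set_for_all_decimations` is hypothetical, `anti_alias_filter` is used only because it appears in the printed config, and the value `"custom"` is illustrative:

```python
# Plain-dict stand-in with the same nesting as the printed config.
config_dict = {
    "processing": {
        "decimations": [
            {"decimation_level": {"anti_alias_filter": "default", "bands": []}},
            {"decimation_level": {"anti_alias_filter": "default", "bands": []}},
        ]
    }
}


def set_for_all_decimations(config, key, value):
    """Set `key` to `value` in every decimation level (hypothetical helper)."""
    for entry in config["processing"]["decimations"]:
        entry["decimation_level"][key] = value


set_for_all_decimations(config_dict, "anti_alias_filter", "custom")

# Confirm that every level now carries the new value.
for i, entry in enumerate(config_dict["processing"]["decimations"]):
    print(i, entry["decimation_level"]["anti_alias_filter"])
```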





## EMTF Band Setup File

The frequency bands will eventually be set up in a variety of ways, but currently aurora supports specifying bands only by explicit construction or via EMTF "band setup" files.

In [19]:
BANDS_DEFAULT_FILE

PosixPath('/home/kkappler/software/irismt/aurora/aurora/config/emtf_band_setup/bs_test.cfg')

Here is the content of a typical EMTF band setup file:

These legacy files have the following structure.
The first line, 25, indicates the number of bands, and 25 lines follow, one per frequency band.

Each line comprises three numbers:

`decimation_level, first_FC_index, last_FC_index`

where "FC" stands for Fourier coefficient
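
A minimal sketch of a reader for this layout follows. It assumes the numbers on each line are separated by whitespace. The three-band sample text is made up for illustration and is not the content of `bs_test.cfg`, and `parse_band_setup` is a hypothetical helper, not an aurora function:

```python
def parse_band_setup(text):
    """Parse band-setup text into (decimation_level, first_fc, last_fc) tuples."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    n_bands = int(lines[0])  # first line: number of bands
    bands = []
    for line in lines[1 : n_bands + 1]:
        decimation_level, first_fc, last_fc = (int(x) for x in line.split())
        bands.append((decimation_level, first_fc, last_fc))
    if len(bands) != n_bands:
        raise ValueError(f"expected {n_bands} bands, found {len(bands)}")
    return bands


# Made-up sample: 3 bands instead of the 25 in a typical file.
sample = """3
1 25 30
1 20 24
2 10 14
"""

for band in parse_band_setup(sample):
    print(band)
```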

The decimation factor applied at each level was controlled in EMTF by a separate file, called `decset.cfg`. In the old EMTF codes, this file controlled the window length, the overlap, the decimation factor, and the corners of the anti-alias filter applied before downsampling.

The decimation factor in EMTF was almost always 4, and the default behaviour of the ConfigCreator is to assume a decimation factor of 4 at each level, but this can be changed manually. 
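
With a fixed decimation factor, the sample rate at each level follows directly. The sketch below assumes that level 0 is the undecimated data (consistent with `"decimation_level": 0` in the config above), a factor of 4, and the 1 Hz synthetic data. The choice of four levels is illustrative:

```python
decimation_factor = 4
base_sample_rate = 1.0  # Hz, sample rate of the synthetic runs
n_levels = 4            # illustrative; the real number comes from the band setup

# Each level divides the previous sample rate by the decimation factor.
sample_rates = [
    base_sample_rate / decimation_factor**level for level in range(n_levels)
]
print(sample_rates)  # [1.0, 0.25, 0.0625, 0.015625]
```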