# Yamlfile in awkwardNN

- AwkwardNN processes data from a rootfile based on a yaml file.

- The yaml file specifies:
    - which fields from the rootfile are used for training,
    - how fields from different nested branches are connected, and
    - the types of network blocks for each branch.


In [1]:
from awkwardNN.nets.awkwardNN import awkwardNN_fromYaml


## Creating Yaml Files from Root files

- You can create a yaml file from a root file with the functions below.

- As mentioned in the rootfile notebook, a field in an event can have one of
four possible data structure interpretations: fixed, jagged, object, or nested.
The yaml file partitions fields based on their interpretation.
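
As a rough illustration of what "partitioning by interpretation" means, the sketch below groups field names into the four categories using a plain dictionary. The field names and their interpretations are hypothetical, and `partition_fields` is an illustrative helper, not part of awkwardNN; the layout of the actual yaml file may differ.

```python
# Hypothetical field names and interpretations, for illustration only;
# they are not read from a real rootfile.
field_interpretations = {
    "event_weight": "fixed",
    "jet_pt": "jagged",
    "jet_eta": "jagged",
    "run_number": "fixed",
}

# The four interpretations named in this notebook.
INTERPRETATIONS = ("fixed", "jagged", "object", "nested")


def partition_fields(interpretations):
    """Group field names by their data structure interpretation."""
    groups = {kind: [] for kind in INTERPRETATIONS}
    for name, kind in interpretations.items():
        if kind not in groups:
            raise ValueError(f"unknown interpretation {kind!r} for field {name!r}")
        groups[kind].append(name)
    return groups


groups = partition_fields(field_interpretations)
for kind in INTERPRETATIONS:
    print(kind, groups[kind])
```

Interpretations with no fields simply end up with an empty list.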


In [2]:
root_filename = "../data/test_qcd_1000.root"
yaml_filename = "../test_qcd_1000_default.yaml"

# can get yaml dict from rootfile
yaml_dict = awkwardNN_fromYaml.get_yaml_dict_from_rootfile(root_filename)

# can save yaml dict to a yaml file
awkwardNN_fromYaml.save_yaml_dict(yaml_dict, yaml_filename)

# can create a yaml file directly from a rootfile
awkwardNN_fromYaml.create_yaml_file_from_rootfile(root_filename, yaml_filename)



## Specifying Yaml files from functions

- You can also customize the yaml file with the same functions.

- Use keyword arguments to specify:
    - embed_dim
    - mode
    - fixed_mode
    - jagged_mode
    - object_mode
    - nested_mode
    - hidden_sizes
    - nonlinearity
    - phi_sizes
    - rho_sizes

In [3]:
kwargs1 = {'embed_dim': 64, 'fixed_mode': 'deepset', 'jagged_mode': 'gru',
           'phi_sizes': '(64, 40)', 'rho_sizes': '(50, 25)'}
kwargs2 = {'embed_dim': 50, 'jagged_mode': 'lstm', 'object_mode': 'lstm',
           'hidden_sizes': '(100, 40)'}
kwargs3 = {'mode': 'deepset', 'nonlinearity': 'tanh', 'jagged_mode': 'lstm'}

yaml_filename1 = '../test_qcd_1000_default1.yaml'
yaml_filename2 = '../test_qcd_1000_default2.yaml'
yaml_filename3 = '../test_qcd_1000_default3.yaml'

awkwardNN_fromYaml.create_yaml_file_from_rootfile(root_filename, yaml_filename1, **kwargs1)
awkwardNN_fromYaml.create_yaml_file_from_rootfile(root_filename, yaml_filename2, **kwargs2)
awkwardNN_fromYaml.create_yaml_file_from_rootfile(root_filename, yaml_filename3, **kwargs3)
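
Because the keyword arguments are plain dictionaries, a misspelled key or mode name is easy to miss. The sketch below checks a kwargs dictionary against the keys and allowed values listed in this notebook before it is passed on. `check_kwargs` is a hypothetical helper, not part of awkwardNN, and it is only an assumption that the library accepts exactly these values.

```python
# Allowed values as listed in this notebook's key descriptions.
RNN_MODES = {"vanilla_rnn", "gru", "lstm", "deepset"}
ALLOWED_VALUES = {
    "mode": {"mlp", "deepset"},
    "fixed_mode": {"mlp", "deepset"},
    "jagged_mode": RNN_MODES,
    "object_mode": RNN_MODES,
    "nested_mode": RNN_MODES | {"mlp"},
    "nonlinearity": {"relu", "tanh"},
}
# Keys whose values are size strings; their contents are not checked here.
SIZE_KEYS = {"hidden_sizes", "phi_sizes", "rho_sizes"}


def check_kwargs(kwargs):
    """Return a list of problems found in kwargs (empty if none)."""
    problems = []
    for key, value in kwargs.items():
        if key in ALLOWED_VALUES:
            if value not in ALLOWED_VALUES[key]:
                allowed = sorted(ALLOWED_VALUES[key])
                problems.append(f"{key}={value!r} is not one of {allowed}")
        elif key == "embed_dim":
            # bool is a subclass of int, so exclude it explicitly.
            if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
                problems.append(f"embed_dim={value!r} must be a positive int")
        elif key not in SIZE_KEYS:
            problems.append(f"unknown key {key!r}")
    return problems


example = {"embed_dim": 64, "fixed_mode": "deepset", "jagged_mode": "gru",
           "phi_sizes": "(64, 40)", "rho_sizes": "(50, 25)"}
print(check_kwargs(example))
print(check_kwargs({"jagged_mode": "lstmm", "embd_dim": 50}))
```

Returning a list of messages rather than raising on the first error makes it possible to report every problem in one pass.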


## Possible keys in Yaml files

You can also modify the yaml file directly using the following keys:

- embed_dim - `int` - must be >0:
    - the output size for the network; also the input size for
    the next neural network above it
- mode - `str` - [`mlp`, `deepset`]:
    - the type of network to be used for each AwkwardNN network block
- fixed_mode - `str` - [`mlp`, `deepset`]:
    - the type of network to be used for each fixed network block
- jagged_mode - `str` - [`vanilla_rnn`, `gru`, `lstm`, `deepset`]:
    - the type of network to be used for each jagged network block
- object_mode - `str` - [`vanilla_rnn`, `gru`, `lstm`, `deepset`]:
    - the type of network to be used for each object network block
- nested_mode - `str` - [`vanilla_rnn`, `lstm`, `gru`, `deepset`, `mlp`]:
    - the type of network to be used for each nested network block
- hidden_sizes - `str`:
    - a string of comma-separated, positive integers, surrounded by parentheses
    - e.g. "(30, 56, 32, 50)" or "(100, 100)"
    - the number of nodes in each layer of an mlp
- nonlinearity - `str` - [`relu`, `tanh`]:
    - the nonlinear function to use for `mlp`, `deepset`, & `vanilla_rnn`
- phi_sizes - `str`:
    - a string of comma-separated, positive integers, surrounded by parentheses
    - the number of nodes in each layer of the first network in a
    deepset network
- rho_sizes - `str`:
    - a string of comma-separated, positive integers, surrounded by parentheses
    - the number of nodes in each layer of the second network in a
    deepset network
- fields - `str`:
    - list of fields used to train a network block
- use - `bool`:
    - this key is not present in the yaml file by default, but it can be
    added under any specific network mode - fixed, jagged, object, nested -
    to indicate whether that mode should be used or not.
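
Since `hidden_sizes`, `phi_sizes`, and `rho_sizes` are stored as strings, it can be useful to confirm that a string has the expected form before writing it into a yaml file. The sketch below uses the standard library's `ast.literal_eval` to turn such a string into a tuple of positive integers. `parse_sizes` is a hypothetical helper; how awkwardNN itself parses these strings is not shown in this notebook.

```python
import ast


def parse_sizes(text):
    """Parse a string such as "(30, 56, 32, 50)" into a tuple of positive ints.

    Raises ValueError (or SyntaxError from ast.literal_eval) on bad input.
    """
    value = ast.literal_eval(text.strip())
    # A single parenthesized number such as "(100)" evaluates to a bare int.
    if isinstance(value, int) and not isinstance(value, bool):
        value = (value,)
    if not isinstance(value, tuple) or not value:
        raise ValueError(f"expected a non-empty tuple of sizes, got {text!r}")
    for size in value:
        if not isinstance(size, int) or isinstance(size, bool) or size <= 0:
            raise ValueError(f"sizes must be positive integers, got {size!r}")
    return value


print(parse_sizes("(30, 56, 32, 50)"))
print(parse_sizes("(100, 100)"))
```

`ast.literal_eval` only evaluates Python literals, so it does not execute arbitrary code from the string the way `eval` would.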

Note: you can also comment out individual fields with `#` if you don't want
to use them during training.

