## The end (?)

We've reached **the end** of this short Ph.D. course about reproducibility and deep learning experiments.

#### What to do?

## deasy-learning

In this **bonus** lecture, added to reach the required **10 hours**, I'm going to briefly show you some functionalities of my small **custom library**.

#### Disclaimer

- What you are going to see is not **groundbreaking** and has probably been well known for 50 years. I just couldn't find anything suitable... but I'm a picky person from time to time...

- **No official code yet**: the library is currently undergoing a **major refactoring**!

- **ETA** release date: approximately end of May/beginning of June

- The chosen **name** is embarrassing! Any suggestions? (*DISI $\rightarrow$ deasy $\rightarrow$ diversified easy learning*)

- **Ping me** if you are interested! (*don't you have anything else to do?*)

## What are we going to cover?

- ``deasy-learning``
    - Motivational intro $\rightarrow$ *why write a custom library*
    - Overview
    - Configuration and Component
    - Training an SVM model with deasy-learning
- Evaluation questionnaire
- Motivational outro

## Motivational Intro

*Shouldn't you be doing research instead of jamming with your mechanical keyboard to write buggy code?*

### Why deasy-learning 

*Take a seat, my friend, for I'm telling you my story now*

Around the middle of my Ph.D., I noticed that:

- I was **copy-pasting** quite a lot of code from one project to another (*w/ several bugs*)

- The majority of my **most frequent errors** were about running experiments with the **wrong hyper-parameter configuration**

- I had a lot of models (variants included) to test and their **management** was becoming **cumbersome**

### An example

For one paper, I had to test a GNN:

- A basic model (B)

- A variant with an additional layer (B + layer)

- 3 variants of 'B + layer' with a specific regularization (B + layer + regs)

- All the above models had to be tested on 5 different datasets

- All the above models had to be tested on 2 different input configurations (I1, I2)

$\rightarrow $ *Because yes*

#### Totaling: 10 models per dataset $\rightarrow$ 50 models

Actually, I also had an LSTM baseline and a BERT model, so the total number of models was **150**.

$\rightarrow$ *Yes, I like to hurt myself*

### An Example (cont'd)

I was **confident** and started writing up model configurations in JSON format (*a simplified version*)

```
"dataset1_B_GNN_I1": {
        "rnn_encoding": false,
        "representation_mlp_weights": [
                    64,
                    256,
                    512
                ],
        "merge_vocabularies": false,
        "build_embedding_matrix": true,
        "embedding_model_type": "glove",
        "input_dropout_rate": 0.4,
        "feature_class": "pos_text_graph_dep_adj_features",
        "is_directional": false,
        "clip_gradient": true,
        "embedding_dimension":  200,
        "l2_regularization": 0.0002,
        "max_grad_norm":  40,
        "gcn_info": {
                "0": {
                    "message_weights": [
                        64
                    ],
                    "aggregation_weights": [
                        512,
                        512
                    ],
                    "node_weights": [
                        64
                    ],
                    "pooling_weights": [
                        64,
                        1
                    ]
                }
            },
        "use_position_feature":  true,
        "tokenizer_args": {
                "filters": "",
                "oov_token": 1
        },
        "weight_predictions": true,
        "add_gradient_noise": true,
        "dropout_rate": 0.4,
        "optimizer_args": {
                "learning_rate": 0.0002
        },
        "max_graph_nodes_limit": 140
    }
```

Eventually:

- My JSON configuration file ended up having **more than 8k lines** and was barely readable (*and not all models were listed yet!*)

- I could **hardly recall** (each day) each parameter type, allowed values, potential conflicts, etc...

- I was **still messing around** with wrong hyper-parameter settings (*my eyes @_@*)

- I was **wasting quite a lot of time** by just making errors and setting up the right experiment

### I want to be lazy!

I looked at my configuration hell and reflected on its **limitations**...

- **Hardly** readable (*wanna collaborate? enjoy...*)

- **No typing** (*err, what was the accepted format of this parameter?*)

- Custom types (e.g., numpy.ndarray) had to be **converted**...

- **No** possibility to define configuration **constraints**

- **No** possibility to add **descriptions**

#### I'm fucking lazy...

- I don't want to **waste time** on writing my configuration in a different format

- I want to run **different** experiments with an **effort comparable to a mouse click**

- I want to **focus** on my research (*Nobody believes you, Fede!*)

### F*** you configuration files!

- F*** you JSON
- F*** you JSONL
- F*** YAML (*horrible*)
- F*** JSONNET (*kill me*)
- F*** TXT (*really?*)
- F*** CFG
- **BIG** F*** to command line arguments (*we'll never be friends*)

$\rightarrow$ I want to write my configurations in **Python**!
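
As a rough illustration of what plain Python offers over JSON (typing, defaults, constraints, documentation), here is a minimal sketch that uses only the standard library. The `GNNConfig` class and its `validate` method are hypothetical and are not part of deasy-learning; the field names are simply borrowed from the JSON snippet shown earlier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GNNConfig:
    """Hypothetical configuration: typed fields and defaults in one place."""
    embedding_dimension: int = 200
    dropout_rate: float = 0.4
    clip_gradient: bool = True
    max_grad_norm: float = 40.0
    representation_mlp_weights: List[int] = field(
        default_factory=lambda: [64, 256, 512])

    def validate(self) -> None:
        # Constraints like these cannot be expressed in a plain JSON file
        if not 0.0 <= self.dropout_rate < 1.0:
            raise ValueError(
                f"dropout_rate must be in [0, 1), got {self.dropout_rate}")
        if self.clip_gradient and self.max_grad_norm <= 0:
            raise ValueError(
                "max_grad_norm must be positive when clip_gradient is enabled")


# Override only what changes; everything else keeps its default
config = GNNConfig(dropout_rate=0.2)
config.validate()
print(config)
```

An IDE can now autocomplete the field names and flag wrong types, which is exactly what the 8k-line JSON file could not offer.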

### Why not some existing library?

Well, this is entirely due to my **personal experience** and the fact that I'm a very stupid user

- [AllenNLP](https://allenai.org/allennlp): cool features but fucking hard to **customize** unless you are from AllenNLP

- [ParlAI](https://parl.ai/): cool if you have to do the three commands written in the **tutorial page**. Otherwise, kill yourself

- [Huggingface](https://huggingface.co/): super cool if you are working with transformers but **horrible configuration** format $\rightarrow$ you need an open tab on their documentation to **understand each hyper-parameter**

- TensorFlow/Torch/Keras: no simple way to **define configurations** except for **flags**. You'll never have me!!!

#### Disclaimer: 

I'm not saying these are bad libraries at all! It's just that I was not able to use them efficiently...

$\rightarrow$ Why can't I run my SVM, decision tree, LSTM, BERT using the **same configuration** and **training format**?

## Overview

### Inspiration

I wanted to **simplify** my life and remembered a cool feature of some library (*I think it was AllenNLP...*) 

- You could write your model and **register it** so that you could run your experiment from the command line

This feature led me to the following conclusions:

1. **Separate** configuration from logic
2. **Register** configuration and logic separately to **quickly use them later** and organize your working environment

$\rightarrow$ Formally, I denote the configuration as ``Configuration`` and the logic as ``Component``.

### A visual depiction

A ``Component`` **is built** via its ``Configuration``.

<center>
<div>
<img src="Images/Lecture-5/conf_and_comp.png" width="1200"/>
</div>
</center>

## Binding

We are just associating the ``Configuration`` to its ``Component``.

<center>
<div>
<img src="Images/Lecture-5/conf_and_comp_bound.png" width="1200"/>
</div>
</center>

## Registration

We register the ``Configuration`` to remember it.

<center>
<div>
<img src="Images/Lecture-5/registration.png" width="1200"/>
</div>
</center>

## Registration (cont'd)

To remember a ``Configuration`` we simply define a ``RegistrationKey`` (*a compound dictionary key*)

<center>
<div>
<img src="Images/Lecture-5/registration_key.png" width="1200"/>
</div>
</center>

## Binding (cont'd)

<center>
<div>
<img src="Images/Lecture-5/binding_with_key.png" width="1200"/>
</div>
</center>

### To sum up

- We define our ``Component``
- We define its corresponding ``Configuration``
- We register it to the ``Registry`` with a ``RegistrationKey``
- We bind the ``Configuration`` to its ``Component`` by using the configuration ``RegistrationKey``

Thus, the ``RegistrationKey`` of our ``Configuration`` allows us to:
- Retrieve the registered ``Configuration`` from the ``Registry``
- Retrieve the ``Component`` that our registered ``Configuration`` is bound to
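
The register-then-bind flow summarized above can be sketched in a few lines. This is a toy model written for illustration only: the class bodies, the `build_component` method and the `SVMClassifier` component are assumptions and do not reflect the actual deasy-learning code (which is not released yet).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Type


@dataclass(frozen=True)
class RegistrationKey:
    """Toy compound key; frozen so it is hashable and usable as a dict key."""
    name: str
    namespace: str
    tags: Tuple[str, ...] = ()


class Registry:
    """Toy registry: key -> configuration constructor, key -> bound component."""
    _configurations: Dict[RegistrationKey, Callable[[], dict]] = {}
    _bindings: Dict[RegistrationKey, Type] = {}

    @classmethod
    def register(cls, key: RegistrationKey, constructor: Callable[[], dict]):
        cls._configurations[key] = constructor

    @classmethod
    def bind(cls, key: RegistrationKey, component_class: Type):
        cls._bindings[key] = component_class

    @classmethod
    def build_component(cls, key: RegistrationKey):
        # The same key retrieves both the configuration and its component
        config = cls._configurations[key]()
        return cls._bindings[key](**config)


class SVMClassifier:
    """Hypothetical component: logic only, no hard-coded hyper-parameters."""

    def __init__(self, C: float, kernel: str):
        self.C = C
        self.kernel = kernel


key = RegistrationKey(name='svm', namespace='testing', tags=('linear',))
Registry.register(key, lambda: {'C': 1.0, 'kernel': 'linear'})
Registry.bind(key, SVMClassifier)

model = Registry.build_component(key)
print(type(model).__name__, model.C, model.kernel)
```

The design point is that an experiment is fully identified by its key: swapping the key swaps both the hyper-parameters and the logic they feed.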

#### That's it! If you get this, you get 99% of deasy-learning!

## Configuration and Component

*Let's see some examples!*

### Configuration parameters

A ``Configuration`` is comprised of ``Parameter`` objects

<center>
<div>
<img src="Images/Lecture-5/configuration.png" width="1200"/>
</div>
</center>

A ``Parameter`` is essentially a wrapper for each field of ``Configuration``

### Why ``Parameter``?

Essentially, ``Parameter`` is a useful wrapper for storing additional metadata

- type hints
- descriptions
- allowed value range
- possible variants of interest
- optional tags for quickly retrieving a certain subset of parameters
- ... (*more on this later*)
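
To make the metadata list concrete, here is a minimal sketch of what such a wrapper might look like. The field names (`allowed_range`, `tags`, ...) and the `check` method are illustrative assumptions, not the real ``Parameter`` class.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Set, Tuple


@dataclass
class Parameter:
    """Toy parameter wrapper: a value plus the metadata that describes it."""
    name: str
    value: Any = None
    type_hint: Optional[type] = None
    description: str = ""
    allowed_range: Optional[Tuple[float, float]] = None  # inclusive bounds
    variants: List[Any] = field(default_factory=list)
    tags: Set[str] = field(default_factory=set)

    def check(self) -> None:
        # Type check driven by the stored type hint
        if self.type_hint is not None and not isinstance(self.value, self.type_hint):
            raise TypeError(
                f"{self.name}: expected {self.type_hint.__name__}, "
                f"got {type(self.value).__name__}")
        # Range check driven by the stored bounds
        if self.allowed_range is not None:
            low, high = self.allowed_range
            if not low <= self.value <= high:
                raise ValueError(
                    f"{self.name}: {self.value} outside [{low}, {high}]")


dropout = Parameter(name='dropout_rate',
                    value=0.4,
                    type_hint=float,
                    description='Dropout probability',
                    allowed_range=(0.0, 1.0),
                    variants=[0.2, 0.4],
                    tags={'regularization'})
dropout.check()
print(dropout.description, dropout.variants)
```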

### Configuration class

In [None]:
class Configuration:
    
    def add(self, param: Parameter):
        ...
    
    def add_condition(self, condition: Callable[['Configuration'], bool], name: Optional[str] = None):
        ...
        
    def validate(self):
        ...
        
    def search(self, search_key, exact_match):
        ...
        
    def get_delta_copy(self, key_value_dict):
        ...
        
    @classmethod
    def get_default(cls):
        ...

### What does a ``Configuration`` do?

A ``Configuration`` is essentially an extension of a Python dictionary

- You can add ``Parameter`` objects
- You can add conditions (i.e., callable functions) relating multiple ``Parameter``
- You can ``validate()`` your ``Configuration``: running all conditions to check for errors
- You can quickly search for ``Parameter``
- You can quickly get a delta copy of your ``Configuration`` via a simple key-value dictionary
- You can specify a standard template (``get_default()``) of your ``Configuration`` $\rightarrow$ a readable and detailed specification!
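
The "extended dictionary" idea can be sketched as follows. `ToyConfiguration` is a hypothetical stand-in written for this slide: it stores raw values instead of ``Parameter`` objects, and its method bodies are assumptions about behavior, not the library's implementation.

```python
import copy


class ToyConfiguration(dict):
    """Toy stand-in: a dict with attribute access and named conditions."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.conditions = {}  # name -> callable(config) that returns True if valid

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails: fall back to dict keys
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def add_condition(self, condition, name=None):
        self.conditions[name or f'condition_{len(self.conditions)}'] = condition

    def validate(self):
        # Run every condition and report the ones that do not hold
        failed = [name for name, condition in self.conditions.items()
                  if not condition(self)]
        if failed:
            raise ValueError(f'Failed conditions: {failed}')

    def search(self, search_key, exact_match=False):
        return {key: value for key, value in self.items()
                if (key == search_key if exact_match else search_key in key)}

    def get_delta_copy(self, key_value_dict):
        # Copy the values, keep the same conditions, then apply the overrides
        delta = ToyConfiguration(copy.deepcopy(dict(self)))
        delta.conditions = dict(self.conditions)
        delta.update(key_value_dict)
        return delta


config = ToyConfiguration(param_1=True, param_2=True, dropout_rate=0.4)
config.add_condition(lambda p: p.param_1 == p.param_2, name='params_agree')
config.validate()  # passes: both flags agree

matches = config.search('param')
delta = config.get_delta_copy({'param_2': False})
print(sorted(matches), delta.param_2)
```

Calling `delta.validate()` here would raise a `ValueError`, since the delta copy breaks the `params_agree` condition while the original configuration stays untouched.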

### An example

In [None]:
class DataLoaderConfig(Configuration):

    @classmethod
    def get_default(
            cls
    ):
        config = super().get_default()

        config.add_short(name='name',
                         type_hint=str,
                         description="Unique dataset identifier",
                         is_required=True)
        config.add_short(name='has_test_split_only',
                         value=False,
                         type_hint=bool,
                         description="Whether DataLoader has test split only or not")
        config.add_short(name='has_val_split',
                         value=True,
                         type_hint=bool,
                         description="Whether DataLoader has a val split or not")
        config.add_short(name='has_test_split',
                         value=True,
                         type_hint=bool,
                         description="Whether DataLoader has a test split or not")

        return config

### Another example

In [None]:
class ConfigA(Configuration):

    @classmethod
    def get_default(
            cls
    ) -> 'ConfigA':
        config = super().get_default()

        config.add_short(name='param_1',
                         value=True,
                         type_hint=bool)
        config.add_short(name='param_2',
                         value=True,
                         type_hint=bool)

        config.add_condition(condition=lambda p: p.param_1 == p.param_2)

        return config

### Composition

A ``Configuration`` can also include another (or multiple) ``Configuration``.

In [None]:
class ParentConfig(Configuration):

    @classmethod
    def get_default(
            cls
    ):
        config = super().get_default()

        config.add_short(name='param_1',
                         value=True,
                         type_hint=bool)
        config.add_short(name='param_2',
                         value=False,
                         type_hint=bool)
        config.add_short(name='child_A',
                         value=RegistrationKey(name='config_a',
                                               namespace='testing'),   # <--- This assumes that ConfigA is registered
                         is_registration=True)   # <--- metadata
        return config
    
class ConfigA(Configuration):
    ...

This allows us to define complex ``Component`` like

- A data-loading pipeline
- A data pre-processing pipeline
- A training routine
- ...
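
One way to picture composition is as a recursive lookup: whenever a parameter holds a key, the registry is asked for the child configuration. The sketch below uses plain dictionaries and a hypothetical `resolve` helper; how deasy-learning actually resolves nested keys is not shown in these slides, so treat this purely as an assumption.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Stand-in for a RegistrationKey."""
    name: str
    namespace: str


# Toy registry: key -> function returning the default configuration as a dict
REGISTRY = {
    Key('config_a', 'testing'): lambda: {'param_1': True, 'param_2': True},
    Key('parent', 'testing'): lambda: {'param_1': True,
                                       'param_2': False,
                                       'child_A': Key('config_a', 'testing')},
}


def resolve(key: Key) -> dict:
    """Build a configuration, replacing every child key with the child itself."""
    config = REGISTRY[key]()
    return {name: resolve(value) if isinstance(value, Key) else value
            for name, value in config.items()}


resolved = resolve(Key('parent', 'testing'))
print(resolved)
```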

### An example

In [None]:
class ProcessorPipelineConfig(Configuration):

    @classmethod
    def get_default(
            cls
    ):
        config = super().get_default()

        config.add_short(name='processors',
                         type_hint=List[RegistrationKey],
                         description='List of processors to be executed sequentially',
                         is_required=True,
                         is_registration=True)

        return config

In [None]:
class ProcessorPipeline(Component):
    
    def run(
            self,
            data: Optional[FieldDict] = None,
            is_training_data: bool = False
    ):
        for processor in self.processors:
            data = processor.run(data=data,
                                 is_training_data=is_training_data)
        return data

### Variants

In many cases, you may need multiple ``Configuration`` instances to define different scenarios.

- Work at ``Parameter`` level to specify variants
- Work at ``Configuration`` level to specify explicit configuration variants

### ``Parameter`` level variants

Simply list the possible values via the ``Parameter.variants`` field.

In [None]:
class ConfigA(Configuration):

    @classmethod
    def get_default(
            cls
    ) -> 'ConfigA':
        config = super().get_default()

        config.add_short(name='param_1',
                         value=True,
                         type_hint=bool,
                         variants=[False, True])    # <---
        config.add_short(name='param_2',
                         value=True,
                         type_hint=bool,
                         variants=[False, True])    # <---

        config.add_condition(condition=lambda p: p.param_1 == p.param_2)

        return config
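
How parameter-level variants interact with a condition can be sketched with `itertools.product`: enumerate every combination of the listed values, then keep only those that satisfy the condition. The dictionaries below are a simplified, hypothetical representation of the two parameters above.

```python
from itertools import product

# Parameter-level variants, mirroring param_1 and param_2 above
variants = {'param_1': [False, True], 'param_2': [False, True]}


def condition(p: dict) -> bool:
    # Same rule as the configuration above: both flags must agree
    return p['param_1'] == p['param_2']


names = list(variants)
candidates = [dict(zip(names, values)) for values in product(*variants.values())]
valid = [candidate for candidate in candidates if condition(candidate)]

print(len(candidates), 'candidates,', len(valid), 'valid')
for candidate in valid:
    print(candidate)
```

Out of the four candidate combinations, only the two where both flags agree survive the condition.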

### Configuration level variants

We can explicitly define new ``Configuration`` via two decorators: ``supports_variants``, ``add_variant``

In [None]:
@supports_variants
class ConfigA(Configuration):

    @classmethod
    def get_default(
            cls
    ) -> 'ConfigA':
        config = super().get_default()

        config.add_short(name='param_1',
                         value=True,
                         type_hint=bool,
                         variants=[False, True])
        config.add_short(name='param_2',
                         value=True,
                         type_hint=bool,
                         variants=[False, True])

        config.add_condition(condition=lambda p: p.param_1 == p.param_2)

        return config
    
    @classmethod
    @add_variant(variant_name='variant1')
    def variant1(cls):
        config = cls.get_default()
        config.param_1 = False
        config.param_2 = False
        return config
    
    @classmethod
    @add_variant(variant_name='variant2')
    def variant2(cls):
        config = cls.get_default()
        config.param_1 = True
        config.param_2 = True
        return config

### Registering variants

Both kinds of variants can be picked up automatically during registration.

In [None]:
Registry.register_and_bind_configuration_variants(configuration_class=ConfigA,
                                                  component_class=ComponentA,
                                                  name='config_a',
                                                  namespace='testing',
                                                  allow_parameters_variants=True)   # <---

Variant registration is controlled by ``allow_parameters_variants``

- [True] Ignore any ``add_variant`` declarations and look for ``Parameter`` level variants
- [False] Just consider ``add_variant`` declarations

### Variants and Nesting

Registering variants is a powerful tool since it supports ``Configuration`` nesting!

Consider your complex routine: data-loading, pre-processing, model training

- You can write it as a combination of ``Configuration`` and ``Component`` classes
- You can define variants
- You can specify conditions $\rightarrow$ only valid variants are considered!
- You can register all possible valid variant combinations in one shot!

## An Example

In [None]:
class ConfigA(Configuration):

    @classmethod
    def get_default(
            cls
    ) -> 'ConfigA':
        config = super().get_default()
        config.add_short(name='param_1', value=True, type_hint=bool, variants=[False, True])
        config.add_short(name='child', value=RegistrationKey(name='config_b',
                                                             namespace='testing'), is_registration=True)
        return config


class ConfigB(Configuration):

    @classmethod
    def get_default(
            cls
    ) -> 'ConfigB':
        config = super().get_default()
        config.add_short(name='param_1', value=1, type_hint=int, variants=[1, 2])
        config.add_short(name='child', value=RegistrationKey(name='config_c',
                                                             namespace='testing'), is_registration=True)
        return config


class ConfigC(Configuration):

    @classmethod
    def get_default(
            cls
    ) -> 'ConfigC':
        config = super().get_default()
        config.add_short(name='param_1', value=False, type_hint=bool, variants=[False, True])
        return config


In [None]:
if __name__ == '__main__':
    Registry.register_and_bind(configuration_class=ConfigB,
                               configuration_constructor=ConfigB.get_default,
                               component_class=Component,
                               name='config_b',
                               namespace='testing')
    Registry.register_and_bind(configuration_class=ConfigC,
                               configuration_constructor=ConfigC.get_default,
                               component_class=Component,
                               name='config_c',
                               namespace='testing')

    for config_regr_key in Registry.register_and_bind_configuration_variants(configuration_class=ConfigA,
                                                                             component_class=Component,
                                                                             name='config_a',
                                                                             namespace='testing',
                                                                             allow_parameters_variants=True):
        print(config_regr_key)

In [None]:
name:config_a--tags:['child.child.param_1=False', 'child.param_1=1', 'param_1=False']--namespace:testing
name:config_a--tags:['child.child.param_1=False', 'child.param_1=1', 'param_1=True']--namespace:testing
name:config_a--tags:['child.child.param_1=False', 'child.param_1=2', 'param_1=False']--namespace:testing
name:config_a--tags:['child.child.param_1=False', 'child.param_1=2', 'param_1=True']--namespace:testing
name:config_a--tags:['child.child.param_1=True', 'child.param_1=1', 'param_1=False']--namespace:testing
name:config_a--tags:['child.child.param_1=True', 'child.param_1=1', 'param_1=True']--namespace:testing
name:config_a--tags:['child.child.param_1=True', 'child.param_1=2', 'param_1=False']--namespace:testing
name:config_a--tags:['child.child.param_1=True', 'child.param_1=2', 'param_1=True']--namespace:testing
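
The eight keys printed above are the Cartesian product of the variants declared at each nesting level (2 x 2 x 2). A minimal sketch of that expansion, using a hypothetical dictionary description of the three configurations and a recursive `expand` helper (both are assumptions for illustration, not the library's internals):

```python
from itertools import product

# Toy description: parameter variants and an optional child for each configuration
CONFIGS = {
    'config_a': {'variants': {'param_1': [False, True]}, 'child': 'config_b'},
    'config_b': {'variants': {'param_1': [1, 2]}, 'child': 'config_c'},
    'config_c': {'variants': {'param_1': [False, True]}, 'child': None},
}


def expand(name: str, prefix: str = '') -> list:
    """Return every combination of parameter values, nested children included."""
    spec = CONFIGS[name]
    keys = list(spec['variants'])
    # Combinations of this configuration's own parameters
    own = [{f'{prefix}{key}': value for key, value in zip(keys, values)}
           for values in product(*(spec['variants'][key] for key in keys))]
    if spec['child'] is None:
        return own
    # Child parameters get a 'child.' prefix, as in the printed tags
    nested = expand(spec['child'], prefix=f'{prefix}child.')
    return [{**o, **n} for o, n in product(own, nested)]


combinations = expand('config_a')
print(len(combinations))
for combination in combinations:
    print(sorted(f'{key}={value}' for key, value in combination.items()))
```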

### Calibration

### Component

## Registration

### Registration format

### Commands

## Training an SVM model with deasy-learning

### Data loading

### Data pre-processing

### Modeling

### Routine

## deasy-learning upcoming structure

## Evaluation questionnaire

## Motivational outro

# Any questions?

<center>
<div>
<img src="Images/Lecture-1/jojo-arrivederci.gif" width="1200" alt='JOJO_arrivederci'/>
</div>
</center>