## The end (?)

We've reached **the end** of this short Ph.D. course about reproducible and open research.

### What to do?

## Cinnamon

In this **bonus** lecture, I'm going to briefly show you some features of my small **custom library**.

#### Disclaimer

- What you are going to see is not **groundbreaking** and has probably been well known for 50 years. I just couldn't find anything suitable... but I'm a picky person from time to time...

- The library is still a prototype $\rightarrow$ nonetheless, it somehow does its job

- **Ping me** if you are interested! (*don't you have anything else to do?*)

## What are we going to cover?

- ``cinnamon``
    - Motivational intro $\rightarrow$ *why write a custom library*
    - Overview
    - Configuration and Component
    - Registration
    - A simple showcase

- Motivational outro

## Motivational Intro

*Shouldn't you be doing research instead of jamming with your **mechanical** keyboard to write buggy code?*

### Why cinnamon

*Take a seat, my friend, for I'm telling you my story now*

Around the middle of my Ph.D., I noticed that:

- I was **copy-pasting** quite a lot of code from one project to another (*w/ several bugs*)

- The majority of my **most frequent errors** were about running experiments with the **wrong hyper-parameter configuration**

- I had a lot of models (variants included) to test and their **management** was becoming **cumbersome**

### An example

For one paper, I had to test a GNN:

- A basic model (**B**)

- A variant with an additional layer (**B + layer**)

- **3** variants of **B + layer** with a specific regularization (**B + layer + regs**)

- **All** the above models had to be tested on **5 different datasets**

- **All** the above models had to be tested on **2** different **input** configurations (**I1**, **I2**)

$\rightarrow $ *Because yes*

#### Totaling: 10 models per dataset $\rightarrow$ 50 models

Actually, I also had an LSTM baseline and a BERT model, so the total number of models was **150**.

$\rightarrow$ *Yes, I like to hurt myself*
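The GNN count above is just a Cartesian product of model variants, input configurations, and datasets. A quick sanity check in Python; the labels `reg1`..`reg3` and `dataset1`..`dataset5` are hypothetical placeholders, not the real names:

```python
from itertools import product

# Model variants: the base GNN, B + layer, and 3 regularized variants of B + layer
gnn_variants = ["B", "B+layer"] + [f"B+layer+reg{i}" for i in range(1, 4)]
inputs = ["I1", "I2"]
datasets = [f"dataset{i}" for i in range(1, 6)]

# Every tuple is one configuration entry that has to be written down somewhere
per_dataset = list(product(gnn_variants, inputs))
all_runs = list(product(datasets, gnn_variants, inputs))

print(len(per_dataset))  # 10
print(len(all_runs))     # 50
```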

### An Example (cont'd)

I was **confident** and started writing up model configurations in JSON format (*a simplified version*)

```
"dataset1_B_GNN_I1": {
    "rnn_encoding": false,
    "representation_mlp_weights": [
        64,
        256,
        512
    ],
    "merge_vocabularies": false,
    "build_embedding_matrix": true,
    "embedding_model_type": "glove",
    "input_dropout_rate": 0.4,
    "feature_class": "pos_text_graph_dep_adj_features",
    "is_directional": false,
    "clip_gradient": true,
    "embedding_dimension": 200,
    "l2_regularization": 0.0002,
    "max_grad_norm": 40,
    "gcn_info": {...},
    "use_position_feature": true,
    "tokenizer_args": {
        "filters": "",
        "oov_token": "<UNK>"
    },
    "weight_predictions": true,
    "add_gradient_noise": true,
    "dropout_rate": 0.4,
    "optimizer_args": {
        "learning_rate": 0.0002
    },
    "max_graph_nodes_limit": 140,
    ...
}
```

### Eventually...

- My JSON configuration file ended up having **more than 8k lines**, barely readable (*and not all models were listed yet!*)

- I could **hardly recall** (each day) each parameter's type, allowed values, potential conflicts, etc...

- I was **still messing around** with wrong hyper-parameter settings (*my eyes @_@*)

- I was **wasting quite a lot of time** just making errors and setting up the right experiment

### I want to be lazy!

I looked at my configuration hell and reasoned about the **limitations**...

- **Hardly** readable (*wanna collaborate? enjoy...*)

- **No typing** (*err, what was the accepted format of this parameter?*)

- Custom types (e.g., numpy.ndarray) had to be **converted**...

- **No** possibility to define configuration **constraints**

- **No** possibility to add **descriptions**

### I'm fucking lazy...

- I don't want to **waste time** on writing my configuration in a different format

- I want to run **different** experiments with an **effort comparable to a mouse click**

- I want to **focus** on my research (*Nobody believes you, Fede!*)

### F\*\*\* you configuration files!

- F\*\*\* you JSON
- F\*\*\* you JSONL
- F\*\*\* YAML (*horrible*)
- F\*\*\* JSONNET (*kill me*)
- F\*\*\* TXT (*really?*)
- F\*\*\* CFG
- F\*\*\* to command line arguments (*I'll never be your friend*)

$\rightarrow$ I want to write my configurations in **Python**!
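To make the point concrete: plain Python already gives typing, descriptions, and constraints. A minimal sketch using only the standard library's `dataclasses` (not cinnamon); the field names are borrowed from the JSON above and the checks are illustrative assumptions:

```python
from dataclasses import dataclass, field, fields


@dataclass
class GNNConfig:
    # Each field has a type hint, a default, and a human-readable description
    embedding_dimension: int = field(
        default=200, metadata={"description": "Size of the word embeddings"})
    dropout_rate: float = field(
        default=0.4, metadata={"description": "Dropout rate on hidden layers"})
    clip_gradient: bool = field(
        default=True, metadata={"description": "Whether to clip gradients"})

    def __post_init__(self):
        # dataclasses do not enforce type hints, so we check them explicitly
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(f"{f.name} must be {f.type.__name__}, "
                                f"got {type(value).__name__}")
        # A constraint that a plain JSON file cannot express
        if not 0.0 <= self.dropout_rate < 1.0:
            raise ValueError("dropout_rate must be in [0, 1)")


def describe(config_class) -> dict:
    """Collect the description of every field, e.g. for documentation."""
    return {f.name: f.metadata["description"] for f in fields(config_class)}


config = GNNConfig(dropout_rate=0.5)
print(describe(GNNConfig)["dropout_rate"])  # Dropout rate on hidden layers
try:
    GNNConfig(embedding_dimension="200")
except TypeError as error:
    print(error)  # embedding_dimension must be int, got str
```

Note that this strict check rejects an `int` for a `float` field (e.g. `dropout_rate=0`); whether that is desirable is a design choice.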

### Why not some existing library?

Well, this is entirely due to my **personal experience** and the fact that I'm a very stupid user

- [AllenNLP](https://allenai.org/allennlp): cool features but fucking **hard** to **customize** unless you are from AllenNLP

- [ParlAI](https://parl.ai/): cool if you have to do the three commands written in the **tutorial page**. Otherwise, kill yourself!

- [Hugging Face](https://huggingface.co/): super cool if you are working with transformers but **horrible configuration** format $\rightarrow$ you need a tab open on their documentation to **understand each hyper-parameter**

- TensorFlow/PyTorch/Keras: no simple way to **define configurations** except for **flags**. You'll never have me!!!

### Disclaimer 

I'm **NOT** saying these are bad libraries at all! It's just that I was not able to use them to solve my issues...

$\rightarrow$ Why can't I run my SVM, decision tree, LSTM, BERT models using the **same configuration** and **training format**?

## A more concrete example

Consider some code logic that has to load data

```python
class DataLoader:

    def __init__(self, folder_name):
        self.folder_name = folder_name

    def load(self):
        # read_from_file is a placeholder for your own file-reading logic
        data = read_from_file(folder_name=self.folder_name)
        return data
```

The data loader reads from a file whose location is given by the ``self.folder_name`` value.

If ``folder_name`` can take multiple values, we can reuse the same code logic to load data from different folders.

Hypothetically, we would define multiple data loaders:

```python
data_loader_1 = DataLoader(folder_name='*folder_name1*')
data_loader_2 = DataLoader(folder_name='*folder_name2*')
...
```

Now, if the data loader is used across a project, we need some modularity to avoid writing several versions of the same script.

One common solution is to rely on configuration files (e.g., JSON file).

```json
{
  "data_loader": {
    "folder_name": "*folder_name1*"
  }
}
```

The main script is then modified to load our configuration file, so that each piece of code logic is properly initialized.
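A minimal sketch of that main script, using only the standard `json` module. The configuration is inlined as a string so the example runs on its own; in a real project it would be read from a file (the file name would be your choice):

```python
import json


class DataLoader:
    def __init__(self, folder_name):
        self.folder_name = folder_name


# Stand-in for the content of a configuration file such as the one above
CONFIG_TEXT = '{"data_loader": {"folder_name": "folder_name1"}}'


def main(config_text):
    config = json.loads(config_text)
    # Each piece of code logic is initialized from its own configuration section
    data_loader = DataLoader(**config["data_loader"])
    return data_loader


loader = main(CONFIG_TEXT)
print(loader.folder_name)  # folder_name1
```

Nothing here checks that `folder_name` is a string or that the key exists: that is exactly the gap the rest of this lecture is about.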

## Cinnamon

Cinnamon keeps this (configuration, code logic) dichotomy, but the configuration is written in **plain Python code**!

```python
from cinnamon.configuration import Configuration

class DataLoaderConfig(Configuration):

    @classmethod
    def default(cls):
        config = super().default()
        config.add(name='folder_name',
                   type_hint=str,
                   variants=['*folder_name1*', '*folder_name2*', ...],
                   description="Folder where to look for data files.")
        return config
```

```python
from cinnamon.component import Component

class DataLoader(Component):

    def __init__(self, folder_name):
        self.folder_name = folder_name

    def load(self):
        # read_from_file is a placeholder for your own file-reading logic
        data = read_from_file(folder_name=self.folder_name)
        return data
```

Cinnamon allows high-level configuration definitions (constraints, type checking, descriptions, variants, etc...)

To quickly load any instance of our data loader, we need **two steps**:

### Register

Register the configuration via a **registration key**, i.e., a (name, tags, namespace) tuple.

```python
from cinnamon.registry import Registry

Registry.register_configuration(config_class=DataLoaderConfig,
                                component_class=DataLoader,
                                name='data_loader',
                                tags={'example'},
                                namespace='showcase')
```

### Build

Build the ``DataLoader`` via the same **registration key**.

```python
data_loader = DataLoader.build_component(name='data_loader',
                                         tags={'example'},
                                         namespace='showcase')

# Build a specific variant by adding a 'parameter=value' tag
variant = DataLoader.build_component(name='data_loader',
                                     tags={'example',
                                           'folder_name=*folder_name1*'},
                                     namespace='showcase')
```
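To see the register/build idea without installing anything, here is a toy registry in plain Python. It is **not** cinnamon's implementation: a "configuration" is just a function returning a dict, and a `param=value` tag simply overrides the default value, which is an assumption made for illustration only:

```python
from typing import Callable, Dict, FrozenSet, Tuple

# Toy registration key: (name, tags, namespace); frozenset makes tag order irrelevant
Key = Tuple[str, FrozenSet[str], str]
_REGISTRY: Dict[Key, Tuple[Callable[[], dict], type]] = {}


def register_configuration(config_factory, component_class, name, tags, namespace):
    _REGISTRY[(name, frozenset(tags), namespace)] = (config_factory, component_class)


def build_component(name, tags, namespace):
    # Separate plain tags from "param=value" variant tags
    plain = {t for t in tags if "=" not in t}
    overrides = dict(t.split("=", 1) for t in tags if "=" in t)
    config_factory, component_class = _REGISTRY[(name, frozenset(plain), namespace)]
    params = config_factory()
    params.update(overrides)  # values from tags are strings in this toy version
    return component_class(**params)


class DataLoader:
    def __init__(self, folder_name):
        self.folder_name = folder_name


def data_loader_config():
    return {"folder_name": "folder_name1"}


register_configuration(data_loader_config, DataLoader,
                       name="data_loader", tags={"example"}, namespace="showcase")
default = build_component("data_loader", {"example"}, "showcase")
variant = build_component("data_loader", {"example", "folder_name=folder_name2"}, "showcase")
print(default.folder_name, variant.folder_name)  # folder_name1 folder_name2
```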

## Overview

That's it! This is all you need to use ``cinnamon``!

Let's talk more about the details...

### Inspiration

I remembered a cool feature of some library (*I think it was AllenNLP...*)

- You could write your model and **register it**, so that you could run your experiment from the command line (*aaargh...*)

This feature led me to the following conclusions:

1. **Separate** configuration from logic
2. **Register** configuration and logic separately to **quickly use them later** and organize your working environment

$\rightarrow$ Formally, I denote the configuration as ``Configuration`` and the logic as ``Component``.

### A visual depiction

A ``Component`` **is built** via its ``Configuration``.

<center>
<div>
<img src="../Images/Lecture-8/conf_and_comp.png" width="1200"/>
</div>
</center>

## Binding

We **bind** the ``Configuration`` to its ``Component``.

<center>
<div>
<img src="../Images/Lecture-8/conf_and_comp_bound.png" width="1200"/>
</div>
</center>

## Registration

We **register** the ``Configuration`` to remember it.

<center>
<div>
<img src="../Images/Lecture-8/registration.png" width="1000"/>
</div>
</center>

## Registration (cont'd)

To remember a ``Configuration``, we simply define a ``RegistrationKey`` (*a compound dictionary key*)

<br/>

<center>
<div>
<img src="../Images/Lecture-8/registration_with_key.png" width="1200"/>
</div>
</center>

### To sum up

- We **define** our ``Component``
- We **define** its corresponding ``Configuration`` (even more than one)
- We **register** the ``Configuration`` to the ``Registry`` with a ``RegistrationKey``
- We **bind** the ``Configuration`` to ``Component`` by using the configuration ``RegistrationKey``

Thus, the ``RegistrationKey`` of our ``Configuration`` allows us to:
- **Retrieve** the registered ``Configuration`` from the ``Registry``
- **Retrieve** the ``Component`` that our registered ``Configuration`` is bound to
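What makes a "compound dictionary key" work is hashability. A minimal sketch with a frozen dataclass; `ToyRegistrationKey` is a hypothetical stand-in, not cinnamon's `RegistrationKey`, although its string form mimics the keys printed by `dag_resolution` in the nested-variants example:

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ToyRegistrationKey:
    name: str
    namespace: str
    tags: FrozenSet[str] = frozenset()

    def __str__(self):
        # Sorted tags give a stable, readable textual form
        return f"name:{self.name}--tags:{sorted(self.tags)}--namespace:{self.namespace}"


registry = {}
registry[ToyRegistrationKey("data_loader", "showcase", frozenset({"example", "v1"}))] = "DataLoaderConfig"

# Tag order does not matter: equal frozensets hash equally, so this is the same key
same_key = ToyRegistrationKey("data_loader", "showcase", frozenset({"v1", "example"}))
print(registry[same_key])  # DataLoaderConfig
print(same_key)            # name:data_loader--tags:['example', 'v1']--namespace:showcase
```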

### That's it! If you get this, you get 99% of cinnamon!

*Happy weekend :)!*

## Configuration and Component

### Configuration parameters

A ``Configuration`` is composed of ``Parameter`` objects

<center>
<div>
<img src="../Images/Lecture-8/configuration.png" width="1200"/>
</div>
</center>

A ``Parameter`` is essentially a wrapper for each attribute of ``Configuration``

```python
from cinnamon.configuration import Configuration

class DataLoaderConfig(Configuration):

    @classmethod
    def default(cls):
        config = super().default()
        # This is a Parameter behind the curtains...
        config.add(name='folder_name',
                   type_hint=str,
                   variants=['*folder_name1*', '*folder_name2*', ...],
                   description="Folder where to look for data files.")
        return config
```

### Why ``Parameter``?

Essentially, ``Parameter`` is a useful wrapper for storing additional metadata

- type hints
- descriptions
- allowed value range
- possible variants of interest
- optional tags for quickly retrieving a certain subset of parameters
- ...
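A toy version of such a wrapper, to make the list above concrete. `ToyParameter` and its `check()` method are hypothetical; cinnamon's `Parameter` may store and check things differently:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Set


@dataclass
class ToyParameter:
    name: str
    value: Any = None
    type_hint: Optional[type] = None
    description: str = ""
    allowed_range: Optional[Callable[[Any], bool]] = None  # predicate on the value
    variants: List[Any] = field(default_factory=list)
    tags: Set[str] = field(default_factory=set)

    def check(self) -> None:
        # Type check only when a concrete type hint is given
        if self.type_hint is not None and not isinstance(self.value, self.type_hint):
            raise TypeError(f"{self.name}: expected {self.type_hint.__name__}, "
                            f"got {type(self.value).__name__}")
        if self.allowed_range is not None and not self.allowed_range(self.value):
            raise ValueError(f"{self.name}: value {self.value!r} outside allowed range")


lr = ToyParameter(name="learning_rate", value=2e-4, type_hint=float,
                  description="Optimizer learning rate",
                  allowed_range=lambda v: 0 < v < 1,
                  variants=[1e-4, 2e-4, 1e-3],
                  tags={"optimizer"})
lr.check()  # passes silently
```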

### Configuration class

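```python
# Overview only: argument lists are elided
class Configuration:

    def add(self, *args, **kwargs):
        # Add a Parameter (name, value, type_hint, description, variants, ...)
        ...

    def add_condition(self, *args, **kwargs):
        # Add a condition (a callable) relating multiple Parameters
        ...

    def validate(self, *args, **kwargs):
        # Run all conditions to check for errors
        ...

    def delta_copy(self, **kwargs):
        # Return a copy with the given parameters changed
        ...

    @classmethod
    def default(cls):
        # Return the template of this Configuration
        ...
```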

### What does a ``Configuration`` do?

A ``Configuration`` is essentially an extension of a Python dictionary

- You can add ``Parameter``
- You can add **conditions** (i.e., callable functions) relating multiple ``Parameter``
- You can ``validate()`` your ``Configuration``: running all conditions to check for errors
- You can **quickly search** for ``Parameter``
- You can **quickly get a delta copy** of your ``Configuration`` via a simple key-value dictionary
- You can **specify the template** (``default()``) of your ``Configuration`` $\rightarrow$ a readable and detailed specification!
- Lastly, since it is a Python object, you can define ``Configuration`` subclasses via **inheritance**!
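A toy, dependency-free sketch of the dictionary-like behaviour listed above: named conditions, `validate()`, search, and `delta_copy()`. `ToyConfiguration` is hypothetical and much simpler than cinnamon's `Configuration` (no typing, no `default()`, no nesting):

```python
import copy
from typing import Any, Callable, Dict


class ToyConfiguration:
    """A dict-like configuration with named conditions (illustrative only)."""

    def __init__(self):
        self._params: Dict[str, Any] = {}
        self._conditions: Dict[str, Callable[["ToyConfiguration"], bool]] = {}

    def add(self, name: str, value: Any = None) -> None:
        self._params[name] = value

    def add_condition(self, condition, name: str) -> None:
        self._conditions[name] = condition

    def __getattr__(self, name):
        # Called only when normal lookup fails: fall back to the parameters
        params = self.__dict__.get("_params", {})
        if name in params:
            return params[name]
        raise AttributeError(name)

    def validate(self) -> None:
        # Run every condition and report all failures at once
        failed = [n for n, cond in self._conditions.items() if not cond(self)]
        if failed:
            raise ValueError(f"Failed conditions: {failed}")

    def search(self, predicate) -> Dict[str, Any]:
        return {k: v for k, v in self._params.items() if predicate(k, v)}

    def delta_copy(self, **kwargs) -> "ToyConfiguration":
        # Deep copy, then override only the given (existing) parameters
        new = copy.deepcopy(self)
        for key, value in kwargs.items():
            if key not in new._params:
                raise KeyError(key)
            new._params[key] = value
        return new


config = ToyConfiguration()
config.add("x", True)
config.add("y", True)
config.add_condition(lambda c: c.x == c.y, name="equal_bools")
config.validate()

broken = config.delta_copy(y=False)
try:
    broken.validate()
except ValueError as error:
    print(error)  # Failed conditions: ['equal_bools']
print(config.y)  # True: the original is untouched
```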

### An example

```python
class ConfigA(Configuration):

    @classmethod
    def default(
            cls
    ) -> 'ConfigA':
        config = super().default()

        config.add(name='x',
                   value=True,
                   type_hint=bool)
        config.add(name='y',
                   value=True,
                   type_hint=bool)

        config.add_condition(condition=lambda conf: conf.x == conf.y,
                             name='equal_bools')

        return config
```

### Nesting

A ``Configuration`` can also include another (or multiple) ``Configuration``.

```python
class ParentConfig(Configuration):

    @classmethod
    def default(
            cls
    ):
        config = super().default()

        config.add(name='x',
                   value=True,
                   type_hint=bool)
        config.add(name='y',
                   value=False,
                   type_hint=bool)
        # This assumes that ConfigA is registered with name='config' and namespace='testing'
        config.add(name='child',
                   value=RegistrationKey(name='config',
                                         namespace='testing')
                   )
        return config


class ConfigA(Configuration):
    ...
```

This allows us to define complex ``Component`` objects, such as

- A data-loading pipeline
- A data pre-processing pipeline
- A training routine
- ...

### An example

```python
from typing import List

class ProcessorPipelineConfig(Configuration):

    @classmethod
    def default(
            cls
    ):
        config = super().default()

        config.add(name='processors',
                   type_hint=List[RegistrationKey],
                   description='Processors to be executed',
                   )

        return config
```

```python
from typing import Any

from cinnamon.component import Component


class Processor(Component):

    def run(self,
            data: Any,
            is_training_data: bool = False):
        ...


class ProcessorPipeline(Component):

    def __init__(
        self,
        processors
    ):
        self.processors = processors

    def run(
            self,
            data: Any,
            is_training_data: bool = False
    ):
        # Each processor receives the output of the previous one
        for processor in self.processors:
            data = processor.run(data=data,
                                 is_training_data=is_training_data)
        return data
```
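To see the chaining in action without cinnamon, here is the same pipeline logic with plain classes and two hypothetical text processors (`StripProcessor` and `LowercaseProcessor` are made up for this example):

```python
from typing import Any, List


class StripProcessor:
    def run(self, data: Any, is_training_data: bool = False) -> Any:
        return [text.strip() for text in data]


class LowercaseProcessor:
    def run(self, data: Any, is_training_data: bool = False) -> Any:
        return [text.lower() for text in data]


class ProcessorPipeline:
    def __init__(self, processors: List[Any]):
        self.processors = processors

    def run(self, data: Any, is_training_data: bool = False) -> Any:
        # The output of one processor is the input of the next one
        for processor in self.processors:
            data = processor.run(data=data, is_training_data=is_training_data)
        return data


pipeline = ProcessorPipeline(processors=[StripProcessor(), LowercaseProcessor()])
print(pipeline.run(["  Hello World ", "GOOD movie"]))  # ['hello world', 'good movie']
```

In the cinnamon version, the `processors` list would instead be filled from the `RegistrationKey` entries of `ProcessorPipelineConfig`.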

### Variants

In many cases, you may need **multiple** ``Configuration`` instances to define different scenarios.

$\rightarrow$ we can specify ``Configuration`` **variants** easily!

### ``Parameter`` level variants

Simply list possible values via the ``Parameter.variants`` field.

```python
class ConfigA(Configuration):

    @classmethod
    def default(
            cls
    ) -> 'ConfigA':
        config = super().default()

        config.add(name='x',
                   value=True,
                   type_hint=bool,
                   variants=[False, True])    # <---
        config.add(name='y',
                   value=True,
                   type_hint=bool,
                   variants=[False, True])    # <---

        config.add_condition(condition=lambda conf: conf.x == conf.y)

        return config
```
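Conceptually, parameter-level variants span a Cartesian product, and conditions prune it. A plain-Python sketch of that idea (not cinnamon's code) for the `x`/`y` example above:

```python
from itertools import product

# Variants of each parameter, as declared via variants=[False, True] above
variants = {"x": [False, True], "y": [False, True]}


def condition(conf):
    # Same constraint as add_condition(lambda conf: conf.x == conf.y)
    return conf["x"] == conf["y"]


names = list(variants)
combinations = [dict(zip(names, values)) for values in product(*variants.values())]
valid = [c for c in combinations if condition(c)]
invalid = [c for c in combinations if not condition(c)]

print(len(combinations))  # 4
print(valid)  # [{'x': False, 'y': False}, {'x': True, 'y': True}]
```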

### Configuration level variants

We can explicitly define new ``Configuration`` variants via the ``register_method`` decorator.

```python
class ConfigA(Configuration):

    @classmethod
    def default(
            cls
    ) -> 'ConfigA':
        config = super().default()

        config.add(name='x',
                   value=True,
                   type_hint=bool)
        config.add(name='y',
                   value=True,
                   type_hint=bool)

        config.add_condition(condition=lambda conf: conf.x == conf.y)

        return config

    @classmethod
    @register_method(name='config',
                     namespace='showcase',
                     tags={'variant1'})
    def variant1(cls):
        config = cls.default()
        config.x = False
        config.y = False
        return config
```

### Variants and Nesting

Variants are a powerful tool since they are **compatible** with ``Configuration`` nesting!

Consider your complex pipeline: data-loading, pre-processing, model training, etc...

- You can write it as a combination of ``Configuration`` and ``Component`` classes
- You can define variants
- You can specify conditions $\rightarrow$ only **valid variants** are considered!
- You can register all possible valid variant combinations in one shot!

## An Example

```python
class ConfigA(Configuration):

    @classmethod
    def default(
            cls
    ) -> 'ConfigA':
        config = super().default()
        config.add(name='param_1',
                   value=True,
                   variants=[False, True])
        config.add(name='child',
                   value=RegistrationKey(name='config_b',
                                         namespace='testing'))
        return config
```

```python
class ConfigB(Configuration):

    @classmethod
    def default(
            cls
    ) -> 'ConfigB':
        config = super().default()
        config.add(name='param_1', value=1, variants=[1, 2])
        config.add(name='child',
                   value=RegistrationKey(name='config_c',
                                         namespace='testing'))
        return config
```

```python
class ConfigC(Configuration):

    @classmethod
    def default(
            cls
    ) -> 'ConfigC':
        config = super().default()
        config.add(name='param_1', value=False, variants=[False, True])
        return config
```

```python
if __name__ == '__main__':
    Registry.register_configuration(config_class=ConfigA,
                                    name='config_a',
                                    namespace='testing')
    Registry.register_configuration(config_class=ConfigB,
                                    name='config_b',
                                    namespace='testing')
    Registry.register_configuration(config_class=ConfigC,
                                    name='config_c',
                                    namespace='testing')
    # We'll see later what this does...
    valid_keys, invalid_keys = Registry.dag_resolution()
    print(valid_keys)
```

```text
name:config_a--tags:['child.child.param_1=False', 'child.param_1=1', 'param_1=False']--namespace:testing
name:config_a--tags:['child.child.param_1=False', 'child.param_1=1', 'param_1=True']--namespace:testing
name:config_a--tags:['child.child.param_1=False', 'child.param_1=2', 'param_1=False']--namespace:testing
name:config_a--tags:['child.child.param_1=False', 'child.param_1=2', 'param_1=True']--namespace:testing
name:config_a--tags:['child.child.param_1=True', 'child.param_1=1', 'param_1=False']--namespace:testing
name:config_a--tags:['child.child.param_1=True', 'child.param_1=1', 'param_1=True']--namespace:testing
name:config_a--tags:['child.child.param_1=True', 'child.param_1=2', 'param_1=False']--namespace:testing
name:config_a--tags:['child.child.param_1=True', 'child.param_1=2', 'param_1=True']--namespace:testing
```
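The eight keys above are the Cartesian product of the variants at each nesting level, with child parameters prefixed by their path. A toy re-creation in plain Python; it is not cinnamon's DAG resolution code and it ignores conditions:

```python
from itertools import product

# Toy description of the nesting above: each config has variants and an optional child
configs = {
    "config_a": {"variants": {"param_1": [False, True]}, "child": "config_b"},
    "config_b": {"variants": {"param_1": [1, 2]}, "child": "config_c"},
    "config_c": {"variants": {"param_1": [False, True]}, "child": None},
}


def expand(name, prefix=""):
    """Return every combination of variant tags for `name` and its descendants."""
    spec = configs[name]
    own = [
        [f"{prefix}{param}={value}" for param, value in zip(spec["variants"], values)]
        for values in product(*spec["variants"].values())
    ]
    if spec["child"] is None:
        return own
    children = expand(spec["child"], prefix=f"{prefix}child.")
    return [o + c for o, c in product(own, children)]


keys = sorted(f"name:config_a--tags:{sorted(tags)}--namespace:testing"
              for tags in expand("config_a"))
for key in keys:
    print(key)
```

Running it prints the same eight lines as the output above, in the same order.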

### Component

The ``Component`` is a simple interface that **doesn't define** any specific **behaviour**

- I really like freedom of choice
- You can define your preferred APIs since **you decide** how ``Component`` objects are nested

```python
class Component:

    @classmethod
    def build_component(cls, **kwargs):
        # kwargs: e.g., the registration key fields (name, tags, namespace)
        ...
```

### That's all you have to know about ``Component``!

I **don't** want to set up **yet another restrictive** Python library with all its interfaces...

- Define any kind of ``Component`` you want
- Wrap existing code logic into ``Component``
- Simply re-map your configuration to ``Configuration``
- Done!

## Registration

*How does it work?*

### Registration format

Right now, registration has **very few** requirements and moving parts

- For now, the ``Registry`` is just a simple Python dictionary: ``RegistrationKey: ConfigurationInfo``
- You need to **manually register** and **bind** ``Configuration`` via simple APIs
- You need to wrap configuration methods with ``register_method``, or custom registration functions with ``register``

## An example

Suppose the following **recommended** code organization:

```
    project_folder
        |
        |__ configurations
        |        |__ __init__.py
        |        |__ data_loader.py
        |
        |__ components
        |       |__ __init__.py
        |       |__ data_loader.py
        |
        |__ my_script.py
```

```python
from cinnamon.configuration import Configuration
# Assumes ExampleLoader is defined in components/data_loader.py
from components.data_loader import ExampleLoader


class ExampleLoaderConfig(Configuration):

    # The registration is automatically done by the Registry!
    @classmethod
    @register_method(name='data_loader',
                     tags={'custom'},
                     namespace='showcase',
                     component_class=ExampleLoader)
    def default(
            cls
    ):
        config = super().default()

        config.add(name='data_url',
                   value='http://ai.stanford.edu/...',
                   description='URL to dataset archive file')
        config.add(name='download_directory',
                   value='imdb',
                   description='Folder where the archive file is downloaded')
        config.add(name='download_filename',
                   value='imdb.tar.gz',
                   description='Name of the archive file')
        return config
```

```python
from cinnamon.configuration import Configuration
from cinnamon.registry import Registry
# Assumes ExampleLoader is defined in components/data_loader.py
from components.data_loader import ExampleLoader


class ExampleLoaderConfig(Configuration):

    # No decorator here: registration happens in register_loaders() below
    @classmethod
    def default(
            cls
    ):
        config = super().default()

        config.add(name='data_url',
                   value='http://ai.stanford.edu/...',
                   description='URL to dataset archive file')
        config.add(name='download_directory',
                   value='imdb',
                   description='Folder where the archive file is downloaded')
        config.add(name='download_filename',
                   value='imdb.tar.gz',
                   description='Name of the archive file')
        return config


@register
def register_loaders():
    Registry.register_configuration(config_class=ExampleLoaderConfig,
                                    name='data_loader',
                                    tags={'custom'},
                                    namespace='showcase',
                                    # binding!
                                    component_class=ExampleLoader)
```

## How does the Registration work?

- You register configurations via the APIs shown above (``register_method``, ``register``, ``Registry.register_configuration``)
- The ``Registry`` takes note of all registration calls and progressively builds a dependency DAG
- The DAG allows us to **dynamically** resolve conflicts
- The DAG allows us to **dynamically** expand configuration variants
- The DAG frees you from worrying about registration **ordering**!
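Why a DAG removes ordering constraints: once all dependencies are known, a topological sort decides the build order, no matter in which order keys were registered. A toy sketch using the standard library's `graphlib` (Python 3.9+); the dependency dict and the invalid-key rule are illustrative assumptions, not cinnamon internals:

```python
from graphlib import TopologicalSorter


def resolve(dependencies):
    """Return (build_order, invalid_keys) for {key: set of keys it depends on}."""
    registered = set(dependencies)
    # A key is invalid if it depends on something that was never registered
    invalid = {k for k, deps in dependencies.items() if not deps <= registered}
    # Invalidity propagates to every key that depends on an invalid key
    changed = True
    while changed:
        changed = False
        for k, deps in dependencies.items():
            if k not in invalid and deps & invalid:
                invalid.add(k)
                changed = True
    graph = {k: deps for k, deps in dependencies.items() if k not in invalid}
    # static_order() yields dependencies before dependents (raises CycleError on cycles)
    order = list(TopologicalSorter(graph).static_order())
    return order, invalid


deps = {
    "benchmark": {"data_loader", "model"},  # registered first, depends on later keys
    "model": set(),
    "data_loader": set(),
    "broken": {"not_registered"},
}
order, invalid = resolve(deps)
print(order.index("benchmark") > order.index("model"))  # True
print(invalid)  # {'broken'}
```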

### DAG Resolution

Once we are done with registration, we trigger the DAG resolution to check its correctness:

```python
valid_keys, invalid_keys = Registry.dag_resolution()
```

### Configuration validation

DAG resolution also validates ``Configuration`` instances to prune invalid configurations!

### Commands

That's **all you need to set up**, since ``cinnamon`` offers some **high-level APIs** to deal with registrations

```python
# my_script.py
from pathlib import Path

from cinnamon import Registry
from components.data_loader import ExampleLoader

if __name__ == '__main__':
    directory = Path(__file__).parent.parent.resolve()

    # This calls DAG resolution
    Registry.setup(directory=directory)

    # The rest of your code!
    loader = ExampleLoader.build_component(name='data_loader',
                                           tags={'custom'},
                                           namespace='showcase')
    ...
```

#### Behind the curtains

The ``Registry`` looks for all registration calls under ``directory``
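A naive sketch of such a discovery step: walk the directory and keep the Python files that mention a registration call. This text search is an assumption for illustration; cinnamon may discover and import modules differently (a real registry would then import the matches, e.g. via `importlib`):

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

# Hypothetical markers: strings whose presence suggests a module registers something
MARKERS: Tuple[str, ...] = ("register_method", "@register", "register_configuration")


def find_registration_modules(directory: Path,
                              markers: Tuple[str, ...] = MARKERS) -> List[Path]:
    """Return the .py files under `directory` that mention a registration call."""
    found = []
    for path in sorted(directory.rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        if any(marker in source for marker in markers):
            found.append(path)
    return found


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "configurations").mkdir()
    (root / "configurations" / "data_loader.py").write_text(
        "@register_method(name='data_loader')\n", encoding="utf-8")
    (root / "my_script.py").write_text("print('hello')\n", encoding="utf-8")
    result = [p.relative_to(root).as_posix()
              for p in find_registration_modules(root)]

print(result)  # ['configurations/data_loader.py']
```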

### External directories and namespaces

What if you want to use code from another project?

```python
# my_script.py
from pathlib import Path

from cinnamon import Registry

if __name__ == '__main__':
    directory = Path(__file__).parent.parent.resolve()
    external_dir = Path('path', 'to', 'external', 'dir')

    # This calls DAG resolution
    Registry.setup(directory=directory,
                   external_directories=[external_dir])

    # The rest of your code!
    loader = Registry.build_component(name='data_loader',
                                      namespace='external')
    ...
```

#### Limitations

Registration is **always done** at **runtime**!

Thus, **always** begin your script with ``Registry.setup(...)``

## How to check dynamically created RegistrationKeys?

``cinnamon`` also provides a few commands for checking keys and running components via command line!

### cmn-setup

The ``cmn-setup`` command is the console script version of ``Registry.setup()``.

```shell
cmn-setup --dir *main-directory* --ext *ext-directory-1* ...
```

<center>
<div>
<img src="../Images/Lecture-8/cmn-setup.png" width="1400"/>
</div>
</center>

You also get all valid and invalid configuration keys stored in .csv files for better readability.
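Keys shaped like `name:...--tags:...--namespace:...` become much easier to scan once split into columns. A sketch with the standard `csv` module; the column layout is an assumption, not necessarily the layout of the files `cmn-setup` writes:

```python
import csv
import io


def keys_to_csv(keys) -> str:
    """Write keys shaped like name:...--tags:...--namespace:... as CSV rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "tags", "namespace"])
    for key in keys:
        name, tags, namespace = key.split("--")
        writer.writerow([name.removeprefix("name:"),
                         tags.removeprefix("tags:"),
                         namespace.removeprefix("namespace:")])
    return buffer.getvalue()


csv_text = keys_to_csv(
    ["name:config_a--tags:['child.param_1=1', 'param_1=True']--namespace:testing"])
print(csv_text)
```

The tags column contains commas, so `csv.writer` quotes it automatically. `str.removeprefix` requires Python 3.9+.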

### cmn-run

The ``cmn-run`` command allows building and executing ``Component`` given a ``RegistrationKey``.

```python
import cinnamon.component
import cinnamon.configuration


class CustomRunnable(cinnamon.component.RunnableComponent):

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def run(
        self,
        config: cinnamon.configuration.Configuration
    ):
        print(f"Running this component with x={self.x} and y={self.y}")
        print(f"The configuration of the component is {config}")
```

<center>
<div>
<img src="../Images/Lecture-8/cmn-run.png" width="1400"/>
</div>
</center>

## Training an SVM model with deasy-learning

Enough showcasing! Let's see a practical example (*still a showcase ehehe...*)

#### Steps

- Data loading: IMDB dataset
- Preprocessor pipeline: some text normalization and tf-idf encoding
- Model: an SVM
- Routine: a train and test routine

### Data loading (Component)

Let's define a base ``DataLoader`` component.

```python
from pathlib import Path

import pandas as pd
from cinnamon.component import RunnableComponent


class IMDBLoader(RunnableComponent):

    def __init__(self,
                 download_directory: Path,
                 download_filename: str,
                 dataset_name: str,
                 download_url: str):
        ...

    def download(self, *args, **kwargs):
        ...

    def load_data(self, *args, **kwargs) -> pd.DataFrame:
        ...

    def get_splits(self, *args, **kwargs):
        ...

    def run(self, *args, **kwargs):
        return self.load_data()
```

```python
from pathlib import Path

from cinnamon.configuration import Configuration
from components.data_loader import IMDBLoader


class IMDBLoaderConfig(Configuration):

    @classmethod
    @register_method(name='data_loader',
                     tags={'imdb'},
                     namespace='examples',
                     component_class=IMDBLoader)
    def default(
            cls
    ):
        config = super().default()

        config.add(name='download_directory',
                   value=Path('...'),
                   description='Folder where the archive file is downloaded')
        config.add(name='download_filename',
                   value='imdb.tar.gz',
                   description='Name of the archive file')
        config.add(name='dataset_name',
                   value='dataset.csv',
                   description='.csv filename')
        config.add(name='download_url',
                   value='http://ai.stanford.edu/...',
                   description='URL to dataset archive file')

        return config
```

### Data loading (Testing)

Let's test our ``IMDBLoader``!

```python
from pathlib import Path

from cinnamon.registry import Registry
from components.data_loader import IMDBLoader

if __name__ == '__main__':
    """
    In this demo script, we retrieve and build our IMDB data loader.
    Once built, we run the data loader to load the IMDB dataset
    and print it for visualization purposes.
    """

    directory = Path(__file__).parent.parent.resolve()
    Registry.setup(directory=directory)

    loader = IMDBLoader.build_component(name='data_loader',
                                        tags={'imdb'},
                                        namespace='examples')
    df = loader.load_data()
    print(df)
```

### Data pre-processing (Component)

We now define how to convert our inputs into something our SVM model can digest.

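```python
from typing import Any, Optional

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from cinnamon.component import Component


class TfIdfProcessor(Component):

    def __init__(
            self,
            **kwargs
    ):
        self.vectorizer = TfidfVectorizer(**kwargs)

    def process(
            self,
            data: Optional[pd.DataFrame],
            is_training_data: bool = False,
    ) -> Optional[Any]:
        if data is None:
            return data

        # Fit the vocabulary on training data only
        if is_training_data:
            self.vectorizer.fit(data.x.values)

        return self.vectorizer.transform(data.x.values)
```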

```python
from typing import Any

from cinnamon.configuration import Configuration


class TfIdfProcessorConfig(Configuration):

    @classmethod
    @register_method(name='processor',
                     tags={'tf-idf'},
                     namespace='examples',
                     component_class=TfIdfProcessor)
    def default(
            cls
    ):
        config = super().default()

        config.add(name='ngram_range',
                   value=(1, 1),
                   type_hint=Any,
                   description='Vectorizer ngram_range')

        return config
```

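```python
from typing import Any, Optional

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from cinnamon.component import Component


class LabelProcessor(Component):

    def __init__(
            self
    ):
        self.label_encoder = LabelEncoder()

    def process(
            self,
            data: Optional[pd.DataFrame],
            is_training_data: bool = False
    ) -> Optional[Any]:
        if data is None:
            return data

        labels = data.y.values
        # Fit the label mapping on training data only
        if is_training_data:
            self.label_encoder.fit(labels)

        return self.label_encoder.transform(labels)
```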

```python
@register
def register_processors():
    # LabelProcessor takes no parameters, so the base Configuration suffices
    Registry.register_configuration(config_class=Configuration,
                                    component_class=LabelProcessor,
                                    name='processor',
                                    tags={'label'},
                                    namespace='examples')
```
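Why `is_training_data` matters: the mapping must be learned on the training split and then reused, unchanged, on validation and test data. A dependency-free toy processor showing this contract; `ToyLabelProcessor` is hypothetical, and its sorted class order mirrors scikit-learn's `LabelEncoder`, which stores sorted unique classes:

```python
from typing import Dict, List, Optional


class ToyLabelProcessor:
    """Maps string labels to integers; the mapping is learned on training data only."""

    def __init__(self):
        self.mapping: Optional[Dict[str, int]] = None

    def process(self, labels: List[str], is_training_data: bool = False) -> List[int]:
        if is_training_data:
            # Fit: classes are sorted, as in sklearn's LabelEncoder
            self.mapping = {label: i for i, label in enumerate(sorted(set(labels)))}
        if self.mapping is None:
            raise RuntimeError("Process the training split with is_training_data=True first")
        # Unseen labels raise KeyError, similar to LabelEncoder raising on unknown classes
        return [self.mapping[label] for label in labels]


processor = ToyLabelProcessor()
print(processor.process(["pos", "neg", "pos"], is_training_data=True))  # [1, 0, 1]
print(processor.process(["neg", "pos"]))  # [0, 1]
```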

### Modeling (Component)

We are now ready to define our SVM ``Component`` wrapper!

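```python
from typing import Any, Dict

from cinnamon.component import Component


class SVCModel(Component):

    def __init__(self, C: float, kernel: str, class_weight: str):
        # Parameters mirror SVCModelConfig (C, kernel, class_weight)
        ...

    def fit(self,
            x_train: Any,
            y_train: Any,
            x_val: Any = None,
            y_val: Any = None):
        ...

    def evaluate(self, x: Any, y: Any) -> Dict[str, float]:
        ...

    def predict(self, x: Any) -> Any:
        ...
```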

```python
class SVCModelConfig(Configuration):

    @classmethod
    @register_method(name='model',
                     tags={'svc'},
                     namespace='examples',
                     component_class=SVCModel)
    def default(
            cls
    ):
        config = super().default()

        config.add(name='C',
                   value=1.0,
                   description='C parameter of SVC')
        config.add(name='kernel',
                   value='linear',
                   description='The kernel of the SVC')
        config.add(name='class_weight',
                   value='balanced',
                   description='Technique for class imbalance')

        return config
```

### Benchmark

The ``Benchmark`` is our task pipeline: from data loading to model training and evaluation

```python
from cinnamon.component import RunnableComponent


class SVCBenchmark(RunnableComponent):

    def __init__(self,
                 data_loader: IMDBLoader,
                 model: SVCModel,
                 text_processor: TfIdfProcessor,
                 label_processor: LabelProcessor):
        self.data_loader = data_loader
        self.model = model
        self.text_processor = text_processor
        self.label_processor = label_processor

    def run(self, *args, **kwargs):
        train_df, val_df, test_df = self.data_loader.get_splits()

        # Processors are fitted on the training split only
        x_train = self.text_processor.process(data=train_df, is_training_data=True)
        y_train = self.label_processor.process(data=train_df, is_training_data=True)

        x_val = self.text_processor.process(data=val_df)
        y_val = self.label_processor.process(data=val_df)

        x_test = self.text_processor.process(data=test_df)
        y_test = self.label_processor.process(data=test_df)

        train_info, val_info = self.model.fit(x_train=x_train,
                                              y_train=y_train,
                                              x_val=x_val,
                                              y_val=y_val)
        test_info = self.model.evaluate(x=x_test, y=y_test)
        return train_info, val_info, test_info
```

```python
class SVCBenchmarkConfig(Configuration):

    @classmethod
    @register_method(name='benchmark',
                     tags={'svc'},
                     namespace='examples',
                     component_class=SVCBenchmark)
    def default(cls):
        config = super().default()

        config.add(name='data_loader',
                   value=RegistrationKey(name='data_loader',
                                         tags={'imdb'},
                                         namespace='examples'))
        config.add(name='text_processor',
                   value=RegistrationKey(name='processor',
                                         tags={'tf-idf'},
                                         namespace='examples'))
        config.add(name='label_processor',
                   value=RegistrationKey(name='processor',
                                         tags={'label'},
                                         namespace='examples'))
        config.add(name='model',
                   value=RegistrationKey(name='model',
                                         tags={'svc'},
                                         namespace='examples'))
        return config
```
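The benchmark configuration holds keys, not objects. One plausible way to turn it into a component is to build every key-valued parameter first, depth-first. A toy sketch of that idea in plain Python; `Key`, `register`, `build`, and the toy components are all hypothetical, not cinnamon's API:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Key:
    name: str
    tags: FrozenSet[str]
    namespace: str


CONFIGS: Dict[Key, Callable[[], Dict[str, Any]]] = {}
COMPONENTS: Dict[Key, type] = {}


def register(key: Key, config_factory, component_class) -> None:
    CONFIGS[key] = config_factory
    COMPONENTS[key] = component_class


def build(key: Key):
    params = CONFIGS[key]()
    # Parameters whose value is itself a key are built first (depth-first)
    resolved = {k: build(v) if isinstance(v, Key) else v for k, v in params.items()}
    return COMPONENTS[key](**resolved)


class Loader:
    def __init__(self, path):
        self.path = path


class Model:
    def __init__(self, C):
        self.C = C


class Benchmark:
    def __init__(self, data_loader, model):
        self.data_loader = data_loader
        self.model = model


loader_key = Key("data_loader", frozenset({"imdb"}), "examples")
model_key = Key("model", frozenset({"svc"}), "examples")
bench_key = Key("benchmark", frozenset({"svc"}), "examples")
register(loader_key, lambda: {"path": "imdb"}, Loader)
register(model_key, lambda: {"C": 1.0}, Model)
register(bench_key, lambda: {"data_loader": loader_key, "model": model_key}, Benchmark)

benchmark = build(bench_key)
print(type(benchmark.data_loader).__name__, benchmark.model.C)  # Loader 1.0
```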

### Benchmark (Testing)

Let's test our **whole** pipeline in a **single shot**!

<center>
<div>
<img src="../Images/Lecture-8/benchmark-I.png" width="1400"/>
</div>
</center>

<center>
<div>
<img src="../Images/Lecture-8/benchmark-II.png" width="1400"/>
</div>
</center>

## The End (really!)

- A quick overview of ``cinnamon``, yet a pretty exhaustive one! $\rightarrow$ simplicity!

- A ``Configuration`` stores all your hyper-parameters

- A ``Component`` defines the core logic

- The ``Registry`` stores ``Configuration`` to ``Component`` bindings to de-couple them for quick re-use and readability