<a href="https://colab.research.google.com/github/rama100/python-notebooks/blob/main/storing_related_variables.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# To store related variables (e.g., hyperparameter configs), don't use dictionaries

**Author**: [Rama Ramakrishnan](http://ramakrishnan.com)

## Introduction


Sometimes we want to keep a group of variables together.

A common use-case in Machine Learning/Deep Learning is storing and accessing 
configuration settings (a.k.a. "configs") when training neural networks e.g.,

```
num_layers = 5
num_hidden_units = 3
activation = 'relu'
dropout = 0.1
```

Python dictionaries are a natural way to store these name-value pairs.

In [86]:
config = dict(num_layers = 5,
              num_hidden_units = 3,
              activation = 'relu',
              dropout = 0.1)

To access the info stored in `config`, we can do the usual thing.

In [89]:
config['num_layers']

5

This works but:
* having to type those quotes (single or double) is annoying.
* **tab-completion** doesn't work.

Both of these annoyances go away if we can use object.attribute **dot** notation, e.g., `config.num_layers` instead of `config['num_layers']`.



There are many ways to do this.

Here are three that I am familiar with:
* use a `class`
* use a `namedtuple`
* use a `dataclass`

Let's look at a quick example of how each can be used.

## Solution 1: Use a `class`

Defining a simple `class` and making each variable an attribute is probably the first thing that comes to mind, if you are familiar with Object-Oriented Programming.

In [90]:
class Config:
    def __init__(self, num_layers, num_hidden_units, activation, dropout):
        self.num_layers = num_layers
        self.num_hidden_units = num_hidden_units
        self.activation = activation
        self.dropout = dropout

With the `Config` class defined, we can create as many configs as we want.

In [111]:
base_config = Config(5, 10, 'relu', 0.1)

In [112]:
new_config = Config(7, 12, 'tanh', 0.2)

We can access the attributes using dot notation.

In [113]:
base_config.activation

'relu'

In [114]:
new_config.activation

'tanh'

In [116]:
base_config.num_hidden_units

10

Tab-completion works too! You can check for yourself.

(I will confess that I have an unhealthy love for tab completion :))

Another benefit: we can define the class with **defaults** for the variables.

In [126]:
class Config:
    def __init__(self, num_layers, num_hidden_units, activation="relu", dropout=0.1):
        self.num_layers = num_layers
        self.num_hidden_units = num_hidden_units
        self.activation = activation
        self.dropout = dropout

In [127]:
config = Config(5, 10)

In [128]:
config.activation

'relu'

Of course, since this is a full-fledged `class`, we can add methods etc. In short, lots of flexibility.

One annoyance: we have to write that boilerplate `__init__` method and repeat each variable name (e.g., `num_layers`) three times!

Another annoyance: we don't get nice output if we print it.

In [129]:
print(config)

<__main__.Config object at 0x10977aca0>


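We don't get other useful methods out of the box either. For example, we can't quickly check if two configs are the same: by default, `==` on a plain class compares object identity, so two separate configs holding identical values are not considered equal.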


For all this, we will need to write some boilerplate code, which is a pain.
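To make the pain concrete, here is a minimal sketch of the kind of boilerplate a plain class would need for readable printing and value-based equality. The `__repr__` and `__eq__` bodies are just one reasonable way to write them, not the only one.

```python
class Config:
    def __init__(self, num_layers, num_hidden_units, activation="relu", dropout=0.1):
        self.num_layers = num_layers
        self.num_hidden_units = num_hidden_units
        self.activation = activation
        self.dropout = dropout

    def __repr__(self):
        # Build "Config(name=value, ...)" from the instance attributes
        fields = ", ".join(f"{name}={value!r}" for name, value in vars(self).items())
        return f"{type(self).__name__}({fields})"

    def __eq__(self, other):
        # Two configs are equal if all their attributes match
        if not isinstance(other, Config):
            return NotImplemented
        return vars(self) == vars(other)


config = Config(5, 10)
print(config)                   # Config(num_layers=5, num_hidden_units=10, activation='relu', dropout=0.1)
print(config == Config(5, 10))  # True
```

That is a fair amount of extra code for a simple bag of settings, and it has to be kept in sync whenever a variable is added.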

## Solution 2: Use a `namedtuple`

In [132]:
from collections import namedtuple

In [133]:
Config = namedtuple('Config','num_layers num_hidden_units activation dropout')

In [134]:
config = Config(5,10,'relu',0.1)

As before, we can access all the variables using dot notation, and you can check for yourself that tab-completion works.

In [135]:
config.activation

'relu'

In [136]:
config.num_hidden_units

10

We get nice output if we print it.

In [137]:
print(config)

Config(num_layers=5, num_hidden_units=10, activation='relu', dropout=0.1)


Since a `namedtuple` is a subclass of `tuple`, we can also access its elements by indexing if we want.

In [141]:
config[2]

'relu'

`namedtuple` does have a limitation. It is **immutable** (like a `tuple`).

Once created, it can't be changed. If you try to, you will get an error message.

For example:

In [125]:
config.activation = "tanh"

AttributeError: can't set attribute


In practice, if an element of a `namedtuple` is something you want to gradually build (say, a `list`), you should build the thing completely first and then create a `namedtuple` with it.
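Immutability doesn't stop you from deriving a modified *copy*. As a small sketch: the `_replace` method returns a new `namedtuple` with some fields changed, and the `defaults` argument (available from Python 3.7) supplies default values for the rightmost fields. The names `base_config` and `new_config` below are just for illustration.

```python
from collections import namedtuple

# defaults are applied to the rightmost fields: activation and dropout
Config = namedtuple('Config',
                    'num_layers num_hidden_units activation dropout',
                    defaults=['relu', 0.1])

base_config = Config(5, 10)

# _replace leaves base_config untouched and returns a new Config
new_config = base_config._replace(activation='tanh')

print(base_config)  # Config(num_layers=5, num_hidden_units=10, activation='relu', dropout=0.1)
print(new_config)   # Config(num_layers=5, num_hidden_units=10, activation='tanh', dropout=0.1)
```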

## Solution 3: Use `dataclasses`

`dataclasses` (added to the standard library in Python 3.7) are the newest game in town and they *are* cool.

In [102]:
from dataclasses import dataclass

We do the necessary import and then define a class like we normally would, but in a much more compact way.

In [144]:
@dataclass     # notice the decorator
class Config:
    num_layers: int
    num_hidden_units: int
    activation: str
    dropout: float

Notice the `@dataclass` decorator.

Nice positive: each variable name has to be written just once.

We do need to provide type hints. Is that a positive or a negative? You can decide :).
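One thing worth knowing while you decide: the type hints are not enforced at runtime. The small sketch below uses a hypothetical `HintedConfig` class (so it doesn't clash with `Config`) to show that mismatched values are accepted without complaint. Catching them requires a separate type checker or your own validation.

```python
from dataclasses import dataclass

@dataclass
class HintedConfig:   # hypothetical name, for illustration only
    num_layers: int
    dropout: float

# No error is raised even though neither value matches its hint
odd_config = HintedConfig(num_layers="five", dropout="high")
print(odd_config)  # HintedConfig(num_layers='five', dropout='high')
```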


As before, we can access all the variables using dot notation, and you can check for yourself that tab-completion works.

In [145]:
config = Config(7,12,'sigmoid',0.1)

In [146]:
config.activation

'sigmoid'

In [147]:
config.num_layers

7

In [110]:
config.num_hidden_units

12

Unlike a `namedtuple`, a `dataclass` is mutable, so you get more flexibility (but you can switch mutability off by passing `frozen=True` to the decorator - see the documentation).
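Here is a quick sketch of what switching mutability off looks like. It uses a hypothetical `FrozenConfig` class so the mutable `Config` above is left alone. Assigning to an attribute raises `FrozenInstanceError`, and `dataclasses.replace` gives you a modified copy instead.

```python
from dataclasses import dataclass, replace, FrozenInstanceError

@dataclass(frozen=True)   # frozen=True makes instances immutable
class FrozenConfig:       # hypothetical name, for illustration only
    num_layers: int
    num_hidden_units: int
    activation: str
    dropout: float

frozen_config = FrozenConfig(7, 12, 'sigmoid', 0.1)

try:
    frozen_config.activation = 'tanh'
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True

print(assignment_blocked)  # True

# replace() returns a new instance with the given fields changed
tanh_config = replace(frozen_config, activation='tanh')
print(tanh_config)  # FrozenConfig(num_layers=7, num_hidden_units=12, activation='tanh', dropout=0.1)
```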

Since it is a regular `class`, we can set **defaults** for the variables as well.

In [148]:
@dataclass
class Config:
    num_layers: int
    num_hidden_units: int
    activation: str = "relu"
    dropout: float = 0.1

In [149]:
config = Config(7,12)

You get a nice, descriptive output when you print.

In [150]:
print(config)

Config(num_layers=7, num_hidden_units=12, activation='relu', dropout=0.1)


You get some other goodies for free e.g., checking equality.

In [153]:
new_config = Config(7,12)

config == new_config

True
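If you already have configs sitting in dictionaries, moving between the two forms is painless. As a sketch (the `settings` dictionary and its values are made up for illustration): unpack a dictionary into the constructor with `**`, and use `dataclasses.asdict` to go back the other way.

```python
from dataclasses import dataclass, asdict

@dataclass
class Config:
    num_layers: int
    num_hidden_units: int
    activation: str = "relu"
    dropout: float = 0.1

# hypothetical settings; any key that isn't a field of Config raises a TypeError
settings = {"num_layers": 7, "num_hidden_units": 12, "dropout": 0.2}

config_from_dict = Config(**settings)
print(config_from_dict)          # Config(num_layers=7, num_hidden_units=12, activation='relu', dropout=0.2)
print(asdict(config_from_dict))  # {'num_layers': 7, 'num_hidden_units': 12, 'activation': 'relu', 'dropout': 0.2}
```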

## What I use

I used `namedtuple`s a lot in the past but have switched to `dataclass` as my default choice now.


### Next steps

We have just scratched the surface of what's possible. Please see the [official Python documentation](https://docs.python.org/) to learn more.