In [None]:
from pathlib import Path
from pydantic import BaseModel, root_validator
from typing import Optional
import ray
from ray import tune


Larger projects typically accumulate a lot of parameters. Hardcoding them somewhere may seem like the fastest way, but it is not a sound long-term strategy.

Especially when doing experiments with machine learning, we will want to have everything in one place, and ideally we want to have checks in place.

This document explores how you can make more advanced pydantic settings, even for more complex parameters like ray search spaces.

To start naively, we could just make a config like this:

In [None]:
config = {"input_size": 3, "output_size": 20, "data_dir": Path(".")}


In [None]:
config["input_size"]
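A plain dict performs no checks of its own. The sketch below, using only the standard library, shows three ways this can go wrong without any warning; the name `naive_config` and the misspelled key are made up for illustration.

```python
from pathlib import Path

naive_config = {"input_size": 3, "output_size": 20, "data_dir": Path(".")}

# A value of the wrong type is stored without complaint.
naive_config["input_size"] = "three"

# A misspelled key silently creates a new entry instead of updating the old one.
naive_config["ouput_size"] = 10
print(naive_config["output_size"])  # still 20

# A missing key only fails at the moment it is read.
try:
    naive_config["hidden_size"]
except KeyError as e:
    print(f"missing key: {e}")
```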

While this will go a long way, there are some horrors hidden deep inside Python.

In [None]:
from dataclasses import dataclass

@dataclass
class MyClass:
    mutable_attr = []

# Create two instances
instance1 = MyClass()
instance2 = MyClass()

# Append to the list in instance1
instance1.mutable_attr.append('Hello')

print(instance1.mutable_attr)  # prints ['Hello']
print(instance2.mutable_attr)  # also prints ['Hello']. Wait, what?
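What happens here is that `mutable_attr` has no type annotation, so `@dataclass` does not register it as a field; it stays an ordinary class attribute that all instances share. (With an annotation, as in `mutable_attr: list = []`, the dataclass decorator would refuse the mutable default with a `ValueError`.) The following minimal sketch makes this visible and shows the standard-library remedy, `field(default_factory=list)`. The class names `SharedDefault` and `SeparateDefault` are made up for this illustration.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class SharedDefault:
    # No annotation: this is a plain class attribute, not a dataclass field.
    mutable_attr = []


@dataclass
class SeparateDefault:
    # Annotated field with a factory: each instance gets its own new list.
    mutable_attr: List[str] = field(default_factory=list)


a, b = SharedDefault(), SharedDefault()
a.mutable_attr.append("Hello")
print(fields(SharedDefault))             # empty tuple: no fields were registered
print(a.mutable_attr is b.mutable_attr)  # True: both point at the class attribute

c, d = SeparateDefault(), SeparateDefault()
c.mutable_attr.append("Hello")
print(d.mutable_attr)                    # []
```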

Every programmer should get nightmares from this, because this is absolutely not what you would expect.
Luckily, pydantic is there to save the day.

![img](python.PNG)

In [None]:
from pydantic import BaseModel
from typing import List

class TrainerSettings(BaseModel):
    mutable_attr: List = []

# Create two settings instances
settings1 = TrainerSettings()
settings2 = TrainerSettings()

# Append to the list in settings1
settings1.mutable_attr.append("Hello")

print(settings1.mutable_attr)  # prints ["Hello"]
print(settings2.mutable_attr)  # prints []


But the protection against shared mutable defaults is just one advantage. We can get a config on steroids with pydantic without too much extra effort:

In [None]:
class SearchSpace(BaseModel):
    input_size: int
    output_size: int
    tune_dir: Optional[Path]
    data_dir: Path

config = SearchSpace(input_size=3.0, output_size=20, tune_dir=None, data_dir=".")  # <- string goes in here
config  # <- and is automatically cast to a Path here

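Note how the `"."` passed as `data_dir` automatically becomes a `Path` (a `PosixPath` on Linux and macOS), even though we provided the argument as a string.

Note how `Optional` allows for leaving the argument out, in which case the value defaults to `None`. This implicit default is pydantic v1 behavior; pydantic v2 requires an explicit `= None`.

If possible, it will cast all elements, e.g. even `input_size="3"` becomes an integer.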
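Since the implicit `None` default depends on the pydantic version, it can be safer to spell the default out. The sketch below should behave the same on pydantic v1 and v2; `PathSettings` is a made-up model used only for this illustration.

```python
from pathlib import Path
from typing import Optional

from pydantic import BaseModel


class PathSettings(BaseModel):
    # Explicit default: the field may be omitted in pydantic v1 and v2 alike.
    tune_dir: Optional[Path] = None
    data_dir: Path


settings = PathSettings(data_dir=".")
print(settings.tune_dir)        # None
print(type(settings.data_dir))  # a concrete Path subclass for the current OS
```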

In [None]:
config = SearchSpace(input_size="3", output_size=20, tune_dir=None, data_dir=".")
config.input_size


In [None]:
type(config.input_size) == int

And if you try to give `data_dir` something that can't be cast to a `Path`, you will get an error.
The advantage is that you get your errors at the place where you make them, and not ten steps later when running the training loop.

In [None]:
try:
    config = SearchSpace(input_size="3", output_size=20, tune_dir=None, data_dir=3.4)
except ValueError as e:
    print(e)
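Besides the printed message, the exception also carries structured details, and pydantic reports all failing fields at once rather than stopping at the first. A minimal sketch, assuming a made-up model `DemoSettings` with two fields that both receive invalid values:

```python
from pathlib import Path

from pydantic import BaseModel, ValidationError


class DemoSettings(BaseModel):
    input_size: int
    data_dir: Path


failed_fields = []
try:
    DemoSettings(input_size="abc", data_dir=3.4)
except ValidationError as e:
    # errors() returns one dict per failing field; "loc" holds the field path.
    failed_fields = sorted(err["loc"][0] for err in e.errors())

print(failed_fields)
```

Here `failed_fields` should list both `data_dir` and `input_size`, which is handy when a config has many parameters and you want to fix all mistakes in one go.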


Let's try to add the ray.tune ranges. We will need these later on when hypertuning.
You don't have to understand this now, but it provides us with a range of possible parameters, in this case a uniform distribution of numbers between 0.0 and 10.0.

To find out what the type is, we simply call the `type()` function.

In [None]:
type(1.0)

In [None]:
type(tune.uniform(0.0, 10.0))


This is a uniform distribution, which Ray will use to search for optimal parameters.

But if we simply add that like this:
```python
class SearchSpace(BaseModel):
    input_size: int
    hidden_size: Union[int, SAMPLE_INT]
    output_size: int = 20
    tune_dir: Path = Path(".")
    data_dir: Path
```

pydantic will crash, complaining that it does not know how to validate the type. A simple solution is to set `arbitrary_types_allowed` in the model's `Config`:

In [None]:
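from typing import Union, Optional, Dict

# Type of the integer search spaces that ray.tune returns (e.g. from tune.randint)
SAMPLE_INT = ray.tune.search.sample.Integer

class SearchSpace(BaseModel):
    input_size: int
    hidden_size: Union[int, SAMPLE_INT]
    output_size: int = 20
    # Defaults are not validated unless configured, so give a real Path, not a str
    tune_dir: Path = Path(".")
    data_dir: Path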

    class Config:
        arbitrary_types_allowed = True


config = SearchSpace(input_size=3, hidden_size=32, data_dir=".")
config


Because of the `Union`, a plain integer is accepted.

In [None]:
config = SearchSpace(input_size=3, hidden_size=tune.randint(16, 128), data_dir=".")
config


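A `tune.randint` works as well, because it returns an `Integer` search space.

But a `tune.uniform` fails, because it returns a `Float` search space, which is neither an `int` nor an `Integer`. Exactly what we need!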

In [None]:
try:
    config = SearchSpace(input_size=3, hidden_size=tune.uniform(0.0, 0.5), data_dir=".")
except Exception as e:
    print(e)
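The reason this works is that, with `arbitrary_types_allowed`, pydantic checks unknown types with a plain `isinstance` test. The sketch below reproduces that behavior without needing ray installed: `FakeInteger` and `FakeFloat` are hypothetical stand-ins for the search-space classes, and `DemoSpace` is a made-up model. It uses the v1-style `class Config`, as in the cells above; pydantic v2 should still accept it but emits a deprecation warning.

```python
from typing import Union

from pydantic import BaseModel, ValidationError


class FakeInteger:
    """Hypothetical stand-in for an integer search space such as tune.randint."""


class FakeFloat:
    """Hypothetical stand-in for a float search space such as tune.uniform."""


class DemoSpace(BaseModel):
    hidden_size: Union[int, FakeInteger]

    class Config:
        # v1-style config, matching the notebook; pydantic v2 prefers model_config.
        arbitrary_types_allowed = True


accepted = DemoSpace(hidden_size=FakeInteger())
print(type(accepted.hidden_size).__name__)  # FakeInteger

rejected = False
try:
    DemoSpace(hidden_size=FakeFloat())
except ValidationError:
    rejected = True
print(rejected)  # True
```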
