# Fiddle Basic API

*Please see https://github.com/google/fiddle/blob/main/docs/colabs.md for other colabs.*

This colab walks through the core Fiddle abstractions and how they work.

In [None]:
import copy
import dataclasses
from typing import List

from IPython.display import display, HTML
!pip install fiddle


import fiddle as fdl

## `fdl.Config`

The most commonly used Fiddle API is the `fdl.Config` type. A `fdl.Config`
represents a deferred call to a function or class. (For classes, calling the
class creates a class instance.)

In [None]:
@dataclasses.dataclass
class Foo:
  a: int


config = fdl.Config(Foo, a=2)
fdl.build(config)

`fdl.Config` stores two key things:

*   A reference to the function or class to call
*   Arguments to call it with

The key affordance of Fiddle is that the arguments are mutable, and can
themselves contain sub-configuration. Let's first look at the mutability:

In [None]:
config.a = 3
fdl.build(config)

These arguments can be retrieved as well. For example, you can double a value in
an existing configuration like so,

In [None]:
config.a = 2 * config.a
fdl.build(config)

### Nested configuration

Let's look at the case when arguments themselves contain sub-configs,

In [None]:
@dataclasses.dataclass
class Bar:
  x: Foo
  y: List[Foo]


foo_config = fdl.Config(Foo, a=1)
config = fdl.Config(Bar, foo_config, [foo_config])
bar = fdl.build(config)
bar

One key thing that Fiddle provides is the ability to have shared instances. Each
`fdl.Config` object corresponds to a single call during `build`. So, in the
above case, the `Foo` objects are actually the same,

In [None]:
bar.x is bar.y[0]

If you don't want these to be the same instance, you can copy `foo_config`;
please see the "Copying configuration" section below.
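
To build intuition for why one config object yields one built instance, here is a minimal plain-Python sketch. `ToyConfig` and `toy_build` are hypothetical names invented for this illustration and are not Fiddle's implementation; the sketch only assumes that a build step remembers which config nodes it has already called.

```python
import copy
import dataclasses
from typing import Any, Callable, Dict, Optional


@dataclasses.dataclass
class Foo:
  a: int


class ToyConfig:
  """Hypothetical stand-in for a config node: a callable plus its kwargs."""

  def __init__(self, fn: Callable[..., Any], **kwargs: Any):
    self.fn = fn
    self.kwargs = kwargs


def toy_build(node: Any, memo: Optional[Dict[int, Any]] = None) -> Any:
  """Builds nested ToyConfigs, calling each distinct node exactly once."""
  memo = {} if memo is None else memo
  if isinstance(node, ToyConfig):
    if id(node) not in memo:  # First visit: build the arguments, then call.
      built = {k: toy_build(v, memo) for k, v in node.kwargs.items()}
      memo[id(node)] = node.fn(**built)
    return memo[id(node)]
  if isinstance(node, (list, tuple)):
    return type(node)(toy_build(v, memo) for v in node)
  return node


foo_node = ToyConfig(Foo, a=1)
# The same node appears twice, so both positions get the same instance.
shared = toy_build(ToyConfig(dict, x=foo_node, y=[foo_node]))
# A copied node is a distinct object, so it is built separately.
separate = toy_build(ToyConfig(dict, x=foo_node, y=[copy.copy(foo_node)]))
```

In this toy model `shared["x"] is shared["y"][0]` holds, while `separate["x"]` and `separate["y"][0]` are equal but distinct objects.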

### Setting and retrieving parameters

Setting an unknown attribute will raise an error,

In [None]:
try:
  foo_config.qux = 4
except AttributeError as e:
  display(HTML(f'<span style="color: red">AttributeError: {e}</span>'))
else:
  raise AssertionError("This should raise an error!")

Configuring `**kwargs` arguments is supported,

In [None]:
def args_and_kwargs(a, *args, **kwargs):
  return a, args, kwargs


config = fdl.Config(args_and_kwargs, a=4, b=12)
fdl.build(config)
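
One way to picture why `qux` is rejected for `Foo` while `b` is accepted for `args_and_kwargs` is to look at the callable's signature. The sketch below uses only the standard `inspect` module; `accepts_keyword` is a hypothetical helper written for this illustration and is not how Fiddle validates parameter names.

```python
import dataclasses
import inspect
from typing import Any, Callable


@dataclasses.dataclass
class Foo:
  a: int


def args_and_kwargs(a, *args, **kwargs):
  return a, args, kwargs


def accepts_keyword(fn: Callable[..., Any], name: str) -> bool:
  """Returns True if `fn` can be called with `name` passed as a keyword."""
  params = inspect.signature(fn).parameters
  keyword_kinds = (
      inspect.Parameter.POSITIONAL_OR_KEYWORD,
      inspect.Parameter.KEYWORD_ONLY,
  )
  param = params.get(name)
  if param is not None and param.kind in keyword_kinds:
    return True
  # Any other name is only acceptable if the callable declares **kwargs.
  return any(
      p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


foo_accepts_qux = accepts_keyword(Foo, "qux")
kwargs_fn_accepts_b = accepts_keyword(args_and_kwargs, "b")
```

Here `foo_accepts_qux` is `False` because `Foo` only declares `a`, and `kwargs_fn_accepts_b` is `True` because `**kwargs` absorbs any extra keyword.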

### Positional Arguments

If the class/function has positional arguments, they can be accessed with the `[]` syntax, just like the items of a normal Python `list`.

When reading the config, an argument without an assigned value returns its default value. If no default exists either, the special sentinel `fdl.NO_VALUE` is returned.

`fdl.VARARGS` is a special value that represents the start index of variadic positional arguments (`*args`). See the example below.

In [None]:
def positional_fn(a, b='b', /, c='c', *args):
  return locals()

p_config = fdl.Config(positional_fn, 1, 2, 3, 4, 5)

# Read
assert p_config[0] == 1
assert p_config[:] == [1, 2, 3, 4, 5]

# Modify
p_config[0] = 'a'
p_config.c = 'c'

# `fdl.VARARGS` represents the start index of variadic positional args (*args)
p_config[fdl.VARARGS:] = ['x', 'y']
assert p_config[:] == ['a', 2, 'c', 'x', 'y']

# Delete
del p_config[fdl.VARARGS:]
assert p_config[:] == ['a', 2, 'c']

del p_config[0]
assert p_config[:] == [fdl.NO_VALUE, 2, 'c']

del p_config[1]
assert p_config[:] == [fdl.NO_VALUE, 'b', 'c']
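
To see which index the variadic arguments start at for `positional_fn`, it can help to inspect the signature with the standard `inspect` module. This is only an illustration of the parameter kinds involved; `varargs_start` is a hypothetical helper, and `inspect.Parameter.empty` is merely loosely analogous to `fdl.NO_VALUE`.

```python
import inspect


def positional_fn(a, b='b', /, c='c', *args):
  return locals()


def varargs_start(fn) -> int:
  """Counts the positional parameters that precede `*args`."""
  positional_kinds = (
      inspect.Parameter.POSITIONAL_ONLY,
      inspect.Parameter.POSITIONAL_OR_KEYWORD,
  )
  params = inspect.signature(fn).parameters.values()
  return sum(p.kind in positional_kinds for p in params)


params = inspect.signature(positional_fn).parameters
kinds = {name: p.kind.name for name, p in params.items()}
start = varargs_start(positional_fn)
# `a` has no default, which `inspect` marks with its own sentinel.
a_has_no_default = params["a"].default is inspect.Parameter.empty
```

For this function `start` is 3: `a` and `b` are positional-only, `c` is positional-or-keyword, and everything from index 3 onward belongs to `*args`.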

NOTE: When a `slice` is used as the key to modify the config and the slice covers positional-only or positional-or-keyword arguments, the assigned value must have the same length as the slice range.

In [None]:
assert len(p_config[:]) == 3

# slice `[:]` represents indices [0, 1, 2] here, and it spans over non-variadic
# positional arguments. But the value has length of 6. So this operation is
# illegal.
try:
  p_config[:] = [1, 2, 3, 4, 5, 6]
except ValueError as e:
  display(HTML(f'<span style="color: red">ValueError: {e}</span>'))

# To achieve the same purpose, modify variadic positional args separately.
p_config[:fdl.VARARGS] = [1, 2, 3]
p_config[fdl.VARARGS:] = [4, 5, 6]
assert p_config[:] == [1, 2, 3, 4, 5, 6]

NOTE: Directly calling `list` methods like `append` and `extend` is not supported, and will not mutate the config. Like with Python lists, slice operations on Configs effectively create a copy of the underlying sequence.


In [None]:
p_config[:].append(7)
assert p_config[:] == [1, 2, 3, 4, 5, 6]

# A workaround to use `list` methods
var_args = p_config[fdl.VARARGS:]  # Build a new list
assert var_args == [4, 5, 6]
# Make some modifications
var_args.pop()
var_args.append('x')
p_config[fdl.VARARGS:] = var_args
assert p_config[fdl.VARARGS:] == [4, 5, 'x']

### Type annotations

The `fdl.Config` type can take a type parameter, which indicates the output type
of the function or class being configured.

In [None]:
foo_config: fdl.Config[Foo] = fdl.Config(Foo, 1)

A call to `fdl.build(foo_config)` correctly hints its output type as `Foo`,
which helps static type checkers.

In the future, we might check argument types, but this is not currently
available. (Unfortunately, standard Python type tooling is insufficiently
powerful to express the types precisely.) So you can currently write
`fdl.Config(Foo, "string_not_int")`.
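
Plain dataclasses do not enforce annotations at runtime either, so a wrongly typed argument passes silently. If runtime checking matters, one option is to validate in `__post_init__`, which runs whenever the class is called and so would presumably surface the problem when the object is constructed. `CheckedFoo` below is a hypothetical class for illustration and is not part of Fiddle.

```python
import dataclasses


@dataclasses.dataclass
class Foo:
  a: int


@dataclasses.dataclass
class CheckedFoo:
  a: int

  def __post_init__(self):
    # Annotations are not enforced by dataclasses, so check explicitly.
    if not isinstance(self.a, int):
      raise TypeError(f"a must be an int, got {type(self.a).__name__}")


unchecked = Foo("string_not_int")  # Accepted despite the `int` annotation.

try:
  CheckedFoo("string_not_int")
  message = None
except TypeError as e:
  message = str(e)
```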

## `fdl.Partial`

Let's say you have a checkpointer which takes the current train state and number
of steps. This might not be exactly the API you would create if you intended to
use Fiddle from the beginning, but Fiddle is designed to be able to configure
existing classes.

In [None]:
from typing import Any
import dataclasses


@dataclasses.dataclass
class Checkpointer:
  train_state: Any
  num_steps: int
  base_directory: str
  checkpoints_to_keep: int

  def clear_old_checkpoints(self):
    print("  Clearing checkpoints to limit to", self.checkpoints_to_keep,
          "checkpoints")

  def save(self):
    self.clear_old_checkpoints()
    print("  Saved to", self.base_directory, "at num_steps =", self.num_steps)


@dataclasses.dataclass
class Trainer:
  checkpointer_cls: Any
  num_steps: int = dataclasses.field(default=0)

  def train_epoch(self, epoch_steps: int):
    print("Training for one epoch ...")
    train_state = None
    self.num_steps += epoch_steps
    self.checkpointer_cls(
        train_state=train_state, num_steps=self.num_steps).save()

Now suppose you want to configure the `base_directory` and
`checkpoints_to_keep` parameters, and let the `train_state` and `num_steps`
parameters be set by caller code at runtime. You can create a `fdl.Partial`
object to represent the `checkpointer_cls`,

In [None]:
def trainer_config() -> fdl.Config:
  checkpointer_config = fdl.Partial(
      Checkpointer, base_directory="/path/to/my/dir", checkpoints_to_keep=5)
  return fdl.Config(Trainer, checkpointer_cls=checkpointer_config)


config = trainer_config()

# Experimental overrides can be applied as normal.
config.checkpointer_cls.checkpoints_to_keep = 10

# The trainer can be built, and we can mock-train two epochs.
trainer = fdl.build(config)
trainer.train_epoch(10)
trainer.train_epoch(20)

When built, the `checkpointer_cls` becomes a `functools.partial` instance, partially specifying the arguments for `Checkpointer`:

In [None]:
fdl.build(config.checkpointer_cls)

`fdl.Partial` can also be called on functions:

In [None]:
def foo(a, b):
  return a + b


foo_partial = fdl.build(fdl.Partial(foo, a=1))
foo_partial(b=2)
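
Since the built result is a `functools.partial`, the standard library's behavior applies to it. The sketch below constructs a partial directly with `functools` (no Fiddle involved) to show what such an object exposes and how call-time arguments combine with the preset ones.

```python
import functools


def foo(a, b):
  return a + b


foo_partial = functools.partial(foo, a=1)

# The preset arguments and the wrapped callable can be inspected.
preset = foo_partial.keywords
wrapped = foo_partial.func

# Remaining arguments are supplied at call time.
result = foo_partial(b=2)

# Keyword arguments given at call time take precedence over preset ones.
overridden = foo_partial(a=10, b=2)
```

Here `result` is 3 and `overridden` is 12.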

## Building configuration (`fdl.build`): anti-patterns to avoid

As you've seen above, `fdl.build` is the core function that creates actual
objects from their configuration.

A key affordance is that, when used properly, built objects have their
configuration "closed over". If the configuration is mutated later, the
already-built objects do not change. To demonstrate:

In [None]:
def foo(a):
  return a


config = fdl.Partial(foo, a=1)
foo_partial = fdl.build(config)
config.a = 4
foo_partial()
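
A rough plain-Python analogy, which is not Fiddle's implementation: values are captured at the moment the partial is created, so later edits to the place they came from are not seen. The `settings` dict below is a hypothetical stand-in for a mutable configuration.

```python
import functools


def foo(a):
  return a


settings = {"a": 1}
foo_partial = functools.partial(foo, **settings)  # Values are captured here.
settings["a"] = 4  # Later edits to the source dict are not seen.
result = foo_partial()
```

In this analogy `result` is still 1, mirroring the cell above.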

We strongly encourage `fdl.build` to be a single transition point from
configuration to built objects. Returning `fdl.Config` objects as the output of
a function being configured is discouraged. Further, calling `fdl.build` within
a `fdl.build` call will raise an error,

In [None]:
def bar():
  sub_config = fdl.Config(foo, a=1)
  return fdl.build(sub_config)

try:
  fdl.build(fdl.Config(bar))
except Exception as e:
  error_message = str(e).replace("\n", "<br/>")
  display(HTML(f'<span style="color: red">{type(e).__name__}: {error_message}</span>'))
else:
  raise AssertionError("This should raise an error!")

When errors are raised, one "path" from the root config to the
callable which raised the error is added to the error message. This
can help debug configurations.

In [None]:
def my_error_function(a) -> Foo:
  raise ValueError("Error!")

foo_config = fdl.Config(Foo, a=1)
config = fdl.Config(Bar, foo_config, [foo_config, fdl.Config(my_error_function, a=7)])
try:
  fdl.build(config)
except Exception as e:
  error_message = str(e).replace("\n", "<br/>")
  display(HTML(f'<span style="color: red">{type(e).__name__}: {error_message}</span>'))
else:
  raise AssertionError("This should raise an error!")

## Copying configuration

`fdl.Config` and `fdl.Partial` objects can be shallow-copied by just calling
`copy.copy`, or `fdl.Config`/`fdl.Partial` with an existing config,

In [None]:
def foo(a, b):
  return a, b


config = fdl.Config(foo, a="a", b="original_b")
config2 = copy.copy(config)  # Alternate: fdl.Config(config)
config2.b = "new_b"
print("Original:", fdl.build(config))
print("Shallow copy:", fdl.build(config2))

Deep copies are also supported, with the expected semantics: a sub-config that
is shared within the original is copied once, so it remains shared within the
copy.

In [None]:
@dataclasses.dataclass
class FooWithId(Foo):

  def __repr__(self):
    return f"Foo<{id(self)}>(a={self.a})"


foo_config = fdl.Config(FooWithId, a=1)
config = fdl.Config(Bar, foo_config, [foo_config])
config2 = copy.copy(config)
config3 = copy.deepcopy(config)

# Change the original sub-object.
foo_config.a = 4

print("Original:", fdl.build(config))

# This reflects the post-copy change to `foo_config.a`.
print("Shallow copy:", fdl.build(config2))

# This has the original value for `foo_config.a`.
print("Deep copy:", fdl.build(config3))

In general, we advise users to use `copy.deepcopy`, unless they intend to share
sub-configs with the original object.
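
The same shallow versus deep distinction can be seen with plain Python containers. The sketch below uses dictionaries as a stand-in for configs, purely to illustrate the standard `copy` module's semantics.

```python
import copy

shared = {"a": 1}
original = {"x": shared, "y": [shared]}

shallow = copy.copy(original)      # New outer dict, same inner objects.
deep = copy.deepcopy(original)     # New outer dict and new inner objects.

# Change the original sub-object after copying.
shared["a"] = 4

shallow_sees_change = shallow["x"]["a"] == 4
deep_keeps_old_value = deep["x"]["a"] == 1
# Within the deep copy, the two references still point at one object.
deep_still_shared = deep["x"] is deep["y"][0]
```

All three flags are `True`: the shallow copy follows the mutation, the deep copy does not, and sharing inside the deep copy is preserved.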

## Serialization, equality, and hashing

We'll introduce serialization in more depth later, but please note that
`fdl.Config` objects are pickleable. Also, the `__eq__` operator for `Config`
works as expected. We suggest using `__eq__` mostly in unit tests, however,
since two configurations may build the same object while being expressed
differently (e.g. `fdl.Config(lambda x: x, 1)` can be used in place of `1`).

In [None]:
import pickle

reloaded = pickle.loads(pickle.dumps(config))
assert reloaded is not config, "Reloaded object should not be identical"
assert reloaded == config, "Reloaded object should be the same by __eq__"

Since `fdl.Config` objects are mutable, they are not hashable. If you would like
to use them as dictionary keys, key the dictionary by `id(config)` instead.

In [None]:
try:
  hash(config)
except TypeError as e:
  display(HTML(f'<span style="color: red">TypeError: {e}</span>'))
else:
  raise AssertionError("This should raise an error!")
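
A minimal sketch of an identity-keyed dictionary follows. `Settings` is a hypothetical stand-in for a mutable, unhashable config object. Storing the object alongside its entry keeps it alive, which matters because CPython may reuse an `id` once the original object has been garbage collected.

```python
import dataclasses


@dataclasses.dataclass
class Settings:  # Hypothetical stand-in; eq=True without frozen => unhashable.
  a: int


first, second = Settings(1), Settings(1)

try:
  hash(first)
  hashable = True
except TypeError:
  hashable = False

# Key by identity and keep a reference to each object in the value.
notes = {
    id(first): (first, "first"),
    id(second): (second, "second"),
}
first_note = notes[id(first)][1]
```

Although `first == second`, the two objects get separate entries because they are keyed by identity rather than by value.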

## Internal details

Please do not access private fields of `fdl.Config` objects, but for building
your mental model, you can think of these objects as maintaining a reference to
the class/function being configured, and their arguments,

In [None]:
print("Function configured:", config.__fn_or_cls__)
print("Arguments:", config.__arguments__)

As such, Fiddle objects have been designed to behave like one would expect
Python objects to behave. For example, although we discourage it, you could
configure a function created in a closure,

In [None]:
def make_configured(x):

  def inner(y):
    return x + y

  return fdl.Config(inner, y=1)


config1 = make_configured(3)
config2 = make_configured(4)

assert fdl.build(config1) == 3 + 1
assert fdl.build(config2) == 4 + 1

However, using Fiddle in such ways will likely break serialization, and will be
more difficult to debug.
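
The serialization concern can be seen with the standard `pickle` module alone: functions are pickled by reference to their qualified name, and a function defined inside another function cannot be looked up that way. The sketch below involves no Fiddle objects; `make_adder` is a hypothetical example function, and the exact exception type varies across Python versions, so both likely types are caught.

```python
import pickle


def make_adder(x):

  def inner(y):
    return x + y

  return inner


add_three = make_adder(3)

try:
  pickle.dumps(add_three)
  pickled_ok = True
except (pickle.PicklingError, AttributeError):
  # Local (closure) functions cannot be pickled by reference.
  pickled_ok = False
```

The closure works fine when called, but `pickled_ok` ends up `False`.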