## Notes
- Found a bug/limitation in `builds`: using `populate_full_signature=True` on a dataclass that has a field with a default factory fails.
- Need to think about the UI for applying recursive `just`. Should recursive `just` also work on dataclass types, rather than only on instances? Perhaps `builds(..., zen_recursive=<bool>)`?

If `recursive_just` is producing a "sanitized" version of a dataclass, how should it handle inheritance? Should it mirror the parents of the target dataclass?
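
One observation that may bear on this question: `dataclasses.fields` already reports fields inherited from parent dataclasses, so a field-by-field traversal sees the full set of fields even if the parents are not mirrored. A minimal illustration follows; `Parent` and `Child` are hypothetical classes used only for this example.

```python
from dataclasses import dataclass, fields


@dataclass
class Parent:
    a: int


@dataclass
class Child(Parent):
    b: str


child = Child(a=1, b="x")

# `fields` lists inherited fields first, then the subclass's own fields.
field_names = [f.name for f in fields(child)]
print(field_names)  # ['a', 'b']
```

This does not settle whether the sanitized dataclass should still subclass the parents (for example, so that `isinstance` checks keep working); it only shows that field discovery does not depend on it.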

## Jasha proposal

In [1]:
"""
A few months ago @rsokl shared a vision with me: what if arbitrary python objects could be
serialized to an OmegaConf-compatible format, then modified and composed using Hydra, and finally
reanimated with a call to `instantiate`?

In this literate program, I outline an approach to automatic, recursive conversion of
OmegaConf-incompatible dataclass instances to an OmegaConf-compatible form. This is something like a
poor man's implementation of the auto-config support proposed in hydra-zen issue
https://github.com/mit-ll-responsible-ai/hydra-zen/issues/257 .
"""

#######################
# Motivating use case #
#######################

# Below we create an object `foobar` containing data and type-hints unsupported by Hydra.

from dataclasses import dataclass
from typing import Any, Callable
from typing_extensions import TypeAlias

Interface1: TypeAlias = Callable[[int], int]
Interface2: TypeAlias = Callable[[str], str]


@dataclass
class Nested:
    x: Any


@dataclass
class Stuff:
    field1: Interface1
    field2: Interface2
    field3: Nested


def foo(i: int) -> int:
    return i


def bar(s: str) -> str:
    return s


def baz(f: float) -> float:
    return f


foobar = Stuff(foo, bar, Nested(baz))


In [2]:

###############
# The problem #
###############

# The type hints `Interface1` and `Interface2` are not supported by OmegaConf, and neither are the
# values `foo`/`bar`/`baz`. As such, we cannot serialize the object `foobar` using OmegaConf:

from omegaconf import OmegaConf, ValidationError
from omegaconf.errors import ConfigTypeError
from pytest import raises

with raises((ValidationError, ConfigTypeError)):  # I get ConfigTypeError
    OmegaConf.structured(foobar)

In [3]:


# This means `foobar` cannot participate in config composition via Hydra.

############
# The Goal #
############

# Given `foobar`, how can we create a Hydra-compatible dataclass `BuildsFooBar` such that
# `instantiate(BuildsFooBar) == foobar`?

# Here is a solution using `hydra_zen.builds`:

from hydra.utils import instantiate
from hydra_zen import builds

BuildsFooBar = builds(Stuff, foo, bar, builds(Nested, baz))
assert instantiate(BuildsFooBar) == foobar
assert OmegaConf.to_yaml(BuildsFooBar) == """\
_target_: __main__.Stuff
_args_:
- _target_: hydra_zen.funcs.get_obj
  path: __main__.foo
- _target_: hydra_zen.funcs.get_obj
  path: __main__.bar
- _target_: __main__.Nested
  _args_:
  - _target_: hydra_zen.funcs.get_obj
    path: __main__.baz
"""

In [4]:
# Unlike `foobar` itself, the dataclass `BuildsFooBar` (and its instances) are fully compatible with
# OmegaConf + Hydra.

# How can the above workflow be improved? Ideally, we'd be able to transform the instance `foobar`
# into the class `BuildsFooBar` in a fully-automatic fashion. I envision something like this:

#   BuildsFooBar2 = recursive_just(foobar)
#   assert instantiate(BuildsFooBar2) == foobar

# The idea is for `recursive_just(obj)` to visit sub-objects of `obj`, converting each of them
# to be Hydra-compatible. Like the non-recursive `hydra_zen.just` function, this proposed "recursive
# just" operator is idempotent.


############################
# Prototype implementation #
############################

from dataclasses import fields, is_dataclass
from hydra_zen import just


def recursive_just(obj):
    if is_dataclass(obj) and not isinstance(obj, type):
        # obj is a dataclass instance
        converted_fields = {}
        for field in fields(obj):
            value = getattr(obj, field.name)
            converted_fields[field.name] = recursive_just(value)
        return builds(type(obj), **converted_fields)

    else:
        return just(obj)


BuildsFooBar2 = recursive_just(foobar)
assert instantiate(BuildsFooBar2) == foobar
assert OmegaConf.to_yaml(BuildsFooBar2) == """\
_target_: __main__.Stuff
field1:
  _target_: hydra_zen.funcs.get_obj
  path: __main__.foo
field2:
  _target_: hydra_zen.funcs.get_obj
  path: __main__.bar
field3:
  _target_: __main__.Nested
  x:
    _target_: hydra_zen.funcs.get_obj
    path: __main__.baz
"""

In [5]:
BuildsFooBar2()

Builds_Stuff(_target_='__main__.Stuff', field1=<class 'types.Just_foo'>, field2=<class 'types.Just_bar'>, field3=<class 'types.Builds_Nested'>)

In [6]:
instantiate(BuildsFooBar2)

Stuff(field1=<function foo at 0x000001B1BF5E8B80>, field2=<function bar at 0x000001B1BF65AC10>, field3=Nested(x=<function baz at 0x000001B1BF65ADC0>))

In [7]:
# Current behavior of `just` on dataclasses

from hydra_zen import make_config

Conf = make_config(x=1, y="hi")
conf = Conf()

assert just(Conf) is Conf
assert just(conf) is conf

Thanks so much for taking the time to write this up, Jasha! I like `recursive_just` and feel like it fits in quite nicely with the current behavior of `just`.

### A brief introduction to `just`
For those who aren't familiar with `just`, I'll provide a brief introduction.

The idea behind `just` is: you give it an object that is not natively supported by Hydra, and it will create a config that, when instantiated, "just" returns that object. Originally `just` was used to make it easy to create a config that simply imports an object upon instantiation:

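```python
from hydra.utils import instantiate
from hydra_zen import just, to_yaml

class A: ...

# instantiating the config simply imports and returns `A`
assert instantiate(just(A)) is A

print(to_yaml(just(A)))
# prints:
```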

```
_target_: hydra_zen.funcs.get_obj
path: __main__.A
```
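
To make the mechanism concrete: the YAML above stores only a dotted path, and instantiation resolves that path back to the object. The following is a minimal sketch of that kind of path resolution, not hydra-zen's actual `get_obj` implementation. The name `get_obj_sketch` is hypothetical, and the sketch assumes a simple `module.attribute` path.

```python
from importlib import import_module


def get_obj_sketch(path: str):
    """Import and return the object named by a dotted `module.attribute` path.

    Hypothetical helper for illustration; nested attributes are not handled.
    """
    module_path, _, name = path.rpartition(".")
    return getattr(import_module(module_path), name)


# Only the string path needs to be serialized; the object is recovered on demand.
sqrt = get_obj_sketch("math.sqrt")
print(sqrt(4.0))  # 2.0
```

Because only a string is stored, the resulting config is trivially compatible with OmegaConf, at the cost of requiring the object to be importable at instantiation time.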

But `just` has since become more general-purpose, and can create configs for values from [a wider variety of types](https://mit-ll-responsible-ai.github.io/hydra-zen/api_reference.html#additional-types-supported-via-hydra-zen).
Let me summarize the current behaviors of `just` to demonstrate:


In [8]:
# basic `just` behavior


# `just` is a no-op on Hydra-compatible primitives.
# It is also a no-op on all dataclass types & instances, even those with fields incompatible with Hydra
assert just(1) == 1

assert is_dataclass(foobar)
assert just(foobar) is foobar  # note: foobar is not compatible with Hydra

assert just([1, 2]) == [1, 2]

In [9]:
from hydra_zen import to_yaml

In [10]:
# `just` creates `builds(get_obj, <target>)` when `<target>` is a function or (non-dataclass) class-object
class A: ...

def func(x): ...
    
assert instantiate(just(A)) is A
assert instantiate(just(func)) is func

In [11]:
# `just` uses `builds` to create targeted structured configs to describe data that
# is not compatible with Hydra, but that has specialized support via hydra-zen
#
# See https://mit-ll-responsible-ai.github.io/hydra-zen/api_reference.html#additional-types-supported-via-hydra-zen

# Support for complex numbers
assert instantiate(just(1+2j)) == 1+2j

# Support for partial'd functions
from functools import partial

partial_f = partial(func, x=2)
just_partial_f = just(partial_f)

assert OmegaConf.to_yaml(just_partial_f) == """\
_target_: __main__.func
_partial_: true
x: 2
"""

In [12]:
# `just` automatically applies itself recursively on lists and dictionary values

just_recursive_example = just({"a": [1+2j, func]})
assert str(just_recursive_example) == "\
{'a': [ConfigComplex(real=1.0, imag=2.0, _target_='builtins.complex'), <class 'types.Just_func'>]}"

assert instantiate(just_recursive_example) == {"a": [1+2j, func]}

Lastly, it is useful to note that hydra-zen's `builds` function automatically applies `just` to the target's configured values.

In [13]:
assert instantiate(builds(tuple, [1+2j, func])) == (1+2j, func)

This includes fields that are populated from the function's signature.

In [14]:
def has_sig(x=[1+2j, func]): return x

Conf = builds(has_sig, populate_full_signature=True)
assert OmegaConf.to_yaml(Conf) == """\
_target_: __main__.has_sig
x:
- real: 1.0
  imag: 2.0
  _target_: builtins.complex
- _target_: hydra_zen.funcs.get_obj
  path: __main__.func
"""

assert instantiate(Conf) == [(1+2j), func]

### Incorporating `recursive_just`

As noted above, `just` is currently a pass-through for dataclasses: it returns all dataclass types and instances unchanged. This is true even if the dataclass contains fields that are not compatible with Hydra.

Given the 

In [15]:
def recursive_just(obj):
    if is_dataclass(obj) and not isinstance(obj, type):
        # obj is a dataclass instance
        converted_fields = {}
        for field in fields(obj):
            value = getattr(obj, field.name)
            converted_fields[field.name] = recursive_just(value)
        return builds(type(obj), **converted_fields, populate_full_signature=True, hydra_convert="all")

    else:
        return just(obj)

In [16]:
from pydantic.dataclasses import dataclass

In [17]:
import dataclasses
from typing import List, Optional

from pydantic import Field
from pydantic.dataclasses import dataclass as pydantic_dataclass


@pydantic_dataclass
class User:
    id: int
    name: str = 'John Doe'
    friends: List[int] = dataclasses.field(default_factory=lambda: [0])
    age: Optional[int] = dataclasses.field(
        default=None,
        metadata=dict(title='The age of the user', description='do not lie!')
    )
    height: Optional[int] = Field(None, title='The height in cm', ge=50, le=300)

In [23]:
instantiate(User)

ValidationError: Value 'title='The height in cm' ge=50 le=300 extra={}' of type 'pydantic.fields.FieldInfo' could not be converted to Integer
    full_key: height
    object_type=User

In [18]:
builds(User, populate_full_signature=True)

HydraZenUnsupportedPrimitiveError: Building: User ..
 The configured value <factory>, for field `friends`, is not supported by Hydra -- serializing or instantiating this config would ultimately result in an error.

Consider using `hydra_zen.builds(<class 'dataclasses._HAS_DEFAULT_FACTORY_CLASS'>, ...)` create a config for this particular value.

In [104]:
User(id=False)

User(id=0, name='John Doe', friends=[0], age=None, height=None)

In [24]:
jj = recursive_just(User(id="22"))

In [25]:
print(to_yaml(jj))

_target_: __main__.User
_convert_: all
id: 22
name: John Doe
friends:
- 0
age: null
height: null



In [30]:
instantiate(jj, height=100)

User(id=22, name='John Doe', friends=[0], age=None, height=100)

In [31]:
instantiate(jj, height=-10)

InstantiationException: Error in call to target '__main__.User':
ValidationError(model='User', errors=[{'loc': ('height',), 'msg': 'ensure this value is greater than or equal to 50', 'type': 'value_error.number.not_ge', 'ctx': {'limit_value': 50}}])

## Scratch

In [24]:
# def recursive_just_new(obj):
#     if is_dataclass(obj):
        
#         is_class_obj = isinstance(obj, type)
#         type_ = type(obj) if not is_class_obj else obj
        
#         # obj is a dataclass instance
        
#         converted_fields = {}
#         for field in fields(obj):
#             value = getattr(obj, field.name)
#             converted_fields[field.name] = recursive_just(value)
#         out = builds(type_, **converted_fields)
#         if not is_class_obj:
#             out = out()
#         return out

#     else:
#         return just(obj)

# @dataclass
# class FooBar:
#     field1: Interface1 = foo
#     field2: Interface2 = bar
#     field3: Any = 1




# foobar = Stuff(foo, bar, Nested(baz))

# instantiate(recursive_just(foobar)) == foobar
# instantiate(recursive_just(FooBar())) == FooBar()

In [1]:
from dataclasses import dataclass, field
from typing import List

@dataclass
class Foo:
    bar : List[int] = field(default_factory=lambda : [0])

In [2]:
from hydra_zen import builds, to_yaml

print(to_yaml(builds(Foo, populate_full_signature=True)))

[('_target_', <class 'str'>, Field(name=None,type=None,default='__main__.Foo',default_factory=<dataclasses._MISSING_TYPE object at 0x000001CF557F14F0>,init=False,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=None)), ('bar', typing.List[int], <factory>)]
HERE bar <factory>




AttributeError: type object 'Foo' has no attribute 'bar'

In [3]:
from inspect import signature
from dataclasses import fields

In [7]:
fields(Foo)[0].default_factory()

[0]

In [16]:
signature(Foo).parameters["bar"].default

<factory>

In [17]:
signature(Foo).parameters["bar"].default()

TypeError: '_HAS_DEFAULT_FACTORY_CLASS' object is not callable

In [10]:
from dataclasses import Field
isinstance(signature(Foo).parameters["bar"].default, Field)

False
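
The cells above suggest the root cause: the signature of a dataclass exposes only the `<factory>` sentinel for such a parameter, while the callable factory itself lives on the corresponding dataclass field. Below is a minimal sketch of one possible workaround, not hydra-zen's implementation. The helper `resolved_defaults` and the extra field `baz` are hypothetical, and calling the factory eagerly is an assumption about the desired behavior.

```python
from dataclasses import MISSING, dataclass, field, fields
from inspect import signature
from typing import Any, Dict, List


@dataclass
class Foo:
    bar: List[int] = field(default_factory=lambda: [0])
    baz: int = 1


def resolved_defaults(cls) -> Dict[str, Any]:
    """Map each init parameter that has a default to a usable default value."""
    factories = {
        f.name: f.default_factory
        for f in fields(cls)
        if f.default_factory is not MISSING
    }
    out: Dict[str, Any] = {}
    for name, param in signature(cls).parameters.items():
        if name in factories:
            # The signature only carries the `<factory>` sentinel here,
            # so call the factory recorded on the field instead.
            out[name] = factories[name]()
        elif param.default is not param.empty:
            out[name] = param.default
    return out


defaults = resolved_defaults(Foo)
print(defaults)  # {'bar': [0], 'baz': 1}
```

Each call produces a fresh value from the factory, which mirrors what the dataclass does when an instance is created without that argument.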