## Notes
- Found a bug/limitation in `builds`: `populate_full_signature=True` breaks on a dataclass that has a field with a default factory.
- Need to think about UI for applying recursive just. Should recursive just also work on dataclass types rather than just instances? `builds(..., zen_recursive=<bool>)`?
- Take care to associate 

If `recursive_just` is producing a "sanitized" version of a dataclass, how should it handle inheritance? Should it mirror the parents of the target dataclass?

Jasha's proposal shows a dataclass *instance* with... 
1. Annotations that need to be sanitized
2. A field that stores a dataclass *instance* whose fields/values similarly need to be sanitized 

## Jasha proposal

In [1]:
"""
A few months ago @rsokl shared a vision with me: what if arbitrary python objects could be
serialized to an OmegaConf-compatible format, then modified and composed using Hydra, and finally
reanimated with a call to `instantiate`?

In this literate program, I outline an approach to automatic, recursive conversion of
OmegaConf-incompatible dataclass instances to an OmegaConf-compatible form. This is something like a
poor man's implementation of the auto-config support proposed in hydra-zen issue
https://github.com/mit-ll-responsible-ai/hydra-zen/issues/257 .
"""

#######################
# Motivating use case #
#######################

# Below we create an object `foobar` containing data and type-hints unsupported by Hydra.

from dataclasses import dataclass
from typing import Any, Callable
from typing_extensions import TypeAlias

Interface1: TypeAlias = Callable[[int], int]
Interface2: TypeAlias = Callable[[str], str]


@dataclass
class Nested:
    x: Any


@dataclass
class Stuff:
    field1: Interface1
    field2: Interface2
    field3: Nested


def foo(i: int) -> int:
    return i


def bar(s: str) -> str:
    return s


def baz(f: float) -> float:
    return f


foobar = Stuff(foo, bar, Nested(baz))


In [2]:

###############
# The problem #
###############

# The type hints `Interface1` and `Interface2` are not supported by OmegaConf, and neither are the
# values `foo`/`bar`/`baz`. As such, we cannot serialize the object `foobar` using OmegaConf:

from omegaconf import OmegaConf, ValidationError
from omegaconf.errors import ConfigTypeError
from pytest import raises

with raises((ValidationError, ConfigTypeError)):  # I get ConfigTypeError
    OmegaConf.structured(foobar)

In [3]:


# This means `foobar` cannot participate in config composition via Hydra.

############
# The Goal #
############

# Given `foobar`, how can we create a Hydra-compatible dataclass `BuildsFooBar` such that
# `instantiate(BuildsFooBar) == foobar`?

# Here is a solution using `hydra_zen.builds`:

from hydra.utils import instantiate
from hydra_zen import builds

BuildsFooBar = builds(Stuff, foo, bar, builds(Nested, baz))
assert instantiate(BuildsFooBar) == foobar
assert OmegaConf.to_yaml(BuildsFooBar) == """\
_target_: __main__.Stuff
_args_:
- _target_: hydra_zen.funcs.get_obj
  path: __main__.foo
- _target_: hydra_zen.funcs.get_obj
  path: __main__.bar
- _target_: __main__.Nested
  _args_:
  - _target_: hydra_zen.funcs.get_obj
    path: __main__.baz
"""

In [77]:
# Unlike `foobar` itself, the dataclass `BuildsFooBar` (and its instances) are fully 
# compatible with OmegaConf + Hydra.

# How can the above workflow be improved? Ideally, we'd be able to transform the 
# instance `foobar` into the class `BuildsFooBar` in a fully-automatic fashion. I 
# envision something like this:

#   BuildsFooBar2 = recursive_just(foobar)
#   assert instantiate(BuildsFooBar2) == foobar

# The idea is for `recursive_just(obj)` to visit sub-objects of `obj`, converting each of them
# to be Hydra-compatible. Like the non-recursive `hydra_zen.just` function, this proposed "recursive
# just" operator is idempotent.


############################
# Prototype implementation #
############################

from dataclasses import fields, is_dataclass
from hydra_zen import just


def recursive_just(obj):
    if is_dataclass(obj) and not isinstance(obj, type):
        # obj is a dataclass instance
        converted_fields = {}
        for field in fields(obj):
            value = getattr(obj, field.name)
            converted_fields[field.name] = recursive_just(value)
        return builds(type(obj), **converted_fields)

    else:
        return just(obj)


BuildsFooBar2 = recursive_just(foobar)
assert instantiate(BuildsFooBar2) == foobar
assert OmegaConf.to_yaml(BuildsFooBar2) == """\
_target_: __main__.Stuff
field1:
  _target_: hydra_zen.funcs.get_obj
  path: __main__.foo
field2:
  _target_: hydra_zen.funcs.get_obj
  path: __main__.bar
field3:
  _target_: __main__.Nested
  x:
    _target_: hydra_zen.funcs.get_obj
    path: __main__.baz
"""

In [76]:
def recursive_just(obj):
    if is_dataclass(obj) and not isinstance(obj, type):
        # obj is a dataclass instance
        converted_fields = {}
        for field in fields(obj):
            if field.init:
                value = getattr(obj, field.name)
                converted_fields[field.name] = recursive_just(value)
        return builds(
            type(obj),
            populate_full_signature=True,
            hydra_convert="all"
        )(**converted_fields)

    else:
        return just(obj)
        
instantiate(recursive_just(foobar))

ValidationError: Invalid type assigned: Builds_Nested is not a subclass of Nested. value: Builds_Nested(_target_='__main__.Nested', _convert_='all', x=<class 'types.Just_baz'>)
    full_key: field3
    object_type=None

In [6]:
instantiate(BuildsFooBar2)

Stuff(field1=<function foo at 0x000002B9B719A1F0>, field2=<function bar at 0x000002B9B6F87A60>, field3=Nested(x=<function baz at 0x000002B9B6F87C10>))

In [7]:
# Current behavior of `just` on dataclasses

from hydra_zen import make_config

Conf = make_config(x=1, y="hi")
conf = Conf()

assert just(Conf) is Conf
assert just(conf) is conf

Thanks so much for taking the time to write this up, Jasha! I like `recursive_just` and feel like it fits in quite nicely with the current behavior of `just`.

### A brief introduction to `just`
For those who aren't familiar with `just`, I'll provide a brief introduction.

The idea behind `just` is: you give it an object that is not natively supported by Hydra, and it will create a config that, when instantiated, "just" returns that object. Originally `just` was used to make it easy to create a config that simply imports an object upon instantiation:

```python
from hydra.utils import instantiate
from hydra_zen import just, to_yaml

class A: ...

assert instantiate(just(A)) is A

print(to_yaml(just(A)))

# prints:
```

```
_target_: hydra_zen.funcs.get_obj
path: __main__.A
```
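
To make the `get_obj` target above less opaque, here is a minimal, stdlib-only sketch of what "import an object from a dotted path" can look like. `toy_get_obj` is a hypothetical stand-in written for illustration; it is not hydra-zen's implementation of `hydra_zen.funcs.get_obj`, and it only handles top-level attributes of importable modules.

```python
import importlib


def toy_get_obj(path: str):
    """Hypothetical stand-in for an import-by-path target."""
    # split "package.module.name" into the module path and the attribute name
    module_name, _, attr_name = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# a config such as {"_target_": <get_obj>, "path": "builtins.sum"} would resolve to:
assert toy_get_obj("builtins.sum") is sum
assert toy_get_obj("functools.partial").__name__ == "partial"
```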

But `just` has since become more general-purpose, and can create configs for values from [a wider variety of types](https://mit-ll-responsible-ai.github.io/hydra-zen/api_reference.html#additional-types-supported-via-hydra-zen).
Let me summarize the current behaviors of `just` to demonstrate:


In [8]:
# basic `just` behavior


# `just` returns Hydra-compatible primitives unchanged. 
# This includes *all* dataclass objects & instances. That is, `just` does not perform
# any compatibility checks on config-like inputs 
assert just(1) == 1

assert is_dataclass(foobar)
assert just(foobar) is foobar  # note: foobar is not compatible with Hydra

assert just([1, 2]) == [1, 2]

In [9]:
from hydra_zen import to_yaml

In [10]:
# `just` creates `builds(get_obj, <target>)` when `<target>` is a function or (non-dataclass) class-object
class A: ...

def func(x): ...
    
assert instantiate(just(A)) is A
assert instantiate(just(func)) is func

In [11]:
# `just` uses `builds` to create targeted structured configs to describe data that
# is not compatible with Hydra, but that has specialized support via hydra-zen
#
# See https://mit-ll-responsible-ai.github.io/hydra-zen/api_reference.html#additional-types-supported-via-hydra-zen

# Support for complex numbers
assert instantiate(just(1+2j)) == 1+2j

# Support for partial'd functions
from functools import partial

partial_f = partial(func, x=2)
just_partial_f = just(partial_f)

assert OmegaConf.to_yaml(just_partial_f) == """\
_target_: __main__.func
_partial_: true
x: 2
"""
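
The YAML above suggests how a partial can be described: the wrapped function becomes the target, `_partial_` is set, and the bound keywords become fields. The following stdlib-only sketch mimics that shape with a plain dictionary. `describe_partial` is a hypothetical helper, not hydra-zen code, and it assumes the partial binds keyword arguments only.

```python
from functools import partial


def describe_partial(p: partial) -> dict:
    """Toy description of a partial as a Hydra-style mapping (keywords only)."""
    if p.args:
        raise ValueError("this sketch handles keyword arguments only")
    target = p.func
    return {
        "_target_": f"{target.__module__}.{target.__qualname__}",
        "_partial_": True,
        **p.keywords,
    }


def scale(value, factor):
    return value * factor


desc = describe_partial(partial(scale, factor=2))
assert desc["_partial_"] is True and desc["factor"] == 2
assert desc["_target_"].endswith("scale")
```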

In [12]:
# `just` automatically applies itself recursively on lists and dictionary values

just_recursive_example = just({"a": [1+2j, func]})
assert str(just_recursive_example) == "\
{'a': [ConfigComplex(real=1.0, imag=2.0, _target_='builtins.complex'), <class 'types.Just_func'>]}"

assert instantiate(just_recursive_example) == {"a": [1+2j, func]}

Lastly, it is useful to note that hydra-zen's `builds` function automatically applies `just` to the target's configured values.

In [13]:
# equivalent to: builds(tuple, [just(1+2j), just(func)])
assert instantiate(builds(tuple, [1+2j, func])) == (1+2j, func)

This includes fields that are populated from the function's signature via `builds(..., populate_full_signature=True)`.

In [14]:
def has_sig(x=[1+2j, func]): return x

Conf = builds(has_sig, populate_full_signature=True)
assert OmegaConf.to_yaml(Conf) == """\
_target_: __main__.has_sig
x:
- real: 1.0
  imag: 2.0
  _target_: builtins.complex
- _target_: hydra_zen.funcs.get_obj
  path: __main__.func
"""

assert instantiate(Conf) == [(1+2j), func]
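
The signature-driven population shown above rests on ordinary introspection. As a rough, stdlib-only illustration (not hydra-zen's actual code), the defaults that would become config fields can be read off with `inspect.signature`. `signature_defaults` and `has_sig_demo` are hypothetical names used only for this sketch.

```python
from inspect import Parameter, signature


def signature_defaults(fn):
    """Map each parameter that has a default to that default value."""
    return {
        name: param.default
        for name, param in signature(fn).parameters.items()
        if param.default is not Parameter.empty
    }


def has_sig_demo(a, x=(1 + 2j, "func"), y=3):
    return x


# `a` has no default, so it is left out (it would be a required field)
assert signature_defaults(has_sig_demo) == {"x": (1 + 2j, "func"), "y": 3}
```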

### Incorporating `recursive_just`

As noted above, `just` is currently a pass-through for all dataclass types and instances: it returns them unchanged. This is true even if the dataclass contains fields that are not compatible with Hydra.

Given the 

In [41]:
from typing import Literal
@dataclass
class A:
    x: Literal[1]

In [45]:
def recursive_just(obj):
    if is_dataclass(obj) and not isinstance(obj, type):
        # obj is a dataclass instance
        converted_fields = {}
        for field in fields(obj):
            if field.init:
                value = getattr(obj, field.name)
                converted_fields[field.name] = recursive_just(value)
        return builds(
            type(obj),
            populate_full_signature=True,
            hydra_convert="all"
        )(**converted_fields)

    else:
        return just(obj)


In [46]:
instantiate(recursive_just(A(1))) == A(1)

True

In [37]:
from pydantic.dataclasses import dataclass

In [61]:
@dataclass
class A:
    x: int

In [62]:
A(1)

A(x=1)

In [84]:
import dataclasses
from typing import List, Optional, Literal

from pydantic import Field
from pydantic.dataclasses import dataclass as pydantic_dataclass
from pydantic import PositiveInt


@pydantic_dataclass
class User:
    name: str
    age: PositiveInt

@pydantic_dataclass
class Match:
    user_a: User = User(name="Bob", age=25)
    user_b: User = User(name="Alice", age=27)


In [113]:
# def recursive_just(obj):
#     if is_dataclass(obj) and not isinstance(obj, type):
#         # obj is a dataclass instance
#         converted_fields = {}
#         for field in fields(obj):
#             if field.init:
#                 value = getattr(obj, field.name)
#                 converted_fields[field.name] = recursive_just(value)
#         return builds(
#             type(obj),
#             populate_full_signature=True,
#             hydra_convert="all"
#         )(**converted_fields)

#     else:
#         return just(obj)



def recursive_just(obj):
    if is_dataclass(obj) and not isinstance(obj, type):
        # obj is a dataclass instance
        converted_fields = {}
        for field in fields(obj):
            value = getattr(obj, field.name)
            converted_fields[field.name] = recursive_just(value)
        return builds(type(obj), **converted_fields)

    else:
        return just(obj)

conf = recursive_just(Match(User("bob", 24), User("alice", 26)))
instantiate(conf)

Match(user_a=User(name='bob', age=24), user_b=User(name='alice', age=26))

In [94]:
#from pydantic.errors import ValidationError as PydValidationError
from hydra.errors import InstantiationException
conf = recursive_just((User(name="Bob", age=20)))

with raises(InstantiationException):
    # Pydantic validation error: ensure this value is greater than 0
    instantiate(conf, age=-10)

profile = instantiate(conf, age=11)
assert isinstance(profile, User)
assert profile == User("Bob", 11)

In [81]:
conf = recursive_just(Match(User("bob", -20), User("alice", 26)))
instantiate(conf)

ValidationError: 1 validation error for User
age
  ensure this value is greater than 0 (type=value_error.number.not_gt; limit_value=0)

In [39]:
from hydra_zen import to_yaml
def pretty_print(x): print(to_yaml(x))

In [40]:
conf = recursive_just(User(id="22"))

pretty_print(recursive_just(User(id="22")))

HydraZenUnsupportedPrimitiveError: Building: User ..
 The configured value <factory>, for field `friends`, is not supported by Hydra -- serializing or instantiating this config would ultimately result in an error.

Consider using `hydra_zen.builds(<class 'dataclasses._HAS_DEFAULT_FACTORY_CLASS'>, ...)` create a config for this particular value.

In [35]:
def create_sanitized_config(obj):
    if is_dataclass(obj) and not isinstance(obj, type):
        # obj is a dataclass instance
        converted_fields = {}
        for field in fields(obj):
            if field.init:
                value = getattr(obj, field.name)
                converted_fields[field.name] = create_sanitized_config(value)
        return builds(
            type(obj),
            populate_full_signature=True,
            hydra_convert="all",
            **converted_fields
        )(**converted_fields)

    else:
        return just(obj)

User(id=22, name='John Doe', friends=[0], age=None, height=50)

In [108]:
>>> recursive_just(User)
Builds_User

>>> recursive_just(User(age=10))
Builds_User(age=10)

types.Builds_User

In [24]:
jj = recursive_just(User(id="22"))

In [114]:
from hydra_zen.typing import Partial
from functools import partial
class A:
    def __init__(self, params, lr):
        self.lr=lr
        self.params=params

@dataclass
class Conf:
    optim: Partial[A] = partial(A, lr=1e-4)


In [118]:
instantiate(recursive_just(Conf()))

Conf(optim=functools.partial(<class '__main__.A'>, lr=0.0001))

## Response

Hi Jasha, I am finished teaching and am back in the swing of things at work! Thanks for your patience and understanding for the past few weeks.

## A Brief Review of This Proposal

Let me first summarize my understanding of the novel functionality that is being proposed here. Whereas `builds` and `just` are both used to construct a config that describes a single object in one's program, `recursive_just` is meant to operate on hierarchical/nested structured configs. Specifically, one can write a hierarchical structured config without paying heed to Hydra's [1^] limited support for various type annotations[2^] and values[3^], and `recursive_just` will automatically create a Hydra-compatible form of it.

[1^]: For simplicity's sake, I am going to ignore the distinction between Hydra and omegaconf here, and just use Hydra as a catch-all.

[2^]: Because the "sanitized" config will instantiate to an instance of the original dataclass type, users can rely on this post-instantiation type information within their task function. 

[3^]: Values of types that [have registered support with hydra-zen](https://mit-ll-responsible-ai.github.io/hydra-zen/api_reference.html#configuration-value-types-supported-by-hydra-and-hydra-zen), e.g. `complex` numbers, will be automatically supported as well. 


To demonstrate the utility of this in practice, let's design a structured config using a pydantic-based dataclass, which we can make compatible with Hydra while retaining the runtime type-checking and rich types associated with our original dataclass.


```python
import dataclasses
from typing import List, Optional, Literal

from pydantic.dataclasses import dataclass as pydantic_dataclass
from pydantic import PositiveInt


@pydantic_dataclass
class User:
    name: str
    age: PositiveInt  # typically Hydra users would have to use `int` here


@pydantic_dataclass
class Match:
    user_a: User = User(name="Bob", age=25)
    user_b: User = User(name="Alice", age=27)
```

Now we can use `recursive_just` to make a Hydra-compatible config from this pydantic-based dataclass, and we still benefit from our rich type annotations and pydantic's runtime type-checking via instantiation:

```python
from hydra.errors import InstantiationException
conf = recursive_just((User(name="Bob", age=20)))

# Attempting to overwrite age with a negative number..
with raises(InstantiationException):
    # Pydantic's runtime type-checking raises an error during the instantiation process.
    instantiate(conf, age=-10)  # error: age < 0

profile: User = instantiate(conf, age=11)
assert isinstance(profile, User)
assert profile == User("Bob", 11)
```


And here is the recursion in action:

```python
from hydra_zen import to_yaml
sanitized_match = recursive_just(Match())

assert to_yaml(sanitized_match) == """\
_target_: __main__.Match
user_a:
  _target_: __main__.User
  name: Bob
  age: 25
user_b:
  _target_: __main__.User
  name: Alice
  age: 27
"""

assert instantiate(sanitized_match) == Match()
```

I like this proposal a lot! I think that point 1 both reduces the barrier to entry for new Hydra users and enables advanced patterns (e.g., the demonstrated cross-compatibility with pydantic!). The improved ergonomics identified in point 2 are also a big win, and the gains here will grow as hydra-zen's auto-config support continues to improve (ISSUES).

## Implementation Details

One change to this prototype that I have identified thus far is that I would like to support dataclass types in addition to the proposed dataclass instances; this is motivated by the fact that Hydra largely treats dataclass types and instances on an equal footing. Thus `recursive_just` could be applied directly to the `User` pydantic-dataclass type [4^].

```python
# support for dataclass types (in addition to instances)
assert to_yaml(recursive_just(User)) == """\
_target_: __main__.User
name: ???
age: ???
"""

assert isinstance(recursive_just(User), type)  # Builds_User
assert not isinstance(recursive_just(User("bob", 12)), type)  # Builds_User("bob", 12)  
```

This is effectively identical to `builds(User, populate_full_signature=True)`, but it also includes recursion over fields that store nested structured configs, which is what makes the functionality unique. 
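
One practical wrinkle when accepting dataclass types as well as instances: `dataclasses.fields` works on both, but a field with no default is not an attribute of the class itself. The sketch below uses a plain stdlib dataclass (`Point` is a made-up example) to show why a guard such as `hasattr(obj, field.name)` is needed before reading values off a type.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class Point:
    x: int  # no default: not an attribute of the class
    y: int = 0  # simple default: stored as a class attribute


# `is_dataclass` is true for both the type and its instances
assert is_dataclass(Point) and is_dataclass(Point(1))
assert isinstance(Point, type) and not isinstance(Point(1), type)

# the same field names are reported for both
assert [f.name for f in fields(Point)] == [f.name for f in fields(Point(1))]

# values always exist on an instance, but only defaults exist on the type
assert not hasattr(Point, "x")
assert Point.y == 0
assert Point(1).x == 1
```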

Supporting dataclass types impacts the nice property of idempotence that Jasha's implementation possesses. To remedy this, `recursive_just` will only apply to dataclasses that do not have a `_target_` field. This restriction prevents people from accidentally creating a structured config that instantiates to a structured config that instantiates to a structured config... and so on.
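
The `_target_` guard can be illustrated without Hydra. In the toy model below, a "config" is simply a dataclass that declares a `_target_` field. `has_target_field` and `toy_sanitize` are hypothetical names, the sketch handles instances only for brevity, and a real implementation would call `builds`/`just` rather than construct a stand-in dataclass.

```python
from dataclasses import dataclass, fields, is_dataclass, make_dataclass


def has_target_field(obj) -> bool:
    """True if `obj` is a dataclass type/instance that declares `_target_`."""
    return is_dataclass(obj) and any(f.name == "_target_" for f in fields(obj))


def toy_sanitize(obj):
    """Wrap a plain dataclass instance once; pass everything else through."""
    if not is_dataclass(obj) or isinstance(obj, type) or has_target_field(obj):
        return obj
    cls = type(obj)
    spec = [("_target_", str)] + [(f.name, object) for f in fields(obj)]
    Conf = make_dataclass(f"Toy_{cls.__name__}", spec)
    values = {f.name: toy_sanitize(getattr(obj, f.name)) for f in fields(obj)}
    return Conf(_target_=f"{cls.__module__}.{cls.__qualname__}", **values)


@dataclass
class Inner:
    z: int


@dataclass
class Outer:
    inner: Inner


once = toy_sanitize(Outer(Inner(1)))
twice = toy_sanitize(once)

assert has_target_field(once) and has_target_field(once.inner)
assert twice is once  # a second application is a no-op
```

Without the guard, the second call would wrap `once` in yet another config whose target is the first config, which is the runaway nesting described above.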

The original prototype should 

[4^]: One thought that I had is: is there value in preserving basic parent/child parity through `recursive_just`? For example:

  ```python
  assert isinstance(User(age=10), User)

  CompatUser = recursive_just(User)
  compat_user = recursive_just(User(age=10))
  assert isinstance(compat_user, CompatUser)
  ```

  This would add substantial complications to the implementation, so I'd prefer to forego this unless there is a truly pressing benefit.


## The API

I was tempted to merge this functionality with `just`, perhaps via `just(..., recursive=True)`. Were we to stick with the original proposal, and only process 

In [147]:
recursive_just(builds(dict, a=1)(a=2))

ValueError: The field-name specified via `builds(..., _target_=<...>)` is reserved by Hydra.

In [130]:
from hydra_zen.structured_configs._type_guards import is_builds

def rjust(obj):
    if is_dataclass(obj) and not is_builds(obj):
        # obj is a dataclass type or instance that does not have a _target_ field
        type_ = obj if isinstance(obj, type) else type(obj)

        converted_fields = {}

        for field in fields(obj):
            if field.init and hasattr(obj, field.name):
                value = getattr(obj, field.name)
                converted_fields[field.name] = rjust(value)
        Conf = builds(
            type_,
            populate_full_signature=True,
            hydra_convert="all",
            **converted_fields
        )
        return Conf if type_ is obj else Conf()

    else:
        return just(obj)

In [144]:
rjust(User("bob", 12))

Builds_User(_target_='__main__.User', _convert_='all', name='bob', age=12)

In [146]:
DC1 = make_config(z=3)
DC = make_config("x", y=DC1(z=1+2j))

a = rjust(DC)
assert a is rjust(a)
pretty_print(rjust(rjust(DC)))
a.y is rjust(a).y

_target_: types.Config
_convert_: all
x: ???
'y':
  _target_: types.Config
  _convert_: all
  z:
    real: 1.0
    imag: 2.0
    _target_: builtins.complex



True

In [138]:
rjust(DC(1, 2))

Builds_Config(_target_='types.Config', _convert_='all', x=1, y=2)

In [123]:
pretty_print(recursive_just(recursive_just(User("hi", 1))))

ValueError: The field-name specified via `builds(..., _target_=<...>)` is reserved by Hydra.

In [105]:
from hydra_zen import to_yaml
sanitized_match = recursive_just(Match())

assert to_yaml(sanitized_match) == """\
_target_: __main__.Match
user_a:
  _target_: __main__.User
  name: Bob
  age: 25
user_b:
  _target_: __main__.User
  name: Alice
  age: 27
"""

assert instantiate(sanitized_match) == Match()

In [95]:
import dataclasses
from typing import List, Optional, Literal

from pydantic import Field
from pydantic.dataclasses import dataclass as pydantic_dataclass
from pydantic import PositiveInt


@pydantic_dataclass
class User:
    name: str
    age: PositiveInt

@pydantic_dataclass
class Match:
    user_a: User = User(name="Bob", age=25)
    user_b: User = User(name="Alice", age=27)

In [96]:
#from pydantic.errors import ValidationError as PydValidationError
from hydra.errors import InstantiationException
conf = recursive_just((User(name="Bob", age=20)))

with raises(InstantiationException):
    # Pydantic's runtime type-checking is applied by the instantiation process.
    instantiate(conf, age=-10)  # error: age < 0

profile = instantiate(conf, age=11)
assert isinstance(profile, User)
assert profile == User("Bob", 11)


In [98]:
instantiate(recursive_just(Match())) == Match()

True

In [60]:
c = Config(aggregation=Reducer(init=1.))
pretty_print(recursive_just(c))

_target_: __main__.Config
_convert_: all
aggregation:
  _target_: __main__.Reducer
  _convert_: all
  init: 1.0
  reduction_fn:
    _target_: hydra_zen.funcs.get_obj
    path: builtins.sum
imag_part:
  real: 1.0
  imag: 2.0
  _target_: builtins.complex



In [25]:
print(to_yaml(jj))

_target_: __main__.User
_convert_: all
id: 22
name: John Doe
friends:
- 0
age: null
height: null



In [30]:
instantiate(jj, height=100)

User(id=22, name='John Doe', friends=[0], age=None, height=100)

In [31]:
instantiate(jj, height=-10)

InstantiationException: Error in call to target '__main__.User':
ValidationError(model='User', errors=[{'loc': ('height',), 'msg': 'ensure this value is greater than or equal to 50', 'type': 'value_error.number.not_ge', 'ctx': {'limit_value': 50}}])

## Scratch

In [24]:
# def recursive_just_new(obj):
#     if is_dataclass(obj):
        
#         is_class_obj = isinstance(obj, type)
#         type_ = type(obj) if not is_class_obj else obj
        
#         # obj is a dataclass instance
        
#         converted_fields = {}
#         for field in fields(obj):
#             value = getattr(obj, field.name)
#             converted_fields[field.name] = recursive_just(value)
#         out = builds(type_, **converted_fields)
#         if not is_class_obj:
#             out = out()
#         return out

#     else:
#         return just(obj)

# @dataclass
# class FooBar:
#     field1: Interface1 = foo
#     field2: Interface2 = bar
#     field3: Any = 1




# foobar = Stuff(foo, bar, Nested(baz))

# instantiate(recursive_just(foobar)) == foobar
# instantiate(recursive_just(FooBar())) == FooBar()

In [1]:
from dataclasses import dataclass, field
from typing import List

@dataclass
class Foo:
    bar : List[int] = field(default_factory=lambda : [0])

In [2]:
from hydra_zen import builds, to_yaml

print(to_yaml(builds(Foo, populate_full_signature=True)))

[('_target_', <class 'str'>, Field(name=None,type=None,default='__main__.Foo',default_factory=<dataclasses._MISSING_TYPE object at 0x000001CF557F14F0>,init=False,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=None)), ('bar', typing.List[int], <factory>)]
HERE bar <factory>




AttributeError: type object 'Foo' has no attribute 'bar'

In [3]:
from inspect import signature
from dataclasses import fields

In [7]:
fields(Foo)[0].default_factory()

[0]

In [16]:
signature(Foo).parameters["bar"].default

<factory>

In [17]:
signature(Foo).parameters["bar"].default()

TypeError: '_HAS_DEFAULT_FACTORY_CLASS' object is not callable

In [10]:
from dataclasses import Field
isinstance(signature(Foo).parameters["bar"].default, Field)

False
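
The cells above suggest that a field's default factory is reachable through `dataclasses.fields` but not through `inspect.signature`, which only exposes the `<factory>` sentinel. One possible workaround, sketched here with the stdlib only, is to resolve defaults from the field objects instead. `resolve_field_defaults` is a hypothetical helper and not part of hydra-zen; `baz` and `qux` are extra fields added purely for illustration.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import List


def resolve_field_defaults(cls):
    """Return {field name: default value} for init-fields that have a default."""
    resolved = {}
    for f in fields(cls):
        if not f.init:
            continue  # not a constructor argument, so not configurable
        if f.default is not MISSING:
            resolved[f.name] = f.default
        elif f.default_factory is not MISSING:
            # call the factory to obtain a concrete value
            resolved[f.name] = f.default_factory()
    return resolved


@dataclass
class Foo:
    bar: List[int] = field(default_factory=lambda: [0])
    baz: int = 1
    qux: str = field(init=False, default="skipped")


assert resolve_field_defaults(Foo) == {"bar": [0], "baz": 1}
```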