Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subclassed ArgumentParsers are not respected #171

Closed
Erotemic opened this issue Oct 7, 2022 · 3 comments · Fixed by #225
Closed

Subclassed ArgumentParsers are not respected #171

Erotemic opened this issue Oct 7, 2022 · 3 comments · Fixed by #225
Labels
enhancement New feature or request

Comments

@Erotemic
Copy link
Contributor

Erotemic commented Oct 7, 2022

🐛 Bug report

If jsonargparse.ArgumentParser is subclassed, it will fail to execute subclassed functionality for subclass arguments.

This is because (in the case of the help string) in jsonargparse/actions.py in print_help. It calls:

        subparser = import_object('jsonargparse.ArgumentParser')()
        subparser.add_class_arguments(val_class, dest, **self.sub_add_kwargs)

which creates an empty instance of jsonargparse.ArgumentParser to continue argument processing, and not whatever subclass originally invoked the parsing callstack.

I'm actually not 100% sure what is happening in the non-help case, but if I monkey patch jsonargparse.ArgumentParser and jsonargparse.core.ArgumentParser that does seem to resolve the issue. However, I'd like to avoid monkey patching if possible, hence the bug report.

To reproduce

The following is a MWE where I've disabled the monkey patch to make it work in the way I would expect:

import jsonargparse
import inspect
from jsonargparse.parameter_resolvers import ParamData
from jsonargparse.signatures import get_signature_parameters
from jsonargparse.signatures import get_doc_short_description


class MyArgumentParser(jsonargparse.ArgumentParser):
    def _add_signature_arguments(
        self,
        function_or_class,
        method_name,
        nested_key,
        as_group: bool = True,
        as_positional: bool = False,
        skip=None,
        fail_untyped: bool = True,
        sub_configs: bool = False,
        instantiate: bool = True,
        linked_targets=None
    ) -> list[str]:
        ## Create group if requested ##
        doc_group = get_doc_short_description(function_or_class, method_name, self.logger)
        component = getattr(function_or_class, method_name) if method_name else function_or_class
        group = self._create_group_if_requested(component, nested_key, as_group, doc_group, instantiate=instantiate)

        params = get_signature_parameters(function_or_class, method_name, logger=self.logger)

        if hasattr(function_or_class, '__custom__'):
            # Hack to insert our method for explicit parameterization
            __custom__ = function_or_class.__custom__
            for key, info in __custom__.items():
                type = info.get('type', None)
                if type is None or not isinstance(type, type):
                    annotation = inspect._empty
                else:
                    annotation = type
                param = ParamData(
                    name=key,
                    annotation=annotation,
                    kind=inspect.Parameter.KEYWORD_ONLY,
                    default=info.get('value', None),
                    doc=info.get('help', None),
                    component=function_or_class.__init__,
                    parent=function_or_class,
                )
                params.append(param)

        ## Add parameter arguments ##
        added_args = []
        for param in params:
            self._add_signature_parameter(
                group,
                nested_key,
                param,
                added_args,
                skip,
                fail_untyped=fail_untyped,
                sub_configs=sub_configs,
                linked_targets=linked_targets,
                as_positional=as_positional,
            )
        # import ubelt as ub
        # print('added_args = {}'.format(ub.repr2(added_args, nl=1)))
        return added_args


class ClassA:
    def __init__(self, a=None):
        self.a = a


class ClassB(ClassA):
    __custom__ = {
        'b_name': {'value': 'abc', 'type': str, help: 'hello world'}
    }
    def __init__(self, b=3, **kwargs):
        self.b = b
        self.__dict__.update(kwargs)


class ClassC:
    __custom__ = {
        'c_name': {'type': str, help: 'hello world'}
    }
    def __init__(self, c=None, **kwargs):
        self.c = c
        self.__dict__.update(kwargs)


# Monkey patch jsonargparse so its subcommands use our extended functionality
MONKEY_PATCH = 0
if MONKEY_PATCH:
    jsonargparse.ArgumentParser = MyArgumentParser
    jsonargparse.core.ArgumentParser = MyArgumentParser


def main():
    parser = MyArgumentParser()
    parser.add_subclass_arguments(ClassA, nested_key='class_a', fail_untyped=False)
    parser.add_class_arguments(ClassC, nested_key='class_c', fail_untyped=False, sub_configs=True)
    config = parser.parse_args()
    instances = parser.instantiate_classes(config)
    if hasattr(instances, 'class_a'):
        print(f'{instances.class_a.__dict__=}')
    print(f'{instances.class_c.__dict__=}')


if __name__ == '__main__':
    """
    CommandLine:
        python jsonargparse_subclass_mwe.py --help
        python jsonargparse_subclass_mwe.py --class_a.help ClassB
        python jsonargparse_subclass_mwe.py --class_a=ClassB --class_a.init_args.b_name=foo
        python jsonargparse_subclass_mwe.py --class_c.c_name=bar
        python jsonargparse_subclass_mwe.py
    """
    main()

Without the monkey patch if I invoke:

        python jsonargparse_subclass_mwe.py --class_a.help ClassB

I get:

A cfg=Namespace(class_c=Namespace(c=None, c_name=None))
namespace=None
usage: jsonargparse_subclass_mwe.py [--class_a.init_args.b B]

<class '__main__.ClassB'>:
  --class_a.init_args.b B
                        (type: Any, default: 3)

which does not respect the customized arguments injected by my subclass.

Furthermore, running: python jsonargparse_subclass_mwe.py --class_a=ClassB --class_a.init_args.b_name=foo doesn't even work:

jsonargparse_subclass_mwe.py: error: Parser key "class_a": Problem with given class_path "__main__.ClassB":
  - 'Configuration check failed :: No action for destination key "b_name" to check its value.'

Expected behavior

If I include the monkey patch then invoking the above help command correctly results in the customized arguments being respected:

usage: jsonargparse_subclass_mwe.py [--class_a.init_args.b B] [--class_a.init_args.b_name B_NAME]

<class '__main__.ClassB'>:
  --class_a.init_args.b B
                        (type: Any, default: 3)
  --class_a.init_args.b_name B_NAME
                        (type: Any, default: abc)

And if I invoke: python jsonargparse_subclass_mwe.py --class_a=ClassB --class_a.init_args.b_name=foo I get

instances.class_a.__dict__={'b': 3, 'b_name': 'foo'}
instances.class_c.__dict__={'c': None, 'c_name': None}

as I would expect.

Environment

jsonargparse.version = 4.14.1

Notes:

Fixing this issue naively would break LightningCLI as noted here: Lightning-AI/pytorch-lightning#15038

@Erotemic Erotemic added the bug Something isn't working label Oct 7, 2022
@mauvilsa
Copy link
Member

In commit 7813142 I fixed the print_help issue you mentioned. I guess the only other fix that would be needed is in typehints.py#L424-L431, but that one is not as easy.

Having said this, I need to give a fair warning. jsonargparse was not designed from the ground up having in mind all of the features it now has. Its development has been highly based on real world needs that came up through use. This added to the fact that argparse is also not designed to be extended as much as jsonargparse had done, the code is not as clean as I would have wanted it to be. There are plans to continuously refactor and clean up the internals, which is why I would recommend only using the public API as mentioned in https://jsonargparse.readthedocs.io/en/stable/#api-reference.

If you do want extend private classes and methods, bear in mind that when your code breaks due to internal changes, it will not be considered a bug. Note that by private I do not mean just names starting with _, but everything not mentioned in the public API or even assumptions on how internals work. There is no problem introducing improvements such as the print_help in actions.py.

@Erotemic
Copy link
Contributor Author

Thank you for fixing the issue where it was feasible.

I'll certainly accept the risk of having my code break as the API improves (very easy to detect because I setup my CI's to run the minimum and latest versions of dependencies). It's easy to pin versions to keep production smooth, and I feel it's important to contribute back to FOSS projects like this and help fix/improve things when said breakage situations occur.

I'm also fine with keeping in this small monkey patch to "make things work" for the foreseeable future. I don't see an immediately easy way to update that staticmethod with the information it would need to know which specific subclass of ArgumentParser it should be associated with. But perhaps cleaned up internals would be able to overcome this. I'm willing to wait for (or potentially help with) that.

More details on my use case

FWIW, my immediate goal is to integrate jsonargparse with my yet-another-argparse-alternative: scriptconfig. The way scriptconfig works is that a CLI configuration is defined using a dictionary or data-class-like class. The keys are argument names, and the values are either the default value, or a special Value object that makes it clear exactly what metadata an argument can hold. In the context where I'm using it to configure a reasonably complex class, the way I usually work with it is defining a top-level configuration object and then using kwargs to populate that object. E.g.

import scriptconfig as scfg


class MyClassConf(scfg.DataConfig):
    param1 = scfg.Value('a', help='param1')
    flag1 = scfg.Value(True, isflag=True, help='a flag')


class MyClass:
    __config__ = MyClassConf
    def __init__(self, other_arg2, **kwargs):
        self.other_arg2 = other_arg2
        self.config = self.__config__(**kwargs)

So in order to seemlessly transition between scriptconfig and jsonargparse, when MyClass is added as a class arguments via jsonargparse I want to detect that MyClass contains a __config__ scriptconfig type and then parse the options out of that type (which scriptconfig exposes via the __defaults__ attribute). So very similar to what I described above, I simply checked for this and then created a ParamData entry for all of the extra definitions that jsonargparse wouldn't be able to infer from the kwargs usage.

I think the more general feature that jsonargparse might consider that would make this entire issue of subclassing moot is if future designs checked classes for some special attribute that could allow the user to be explicit about what arguments could be introspected from them. I love the __init__ and __doc__ parsing as a first line of defense, but when the number of init arguments becomes large, I feel it's better to have a separate enumeration of exactly what the param names are that is coupled with their documentation. Adding something in jsonargparse's internals that checks for a class method (e.g. add_argparse_args or something similar) and expects it to modify some state or return some value indicating what are the extra params it should respect would be nice, although I do think it would require something beyond making a list of extra ParamData objects. I'm not exactly sure what the best way to do it would be, hence why I opted to extend ArgumentParser to try things out.

mauvilsa added a commit that referenced this issue Dec 27, 2022
- Fixed parent parser logger not being forwarded to subcommand and internal parsers.
- CLI now supports the parser_class parameter.
- Parsing steps logging now at debug level.
- Discarding init_args warning changed to log at debug level.
- Removed replacing list instead of append warning.
- Refactored some of the context managers and context variables.
@mauvilsa mauvilsa mentioned this issue Dec 27, 2022
6 tasks
mauvilsa added a commit that referenced this issue Dec 27, 2022
- Fixed parent parser logger not being forwarded to subcommand and internal parsers.
- CLI now supports the parser_class parameter.
- Parsing steps logging now at debug level.
- Discarding init_args warning changed to log at debug level.
- Removed replacing list instead of append warning.
- Refactored some of the context managers and context variables.
mauvilsa added a commit that referenced this issue Dec 27, 2022
- Fixed parent parser logger not being forwarded to subcommand and internal parsers.
- CLI now supports the parser_class parameter.
- Parsing steps logging now at debug level.
- Discarding init_args warning changed to log at debug level.
- Removed replacing list instead of append warning.
- Refactored some of the context managers and context variables.
@mauvilsa mauvilsa added enhancement New feature or request and removed bug Something isn't working labels Dec 27, 2022
@mauvilsa
Copy link
Member

#225 fixed this. However, when using lazy_instance the original ArgumentParser class is still used, see typehints.py#L1126-L1127. Lazy instances are created out of the context of any parser, so there is no way find out if a parser subclass should be used. This case will not be fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants