Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with use of configargparse and -h #5

Closed
bhigy opened this issue Mar 16, 2020 · 9 comments · Fixed by #15
Closed

Issue with use of configargparse and -h #5

bhigy opened this issue Mar 16, 2020 · 9 comments · Fixed by #15
Assignees

Comments

@bhigy
Copy link
Contributor

bhigy commented Mar 16, 2020

When platalea.config is imported in a script, it catches the -h (help) parameter before the script's specific parameters are added. As an example, python utils/evaluate_asr.py -h doesn't show the script's parameters (path, -b), only the global ones (e.g. data_root, meta). Importing config after the script has parsed the parameters would result in the opposite behavior.

@egpbos, any idea how this could be handled better?

@egpbos
Copy link
Collaborator

egpbos commented Mar 17, 2020

Ah yeah, I was worried about that parse there, but didn't see yet how it would go wrong :)

It turns out there are now module-level "properties" (not really properties, but still pretty awesome, I didn't know before just now!), so I would go with that, see here: https://www.python.org/dev/peps/pep-0562/

So for config.py, replace the args, ... line with something like:

def __getattr__(name):
    """Only trigger parsing when it's needed."""
    if name == 'args':
        args, unknown_args = parser.parse_known_args()
        return args
    raise AttributeError(f"module '{__name__}' has no attribute '{name}'")

@bhigy
Copy link
Contributor Author

bhigy commented Mar 17, 2020

I tried that but __getattr__ is called before my main script adds its own arguments.

@egpbos
Copy link
Collaborator

egpbos commented Mar 18, 2020

Ah yeah, that would happen when another module (that you import from your script probably) uses config.args or generally if config.args is accessed anywhere before all arguments are added to the parser... Can you find out where that happens and whether we can avoid that access?

@bhigy
Copy link
Contributor Author

bhigy commented Mar 18, 2020

It is in dataset.py. We might be able to avoid the issue here but I feel like we might run into this issue agian with the current setup. I was wondering wether an alternative could be to use two different parsers: one for general level config, such as dataset path, which wouldn't have help associated with it (we could document those parameters somewhere else) and another one for the script specific parameters that would appear in the help message.

@egpbos
Copy link
Collaborator

egpbos commented Apr 1, 2020

Hm, then it wouldn't give any help message if you run a script that doesn't add the second parser.

Another option would be to just put all parameters in one file, even though we only use them in specific runs. We can indicate these experiment-specific parameters in the help section using groups: https://docs.python.org/3/library/argparse.html#argument-groups

So for instance, --epochs, which I added to basic-stack/run.py, should then go in this experiment-parameter group in __init__.py (or maybe we can move it to config.py since we don't need the named singleton parser anymore, we can just make it a simple member of config):

parser = configargparse.get_argument_parser(name='platalea')

# [... current options there ...]

experiment_arguments = parser.add_argument_group('experiment arguments', "Some experiment run scripts use the following optional parameters.")

experiment_arguments.add_argument('--epochs', action='store', default=32, dest='epochs', type=int,
                                  help='number of epochs after which to stop training (default: 32)')
# [... etc ...]

The help message is then:

python experiments/basic-stack/run.py -h
usage: run.py [-h] [--data_root DATA_ROOT] [--meta META] [--audio_features_fn AUDIO_FEATURES_FN]
              [--audio_subdir AUDIO_SUBDIR] [--image_subdir IMAGE_SUBDIR] [--epochs EPOCHS]

Args that start with '--' (eg. --data_root) can also be set in a config file (config.ini or config.yml). Config file
syntax allows: key=value, flag=true, stuff=[a,b,c] (for details, see syntax at https://goo.gl/R74nmi). If an arg is
specified in more than one place, then commandline values override environment variables which override config file
values which override defaults.

optional arguments:
  -h, --help            show this help message and exit
  --data_root DATA_ROOT
                        location of the flickr8k (or similar) dataset [env var: PLATALEA_DATA_ROOT] (default:
                        /roaming/gchrupal/datasets/flickr8k/)
  --meta META           filename of the metadata file (dataset.json or similar) relative to the dataset location [env
                        var: PLATALEA_METADATA_JSON] (default: dataset_multilingual.json)
  --audio_features_fn AUDIO_FEATURES_FN
                        filename of the MFCC audio features file relative to the dataset location [env var:
                        PLATALEA_AUDIO_FEATURES_FN] (default: mfcc_delta_features.pt)
  --audio_subdir AUDIO_SUBDIR
                        directory containing the flickr8k wav files, relative to the dataset location [env var:
                        PLATALEA_AUDIO_SUBDIR] (default: flickr_audio/wavs/)
  --image_subdir IMAGE_SUBDIR
                        directory containing the flickr8k image files, relative to the dataset location [env var:
                        PLATALEA_IMAGE_SUBDIR] (default: Flickr8k_Dataset/Flicker8k_Dataset/)

experiment arguments:
  Some experiment run scripts use the following optional parameters.

  --epochs EPOCHS       number of epochs after which to stop training (default: 32) (default: 32)

A downside of this approach is that we may at some point end up with a huge bunch of experiment arguments, but we can always reevaluate once this happens. Also, we may need slightly different variants/interpretations for different experiments. However, if we document this properly, it shouldn't be a big issue.

What do you think?

@egpbos
Copy link
Collaborator

egpbos commented Apr 1, 2020

Btw, this approach also means we can get rid of the perhaps slightly overly complicated get_attr trick again :P I love it, but sometimes you have to kill your darlings ;)

@bhigy
Copy link
Contributor Author

bhigy commented Apr 8, 2020

I am not really fond of adding all parameters in one file for 2 main reasons:

  • it will quickly make the help unreadable
  • while some parameters (e.g. number of epochs) apply to most cases, some are specific to one script. I think things will get very confused because of this.

I would rather go for one of the following option:

  1. have only important parameters for each script in the help and general configuration elements visible somewhere else (e.g. through python config.py -h)
  2. keep the get_attr trick and remove the accesses to the parameters in function definitions so that the script can add it's own parameters before help is displayed

@egpbos
Copy link
Collaborator

egpbos commented Apr 9, 2020

After discussion, I think we came up with the following idea, please correct me if I misunderstood:

We create a new class that internally uses the ConfigArgParse parser, but by default does not parse, it only parses when the user asks it to. Moreover, it exposes the default parameters so that we can still use those when parsing has not taken place. When parsing is activated, the class updates the parameters from their default values to the values from the parser. The parser has a set of default options included (those that are used inside the main platalea module, like data_root and filenames for feature files), but specific options can be added for each experiment.

The interface and use will be something like this:

class PlataleaConfig:
  def __init__(self):
    # setup ConfigArgParse parser and default parameters in self.stuff dicts

  def add_argument(self, *args, **kwargs):
    self._parser.add_argument(*args, **kwargs)

  def parse(self):
    args, _ = self.parser.parse_known_args()
    # update default parameters dict(s) using args results

  def __getattr__(self, arg_name):
    return self._stuff[arg_name]

@bhigy
Copy link
Contributor Author

bhigy commented Apr 15, 2020

Yes, that's it.

This was referenced May 25, 2020
@egpbos egpbos self-assigned this Jun 2, 2020
@egpbos egpbos closed this as completed in #15 Jun 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants