Issue with use of configargparse and -h #5

bhigy · 2020-03-16T18:11:53Z

When platalea.config is imported in a script, it catches the -h (help) parameter before the script's specific parameters are added. As an example, python utils/evaluate_asr.py -h doesn't show the script's parameters (path, -b), only the global ones (e.g. data_root, meta). Importing config after the script has parsed the parameters would result in the opposite behavior.

@egpbos, any idea how this could be handled better?

The text was updated successfully, but these errors were encountered:

egpbos · 2020-03-17T07:16:44Z

Ah yeah, I was worried about that parse there, but didn't see yet how it would go wrong :)

It turns out there are now module-level "properties" (not really properties, but still pretty awesome, I didn't know before just now!), so I would go with that, see here: https://www.python.org/dev/peps/pep-0562/

So for config.py, replace the args, ... line with something like:

def __getattr__(name):
    """Only trigger parsing when it's needed."""
    if name == 'args':
        args, unknown_args = parser.parse_known_args()
        return args
    raise AttributeError(f"module '{__name__}' has no attribute '{name}'")

bhigy · 2020-03-17T16:01:13Z

I tried that but __getattr__ is called before my main script adds its own arguments.

egpbos · 2020-03-18T10:23:31Z

Ah yeah, that would happen when another module (that you import from your script probably) uses config.args or generally if config.args is accessed anywhere before all arguments are added to the parser... Can you find out where that happens and whether we can avoid that access?

bhigy · 2020-03-18T10:40:21Z

It is in dataset.py. We might be able to avoid the issue here but I feel like we might run into this issue agian with the current setup. I was wondering wether an alternative could be to use two different parsers: one for general level config, such as dataset path, which wouldn't have help associated with it (we could document those parameters somewhere else) and another one for the script specific parameters that would appear in the help message.

egpbos · 2020-04-01T06:42:51Z

Hm, then it wouldn't give any help message if you run a script that doesn't add the second parser.

Another option would be to just put all parameters in one file, even though we only use them in specific runs. We can indicate these experiment-specific parameters in the help section using groups: https://docs.python.org/3/library/argparse.html#argument-groups

So for instance, --epochs, which I added to basic-stack/run.py, should then go in this experiment-parameter group in __init__.py (or maybe we can move it to config.py since we don't need the named singleton parser anymore, we can just make it a simple member of config):

parser = configargparse.get_argument_parser(name='platalea')

# [... current options there ...]

experiment_arguments = parser.add_argument_group('experiment arguments', "Some experiment run scripts use the following optional parameters.")

experiment_arguments.add_argument('--epochs', action='store', default=32, dest='epochs', type=int,
                                  help='number of epochs after which to stop training (default: 32)')
# [... etc ...]

The help message is then:

python experiments/basic-stack/run.py -h
usage: run.py [-h] [--data_root DATA_ROOT] [--meta META] [--audio_features_fn AUDIO_FEATURES_FN]
              [--audio_subdir AUDIO_SUBDIR] [--image_subdir IMAGE_SUBDIR] [--epochs EPOCHS]

Args that start with '--' (eg. --data_root) can also be set in a config file (config.ini or config.yml). Config file
syntax allows: key=value, flag=true, stuff=[a,b,c] (for details, see syntax at https://goo.gl/R74nmi). If an arg is
specified in more than one place, then commandline values override environment variables which override config file
values which override defaults.

optional arguments:
  -h, --help            show this help message and exit
  --data_root DATA_ROOT
                        location of the flickr8k (or similar) dataset [env var: PLATALEA_DATA_ROOT] (default:
                        /roaming/gchrupal/datasets/flickr8k/)
  --meta META           filename of the metadata file (dataset.json or similar) relative to the dataset location [env
                        var: PLATALEA_METADATA_JSON] (default: dataset_multilingual.json)
  --audio_features_fn AUDIO_FEATURES_FN
                        filename of the MFCC audio features file relative to the dataset location [env var:
                        PLATALEA_AUDIO_FEATURES_FN] (default: mfcc_delta_features.pt)
  --audio_subdir AUDIO_SUBDIR
                        directory containing the flickr8k wav files, relative to the dataset location [env var:
                        PLATALEA_AUDIO_SUBDIR] (default: flickr_audio/wavs/)
  --image_subdir IMAGE_SUBDIR
                        directory containing the flickr8k image files, relative to the dataset location [env var:
                        PLATALEA_IMAGE_SUBDIR] (default: Flickr8k_Dataset/Flicker8k_Dataset/)

experiment arguments:
  Some experiment run scripts use the following optional parameters.

  --epochs EPOCHS       number of epochs after which to stop training (default: 32) (default: 32)

A downside of this approach is that we may at some point end up with a huge bunch of experiment arguments, but we can always reevaluate once this happens. Also, we may need slightly different variants/interpretations for different experiments. However, if we document this properly, it shouldn't be a big issue.

What do you think?

egpbos · 2020-04-01T06:45:50Z

Btw, this approach also means we can get rid of the perhaps slightly overly complicated get_attr trick again :P I love it, but sometimes you have to kill your darlings ;)

bhigy · 2020-04-08T15:11:21Z

I am not really fond of adding all parameters in one file for 2 main reasons:

it will quickly make the help unreadable
while some parameters (e.g. number of epochs) apply to most cases, some are specific to one script. I think things will get very confused because of this.

I would rather go for one of the following option:

have only important parameters for each script in the help and general configuration elements visible somewhere else (e.g. through python config.py -h)
keep the get_attr trick and remove the accesses to the parameters in function definitions so that the script can add it's own parameters before help is displayed

egpbos · 2020-04-09T15:19:50Z

After discussion, I think we came up with the following idea, please correct me if I misunderstood:

We create a new class that internally uses the ConfigArgParse parser, but by default does not parse, it only parses when the user asks it to. Moreover, it exposes the default parameters so that we can still use those when parsing has not taken place. When parsing is activated, the class updates the parameters from their default values to the values from the parser. The parser has a set of default options included (those that are used inside the main platalea module, like data_root and filenames for feature files), but specific options can be added for each experiment.

The interface and use will be something like this:

class PlataleaConfig:
  def __init__(self):
    # setup ConfigArgParse parser and default parameters in self.stuff dicts

  def add_argument(self, *args, **kwargs):
    self._parser.add_argument(*args, **kwargs)

  def parse(self):
    args, _ = self.parser.parse_known_args()
    # update default parameters dict(s) using args results

  def __getattr__(self, arg_name):
    return self._stuff[arg_name]

bhigy · 2020-04-15T11:52:11Z

Yes, that's it.

egpbos mentioned this issue Apr 8, 2020

WIP: start adding Transformer based audio encoder #3

Merged

egpbos mentioned this issue May 5, 2020

added config file for platalea spokenlanguage/flickr1d#3

Merged

This was referenced May 25, 2020

Config parser #14

Closed

Config parser 2 #15

Merged

egpbos self-assigned this Jun 2, 2020

egpbos closed this as completed in #15 Jun 4, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issue with use of configargparse and -h #5

Issue with use of configargparse and -h #5

bhigy commented Mar 16, 2020

egpbos commented Mar 17, 2020

bhigy commented Mar 17, 2020

egpbos commented Mar 18, 2020

bhigy commented Mar 18, 2020

egpbos commented Apr 1, 2020

egpbos commented Apr 1, 2020

bhigy commented Apr 8, 2020

egpbos commented Apr 9, 2020 •

edited

Loading

bhigy commented Apr 15, 2020

Issue with use of configargparse and -h #5

Issue with use of configargparse and -h #5

Comments

bhigy commented Mar 16, 2020

egpbos commented Mar 17, 2020

bhigy commented Mar 17, 2020

egpbos commented Mar 18, 2020

bhigy commented Mar 18, 2020

egpbos commented Apr 1, 2020

egpbos commented Apr 1, 2020

bhigy commented Apr 8, 2020

egpbos commented Apr 9, 2020 • edited Loading

bhigy commented Apr 15, 2020

egpbos commented Apr 9, 2020 •

edited

Loading