Configue CLI

A configue extension that adds the ability to dynamically configure your application via the command line.

Configue CLI overlaps in functionality with Hydra, but it requires less boilerplate and is compatible with configue.

Installation

To install the library, use

pip install configue-cli

To develop locally, clone the repository and use

pip install -r requirements-dev.txt

Quick start

With configue-cli, configurations are defined as structured, arbitrarily nested Python objects. Both standard library dataclasses and attrs classes are supported, and the two can be nested within each other.

import dataclasses
import attrs


@dataclasses.dataclass
class DatasetConfig:
    name: str
    n_samples: int = 10_000


@dataclasses.dataclass
class OptimizerConfig:
    learning_rate: float = 0.001
    weight_decay: float = 1e-2


@attrs.define
class ModelConfig:
    name: str
    batch_size: int = 12
    optimizer: OptimizerConfig = attrs.Factory(
        lambda self: OptimizerConfig(learning_rate=0.001 * self.batch_size), takes_self=True
    )


@dataclasses.dataclass
class ExperimentConfig:
    model: ModelConfig
    dataset: DatasetConfig
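
The optimizer default in ModelConfig depends on batch_size through attrs.Factory(..., takes_self=True). As a rough illustration of that dependency using only the standard library, the same derived default can be written with `__post_init__`. This is a stand-in for the attrs mechanism, not how attrs or configue-cli implement it, and the `OptimizerSketch` and `ModelSketch` names are hypothetical:

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass
class OptimizerSketch:
    learning_rate: float = 0.001
    weight_decay: float = 1e-2


@dataclasses.dataclass
class ModelSketch:
    name: str
    batch_size: int = 12
    # None means "derive the optimizer from batch_size" in __post_init__.
    optimizer: Optional[OptimizerSketch] = None

    def __post_init__(self) -> None:
        if self.optimizer is None:
            # Same scaling rule as the attrs.Factory lambda: 0.001 * batch_size.
            self.optimizer = OptimizerSketch(learning_rate=0.001 * self.batch_size)


default_model = ModelSketch(name="camembert-base")
large_model = ModelSketch(name="camembert-base", batch_size=48)
```

With the default batch size of 12 the derived learning rate is about 0.012, and with a batch size of 48 it is about 0.048. An explicitly provided optimizer is left untouched.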

These objects are injected into your application entrypoint at configuration time. To use configue-cli, wrap a click entrypoint with the configue_cli.click.inject_from_cli decorator and provide the target type to inject.

import click
from configue_cli.click import inject_from_cli

@click.command()
@inject_from_cli(ExperimentConfig)
def main(config: ExperimentConfig) -> None:
    print("Passed configuration:", config)


if __name__ == "__main__":
    main()

To display a help message, use the following:

python main.py --help

Inspection of the configuration state

To visually inspect your application configuration state, use the following command:

$ python main.py --dry-run

╭─ Configuration helper ────────────────────────────────╮
│                                                       │
│  model                                                │
│  ├── (): __main__.ModelConfig                         │
│  ├── name: Missing                                    │
│  ├── batch_size: 12                                   │
│  └── optimizer                                        │
│      ├── (): __main__.OptimizerConfig                 │
│      ├── learning_rate: 0.012                         │
│      └── weight_decay: 0.01                           │
│                                                       │
│  dataset                                              │
│  ├── (): __main__.DatasetConfig                       │
│  ├── name: Missing                                    │
│  └── n_samples: 10000                                 │
│                                                       │
╰───────────────────────────────────────────────────────╯

This makes it easy to see which parameters are not yet defined (those marked as Missing) and which values the other parameters take, without inspecting the code.

Configuration from the command line

Parameters can be specified from the command line using dotted notation.

$ python main.py model.name=camembert-base dataset.name=fquad model.batch_size=48

╭─ Configuration ───────────────────────────────────────────────────────────────────────────╮
│                                                                                           │
│  model                                                                                    │
│  ├── (): __main__.ModelConfig                                                             │
│  ├── name: camembert-base                                                                 │
│  ├── batch_size: 48                                                                       │
│  └── optimizer                                                                            │
│      ├── (): __main__.OptimizerConfig                                                     │
│      ├── learning_rate: 0.048                                                             │
│      └── weight_decay: 0.01                                                               │
│                                                                                           │
│  dataset                                                                                  │
│  ├── (): __main__.DatasetConfig                                                           │
│  ├── name: fquad                                                                          │
│  └── n_samples: 10000                                                                     │
│                                                                                           │
╰───────────────────────────────────────────────────────────────────────────────────────────╯
Passed configuration: ExperimentConfig(model=ModelConfig(name='camembert-base', batch_size=48, optimizer=OptimizerConfig(learning_rate=0.048, weight_decay=0.01)), dataset=DatasetConfig(name='fquad', n_samples=10000))
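
To make the dotted notation concrete, here is a minimal sketch of how `key.subkey=value` arguments could be turned into a nested mapping. The `parse_overrides` helper is hypothetical and is not configue-cli's parser; in particular it keeps every value as a string, whereas the output above shows that the library produces typed values.

```python
from typing import Any, Dict, List


def parse_overrides(arguments: List[str]) -> Dict[str, Any]:
    """Turn ["a.b=1", "a.c=x"] into {"a": {"b": "1", "c": "x"}}."""
    config: Dict[str, Any] = {}
    for argument in arguments:
        # Split on the first "=" only, so values may contain "=" themselves.
        dotted_key, separator, value = argument.partition("=")
        if not separator or not dotted_key:
            raise ValueError(f"Expected key=value, got: {argument!r}")
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            # Create intermediate dictionaries on demand.
            node = node.setdefault(part, {})
        # Values stay strings; type conversion is out of scope for this sketch.
        node[leaf] = value
    return config


overrides = parse_overrides(
    ["model.name=camembert-base", "dataset.name=fquad", "model.batch_size=48"]
)
print(overrides)
```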

Any missing required parameter at configuration time will result in an exception:

$ python main.py model.batch_size=3

Traceback (most recent call last):
  ...
configue_cli.core.exceptions.MissingMandatoryValueError: Missing mandatory value: dataset.name
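
The idea of reporting a missing value by its dotted path can be sketched as a walk over a nested mapping. This is only an illustration: the `MISSING` sentinel and the `iter_missing` helper are hypothetical, and unlike the exception above, which names a single path, the sketch collects every missing path.

```python
from typing import Any, Dict, Iterator

MISSING = object()  # Hypothetical sentinel meaning "no value provided".


def iter_missing(config: Dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield the dotted path of every value still set to MISSING."""
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections, extending the dotted prefix.
            yield from iter_missing(value, path)
        elif value is MISSING:
            yield path


state = {
    "model": {"name": MISSING, "batch_size": 3},
    "dataset": {"name": MISSING, "n_samples": 10_000},
}
missing_paths = list(iter_missing(state))
print(missing_paths)
```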

Configuration with YAML files

Any parameter can be overridden using a configue compliant YAML file. Suppose the model is configured in the following model.yml file:

model:
  (): __main__.ModelConfig
  name: camembert-large
  batch_size: 72
  optimizer:
    (): __main__.OptimizerConfig
    learning_rate: 0.01
    weight_decay: 0.0

This configuration file can be loaded from the CLI using the -c flag:

$ python main.py -c model.yml --dry-run

╭─ Configuration helper ────────────────────────────────────╮
│                                                           │
│  model                                                    │
│  ├── (): __main__.ModelConfig                             │
│  ├── name: camembert-large                                │
│  ├── batch_size: 72                                       │
│  └── optimizer                                            │
│      ├── (): __main__.OptimizerConfig                     │
│      ├── learning_rate: 0.01                              │
│      └── weight_decay: 0.0                                │
│                                                           │
│  dataset                                                  │
│  ├── (): __main__.DatasetConfig                           │
│  ├── name: Missing                                        │
│  └── n_samples: 10000                                     │
│                                                           │
╰───────────────────────────────────────────────────────────╯

Multiple configuration files can be used simultaneously. The final configuration is assembled by merging all files in the order they are provided. For instance, suppose we have the following large_batch.yml file:

model:
  batch_size: 512

This file can be merged into our previous configuration using the following:

$ python main.py -c model.yml -c large_batch.yml --dry-run

╭─ Configuration helper ────────────────────────────────────╮
│                                                           │
│  model                                                    │
│  ├── (): __main__.ModelConfig                             │
│  ├── name: camembert-large                                │
│  ├── batch_size: 512                                      │
│  └── optimizer                                            │
│      ├── (): __main__.OptimizerConfig                     │
│      ├── learning_rate: 0.01                              │
│      └── weight_decay: 0.0                                │
│                                                           │
│  dataset                                                  │
│  ├── (): __main__.DatasetConfig                           │
│  ├── name: Missing                                        │
│  └── n_samples: 10000                                     │
│                                                           │
╰───────────────────────────────────────────────────────────╯
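
The merge behaviour shown above, where a later file overrides only the keys it defines, can be sketched as a recursive dictionary merge. This is a minimal illustration working on already-loaded mappings. The `merge_configs` helper is hypothetical and configue-cli's actual merge logic may differ.

```python
import copy
from typing import Any, Dict


def merge_configs(base: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where values from `update` override those in `base`."""
    merged = copy.deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them key by key.
            merged[key] = merge_configs(merged[key], value)
        else:
            # Otherwise the later value replaces the earlier one.
            merged[key] = copy.deepcopy(value)
    return merged


# Contents of model.yml and large_batch.yml as plain mappings ("()" keys omitted).
model_yml = {
    "model": {
        "name": "camembert-large",
        "batch_size": 72,
        "optimizer": {"learning_rate": 0.01, "weight_decay": 0.0},
    }
}
large_batch_yml = {"model": {"batch_size": 512}}

merged = merge_configs(model_yml, large_batch_yml)
print(merged["model"]["batch_size"])
```

Only `batch_size` changes; the model name and the optimizer section from the first mapping are preserved, and the inputs are not mutated.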

Parameters specified with the command line take precedence over the ones specified in YAML files:

$ python main.py model.batch_size=32 -c model.yml -c large_batch.yml --dry-run

╭─ Configuration helper ────────────────────────────────────╮
│                                                           │
│  model                                                    │
│  ├── (): __main__.ModelConfig                             │
│  ├── name: camembert-large                                │
│  ├── batch_size: 32                                       │
│  └── optimizer                                            │
│      ├── (): __main__.OptimizerConfig                     │
│      ├── learning_rate: 0.01                              │
│      └── weight_decay: 0.0                                │
│                                                           │
│  dataset                                                  │
│  ├── (): __main__.DatasetConfig                           │
│  ├── name: Missing                                        │
│  └── n_samples: 10000                                     │
│                                                           │
╰───────────────────────────────────────────────────────────╯

This feature encourages a modular configuration pattern where different subparts of the application (the model and the dataset in this example) are configured in separate YAML files and are dynamically assembled at configuration time. Different variations of these subparts can easily be assembled. All arguments can be overridden using the command line without having to edit the config files.

Exporting the final configuration

To ease reproducibility, the final configuration used for the run can be exported by using the -o flag and specifying an output YAML file:

$ python main.py dataset.name=hello-world -c model.yml -c large_batch.yml -o output.yml

╭─ Configuration ───────────────────────────────────────────╮
│                                                           │
│  model                                                    │
│  ├── (): __main__.ModelConfig                             │
│  ├── name: camembert-large                                │
│  ├── batch_size: 512                                      │
│  └── optimizer                                            │
│      ├── (): __main__.OptimizerConfig                     │
│      ├── learning_rate: 0.01                              │
│      └── weight_decay: 0.0                                │
│                                                           │
│  dataset                                                  │
│  ├── (): __main__.DatasetConfig                           │
│  ├── name: hello-world                                    │
│  └── n_samples: 10000                                     │
│                                                           │
╰───────────────────────────────────────────────────────────╯
Passed configuration: ExperimentConfig(model=ModelConfig(name='camembert-large', batch_size=512, optimizer=OptimizerConfig(learning_rate=0.01, weight_decay=0.0)), dataset=DatasetConfig(name='hello-world', n_samples=10000))

$ cat output.yml
model:
  (): __main__.ModelConfig
  name: camembert-large
  batch_size: 512
  optimizer:
    (): __main__.OptimizerConfig
    learning_rate: 0.01
    weight_decay: 0.0
dataset:
  (): __main__.DatasetConfig
  name: hello-world
  n_samples: 10000
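
The exported file mirrors the object tree, with a `()` key holding the class path of each object. As a rough sketch of how such a mapping could be produced from standard library dataclasses, consider the hypothetical `to_mapping` helper below. It is not configue-cli's export code, it does not handle attrs classes, and it stops at a plain dictionary rather than writing YAML.

```python
import dataclasses
from typing import Any, Dict


@dataclasses.dataclass
class DatasetConfig:
    name: str
    n_samples: int = 10_000


def to_mapping(instance: Any) -> Dict[str, Any]:
    """Convert a dataclass instance into a nested dict with a "()" class-path key."""
    cls = type(instance)
    mapping: Dict[str, Any] = {"()": f"{cls.__module__}.{cls.__qualname__}"}
    for field in dataclasses.fields(instance):
        value = getattr(instance, field.name)
        # Nested dataclass instances become nested sections.
        if dataclasses.is_dataclass(value) and not isinstance(value, type):
            value = to_mapping(value)
        mapping[field.name] = value
    return mapping


exported = to_mapping(DatasetConfig(name="hello-world"))
print(exported)
```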

Unstructured configuration

It is possible to use the inject_from_cli decorator without specifying a target type:

import configue_cli.core.dict_config

@click.command()
@inject_from_cli()
def main(config: configue_cli.core.dict_config.DictConfig) -> None:
    ...

In that case, the wrapped entrypoint will be passed a configue_cli.core.dict_config.DictConfig object upon injection.

Configuring the logging

To load a logging configuration located under the "logging" key in your final configuration, use the following:

@click.command()
@inject_from_cli(ExperimentConfig, logging_config_path="logging")
def main(config: ExperimentConfig) -> None:
    ...
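
The expected shape of the "logging" section is not spelled out here. Assuming it follows the standard library's `logging.config.dictConfig` schema (an assumption), the sketch below shows what such a section could contain and how it is applied with the standard library directly. The `experiment` logger name and the formatter are hypothetical.

```python
import logging
import logging.config

# A final configuration with a section under the "logging" key (assumed schema).
final_configuration = {
    "logging": {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {"simple": {"format": "%(levelname)s %(name)s: %(message)s"}},
        "handlers": {
            "console": {"class": "logging.StreamHandler", "formatter": "simple"}
        },
        "loggers": {"experiment": {"level": "DEBUG", "handlers": ["console"]}},
    }
}

# Apply the section with the standard library, independently of configue-cli.
logging.config.dictConfig(final_configuration["logging"])
logger = logging.getLogger("experiment")
logger.debug("Logging configured from the 'logging' key")
```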

Integration with SkyPilot

SkyPilot is a framework for easily running jobs on any cloud through a unified interface. Any function decorated with inject_from_cli can be executed remotely by providing a SkyPilot configuration.

The following configuration defines a job to be executed in a SkyPilot cluster named test-cluster. The job is defined under the task key; see the SkyPilot YAML specification for more details on this section.

The Python command and all its arguments are captured and interpolated into the run command, in the {command} and {parameters} placeholders respectively.

# skypilot.yml
skypilot:
  cluster-name: test-cluster
  task:
    resources:
      cloud: gcp
      accelerators: K80:1
    workdir: .
    setup: |
      echo 'Setup the job...'
    run: |
      set -e
      cd ~/sky_workdir
      {command} {parameters}
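
The placeholder substitution in the run section can be pictured with plain string formatting. This is only a sketch under assumptions: the `render_run_script` helper is hypothetical, and the exact strings configue-cli substitutes for {command} and {parameters} are not specified here.

```python
import shlex
from typing import List

# The run section from skypilot.yml above, as a Python string.
RUN_TEMPLATE = """set -e
cd ~/sky_workdir
{command} {parameters}
"""


def render_run_script(template: str, command: str, parameters: List[str]) -> str:
    """Fill the {command} and {parameters} placeholders of a run template."""
    # shlex.join quotes each parameter so it survives shell parsing.
    return template.format(command=command, parameters=shlex.join(parameters))


script = render_run_script(
    RUN_TEMPLATE,
    command="python main.py",
    parameters=["model.name=camembert-base", "dataset.name=fquad"],
)
print(script)
```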

To load the SkyPilot configuration in your final configuration, use the following:

@click.command()
@inject_from_cli(ExperimentConfig, skypilot_config_path="skypilot")
def main(config: ExperimentConfig) -> None:
    ...

As with the other arguments, all SkyPilot configuration arguments can be redefined on the fly:

python main.py -c skypilot.yml skypilot.cluster-name=another-cluster
