A configue extension that adds the ability to dynamically configure your application via the command line.
Configue CLI overlaps in functionality with Hydra, but without all the unnecessary boilerplate and with the benefit of being compatible with configue.
- Installation
- Quick start
- Inspection of the configuration state
- Configuration from the command line
- Configuration with YAML files
- Exporting the final configuration
- Unstructured configuration
- Configuring the logging
- Integration with SkyPilot
To install the library, use
pip install configue-cli
To develop locally, clone the repository and use
pip install -r requirements-dev.txt
With configue-cli, configurations are defined with structured and arbitrarily nested Python objects (both native dataclasses and attrs classes are supported and can be nested).
import dataclasses

import attrs


@dataclasses.dataclass
class DatasetConfig:
    name: str
    n_samples: int = 10_000


@dataclasses.dataclass
class OptimizerConfig:
    learning_rate: float = 0.001
    weight_decay: float = 1e-2


@attrs.define
class ModelConfig:
    name: str
    batch_size: int = 12
    optimizer: OptimizerConfig = attrs.Factory(
        lambda self: OptimizerConfig(learning_rate=0.001 * self.batch_size), takes_self=True
    )


@dataclasses.dataclass
class ExperimentConfig:
    model: ModelConfig
    dataset: DatasetConfig
These objects are injected into your application entrypoint at configuration time by the inject_from_cli decorator. To use configue-cli, simply wrap a click entrypoint with the configue_cli.click.inject_from_cli decorator and provide a target type to be injected.
import click

from configue_cli.click import inject_from_cli


@click.command()
@inject_from_cli(ExperimentConfig)
def main(config: ExperimentConfig) -> None:
    print("Passed configuration: ", config)


if __name__ == "__main__":
    main()
To display a help message, use the following:
python main.py --help
To visually inspect your application's configuration state, use the following command:
$ python main.py --dry-run
╭─ Configuration helper ────────────────────────────────╮
│ │
│ model │
│ ├── (): __main__.ModelConfig │
│ ├── name: Missing │
│ ├── batch_size: 12 │
│ └── optimizer │
│ ├── (): __main__.OptimizerConfig │
│ ├── learning_rate: 0.012 │
│ └── weight_decay: 0.01 │
│ │
│ dataset │
│ ├── (): __main__.DatasetConfig │
│ ├── name: Missing │
│ └── n_samples: 10000 │
│ │
╰───────────────────────────────────────────────────────╯
This is useful to quickly identify which parameters are not yet defined (those marked as Missing) and which values the other parameters take, without inspecting the code.
Parameters can be specified from the command line using dotted notation.
$ python main.py model.name=camembert-base dataset.name=fquad model.batch_size=48
╭─ Configuration ───────────────────────────────────────────────────────────────────────────╮
│ │
│ model │
│ ├── (): __main__.ModelConfig │
│ ├── name: camembert-base │
│ ├── batch_size: 48 │
│ └── optimizer │
│ ├── (): __main__.OptimizerConfig │
│ ├── learning_rate: 0.048 │
│ └── weight_decay: 0.01 │
│ │
│ dataset │
│ ├── (): __main__.DatasetConfig │
│ ├── name: fquad │
│ └── n_samples: 10000 │
│ │
╰───────────────────────────────────────────────────────────────────────────────────────────╯
Passed configuration: ExperimentConfig(model=ModelConfig(name='camembert-base', batch_size=48, optimizer=OptimizerConfig(learning_rate=0.048, weight_decay=0.01)), dataset=DatasetConfig(name='fquad', n_samples=10000))
Any required parameter still missing at configuration time will result in an exception:
$ python main.py model.batch_size=3
Traceback (most recent call last):
...
configue_cli.core.exceptions.MissingMandatoryValueError: Missing mandatory value: dataset.name
Any parameter can be overridden using a configue-compliant YAML file. Suppose the model is configured in the following model.yml file:
model:
  (): __main__.ModelConfig
  name: camembert-large
  batch_size: 72
  optimizer:
    (): __main__.OptimizerConfig
    learning_rate: 0.01
    weight_decay: 0.0
This configuration file can be loaded from the CLI using the -c flag:
$ python main.py -c model.yml --dry-run
╭─ Configuration helper ────────────────────────────────────╮
│ │
│ model │
│ ├── (): __main__.ModelConfig │
│ ├── name: camembert-large │
│ ├── batch_size: 72 │
│ └── optimizer │
│ ├── (): __main__.OptimizerConfig │
│ ├── learning_rate: 0.01 │
│ └── weight_decay: 0.0 │
│ │
│ dataset │
│ ├── (): __main__.DatasetConfig │
│ ├── name: Missing │
│ └── n_samples: 10000 │
│ │
╰───────────────────────────────────────────────────────────╯
Multiple configuration files can be used simultaneously; the final configuration is assembled by merging all the files in the order they are provided. For instance, suppose we have the following large_batch.yml file:
model:
  batch_size: 512
This file can be merged into our previous configuration using the following:
$ python main.py -c model.yml -c large_batch.yml --dry-run
╭─ Configuration helper ────────────────────────────────────╮
│ │
│ model │
│ ├── (): __main__.ModelConfig │
│ ├── name: camembert-large │
│ ├── batch_size: 512 │
│ └── optimizer │
│ ├── (): __main__.OptimizerConfig │
│ ├── learning_rate: 0.01 │
│ └── weight_decay: 0.0 │
│ │
│ dataset │
│ ├── (): __main__.DatasetConfig │
│ ├── name: Missing │
│ └── n_samples: 10000 │
│ │
╰───────────────────────────────────────────────────────────╯
Parameters specified on the command line take precedence over those specified in YAML files:
$ python main.py model.batch_size=32 -c model.yml -c large_batch.yml --dry-run
╭─ Configuration helper ────────────────────────────────────╮
│ │
│ model │
│ ├── (): __main__.ModelConfig │
│ ├── name: camembert-large │
│ ├── batch_size: 32 │
│ └── optimizer │
│ ├── (): __main__.OptimizerConfig │
│ ├── learning_rate: 0.01 │
│ └── weight_decay: 0.0 │
│ │
│ dataset │
│ ├── (): __main__.DatasetConfig │
│ ├── name: Missing │
│ └── n_samples: 10000 │
│ │
╰───────────────────────────────────────────────────────────╯
This feature encourages a modular configuration pattern in which different subparts of the application (the model and the dataset in this example) are configured in separate YAML files and dynamically assembled at configuration time. Different variations of these subparts can easily be combined, and any argument can still be overridden from the command line without editing the config files.
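For instance, the dataset could be configured in a separate file, such as the following hypothetical dataset_fquad.yml (shown here for illustration only):

# dataset_fquad.yml (hypothetical example)
dataset:
  (): __main__.DatasetConfig
  name: fquad
  n_samples: 25000

and assembled with the model configuration on the fly:

$ python main.py -c model.yml -c dataset_fquad.yml --dry-run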
To ease reproducibility, the final configuration used for the run can be exported by using the -o flag and specifying an output YAML file:
$ python main.py dataset.name=hello-world -c model.yml -c large_batch.yml -o output.yml
╭─ Configuration ───────────────────────────────────────────╮
│ │
│ model │
│ ├── (): __main__.ModelConfig │
│ ├── name: camembert-large │
│ ├── batch_size: 512 │
│ └── optimizer │
│ ├── (): __main__.OptimizerConfig │
│ ├── learning_rate: 0.01 │
│ └── weight_decay: 0.0 │
│ │
│ dataset │
│ ├── (): __main__.DatasetConfig │
│ ├── name: hello-world │
│ └── n_samples: 10000 │
│ │
╰───────────────────────────────────────────────────────────╯
Passed configuration: ExperimentConfig(model=ModelConfig(name='camembert-large', batch_size=512, optimizer=OptimizerConfig(learning_rate=0.01, weight_decay=0.0)), dataset=DatasetConfig(name='hello-world', n_samples=10000))
$ cat output.yml
model:
  (): __main__.ModelConfig
  name: camembert-large
  batch_size: 512
  optimizer:
    (): __main__.OptimizerConfig
    learning_rate: 0.01
    weight_decay: 0.0
dataset:
  (): __main__.DatasetConfig
  name: hello-world
  n_samples: 10000
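The exported file uses the same configue-compliant format as the input configuration files, so it can typically be passed back with the -c flag to reproduce the run:

$ python main.py -c output.yml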
It is possible to use the inject_from_cli decorator without specifying a target type:
from configue_cli.core.dict_config import DictConfig


@click.command()
@inject_from_cli()
def main(config: DictConfig) -> None:
    ...
In that case, the wrapped entrypoint will be passed a configue_cli.core.dict_config.DictConfig object upon injection.
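Since no target type is provided in this mode, the dotted keys passed on the command line are not tied to a schema; a hypothetical invocation (keys shown for illustration only) could look like this:

$ python main.py training.seed=42 training.epochs=3 --dry-run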
To load a logging configuration located under the "logging" key in your final configuration, use the following:
@click.command()
@inject_from_cli(ExperimentConfig, logging_config_path="logging")
def main(config: ExperimentConfig) -> None:
    ...
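For reference, here is a minimal sketch of what such a section could look like, assuming it follows Python's standard logging.config.dictConfig schema:

# Hypothetical logging section, assuming the standard logging.config.dictConfig schema
logging:
  version: 1
  disable_existing_loggers: false
  formatters:
    simple:
      format: "%(asctime)s - %(levelname)s - %(message)s"
  handlers:
    console:
      class: logging.StreamHandler
      formatter: simple
  root:
    level: INFO
    handlers: [console]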
SkyPilot is a framework for easily running jobs on any cloud through a unified interface. Any function decorated with inject_from_cli can easily be executed remotely by providing a SkyPilot configuration.
The following configuration defines a job to be executed on a SkyPilot cluster named test-cluster. The job is defined under the task key; we refer to the SkyPilot YAML specification for more details on this section. The Python command and all its arguments are captured and interpolated into the run command via the {command} and {parameters} placeholders, respectively.
# skypilot.yml
skypilot:
  cluster-name: test-cluster
  task:
    resources:
      cloud: gcp
      accelerators: K80:1
    workdir: .
    setup: |
      echo 'Setup the job...'
    run: |
      set -e
      cd ~/sky_workdir
      {command} {parameters}
To load the SkyPilot configuration in your final configuration, use the following:
@click.command()
@inject_from_cli(ExperimentConfig, skypilot_config_path="skypilot")
def main(config: ExperimentConfig) -> None:
    ...
As with the other arguments, all SkyPilot configuration arguments can be redefined on the fly:
python main.py -c skypilot.yml skypilot.cluster-name=another-cluster