Modify the Trainer class to handle simultaneous execution of Ray Tune and Weights & Biases #10823

Merged
merged 2 commits into huggingface:master on Mar 22, 2021

Conversation

ruanchaves
Contributor

What does this PR do?

The proper way to integrate Ray Tune and Weights & Biases is to pass a wandb parameter to tune.run.

However, this parameter is handled as a dictionary inside the config argument, and there is no distinction between the wandb settings and the standard search-space parameters. The following code comes from the Ray Tune docs:

from ray import tune
from ray.tune.logger import DEFAULT_LOGGERS
from ray.tune.integration.wandb import WandbLogger
tune.run(
    train_fn,
    config={
        # define search space here
        "parameter_1": tune.choice([1, 2, 3]),
        "parameter_2": tune.choice([4, 5, 6]),
        # wandb configuration
        "wandb": {
            "project": "Optimization_Project",
            "api_key_file": "/path/to/file",
            "log_config": True
        }
    },
    loggers=DEFAULT_LOGGERS + (WandbLogger, ))

This is not a problem for Ray Tune itself. However, it is a problem for the transformers integration, because the Trainer treats wandb as just another model parameter, and configuring wandb in this way therefore raises an error claiming that wandb is not a training argument.

The following code will raise such an error:

    # Initialize our Trainer
    trainer = Trainer(
        model_init=model_init,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset if training_args.do_eval else None,
        compute_metrics=compute_metrics,
        tokenizer=tokenizer,
        data_collator=data_collator,
    )

    # Hyperparameter Search

    def hp_space_fn(empty_arg):
        config = {
            "warmup_steps": tune.choice([50, 100, 500, 1000]),
            "learning_rate": tune.choice([1.5e-5, 2e-5, 3e-5, 4e-5]),
            "num_train_epochs": tune.quniform(0.0, 10.0, 0.5),
        }
        wandb_config = {
            "wandb": {
                "project": os.environ.get('WANDB_PROJECT', 'wandb_project'),
                "api_key": os.environ.get('API_KEY'),
                "log_config": True,
            }
        }
        config.update(wandb_config)
        return config

    best_run = trainer.hyperparameter_search(
        direction="maximize",
        backend="ray",
        scheduler=PopulationBasedTraining(
            time_attr='time_total_s',
            metric='eval_f1_thr_0',
            mode='max',
            perturbation_interval=600.0,
        ),
        hp_space=hp_space_fn,
        loggers=DEFAULT_LOGGERS + (WandbLogger, ),
    )
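
For context, the error comes from the way _hp_search_setup copies the sampled trial onto TrainingArguments: every key returned by hp_space_fn is treated as a training argument, so an unknown key such as wandb is rejected. The snippet below is a paraphrased, self-contained sketch of that check (hypothetical names; the exact code and error message in transformers differ):

# Paraphrased sketch of the check in Trainer._hp_search_setup that rejects
# unknown keys; names and wording are illustrative, not the transformers source.
class DummyTrainingArguments:
    learning_rate = 3e-5
    warmup_steps = 0

def apply_trial_to_args(args, params):
    for key, value in params.items():
        if not hasattr(args, key):
            # A "wandb" key ends up here, because TrainingArguments has no
            # field with that name.
            raise AttributeError(
                f"Trying to set {key} in the hyperparameter search "
                "but there is no corresponding field in TrainingArguments."
            )
        setattr(args, key, value)

apply_trial_to_args(
    DummyTrainingArguments(),
    {"learning_rate": 2e-5, "wandb": {"project": "Optimization_Project"}},
)  # raises AttributeError for the "wandb" key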

One way to work around this is to define a subclass of the Trainer:

    class CustomTrainer(Trainer):

        def __init__(self, *args, **kwargs):
            super(CustomTrainer, self).__init__(*args, **kwargs)

        def _hp_search_setup(self, trial: Any):
            # Ray passes the trial as a plain dict, so the wandb entry can be
            # popped before the Trainer maps the remaining keys onto
            # TrainingArguments; other backends pass trial objects without
            # .pop(), hence the AttributeError guard.
            try:
                trial.pop('wandb', None)
            except AttributeError:
                pass
            super(CustomTrainer, self)._hp_search_setup(trial)

However, this looks like a hack: dropping the wandb entry from the trial config inside _hp_search_setup should be standard Trainer behavior.

That's why I'm submitting a PR that directly modifies the _hp_search_setup method of the Trainer class to ignore the wandb argument when Ray is chosen as the backend.
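
A minimal sketch of the intended behaviour (illustrative names only; the actual change lives inside Trainer._hp_search_setup and checks which search backend is configured):

# Illustrative sketch of the behaviour this PR adds: when Ray is the backend,
# the "wandb" entry is dropped from the trial dict before the remaining keys
# are applied to TrainingArguments. Not the exact transformers internals.
def strip_wandb_from_ray_trial(params, backend):
    if backend == "ray" and isinstance(params, dict):
        params.pop("wandb", None)  # wandb settings belong to the Ray/W&B logger, not the model
    return params

params = {"learning_rate": 2e-5, "wandb": {"project": "Optimization_Project"}}
print(strip_wandb_from_ray_trial(params, backend="ray"))
# {'learning_rate': 2e-05}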

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

I'm tagging @richardliaw and @amogkam as they're directly involved in Ray Tune.

@sgugger (Collaborator) left a comment

Thanks for fixing! Waiting for @richardliaw or @amogkam approval before merging.

@amogkam (Collaborator) left a comment

LGTM! Thanks

@sgugger sgugger merged commit a8d4d67 into huggingface:master Mar 22, 2021
Iwontbecreative pushed a commit to Iwontbecreative/transformers that referenced this pull request Jul 15, 2021
… and Weights & Biases (huggingface#10823)

* Modify the _hp_search_setup method on the Trainer class to handle the wandb argument passed by Ray Tune to model config.

* Reformat single quotes as double quotes.