Results not reproducible when running AllenNLPExecutor multiple times with transformers #2716

Closed
MagiaSN opened this issue Jun 1, 2021 · 0 comments · Fixed by #2717
Labels: bug (Issue/PR about behavior that is broken. Not for typos/examples/CI/test but for Optuna itself.)

MagiaSN (Contributor) commented Jun 1, 2021

Expected behavior

If we run AllenNLPExecutor multiple times in a single process, we should get exactly the same results as when we run AllenNLPExecutor once in each of several separate processes. However, with transformer models, the second and subsequent trials produce different results.

I found this is caused by the allennlp.common.cached_transformers module, which constructs the transformer model only in the first trial (consuming some random numbers from the global RNG) and returns the cached model in later trials (consuming none). The RNG state at training time therefore differs between a single run and multiple runs in one process, leading to inconsistent results.
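The mechanism can be reproduced with nothing but the standard library. The cache below is a stand-in for allennlp.common.cached_transformers, and all names are illustrative, not AllenNLP or Optuna APIs:

```python
import random

_cache = {}

def get_cached(name):
    # Mimics allennlp.common.cached_transformers: the model is built only
    # on the first call (consuming RNG state); later calls return the
    # cached object without touching the RNG.
    if name not in _cache:
        _cache[name] = [random.random() for _ in range(3)]  # "weight init"
    return _cache[name]

# Trial 1: seed, construct the model, then take a "training-time" draw.
random.seed(42)
get_cached("roberta-tiny")
trial1_draw = random.random()

# Trial 2 in the same process: re-seed, but the model is already cached,
# so no random numbers are consumed before the training draw.
random.seed(42)
get_cached("roberta-tiny")
trial2_draw = random.random()

print(trial1_draw == trial2_draw)  # False: the RNG state diverged
```

Even though both trials re-seed identically, the cache hit in trial 2 skips the RNG consumption that weight construction performed in trial 1, so subsequent draws differ.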

I have reported the same issue in himkt/allennlp-optuna#45 and we think it is better to fix it here.
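For completeness, the same toy cache shows why invalidating it between trials restores consistency. This is a sketch of the general idea only, not the actual change made in #2717:

```python
import random

_cache = {}

def get_cached(name):
    # Construct on a cache miss (consuming RNG state); return cached otherwise.
    if name not in _cache:
        _cache[name] = [random.random() for _ in range(3)]
    return _cache[name]

def run_trial():
    random.seed(42)
    get_cached("roberta-tiny")
    return random.random()  # stands in for a training-time draw

first = run_trial()
_cache.clear()  # invalidate between trials: every trial is now a cache miss
second = run_trial()

print(first == second)  # True: both trials consume the RNG identically
```

With the cache cleared, every trial performs the same sequence of RNG draws, so a multi-trial run matches repeated single-trial runs.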

Environment

  • Optuna version:
  • Python version: 3.7.9
  • OS: CentOS 7
  • (Optional) Other libraries and their versions:
    • pytorch: 1.7.1+cu101
    • allennlp: 2.4.0
    • allennlp-models: 2.4.0

Error messages, stack traces, or logs

Not applicable.

Steps to reproduce

  1. Run test.py --lrs 3e-5 5e-5 (see below), which performs 2 trials. The results for trial_1 (lr=5e-5) are listed below:

    {
      "best_epoch": 0,
      "peak_worker_0_memory_MB": 4430.26171875,
      "peak_gpu_0_memory_MB": 1087.1904296875,
      "training_duration": "0:01:19.745650",
      "training_start_epoch": 0,
      "training_epochs": 0,
      "epoch": 0,
      "training_accuracy": 0.890610765835982,
      "training_loss": 0.2591960741486883,
      "training_worker_0_memory_MB": 4430.26171875,
      "training_gpu_0_memory_MB": 1087.1904296875,
      "validation_accuracy": 0.8784403669724771,
      "validation_loss": 0.29463143753153936,
      "best_validation_accuracy": 0.8784403669724771,
      "best_validation_loss": 0.29463143753153936,
      "test_accuracy": 0.8956617243272927,
      "test_loss": 0.25426666027513045
    }
    
  2. Run test.py --lrs 5e-5, which performs only one trial (lr=5e-5). The results are listed below; note that the accuracy and loss differ from those of the first run:

    {
      "best_epoch": 0,
      "peak_worker_0_memory_MB": 4268.87890625,
      "peak_gpu_0_memory_MB": 107.75439453125,
      "training_duration": "0:01:21.888510",
      "training_start_epoch": 0,
      "training_epochs": 0,
      "epoch": 0,
      "training_accuracy": 0.891380043322469,
      "training_loss": 0.25985829611887934,
      "training_worker_0_memory_MB": 4268.87890625,
      "training_gpu_0_memory_MB": 107.75439453125,
      "validation_accuracy": 0.8841743119266054,
      "validation_loss": 0.29488931596279144,
      "best_validation_accuracy": 0.8841743119266054,
      "best_validation_loss": 0.29488931596279144,
      "test_accuracy": 0.8973091707852828,
      "test_loss": 0.25216216080147646
    }
    

Reproducible examples (optional)

  • test.py

    import argparse
    import os
    import random
    
    import optuna
    from optuna import Trial
    from optuna.integration import AllenNLPExecutor
    from optuna.samplers import GridSampler
    
    
    def main(args: argparse.Namespace) -> None:
        study_name = "test"
        config_file = "config.jsonnet"
        serialization_dir = "output"
        metrics = "best_validation_accuracy"
        direction = "maximize"
        storage = "sqlite:///allennlp_optuna.db"
    
        # For reproducibility
        random.seed(1)
    
        os.makedirs(serialization_dir, exist_ok=True)
    
        def objective(trial: Trial) -> float:
            trial.suggest_float(name="lr", low=1e-5, high=5e-5)
            optuna_serialization_dir = os.path.join(serialization_dir, "trial_{}".format(trial.number))
            executor = AllenNLPExecutor(
                trial,
                config_file,
                optuna_serialization_dir,
                metrics=metrics,
                include_package="allennlp_models",
            )
            return executor.run()
    
        sampler = GridSampler({"lr": args.lrs})
    
        study = optuna.create_study(
            study_name=study_name,
            direction=direction,
            storage=storage,
            sampler=sampler,
        )
    
        study.optimize(objective)
    
    
    if __name__ == "__main__":
        arg_parser = argparse.ArgumentParser()
        arg_parser.add_argument("--lrs", type=float, nargs="+", default=[3e-5, 5e-5])
        args = arg_parser.parse_args()
    
        main(args)
  • config.jsonnet

    local pretrained_model = "haisongzhang/roberta-tiny-cased";
    
    local lr = std.parseJson(std.extVar("lr"));
    
    {
      "random_seed": 42,
      "numpy_seed": 42,
      "pytorch_seed": 42,
      "evaluate_on_test": true,
      "train_data_path": "https://allennlp.s3.amazonaws.com/datasets/sst/train.txt",
      "validation_data_path": "https://allennlp.s3.amazonaws.com/datasets/sst/dev.txt",
      "test_data_path": "https://allennlp.s3.amazonaws.com/datasets/sst/test.txt",
      "dataset_reader": {
        "type": "sst_tokens",
        "use_subtrees": true,
        "granularity": "2-class",
        "token_indexers": {
          "tokens": {
            "type": "pretrained_transformer",
            "model_name": pretrained_model
          }
        },
        "tokenizer": {
          "type": "pretrained_transformer",
          "model_name": pretrained_model
        }
      },
      "validation_dataset_reader": self.dataset_reader + {
        "use_subtrees": false
      },
      "data_loader": {
        "batch_sampler": {
          "type": "bucket",
          "sorting_keys": ["tokens"],
          "batch_size": 64,
        }
      },
      "model": {
        "type": "basic_classifier",
        "text_field_embedder": {
          "token_embedders": {
            "tokens": {
              "type": "pretrained_transformer",
              "model_name": pretrained_model
            }
          }
        },
        "seq2vec_encoder": {
          "type": "bert_pooler",
          "pretrained_model": pretrained_model,
          "dropout": 0.1
        },
        "namespace": "tokens",
        "label_namespace": "label"
      },
      "trainer": {
        "num_epochs": 1,
        "validation_metric": "+accuracy",
        "patience": 3,
        "optimizer": {
          "type": "adam",
          "lr": lr
        },
        "learning_rate_scheduler": {
          "type": "slanted_triangular",
          "cut_frac": 0.0
        }
      }
    }

Additional context (optional)
