Results not reproducible when running AllenNLPExecutor multiple times with transformers #2716

Closed
MagiaSN opened this issue Jun 1, 2021 · 0 comments · Fixed by #2717
Labels: bug (Issue/PR about behavior that is broken. Not for typos/examples/CI/test but for Optuna itself.)

MagiaSN (Contributor) commented Jun 1, 2021

Expected behavior

If we run AllenNLPExecutor multiple times in a single process, we should get exactly the same results as when we run AllenNLPExecutor once in each of several separate processes. However, with transformer models, the second and subsequent trials produce different results.

I found this is caused by the allennlp.common.cached_transformers module, which constructs the transformer model only in the first trial (consuming some random numbers from the global RNG) and returns the cached model in later trials (consuming none). The RNG state at training time therefore differs between a single run and multiple runs in one process, leading to inconsistent results.
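The mechanism can be reproduced with nothing but the standard library. The cache below is a stand-in for allennlp.common.cached_transformers, and all names are illustrative, not AllenNLP or Optuna APIs:

```python
import random

_cache = {}

def get_cached(name):
    # Mimics allennlp.common.cached_transformers: the model is built only
    # on the first call (consuming RNG state); later calls return the
    # cached object without touching the RNG.
    if name not in _cache:
        _cache[name] = [random.random() for _ in range(3)]  # "weight init"
    return _cache[name]

# Trial 1: seed, construct the model, then take a "training-time" draw.
random.seed(42)
get_cached("roberta-tiny")
trial1_draw = random.random()

# Trial 2 in the same process: re-seed, but the model is already cached,
# so no random numbers are consumed before the training draw.
random.seed(42)
get_cached("roberta-tiny")
trial2_draw = random.random()

print(trial1_draw == trial2_draw)  # False: the RNG state diverged
```

Even though both trials re-seed identically, the cache hit in trial 2 skips the RNG consumption that weight construction performed in trial 1, so subsequent draws differ.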

I have reported the same issue in himkt/allennlp-optuna#45 and we think it is better to fix it here.
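For completeness, the same toy cache shows why invalidating it between trials restores consistency. This is a sketch of the general idea only, not the actual change made in #2717:

```python
import random

_cache = {}

def get_cached(name):
    # Construct on a cache miss (consuming RNG state); return cached otherwise.
    if name not in _cache:
        _cache[name] = [random.random() for _ in range(3)]
    return _cache[name]

def run_trial():
    random.seed(42)
    get_cached("roberta-tiny")
    return random.random()  # stands in for a training-time draw

first = run_trial()
_cache.clear()  # invalidate between trials: every trial is now a cache miss
second = run_trial()

print(first == second)  # True: both trials consume the RNG identically
```

With the cache cleared, every trial performs the same sequence of RNG draws, so a multi-trial run matches repeated single-trial runs.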

Environment

  • Optuna version:
  • Python version: 3.7.9
  • OS: CentOS 7
  • (Optional) Other libraries and their versions:
    • pytorch: 1.7.1+cu101
    • allennlp: 2.4.0
    • allennlp-models: 2.4.0

Error messages, stack traces, or logs

Not applicable.

Steps to reproduce

  1. Run test.py --lrs 3e-5 5e-5 (see below), which performs 2 trials. The results for trial_1 (lr=5e-5) are listed below:

    {
      "best_epoch": 0,
      "peak_worker_0_memory_MB": 4430.26171875,
      "peak_gpu_0_memory_MB": 1087.1904296875,
      "training_duration": "0:01:19.745650",
      "training_start_epoch": 0,
      "training_epochs": 0,
      "epoch": 0,
      "training_accuracy": 0.890610765835982,
      "training_loss": 0.2591960741486883,
      "training_worker_0_memory_MB": 4430.26171875,
      "training_gpu_0_memory_MB": 1087.1904296875,
      "validation_accuracy": 0.8784403669724771,
      "validation_loss": 0.29463143753153936,
      "best_validation_accuracy": 0.8784403669724771,
      "best_validation_loss": 0.29463143753153936,
      "test_accuracy": 0.8956617243272927,
      "test_loss": 0.25426666027513045
    }
    
  2. Run test.py --lrs 5e-5, which performs only one trial (lr=5e-5). The results are listed below; note that the accuracy and loss differ from those of the first run:

    {
      "best_epoch": 0,
      "peak_worker_0_memory_MB": 4268.87890625,
      "peak_gpu_0_memory_MB": 107.75439453125,
      "training_duration": "0:01:21.888510",
      "training_start_epoch": 0,
      "training_epochs": 0,
      "epoch": 0,
      "training_accuracy": 0.891380043322469,
      "training_loss": 0.25985829611887934,
      "training_worker_0_memory_MB": 4268.87890625,
      "training_gpu_0_memory_MB": 107.75439453125,
      "validation_accuracy": 0.8841743119266054,
      "validation_loss": 0.29488931596279144,
      "best_validation_accuracy": 0.8841743119266054,
      "best_validation_loss": 0.29488931596279144,
      "test_accuracy": 0.8973091707852828,
      "test_loss": 0.25216216080147646
    }
    

Reproducible examples (optional)

  • test.py

    import argparse
    import os
    import random
    
    import optuna
    from optuna import Trial
    from optuna.integration import AllenNLPExecutor
    from optuna.samplers import GridSampler
    
    
    def main(args: argparse.Namespace) -> None:
        study_name = "test"
        config_file = "config.jsonnet"
        serialization_dir = "output"
        metrics = "best_validation_accuracy"
        direction = "maximize"
        storage = "sqlite:///allennlp_optuna.db"
    
        # For reproducibility
        random.seed(1)
    
        os.makedirs(serialization_dir, exist_ok=True)
    
        def objective(trial: Trial) -> float:
            trial.suggest_float(name="lr", low=1e-5, high=5e-5)
            optuna_serialization_dir = os.path.join(serialization_dir, "trial_{}".format(trial.number))
            executor = AllenNLPExecutor(
                trial,
                config_file,
                optuna_serialization_dir,
                metrics=metrics,
                include_package="allennlp_models",
            )
            return executor.run()
    
        sampler = GridSampler({"lr": args.lrs})
    
        study = optuna.create_study(
            study_name=study_name,
            direction=direction,
            storage=storage,
            sampler=sampler,
        )
    
        study.optimize(objective)
    
    
    if __name__ == "__main__":
        arg_parser = argparse.ArgumentParser()
        arg_parser.add_argument("--lrs", type=float, nargs="+", default=[3e-5, 5e-5])
        args = arg_parser.parse_args()
    
        main(args)
  • config.jsonnet

    local pretrained_model = "haisongzhang/roberta-tiny-cased";
    
    local lr = std.parseJson(std.extVar("lr"));
    
    {
      "random_seed": 42,
      "numpy_seed": 42,
      "pytorch_seed": 42,
      "evaluate_on_test": true,
      "train_data_path": "https://allennlp.s3.amazonaws.com/datasets/sst/train.txt",
      "validation_data_path": "https://allennlp.s3.amazonaws.com/datasets/sst/dev.txt",
      "test_data_path": "https://allennlp.s3.amazonaws.com/datasets/sst/test.txt",
      "dataset_reader": {
        "type": "sst_tokens",
        "use_subtrees": true,
        "granularity": "2-class",
        "token_indexers": {
          "tokens": {
            "type": "pretrained_transformer",
            "model_name": pretrained_model
          }
        },
        "tokenizer": {
          "type": "pretrained_transformer",
          "model_name": pretrained_model
        }
      },
      "validation_dataset_reader": self.dataset_reader + {
        "use_subtrees": false
      },
      "data_loader": {
        "batch_sampler": {
          "type": "bucket",
          "sorting_keys": ["tokens"],
          "batch_size": 64,
        }
      },
      "model": {
        "type": "basic_classifier",
        "text_field_embedder": {
          "token_embedders": {
            "tokens": {
              "type": "pretrained_transformer",
              "model_name": pretrained_model
            }
          }
        },
        "seq2vec_encoder": {
          "type": "bert_pooler",
          "pretrained_model": pretrained_model,
          "dropout": 0.1
        },
        "namespace": "tokens",
        "label_namespace": "label"
      },
      "trainer": {
        "num_epochs": 1,
        "validation_metric": "+accuracy",
        "patience": 3,
        "optimizer": {
          "type": "adam",
          "lr": lr
        },
        "learning_rate_scheduler": {
          "type": "slanted_triangular",
          "cut_frac": 0.0
        }
      }
    }

Additional context (optional)
