
Mismatched hyperparameters between web server display and their actual values #5726

Open
WenjieDu opened this issue Dec 27, 2023 · 3 comments



WenjieDu commented Dec 27, 2023

Describe the issue:

Environment:

  • NNI version: 3.0
  • Training service (local|remote|pai|aml|etc): local
  • Client OS: Ubuntu 20.04.4 LTS (GNU/Linux 5.13.0-30-generic x86_64)
  • Server OS (for remote mode only):
  • Python version: 3.11
  • PyTorch/TensorFlow version: 2.1.2
  • Is conda/virtualenv/venv used?: Conda
  • Is running in Docker?: No

Configuration:

  • Experiment config (remember to remove secrets!):
experimentName: MRNN hyper-param searching
authorName: WenjieDu
trialConcurrency: 1
trainingServicePlatform: local
searchSpacePath: MRNN_ETTm1_tuning_space.json
multiThread: true
useAnnotation: false
tuner:
    builtinTunerName: Random

trial:
    command: enable_tuning=1 pypots-cli tuning --model pypots.imputation.MRNN --train_set ../../data/ettm1/train.h5 --val_set ../../data/ettm1/val.h5
    codeDir: .
    gpuNum: 1

localConfig:
    useActiveGpu: true
    maxTrialNumPerGpu: 20
    gpuIndices: 3
  • Search space:
{
  "n_steps":  {"_type":"choice","_value":[60]},
  "n_features":  {"_type":"choice","_value":[7]},
  "patience":  {"_type":"choice","_value":[10]},
  "epochs":  {"_type":"choice","_value":[200]},
  "rnn_hidden_size":  {"_type":"choice","_value":[16,32,64,128,256,512]},
  "lr":{"_type":"loguniform","_value":[0.0001,0.01]}
}
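For context on the `lr` entry above: NNI's `loguniform` search type draws values uniformly on a log scale between the two bounds, so samples like 0.0008 and 0.005 are both plausible draws from `[0.0001, 0.01]`. A minimal sketch of that sampling (an illustration, not NNI's actual implementation; `sample_loguniform` is a hypothetical helper):

```python
import math
import random

def sample_loguniform(low: float, high: float, rng: random.Random) -> float:
    """Sample uniformly on a log scale, as NNI's loguniform search type does."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)
lr = sample_loguniform(0.0001, 0.01, rng)
# Every draw stays within the configured bounds.
assert 0.0001 <= lr <= 0.01
```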

Log message:

  • nnimanager.log:
[2023-12-27 16:16:42] INFO (NNIManager) submitTrialJob: form: {
  sequenceId: 7,
  hyperParameters: {
    value: '{"parameter_id": 7, "parameter_source": "algorithm", "parameters": {"n_steps": 60, "n_features": 7, "patience": 10, "epochs": 200, "rnn_hidden_size": 32, "lr": 0.0008698020401037771}, "parameter_index": 0}',
    index: 0
  },
  placementConstraint: { type: 'None', gpus: [] }
}
[2023-12-27 16:16:42] INFO (LocalV3.local) Created trial XsB6F
  • dispatcher.log:
[2023-12-27 16:15:06] INFO (numexpr.utils/MainThread) Note: detected 128 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
[2023-12-27 16:15:06] INFO (numexpr.utils/MainThread) Note: NumExpr detected 128 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
[2023-12-27 16:15:06] INFO (numexpr.utils/MainThread) NumExpr defaulting to 8 threads.
[2023-12-27 16:15:06] INFO (nni.tuner.random/MainThread) Using random seed 220808582
[2023-12-27 16:15:06] INFO (nni.runtime.msg_dispatcher_base/MainThread) Dispatcher started
[2023-12-27 16:15:06] INFO (nni.runtime.msg_dispatcher/Thread-1 (command_queue_worker)) Initial search space: {'n_steps': {'_type': 'choice', '_value': [60]}, 'n_features': {'_type': 'choice', '_value': [7]}, 'patience': {'_type': 'choice', '_value': [10]}, 'epochs': {'_type': 'choice', '_value': [200]}, 'rnn_hidden_size': {'_type': 'choice', '_value': [16, 32, 64, 128, 256, 512]}, 'lr': {'_type': 'loguniform', '_value': [0.0001, 0.01]}}
  • nnictl stdout and stderr:
2023-12-27 16:16:44 [INFO]: Have set the random seed as 2204 for numpy and pytorch.
2023-12-27 16:16:44 [INFO]: The tunner assigns a new group of params: {'n_steps': 60, 'n_features': 7, 'patience': 10, 'epochs': 200, 'rnn_hidden_size': 256, 'lr': 0.0054442307300676335}
2023-12-27 16:16:45 [INFO]: No given device, using default device: cuda
2023-12-27 16:16:45 [WARNING]: ‼️ saving_path not given. Model files and tensorboard file will not be saved.
2023-12-27 16:16:48 [INFO]: MRNN initialized with the given hyperparameters, the number of trainable parameters: 401,619
2023-12-27 16:16:48 [INFO]: Option lazy_load is set as False, hence loading all data from file...
2023-12-27 16:16:52 [INFO]: Epoch 001 - training loss: 1.3847, validating loss: 1.3214

How to reproduce it?:

Note that in nnimanager.log, the lr of trial XsB6F is 0.0008698020401037771, and this is also the value displayed on the local web page. However, in the nnictl stdout log, the lr actually received by the model is 0.0054442307300676335, so the two are mismatched. This is not an isolated case: for some trials, the hyperparameters that nnimanager reports differ from the values the model actually receives, while other trials match and are fine.
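The mismatch can be checked mechanically by parsing the JSON payload that NNIManager logs under `hyperParameters.value` and diffing it against the params the trial process printed. In the sketch below, `parse_manager_params` is a hypothetical helper, and both dicts are copied verbatim from the logs in this report:

```python
import json

def parse_manager_params(value_field: str) -> dict:
    """Parse the hyperParameters.value JSON string from nnimanager.log."""
    return json.loads(value_field)["parameters"]

# The value string as it appears in nnimanager.log for trial XsB6F
manager_value = (
    '{"parameter_id": 7, "parameter_source": "algorithm", '
    '"parameters": {"n_steps": 60, "n_features": 7, "patience": 10, '
    '"epochs": 200, "rnn_hidden_size": 32, "lr": 0.0008698020401037771}, '
    '"parameter_index": 0}'
)

# The params printed by the trial process in nnictl stdout
trial_params = {
    "n_steps": 60, "n_features": 7, "patience": 10, "epochs": 200,
    "rnn_hidden_size": 256, "lr": 0.0054442307300676335,
}

manager_params = parse_manager_params(manager_value)
# Collect every key whose value differs between the two sources.
mismatched = {
    k: (manager_params[k], trial_params[k])
    for k in manager_params
    if manager_params[k] != trial_params[k]
}
print(mismatched)
```

Running this on the two log excerpts shows that both `rnn_hidden_size` and `lr` differ, which suggests the trial received an entirely different parameter set than the one the web UI displays for it.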


axinbme commented Jan 11, 2024

I had the same problem.

@void-echo

Plus one 🤣

@WenjieDu (Author)

Seriously? Is nobody looking into this high-risk issue?
