Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Question] What is the meaning of WARNING (nni.runtime.msg_dispatcher_base/MainThread) default queue length:? #4174

Closed
fsmosca opened this issue Sep 11, 2021 · 3 comments
Assignees

Comments

@fsmosca
Copy link

fsmosca commented Sep 11, 2021

Describe the issue:
There is a dispatcher log below. What is the meaning of this warning? It spends around 10 minutes for this particular part and it repeats. It started doing this after around 50 trials.

[2021-09-11 08:23:00] WARNING (nni.runtime.msg_dispatcher_base/MainThread) default queue length: 20
[2021-09-11 08:23:05] WARNING (nni.runtime.msg_dispatcher_base/MainThread) default queue length: 21
[2021-09-11 08:23:10] WARNING (nni.runtime.msg_dispatcher_base/MainThread) default queue length: 22
[2021-09-11 08:23:15] WARNING (nni.runtime.msg_dispatcher_base/MainThread) default queue length: 23

...

[2021-09-11 08:31:41] WARNING (nni.runtime.msg_dispatcher_base/MainThread) default queue length: 124
[2021-09-11 08:31:44] INFO (Metis_Tuner_AutoML/Thread-1) DEBUG: %d re-sampling candidates selected

[2021-09-11 08:31:44] INFO (Metis_Tuner_AutoML/Thread-1) {'hyperparameter': [96, 70, 41, 9, 74, 169], 'expected_mu': array([-1.81632653]), 'expected_sigma': 26.150017460044886, 'reason': 'resampling'}
[2021-09-11 08:31:44] INFO (Metis_Tuner_AutoML/Thread-1) Evaluating information gain of %d candidates...

[2021-09-11 08:31:46] WARNING (nni.runtime.msg_dispatcher_base/MainThread) default queue length: 125
[2021-09-11 08:31:51] WARNING (nni.runtime.msg_dispatcher_base/MainThread) default queue length: 126

...

[2021-09-11 09:04:39] WARNING (nni.runtime.msg_dispatcher_base/MainThread) default queue length: 151
[2021-09-11 09:04:41] INFO (Metis_Tuner_AutoML/Thread-1) Generate paramageters: 
{'Ka1': 73, 'Ka2': 69, 'Ka3': 39, 'Ka4': 15, 'Ks1': 65, 'Ks2': 190}
[2021-09-11 09:07:18] INFO (Metis_Tuner_AutoML/Thread-1) Received trial result.
[2021-09-11 09:07:18] INFO (Metis_Tuner_AutoML/Thread-1) value is : -11.0
[2021-09-11 09:07:18] INFO (Metis_Tuner_AutoML/Thread-1) parameter is : {'Ka1': 73, 'Ka2': 69, 'Ka3': 39, 'Ka4': 15, 'Ks1': 65, 'Ks2': 190}
[2021-09-11 09:07:36] INFO (Metis_Tuner_AutoML/Thread-1) {'hyperparameter': [69, 68, 34, 3, 77, 179], 'expected_mu': array([0.0754717]), 'expected_sigma': 27.239572412632043, 'reason': 'exploitation_gp'}
[2021-09-11 09:07:37] INFO (Metis_Tuner_AutoML/Thread-1) x not in sample_x
[2021-09-11 09:07:37] INFO (Metis_Tuner_AutoML/Thread-1) DEBUG: 1 exploration candidate selected

[2021-09-11 09:07:37] INFO (Metis_Tuner_AutoML/Thread-1) {'hyperparameter': [69, 68, 34, 3, 77, 179], 'expected_mu': array([0.0754717]), 'expected_sigma': 27.239572412632043, 'reason': 'exploration'}
[2021-09-11 09:07:37] INFO (Metis_Tuner_AutoML/Thread-1) Getting candidates for exploitation...

[2021-09-11 09:07:45] INFO (Metis_Tuner_AutoML/Thread-1) x not in sample_x
[2021-09-11 09:07:45] INFO (Metis_Tuner_AutoML/Thread-1) DEBUG: 1 exploitation_gmm candidate selected

[2021-09-11 09:07:45] INFO (Metis_Tuner_AutoML/Thread-1) {'hyperparameter': [83, 74, 29, 18, 55, 191], 'expected_mu': array([0.0754717]), 'expected_sigma': 27.239572412632043, 'reason': 'exploitation_gmm'}
[2021-09-11 09:07:45] INFO (Metis_Tuner_AutoML/Thread-1) Getting candidates for re-sampling...

[2021-09-11 09:09:03] WARNING (nni.runtime.msg_dispatcher_base/MainThread) default queue length: 20
[2021-09-11 09:09:08] WARNING (nni.runtime.msg_dispatcher_base/MainThread) default queue length: 21

...

[2021-09-11 09:16:44] WARNING (nni.runtime.msg_dispatcher_base/MainThread) default queue length: 112
[2021-09-11 09:16:49] WARNING (nni.runtime.msg_dispatcher_base/MainThread) default queue length: 113
...

Environment:

  • NNI version: 2.4
  • Training service (local|remote|pai|aml|etc): local
  • Client OS: Win10
  • Server OS (for remote mode only):
  • Python version: 3.9.1
  • PyTorch/TensorFlow version: N/A I am not using it.
  • Is conda/virtualenv/venv used?: No
  • Is running in Docker?: No

Configuration:

  • Experiment config (remember to remove secrets!):
{
 "params": {
   "trialCommand": "python opt.py ...",
   "trialCodeDirectory": ,
   "trialConcurrency": 1,
   "trialGpuNumber": 0,
   "maxTrialNumber": 200,
   "useAnnotation": false,
   "debug": false,
   "experimentWorkingDirectory": "",
   "tuner": {
     "name": "MetisTuner",
     "classArgs": {
       "optimize_mode": "maximize",
       "no_resampling": false,
       "exploration_probability": 0.75
     }
   },
   "trainingService": {
     "platform": "local",
     "reuseMode": false,
     "maxTrialNumberPerGpu": 1
   }
 },
 "execDuration": "4h 3m 17s",
 "nextSequenceId": 55,
 "revision": 1517
}
  • Search space:
{
  "Ka1": {
    "_type": "randint",
    "_value": [57, 97]
  },
  "Ka2": {
    "_type": "randint",
    "_value": [35, 75]
  },
  "Ka3": {
    "_type": "randint",
    "_value": [24, 64]
  },
  "Ka4": {
    "_type": "randint",
    "_value": [0, 20]
  },
  "Ks1": {
    "_type": "randint",
    "_value": [50, 90]
  },
  "Ks2": {
    "_type": "randint",
    "_value": [165, 200]
  }
}

Log message:

  • nnimanager.log:
[2021-11-09 10:20:58] INFO (NNIDataStore) Datastore initialization done
[2021-11-09 10:20:59] INFO (RestServer) RestServer start
[2021-11-09 10:20:59] WARNING (NNITensorboardManager) Tensorboard may not installed, if you want to use tensorboard, please check if tensorboard installed.
[2021-11-09 10:20:59] INFO (RestServer) RestServer base port is 8111
[2021-11-09 10:20:59] INFO (main) Rest server listening on: http://0.0.0.0:8111
[2021-11-09 10:21:02] INFO (NNIManager) Starting experiment: 7Yk24RaK
[2021-11-09 10:21:02] INFO (NNIManager) Setup training service...
[2021-11-09 10:21:02] INFO (LocalTrainingService) Construct local machine training service.
[2021-11-09 10:21:02] INFO (NNIManager) Setup tuner...
[2021-11-09 10:21:02] INFO (NNIManager) Change NNIManager status from: INITIALIZED to: RUNNING
[2021-11-09 10:21:02] INFO (NNIManager) Add event listeners
[2021-11-09 10:21:02] INFO (LocalTrainingService) Run local machine training service.
[2021-11-09 10:21:03] INFO (NNIManager) NNIManager received command from dispatcher: ID, 
[2021-11-09 10:21:03] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"Ka1": 70, "Ka2": 36, "Ka3": 43, "Ka4": 6, "Ks1": 58, "Ks2": 182}, "parameter_index": 0}
[2021-11-09 10:21:07] INFO (NNIManager) submitTrialJob: form: {
  sequenceId: 0,
  hyperParameters: {
    value: '{"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"Ka1": 70, "Ka2": 36, "Ka3": 43, "Ka4": 6, "Ks1": 58, "Ks2": 182}, "parameter_index": 0}',
    index: 0
  },
  placementConstraint: { type: 'None', gpus: [] }
}
...

  • dispatcher.log:
[2021-09-11 05:32:59] INFO (nni.runtime.msg_dispatcher_base/MainThread) Dispatcher started
[2021-09-11 05:32:59] INFO (Metis_Tuner_AutoML/Thread-1) Generate paramageters: 
{'Ka1': 65, 'Ka2': 62, 'Ka3': 52, 'Ka4': 17, 'Ks1': 69, 'Ks2': 196}
[2021-09-11 05:35:29] INFO (Metis_Tuner_AutoML/Thread-1) Received trial result.
[2021-09-11 05:35:29] INFO (Metis_Tuner_AutoML/Thread-1) value is : 5.0
[2021-09-11 05:35:29] INFO (Metis_Tuner_AutoML/Thread-1) parameter is : {'Ka1': 65, 'Ka2': 62, 'Ka3': 52, 'Ka4': 17, 'Ks1': 69, 'Ks2': 196}
[2021-09-11 05:35:29] INFO (Metis_Tuner_AutoML/Thread-1) Generate paramageters: 
{'Ka1': 85, 'Ka2': 61, 'Ka3': 50, 'Ka4': 0, 'Ks1': 73, 'Ks2': 189}
[2021-09-11 05:38:00] INFO (Metis_Tuner_AutoML/Thread-1) Received trial result.
[2021-09-11 05:38:00] INFO (Metis_Tuner_AutoML/Thread-1) value is : -16.0
[2021-09-11 05:38:00] INFO (Metis_Tuner_AutoML/Thread-1) parameter is : {'Ka1': 85, 'Ka2': 61, 'Ka3': 50, 'Ka4': 0, 'Ks1': 73, 'Ks2': 189}
[2021-09-11 05:38:05] INFO (Metis_Tuner_AutoML/Thread-1) Generate paramageters: 
{'Ka1': 96, 'Ka2': 69, 'Ka3': 50, 'Ka4': 12, 'Ks1': 81, 'Ks2': 171}
[2021-09-11 05:40:36] INFO (Metis_Tuner_AutoML/Thread-1) Received trial result.
[2021-09-11 05:40:36] INFO (Metis_Tuner_AutoML/Thread-1) value is : -38.0
[2021-09-11 05:40:36] INFO (Metis_Tuner_AutoML/Thread-1) parameter is : {'Ka1': 96, 'Ka2': 69, 'Ka3': 50, 'Ka4': 12, 'Ks1': 81, 'Ks2': 171}
...
  • nnictl stdout and stderr: Nothing

How to reproduce it?:

@ultmaster
Copy link
Contributor

This is usually because the commands are coming too fast and tuner is too slow to process them. Are your trials running very fast and quickly sending many intermediate and final results? Metis tuner can be not efficient enough to handle all those messages.

@fsmosca
Copy link
Author

fsmosca commented Sep 13, 2021

[2021-09-11 05:35:29] INFO (Metis_Tuner_AutoML/Thread-1) Generate paramageters: 
{'Ka1': 85, 'Ka2': 61, 'Ka3': 50, 'Ka4': 0, 'Ks1': 73, 'Ks2': 189}
[2021-09-11 05:38:00] INFO (Metis_Tuner_AutoML/Thread-1) Received trial result.

Elapse is around 2.5 minutes.
38:00 - 35:29

Unfortunately I cannot make some retests right now as I got ERROR: Restful server start failed! It worked on one day and suddenly it refused to start.

@fsmosca
Copy link
Author

fsmosca commented Sep 14, 2021

I made a retest but using nni 2.3 because it is working and have not seen any of those warnings.

@fsmosca fsmosca closed this as completed Sep 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants