[5/12/2019, 1:10:07 AM] ERROR [ 'PAI Training service: get job info for trial Y7TZR from PAI Cluster failed!' ]
[5/12/2019, 1:10:08 AM] ERROR [ 'Submit trial XPxTn failed, http code:500, http body: [object Object]' ]
[5/12/2019, 1:10:08 AM] ERROR [ 'Error: Submit trial XPxTn failed, http code:500, http body: [object Object]\n at Request.request [as _callback] (/data/home/v-zejlin/.conda/envs/pynni/nni/training_service/pai/paiTrainingService.js:322:33)\n at Request.self.callback (/data/home/v-zejlin/.conda/envs/pynni/nni/node_modules/request/request.js:185:22)\n at Request.emit (events.js:182:13)\n at Request.<anonymous> (/data/home/v-zejlin/.conda/envs/pynni/nni/node_modules/request/request.js:1161:10)\n at Request.emit (events.js:182:13)\n at IncomingMessage.<anonymous> (/data/home/v-zejlin/.conda/envs/pynni/nni/node_modules/request/request.js:1083:12)\n at Object.onceWrapper (events.js:273:13)\n at IncomingMessage.emit (events.js:187:15)\n at endReadableNT (_stream_readable.js:1094:12)\n at process._tickCallback (internal/process/next_tick.js:63:19)' ]
[5/12/2019, 1:10:08 AM] INFO [ 'Change NNIManager status from: TUNER_NO_MORE_TRIAL to: ERROR' ]
Two experiments hit the same bug. After I reduced the PAI token update interval from the original 2 hours to half an hour, the error no longer occurred.
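To make the workaround concrete, here is a minimal sketch of an interval check for refreshing the token. It is only an illustration: `tokenNeedsRefresh` and `THIRTY_MINUTES_MS` are hypothetical names, and the issue does not show where NNI actually keeps this interval.

```typescript
// Hypothetical constant: the shortened interval mentioned above (half an hour).
const THIRTY_MINUTES_MS = 30 * 60 * 1000;

// Hypothetical helper: returns true once the token is at least `intervalMs` old.
// `lastRefreshMs` and `nowMs` are timestamps in milliseconds (e.g. from Date.now()).
function tokenNeedsRefresh(
    lastRefreshMs: number,
    nowMs: number,
    intervalMs: number = THIRTY_MINUTES_MS,
): boolean {
    return nowMs - lastRefreshMs >= intervalMs;
}
```

With the original 2 hour interval, a token fetched 45 minutes ago would not be refreshed; with the 30 minute interval it would.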
Root cause analysis:
There are two types of 500 errors that we know of so far: trial failures and experiment failures. For a trial failure, as in this issue, we will catch the failure, add an NNI error log, and fail the trial without failing the entire experiment. For an experiment failure, we will fail the experiment and add an NNI error log.
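The handling described above could be sketched roughly as follows. This is not the NNI implementation: `SubmitFailure`, `handleSubmitFailure`, and the `scope` field are hypothetical, and how a 500 response is classified as a trial or experiment failure is left to the caller because the issue does not specify it. The sketch also serializes the response body with `JSON.stringify`, since concatenating an object into a string is what produces `[object Object]` in the log.

```typescript
// Hypothetical types for illustration only; not taken from the NNI code base.
type FailureScope = "trial" | "experiment";

interface SubmitFailure {
    trialId: string;
    httpCode: number;
    body: unknown;       // parsed HTTP response body
    scope: FailureScope; // assumed to be decided by the caller
}

interface Outcome {
    trialFailed: boolean;
    experimentFailed: boolean;
    log: string; // message to write to the NNI error log
}

function handleSubmitFailure(failure: SubmitFailure): Outcome {
    // JSON.stringify keeps the body readable instead of "[object Object]".
    const log =
        `Submit trial ${failure.trialId} failed, ` +
        `http code:${failure.httpCode}, http body: ${JSON.stringify(failure.body)}`;

    if (failure.scope === "trial") {
        // Trial failure: log it and fail the trial, but keep the experiment alive.
        return { trialFailed: true, experimentFailed: false, log };
    }
    // Experiment failure: log it and fail the whole experiment.
    return { trialFailed: true, experimentFailed: true, log };
}
```

For example, calling `handleSubmitFailure` with `scope: "trial"` for trial `XPxTn` marks only that trial as failed, so the manager would not need to move to the `ERROR` status.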
Short summary about the issue/question:
Brief what process you are following:
How to reproduce it:
nni Environment:
need to update document(yes/no):
Anything else we need to know:
Error info: