
UI thinks a node is still deploying, even after the job has disappeared. #2443

Closed
slaperche-scality opened this issue Apr 21, 2020 · 1 comment
Assignees: ChengYanJin
Labels: complexity:medium (Something that requires one or few days to fix), kind:bug (Something isn't working), topic:ui (UI-related issues)


slaperche-scality commented Apr 21, 2020

Component:

UI

What happened:

Due to some issue with the OneClick, the SSH key was unusable.
Thus, when I tried to expand the cluster, the deployment got stuck (because Salt was prompting for a yes/no confirmation for the SSH key during the ping, I guess).
After manually fixing the SSH key, I ran the expansion from the CLI and it worked.

But the UI still believes that the node is deploying (even though its actual status is Ready).

[Screenshot: MetalK8s Platform UI (2020-04-21) showing the node as Deploying]

After investigating with @ChengYanJin, it appears that the UI is still polling the Salt JID of the stuck job and thinks it hasn't completed yet:

[Screenshot: the polled Salt job]

But this job no longer exists:

[root@hd-cluster-bootstrap /]# salt-run jobs.print_job 20200421071144947483
20200421071144947483:
    ----------
    Error:
        Cannot contact returner or no job with this jid
    Result:
        ----------
    StartTime:
        2020, Apr 21 07:11:44.947483
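
For reference, the UI presumably reaches the same runner through salt-api; a hypothetical sketch of that polling request (the endpoint, port, and token handling are assumptions, not taken from the MetalK8s code):

// Hypothetical sketch: poll a Salt job by JID through salt-api.
const saltToken = '<token obtained from the salt-api login endpoint>';

fetch('https://bootstrap:4507/', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-Auth-Token': saltToken,
  },
  body: JSON.stringify([
    { client: 'runner', fun: 'jobs.print_job', arg: ['20200421071144947483'] },
  ]),
})
  .then((response) => response.json())
  .then((result) => console.log(result.return[0]));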

And indeed, the UI gets the same answer:

[Screenshot: the Salt API response received by the UI]
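
Since the screenshot is lost, here is what that response presumably looks like, reconstructed from the salt-run output above (the exact JSON shape is an assumption based on how getJobStatusFromPrintJob indexes it):

{
  "return": [
    {
      "20200421071144947483": {
        "Error": "Cannot contact returner or no job with this jid",
        "Result": {},
        "StartTime": "2020, Apr 21 07:11:44.947483"
      }
    }
  ]
}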

But it isn't processing this answer correctly.
I think the bug lies either in refreshJobStatus (with its TODO: error handling? 😂) or in getJobStatusFromPrintJob, which doesn't look for an Error key in the response.

Note that we should handle this "JID not found" case anyway, because it can also happen if the salt-master crashes or is restarted (JIDs are not persistent).

What was expected:

The node shouldn't appear as Deploying…

When the JID no longer exists, we should at least warn the user that something went wrong (and maybe restart the job automatically, though I'm not sure about this behavior…)

Steps to reproduce:

I guess you can try:

  • use a "bad" or unknown SSH key when doing cluster expansion from the UI
  • fix the SSH key
  • run the deployment from the CLI
  • go check the UI

Resolution proposal:
We should handle the case where an error occurs (reported not in the job's Result, but in the Error key of the JSON): set the job's completed flag to true and pop up a notification with the error message.
Otherwise the job stays stuck there and will never finish.

The current implementation only looks at Result:

export function getJobStatusFromPrintJob(result, jid) {
  let status = {
    completed: false,
  };
  const job = result.return[0][jid];
  // Only a populated `Result` marks the job as completed; a response
  // carrying an `Error` key (e.g. "no job with this jid") has an empty
  // `Result`, so the job stays "not completed" forever.
  if (job && Object.keys(job['Result']).length) {
    status.completed = true;
    const returner = Object.values(job['Result'])[0].return;
    status.success = returner.success;
    if (!status.success) {
      status = { ...status, ...parseJobError(returner) };
    }
  }
  return status;
}
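
A minimal sketch of the proposed handling, assuming the notification layer can consume a plain message field (the message field name is an assumption, to be aligned with whatever parseJobError returns):

export function getJobStatusFromPrintJob(result, jid) {
  let status = {
    completed: false,
  };
  const job = result.return[0][jid];
  if (job) {
    if (job['Error']) {
      // JID unknown to Salt (e.g. restarted master, expired job cache):
      // consider the job finished and failed, and carry the error message
      // along so the UI can notify the user instead of polling forever.
      status.completed = true;
      status.success = false;
      status.message = job['Error']; // field name is an assumption
    } else if (Object.keys(job['Result']).length) {
      status.completed = true;
      const returner = Object.values(job['Result'])[0].return;
      status.success = returner.success;
      if (!status.success) {
        status = { ...status, ...parseJobError(returner) };
      }
    }
  }
  return status;
}

refreshJobStatus would then see completed: true, treat the failure like any other failed job, and could raise the notification instead of re-polling forever.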
@slaperche-scality slaperche-scality added the topic:ui UI-related issues label Apr 21, 2020
@ChengYanJin ChengYanJin self-assigned this Apr 22, 2020
@thomasdanan thomasdanan added the kind:bug Something isn't working label Apr 22, 2020
ChengYanJin added commits that referenced this issue Apr 24, 2020:

Since we are not sure how to reproduce the error circumstances,
we will just display everything inside the result.error

Refs: #2443

@ChengYanJin ChengYanJin added the complexity:medium Something that requires one or few days to fix label Apr 24, 2020
ChengYanJin commented:

Merged in #2458 (dev/2.6) and #2475 (dev/2.5).
