[Logs] Analysis throws index pattern error when creating jobs but still actually creates them #48672

Closed
Zacqary opened this issue Oct 18, 2019 · 10 comments
Assignees
Labels
bug Fixes for quality problems that affect the customer experience Feature:Logs UI Logs UI feature Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services v7.6.0

Comments

@Zacqary
Contributor

Zacqary commented Oct 18, 2019

When your log settings include an index pattern that does not exist, the analysis setup workflow throws this error:
[Screenshot of the error message: Screen Shot 2019-10-18 at 11 28 25 AM]

This seems to imply that the jobs weren't created. However, when you refresh the page, or tab away and go back to Analysis, the Analysis tab displays the UI for a successfully set up ML job. The job also exists in the ML plugin UI.

Steps to reproduce:

  1. Go to the logs settings tab and change log indices to include an index pattern that doesn't exist
  2. Try to create the ML jobs in the analysis tab, notice the error message
  3. Refresh the page without removing the non-existent index pattern from the log indices

Expected behavior:

Not sure whether we should suppress this error (if it's not actually a problem) or whether the ML jobs should be deleted when this error is thrown.

@Zacqary Zacqary added bug Fixes for quality problems that affect the customer experience Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services v7.5.0 labels Oct 18, 2019
@elasticmachine
Contributor

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

@Zacqary Zacqary added the Feature:Logs UI Logs UI feature label Oct 18, 2019
@Zacqary
Contributor Author

Zacqary commented Oct 18, 2019

@weltenwort @Kerry350 any insight on what the expected behavior should be in this instance?

@weltenwort
Member

This is related to #48231, which surfaces the underlying error and offers a way for the user to fix it. The job is still created, because it's not an atomic operation. Internally the ML plugin actually creates a job and a datafeed. If the latter fails the job stays.

Currently we delete the job only on retry rather than right away: if the cause of the error is that a job with the same id already exists, we don't want to delete any data that might be valuable to the user (deleting a job also automatically deletes its results).

#48660 relates to how the job health is being interpreted when the datafeed is missing. Maybe we can cover what is described here by not treating it as "stopped" but "failed"? 🤔 @afgomez what do you think?
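As a rough illustration of the "failed" versus "stopped" idea, here is a minimal sketch. The `JobSummary` shape, its field names, and `interpretHealth` are hypothetical and are not taken from the ML plugin's API. The sketch only shows that a job without a datafeed would be reported as "failed" instead of falling through to "stopped".

```typescript
// Hypothetical, simplified job summary; field names are illustrative only
// and do not come from the ML plugin's actual response.
interface JobSummary {
  jobState: 'opened' | 'closed' | 'failed';
  // undefined models a job whose datafeed is missing
  datafeedState?: 'started' | 'stopped';
}

type SetupHealth = 'running' | 'stopped' | 'failed';

// Treat a missing datafeed as a failed setup rather than a stopped one.
function interpretHealth(summary: JobSummary): SetupHealth {
  if (summary.jobState === 'failed' || summary.datafeedState === undefined) {
    return 'failed';
  }
  if (summary.jobState === 'opened' && summary.datafeedState === 'started') {
    return 'running';
  }
  return 'stopped';
}
```

With this mapping, a page reload could show the setup screen again for a "failed" job, while a deliberately stopped datafeed would still read as "stopped".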

@afgomez
Contributor

afgomez commented Oct 21, 2019

> Maybe we can cover what is described here by not treating it as "stopped" but "failed"?

That makes sense. I'll change the code in the PR to reflect that.

@afgomez
Contributor

afgomez commented Oct 22, 2019

I did some investigation around this. When the user tries to create an ML job with a non-existent index, both the job and the datafeed are created. The datafeed is marked as stopped right after creation because the index doesn't exist. I wonder if this behaviour is intentional (ping @elastic/ml-ui, @sophiec20, @grabowskit)

When the user reloads the page, since the datafeed exists and it has a valid state ("stopped") nothing happens.

Some options we could pursue:

  • Create a new state for the datafeed that reflects this situation (not_started?). When the user reloads the page, take the new state into account and show the appropriate feedback (could be as simple as showing the setup page again).

  • When we create the job, check the state. If the job creation was successful but the datafeed creation wasn't, request deletion of the job.

Any other ideas?

@weltenwort
Member

> The datafeed is marked as stopped right after creation because the index doesn't exist. I wonder if this behaviour is intentional.

There has been recent discussion about introducing a configuration option to change that behavior on demand: elastic/elasticsearch#48056

As for the options:

  • New datafeed state: So far I couldn't find a clear indication in the datafeed as to the reason why it was stopped.
  • Delete job when datafeed could not be started: That might be reasonable as long as we make sure that we do this only if we created both and the datafeed failed to start.
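
The guard in the second option could be sketched roughly as follows. Every name here (`SetupResult`, `shouldDeleteJob`, `cleanUpAfterSetup`, and the injected `deleteJob` callback) is hypothetical and only illustrates the condition. It is not the actual Kibana or ML plugin code.

```typescript
// Hypothetical outcome of a single setup attempt.
interface SetupResult {
  jobCreated: boolean; // true only if this attempt created the job
  datafeedCreated: boolean; // true only if this attempt created the datafeed
  datafeedStarted: boolean;
}

// Delete only when this attempt created both resources and the datafeed
// failed to start, so a pre-existing job and its results are never touched.
function shouldDeleteJob(result: SetupResult): boolean {
  return result.jobCreated && result.datafeedCreated && !result.datafeedStarted;
}

// deleteJob is an injected, assumed helper that removes the job by id.
// Resolves to true if a deletion was requested.
async function cleanUpAfterSetup(
  jobId: string,
  result: SetupResult,
  deleteJob: (jobId: string) => Promise<void>
): Promise<boolean> {
  if (!shouldDeleteJob(result)) {
    return false;
  }
  await deleteJob(jobId);
  return true;
}
```

Keeping the condition in a separate pure function makes it easy to test on its own, apart from the deletion request.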

@afgomez
Contributor

afgomez commented Oct 22, 2019

@weltenwort this is the response we get from the API

{
  "jobs": [{ "id": "...", "success": true }],
  "datafeeds": [
    {
      "id": "...",
      "success": true,
      "started": false,
      "error": {
        "msg": "[status_exception] No node found to start datafeed [datafeed-...], allocation explanation [cannot start datafeed [datafeed-...] because index [wadus-*] does not exist, is closed, or is still initializing.]",
        "path": "/_ml/datafeeds/.../_start?&start=...",
        "query": {},
        "statusCode": 409,
        "response": "{\"error\":{\"root_cause\":[{\"type\":\"status_exception\",\"reason\":\"No node found to start datafeed [...], allocation explanation [cannot start datafeed [...] because index [wadus-*] does not exist, is closed, or is still initializing.]\"}],\"type\":\"status_exception\",\"reason\":\"No node found to start datafeed [...], allocation explanation [cannot start datafeed [...] because index [wadus-*] does not exist, is closed, or is still initializing.]\"},\"status\":409}"
      }
    }
  ],
  "kibana": {}
}

Technically it was never started, so it was never stopped :D. I think a new state to reflect this scenario (new, not_started, etc) might be useful for our use case.
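
For illustration, the "created but never started" case can be read off the response shown above. This is a sketch, not existing code: the field names (`jobs`, `datafeeds`, `id`, `success`, `started`, `error`) come from that response, the `kibana` field is left out, and the type and function names are hypothetical.

```typescript
// Shape modelled on the setup response above (the "kibana" field is omitted).
interface SetupResponse {
  jobs: Array<{ id: string; success: boolean }>;
  datafeeds: Array<{
    id: string;
    success: boolean;
    started: boolean;
    error?: { msg: string; statusCode: number };
  }>;
}

// Returns the ids of datafeeds that were created but never started.
function findUnstartedDatafeeds(response: SetupResponse): string[] {
  return response.datafeeds
    .filter((datafeed) => datafeed.success && !datafeed.started)
    .map((datafeed) => datafeed.id);
}
```

A non-empty result during setup marks the datafeeds that a state like not_started would describe.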

@weltenwort
Member

Yes, the reason is obvious during the setup process and we handle it appropriately. I was talking about the job_summary API, of course, which we use to fetch the job status after a page reload.

@afgomez
Contributor

afgomez commented Oct 22, 2019

Ah, yes :) That's why the proposal to have a new state.

Let's wait for the ML team to give their opinion on the topic. Otherwise I can give the second option a shot.

@sgrodzicki sgrodzicki added v7.6.0 and removed v7.5.0 labels Nov 18, 2019
@afgomez
Contributor

afgomez commented Nov 25, 2019

Superseded by #50008

6 participants