Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Rollup] Wrong response status on failed requests #39845

Closed
sebelga opened this issue Mar 8, 2019 · 5 comments · Fixed by #41502
Closed

[Rollup] Wrong response status on failed requests #39845

sebelga opened this issue Mar 8, 2019 · 5 comments · Fixed by #41502
Labels
:StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data

Comments

@sebelga
Copy link
Contributor

sebelga commented Mar 8, 2019

I discovered that Elasticsearch handles differently the response of a common pattern: "Trigger an action that has already been triggered".

Rollup job --> start

POST _rollup/job/<job_id>/_start

When we try to start a rollup job that is already started, Elasticsearch throws a 500 Internal Server Error instead of a 400 Bad Request.

This is the error returned

status: 500,
  displayName: 'InternalServerError',
  message:
   '[exception] Cannot start task for Rollup Job [my-job] because state was [STARTED]',
  path: '/_rollup/job/my-job/_start',
  query: {},
  body:
   { error:
      { root_cause: [Array],
        type: 'exception',
        reason:
         'Cannot start task for Rollup Job [my-job] because state was [STARTED]' },
     status: 500 },
  statusCode: 500,

Rollup job --> stop

POST _rollup/job/<job_id>/_stop

This action is idempotent. We can call it as many times as we want, we always get a

200 OK
{
  stopped: true
}

Start trial

POST _xpack/license/start_trial

If we compare the above with the start trial API, we can see that it is handled differently. If we have already started the trial and we call the request again we get a 403 Forbidden

{
  "acknowledged": true,
  "trial_was_started": false,
  "error_message": "Operation failed: Trial was already activated."
}

I think it would be nice to align the way ES handle these types of requests and their status code.

I also realize another wrong status in the Rollup API. When we try to delete a Job that has been started, we receive also a 500 Internal Server Error instead of a 400 Bad Request.

{
	Internal Server Error::{
		"path": "/_rollup/job/my-job",
		"query": {},
		"statusCode": 500,
		"response": "{\"task_failures\":[{\"task_id\":1464,\"node_id\":\"38IK6IjmQAy2y2tgb7q_Nw\",\"status\":\"INTERNAL_SERVER_ERROR\",\"reason\":{\"type\":\"illegal_state_exception\",\"reason\":\"Could not delete job [my-job] because indexer state is [STARTED].  Job must be [STOPPED] before deletion.\"}}],\"acknowledged\":false}"
	}
@polyfractal polyfractal added the :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data label Mar 8, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo

@polyfractal
Copy link
Contributor

Started to look into this. I agree the 500 errors are no good. I'm not sure I like the 403 Forbidden though, since that sorta implies it's some kind of authentication issue.

ML uses 409 Conflict when trying to start an already started datafeed, which I think is a better fit. It specifically applies to a conflict of internal state, although part of 409 Conflict says it may be something the user can fix. I guess it's "fixable" by the user stopping the job, but in any case, it feels like a better status than 403 Forbidden.

ML also makes stopping datafeeds idempotent even if they are already stopped. I think this is done to make UI management easier.

@jen-huang @cjcenizal any opinions on this?

Marking this as team-discuss so I can ask the A&G team in the next meeting.

@cjcenizal
Copy link
Contributor

Idempotency feels intuitive to me wrt start/stop actions. In those situations, I'd expect a 200 response, because the resource is in the desired state.

A 409 code also seems the most reasonable response when trying to execute a state change that's disallowed by the current state -- agreed that 403 is a bit misleading for the reasons you mention @polyfractal.

I guess we're not considering 418 in either of these scenarios? :)

@sebelga
Copy link
Contributor Author

sebelga commented Apr 17, 2019

I also would vote or Idempotency on those actions as, to me, they are neither

  • a 400 Bad request (the request is well formulated)
  • nor a 409 Conflict (there is no state change that needs to occur, so no conflict possible)
  • nor a 403 Forbidden (the user is allowed to execute the request)

And above all, it is a much nicer UX 😊

The real downside is the breaking change it would create.

@polyfractal
Copy link
Contributor

We chatted in the A&G meeting and came to the same conclusion: idempotency and returning 200 OK made the most sense.

We still need to deal with the break this introduces, and the team wasn't sure if this is something we could slide into a minor or not. It's a break, but only if you have special code that does something important the second time an API is called (like as part of a sanity check or test or something). We're leaning towards ok to break that in a minor but wasn't decided. I'll work on a PR and we can discuss what branches it can go into there.

I guess we're not considering 418 in either of these scenarios? :)

Don't tempt me! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants