@DonalEvans
Contributor

@DonalEvans DonalEvans commented Sep 4, 2025

Rather than always throwing an ElasticsearchStatusException with 500 status in ActionUtils.wrapFailuresInElasticsearchException(), determine the appropriate status from the unwrapped exception

  • Remove createInternalServerError() method from ActionUtils
  • Refactor AlibabaCloudSearch*Action classes to be consistent with SenderExecutableAction
  • Update tests to account for new behaviour

Instead of returning a 500 (internal server error) response when the
RequestExecutorService queue is full and a new request is submitted,
return a 429 (too many requests) response.

- Wrap the existing EsRejectedExecutionException in an
  ElasticsearchStatusException before throwing
- Update existing tests for new behaviour
@DonalEvans DonalEvans added >enhancement :ml Machine learning Team:ML Meta label for the ML team v9.2.0 labels Sep 4, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@davidkyle
Member

Elasticsearch has code in the REST layer that catches any exception and generates a suitable status code for it:

this(channel, ExceptionsHelper.status(e), e);

Drilling into ExceptionsHelper.status(), there is a check for EsRejectedExecutionException, which should return a 429:

} else if (t instanceof EsRejectedExecutionException) {

Any EsRejectedExecutionException should be converted to a 429 status code; the question is why that has not happened.

I hardcoded RequestExecutorService to throw an EsRejectedExecutionException, and on POST _inference/my_endpoint I got the rejected error but with a status code of 500, i.e. it had not been converted to a 429. The Elasticsearch REST API has an error_trace parameter that includes the exception's stack trace in the response (POST _inference/my_endpoint?error_trace). Using that option I got this stack trace:

       "stack_trace": """org.elasticsearch.ElasticsearchStatusException: Failed to send X embeddings request. Cause: root cause rejected
	at org.elasticsearch.inference@9.2.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.action.ActionUtils.createInternalServerError(ActionUtils.java:41)
	at org.elasticsearch.inference@9.2.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.action.ActionUtils.lambda$wrapFailuresInElasticsearchException$0(ActionUtils.java:31)
	at org.elasticsearch.server@9.2.0-SNAPSHOT/org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener.acceptException(ActionListenerImplementations.java:202)
	at org.elasticsearch.server@9.2.0-SNAPSHOT/org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:78)
	at org.elasticsearch.server@9.2.0-SNAPSHOT/org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener.onFailure(ActionListenerImplementations.java:207)
	at org.elasticsearch.server@9.2.0-SNAPSHOT/org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:78)
	at org.elasticsearch.server@9.2.0-SNAPSHOT/org.elasticsearch.action.ActionListenerImplementations.safeOnFailure(ActionListenerImplementations.java:89)
	at org.elasticsearch.server@9.2.0-SNAPSHOT/org.elasticsearch.action.DelegatingActionListener.onFailure(DelegatingActionListener.java:32)
	at org.elasticsearch.server@9.2.0-SNAPSHOT/org.elasticsearch.action.support.ContextPreservingActionListener.onFailure(ContextPreservingActionListener.java:40)
	at org.elasticsearch.inference@9.2.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.TimedListener.lambda$getListener$1(TimedListener.java:49)
	at org.elasticsearch.server@9.2.0-SNAPSHOT/org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:78)
	at org.elasticsearch.server@9.2.0-SNAPSHOT/org.elasticsearch.action.ActionListener$2.onFailure(ActionListener.java:266)
	at org.elasticsearch.server@9.2.0-SNAPSHOT/org.elasticsearch.action.support.ListenerTimeouts$TimeoutableListener.onFailure(ListenerTimeouts.java:96)
	at org.elasticsearch.inference@9.2.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.RequestTask.onRejection(RequestTask.java:59)
	at org.elasticsearch.inference@9.2.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.RequestExecutorService$RateLimitingEndpointHandler.enqueue(RequestExecutorService.java:495)
	at org.elasticsearch.inference@9.2.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.RequestExecutorService.execute(RequestExecutorService.java:342)
....

And that takes us to

if (unwrappedException instanceof ElasticsearchException esException) {

EsRejectedExecutionException is not an ElasticsearchException, hence the 500 error code.
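To make the failure mode concrete, here is a minimal, self-contained sketch of the behaviour described above. It is a simplified model, not the real code: StatusStandIn, RejectedStandIn, restLayerStatus() and wrapOld() are hypothetical stand-ins for ElasticsearchStatusException, EsRejectedExecutionException, ExceptionsHelper.status() and the old wrapFailuresInElasticsearchException() logic, and the real status lookup handles many more exception types.

```java
// Simplified model of the behaviour discussed in this thread.
// All names below are hypothetical stand-ins, NOT the real Elasticsearch classes.
final class OldWrappingSketch {

    /** Stand-in for an ElasticsearchException that carries its own HTTP status. */
    static class StatusStandIn extends RuntimeException {
        final int status;

        StatusStandIn(String message, int status, Throwable cause) {
            super(message, cause);
            this.status = status;
        }
    }

    /** Stand-in for EsRejectedExecutionException; note that it does not extend StatusStandIn. */
    static class RejectedStandIn extends RuntimeException {
        RejectedStandIn(String message) {
            super(message);
        }
    }

    /** Stand-in for the REST layer's status lookup. */
    static int restLayerStatus(Throwable t) {
        if (t instanceof StatusStandIn es) {
            return es.status; // an exception with an explicit status wins
        } else if (t instanceof RejectedStandIn) {
            return 429; // too many requests
        }
        return 500; // internal server error
    }

    /** Old wrapping logic: anything without its own status is wrapped with a hardcoded 500. */
    static RuntimeException wrapOld(String errorMessage, Throwable failure) {
        if (failure instanceof StatusStandIn es) {
            return es;
        }
        return new StatusStandIn(errorMessage + ". Cause: " + failure.getMessage(), 500, failure);
    }

    public static void main(String[] args) {
        Throwable rejection = new RejectedStandIn("root cause rejected");
        // The raw rejection would have been mapped correctly by the REST layer.
        System.out.println(restLayerStatus(rejection)); // 429
        // After wrapping, the REST layer only sees the hardcoded status.
        System.out.println(restLayerStatus(wrapOld("Failed to send request", rejection))); // 500
    }
}
```

In this model the REST layer never gets the chance to apply its 429 mapping, because the wrapper has already attached an explicit 500 by the time the exception reaches it.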

RequestExecutorService is doing the right thing by returning an EsRejectedExecutionException, but this translation logic in ActionUtils is perhaps unnecessary. Please can you check if anything is relying on that code, as it would be simplest to remove it and rely on the REST layer to generate the status code.

@DonalEvans
Contributor Author

RequestExecutorService is doing the right thing by returning an EsRejectedExecutionException, but this translation logic in ActionUtils is perhaps unnecessary. Please can you check if anything is relying on that code, as it would be simplest to remove it and rely on the REST layer to generate the status code.

It looks like we have quite a few unit tests that are explicitly testing the behaviour of the wrapFailuresInElasticsearchException() method and which fail if I change SenderExecutableAction.execute() to not use it, but none of the inference integration tests fail with the same change, so perhaps it's fine? I don't know how good our coverage is in integration tests, so it's hard to know how confident I should be in the change. I'll try fixing all the unit tests to account for the change in behaviour and push a commit to see what happens with the other tests.

Rather than always throwing an ElasticsearchStatusException with 500
status in ActionUtils.wrapFailuresInElasticsearchException(), determine
the appropriate status from the unwrapped exception

- Remove createInternalServerError() method from ActionUtils
- Refactor AlibabaCloudSearch*Action classes to be consistent with
  SenderExecutableAction
- Update tests to account for new behaviour
@elasticsearchmachine
Collaborator

Hi @DonalEvans, I've created a changelog YAML for you.

@DonalEvans DonalEvans requested a review from davidkyle September 5, 2025 21:20
Member

@davidkyle davidkyle left a comment

LGTM

I left a suggestion for some code cleanup with the Alibaba action classes. Please can you do that either as part of this PR or in another PR.

// Determine the appropriate RestStatus from the unwrapped exception, then wrap in an ElasticsearchStatusException
new ElasticsearchStatusException(
Strings.format("%s. Cause: %s", errorMessage, unwrappedException.getMessage()),
ExceptionsHelper.status(unwrappedException),
Member

👍 that's a neat solution
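As a rough illustration of the approach in the snippet above, the following self-contained sketch derives the status from the unwrapped failure instead of hardcoding 500. It is a simplified model under stated assumptions: StatusStandIn, RejectedStandIn, statusOf() and wrapNew() are hypothetical stand-ins for ElasticsearchStatusException, EsRejectedExecutionException, ExceptionsHelper.status() and the updated wrapping logic, and only the message format is taken from the snippet.

```java
// Simplified model of the updated wrapping logic.
// All names below are hypothetical stand-ins, NOT the real Elasticsearch classes.
final class NewWrappingSketch {

    /** Stand-in for ElasticsearchStatusException: an exception with an explicit HTTP status. */
    static class StatusStandIn extends RuntimeException {
        final int status;

        StatusStandIn(String message, int status, Throwable cause) {
            super(message, cause);
            this.status = status;
        }
    }

    /** Stand-in for EsRejectedExecutionException. */
    static class RejectedStandIn extends RuntimeException {
        RejectedStandIn(String message) {
            super(message);
        }
    }

    /** Stand-in for the shared status lookup; the real one covers more exception types. */
    static int statusOf(Throwable t) {
        if (t instanceof StatusStandIn es) {
            return es.status;
        } else if (t instanceof RejectedStandIn) {
            return 429; // too many requests
        }
        return 500; // internal server error
    }

    /** New wrapping logic: keep the descriptive message, but take the status from the cause. */
    static RuntimeException wrapNew(String errorMessage, Throwable failure) {
        if (failure instanceof StatusStandIn es) {
            return es; // already carries a status, pass it through unchanged
        }
        return new StatusStandIn(
            String.format("%s. Cause: %s", errorMessage, failure.getMessage()),
            statusOf(failure),
            failure
        );
    }

    public static void main(String[] args) {
        RuntimeException wrapped = wrapNew("Failed to send request", new RejectedStandIn("root cause rejected"));
        System.out.println(statusOf(wrapped)); // 429
        System.out.println(wrapped.getMessage()); // Failed to send request. Cause: root cause rejected
    }
}
```

The wrapper still adds the contextual error message, but the status now comes from the same lookup the REST layer would have used on the raw exception, so a rejection surfaces as 429 while unknown failures still fall back to 500.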

return;
}

ActionListener<InferenceServiceResults> wrappedListener = wrapFailuresInElasticsearchException(
Member

These Alibaba classes are the outlier; everything else is done by SenderExecutableAction. Please can you convert AlibabaCloudSearchActionCreator to produce SenderExecutableAction and remove these classes. You will probably have to keep AlibabaCloudSearchCompletionAction, as that has extra logic in it to check the inputs.

Contributor Author

I'll fix this in another PR, since it's pretty far out of scope for this one.

@DonalEvans DonalEvans merged commit 92b15a3 into elastic:main Sep 8, 2025
33 checks passed
@DonalEvans DonalEvans deleted the return-429-when-queue-full branch September 8, 2025 16:02
rjernst pushed a commit to rjernst/elasticsearch that referenced this pull request Sep 9, 2025
…c#134178)


Labels

>bug :ml Machine learning Team:ML Meta label for the ML team v9.2.0
