Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds new SavedObjectsRespository error type for 404 that do not originate from Elasticsearch responses #107301

Conversation

TinaHeiligers
Copy link
Contributor

@TinaHeiligers TinaHeiligers commented Jul 30, 2021

Summary

Resolves #102353.

Replaces #107104.

There may be cases where a 404 can be returned during calls to the Saved Objects Repository for reasons other than the saved object not being found.

The Saved Objects service should distinguish between a Saved Object is not found and other unknown reasons that do not originate from responses to an Elasticsearch call.

This PR adds an explicit 503 NotFoundEsUnavailable Error that will wrap 404 errors from get, update and delete requests when the response does not contain the x-elastic-product: Elasticsearch header.

As a follow up to this initial implementation, we need to:

Checklist

Delete any items that are not applicable to this PR.

Risk Matrix

Risk Probability Severity Mitigation/Notes
False positives— An intermediate proxy doesn't forward x-elastic-product header . Low Low As of 7.15.0 Kibana won't start unless the required headers are present. For 7.14 patches, the type of error returned could be a 404 or a 503 but we will still get an error

For maintainers

Copy link
Contributor Author

@TinaHeiligers TinaHeiligers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments to reviewers.

const notFoundError = this.createGenericNotFoundError(type, id);
return this.decorateEsUnavailableError(
new Error(`${notFoundError.message}`),
`x-elastic-product not present or not recognized`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied self-review comment:

I'm not sure this is the best text string to use as a descriptor for the "missing" header. I took inspiration from the client's team but made it a lot more specific. We need to get Product's input on wording if we choose to follow this implementation.

public static isNotFoundEsUnavailableError(error: Error | DecoratedError) {
return (
isSavedObjectsClientError(error) &&
error[code] === CODE_ES_UNAVAILABLE &&
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied self-review comment:

Specifically set a status code of 503 to indicate an ES availability error. This overrides the 404 we would otherwise have thrown.

throw SavedObjectsErrorHelpers.createGenericNotFoundError(type, id);
} else {
// throw if we can't verify the response is from Elasticsearch
throw SavedObjectsErrorHelpers.createGenericNotFoundEsUnavailableError(type, id);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self-review:
createGenericNotFoundEsUnavailableError will return a 503 error that adds the message from a NotFoundError to the EsUnavailableError. The decorated error that's thrown will assert true to a check for isEsUnavailableError, since the latter returns a boolean for the statusCode being 503.
If a saved object type and id are provided, the error will have the form:

{
	message: 'x-elastic-product not present or not recognized: Saved object [foo/bar] not 	found',
	statusCode: 503
}

Otherwise, the error will have the form:

{
	message: 'x-elastic-product not present or not recognized: Not Found',
	statusCode: 503
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional nit: I had to read createGenericNotFoundEsUnavailableError implementation to understand what error it will create. Maybe we should pass an error reason to decorateEsUnavailableError explicitly?

decorateEsUnavailableError(error, `Saved object [${type}/${id}] not found`);

// throw if we can't verify the response is from Elasticsearch

Will we add such a helper for every possible error response?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joshdover which question did you give the thumbs up for in #107301 (comment)? i.e.

Maybe we should pass an error reason to decorateEsUnavailableError explicitly?

or

Will we add such a helper for every possible error response?

I'm not sure which way we want to go eventually, as long as we provide enough info to easily figure out the root cause of an error when it's thrown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should pass an error reason to decorateEsUnavailableError explicitly?

We would still need to provide an error to pass to decorateEsUnavailableError:

public static decorateEsUnavailableError(error: Error, reason?: string) {
    return decorate(error, CODE_ES_UNAVAILABLE, 503, reason);
  }

which is what we're doing in createGenericNotFoundEsUnavailableError.

Copy link
Contributor Author

@TinaHeiligers TinaHeiligers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code comments.

src/core/server/saved_objects/service/lib/repository.ts Outdated Show resolved Hide resolved
@TinaHeiligers
Copy link
Contributor Author

@elasticmachine merge upstream

Copy link
Contributor

@YulNaumenko YulNaumenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Awesome work! I've tested this locally by simulating situation with ES restart during the Kibana was still running - as a result all scheduled rules continue running after Kibana got back to the healthy state.
This PR should resolve this Alerting issue with no actions from the our side

@joshdover
Copy link
Contributor

@mshustov regarding your comment regarding using this header before our version check uses it:

Question: Might it lead to false positives if an intermediate proxy doesn't forward x-elastic-product header? In v7.14, we do allow Kibana starts even if x-elastic-product header is missing. Kibana won't start unless the header presents starting from v7.15 #105557
Does it mean that we cannot backport this change to v7.14.1?

In addition to @TinaHeiligers's note that this check only applies to 404, it also seems quite possible that we will have #105557 fixed for the 7.14.1 release which makes me less concerned about this. Though maybe we should consider delaying a backport until that change is also backported.

@TinaHeiligers
Copy link
Contributor Author

Though maybe we should consider delaying a backport until that change is also backported

@joshdover do you mean the 7.14.1 backport or both backports to 7.14.1 and 7.x (7.15)?

@mshustov
Copy link
Contributor

mshustov commented Aug 3, 2021

ack: going to review tomorrow (04 Aug.)

src/core/server/elasticsearch/client/mocks.ts Outdated Show resolved Hide resolved
);
}

public static isNotFoundEsUnavailableError(error: Error | DecoratedError) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this API? If we want plugins to treat this error as any other ES not found issue, I think it'd be ok to only include the factory function createGenericNotFoundEsUnavailableError so that plugins don't waste time trying to figure out how they should handle this error in a different way.

Copy link
Contributor Author

@TinaHeiligers TinaHeiligers Aug 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually using the API within the repository to verify error types here. It also ensures we stay consistent with repository error type checks, such as isNotFoundError, with the additional check on the error reason.

isNotFoundEsUnavailableError isn't an error itself, it returns a boolean.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but as a plugin developer, I'd want to handle each error exposed on SavedObjectsErrorHelpers.is* in order to make my code bulletproof. I'd expect that each of these is*Error methods to only correspond to one error type and not overlap, but now that is no longer the case since both isEsUnavailableError and isNotFoundEsUnavailableError could return true on the same error. I'd also expect that each error type would correspond to something different I could do to handle this, however the underlying problem for both of these errors is the same and there's really nothing different I can/should do as a developer for the isNotFoundEsUnavailableError case vs the isEsUnavailable one.

Are you planning to add similar new error guards for each method as part of #107343? Personally, I think we should only expose isEsUnavailable for all of these situations to signal to plugins that they should handle this error in the same way for every method.

Copy link
Contributor Author

@TinaHeiligers TinaHeiligers Aug 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I think we should only expose isEsUnavailable for all of these situations to signal to plugins that they should handle this error in the same way for every method.

That makes sense. I've removed isNotFoundEsUnavailableError.

elasticsearchClientMock.createSuccessTransportRequestPromise(
{ found: false },
undefined,
{ 'x-elastic-product': 'Elasticsearch' }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should include this header as the default in elasticsearchClientMock.createSuccessTransportRequestPromise() (or even better might be in the createApiResponse function that this function calls) and only override it when needed for these tests. That would better represent the expected typical conditions in our mocks rather than the exceptional case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to handle that in the follow up issue (that I'm working on right now anyway), as it's going to become the typical conditions when we add the checks to the rest of the SO repository methods.

src/core/server/saved_objects/service/lib/repository.ts Outdated Show resolved Hide resolved
.then((res) => {
const indexNotFound = res.statusCode === 404;
const esServerSupported = isSupportedEsServer(res.headers);
// check if we have the elasticsearch header when doc.found is not true (false, undefined or null) and if we do, ensure it is Elasticsearchq
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// check if we have the elasticsearch header when doc.found is not true (false, undefined or null) and if we do, ensure it is Elasticsearchq
// check if we have the elasticsearch header when index is not found and if we do, ensure it is Elasticsearch

@joshdover
Copy link
Contributor

do you mean the 7.14.1 backport or both backports to 7.14.1 and 7.x (7.15)?

I just mean delaying the backport to 7.14.1

@@ -141,9 +141,10 @@ export type MockedTransportRequestPromise<T> = TransportRequestPromise<T> & {

const createSuccessTransportRequestPromise = <T>(
body: T,
{ statusCode = 200 }: { statusCode?: number } = {}
{ statusCode = 200 }: { statusCode?: number } = {},
headers?: Record<string, any>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joshdover thanks for pointing the type issue out!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

headers?: Record<string, string | string[]>

see

export type Headers = { [header in KnownHeaders]?: string | string[] | undefined } & {
[header: string]: string | string[] | undefined;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TinaHeiligers
Copy link
Contributor Author

jenkins test this please

@TinaHeiligers
Copy link
Contributor Author

@elasticmachine merge upstream

@TinaHeiligers
Copy link
Contributor Author

Summary of PR status:
Initial implementation of extra check to make sure we get a response from elasticsearch before throwing 404’s. This handles the immediate need for the Alerting team, where they need to make sure we get a response from elasticsearch before deleting an alert that isn’t found.

@dover @mshustov @rudolf I’ve created the following issues to follow up on from this work:

  1. Implement similar checks in all the methods (Add general product check to all responses from the SO repository calls to ES. #107343)
  2. Add integration test as a smoke screen to make sure the behavior is as expected. (Add integration test to ensure the SavedObjectsRepository handles responses correctly #107246)
  3. Improve on the way we interpret 404 responses and change to throwing 500 errors when the index isn’t found. (Saved Objects repository: only throw a 404 error when the document isn't found #107944)

Please let me know if I’ve missed something or if there are remaining issues to tackle before moving forward with the PR.

I’m hoping we can a fix in before 7.15 FF but, depending on priorities, don't know if there's enough time to get through all the follow up issues before FF.

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Unknown metric groups

API count

id before after diff
core 2455 2458 +3

API count missing comments

id before after diff
core 1166 1169 +3

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@TinaHeiligers TinaHeiligers added auto-backport Deprecated - use backport:version if exact versions are needed v7.14.1 and removed v7.14.1 auto-backport Deprecated - use backport:version if exact versions are needed labels Aug 10, 2021
@TinaHeiligers
Copy link
Contributor Author

TinaHeiligers commented Aug 10, 2021

Note to self: backport manually to 7.14.1.
Update: #107663 only landed in 7.16 and we decided not to backport as far back as 7.14.1.
The changes are in for 7.15, which is already a minor ahead of the Kibana-wide startup product check.

@TinaHeiligers TinaHeiligers merged commit cdf90aa into elastic:master Aug 10, 2021
@TinaHeiligers TinaHeiligers deleted the so-errors/identify-not-found-form-esunavailable branch August 10, 2021 13:16
kibanamachine added a commit to kibanamachine/kibana that referenced this pull request Aug 10, 2021
…nate from Elasticsearch responses (elastic#107301)

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
@kibanamachine
Copy link
Contributor

💚 Backport successful

Status Branch Result
7.x

This backport PR will be merged automatically after passing CI.

@TinaHeiligers TinaHeiligers added v7.14.1 and removed auto-backport Deprecated - use backport:version if exact versions are needed labels Aug 10, 2021
kibanamachine added a commit that referenced this pull request Aug 10, 2021
…nate from Elasticsearch responses (#107301) (#108037)

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

Co-authored-by: Christiane (Tina) Heiligers <christiane.heiligers@elastic.co>
@ymao1
Copy link
Contributor

ymao1 commented Sep 23, 2021

@TinaHeiligers Was this backported to 7.14? The tag says 7.14 but I can't find a backport PR. Wondering because we are seeing a saved object not found error for a rule in 7.14.1 in a cluster that seems to have some proxy connectivity issues.

@TinaHeiligers
Copy link
Contributor Author

TinaHeiligers commented Sep 23, 2021

@ymao1 no, it wasn't backported to 7.14.1. To follow up on this, 7.15 was around the corner, and there was still work that needed to be done to wrap it up.
We also needed to wait for #107663 to land, which was only backported to 7.16.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add explicit error type for SavedObjectsRepository for 404 errors that do not originate from Elasticsearch
7 participants