Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error when ES Ingest node plugins are not loaded #3676

Merged
merged 3 commits into from Mar 1, 2017

Conversation

Projects
None yet
4 participants
@tsg
Copy link
Collaborator

commented Feb 27, 2017

We're parsing the Elasticsearch JSON error and try to produce an
error message that is as helpful as possible. The following cases
are detected:

  • A plugin providing a processor is missing. In case the plugin is one of
    ingest-geoip or ingest-user-agent, we can also suggest the command that
    installs them.
  • Elasticsearch < 5.0. We now detect this and tell the user that ES 5.0 is
    required by FBM.

A drawback of this approach is that if both the GeoIP and User-Agent plugins
are missing, only one will be reported. This might get solved by including the
user-agent one in ES, or by improving the error we get from ES, or by us querying
the node stats API

Note: this contains a change in the ES client, which makes it return the body
in case of errors. I think we need that part anyway, otherwise we often show
errors like 400 Bad request without any other details. I tried to do a minimal
change there, I hope I didn't introduce any changes in behaviour.

@tsg tsg added the in progress label Feb 27, 2017

Improve error when ES ingest node plugins are not loaded
We're parsing the Elasticsearch JSON error and try to produce an
error message that is as helpful as possible. The following cases
are detected:

* A plugin providing a processor is missing. In case the plugin is one of
  `ingest-geoip` or `ingest-user-agent`, we can also suggest the command that
  installs them.
* Elasticsearch < 5.0. We now detect this and tell the user that ES 5.0 is
  required by FBM.

A drawback of this approach is that if both the GeoIP and User-Agent plugins
are missing, only one will be reported. This might get solved by including the
user-agent one in ES, or by improving the error we get from ES, or by us querying
the node stats API

Note: this contains a change in the ES client, which makes it return the body
in case of errors. I think we need that part anyway, otherwise we often show
errors like `400 Bad request` without any other details. I tried to do a minimal
change there, I hope I didn't introduce any changes in behaviour.

@tsg tsg force-pushed the tsg:filebeat_modules_improve_error branch from 5085438 to 595dd48 Feb 27, 2017

@ruflin

ruflin approved these changes Feb 27, 2017

@@ -245,7 +247,7 @@ func (reg *ModuleRegistry) GetProspectorConfigs() ([]*common.Config, error) {
// PipelineLoader is a subset of the Elasticsearch client API capable of loading
// the pipelines.
type PipelineLoader interface {
LoadJSON(path string, json map[string]interface{}) error

This comment has been minimized.

Copy link
@ruflin

ruflin Feb 27, 2017

Collaborator

I would use error as the second return argument

This comment has been minimized.

Copy link
@tsg

tsg Feb 27, 2017

Author Collaborator

I updated this to have the error as the last returned arg.

} `json:"error"`
}
err := json.Unmarshal(body, &response)
if err != nil {

This comment has been minimized.

Copy link
@ruflin

ruflin Feb 27, 2017

Collaborator

client.Connection.version could be used to check the ES version. Unfortunately the variable is currently not public.

This comment has been minimized.

Copy link
@tsg

tsg Feb 27, 2017

Author Collaborator

Hmm, yeah, that would be an option. But we also don't have the client here and adding it would complicate unit testing, and we still need to do the error checking anyway, so I think it doesn't win us that much to add version checks.


// missing plugins?
if len(response.Error.RootCause) > 0 &&
response.Error.RootCause[0].Type == "parse_exception" &&

This comment has been minimized.

Copy link
@ruflin

ruflin Feb 27, 2017

Collaborator

ES should (in the future) expose here a special root cause type so no check of the text is needed.

This comment has been minimized.

Copy link
@tsg

tsg Feb 27, 2017

Author Collaborator

Yeah, I think mvg is working on improving that.

@ruflin

ruflin approved these changes Feb 28, 2017

@ruflin ruflin requested a review from urso Feb 28, 2017

if status >= 300 {
return status, nil, fmt.Errorf("%v", resp.Status)
retErr = fmt.Errorf("%v", resp.Status)
}

obj, err := ioutil.ReadAll(resp.Body)
if err != nil {
return status, nil, err

This comment has been minimized.

Copy link
@urso

urso Feb 28, 2017

Collaborator

we want to return retErr or err here?

This comment has been minimized.

Copy link
@tsg

tsg Feb 28, 2017

Author Collaborator

Hmm, debatable I guess. I'm changing it to retErr because I think that's closer to the previous behavior. Thanks.

var response1x struct {
Error string `json:"error"`
}
json.Unmarshal(body, &response1x)

This comment has been minimized.

Copy link
@urso

urso Feb 28, 2017

Collaborator

json.Unmarshal might also fails, cause body is not json at all. the client was changed to always return the raw body. e.g. if error is in nginx proxying to ES, content might be a plain message.

This comment has been minimized.

Copy link
@tsg

tsg Feb 28, 2017

Author Collaborator

I know, but in that case we fall back to the "Could not load pipeline. Additionally, error decoding body:" message, which I think is what we want. Looking into making the code clearer..

plugins := map[string]string{
"geoip": "ingest-geoip",
"user_agent": "ingest-user-agent",
}

This comment has been minimized.

Copy link
@urso

urso Feb 28, 2017

Collaborator

generalize map with:

func ingestName(name string) name {
  return fmt.Sprintf("ingest-%v", strings.Replace(name, "_", "-"))
}

This comment has been minimized.

Copy link
@tsg

tsg Feb 28, 2017

Author Collaborator

Hmm, i don't know if that is future proof. There's no guarantees that other plugins will follow this pattern.

// missing plugins?
if len(response.Error.RootCause) > 0 &&
response.Error.RootCause[0].Type == "parse_exception" &&
strings.HasPrefix(response.Error.RootCause[0].Reason, "No processor type exists with name") &&

This comment has been minimized.

Copy link
@urso

urso Feb 28, 2017

Collaborator

instead of checking error message prefix, is there an error type report we can check?

This comment has been minimized.

Copy link
@tsg

tsg Feb 28, 2017

Author Collaborator

The error type is parse_exception, so that's too generic.

@martijnvg are you adding a new error type as part of your PRs?

This comment has been minimized.

Copy link
@martijnvg

martijnvg Feb 28, 2017

Member

The PR doesn't change that. It failed during parsing hence the error type is parse_exception.

I'm open in changing this, but not sure what other existing error type to use. ES is very defensive in introducing new error types. The only existing general error type that comes up to me is resource_not_found. It is generic too, but maybe in this context better? (indicating that a processor type doesn't exist).

This comment has been minimized.

Copy link
@tsg

tsg Feb 28, 2017

Author Collaborator

I think I'd prefer not changing it in that case, because we'd still have to leave this branch in the code for ES < 5.4, so it's probably not worth it.

return fmt.Errorf("The Ingest Node functionality seems to be missing from Elasticsearch. "+
"The Filebeat modules require Elasticsearch >= 5.0. "+
"This is the response I got from Elasticsearch: %s", body)
}

This comment has been minimized.

Copy link
@urso

urso Feb 28, 2017

Collaborator

which error message will be returned in ingest node is disabled?

This comment has been minimized.

Copy link
@tsg

tsg Feb 28, 2017

Author Collaborator

Just tried it out. There's no real way of disabling the ingest functionality, but what one can do is set node.ingest: false on all the ES nodes. In that case, the pipeline loading works as usual but the _bulk insert fails with:

path: /_bulk, params: {}
java.lang.IllegalStateException: There are no ingest nodes in this cluster, unable to forward request to an ingest node.

I'd say improving the error handling in that case is beyond the scope of this PR.

This comment has been minimized.

Copy link
@urso

urso Feb 28, 2017

Collaborator

ouch. Maybe we can probe ingest node availability via simulate API?

This comment has been minimized.

Copy link
@tsg

tsg Feb 28, 2017

Author Collaborator

Yeah, maybe, some other time :)

@urso urso merged commit 96bb79b into elastic:master Mar 1, 2017

4 checks passed

CLA Commit author is a member of Elasticsearch
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
default Build finished.
Details

tsg added a commit to tsg/beats that referenced this pull request Mar 1, 2017

Improve error when ES Ingest node plugins are not loaded (elastic#3676)
* Improve error when ES ingest node plugins are not loaded

We're parsing the Elasticsearch JSON error and try to produce an
error message that is as helpful as possible. The following cases
are detected:

* A plugin providing a processor is missing. In case the plugin is one of
  `ingest-geoip` or `ingest-user-agent`, we can also suggest the command that
  installs them.
* Elasticsearch < 5.0. We now detect this and tell the user that ES 5.0 is
  required by FBM.

A drawback of this approach is that if both the GeoIP and User-Agent plugins
are missing, only one will be reported. This might get solved by including the
user-agent one in ES, or by improving the error we get from ES, or by us querying
the node stats API

Note: this contains a change in the ES client, which makes it return the body
in case of errors. I think we need that part anyway, otherwise we often show
errors like `400 Bad request` without any other details. I tried to do a minimal
change there, I hope I didn't introduce any changes in behaviour.

* Move error after []byte in the returned values

* addressed comments and added more tests

(cherry picked from commit 96bb79b)

@tsg tsg removed the needs_backport label Mar 1, 2017

monicasarbu added a commit that referenced this pull request Mar 1, 2017

Improve error when ES Ingest node plugins are not loaded (#3676) (#3703)
* Improve error when ES ingest node plugins are not loaded

We're parsing the Elasticsearch JSON error and try to produce an
error message that is as helpful as possible. The following cases
are detected:

* A plugin providing a processor is missing. In case the plugin is one of
  `ingest-geoip` or `ingest-user-agent`, we can also suggest the command that
  installs them.
* Elasticsearch < 5.0. We now detect this and tell the user that ES 5.0 is
  required by FBM.

A drawback of this approach is that if both the GeoIP and User-Agent plugins
are missing, only one will be reported. This might get solved by including the
user-agent one in ES, or by improving the error we get from ES, or by us querying
the node stats API

Note: this contains a change in the ES client, which makes it return the body
in case of errors. I think we need that part anyway, otherwise we often show
errors like `400 Bad request` without any other details. I tried to do a minimal
change there, I hope I didn't introduce any changes in behaviour.

* Move error after []byte in the returned values

* addressed comments and added more tests

(cherry picked from commit 96bb79b)

tsg added a commit to tsg/beats that referenced this pull request Mar 1, 2017

Show all missing plugins in the same err message
A drawback of the error handling in elastic#3676 was that if more than one
plugin is missing in ES, the error only reported the first one. This means
that the user might go through an annoying trial and error cycle.

To solve this, we make the modules declare their processor requirements
in the `manifest.yml` file and we compare them with the available processors
from calling `/_nodes/ingest`.

In case the module author doesn't declare the required processor plugins, the
error handling in elastic#3676 still applies.

ruflin added a commit that referenced this pull request Mar 2, 2017

Show all missing plugins in the same err message (#3706)
A drawback of the error handling in #3676 was that if more than one
plugin is missing in ES, the error only reported the first one. This means
that the user might go through an annoying trial and error cycle.

To solve this, we make the modules declare their processor requirements
in the `manifest.yml` file and we compare them with the available processors
from calling `/_nodes/ingest`.

In case the module author doesn't declare the required processor plugins, the
error handling in #3676 still applies.

tsg added a commit to tsg/beats that referenced this pull request Mar 2, 2017

Show all missing plugins in the same err message (elastic#3706)
A drawback of the error handling in elastic#3676 was that if more than one
plugin is missing in ES, the error only reported the first one. This means
that the user might go through an annoying trial and error cycle.

To solve this, we make the modules declare their processor requirements
in the `manifest.yml` file and we compare them with the available processors
from calling `/_nodes/ingest`.

In case the module author doesn't declare the required processor plugins, the
error handling in elastic#3676 still applies.
(cherry picked from commit ddba3f1)

ruflin added a commit that referenced this pull request Mar 3, 2017

Show all missing plugins in the same err message (#3706) (#3711)
A drawback of the error handling in #3676 was that if more than one
plugin is missing in ES, the error only reported the first one. This means
that the user might go through an annoying trial and error cycle.

To solve this, we make the modules declare their processor requirements
in the `manifest.yml` file and we compare them with the available processors
from calling `/_nodes/ingest`.

In case the module author doesn't declare the required processor plugins, the
error handling in #3676 still applies.
(cherry picked from commit ddba3f1)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.