Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filebeat should not setup ML modules if the index pattern does not exist #11349

Closed
sophiec20 opened this issue Mar 20, 2019 · 13 comments · Fixed by #14532
Closed

Filebeat should not setup ML modules if the index pattern does not exist #11349

sophiec20 opened this issue Mar 20, 2019 · 13 comments · Fixed by #14532
Assignees
Labels
bug Filebeat Filebeat good first issue Indicates a good issue for first-time contributors

Comments

@sophiec20
Copy link
Contributor

sophiec20 commented Mar 20, 2019

  • Version: 7.0.0-RC1
  • Operating System: x86_64 x86_64 x86_64 GNU/Linux
  • Steps to Reproduce:
  1. deploy elasticsearch from tar.gz
  2. deploy kibana from tag.gz
  3. ./filebeat setup --machine-learning

On a clean installation, if you run ./filebeat setup --machine-learning first, before any other filebeat setup step, then the ML jobs created are invalid. They have their datafeed config set for indices: INDEX_PATTERN. This string is meant to be substituted for the kibana index pattern filebeat-* which is not possible as it does not yet exist.

If the index pattern exists prior to running ml setup, then the ML jobs are created properly.

The following setup methods work, as the index pattern is created before ML setup.

  1. Two step setup
./filebeat setup --dashboards
./filebeat setup --machine-learning
  1. Module setup (preferred)
./filebeat setup --modules apache
  1. Full setup
./filebeat setup

I suspect this problem is not limited to RC1 and has existed in 6.x time frame.

On a more conceptual note, perhaps we could consider removing the ./filebeat setup --machine-learning command line option for the following reasons:

  1. ML jobs are already created during module setup, which is more targeted for specific named modules.
  2. I think the use case behind running ./filebeat setup --machine-learning would be in order to add ML jobs to an existing deployment that is already collecting data using filebeat. However this can be done already from the wizard inside the ML Kibana app.
  3. ./filebeat setup --machine-learning does not allow you to pick specific modules. As it stands today, filebeat setup will create both nginx and apache ML jobs and it is unlikely both are required.
@kvch
Copy link
Contributor

kvch commented Mar 21, 2019

I agree with removing --machine-learning from the flags.

However, the third reason you mentioned is not true. It is possible to pick a specific module using the -modules flag. So ./filebeat setup --machine-learning -modules nginx sets up nginx ML module only.

@sophiec20
Copy link
Contributor Author

Thanks for the clarification. I can confirm that if ./filebeat setup --machine-learning --modules nginx is run before the index pattern filebeat-* exists, then the jobs created are still invalid due to the datafeed config having indices: INDEX_PATTERN set.

@kvch
Copy link
Contributor

kvch commented Mar 22, 2019

I suggest to add a deprecation warning in 7.x and delete the flag in 8.0. So in the future ML would be set up when the user runs ./filebeat setup only.

@kvch kvch added the good first issue Indicates a good issue for first-time contributors label Apr 11, 2019
@urso
Copy link

urso commented Jul 16, 2019

The main problem is the index pattern + the ml code in kibana defaulting to INDEX_PATTERN_NAME, if the index pattern does not yet exists. Filebeat installs the index pattern only if dashboards are installed.

Filebeat calls a kibana API to install the machine learning jobs. It does not hold the jobs. When running filebeat setup only, but not having dashboards available for install (or disabled alltogether), it also fails, as job setup is part of filebeat setup. The extra flags like --machine-learning, and others are mostly used to selectively overwrite/change resources later.

I wonder if the index pattern name is indeed required to create the data feeds or not.
Based on this the kibana code should:

  • return an error if the index pattern does not exist, but is required
  • or just use the correct index names as told by the Beat, if index pattern is not really required to setup a data feed (not sure if API supports this).

@jgowdyelastic ^ Can you comment on this please?

If the index pattern is a must, then Beats should ensure that the index pattern exists before attempting to install machine learning jobs.

@jgowdyelastic
Copy link
Member

jgowdyelastic commented Jul 17, 2019

The setup endpoint could be changed to allow it to work if an index pattern hasn't been created. This would allow the datafeed to be created correctly in this situation.
However, the reason the index pattern check is there in the first place is because the index pattern id is needed for custom urls and kibana saved objects which are also created by the ml module.
So in this situation the custom urls and dashboards created by the ML module will be broken.

I think the suggestion return an error if the index pattern does not exist, but is required is something we should do on the ML side. If the module requires the index pattern ID because it contains custom URLs or saved objects which uses it, then it should return an error.
And the datafeed should always replace INDEX_PATTERN_NAME even if there is no index pattern ID.

This change will not fix the issue with beats, because all of the Beats ML modules contain custom URLs and saved objects and so a valid index pattern is required.

@urso
Copy link

urso commented Jul 17, 2019

I think the suggestion return an error if the index pattern does not exist, but is required is something we should do on the ML side.

+1
I'm fine to always assume that the index pattern ID is required. Just in case users trying to create dashboards after ML. But in this case we should still return an error.

btw. I can easily workaround the error in ES security by adding read rights to an imaginary (non-existent) index named INDEX_PATTERN_NAME. This should not be possible.

This change will not fix the issue with beats, because all of the Beats ML modules contain custom URLs and saved objects and so a valid index pattern is required.

Which is ok. At least we would have a proper error message explaining the actual problem. On the Beats side we can still consider to separate the index pattern installation from the dashboards and require users to run it before. In the end filebeat setup --<stack-component-name> shall only be used to update the stack with recent changes. The initial setup requires filebeat setup as is.

@sophiec20
Copy link
Contributor Author

I missed how this conversation evolved...
Is there still a plan to remove machine learning setup from the beats command line in 8.x?
If so, is there a deprecation warning for 7.x?

@urso
Copy link

urso commented Nov 12, 2019

@ruflin Maybe you know best about the timeline. Like deprecating features in Beats and when integrations will have the required features.

@ruflin
Copy link
Member

ruflin commented Nov 14, 2019

@sophiec20 @urso I think it is too early to give a final answer. I hope the answer will be yes. If we remove it, we will need to add a deprecation warnings in the 7.x releases.

@sophiec20
Copy link
Contributor Author

sophiec20 commented Nov 14, 2019

The user experience from the command line is suboptimal as the setup step, in general, is happening before data is being ingested, and this is not the right time to create the ML jobs.

There is already a preferred way to create the ML modules from inside the ML UI App and there is a path to an improved future with Integrations. Considering the reward vs effort of improving the user experience of the setup ML command line in 7.x, then I would advocate we deprecate/remove this feature soon. From my point of view, because we already have the ability to create the ML modules in the Stack (i.e. in the ML app), then we are not actually removing functionality and this is not dependent on Integrations.

@ruflin
Copy link
Member

ruflin commented Nov 14, 2019

@sophiec20 +1 on your suggestion.

@urso
Copy link

urso commented Nov 14, 2019

Sounds good.

@kvch Please add a deprecation warning to 7.6.

@sophiec20 Is there some good documentation link we can include in the deprecation warning?

@kvch
Copy link
Contributor

kvch commented Nov 15, 2019

I opened a PR to add a deprecation warning: #14532

kvch added a commit that referenced this issue Nov 22, 2019
The flag `--machine-learning` is going to be removed in 8.0, as ML jobs have to be configured using the ML app in Kibana.

The following warning is emitted when running `./filebeat setup --machine-learning`:
```
$ ./filebeat setup --machine-learning
Setting up ML using setup --machine-learning is going to be removed in 8.0. Please use the ML app instead.
See more: https://www.elastic.co/guide/en/elastic-stack-overview/current/xpack-ml.html
Loaded machine learning job configurations
```

When the flag `-e` is passed a warning shows up in the logs:
```
2019-11-15T10:54:12.420+0100    WARN    [cfgwarn]       instance/beat.go:558    DEPRECATED: Setting up ML using Filebeat is going to be removed. Please use the
 ML app to setup jobs. Will be removed in version: 8.0
```

Has to be backported to 7.x.

Closes #11349
kvch added a commit to kvch/beats that referenced this issue Nov 22, 2019
The flag `--machine-learning` is going to be removed in 8.0, as ML jobs have to be configured using the ML app in Kibana.

The following warning is emitted when running `./filebeat setup --machine-learning`:
```
$ ./filebeat setup --machine-learning
Setting up ML using setup --machine-learning is going to be removed in 8.0. Please use the ML app instead.
See more: https://www.elastic.co/guide/en/elastic-stack-overview/current/xpack-ml.html
Loaded machine learning job configurations
```

When the flag `-e` is passed a warning shows up in the logs:
```
2019-11-15T10:54:12.420+0100    WARN    [cfgwarn]       instance/beat.go:558    DEPRECATED: Setting up ML using Filebeat is going to be removed. Please use the
 ML app to setup jobs. Will be removed in version: 8.0
```

Has to be backported to 7.x.

Closes elastic#11349
(cherry picked from commit 61718de)
kvch added a commit that referenced this issue Nov 22, 2019
The flag `--machine-learning` is going to be removed in 8.0, as ML jobs have to be configured using the ML app in Kibana.

The following warning is emitted when running `./filebeat setup --machine-learning`:
```
$ ./filebeat setup --machine-learning
Setting up ML using setup --machine-learning is going to be removed in 8.0. Please use the ML app instead.
See more: https://www.elastic.co/guide/en/elastic-stack-overview/current/xpack-ml.html
Loaded machine learning job configurations
```

When the flag `-e` is passed a warning shows up in the logs:
```
2019-11-15T10:54:12.420+0100    WARN    [cfgwarn]       instance/beat.go:558    DEPRECATED: Setting up ML using Filebeat is going to be removed. Please use the
 ML app to setup jobs. Will be removed in version: 8.0
```

Has to be backported to 7.x.

Closes #11349
(cherry picked from commit 61718de)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Filebeat Filebeat good first issue Indicates a good issue for first-time contributors
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants