
[Fleet] Allow to preconfigure alternative ES outputs (on the same cluster) #111002

Merged
merged 25 commits into elastic:master from feature-outputs-preconfigure on Sep 21, 2021

Conversation

nchaulet
Member

@nchaulet nchaulet commented Sep 2, 2021

Summary

Partially resolves #108968

Allows preconfiguring alternative Elasticsearch outputs in the same cluster and using them in the preconfiguration API (i.e. specifying policies and outputs in the Kibana config file).

This PR allows defining multiple outputs in the Kibana config file and using them in preconfigured policies as the data output or the monitoring output.

Due to a limitation in Fleet Server, we only support one output per agent policy; this is enforced in the config schema (a minimal sketch follows below).
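
A minimal sketch of how such a constraint could be expressed with @kbn/config-schema (hypothetical shape; the real Fleet schema has more fields and may enforce this differently):

// Hypothetical sketch, not the actual Fleet schema: reject preconfigured
// policies whose data and monitoring outputs point to different outputs.
import { schema } from '@kbn/config-schema';

const agentPolicySchema = schema.object(
  {
    name: schema.string(),
    id: schema.string(),
    data_output_id: schema.maybe(schema.string()),
    monitoring_output_id: schema.maybe(schema.string()),
  },
  {
    validate: (policy) => {
      if (
        policy.data_output_id &&
        policy.monitoring_output_id &&
        policy.data_output_id !== policy.monitoring_output_id
      ) {
        // Returning a string marks the value as invalid in @kbn/config-schema.
        return 'Fleet Server only supports a single output per agent policy; data_output_id and monitoring_output_id must reference the same output.';
      }
    },
  }
);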

We will not support remote clusters in 7.16.0, so this PR removes the ability to specify fleet_server.service_token.

Preconfigured outputs can only be added/updated/removed through the Kibana config file.

Implementation details

  • I moved the generation of the full policy (aka the aggregated policy for the agent) into its own module, as the agent policy code was getting really large.

Config examples

xpack.fleet.outputs:
  - name: 'Test output'
    type: 'elasticsearch'
    id: 'output-123'
    hosts: ['http://test.fr']
  - name: 'Test output 2'
    type: 'elasticsearch'
    id: 'output-test'
    hosts: ['http://test.fr']
xpack.fleet.agentPolicies:
  - name: Alternative Default Agent Policy
    id: policy124
    namespace: test
    is_default: true
    is_managed: true
    data_output_id: output-123
    monitoring_output_id: output-123
    package_policies:
      - package:
          name: system
        name: System Integration
  - name: Alternative Default Fleet server Agent Policy
    id: policy123
    namespace: test
    is_default_fleet_server: true
    package_policies:
      - package:
          name: system
        name: System Integration
  - name: Test agent policy
    is_managed: true
    id: test-policy
    is_default_fleet_server: true
    monitoring_enabled: []
    package_policies:
      - package:
          name: fleet_server
        name: Fleet Server
        inputs:
          - type: fleet-server
            keep_enabled: true

How to test

You can build your own Fleet Server based on elastic/fleet-server#713, or just check the generated policies in the .fleet-policies index.
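
For example, a quick way to dump those documents (a sketch that assumes a local Elasticsearch at http://localhost:9200, the default elastic/changeme credentials, and the 7.x JavaScript client):

// Sketch only: print the agent policies that Fleet wrote to .fleet-policies.
import { Client } from '@elastic/elasticsearch';

const es = new Client({
  node: 'http://localhost:9200',
  auth: { username: 'elastic', password: 'changeme' },
});

async function printFleetPolicies() {
  // With the 7.x client, the response payload lives under `body`.
  const { body } = await es.search({ index: '.fleet-policies', size: 20 });
  for (const hit of body.hits.hits) {
    console.log(JSON.stringify(hit._source, null, 2));
  }
}

printFleetPolicies().catch(console.error);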

I am planning to add an e2e test after this PR and the Fleet Server PR are merged.

@nchaulet nchaulet self-assigned this Sep 2, 2021
@nchaulet nchaulet added the Team:Fleet and v7.16.0 labels Sep 2, 2021
@nchaulet nchaulet added the v8.0.0 and auto-backport labels Sep 8, 2021
@nchaulet
Member Author

nchaulet commented Sep 8, 2021

@joshdover I tried to use encryptedSavedObjects here to encrypt fleet_server.service_token, but I am running into an issue: the encryptedSavedObjects service does not allow predefined IDs that are not UUIDs.
I see a few potential solutions here, but I would love your point of view:

  • Do we want to enforce UUIDs for output IDs? That does not seem like the best user experience.
  • Do we want to have a different field for the config ID than the saved object ID?
  • Is there any reason for the encryptedSavedObjects service to enforce that, or could we change that behavior?

@nchaulet nchaulet added the release_note:skip label Sep 9, 2021
@nchaulet nchaulet marked this pull request as ready for review September 9, 2021 19:52
@nchaulet nchaulet requested a review from a team as a code owner September 9, 2021 19:52
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@joshdover
Member

> @joshdover I tried to use encryptedSavedObjects here to encrypt fleet_server.service_token, but I am running into an issue: the encryptedSavedObjects service does not allow predefined IDs that are not UUIDs.
> I see a few potential solutions here, but I would love your point of view:
>
> • Do we want to enforce UUIDs for output IDs? That does not seem like the best user experience.
> • Do we want to have a different field for the config ID than the saved object ID?

I think between these two options, I prefer the second one as it allows a better user experience while not changing the mechanics of how ESOs work. One concern with this approach, though, is the possibility of having two outputs with the same output_id field, since we can't enforce uniqueness 100% with Elasticsearch on fields other than _id. However, I think there is one way we can work around this: by generating a UUIDv5, which allows us to 'seed' the ID generator so the UUID that comes out is deterministic and will always be the same for a given output_id. Something like this should work:

import uuidv5 from 'uuid/v5';

const esoId = uuidv5(output.id, uuidv5.DNS); // the second arg is the namespace and is arbitrary here

If for some reason that doesn't work, we'll probably need to do the following:

  • A preflight check before creating the object to ensure that no other objects have the same output_id (see the sketch after this list).
  • When fetching the output objects, if more than one object is retrieved for the same output_id we should:
    • If the objects are identical, delete one of them and update any policies that are pointing to the duplicate that is to be deleted.
    • If the objects are different, log & display an error that includes enough information for an admin to delete one of these objects as needed. In the future we can probably create an API and UI button for this action.
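
A rough sketch of what that preflight check could look like, assuming a dedicated output_id attribute on the ingest-outputs saved object type (hypothetical; the real implementation may differ):

// Hypothetical sketch of the preflight uniqueness check.
import type { SavedObjectsClientContract } from 'src/core/server';

async function ensureOutputIdIsUnique(
  soClient: SavedObjectsClientContract,
  outputId: string
): Promise<void> {
  const existing = await soClient.find({
    type: 'ingest-outputs',
    searchFields: ['output_id'],
    search: `"${outputId}"`,
    perPage: 1,
  });

  if (existing.total > 0) {
    throw new Error(`An output with output_id "${outputId}" already exists`);
  }
}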
> Is there any reason for the encryptedSavedObjects service to enforce that, or could we change that behavior?

Comment from the codebase:

// Saved objects with encrypted attributes should have IDs that are hard to guess especially
// since IDs are part of the AAD used during encryption, that's why we control them within this
// wrapper and don't allow consumers to specify their own IDs directly unless overwriting the original document.

So it seems that using a randomized ID makes it harder for an attacker to find an encrypted object (not sure I follow that logic exactly) and the ID itself is used as part of the encryption process.

@nchaulet
Member Author

@joshdover I really like the solution with uuid/v5; it worked without too much code change 👍

@nchaulet
Member Author

@elasticmachine merge upstream

@nchaulet
Member Author

@joshdover We are probably not going to support remote clusters for 7.16; what do you think would be the best approach here?

  • remove all code related to remote clusters and multiple outputs per policy
  • merge the support behind a feature flag

@nchaulet
Member Author

@joshdover Thanks for the testing. I found the issue you encountered: I was missing a case to handle the default output correctly. I fixed it in the output service getDefaultOutputId function and added a test case for that.
Also, regarding the unhandled promise rejection, I found the issue in the existing preconfigured policies code: a missing await. I added it to fix the problem here, but I am wondering if we should fix it in previous versions too.
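
For context, a simplified illustration of that class of bug, using made-up names rather than the actual Fleet code: a rejected promise that is not awaited inside a try/catch escapes the handler and surfaces as an unhandled rejection.

// Illustration only, with hypothetical names.
async function createPreconfiguredPolicy(id: string): Promise<void> {
  throw new Error(`failed to create preconfigured policy ${id}`);
}

async function setupPreconfiguredPolicies(ids: string[]): Promise<void> {
  for (const id of ids) {
    try {
      // Bug: without `await`, the rejection is never caught by this try/catch
      // and ends up as an unhandled promise rejection.
      // createPreconfiguredPolicy(id);

      // Fix: awaiting keeps the failure inside the handler.
      await createPreconfiguredPolicy(id);
    } catch (error) {
      console.error(`skipping policy ${id}:`, error);
    }
  }
}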

@joshdover
Member

Should updating an output's hosts in the kibana.yml file also apply that change to any policies that reference it? I see that it's updating the ingest-outputs saved object with the new host, however the policies in .fleet-policies remain unchanged. Here are the steps I took to produce this:

  1. Start Kibana with config provided above
  2. Call /api/fleet/setup
  3. Confirm policies in .fleet-policies reference correct output IDs and hosts
  4. Change the hosts field for output-123 to hosts: ['http://test.com:9100'] in kibana.yml
  5. Restart Kibana
  6. Call /api/fleet/setup again
  7. See that policies in .fleet-policies reference the previous host

> Also, regarding the unhandled promise rejection, I found the issue in the existing preconfigured policies code: a missing await. I added it to fix the problem here, but I am wondering if we should fix it in previous versions too.

Agreed, makes sense to put up a targeted fix for 7.15.1.

@nchaulet
Member Author

@joshdover Yes, I feel a bit stupid here; I completely missed the update-policies part. The latest commit I pushed addresses that.
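
Conceptually, the fix boils down to something like the hypothetical helper below (not the actual Fleet implementation): when a preconfigured output changes, every agent policy that references it as its data or monitoring output gets its revision bumped so the full policy in .fleet-policies is regenerated with the new hosts.

// Hypothetical illustration of which policies need bumping for a changed output.
interface AgentPolicy {
  id: string;
  revision: number;
  data_output_id?: string;
  monitoring_output_id?: string;
}

function policiesToBumpForOutput(policies: AgentPolicy[], outputId: string): AgentPolicy[] {
  return policies
    .filter((p) => p.data_output_id === outputId || p.monitoring_output_id === outputId)
    .map((p) => ({ ...p, revision: p.revision + 1 }));
}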

Member

@joshdover joshdover left a comment


LGTM after adding some additional test coverage

@@ -445,6 +445,38 @@ class AgentPolicyService {
return res;
}

public async bumpAllAgentPoliciesForOutput(
Member


Let's go ahead and add a unit test for this logic to verify the correct policies are updated.
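
For illustration, such a test might look like the sketch below; it is written against the hypothetical policiesToBumpForOutput helper from an earlier comment, whereas the real test exercises AgentPolicyService with mocked saved objects clients.

// Hypothetical jest sketch; the real unit test targets
// AgentPolicyService.bumpAllAgentPoliciesForOutput with mocked clients.
describe('bumping agent policies for an output', () => {
  it('only bumps policies that reference the output', () => {
    const policies = [
      { id: 'policy-1', revision: 1, data_output_id: 'output-123' },
      { id: 'policy-2', revision: 1, monitoring_output_id: 'output-test' },
    ];

    const bumped = policiesToBumpForOutput(policies, 'output-123');

    expect(bumped.map((p) => p.id)).toEqual(['policy-1']);
    expect(bumped[0].revision).toBe(2);
  });
});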

@kibanamachine
Contributor

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
fleet 1077 1085 +8

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
fleet 130.2KB 130.4KB +124.0B

Saved Objects .kibana field count

Every field in each saved object type adds overhead to Elasticsearch. Kibana needs to keep the total field count below Elasticsearch's default limit of 1000 fields. Only specify field mappings for the fields you wish to search on or query. See https://www.elastic.co/guide/en/kibana/master/development-plugin-saved-objects.html#_mappings

id                     before  after  diff
ingest-agent-policies  15      17     +2
ingest-outputs         8       10     +2
total                                 +4

Unknown metric groups

API count

id before after diff
fleet 1178 1186 +8

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @nchaulet

@nchaulet nchaulet merged commit 659b295 into elastic:master Sep 21, 2021
@nchaulet nchaulet deleted the feature-outputs-preconfigure branch September 21, 2021 17:52
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Sep 21, 2021
@kibanamachine
Contributor

💚 Backport successful

Status Branch Result
7.x

This backport PR will be merged automatically after passing CI.

@dikshachauhan-qasource

Hi @EricDavisX

Could you please provide us with more information on how to validate the above PR?

Thanks
QAS

@EricDavisX
Contributor

@dikshachauhan-qasource I don't know exactly what we want. Let's talk to the team and figure it out. I have put in a test ticket here: #114285
