Agent Configuration examples #15940

Merged
merged 13 commits on Feb 11, 2020
209 changes: 209 additions & 0 deletions x-pack/agent/docs/agent_configuration_example.yml
@@ -0,0 +1,209 @@
outputs:

What do default, long_term_storage and monitoring mean? Am I correct in assuming that when writing data to Elasticsearch, each datatype that Beats/Agent/Endpoint write will be hardcoded to go to a particular destination? Is there room for more outputs if Endpoint ends up needing another Endpoint-specific output? Is there any way to control which index is written to for an output?

Contributor Author
I've discussed this with @ferullo on Zoom, but for clarity I will still add part of the discussion here. The default, long_term_storage and monitoring entries are destinations; in the initial version we would only support default and long_term_storage. When you define a datasource you can use the use_output option to define which named output the data is sent to.

So yes, if the endpoint needs a special output destination, it could be added.

Concerning the index: the index is not something that a user would be able to configure directly. We do this to hide the complexity inherent in managing indices (template mappings, ILM policies and dashboards); instead the agent uses the type of the input, the dataset and the defined namespace to generate the appropriate index destination. The only user-editable part is the namespace, which is used as a grouping mechanism, for example for "production" data.

I presume Endpoint should try to align with that indexing strategy and allow the user to set the namespace; maybe that's something to discuss with @ruflin.
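
To make the indexing strategy described above concrete, here is a minimal Go sketch of deriving an index destination from the input type, the dataset and the namespace; the field order and the "-" separator are assumptions for illustration, not the final naming scheme.

package main

import "fmt"

// buildIndexName sketches the strategy described above: the agent, not the
// user, combines the input type, the dataset and the namespace into the
// index destination. The "-" separator is an assumption.
func buildIndexName(inputType, dataset, namespace string) string {
    return fmt.Sprintf("%s-%s-%s", inputType, dataset, namespace)
}

func main() {
    // e.g. a logs input for the nginx.access dataset in the "prod" namespace
    fmt.Println(buildIndexName("logs", "nginx.access", "prod")) // logs-nginx.access-prod
}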

  default:
    api_key: VuaCfGcBCdbkQm-e5aOx:ui2lp2axTNmsyakw9tvNnw
    hosts: ["localhost:9200"]
    ca_sha256: "7lHLiyp4J8m9kw38SJ7SURJP4bXRZv/BNxyyXkCcE/M="
    # Not supported at first
    queue:
      type: disk

  long_term_storage:
    api_key: VuaCfGcBCdbkQm-e5aOx:ui2lp2axTNmsyakw9tvNnw

Contributor
Are we assuming everything is Elasticsearch, or do we still support type?

Contributor Author
@ph ph Feb 6, 2020
Let's keep the type support that we have; we are so used to targeting Elasticsearch that I think this fell through the cracks. So type: elasticsearch.

Contributor Author
I just added the type to this PR.

    hosts: ["localhost:9200"]
    ca_sha256: "7lHLiyp4J8m9kw38SJ7SURJP4bXRZv/BNxyyXkCcE/M="
    queue:
      type: disk

  monitoring:
    api_key: VuaCfGcBCdbkQm-e5aOx:ui2lp2axTNmsyakw9tvNnw
    hosts: ["localhost:9200"]
    ca_sha256: "7lHLiyp4J8m9kw38SJ7SURJP4bXRZv/BNxyyXkCcE/M="

settings.monitoring:
  use_output: monitoring

datasources:

Contributor
should we just call this "sources"?

Member
I would stick with datasources. Sources is more generic and would be correct too, but we already have source in ECS. Also, I have heard the datasource term used by people less familiar with the project to describe things like the "nginx datasource", so I think it fits well.

  # use the nginx package
  - id: nginx-x1

Member
@ph Let's keep an eye on the id and, as soon as we start using it in the code somewhere, have a discussion on how we use it.

Contributor Author
Okay to not make it mandatory for now.

    title: "This is a nice title for humans"

Contributor
"description" rather than "title"?

Contributor Author
@ph ph Jan 30, 2020
I was leaning toward "title" because I would expect a description to span multiple lines where a title does not, but I do not have strong feelings about changing it. Any opinion from the others?

    # Package this config group is coming from. On importing, we know where it belongs

Contributor Author
@ruflin the link between the datasource and the "managing" app was mentioned by @jen-huang. I think we wanted to hide the link using the ids, but maybe making it explicit in the configuration is a good idea.

Contributor Author
@jen-huang I had a chat with @ruflin and we do not need to have a managed_by defined in the datasource; that information should actually be in the package. This makes sense if, let's say, we start making custom UIs for specific datasources, maybe a better UI for AWS services.

@ph So datasources created by Uptime, APM, Endpoint, etc will all specify a package? What if they don't use a package?

Member
Yes. They will have to have a package even if it is only to define this linking. Otherwise we could hardcode it into our UI as these links should not change too often.

    # The package tells the UI which application to link to
    package:
      name: epm/nginx
      version: 1.7.0
    namespace?: prod

Contributor
namespace is moved from stream to datasource; do you think it would be useful to have a stream-level override of the namespace specified here?

Contributor Author
@ph ph Jan 30, 2020
We try as much as possible not to support overrides; the way to do it would be to duplicate the datasource and set another namespace.

Contributor Author
Also, streams cannot override input settings.

Member
+1 on keeping it on the top level. I think having a namespace per stream is not a common case. If a user needs it, he can specify the datasource twice.

    constraints:
      # The constraints format is not final
      - os.platform: { in: "windows" }

Contributor
This should be ECS-compliant?

Member
YES!

Contributor Author
@ph ph Jan 30, 2020
Yes it should. I should say that the constraints format is not final, but let's make sure the definition works both in the Agent and in the UI. It would be good to leverage that information to display more useful data to the user when they visualize an agent or a configuration.
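
As a rough, hedged sketch of how the Agent might evaluate such constraints (the operator names in and >= come from the examples in this file; the value format and the evaluation logic are assumptions):

package main

import (
    "fmt"
    "strings"
)

// evalConstraint checks a single constraint against the local agent facts.
// Only the two operators used in this file are sketched; a real implementation
// would need proper semver comparison instead of the lexical compare below.
func evalConstraint(field, op, value string, facts map[string]string) bool {
    actual, ok := facts[field]
    if !ok {
        return false
    }
    switch op {
    case "in": // assumption: value is a comma-separated list of accepted values
        for _, v := range strings.Split(value, ",") {
            if strings.TrimSpace(v) == actual {
                return true
            }
        }
        return false
    case ">=":
        return actual >= value // placeholder for a semver-aware comparison
    default:
        return false
    }
}

func main() {
    facts := map[string]string{"os.platform": "windows", "agent.version": "8.0.0"}
    fmt.Println(evalConstraint("os.platform", "in", "windows", facts))  // true
    fmt.Println(evalConstraint("agent.version", ">=", "8.0.0", facts)) // true
}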

      - agent.version: { ">=": "8.0.0" }
    inputs:

Endpoint also has a requirement for a datasource to be stopped or uninstalled. Maybe we need to add a key for state like "running", "stopped", "uninstalled", etc.?

https://github.com/elastic/endpoint-app-team/issues/129

Contributor
When a datasource is not needed it should not be part of the config, so Endpoint or anything else should be able to detect that it's running something it's not supposed to.

Contributor Author
@ph ph Jan 31, 2020
@michalpristas @mostlyjason I wasn't clear in the configuration; I should have added an optional enabled key at the datasource level. I have an example for a stream.

So if the key is omitted, enabled defaults to true; this is indeed the established behavior for existing Beats. If it is explicitly set to false we disable either the stream or the complete datasource. Think about it as a table with a checkbox in the UI.

Concerning uninstalling, as @michalpristas said, this is different from a toggle key and should not be present in the configuration.
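
A minimal Go sketch of the default-to-true semantics described above, assuming the entry is unpacked into a struct; the *bool trick that distinguishes "omitted" from "explicitly false" is an illustration, not the Agent's actual implementation.

package main

import "fmt"

// Stream sketches a stream (or datasource) entry with an optional enabled key.
// A *bool distinguishes "key omitted" (nil) from an explicit "enabled: false".
type Stream struct {
    Dataset string
    Enabled *bool
}

// IsEnabled implements the behavior described above: a missing key means enabled.
func (s Stream) IsEnabled() bool {
    return s.Enabled == nil || *s.Enabled
}

func main() {
    off := false
    fmt.Println(Stream{Dataset: "system.cpu"}.IsEnabled())                // true: key omitted
    fmt.Println(Stream{Dataset: "system.cpu", Enabled: &off}.IsEnabled()) // false: explicitly disabled
}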

      - type: logs
        processors?:
        streams:
          - id?: {id}
            dataset: nginx.access
            paths: /var/log/nginx/access.log

Contributor
We say paths but there is only one. Is this an array, or should this be path?

Contributor Author
Well, no, it should still be plural: since we are using go-ucfg, a single string like "/var/log" and a slice of strings are equivalent if you use the following type to unpack. This is true for a lot of things in Beats.

type C struct {
    Paths []string `config:"paths"`
}
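
For illustration, a small sketch of the equivalence described above using go-ucfg's NewFrom and Unpack; accepting a bare string for a slice field is exactly the behavior the comment claims, so treat this as a sketch of that claim rather than a verified test.

package main

import (
    "fmt"

    "github.com/elastic/go-ucfg"
)

type C struct {
    Paths []string `config:"paths"`
}

func main() {
    // Both a single string and a list should unpack into the same []string field.
    for _, raw := range []map[string]interface{}{
        {"paths": "/var/log/nginx/access.log"},
        {"paths": []interface{}{"/var/log/nginx/access.log", "/var/log/nginx/error.log"}},
    } {
        cfg, err := ucfg.NewFrom(raw)
        if err != nil {
            panic(err)
        }
        var c C
        if err := cfg.Unpack(&c); err != nil {
            panic(err)
        }
        fmt.Println(c.Paths)
    }
}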

Contributor
In ucfg, yes, but in tests we're producing a map[string]interface{} tree, so I was wondering what the intention is here.

Contributor Author
This is correct, but they will be reparsed afterwards so that should not be a problem, or am I missing something?

Contributor
No, you're not; I just wanted to have these trees in a form which reflects our intention, so I'm double-checking whether this is supposed to be multiple paths or a single value as with metricset.

          - id?: {id}
            dataset: nginx.error
            paths: /var/log/nginx/error.log
      - type: metrics/nginx
        streams:
          - id?: {id}
            dataset: nginx.stub_status
            metricset: stub_status

  #################################################################################################
  # Custom Kafka datasource
  - id: kafka-x1
    title: "Consume data from kafka"
    namespace?: prod
    use_output: long_term_storage
    inputs:
      - type: kafka
        host: localhost:6566
        streams:
          - dataset: foo.dataset
            topic: foo
            processors:
              - extract_bro_specifics


  #################################################################################################
  # System EPM package
  - id: system
    title: Collect system information and metrics
    package:
      name: epm/system
      version: 1.7.0
    inputs:
      - type: metrics/system
        streams:
          - id?: {id}
            enabled?: false # default true
            metricset: cpu
            period: 10s
            dataset: system.cpu
            metrics: ["percentages", "normalized_percentages"]
          - metricset: memory
            dataset: system.memory
          - metricset: diskio
            dataset: system.diskio
          - metricset: load
            dataset: system.load
          - metricset: memory
            dataset: system.memory
          - metricset: process
            dataset: system.process
            processes: ["firefox*"]
          - metricset: process_summary
            dataset: system.process_summary
          - metricset: uptime
            dataset: system.uptime
          - metricset: socket_summary
            dataset: system.socket_summary
          - metricset: filesystem
            dataset: system.filesystem
          - metricset: raid
            dataset: system.raid
          - metricset: socket
            dataset: system.socket
          - metricset: service
            dataset: system.service
          - metricset: fsstat
            dataset: system.fsstat
          - metricset: foo
            dataset: system.foo

Contributor
This, unfortunately, needs to be simpler.
Could we set a default "dataset" in the input so users don't have to specify it? And then let them specify metricsets as an array?

- id: system  # make optional if possible
  inputs:
    - type: metrics/system
      streams:
       - metricset: ["cpu", "memory", "diskio", "load", "process", "uptime", "filesystem"]
          # default dataset, period etc.

or a short form that gets converted automatically:

- inputs:
    - type: metrics/system
      metricsets: ["cpu", "memory", "diskio", "load", "process", "uptime", "filesystem"]

Contributor Author
First, I think we're trying to optimize for the UI experience vs. the YAML editing experience.

If you look at the Metricbeat module definitions in Beats, they do have settings; they are not just an opt-in thing. I've used the system module because it's one of the bigger ones, and if you look at the metricsets they do have settings. I do not find it obvious how to change these options using the key/name approach.

But @ruflin, do you have a strong opinion? I think there was also a reason concerning dataset for using the list.


  cpu.metrics:  ["percentages"]  # The other available options are normalized_percentages and ticks.
  core.metrics: ["percentages"]  # The other available option is ticks.
  #filesystem.ignore_types: []
  #process.include_top_n:
    #enabled: true
    #by_cpu: 0
    #by_memory: 0
  #process.cmdline.cache.enabled: true
  #process.cgroups.enabled: true
  #process.env.whitelist: []
  #process.include_cpu_ticks: false
  #raid.mount_point: '/'

  #socket.reverse_lookup.enabled: false
  #socket.reverse_lookup.success_ttl: 60s
  #socket.reverse_lookup.failure_ttl: 60s
  #diskio.include_devices: []

Contributor Author
@roncohen What do you think about the metrics module options as defined above? Where do they fit in the shortened version?

Member
My current thinking:

  • There is the long form which always works (as proposed)
  • There is a short form, with one of the proposals above, which needs a bit of magic on the agent side to make it work with Metricbeat.

The magical part can always be added later so I don't see it as a blocker. Talking about magic: I would also like to see the following working:

- id: system  # make optional if possible
  inputs:
    - type: metrics/system

It is like enabling the system module today, it just picks the default metricsets which are defined in Metricbeat.

Contributor
OK, we can add more defaults and a rewrite step later 👍
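
To make the rewrite step concrete, here is a hedged Go sketch that expands the short form (metricsets: [...]) into the long per-stream form with a defaulted dataset; the struct shapes and the <module>.<metricset> dataset convention are assumptions for illustration.

package main

import (
    "fmt"
    "strings"
)

// Stream is the long-form entry the agent would hand to Metricbeat.
type Stream struct {
    Metricset string
    Dataset   string
}

// expandShortForm sketches the rewrite step discussed above: an input such as
// {type: metrics/system, metricsets: [cpu, memory]} becomes one stream per
// metricset, with the dataset defaulted to "<module>.<metricset>".
func expandShortForm(inputType string, metricsets []string) []Stream {
    module := strings.TrimPrefix(inputType, "metrics/") // assumed type format
    streams := make([]Stream, 0, len(metricsets))
    for _, ms := range metricsets {
        streams = append(streams, Stream{Metricset: ms, Dataset: module + "." + ms})
    }
    return streams
}

func main() {
    for _, s := range expandShortForm("metrics/system", []string{"cpu", "memory", "diskio"}) {
        fmt.Printf("%+v\n", s)
    }
}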


  #################################################################################################
  ### suggestion 1
  - id: myendpoint-x1

@kevinlog kevinlog Jan 29, 2020
at first glance, I prefer suggestion 1 as I'm not sure we would maintain different configurations for different platforms. We'll make the entire thing atomic, so we'll send down some extra config and the Endpoint will know what to read.

thoughts @ferullo ?

EDIT: I think I misunderstood at first, it looks like they're just different layouts and all platforms are contained in the same config, sorry for the confusion.

    title: Endpoint configuration
    package:
      name: endpoint
      version: xxx

We've discussed adding versions into the configuration that need to match up with Endpoint, otherwise it will reject the configuration. Is that what this version implies? Will we be able to manage individual versions for each package (i.e. endpoint, metricbeat, etc.)?

A similar conversation on versions here:
https://github.com/elastic/endpoint-app-team/issues/129#issuecomment-579130066

Member
This version here is to know from which package and package version a config was created. This should help when we try to upgrade configs.

The part about limiting it to a specific agent version can be found here: https://github.com/elastic/beats/pull/15940/files#diff-043e80bbcc20fded11325ef31433396bR35

    inputs:
      - type: endpoint # Reserved key word

Contributor Author
Add a namespace for endpoint and sync with @kevinlog

        streams:
          - type: malware
            detect: true
            prevent: false
            notify_user: false
            threshold: recommended
            platform: windows

          - type: eventing
            api: true
            clr: false
            dll_and_driver_load: false
            dns: true
            file: false
            platform: windows

          - type: malware
            detect: true
            prevent: false
            notify_user: false
            threshold: recommended
            platform: mac

          - type: eventing

Is what's down here supposed to still be under streams, or is what's above not supposed to be under streams? Half looks indented one space more than the other half.

Contributor Author
Yes, it's supposed to be; my vim-foo failed on the YAML formatting. I will fix it in the next round of changes.

            api: true
            clr: false
            dll_and_driver_load: false
            dns: true
            file: false
            platform: mac

          - type: malware
            detect: true
            prevent: false
            notify_user: false
            threshold: recommended
            platform: linux

          - type: eventing
            api: true
            clr: false
            dll_and_driver_load: false
            dns: true
            file: false
            platform: linux

  #################################################################################################
  ### suggestion 2

Suggestion 2 seems cleaner to me for Endpoint. I like nesting the entire config for an OS under an OS section rather than intermingling them. This will make it easier for Endpoint to read the config for its OS and also feels more in line with the fact that there are many features that are not available on all OSes.

Contributor Author
I am OK with that.

  - id: myendpoint-1

Do the individual ids in here imply that we'll edit individual sections of the Agent configuration? Is there an overall Agent configuration ID that we'll use with the Ingest APIs to edit the entire configuration?

I'm concerned about editing the configuration in pieces, it's preferable to send the entire configuration down to the Agent/Endpoint at once.

Member
There will be an overall unique ID for a configuration in the Fleet API. These "local" ids can be generated and become useful especially for error reporting, so we can tell the user which part of the config did not work.

It is still the plan to send the full agent config down to the agent at once.

    title: Endpoint configuration
    package:
      name: epm/endpoint # This establishes the link with the package and will allow linking it to the endpoint app.
      version: xxx
    inputs:
      - type: endpoint # Reserved key word

Contributor
@kevinlog do you think it makes sense to have a separate "protections" section or similar? I imagine that there are a number of config settings that aren't really "inputs" or "streams". From my perspective, we don't need to over-generalize this Agent config in the sense that everything must fit as a stream.

There's no reason why we need to make it all generic and the Agent oblivious to endgame - if that makes sense.

Contributor Author
Maybe input is stretching the concept for the endpoint, but they still fit our definition: they generate data. Our requirements and reserved keywords are minimal; everything else at the input level would be free-form and passed directly to the endpoint. So they could have protections or anything they want at that level.

- id: myendpoint-1
  title: Endpoint configuration
  package:
    name: epm/endpoint # This establishes the link with the package and will allow linking it to the endpoint app.
    version: xxx
  inputs:
    - type: endpoint # Reserved key word

        windows:
          eventing:
            api: true
            clr: false
            dll_and_driver_load: false
            dns: true
            ...
            file: false
          malware:
            detect: true
            prevent: false
            notify_user: false
            threshold: recommended

I know this is an example, but some of these could change. For instance, we've talked about dropping threshold; is this still the case @ferullo @stevewritescode?

Member
From an agent/fleet perspective, everything under input except type is something we don't know about and just forward. Any changes to this will not affect the agent itself.
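
A minimal Go sketch of that pass-through behavior, assuming the input arrives as a generic map: only the reserved type key is interpreted and everything else is forwarded untouched. The function name and key handling are illustrative.

package main

import "fmt"

// splitInput sketches the forwarding described above: the agent reads only the
// reserved "type" key and hands every other key to the process (e.g. Endpoint)
// unchanged.
func splitInput(input map[string]interface{}) (inputType string, forwarded map[string]interface{}) {
    forwarded = make(map[string]interface{}, len(input))
    for k, v := range input {
        if k == "type" { // reserved key the agent interprets
            inputType, _ = v.(string)
            continue
        }
        forwarded[k] = v // free-form keys, passed through as-is
    }
    return inputType, forwarded
}

func main() {
    t, rest := splitInput(map[string]interface{}{
        "type":    "endpoint",
        "windows": map[string]interface{}{"malware": map[string]interface{}{"detect": true}},
    })
    fmt.Println(t, rest)
}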

        mac:
          eventing:
            file: true
            network: false
            process: false
            ...
          malware:
            detect: true
            prevent: false
            notify_user: false
            threshold: recommended
        linux:
          eventing:
            file: true
            network: false
            process: false