
Agent Configuration examples #15940

Merged: 13 commits merged into elastic:fleet on Feb 11, 2020

Conversation

@ph (Contributor) commented Jan 29, 2020

This is a work-in-progress configuration (a condensed sketch follows the list):

  • Show how datasources work
  • Show inputs and streams
  • Show possible usage of multiple outputs
  • Show the definition of endpoint
  • Show a custom source and a source generated from a package.
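
A condensed, illustrative sketch of the shape being proposed (field names and values echo the examples discussed in the review below; none of this is final):

```yaml
outputs:
  default:
    type: elasticsearch
    hosts: ["http://localhost:9200"]
  long_term_storage:
    type: elasticsearch
    api_key: VuaCfGcBCdbkQm-e5aOx:ui2lp2axTNmsyakw9tvNnw

datasources:
  # A datasource generated from a package
  - id: nginx-x1
    title: "This is a nice title for humans"
    namespace: prod
    use_output: default
    package:
      name: epm/nginx
      version: 1.7.0
    inputs:
      - type: logs
        streams:
          - dataset: nginx.access
            paths: /var/log/nginx/access.log
      - type: metrics/nginx
        hosts: ["http://127.0.0.1"]
        streams:
          - metricset: stubstatus
            dataset: nginx.stubstatus   # dataset name illustrative
```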

@elasticmachine (Collaborator):

Pinging @elastic/ingest (Project:fleet)

# Is this something handled by fleet where we have an association between the ID and the creator
# of the datasource.
managed_by: fleet
# Package this config group is coming from. On importing, we know where it belongs
Contributor Author:

@ruflin the link between the datasource and the "managing" app was mentioned by @jen-huang. I think we wanted to keep the link hidden using the ids, but maybe making it explicit in the configuration is a good idea.

Contributor Author:

@jen-huang I had a chat with @ruflin and we do not need to have a managed_by defined in the datasource; that information should actually be in the package. This makes sense if, let's say, we start making custom UIs for specific datasources, maybe a better UI for AWS services.


@ph So datasources created by Uptime, APM, Endpoint, etc will all specify a package? What if they don't use a package?

Member:

Yes. They will have to have a package even if it is only to define this linking. Otherwise we could hardcode it into our UI as these links should not change too often.

@ph ph self-assigned this Jan 29, 2020
@ph (Contributor Author) commented Jan 29, 2020

cc @michalpristas and @jen-huang since you are impacted by this; still consider this a work in progress.

- Removed `managed_by`; this should be a concern of the package.
- Changed the two endpoint suggestions
detect: true
prevent: false
notify_user: false
threshold: recommended


I know this is an example, but some of these could change. For instance, we've talked about dropping threshold; is this still the case @ferullo @stevewritescode?

Member:

From an agent/fleet perspective, everything under input except type is something we don't know about and just forward. Any changes to this will not affect the agent itself.


#################################################################################################
### suggestion 1
- id: myendpoint-x1
@kevinlog commented Jan 29, 2020:

at first glance, I prefer suggestion 1 as I'm not sure we would maintain different configurations for different platforms. We'll make the entire thing atomic, so we'll send down some extra config and the Endpoint will know what to read.

thoughts @ferullo ?

EDIT: I think I misunderstood at first, it looks like they're just different layouts and all platforms are contained in the same config, sorry for the confusion.

title: Endpoint configuration
package:
name: endpoint
version: xxx


We've discussed adding versions into the configuration that need to match up with Endpoint; otherwise it will reject the configuration. Is that what this version implies? Will we be able to manage individual versions for each package (i.e. endpoint, metricbeat, etc.)?

A similar conversation on versions here:
https://github.com/elastic/endpoint-app-team/issues/129#issuecomment-579130066

Member:

This version here is to know from which package and package version a config was created. This should help when we try to upgrade configs.

The part about limiting it to a specific agent version can be found here: https://github.com/elastic/beats/pull/15940/files#diff-043e80bbcc20fded11325ef31433396bR35


#################################################################################################
### suggestion 2
- id: myendpoint-1


do the individual ids in here imply that we'll edit individual sections of the Agent configuration? Is there an overall Agent configuration ID that we'll use with the Ingest APIs to edit the entire configuration?

I'm concerned about editing the configuration in pieces; it's preferable to send the entire configuration down to the Agent/Endpoint at once.

Member:

There will be an overall unique ID for a configuration in the Fleet API. These "local" ids can be generated and become useful especially for error reporting, so we can tell the user which part of the config did not work.

It is still the plan to send the full agent config down to the agent at once.

@kevinlog

FYI @elastic/endpoint-management @scunningham

namespace?: prod
constraints:
# Constraints format is not final
- os.platform: { in: "windows" }
Contributor:

this should be ECS compliant?

Member:

YES!

@ph (Contributor Author) Jan 30, 2020:

Yes, it should. I should say that the constraints format is not final, but let's make sure the definition works both in the Agent and in the UI. It would be good to leverage that information to display more useful data to the user when they visualize an agent or a configuration.


datasources:
# use the nginx package
- id: nginx-x1
Member:

@ph Let's keep an eye on the id and, as soon as we start using it in the code somewhere, have a discussion on how we use it.

Contributor Author:

Okay to not make it mandatory for now.

settings.monitoring:
use_output: monitoring

datasources:
Contributor:

should we just call this "sources"?

Member:

I would stick with datasources. Sources is more generic and would be correct too, but we already have source in ECS. Also, I've heard the datasource term used by people less familiar with the project to describe things like the "nginx datasource", so I think it fits well.

datasources:
# use the nginx package
- id: nginx-x1
title: "This is a nice title for human"
Contributor:

"description" rather than "title"?

@ph (Contributor Author) Jan 30, 2020:

I was leaning toward "title" because I would expect a description to span multiple lines where a title does not, but I do not have a strong feeling about changing it. Any opinion from the others?

- metricset: fsstat
dataset: system.fsstat
- metricset: foo
dataset: system.foo
Contributor:

this, unfortunately, needs to be simpler.
Could we set a default "dataset" in the input so users don't have to specify it? And then let them specify metricsets as an array?

- id: system  # make optional if possible
  inputs:
    - type: metrics/system
      streams:
       - metricset: ["cpu", "memory", "diskio", "load", "process", "uptime", "filesystem"]
          # default dataset, period etc.

or a short form that gets converted automatically:

- inputs:
    - type: metrics/system
      metricsets: ["cpu", "memory", "diskio", "load", "process", "uptime", "filesystem"]

Contributor Author:

First, I think we are trying to optimize for the UI experience vs the YAML editing experience.

If you look at the metricbeat module definitions in beats, they do have settings; they are not just an opt-in thing. I've used the system module because it's one of the bigger ones, and if you look at the metricsets they do have settings. I do not find it obvious how to change these options using the key/name approach.

But @ruflin, do you have a strong opinion? I think there was also a reason concerning dataset for using the list.


  cpu.metrics:  ["percentages"]  # The other available options are normalized_percentages and ticks.
  core.metrics: ["percentages"]  # The other available option is ticks.
  #filesystem.ignore_types: []
  #process.include_top_n:
    #enabled: true
    #by_cpu: 0
    #by_memory: 0
  #process.cmdline.cache.enabled: true
  #process.cgroups.enabled: true
  #process.env.whitelist: []
  #process.include_cpu_ticks: false
  #raid.mount_point: '/'

  #socket.reverse_lookup.enabled: false
  #socket.reverse_lookup.success_ttl: 60s
  #socket.reverse_lookup.failure_ttl: 60s
  #diskio.include_devices: []

Contributor Author:

@roncohen What do you think about the metrics module options as defined above? Where do they fit in the shortened version?

Member:

My current thinking:

  • There is the long form which always works (as proposed)
  • There is a short form with one of the proposals above, which needs a bit of magic on the agent side to make it work with metricbeat.

The magical part can always be added later so I don't see it as a blocker. Talking about magic: I would also like to see the following working:

- id: system  # make optional if possible
  inputs:
    - type: metrics/system

It is like enabling the system module today, it just picks the default metricsets which are defined in Metricbeat.
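
To make the "magic" concrete, here is one possible expansion the agent could perform on the short form, turning it into the long form that maps onto Metricbeat's metricset/dataset pairs. This is purely illustrative, not a committed format:

```yaml
# Short form, as a user might write it:
- id: system
  inputs:
    - type: metrics/system
      metricsets: ["cpu", "memory"]

# Possible long form after the agent-side rewrite step:
- id: system
  inputs:
    - type: metrics/system
      streams:
        - metricset: cpu
          dataset: system.cpu
        - metricset: memory
          dataset: system.memory
```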

Contributor:

OK, we can add more defaults and a rewrite step later 👍

name: epm/endpoint # This establishes the link with the package and will allow linking it to the endpoint app.
version: xxx
inputs:
- type: endpoint # Reserved keyword
Contributor:

@kevinlog do you think it makes sense to have a separate "protections" section or similar? I imagine that there are a number of config settings that aren't really "inputs" or "streams". From my perspective, we don't need to over-generalize this Agent config in the sense that everything must fit as a stream.

There's no reason why we need to make it all generic and the Agent oblivious to endgame - if that makes sense.

Contributor Author:

Maybe input is stretching the concept for the endpoint, but it still fits our definition: it generates data. Our requirements and reserved keywords are minimal; everything else at the input level would be free form and passed directly to the endpoint. So they could have protections or anything they want at that level.

- id: myendpoint-1
  title: Endpoint configuration
  package:
    name: epm/endpoint # This establishes the link with the package and will allow linking it to the endpoint app.
    version: xxx
  inputs:
    - type: endpoint # Reserved keyword

@ferullo left a comment:

Thanks for an example!

threshold: recommended
platform: mac

- type: eventing

Is what's below supposed to still be under streams, or is what's above not supposed to be under streams? Half looks indented one space more than the other half.

Contributor Author:

Yes, it's supposed to be; my vim-fu failed on the YAML formatting. Will fix it in the next round of changes.

platform: linux

#################################################################################################
### suggestion 2

Suggestion 2 seems cleaner to me for Endpoint. I like nesting the entire config for an OS under an OS section rather than intermingling them. This will make it easier for Endpoint to read the config for its OS and also feels more in line with the fact that there are many features that are not available on all OSes.

Contributor Author:

I am OK with that.

@@ -0,0 +1,209 @@
outputs:

What do default, long_term_storage and monitoring mean? Am I correct in assuming that, when writing data to Elasticsearch, each datatype Beats/Agent/Endpoint writes will be hardcoded to go to a particular destination? Is there room for more outputs if Endpoint ends up needing another Endpoint-specific output? Is there any way to control what index is written to for an output?

Contributor Author:

I've discussed this with @ferullo on Zoom, but for clarity I will still add part of the discussion here. The default, long_term_storage and monitoring entries are destinations; in the initial version we would only support default and long_term_storage. When you define a datasource you can use the use_output option to define to which named output to send the data.

So yes, if the endpoint needs a special output destination it could be added.

Concerning the index: the index is not something that a user would be able to configure directly. We do this to hide the complexity inherent in managing indices (templates, mappings, ILM policies and dashboards); instead the agent uses the type of the input, the dataset and the defined namespace to generate the appropriate index destination. The only user-editable part is the namespace, which is used as a grouping mechanism, like "production" data.

I presume that endpoint should try to align with that indexing strategy and allow the user to set up the namespace; maybe it's something to discuss with @ruflin.
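
As a concrete illustration of that routing and index derivation (the exact naming pattern was not fixed in this PR, so the derived index shown is only a sketch of the idea):

```yaml
datasources:
  - id: nginx-x1
    namespace: prod                # only user-editable piece of the index destination
    use_output: long_term_storage  # send this datasource's data to the named output
    inputs:
      - type: logs
        streams:
          - dataset: nginx.access
            # index derived from input type + dataset + namespace,
            # e.g. roughly logs-nginx.access-prod (illustrative only)
```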

name: endpoint
version: xxx
inputs:
- type: endpoint # Reserved keyword
Contributor Author:

Add a namespace for endpoint and sync with @kevinlog

package:
name: epm/nginx
version: 1.7.0
namespace?: prod
Contributor:

namespace is moved from stream to datasource; do you think it would be useful to have a stream-level override of the namespace specified here?

@ph (Contributor Author) Jan 30, 2020:

We try as much as possible to not support overrides; the way to do it would be to duplicate the datasource and set another namespace.

Contributor Author:

Also, streams cannot override input settings.

Member:

+1 on keeping it at the top level. I think having a namespace per stream is not a common case. If a user needs it, they can specify the datasource twice.

@urso commented Jan 30, 2020

Although we should restrict the usage of global processors, I wonder how processors like add_x_metadata fit in. These are often global, as they enhance events based on the environment the Beat runs in. The add_host_metadata and add_cloud_metadata processors query the host system at init time only. The add_docker_metadata and similar processors enhance events based on event contents.

Do we actually want users to add any of the add_x_metadata processors to their configs? What if a datasource configured by a package is to be executed in different environment types (pure host and docker containers)? Maybe the agent would add these environmental processors, or even replace them with constant add_fields processors based on information the agent queried? In this case maybe the functionality could be 'hidden' behind an add_environment: <bool> setting, expecting the agent to do the right thing?

@mostlyjason

Another setting to consider is adding the agent version for this configuration. This will allow future agent versions to make schema changes, and for us to provide an error if an agent is incompatible with the configuration version.

@ph (Contributor Author) commented Jan 30, 2020

Another setting to consider is adding the agent version for this configuration. This will allow future agent versions to make schema changes, and for us to provide an error if an agent is incompatible with the configuration version.

I think having a schema version (where the lack of a version means version 1) is a good idea. It also depends on how we handle migration: if we assume that the configuration is kept in Fleet, we can also assume that the following flow is true: update agent -> migrate configuration -> deploy configuration to agent. This reduces the need for the agent to understand multiple versions of the schema.
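
As a sketch of what such a schema version could look like at the top of the agent configuration (hypothetical; the key name and placement were not decided in this PR):

```yaml
# Hypothetical top-level schema version; absence would imply version 1.
version: 2

outputs:
  default:
    type: elasticsearch
    hosts: ["http://localhost:9200"]
```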

@ph (Contributor Author) commented Jan 30, 2020

@urso

Do we actually want users to add any of the add_x_metadata processors to their configs? What if a datasource configured by a package is to be executed in different environment types (pure host and docker containers)? Maybe the agent would add these environmental processors, or even replace them with constant add_fields processors based on information the agent queried? In this case maybe the functionality could be 'hidden' behind an add_environment: setting, expecting the agent to do the right thing?

When processors are global or auto-added, you are always at risk of a conflict of fields (remember the hostname case, or source?). So I think the solution should be behind the package definition, where we could either add the processors to the required processors of a stream or even have it at the datasource level.

add_environment: <bool> is a good suggestion if, let's say, collecting the information depends on the running environment (like a k8s stream of events?).

@ruflin it seems like our talk concerning the processors should allow them to be defined at the datasource, maybe with an insert scope like processors.after or processors.before to control how the merging logic is applied.

Also, is there anything preventing us from merging all the environment-based processors into a single one?
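
A rough sketch of how those two ideas could surface in a datasource. Everything here is hypothetical; add_environment, processors.after and processors.before are only the names floated in this thread:

```yaml
datasources:
  - id: nginx-x1
    # Hypothetical: ask the agent to inject the right environment metadata
    # (add_host_metadata, add_cloud_metadata, ... or constant add_fields)
    # based on where it actually runs.
    add_environment: true
    # Hypothetical insert scope: processors contributed here are appended
    # after (or before) the stream-level processors during merging.
    processors.after:
      - add_locale: ~
    inputs:
      - type: logs
        streams:
          - dataset: nginx.access
            paths: /var/log/nginx/access.log
```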

type: disk

long_term_storage:
api_key: VuaCfGcBCdbkQm-e5aOx:ui2lp2axTNmsyakw9tvNnw
Contributor:

are we assuming everything is Elasticsearch, or do we still support type?

@ph (Contributor Author) Feb 6, 2020:

Let's keep the type support that we have; we are so used to targeting Elasticsearch that I think this fell through the cracks. So type: elasticsearch.

Contributor Author:

I just added the type to this PR.

- id?: {id}
enabled?: true # default to true
dataset: nginx.access
paths: /var/log/nginx/access.log
Contributor:

we say paths but there is only one. Is this an array, or should this be a path?

Contributor Author:

Well, no, it should still be plural. Since we are using go-ucfg, a single string "/var/log" and a slice of strings are equivalent if you use the following type to unpack. This is true for a lot of things in beats.

type C struct {
	Paths []string `config:"paths"`
}

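A minimal, self-contained sketch of the behaviour described (assuming the field tag is `paths`; the exact go-ucfg options used in beats may differ):

```go
package main

import (
	"fmt"

	"github.com/elastic/go-ucfg"
)

type C struct {
	Paths []string `config:"paths"`
}

func main() {
	// A single string and a list of strings unpack into the same []string field.
	single, _ := ucfg.NewFrom(map[string]interface{}{
		"paths": "/var/log/nginx/access.log",
	})
	list, _ := ucfg.NewFrom(map[string]interface{}{
		"paths": []string{"/var/log/nginx/access.log", "/var/log/nginx/error.log"},
	})

	var a, b C
	_ = single.Unpack(&a)
	_ = list.Unpack(&b)
	fmt.Println(a.Paths) // [/var/log/nginx/access.log]
	fmt.Println(b.Paths) // [/var/log/nginx/access.log /var/log/nginx/error.log]
}
```
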
Contributor:

In ucfg yes, but in the tests we're producing a map[string]interface{} tree, so I was wondering what the intention is here.

Contributor Author:

This is correct, but they will be reparsed afterwards, so that should not be a problem. Or am I missing something?

Contributor:

No, no, you're not. I just wanted to have these trees in a form which reflects our intention, so I'm double-checking whether this is supposed to be multiple paths or a single value as with metricset.

- Elasticsearch (logs and metrics)
- AWS (logs and metrics)
- Kubernetes (metrics)
- Docker (metrics)
@ph (Contributor Author) commented Feb 10, 2020

@ruflin I have added examples for Elasticsearch/Kubernetes/Docker and AWS.

Looking at Kubernetes, I wonder if it should be two different datasources.

@ruflin (Member) left a comment:

About Kubernetes: When @exekias created the K8s module we had many discussions about whether it should be one or two modules. We ended up with one module because of the config, if I remember correctly. It was always a bit odd as it was one module with two shared configs. But now with datasources and multiple inputs I think it makes much more sense:

  • 1 datasource: Kubernetes
  • 3 inputs: logs, node metrics, state metrics

I would keep it the way you specified it with 3 inputs (a rough sketch follows below). I expect the user to configure it in one go.

@exekias I see at the end of the file we separate out the event metricset: https://github.com/elastic/beats/blob/master/metricbeat/module/kubernetes/_meta/config.yml#L46 I assume this is because it does not have a collection period. Should this go under the node metrics input or is this an additional input?
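
A rough sketch of what that single Kubernetes datasource with three inputs could look like (hosts, dataset names and the exact type strings are illustrative only):

```yaml
datasources:
  - id: kubernetes-x1
    title: Kubernetes
    package:
      name: epm/kubernetes
      version: xxx
    inputs:
      - type: logs
        streams:
          - dataset: kubernetes.container_logs
            paths: /var/log/containers/*.log
      - type: metrics/kubernetes        # node metrics from the local kubelet
        hosts: ["localhost:10255"]
        streams:
          - metricset: node
            dataset: kubernetes.node
      - type: metrics/kubernetes-state  # cluster-state metrics via kube-state-metrics
        hosts: ["kube-state-metrics:8080"]
        streams:
          - metricset: state_pod
            dataset: kubernetes.state_pod
```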

dataset: elasticsearch.slowlog
paths: [/var/log/elasticsearch/*_index_search_slowlog.log, /var/log/elasticsearch/*_index_indexing_slowlog.log]
- type: metrics/elasticsearch
hosts: ["http://localhost:9200"]
Member:

Can you add a few more common configs here, like username, password, period, ssl.certificate_authorities, to make it more obvious that the common configs go here? https://github.com/elastic/beats/blob/master/metricbeat/module/elasticsearch/_meta/config.reference.yml#L12

Contributor Author:

@ruflin I thought period was at the stream level and not on the common part of the input?

Member:

It is, skip it for now.

We will probably need to come back to the question: are there configs that can be specified on both levels, like period as a default with a local period that overwrites it? Let's have this discussion later.

name: epm/aws
version: 1.7.0
inputs:
# Looking at the AWS modules, I believe each fileset needs to be in its own
Member:

If possible, I think we should make this 1 input with multiple streams. Why do you think it should be multiple inputs?

Contributor Author:

@ruflin From our previous discussion, inputs would contain the hosts/ssl configuration. In that context, looking at Filebeat's AWS module, the events are read from files inside an S3 bucket.

To fix that problem, either:

  1. We do what I've done, where multiple inputs are defined and each input has access to a different S3 bucket.
  2. Or we allow the S3 input to ignore keys, or only consider keys that match a certain pattern. (This would require a bit of change in the s3 input; looking at the configuration it's not supported.)

We do have to think about the use case: what is more common? Using a single bucket for all the logs from AWS services, or different buckets? I think because of policy/permission concerns the different-bucket route might be more popular. A sketch of option 1 follows.

I am no expert in AWS; @kaiyan-sheng might know more about the usage here.
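
A sketch of option 1, where each input owns its own SQS queue (and therefore bucket), so one input only ever sees one kind of log (queue URLs, the type string and dataset names are illustrative):

```yaml
datasources:
  - id: aws-x1
    package:
      name: epm/aws
      version: 1.7.0
    inputs:
      - type: logs/s3
        queue_url: https://sqs.us-east-1.amazonaws.com/123456789/elb-logs-queue      # illustrative
        streams:
          - dataset: aws.elb
      - type: logs/s3
        queue_url: https://sqs.us-east-1.amazonaws.com/123456789/vpcflow-logs-queue  # illustrative
        streams:
          - dataset: aws.vpcflow
```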

Member:

Is the queue url defining where the events are fetched from in the end? If yes, shouldn't this be on the stream level instead?

Contributor:

For AWS logs, I think the better setup is to have different S3 buckets for different logs or different services. For example, one bucket for the s3access log and one bucket for elb logs. Or one for vpc flow logs from VPC1 and one for vpc flow logs from VPC2. But all these buckets can have the same notification setup with the same SQS queue (if they are in the same region).

The queue url is not where the logs are fetched from; it only points to the S3 bucket where the actual log needs to be read.

Member:

How does the input then know which logs to pick and where to send them?

Contributor:

@ruflin The input then uses the message from the SQS queue to locate where the log is (in which S3 bucket and what the name of the log is) and reads it from S3.

Member:

I got that part. But it means each dataset needs to have a different SQS queue, as otherwise one dataset would pick up the logs from the others too.

Contributor Author:

@kaiyan-sheng Yes, it knows where (the bucket) and which (the file), but there is nothing I can see in the code that says: OK, this file is actually a VPC flow log.

So the way I understand it, the AWS module assumes that all the events coming from a specific SQS queue can only be of the same type. I may be wrong, but I don't see how disambiguation could happen.

Contributor:

Ahh, sorry, you are right! With different kinds of logs, they should be separated into different queues if we want to ingest them at the same time.

name: epm/docker
version: 1.7.0
inputs:
- type: metrics/docker
Member:

Nit: I keep stumbling over the type. I would expect it to be docker/metrics. The reason is the order I think of when collecting: let me get some data from Docker and select only the metrics. I understand that from an agent perspective it is the other way around: select metricbeat and then run docker.

Contributor Author:

I don't have a strong feeling about metrics before or after docker. I think your reasoning makes sense and might be more future-proof if we do want to collect other things. The mnemonic might be better.

Contributor Author:

I've made the change on the config.

ruflin added a commit to ruflin/package-registry that referenced this pull request Feb 11, 2020
In elastic/beats#15940 datasources, inputs and streams are introduced into the agent config. To make it possible to configure these in the UI and through the API, some changes to the manifest definitions of a package and datasets are needed.

**Package manifest**

Each package must specify the datasources it supports with the supported inputs inside. So far all the packages only support one datasource but I want to keep the door open for this to potentially change in the future. It also makes it possible to have the manifest config of a datasource be identical to the config which ends up in the agent config.

The package manifest datasource definition looks as follows (nginx example):

```
datasources:
  -
    # Do we need a name for the data source?
    name: nginx

    # List of inputs this datasource supports
    inputs:
      -
        # An id can be given, in case the type used here is not unique
        # This is for selection in the stream
        # id: nginx
        type: metrics/nginx

        # Common configuration options for this input
        vars:
          - name: hosts
            description: Nginx hosts
            default:
              ["http://127.0.0.1"]
            # All the config options that are required should be shown in the UI
            required: true
          - name: period
            description: "Collection period. Valid values: 10s, 5m, 2h"
            default: "10s"
          - name: username
            type: text
          - name: password
            # This is the html input type?
            type: password

      -
        type: logs

        # Common configuration options for this input
        vars:

      -
        type: syslog

        # Common configuration options for this input
        vars:

```

Inside the datasource, the supported inputs are specified with the common variables across all streams which use a certain input. In the UI I expect that we show the `required` configs by default and all the others are under "Advanced" or similar.

**Dataset manifest**

With the datasources and inputs defined on the package level, each dataset can specify which inputs it supports. Most datasets will only support one input for now. For the nginx metrics this looks as follows:

```
inputs:
  - type: "metric/nginx"

    # Only the variables have to be repeated that are not specified as part of the input
    vars:
      # All variables are specified in the input already
```

As an example with supporting multiple inputs, we have the nginx error logs:

```
inputs:
  - type: log
    vars:
      - name: paths
        required: true
        default:
          - /var/log/nginx/error.log*
        os.darwin:
          - /usr/local/var/log/nginx/error.log*
        os.windows:
          - c:/programdata/nginx/logs/error.log*

  - type: syslog
    vars:
      # Are udp and tcp syslog input two different inputs?
      - name: protocol.udp.host
        required: true
        default:
          - "localhost:9000"
```

The log and syslog inputs are supported (not the case today, just an example). On the dataset level, all additional variables for this dataset are also specified. The ones already specified on the input level in the package don't have to be specified again.

**Stream definition**

Now that the dataset has its supported inputs and variables defined, the stream can be defined. The stream defines which input it uses from the dataset and its configuration variables. Here is an example for nginx metrics:

```
input: metrics/nginx
metricsets: ["stubstatus"]
period: {{period}}
enabled: true

hosts: {{hosts}}

{{#if username}}
username: "{{username}}"
{{/if}}
{{#if password}}
password: "{{password}}"
{{/if}}
```

During creation time of the stream config the variables from the datasource inputs and local variables from the dataset are filled in.

A stream definition could also support multiple inputs as seen in the following example:

```

{{#if input == log}}
input: log

{{#each paths}}
paths: "{{this}}"
{{/each}}
exclude_files: [".gz$"]

processors:
  - add_locale: ~
{{/if}}

{{#if input == syslog}}
input: syslog

{{/if}}
```

**Further changes**

* Rename `agent/input` to `agent/stream` as a stream is configured there.
@exekias (Contributor) commented Feb 11, 2020

@exekias I see at the end of the file we separate out the event metricset: https://github.com/elastic/beats/blob/master/metricbeat/module/kubernetes/_meta/config.yml#L46 I assume this is because it does not have a collection period. Should this go under the node metrics input or is this an additional input?

This is not node specific, but cluster-wide. I would say it should be a new input.

@ph ph force-pushed the agent/configuration-example branch from ba0e747 to ac2c364 on February 11, 2020 14:32
@ph (Contributor Author) commented Feb 11, 2020

@ruflin I think I've addressed all the concerns above?

@ruflin (Member) left a comment:

@ph I'm good with getting it in as is and having follow-up conversations in smaller PRs.

name: epm/aws
version: 1.7.0
inputs:
# Looking at the AWS modules, I believe each fileset needs to be in its own
Member:

Is the queue url defining where the events are fetched from in the end? If yes, shouldn't this be on the stream level instead?

@ph (Contributor Author) commented Feb 11, 2020

@ruflin Good, I've made the change for the AWS inputs vs streams and also added more credentials for aws/metrics.

@ph (Contributor Author) commented Feb 11, 2020

@ruflin I am merging this; let's iterate on top of it.

@ph ph merged commit 93a6f77 into elastic:fleet Feb 11, 2020
ph added a commit to ph/beats that referenced this pull request Feb 11, 2020
- Added period for the system/metrics
- Remove a few slashes for dataset.
- Move queue_url to the input level following @kaiyan-sheng's [advice](elastic#15940 (comment)).
@ph ph mentioned this pull request Feb 11, 2020
ph added a commit that referenced this pull request Feb 12, 2020
* Agent tweaks

- Added period for the system/metrics
- Remove a few slashes for dataset.
- Move queue_url to the input level following @kaiyan-sheng's [advice](#15940 (comment)).

* few issues with aws

* Define queue on stream.
ruflin added a commit to elastic/package-registry that referenced this pull request Feb 18, 2020
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
* Agent Configuration examples

leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023