
Agent Configuration examples #15940

Merged: 13 commits merged into elastic:fleet on Feb 11, 2020

Conversation

@ph (Contributor) commented Jan 29, 2020

This is a work-in-progress configuration (a condensed sketch follows the list):

  • Show how datasources work
  • Show inputs and streams
  • Show possible usage of multiple outputs
  • Show the definition of endpoint
  • Show a custom source and a source generated from a package.
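
A condensed, illustrative sketch of the shape being proposed (field names and values echo the examples discussed in the review below; none of this is final):

```yaml
outputs:
  default:
    type: elasticsearch
    hosts: ["http://localhost:9200"]
  long_term_storage:
    type: elasticsearch
    api_key: VuaCfGcBCdbkQm-e5aOx:ui2lp2axTNmsyakw9tvNnw

datasources:
  # A datasource generated from a package
  - id: nginx-x1
    title: "This is a nice title for humans"
    namespace: prod
    use_output: default
    package:
      name: epm/nginx
      version: 1.7.0
    inputs:
      - type: logs
        streams:
          - dataset: nginx.access
            paths: /var/log/nginx/access.log
      - type: metrics/nginx
        hosts: ["http://127.0.0.1"]
        streams:
          - metricset: stubstatus
            dataset: nginx.stubstatus   # dataset name illustrative
```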

@elasticmachine (Collaborator):

Pinging @elastic/ingest (Project:fleet)

# Is this something handled by fleet where we have an association between the ID and the creator
# of the datasource.
managed_by: fleet
# Package this config group is coming from. On importing, we know where it belongs
Contributor Author:

@ruflin the link between the datasource and the "managing" app was mentioned by @jen-huang. I think we wanted to keep the link hidden using the ids, but maybe making it explicit in the configuration is a good idea.

Contributor Author:

@jen-huang I had a chat with @ruflin and we do not need to have a managed_by defined in the datasource; that information should actually be in the package. This makes sense if, let's say, we start making custom UIs for specific datasources, maybe a better UI for AWS services.


@ph So datasources created by Uptime, APM, Endpoint, etc will all specify a package? What if they don't use a package?

Member:

Yes. They will have to have a package even if it is only to define this linking. Otherwise we could hardcode it into our UI as these links should not change too often.

@ph ph self-assigned this Jan 29, 2020
@ph (Contributor Author) commented Jan 29, 2020

cc @michalpristas and @jen-huang since you are impacted by this; still consider this a work in progress.

- Removed `managed_by`; this should be a concern of the package.
- Changed the two endpoint suggestions
detect: true
prevent: false
notify_user: false
threshold: recommended


I know this is an example, but some of these could change. For instance, we've talked about dropping threshold; is this still the case @ferullo @stevewritescode?

Member:

From an agent/fleet perspective, everything under input except type is something we don't know about and just forward. Any changes to this will not affect the agent itself.


#################################################################################################
### suggestion 1
- id: myendpoint-x1
@kevinlog commented Jan 29, 2020:

at first glance, I prefer suggestion 1 as I'm not sure we would maintain different configurations for different platforms. We'll make the entire thing atomic, so we'll send down some extra config and the Endpoint will know what to read.

thoughts @ferullo ?

EDIT: I think I misunderstood at first, it looks like they're just different layouts and all platforms are contained in the same config, sorry for the confusion.

title: Endpoint configuration
package:
name: endpoint
version: xxx


We've discussed adding versions into the configuration that need to match up with Endpoint; otherwise it will reject the configuration. Is that what this version implies? Will we be able to manage individual versions for each package (i.e. endpoint, metricbeat, etc.)?

A similar conversation on versions here:
https://github.com/elastic/endpoint-app-team/issues/129#issuecomment-579130066

Member:

This version here is to know from which package and package version a config was created. This should help when we try to upgrade configs.

The part about limiting it to a specific agent version can be found here: https://github.com/elastic/beats/pull/15940/files#diff-043e80bbcc20fded11325ef31433396bR35


#################################################################################################
### suggestion 2
- id: myendpoint-1


do the individual ids in here imply that we'll edit individual sections of the Agent configuration? Is there an overall Agent configuration ID that we'll use with the Ingest APIs to edit the entire configuration?

I'm concerned about editing the configuration in pieces; it's preferable to send the entire configuration down to the Agent/Endpoint at once.

Member:

There will be an overall unique ID for a configuration in the Fleet API. These "local" ids can be generated and become useful especially for error reporting, so we can tell the user which part of the config did not work.

It is still the plan to send the full agent config down to the agent at once.

@kevinlog

FYI @elastic/endpoint-management @scunningham

namespace?: prod
constraints:
# Constraints format is not final
- os.platform: { in: "windows" }
Contributor:

this should be ECS compliant?

Member:

YES!

@ph (Contributor Author) Jan 30, 2020:

Yes, it should. I should say that the constraints format is not final, but let's make sure the definition works both in the Agent and in the UI. It would be good to leverage that information to display more useful data to the user when they visualize an agent or a configuration.


datasources:
# use the nginx package
- id: nginx-x1
Member:

@ph Let's keep an eye on the id and, as soon as we start using it in the code somewhere, have a discussion on how we use it.

Contributor Author:

Okay to not make it mandatory for now.

settings.monitoring:
use_output: monitoring

datasources:
Contributor:

should we just call this "sources"?

Member:

I would stick with datasources. Sources is more generic and would be correct too, but we already have source in ECS. Also, I've heard the datasource term used by people less familiar with the project to describe things like the "nginx datasource", so I think it fits well.

datasources:
# use the nginx package
- id: nginx-x1
title: "This is a nice title for human"
Contributor:

"description" rather than "title"?

@ph (Contributor Author) Jan 30, 2020:

I was leaning toward "title" because I would expect a description to span multiple lines where a title does not, but I do not have a strong feeling about changing it. Any opinion from the others?

- metricset: fsstat
dataset: system.fsstat
- metricset: foo
dataset: system.foo
Contributor:

this, unfortunately, needs to be simpler.
Could we set a default "dataset" in the input so users don't have to specify it? And then let them specify metricsets as an array?

- id: system  # make optional if possible
  inputs:
    - type: metrics/system
      streams:
       - metricset: ["cpu", "memory", "diskio", "load", "process", "uptime", "filesystem"]
          # default dataset, period etc.

or a short form that gets converted automatically:

- inputs:
    - type: metrics/system
      metricsets: ["cpu", "memory", "diskio", "load", "process", "uptime", "filesystem"]

Contributor Author:

First, I think we are trying to optimize for the UI experience vs the YAML editing experience.

If you look at the metricbeat module definitions in beats, they do have settings; they are not just an opt-in thing. I've used the system module because it's one of the bigger ones, and if you look at the metricsets they do have settings. I do not find it obvious how to change these options using the key/name approach.

But @ruflin, do you have a strong opinion? I think there was also a reason concerning dataset for using the list.


  cpu.metrics:  ["percentages"]  # The other available options are normalized_percentages and ticks.
  core.metrics: ["percentages"]  # The other available option is ticks.
  #filesystem.ignore_types: []
  #process.include_top_n:
    #enabled: true
    #by_cpu: 0
    #by_memory: 0
  #process.cmdline.cache.enabled: true
  #process.cgroups.enabled: true
  #process.env.whitelist: []
  #process.include_cpu_ticks: false
  #raid.mount_point: '/'

  #socket.reverse_lookup.enabled: false
  #socket.reverse_lookup.success_ttl: 60s
  #socket.reverse_lookup.failure_ttl: 60s
  #diskio.include_devices: []

Contributor Author:

@roncohen What do you think about the metrics module options as defined above? Where do they fit in the shortened version?

Member:

My current thinking:

  • There is the long form which always works (as proposed)
  • There is a short form with one of the proposals above, which needs a bit of magic on the agent side to make it work with metricbeat.

The magical part can always be added later so I don't see it as a blocker. Talking about magic: I would also like to see the following working:

- id: system  # make optional if possible
  inputs:
    - type: metrics/system

It is like enabling the system module today, it just picks the default metricsets which are defined in Metricbeat.
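
To make the "magic" concrete, here is one possible expansion the agent could perform on the short form, turning it into the long form that maps onto Metricbeat's metricset/dataset pairs. This is purely illustrative, not a committed format:

```yaml
# Short form, as a user might write it:
- id: system
  inputs:
    - type: metrics/system
      metricsets: ["cpu", "memory"]

# Possible long form after the agent-side rewrite step:
- id: system
  inputs:
    - type: metrics/system
      streams:
        - metricset: cpu
          dataset: system.cpu
        - metricset: memory
          dataset: system.memory
```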

Contributor:

OK, we can add more defaults and a rewrite step later 👍

name: epm/endpoint # This establishes the link with the package and will allow linking it to the endpoint app.
version: xxx
inputs:
- type: endpoint # Reserved keyword
Contributor:

@kevinlog do you think it makes sense to have a separate "protections" section or similar? I imagine that there are a number of config settings that aren't really "inputs" or "streams". From my perspective, we don't need to over-generalize this Agent config in the sense that everything must fit as a stream.

There's no reason why we need to make it all generic and the Agent oblivious to endgame - if that makes sense.

Contributor Author:

Maybe input is stretching the concept for the endpoint, but it still fits our definition: it generates data. Our requirements and reserved keywords are minimal; everything else at the input level would be free form and passed directly to the endpoint. So they could have protections or anything they want at that level.

- id: myendpoint-1
  title: Endpoint configuration
  package:
    name: epm/endpoint # This establishes the link with the package and will allow linking it to the endpoint app.
    version: xxx
  inputs:
    - type: endpoint # Reserved keyword

@ferullo left a comment:

Thanks for an example!

threshold: recommended
platform: mac

- type: eventing

Is what's below supposed to still be under streams, or is what's above not supposed to be under streams? Half looks indented one space more than the other half.

Contributor Author:

Yes, it's supposed to be; my vim-fu failed on the YAML formatting. Will fix it in the next round of changes.

platform: linux

#################################################################################################
### suggestion 2

Suggestion 2 seems cleaner to me for Endpoint. I like nesting the entire config for an OS under an OS section rather than intermingling them. This will make it easier for Endpoint to read the config for its OS and also feels more in line with the fact that there are many features that are not available on all OSes.

Contributor Author:

I am OK with that.

@@ -0,0 +1,209 @@
outputs:

What do default, long_term_storage and monitoring mean? Am I correct in assuming that, when writing data to Elasticsearch, each datatype Beats/Agent/Endpoint writes will be hardcoded to go to a particular destination? Is there room for more outputs if Endpoint ends up needing another Endpoint-specific output? Is there any way to control what index is written to for an output?

Contributor Author:

I've discussed this with @ferullo on Zoom, but for clarity I will still add part of the discussion here. The default, long_term_storage and monitoring entries are destinations; in the initial version we would only support default and long_term_storage. When you define a datasource you can use the use_output option to define to which named output to send the data.

So yes, if the endpoint needs a special output destination it could be added.

Concerning the index: the index is not something that a user would be able to configure directly. We do this to hide the complexity inherent in managing indices (templates, mappings, ILM policies and dashboards); instead the agent uses the type of the input, the dataset and the defined namespace to generate the appropriate index destination. The only user-editable part is the namespace, which is used as a grouping mechanism, like "production" data.

I presume that endpoint should try to align with that indexing strategy and allow the user to set up the namespace; maybe it's something to discuss with @ruflin.
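
As a concrete illustration of that routing and index derivation (the exact naming pattern was not fixed in this PR, so the derived index shown is only a sketch of the idea):

```yaml
datasources:
  - id: nginx-x1
    namespace: prod                # only user-editable piece of the index destination
    use_output: long_term_storage  # send this datasource's data to the named output
    inputs:
      - type: logs
        streams:
          - dataset: nginx.access
            # index derived from input type + dataset + namespace,
            # e.g. roughly logs-nginx.access-prod (illustrative only)
```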

name: endpoint
version: xxx
inputs:
- type: endpoint # Reserved keyword
Contributor Author:

Add a namespace for endpoint and sync with @kevinlog

package:
name: epm/nginx
version: 1.7.0
namespace?: prod
Contributor:

namespace is moved from stream to datasource; do you think it would be useful to have a stream-level override of the namespace specified here?

@ph (Contributor Author) Jan 30, 2020:

We try as much as possible to not support overrides; the way to do it would be to duplicate the datasource and set another namespace.

Contributor Author:

Also, streams cannot override input settings.

Member:

+1 on keeping it at the top level. I think having a namespace per stream is not a common case. If a user needs it, they can specify the datasource twice.

@urso commented Jan 30, 2020

Although we should restrict the usage of global processors, I wonder how processors like add_x_metadata fit in. These are often global, as they enhance events based on the environment the Beat runs in. The add_host_metadata and add_cloud_metadata processors query the host system at init time only. The add_docker_metadata and similar processors enhance events based on event contents.

Do we actually want users to add any of the add_x_metadata processors to their configs? What if a datasource configured by a package is to be executed in different environment types (pure host and docker containers)? Maybe the agent would add these environmental processors, or even replace them with constant add_fields processors based on information the agent queried? In this case maybe the functionality could be 'hidden' behind an add_environment: <bool> setting, expecting the agent to do the right thing?

@mostlyjason

Another setting to consider is adding the agent version for this configuration. This will allow future agent versions to make schema changes, and for us to provide an error if an agent is incompatible with the configuration version.

@ph (Contributor Author) commented Jan 30, 2020

Another setting to consider is adding the agent version for this configuration. This will allow future agent versions to make schema changes, and for us to provide an error if an agent is incompatible with the configuration version.

I think having a schema version (where the lack of a version means version 1) is a good idea. It also depends on how we handle migration: if we assume that the configuration is kept in Fleet, we can also assume that the following flow is true: update agent -> migrate configuration -> deploy configuration to agent. This reduces the need for the agent to understand multiple versions of the schema.
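
As a sketch of what such a schema version could look like at the top of the agent configuration (hypothetical; the key name and placement were not decided in this PR):

```yaml
# Hypothetical top-level schema version; absence would imply version 1.
version: 2

outputs:
  default:
    type: elasticsearch
    hosts: ["http://localhost:9200"]
```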

@ph (Contributor Author) commented Jan 30, 2020

@urso

Do we actually want users to add any of the add_x_metadata processors to their configs? What if a datasource configured by a package is to be executed in different environment types (pure host and docker containers)? Maybe the agent would add these environmental processors, or even replace them with constant add_fields processors based on information the agent queried? In this case maybe the functionality could be 'hidden' behind an add_environment: setting, expecting the agent to do the right thing?

When processors are global or auto-added, you are always at risk of a conflict of fields (remember the hostname case, or source?). So I think the solution should be behind the package definition, where we could either add the processors to the required processors of a stream or even have it at the datasource level.

add_environment: <bool> is a good suggestion if, let's say, collecting the information depends on the running environment (like a k8s stream of events?).

@ruflin it seems like our talk concerning the processors should allow them to be defined at the datasource, maybe with an insert scope like processors.after or processors.before to control how the merging logic is applied.

Also, is there anything preventing us from merging all the environment-based processors into a single one?
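
A rough sketch of how those two ideas could surface in a datasource. Everything here is hypothetical; add_environment, processors.after and processors.before are only the names floated in this thread:

```yaml
datasources:
  - id: nginx-x1
    # Hypothetical: ask the agent to inject the right environment metadata
    # (add_host_metadata, add_cloud_metadata, ... or constant add_fields)
    # based on where it actually runs.
    add_environment: true
    # Hypothetical insert scope: processors contributed here are appended
    # after (or before) the stream-level processors during merging.
    processors.after:
      - add_locale: ~
    inputs:
      - type: logs
        streams:
          - dataset: nginx.access
            paths: /var/log/nginx/access.log
```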

type: disk

long_term_storage:
api_key: VuaCfGcBCdbkQm-e5aOx:ui2lp2axTNmsyakw9tvNnw
Contributor:

are we assuming everything is Elasticsearch, or do we still support type?

@ph (Contributor Author) Feb 6, 2020:

Let's keep the type support that we have; we are so used to targeting Elasticsearch that I think this fell through the cracks. So type: elasticsearch.

Contributor Author:

I just added the type to this PR.

- id?: {id}
enabled?: true # default to true
dataset: nginx.access
paths: /var/log/nginx/access.log
Contributor:

we say paths but there is only one. Is this an array, or should this be a path?

Contributor Author:

Well, no, it should still be plural. Since we are using go-ucfg, a single string "/var/log" and a slice of strings are equivalent if you use the following type to unpack. This is true for a lot of things in beats.

type C struct {
	Paths []string `config:"paths"`
}

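A minimal, self-contained sketch of the behaviour described (assuming the field tag is `paths`; the exact go-ucfg options used in beats may differ):

```go
package main

import (
	"fmt"

	"github.com/elastic/go-ucfg"
)

type C struct {
	Paths []string `config:"paths"`
}

func main() {
	// A single string and a list of strings unpack into the same []string field.
	single, _ := ucfg.NewFrom(map[string]interface{}{
		"paths": "/var/log/nginx/access.log",
	})
	list, _ := ucfg.NewFrom(map[string]interface{}{
		"paths": []string{"/var/log/nginx/access.log", "/var/log/nginx/error.log"},
	})

	var a, b C
	_ = single.Unpack(&a)
	_ = list.Unpack(&b)
	fmt.Println(a.Paths) // [/var/log/nginx/access.log]
	fmt.Println(b.Paths) // [/var/log/nginx/access.log /var/log/nginx/error.log]
}
```
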
Contributor:

In ucfg yes, but in the tests we're producing a map[string]interface{} tree, so I was wondering what the intention is here.

Contributor Author:

This is correct, but they will be reparsed afterwards, so that should not be a problem. Or am I missing something?

Contributor:

No, no, you're not. I just wanted to have these trees in a form which reflects our intention, so I'm double-checking whether this is supposed to be multiple paths or a single value as with metricset.

- Elasticsearch (logs and metrics)
- AWS (logs and metrics)
- Kubernetes (metrics)
- Docker (metrics)
@ph (Contributor Author) commented Feb 10, 2020

@ruflin I have added examples for Elasticsearch/Kubernetes/Docker and AWS.

Looking at Kubernetes, I wonder if it should be two different datasources.

@ruflin (Member) left a comment:

About Kubernetes: When @exekias created the K8s module we had many discussions about whether it should be one or two modules. We ended up with one module because of the config, if I remember correctly. It was always a bit odd as it was one module with two shared configs. But now with datasources and multiple inputs I think it makes much more sense:

  • 1 datasource: Kubernetes
  • 3 inputs: logs, node metrics, state metrics

I would keep it the way you specified it with 3 inputs (a rough sketch follows below). I expect the user to configure it in one go.

@exekias I see at the end of the file we separate out the event metricset: https://github.com/elastic/beats/blob/master/metricbeat/module/kubernetes/_meta/config.yml#L46 I assume this is because it does not have a collection period. Should this go under the node metrics input or is this an additional input?
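
A rough sketch of what that single Kubernetes datasource with three inputs could look like (hosts, dataset names and the exact type strings are illustrative only):

```yaml
datasources:
  - id: kubernetes-x1
    title: Kubernetes
    package:
      name: epm/kubernetes
      version: xxx
    inputs:
      - type: logs
        streams:
          - dataset: kubernetes.container_logs
            paths: /var/log/containers/*.log
      - type: metrics/kubernetes        # node metrics from the local kubelet
        hosts: ["localhost:10255"]
        streams:
          - metricset: node
            dataset: kubernetes.node
      - type: metrics/kubernetes-state  # cluster-state metrics via kube-state-metrics
        hosts: ["kube-state-metrics:8080"]
        streams:
          - metricset: state_pod
            dataset: kubernetes.state_pod
```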

dataset: elasticsearch.slowlog
paths: [/var/log/elasticsearch/*_index_search_slowlog.log, /var/log/elasticsearch/*_index_indexing_slowlog.log]
- type: metrics/elasticsearch
hosts: ["http://localhost:9200"]
Member:

Can you add a few more common configs here, like username, password, period, ssl.certificate_authorities, to make it more obvious that the common configs go here? https://github.com/elastic/beats/blob/master/metricbeat/module/elasticsearch/_meta/config.reference.yml#L12

Contributor Author:

@ruflin I thought period was at the stream level and not on the common part of the input?

Member:

It is, skip it for now.

We will probably need to come back to the question: are there configs that can be specified on both levels, like period as a default with a local period that overwrites it? Let's have this discussion later.

name: epm/aws
version: 1.7.0
inputs:
# Looking at the AWS modules, I believe each fileset needs to be in its own
Member:

If possible, I think we should make this 1 input with multiple streams. Why do you think it should be multiple inputs?

Contributor Author:

@ruflin From our previous discussion, inputs would contain the hosts/ssl configuration. In that context, looking at Filebeat's AWS module, the events are read from files inside an S3 bucket.

To fix that problem, either:

  1. We do what I've done, where multiple inputs are defined and each input has access to a different S3 bucket.
  2. Or we allow the S3 input to ignore keys, or only consider keys that match a certain pattern. (This would require a bit of change in the s3 input; looking at the configuration it's not supported.)

We do have to think about the use case: what is more common? Using a single bucket for all the logs from AWS services, or different buckets? I think because of policy/permission concerns the different-bucket route might be more popular. A sketch of option 1 follows.

I am no expert in AWS; @kaiyan-sheng might know more about the usage here.
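
A sketch of option 1, where each input owns its own SQS queue (and therefore bucket), so one input only ever sees one kind of log (queue URLs, the type string and dataset names are illustrative):

```yaml
datasources:
  - id: aws-x1
    package:
      name: epm/aws
      version: 1.7.0
    inputs:
      - type: logs/s3
        queue_url: https://sqs.us-east-1.amazonaws.com/123456789/elb-logs-queue      # illustrative
        streams:
          - dataset: aws.elb
      - type: logs/s3
        queue_url: https://sqs.us-east-1.amazonaws.com/123456789/vpcflow-logs-queue  # illustrative
        streams:
          - dataset: aws.vpcflow
```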

Member:

Is the queue url defining where the events are fetched from in the end? If yes, shouldn't this be on the stream level instead?

Contributor:

For AWS logs, I think the better setup is to have different S3 buckets for different logs or different services. For example, one bucket for the s3access log and one bucket for elb logs. Or one for vpc flow logs from VPC1 and one for vpc flow logs from VPC2. But all these buckets can have the same notification setup with the same SQS queue (if they are in the same region).

The queue url is not where the logs are fetched from; it only points to the S3 bucket where the actual log needs to be read.

Member:

How does the input then know which logs to pick and where to send them?

Contributor:

@ruflin The input then uses the message from the SQS queue to locate where the log is (in which S3 bucket and what the name of the log is) and reads it from S3.

Member:

I got that part. But it means each dataset needs to have a different SQS queue, as otherwise one dataset would pick up the logs from the others too.

Contributor Author:

@kaiyan-sheng Yes, it knows where (the bucket) and which (the file), but there is nothing I can see in the code that says: OK, this file is actually a VPC flow log.

So the way I understand it, the AWS module assumes that all the events coming from a specific SQS queue can only be of the same type. I may be wrong, but I don't see how disambiguation could happen.

Contributor:

Ahh, sorry, you are right! With different kinds of logs, they should be separated into different queues if we want to ingest them at the same time.

name: epm/docker
version: 1.7.0
inputs:
- type: metrics/docker
Member:

Nit: I keep stumbling over the type. I would expect it to be docker/metrics. The reason is the order I think of when collecting: let me get some data from Docker and select only the metrics. I understand that from an agent perspective it is the other way around: select metricbeat and then run docker.

Contributor Author:

I don't have a strong feeling about metrics before or after docker. I think your reasoning makes sense and might be more future-proof if we do want to collect other things. The mnemonic might be better.

Contributor Author:

I've made the change on the config.

ruflin added a commit to ruflin/package-registry that referenced this pull request Feb 11, 2020
In elastic/beats#15940 datasources, inputs and streams are introduced into the agent config. To make it possible to configure these in the UI and through the API, some changes to the manifest definitions of a package and datasets are needed.

**Package manifest**

Each package must specify the datasources it supports with the supported inputs inside. So far all the packages only support one datasource but I want to keep the door open for this to potentially change in the future. It also makes it possible to have the manifest config of a datasource be identical to the config which ends up in the agent config.

The package manifest datasource definition looks as follows (nginx example):

```
datasources:
  -
    # Do we need a name for the data source?
    name: nginx

    # List of inputs this datasource supports
    inputs:
      -
        # An id can be given, in case the type used here is not unique
        # This is for selection in the stream
        # id: nginx
        type: metrics/nginx

        # Common configuration options for this input
        vars:
          - name: hosts
            description: Nginx hosts
            default:
              ["http://127.0.0.1"]
            # All the config options that are required should be shown in the UI
            required: true
          - name: period
            description: "Collection period. Valid values: 10s, 5m, 2h"
            default: "10s"
          - name: username
            type: text
          - name: password
            # This is the html input type?
            type: password

      -
        type: logs

        # Common configuration options for this input
        vars:

      -
        type: syslog

        # Common configuration options for this input
        vars:

```

Inside the datasource, the supported inputs are specified with the common variables across all streams which use a certain input. In the UI I expect that we show the `required` configs by default and all the others are under "Advanced" or similar.

**Dataset manifest**

With the datasources and inputs defined on the package level, each dataset can specify which inputs it supports. Most datasets will only support one input for now. For the nginx metrics this looks as follows:

```
inputs:
  - type: "metric/nginx"

    # Only the variables have to be repeated that are not specified as part of the input
    vars:
      # All variables are specified in the input already
```

As an example with supporting multiple inputs, we have the nginx error logs:

```
inputs:
  - type: log
    vars:
      - name: paths
        required: true
        default:
          - /var/log/nginx/error.log*
        os.darwin:
          - /usr/local/var/log/nginx/error.log*
        os.windows:
          - c:/programdata/nginx/logs/error.log*

  - type: syslog
    vars:
      # Are udp and tcp syslog input two different inputs?
      - name: protocol.udp.host
        required: true
        default:
          - "localhost:9000"
```

The log and syslog inputs are supported (not the case today, just an example). On the dataset level, all additional variables for this dataset are also specified. The ones already specified on the input level in the package don't have to be specified again.

**Stream definition**

Now that the dataset has its supported inputs and variables defined, the stream can be defined. The stream defines which input it uses from the dataset and its configuration variables. Here is an example for nginx metrics:

```
input: metrics/nginx
metricsets: ["stubstatus"]
period: {{period}}
enabled: true

hosts: {{hosts}}

{{#if username}}
username: "{{username}}"
{{/if}}
{{#if password}}
password: "{{password}}"
{{/if}}
```

During creation time of the stream config the variables from the datasource inputs and local variables from the dataset are filled in.

A stream definition could also support multiple inputs as seen in the following example:

```

{{#if input == log}}
input: log

{{#each paths}}
paths: "{{this}}"
{{/each}}
exclude_files: [".gz$"]

processors:
  - add_locale: ~
{{/if}}

{{#if input == syslog}}
input: syslog

{{/if}}
```

**Further changes**

* Rename `agent/input` to `agent/stream` as a stream is configured there.
@exekias (Contributor) commented Feb 11, 2020

@exekias I see at the end of the file we separate out the event metricset: https://github.com/elastic/beats/blob/master/metricbeat/module/kubernetes/_meta/config.yml#L46 I assume this is because it does not have a collection period. Should this go under the node metrics input or is this an additional input?

This is not node specific, but cluster-wide. I would say it should be a new input.

@ph ph force-pushed the agent/configuration-example branch from ba0e747 to ac2c364 on February 11, 2020 14:32
@ph (Contributor Author) commented Feb 11, 2020

@ruflin I think I've addressed all the concerns above?

@ruflin (Member) left a comment:

@ph I'm good with getting it in as is and having follow-up conversations in smaller PRs.

name: epm/aws
version: 1.7.0
inputs:
# Looking at the AWS modules, I believe each fileset needs to be in its own
Member:

Is the queue url defining where the events are fetched from in the end? If yes, shouldn't this be on the stream level instead?

@ph (Contributor Author) commented Feb 11, 2020

@ruflin Good, I've made the change for the AWS inputs vs streams and also added more credentials for aws/metrics.

@ph (Contributor Author) commented Feb 11, 2020

@ruflin I am merging this; let's iterate on top of it.

@ph ph merged commit 93a6f77 into elastic:fleet Feb 11, 2020
ph added a commit to ph/beats that referenced this pull request Feb 11, 2020
- Added period for the system/metrics
- Remove a few slashes for dataset.
- Move queue_url to the input level following @kaiyan-sheng's [advice](elastic#15940 (comment)).
@ph ph mentioned this pull request Feb 11, 2020
ph added a commit that referenced this pull request Feb 12, 2020
* Agent tweaks

- Added period for the system/metrics
- Remove a few slashes for dataset.
- Move queue_url to the input level following @kaiyan-sheng's [advice](#15940 (comment)).

* few issues with aws

* Define queue on stream.
ruflin added a commit to elastic/package-registry that referenced this pull request Feb 18, 2020
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
* Agent Configuration examples

leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023