Add Elastic Agent CRD (standalone mode) #4010

Merged
merged 19 commits into from
Dec 10, 2020

david-kow
Contributor

In this PR

Added:

  • Elastic Agent CRD
  • Elastic Agent controller and its association controller
  • required RBAC changes for Helm/E2E tests
  • required PSP for E2E tests
  • E2E tests utilities for Elastic Agent
  • E2E test of Elastic Agent with System integration enabled
  • improved logging utilities

Additional notes

  • CRD is complete (contains all fields foreseen right now), but the controller does not support all of the features yet. These gaps will be tracked separately and are out of scope for this PR.
  • code is heavily based on the Beats code, but it's not shared. My assessment was that making the code reusable was not worth the effort and could leave it fairly convoluted. Also, Elastic Agent will most likely evolve differently than Beats, so keeping the code separate seems desirable.

Try it out yourself!

  • an example manifest that works OOTB on non-PSP cluster:
Click to see Elasticsearch, Kibana, Elastic Agent manifest
cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
spec:
  version: 7.10.0
  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
spec:
  version: 7.10.0
  count: 1
  elasticsearchRef:
    name: elasticsearch
---
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent
spec:
  version: 7.10.0
  elasticsearchRefs:
  - name: elasticsearch
  daemonSet:
    podTemplate:
      spec:
        securityContext:
          runAsUser: 0
  config:
    id: 2d70a6f0-33a5-11eb-bb2f-418d0388a8cf
    revision: 2
    agent:
      monitoring:
        enabled: true
        use_output: default
        logs: true
        metrics: true
    inputs:
      - id: 2e187fb0-33a5-11eb-bb2f-418d0388a8cf
        name: system-1
        revision: 1
        type: logfile
        use_output: default
        meta:
          package:
            name: system
            version: 0.9.1
        data_stream:
          namespace: default
        streams:
          - id: logfile-system.auth
            data_stream:
              dataset: system.auth
              type: logs
            paths:
              - /var/log/auth.log*
              - /var/log/secure*
            exclude_files:
              - .gz$
            multiline:
              pattern: ^\s
              match: after
            processors:
              - add_locale: null
              - add_fields:
                  target: ''
                  fields:
                    ecs.version: 1.5.0
          - id: logfile-system.syslog
            data_stream:
              dataset: system.syslog
              type: logs
            paths:
              - /var/log/messages*
              - /var/log/syslog*
            exclude_files:
              - .gz$
            multiline:
              pattern: ^\s
              match: after
            processors:
              - add_locale: null
              - add_fields:
                  target: ''
                  fields:
                    ecs.version: 1.5.0
      - id: 2e187fb0-33a5-11eb-bb2f-418d0388a8cf
        name: system-1
        revision: 1
        type: system/metrics
        use_output: default
        meta:
          package:
            name: system
            version: 0.9.1
        data_stream:
          namespace: default
        streams:
          - id: system/metrics-system.cpu
            data_stream:
              dataset: system.cpu
              type: metrics
            metricsets:
              - cpu
            cpu.metrics:
              - percentages
              - normalized_percentages
            period: 10s
          - id: system/metrics-system.diskio
            data_stream:
              dataset: system.diskio
              type: metrics
            metricsets:
              - diskio
            diskio.include_devices: null
            period: 10s
          - id: system/metrics-system.filesystem
            data_stream:
              dataset: system.filesystem
              type: metrics
            metricsets:
              - filesystem
            period: 1m
            processors:
              - drop_event.when.regexp:
                  system.filesystem.mount_point: ^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/)
          - id: system/metrics-system.fsstat
            data_stream:
              dataset: system.fsstat
              type: metrics
            metricsets:
              - fsstat
            period: 1m
            processors:
              - drop_event.when.regexp:
                  system.fsstat.mount_point: ^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/)
          - id: system/metrics-system.load
            data_stream:
              dataset: system.load
              type: metrics
            metricsets:
              - load
            period: 10s
          - id: system/metrics-system.memory
            data_stream:
              dataset: system.memory
              type: metrics
            metricsets:
              - memory
            period: 10s
          - id: system/metrics-system.network
            data_stream:
              dataset: system.network
              type: metrics
            metricsets:
              - network
            period: 10s
            network.interfaces: null
          - id: system/metrics-system.process
            data_stream:
              dataset: system.process
              type: metrics
            metricsets:
              - process
            period: 10s
            process.include_top_n.by_cpu: 5
            process.include_top_n.by_memory: 5
            process.cmdline.cache.enabled: true
            process.cgroups.enabled: false
            process.include_cpu_ticks: false
            processes:
              - .*
          - id: system/metrics-system.process_summary
            data_stream:
              dataset: system.process_summary
              type: metrics
            metricsets:
              - process_summary
            period: 10s
          - id: system/metrics-system.socket_summary
            data_stream:
              dataset: system.socket_summary
              type: metrics
            metricsets:
              - socket_summary
            period: 10s
          - id: system/metrics-system.uptime
            data_stream:
              dataset: system.uptime
              type: metrics
            metricsets:
              - uptime
            period: 10s
EOF
  • kubectl tree output for the above manifest:
$ kubectl tree agents elastic-agent
NAMESPACE  NAME                                                   READY  REASON  AGE
default    Agent/elastic-agent                                    -              59s
default    ├─DaemonSet/elastic-agent-agent                        -              55s
default    │ ├─ControllerRevision/elastic-agent-agent-7b97cddc98  -              55s
default    │ ├─Pod/elastic-agent-agent-dcvsb                      True           55s
default    │ ├─Pod/elastic-agent-agent-lmwkf                      True           55s
default    │ └─Pod/elastic-agent-agent-pwxhf                      True           55s
default    ├─Secret/elastic-agent-agent-config                    -              55s
default    ├─Secret/elastic-agent-agent-es-ca                     -              57s
default    └─Secret/elastic-agent-agent-user                      -              59s
  • data streams visible in Fleet UI:

Screenshot 2020-12-04 at 10 07 34

  • to create a manifest on your own:
    • create the desired policy in Fleet UI
    • in Policies tab go to Actions (three dots) > Add agent > Run standalone
    • remove outputs and use the config as the value of config in Agent spec

Screenshot 2020-12-04 at 11 24 57
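
The "remove outputs and use the config" step above can also be scripted. What follows is only a minimal sketch, not part of this PR: it assumes the standalone policy has already been parsed into a map, and `agentConfigFromPolicy` and `agentManifest` are hypothetical helper names. It prints the Agent resource as JSON, which kubectl accepts just like YAML.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// agentConfigFromPolicy returns a copy of a parsed standalone policy with the
// top-level "outputs" key dropped. Only the top level is copied; nested
// values are shared with the input. Hypothetical helper, not ECK code.
func agentConfigFromPolicy(policy map[string]interface{}) map[string]interface{} {
	config := make(map[string]interface{}, len(policy))
	for key, value := range policy {
		if key == "outputs" {
			continue
		}
		config[key] = value
	}
	return config
}

// agentManifest wraps config in an Agent resource shaped like the example
// manifest in this description (same apiVersion, kind and daemonSet settings).
func agentManifest(name, version, esName string, config map[string]interface{}) map[string]interface{} {
	return map[string]interface{}{
		"apiVersion": "agent.k8s.elastic.co/v1alpha1",
		"kind":       "Agent",
		"metadata":   map[string]interface{}{"name": name},
		"spec": map[string]interface{}{
			"version": version,
			"elasticsearchRefs": []interface{}{
				map[string]interface{}{"name": esName},
			},
			"daemonSet": map[string]interface{}{
				"podTemplate": map[string]interface{}{
					"spec": map[string]interface{}{
						"securityContext": map[string]interface{}{"runAsUser": 0},
					},
				},
			},
			"config": config,
		},
	}
}

func main() {
	// Trimmed-down stand-in for a policy copied from Fleet UI (assumed shape).
	policy := map[string]interface{}{
		"id":       "2d70a6f0-33a5-11eb-bb2f-418d0388a8cf",
		"revision": 2,
		"outputs": map[string]interface{}{
			"default": map[string]interface{}{"type": "elasticsearch"},
		},
		"inputs": []interface{}{},
	}

	manifest := agentManifest("elastic-agent", "7.10.0", "elasticsearch", agentConfigFromPolicy(policy))
	out, err := json.MarshalIndent(manifest, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The printed JSON can be piped to `kubectl apply -f -` in the same way as the heredoc manifest above.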

Testing

  • E2E test utilities for Elastic Agent were added
  • a basic example E2E test was added (passing on a PSP-enabled cluster when manually tested)
  • UT coverage will be added gradually

Logging and APM

As per previous discussions, our next CRD controller should pass tracing context around to allow log correlation. It turned out there is not much logging in the Agent controller, but the facilities were added (based on @charith-elastic's work) and can be reused in the future. The logic I followed is that every function that returns reconciler.Results has a matching APM span and captures errors before bubbling them up. This way errors visible in APM provide more context. Compared to other controllers, the changes in the Agent controller are:

  • transaction metadata is passed in context
  • logger with transaction metadata included is passed in context:

Screenshot 2020-12-04 at 08 03 14

  • iteration is now a label on the transaction, which makes it possible to see a particular iteration's execution in the APM UI with stacks, logs and errors attached:

Screenshot 2020-12-03 at 16 00 28

  • more spans in the code:

Screenshot 2020-12-03 at 21 10 26
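
To illustrate the "one span per function, capture the error before bubbling it up" rule described above, here is a rough, stdlib-only sketch. It does not use the real APM agent or the tracing helpers from this PR: `transaction`, `span`, `startSpan` and `captureError` are hypothetical stand-ins, and plain `error` replaces reconciler.Results to keep the example self-contained. Only the function names `doReconcile` and `reconcilePodVehicle` are borrowed from the controller.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// transaction is a hypothetical stand-in for an APM transaction; it only
// remembers which span captured which error.
type transaction struct {
	errors []string
}

// span is a hypothetical stand-in for an APM span.
type span struct {
	tx   *transaction
	path string // chain of span names from the outermost span down to this one
}

type txKey struct{}
type spanKey struct{}

// newContext returns a context carrying the transaction.
func newContext(tx *transaction) context.Context {
	return context.WithValue(context.Background(), txKey{}, tx)
}

// startSpan creates a child of the span found in ctx (if any) and returns it
// together with a context that carries the new span.
func startSpan(ctx context.Context, name string) (*span, context.Context) {
	tx, _ := ctx.Value(txKey{}).(*transaction)
	path := name
	if parent, ok := ctx.Value(spanKey{}).(*span); ok {
		path = parent.path + " > " + name
	}
	s := &span{tx: tx, path: path}
	return s, context.WithValue(ctx, spanKey{}, s)
}

// captureError records a non-nil err against the span and returns it
// unchanged, so the caller can keep bubbling it up.
func (s *span) captureError(err error) error {
	if err != nil && s.tx != nil {
		s.tx.errors = append(s.tx.errors, s.path+": "+err.Error())
	}
	return err
}

func reconcilePodVehicle(ctx context.Context, fail bool) error {
	s, _ := startSpan(ctx, "reconcilePodVehicle")
	if fail {
		return s.captureError(errors.New("daemonset update failed"))
	}
	return nil
}

func doReconcile(ctx context.Context, fail bool) error {
	s, ctx := startSpan(ctx, "doReconcile")
	return s.captureError(reconcilePodVehicle(ctx, fail))
}

func main() {
	tx := &transaction{}
	if err := doReconcile(newContext(tx), true); err != nil {
		fmt.Println("reconcile failed:", err)
	}
	for _, captured := range tx.errors {
		fmt.Println("captured:", captured)
	}
}
```

In this sketch the same error is recorded once per level, so each captured entry shows which span it passed through on the way up.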

@david-kow added the >feature, release-highlight and v1.4.0 labels Dec 4, 2020
@david-kow added the elastic-agent label Dec 8, 2020
Collaborator

@pebrc left a comment

I did a very quick first pass. Looks good and works, at least with the default setup you provided. I want to find more time to look more closely. The only thing that stuck out a bit is the setup for the tracing, where I think we need to streamline the API a bit and maybe separate concerns between the generic log package and reconciliation a bit more.

config/e2e/monitoring.yaml (outdated, resolved)
pkg/apis/agent/v1alpha1/agent_types.go (outdated, resolved)
pkg/apis/agent/v1alpha1/agent_types.go (outdated, resolved)
pkg/apis/agent/v1alpha1/agent_types.go (outdated, resolved)
pkg/controller/common/tracing/transaction.go (resolved)
pkg/utils/log/log.go (outdated, resolved)
pkg/controller/common/tracing/spans.go (outdated, resolved)
pkg/controller/agent/config.go (outdated, resolved)
pkg/controller/agent/controller.go (outdated, resolved)
david-kow and others added 8 commits December 10, 2020 08:05
Co-authored-by: Peter Brachwitz <peter.brachwitz@gmail.com>
Now logs have span.id injected correctly:
2020-12-10T12:55:08.729+0100    DEBUG   agent-controller        test log Reconcile      		{"trace.id": "56a3e12652c3111375da9c9859cf4e37", "transaction.id": "56a3e12652c31113", ...}
2020-12-10T12:55:08.729+0100    DEBUG   agent-controller        test log doReconcile    		{"trace.id": "56a3e12652c3111375da9c9859cf4e37", "transaction.id": "56a3e12652c31113", "span.id": "7f516431647f01dd", ...}
2020-12-10T12:55:08.729+0100    DEBUG   agent-controller        test log internalReconcile      {"trace.id": "56a3e12652c3111375da9c9859cf4e37", "transaction.id": "56a3e12652c31113", "span.id": "e31ac953fbd79bf7", ...}
2020-12-10T12:55:08.739+0100    DEBUG   agent-controller        test log reconcilePodVehicle    {"trace.id": "56a3e12652c3111375da9c9859cf4e37", "transaction.id": "56a3e12652c31113", "span.id": "3d76405e4e69b6f2", ...}
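
For readers wondering how the fields in the log lines above could end up on every message, here is a small stdlib-only sketch of the idea. It is an illustration under assumptions, not the logger added in this PR: `traceMeta`, `newTransactionContext`, `withSpan` and `logLine` are hypothetical names, and the real code relies on the project's logging and APM libraries instead.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// traceMeta holds the correlation IDs carried in the context (hypothetical type).
type traceMeta struct {
	traceID       string
	transactionID string
	spanID        string // empty until a span has been started
}

type metaKey struct{}

// newTransactionContext returns a context carrying trace and transaction IDs.
func newTransactionContext(traceID, transactionID string) context.Context {
	meta := traceMeta{traceID: traceID, transactionID: transactionID}
	return context.WithValue(context.Background(), metaKey{}, meta)
}

// withSpan returns a child context whose metadata also carries the span ID.
func withSpan(ctx context.Context, spanID string) context.Context {
	meta, _ := ctx.Value(metaKey{}).(traceMeta)
	meta.spanID = spanID
	return context.WithValue(ctx, metaKey{}, meta)
}

// logLine renders msg followed by the correlation fields found in ctx.
// span.id is only added once a span is present, as in the log lines above.
func logLine(ctx context.Context, msg string) string {
	meta, ok := ctx.Value(metaKey{}).(traceMeta)
	if !ok {
		return msg
	}
	fields := []string{
		fmt.Sprintf("%q: %q", "trace.id", meta.traceID),
		fmt.Sprintf("%q: %q", "transaction.id", meta.transactionID),
	}
	if meta.spanID != "" {
		fields = append(fields, fmt.Sprintf("%q: %q", "span.id", meta.spanID))
	}
	return msg + " {" + strings.Join(fields, ", ") + "}"
}

func main() {
	ctx := newTransactionContext("56a3e12652c3111375da9c9859cf4e37", "56a3e12652c31113")
	fmt.Println(logLine(ctx, "test log Reconcile"))

	spanCtx := withSpan(ctx, "7f516431647f01dd")
	fmt.Println(logLine(spanCtx, "test log doReconcile"))
}
```

Because the span ID lives in a child context, the outer `Reconcile` message keeps only the trace and transaction IDs while nested calls pick up their own span.id.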
@david-kow requested a review from pebrc December 10, 2020 12:16
Contributor

@charith-elastic left a comment

LGTM. I am a bit torn about elasticsearchRefs. If we support only one sink at the moment, maybe we should stick with elasticsearchRef because the CRD is still in alpha and can always be changed later. On the other hand, I can see that it'd be a painful migration to do.

pkg/apis/agent/v1alpha1/agent_types.go (resolved)
Collaborator

@pebrc left a comment

LGTM, I think I am in favour of merging. Nice work! I have not tested the tracing bits. Implementation-wise it does look very similar to our Beats implementation, and I wonder if we can merge some of the code once we have the full picture of what Agent will look like.

test/e2e/test/agent/builder.go (outdated, resolved)
test/e2e/test/agent/builder.go (outdated, resolved)
test/e2e/test/agent/builder.go (outdated, resolved)
@david-kow
Contributor Author

run full pr build

@david-kow merged commit 6c1518e into elastic:master Dec 10, 2020
@david-kow deleted the add_elastic_agent_crd branch December 10, 2020 16:30