
Investigate Elastic Agent #3201

Closed
pebrc opened this issue Jun 8, 2020 · 13 comments
Labels
discuss (We need to figure this out), >feature (Adds or discusses adding a feature to the product)

Comments

@pebrc
Collaborator

pebrc commented Jun 8, 2020

We should

  • follow the development in https://github.com/elastic/beats/labels/Team%3AIngest%20Management
  • experiment with the current implementation
  • explore and understand the deployment model on ECK/Kubernetes
  • give feedback/engage with the ingest management team
  • feed back the findings from this work to inform our own decisions about the Beats CRD and related work
@pebrc pebrc added the :beats label Jun 8, 2020
@botelastic botelastic bot added the triage label Jun 8, 2020
@pebrc pebrc added the >feature Adds or discusses adding a feature to the product label Jun 8, 2020
@botelastic botelastic bot removed the triage label Jun 8, 2020
@pebrc pebrc added discuss We need to figure this out triage labels Jun 8, 2020
@botelastic botelastic bot removed the triage label Jun 8, 2020
@pebrc
Collaborator Author

pebrc commented Jun 27, 2020

While testing 1.2.0 I explored the possibility of deploying Elastic Agent as just another Beat. It turned out that while it is possible to shoehorn Elastic Agent into the existing Beat CRD, there is very little value in it: the config format for outputs differs from the format the Beats use, so there is zero reusable configuration. Also, when integrated with Fleet, output configuration is set via Kibana's agent configuration API.

There are a few challenges when running Elastic Agent on k8s with ECK:

  • an Enrollment Token needs to be created in Kibana before any agent can be enrolled (I did this manually for the purposes of this test; the same token can be used to enroll multiple agents, though. See the sketch after this list.)
  • there are of course the known limitations of Elastic Agent (most notably for k8s no autodiscover)
  • the Elasticsearch output configuration shipped by Kibana to the agent is unaware of any of ECK's self-signed certificates afaik and therefore non-functional
{
  "action": "checkin",
  "success": true,
  "actions": [
    {
      "agent_id": "05da5b91-7134-4d41-b279-e03723f67c25",
      "type": "CONFIG_CHANGE",
      "data": {
        "config": {
          "id": "4bcaca10-b7bb-11ea-b290-cd25dc4a6f57",
          "outputs": {
            "default": {
              "type": "elasticsearch",
              "hosts": [
                "https://o11y-es-http.default.svc:9200"
              ],
              "api_key": "<redacted>"
            }
          },
  • the manifest below uses an init container to enroll the agent. It is worth noting that enrolling the agent not only overwrites the original configuration but also creates an additional encrypted configuration file (fleet.yml) that needs to be shared with the main container.
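For the Enrollment Token mentioned in the first bullet, fetching the default token from Kibana could look like the sketch below (assuming the 7.x ingest manager API paths, which also appear in the later PoC, and that Kibana's CA is available locally as ca.crt):

KIBANA_URL=https://o11y-kb-http.default.svc:5601

# List the enrollment API keys and take the first (default) one.
EK_ID=$(curl -s -u elastic:${PASSWORD} --cacert ca.crt \
  "${KIBANA_URL}/api/ingest_manager/fleet/enrollment-api-keys" | jq -r '.list[0].id')

# Resolve the key id to the actual token that `elastic-agent enroll` expects.
TOKEN=$(curl -s -u elastic:${PASSWORD} --cacert ca.crt \
  "${KIBANA_URL}/api/ingest_manager/fleet/enrollment-api-keys/${EK_ID}" | jq -r '.item.api_key')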

The manifest below also contains additional hostPath mounts which I simply copied from the filebeat manifest; they are not functional in any form at the moment.

---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: o11y
spec:
  version: 7.8.0
  nodeSets:
  - name: default
    count: 3
    config:
      # This setting could have performance implications for production clusters.
      # See: https://www.elastic.co/guide/en/cloud-on-k8s/master/k8s-virtual-memory.html
      node.store.allow_mmap: false
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: o11y
spec:
  version: 7.8.0
  count: 1
  config:
    xpack.ingestManager.enabled: true
    xpack.ingestManager.fleet.elasticsearch.host: "https://o11y-es-http.default.svc:9200"
    xpack.ingestManager.fleet.kibana.host: "https://o11y-kb-http.default.svc:5601"
  elasticsearchRef:
    name: o11y
  http:
    service:
      spec:
        type: LoadBalancer
---        
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: agent-poc
spec:
  selector:
    matchLabels:
      common.k8s.elastic.co/type: agent
  template:
    metadata:
      labels:
        common.k8s.elastic.co/type: agent
    spec:
      automountServiceAccountToken: true
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      initContainers:
      - name: init-cfg
        command:
        - bash
        - -c
        - touch /usr/share/elastic-agent/config/agent/cfg.yml
        image: docker.elastic.co/beats/elastic-agent:7.8.0
        volumeMounts:
        - mountPath: /usr/share/elastic-agent/config/agent
          name: shared-config
      - name: enroll
        command:
        - elastic-agent
        - enroll
        args:
        - https://o11y-kb-http.default.svc:5601
        - ${ENROLLMENT_TOKEN}
        - -a # CA cert to trust when connecting to Kibana
        - /usr/share/elastic-agent/config/other/ca.crt
        - -f # do not prompt for confirmation
        - -c # config file that enroll overwrites with the Fleet config
        - /usr/share/elastic-agent/config/agent/cfg.yml
        - --path.home=/usr/share/elastic-agent/config/agent
        image: docker.elastic.co/beats/elastic-agent:7.8.0
        volumeMounts:
        - mountPath: /usr/share/elastic-agent/config/agent
          name: shared-config
        - mountPath: /usr/share/elastic-agent/config/other
          name: kibana-certs
      containers:
      - name: elastic-agent
        args:
        - run
        - -c
        - /usr/share/elastic-agent/config/agent/cfg.yml
        - --path.home=/usr/share/elastic-agent/config/agent
        command:
        - elastic-agent
        image: docker.elastic.co/beats/elastic-agent:7.8.0
        volumeMounts:
        - mountPath: /usr/share/elastic-agent/data
          name: agent-data
        - mountPath: /usr/share/elastic-agent/config/other
          name: kibana-certs
        - mountPath: /usr/share/elastic-agent/config/agent
          name: shared-config
        - mountPath: /var/lib/docker/containers
          name: varlibdockercontainers
        - mountPath: /var/log/containers
          name: varlogcontainers
        - mountPath: /var/log/pods
          name: varlogpods
      securityContext:
        runAsUser: 0
      volumes:
      - name: agent-data
        emptyDir: {}
      - name: kibana-certs
        secret:
          defaultMode: 420
          secretName: o11y-kb-http-certs-public
      - emptyDir: {}
        name: shared-config
      - hostPath:
          path: /var/lib/docker/containers
          type: ""
        name: varlibdockercontainers
      - hostPath:
          path: /var/log/containers
          type: ""
        name: varlogcontainers
      - hostPath:
          path: /var/log/pods
          type: ""
        name: varlogpods

@david-kow
Contributor

Nice. Did you test the standalone (non-fleet) mode? Any issues there?

> an Enrollment Token needs to be created in Kibana before any agent can be enrolled (I did this manually for the purposes of this test, the same token can be used to enroll multiple agents though)

This is probably something we want the operator to orchestrate.

> the Elasticsearch output configuration shipped by Kibana to the agent is unaware of any of ECK's self-signed certificates afaik and therefore non-functional

Kibana already knows about the CAs that are needed here; would it make sense for them to be included in the config that is pushed to the Agent? I.e., is that a feature that Fleet should implement?

@pebrc
Collaborator Author

pebrc commented Jun 29, 2020

> Nice. Did you test the standalone (non-fleet) mode? Any issues there?

I did not. But it would allow ECK to configure a correct default output with certificates.
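For illustration, the kind of standalone output ECK could render might look like the sketch below (the ssl.certificate_authorities key follows the Beats output format; the mount path is an assumption):

outputs:
  default:
    type: elasticsearch
    hosts: ["https://o11y-es-http.default.svc:9200"]
    username: elastic
    password: <redacted>
    # Trust ECK's self-signed CA; assumes the CA secret is mounted at this path.
    ssl.certificate_authorities: ["/mnt/elastic/es-ca/ca.crt"]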

> an Enrollment Token needs to be created in Kibana before any agent can be enrolled (I did this manually for the purposes of this test, the same token can be used to enroll multiple agents though)
>
> This is probably something we want the operator to orchestrate.

Yes I think so too.

> the Elasticsearch output configuration shipped by Kibana to the agent is unaware of any of ECK's self-signed certificates afaik and therefore non-functional
>
> Kibana already knows about the CAs that are needed here; would it make sense for them to be included in the config that is pushed to the Agent? I.e., is that a feature that Fleet should implement?

Good point. I will raise an issue in the Beats repository.

@SeanPlacchetti

SeanPlacchetti commented Aug 24, 2020

@david-kow Not sure if there's another issue tracking the implementation of the Elastic Agent in ECK, but I was trying to track down the progress on that; the 7.9 Elastic Stack release includes options to run Elastic Agent as beta.

@david-kow
Contributor

Hi @SeanPlacchetti, this didn't get too much traction so far, but that should change in the following weeks. This is the right issue to track and I'll make sure it's updated with the progress.

@axw
Member

axw commented Sep 17, 2020

FYI we're looking into integrating APM Server with Elastic Agent now: elastic/apm-server#4004. We anticipate that in 8.0 this will be the one and only way of running a fully functioning APM Server, as all index templates, pipelines, etc. would be managed by Fleet.

@david-kow
Contributor

david-kow commented Sep 18, 2020

I played with the Elastic Agent and managed to get it to work in Fleet mode with a hands-off setup. This includes creating the Fleet user, grabbing tokens, enrolling agents and running the Agent, all while using our custom CAs. See the manifest at the bottom.

Some notes:

  • CA handling is not great right now, so [Elastic Agent] Support custom certificate authorities (beats#19504) is really needed.
  • I feel like the enrollment could be simplified. The only input to the process in the simplest case is the credentials to connect to Kibana. Running Fleet setup (if it's still needed), enrolling using the default token (if it's still needed) and running the Agent could be a single command.
  • It would be great if enroll offered a way to retry (I'm not sure whether run already does this). This would help greatly when users deploy the Stack and Agents simultaneously. Instead of Pod restarts when Kibana is not yet available, the process would keep running and the retries would be visible in the logs (a crude workaround is sketched after this list). A similar feature is already supported by Beats when connecting to the Elasticsearch output or to Kibana (for dashboard setup). @ruflin - something to consider maybe.
  • Kubernetes module integration doesn't work out-of-the-box; I'm getting the errors below.
2020-09-18T05:06:47.775Z        DEBUG   kibana/client.go:170    Request method: POST, path: /api/ingest_manager/fleet/agents/049b65a1-aaa2-41b2-9dd3-f4384a646eac/checkin
2020-09-18T05:06:48.488Z        DEBUG   application/action_dispatcher.go:81     Dispatch 1 actions of types: *fleetapi.ActionConfigChange
2020-09-18T05:06:48.489Z        DEBUG   application/handler_action_policy_change.go:23  handlerConfigChange: action 'action_id: 8654298c-7e72-4930-8518-031f9da51634, type: CONFIG_CHANGE' received
2020-09-18T05:06:48.490Z        DEBUG   application/handler_action_policy_change.go:34  handlerConfigChange: emit configuration for action action_id: 8654298c-7e72-4930-8518-031f9da51634, type: CONFIG_CHANGE
2020-09-18T05:06:48.491Z        DEBUG   application/emitter.go:39       Transforming configuration into a tree
2020-09-18T05:06:48.491Z        DEBUG   application/action_dispatcher.go:93     Failed to dispatch action 'action_id: 8654298c-7e72-4930-8518-031f9da51634, type: CONFIG_CHANGE', error: could not create the AST from the configuration: missing field accessing 'inputs'
2020-09-18T05:06:48.491Z        ERROR   application/fleet_gateway.go:159        failed to dispatch actions, error: could not create the AST from the configuration: missing field accessing 'inputs'
  • I wasn't able to find any "generic" Metricbeat integration. For logs there is "Custom logs", which is pretty flexible, but I couldn't find a similar one for metrics. I assume this is work in progress.
  • My understanding is that for standalone mode users, running with ECK would be fairly simple: provide configs, deploy, inspect/monitor via the UI, but make changes through the ECK manifest. For Fleet mode users it seems more complicated.
    With the manifest below, ECK can deploy the Agent, but depending on the configuration set by the user via Fleet, the Pod might need different permissions. Examples would be the right RBAC for Kubernetes API endpoints, mounting the right host path or using hostNetwork when needed. Right now this is a two-step process where first the Pod template needs to be modified to give the Agent the right access and only then can changes be made via Fleet. This is not very convenient, so we should think about whether we want to improve this and, if so, how.
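On the retry point above, a crude workaround available today would be to loop around enroll in the init container (a sketch only; a native retry flag would be the proper fix):

# Keep retrying enrollment until Kibana is reachable, instead of letting the
# Pod crash-loop (workaround sketch, not a built-in elastic-agent feature).
until elastic-agent enroll ${KIBANA_URL} ${TOKEN} -f \
    --path.config /usr/share/elastic-agent/config/agent; do
  echo "enroll failed; Kibana may not be up yet, retrying in 10s..."
  sleep 10
done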

Elastic Agent with ECK (POC)


apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
spec:
  version: 7.9.0
  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
spec:
  version: 7.9.0
  count: 1
  config:
    xpack.ingestManager.fleet.elasticsearch.host: "https://elasticsearch-es-http.default.svc:9200"
    xpack.ingestManager.fleet.kibana.host: "https://kibana-kb-http.default.svc:5601"
  elasticsearchRef:
    name: elasticsearch
---        
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: agent-poc
spec:
  selector:
    matchLabels:
      common.k8s.elastic.co/type: agent
  template:
    metadata:
      labels:
        common.k8s.elastic.co/type: agent
    spec:
      automountServiceAccountToken: true
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      initContainers:
      - name: agent-setup
        command: ["/bin/sh","-c"]
        args: 
        - |
          set -e

          # this file is created when agent runs
          test -f "action_store.yml" && exit 0;
          
          # we need to trust custom CA from Kibana
          cp /usr/share/elastic-agent/config/kb/ca.crt /etc/pki/ca-trust/source/anchors/
          update-ca-trust
          
          # hardcoded Kibana service URL
          KIBANA_URL=https://kibana-kb-http.default.svc:5601

          # setup Fleet user
          curl -XPOST -u ${ELASTICSEARCH_USER}:${ELASTICSEARCH_PASS} "${KIBANA_URL}/api/ingest_manager/fleet/setup" -d '{"forceRecreate":false}' -H "kbn-xsrf: reporting" -H "Content-Type: application/json"

          # grab the first (default) enrollment token
          EK_ID=$(curl -u ${ELASTICSEARCH_USER}:${ELASTICSEARCH_PASS} "${KIBANA_URL}/api/ingest_manager/fleet/enrollment-api-keys?page=1&perPage=20" | jq -r .list[0].id)
          TOKEN=$(curl -u ${ELASTICSEARCH_USER}:${ELASTICSEARCH_PASS} "${KIBANA_URL}/api/ingest_manager/fleet/enrollment-api-keys/${EK_ID}" | jq -r .item.api_key)

          # create empty config file as enroll complains if it's not there
          touch /usr/share/elastic-agent/config/agent/elastic-agent.yml

          # enrolls the agent and uses .../agent directory for config (elastic-agent.yml and fleet.yml)
          elastic-agent enroll ${KIBANA_URL} ${TOKEN} -f --path.config /usr/share/elastic-agent/config/agent
        image: docker.elastic.co/beats/elastic-agent:7.9.0
        env:
        - name: ELASTICSEARCH_USER
          value: elastic
        - name: ELASTICSEARCH_PASS
          valueFrom:
            secretKeyRef:
              name: elasticsearch-es-elastic-user
              key: elastic
        volumeMounts:
        - mountPath: /usr/share/elastic-agent/config/agent
          name: shared-config
        - mountPath: /usr/share/elastic-agent/config/kb
          name: kb-certs
        - mountPath: /usr/share/elastic-agent/config/es
          name: es-certs
      containers:
      - name: elastic-agent
        command: ["/bin/sh","-c"]
        args: 
        - |
          # trust Kibana (for Fleet) and ES (for Beats)
          cp /usr/share/elastic-agent/config/es/ca.crt /etc/pki/ca-trust/source/anchors/esca.crt
          cp /usr/share/elastic-agent/config/kb/ca.crt /etc/pki/ca-trust/source/anchors/kbca.crt
          update-ca-trust

          # runs the agent and uses elastic-agent.yml and fleet.yml created by init container
          elastic-agent run --path.config /usr/share/elastic-agent/config/agent -e
        image: docker.elastic.co/beats/elastic-agent:7.9.0
        volumeMounts:
        - mountPath: /usr/share/elastic-agent/config/kb
          name: kb-certs
        - mountPath: /usr/share/elastic-agent/config/es
          name: es-certs
        - mountPath: /usr/share/elastic-agent/config/agent
          name: shared-config
        - mountPath: /var/lib/docker/containers
          name: varlibdockercontainers
        - mountPath: /var/log/containers
          name: varlogcontainers
        - mountPath: /var/log/pods
          name: varlogpods
      securityContext:
        runAsUser: 0
      volumes:
      - name: kb-certs
        secret:
          defaultMode: 420
          secretName: kibana-kb-http-certs-public
      - name: es-certs
        secret:
          defaultMode: 420
          secretName: elasticsearch-es-http-certs-public 
      - emptyDir: {}
        name: shared-config
      - hostPath:
          path: /var/lib/docker/containers
          type: ""
        name: varlibdockercontainers
      - hostPath:
          path: /var/log/containers
          type: ""
        name: varlogcontainers
      - hostPath:
          path: /var/log/pods
          type: ""
        name: varlogpods

@ruflin
Member

ruflin commented Sep 18, 2020

Great to see the progress on this. I think there are at least 2 different aspects of Elastic Agent in ECK:

  • Enrolled Elastic Agent into Fleet that is just idle and waiting for policies. This Agent can be used for polling use cases like the AWS package, Uptime, or getting policies for monitoring some other services running in the same k8s cluster, like MySQL.
  • Enrolled Elastic Agent that directly monitors K8s/ECK: here we need additional volumes and permissions. This Agent would also cover the current MB / FB monitoring of the Stack. I wonder whether there should potentially be 2 Agents for this: one monitoring k8s and all its bits in a generic way and one focused on monitoring the stack.

You mention above that the k8s module does not work. I assume you used the prebuilt integration here (not the module ;-) ).

++ on having retry, I think there are multiple use cases for this. Could you open an issue for this in the Beats repo?

For the generic metrics implementation, we don't have it yet. But I wonder what you need it for in this context?

Thanks a lot for pushing this forward, happy to push forward any changes needed on our end to make it happen.

@david-kow Are your changes by chance in a branch / draft PR somewhere so playing around with it would be possible? Or does the snippet above already contain all I need? Sorry, not too familiar with ECK yet.

@david-kow
Contributor

Thanks for your comment @ruflin.

Yes, I was talking about the Kubernetes integration; the module works well :)

I've created an issue for retry.

As to the generic metrics, nothing specific right now, but we do use it for Stack Monitoring in our current Beat examples, so I thought I'd mention it.

The above is all you need (assuming ECK is already installed) :)

>   • Enrolled Elastic Agent into Fleet that is just idle and waiting for policies. This Agent can be used for polling use cases like the AWS package, Uptime, or getting policies for monitoring some other services running in the same k8s cluster, like MySQL.
>   • Enrolled Elastic Agent that directly monitors K8s/ECK: here we need additional volumes and permissions. This Agent would also cover the current MB / FB monitoring of the Stack. I wonder whether there should potentially be 2 Agents for this: one monitoring k8s and all its bits in a generic way and one focused on monitoring the stack.

I think this might end up being more than two "flavours". When we add support for other Beats, we'll need more and more permissions and settings, like host paths mounted, host network access/PID, container capabilities, and related k8s API permissions and RBAC resources. And then there are some k8s-distribution-specific concerns, mostly OpenShift's security context vs the rest. Also, there will always be a use case/scenario/setup that we didn't think about, so whatever we do, we'll need to leave a way for users to set things up however they want.

The perfect solution would be to translate the config in Fleet to the required k8s config. This would allow for a very smooth experience, but it would require a significant amount of work, as each feature in the Agent (and the underlying Beats) would need a defined permission/config mapping in k8s. It would also require coordination between Fleet and ECK, so ECK could update the Pod specs as needed. I don't think this is feasible or worth investing in.

We had similar challenges with the current Beats CRD and we ended up with no defaults, no presets, no built-in configurations. We can document (similar to the Beats CRD docs) how to address a few common scenarios for users to build upon.

The "default" config could be a no-config which would allow for some of the Beats, like Heartbeat or possibly Metricbeat to run. Others would need users to specify some config in the ECK manifest to run correctly. And possibly, even more configuration would be required for things like Autodiscover to work (for context: Autodiscover requires access to k8s APIs that can only be granted by creating k8s resources outside of ECKs manifests).

For bare Metricbeat/Heartbeat, on the ECK side we could have:

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: uptime
spec:
  version: 7.9.1
  elasticsearchRef:
    name: elasticsearch
  kibanaRef:
    name: kibana
  mode: fleet
  deployment:
    replicas: 1

For Filebeat:

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: o11y
spec:
  version: 7.9.1
  elasticsearchRef:
    name: elasticsearch
  kibanaRef:
    name: kibana
  mode: fleet
  daemonSet:
    podTemplate:
      spec:
        automountServiceAccountToken: true
        terminationGracePeriodSeconds: 30
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true # Allows providing richer host metadata
        containers:
        - name: filebeat
          securityContext:
            runAsUser: 0
          volumeMounts:
          - name: varlogcontainers
            mountPath: /var/log/containers
          - name: varlogpods
            mountPath: /var/log/pods
          - name: varlibdockercontainers
            mountPath: /var/lib/docker/containers
        volumes:
        - name: varlogcontainers
          hostPath:
            path: /var/log/containers
        - name: varlogpods
          hostPath:
            path: /var/log/pods
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers

The above would align well with mode: standalone, where enabling it would just mean that a config: ... block needs to be provided; there would be no different defaults and/or assumptions made.
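For example, a standalone counterpart to the first example above might look like the sketch below (the exact shape of the embedded config block is an assumption; inputs is named after the Agent policy format seen in the logs earlier in this thread):

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: uptime
spec:
  version: 7.9.1
  elasticsearchRef:
    name: elasticsearch
  mode: standalone
  config:
    # standalone Agent policy would go here (inputs, outputs, etc.)
    inputs: []
  deployment:
    replicas: 1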

@Just-Insane

Is there any update on this? I am trying to run Fleet in ECK. Prior to this I was following the quickstart and was able to get elastic-agent to enroll (running directly on the nodes); however, it would seemingly not deploy the Beats or send data to Elasticsearch (with nothing in the logs).

It seems like the best way to get logs/metrics out of Kubernetes right now is to manually install and configure Beats on the nodes?

@david-kow
Contributor

Hey @Just-Insane, no significant updates on this just yet. For making it work today with Elasticsearch/Kibana deployed by ECK, the best I know of is to try the proof-of-concept above.

Note that Elastic Agent/Fleet is different from "raw" Beats. We support the latter, but we don't support the Elastic Agent just yet. The work towards that is currently in progress.

For general logs/metrics using Beats you can check out our quickstart, configuration docs or just apply the examples.

@david-kow
Contributor

Work for the Elastic Agent CRD in the standalone mode is tracked in a separate issue.

@pebrc
Collaborator Author

pebrc commented Feb 1, 2021

Closing this for now as initial work to support Agent has been completed. Let's open a new issue for Fleet when the time comes.

@pebrc pebrc closed this as completed Feb 1, 2021