
Verify ElasticSearch API exists #275

Merged
merged 1 commit into from Oct 12, 2021

Conversation

leifmadsen
Member

Verify the ElasticSearch API exists before executing changes on the
cluster that rely on the Elastic Cloud on Kubernetes Operator existing.

Clear out the fact cache before looking up the available api_groups on
the cluster; otherwise adding/removing the *.k8s.elastic.co APIs can
cause issues due to fact caching in Ansible.

Wrap the logic that requests an ElasticSearch instance and the eventing
Smart Gateways to verify that the 'has_elasticsearch_api' parameter is
set to true; otherwise, skip those steps.

Closes: rhbz#1959166

@leifmadsen
Member Author

leifmadsen commented Oct 8, 2021

I ran through this and found an issue with fact caching: if you removed (or, presumably, added) the ECK Operator after the STO had started, the values in api_groups would be stale due to caching. I added a meta action to clear the fact cache, and this seems to have done the trick.
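The fix described above can be sketched roughly as follows. The module and variable names here are illustrative assumptions, not necessarily what the STO playbook actually uses:

```yaml
# Clear cached facts so the API group lookup reflects the cluster's
# current state; otherwise adding/removing the ECK Operator mid-run
# leaves stale entries behind due to Ansible fact caching.
- meta: clear_facts

# Re-query the cluster for its available APIs (illustrative; the real
# playbook may gather api_groups differently).
- name: Get cluster API information
  kubernetes.core.k8s_cluster_info:
  register: cluster_info

# Set has_elasticsearch_api only when an ECK-provided group is present.
- name: Check for the ElasticSearch API
  set_fact:
    has_elasticsearch_api: "{{ cluster_info.apis.keys() | select('search', 'k8s.elastic.co') | list | length > 0 }}"
```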

Example output from a cluster that doesn't have ECK installed (notice there are no API groups ending in *.k8s.elastic.co):

 TASK [Show existing API groups available to us] ******************************** 
ok: [localhost] => {
    "api_groups": [
        "",
        "apiregistration.k8s.io",
        "apps",
        "events.k8s.io",
        "authentication.k8s.io",
        "authorization.k8s.io",
        "autoscaling",
        "batch",
        "certificates.k8s.io",
        "networking.k8s.io",
        "extensions",
        "policy",
        "rbac.authorization.k8s.io",
        "storage.k8s.io",
        "admissionregistration.k8s.io",
        "apiextensions.k8s.io",
        "scheduling.k8s.io",
        "coordination.k8s.io",
        "node.k8s.io",
        "discovery.k8s.io",
        "flowcontrol.apiserver.k8s.io",
        "apps.openshift.io",
        "authorization.openshift.io",
        "build.openshift.io",
        "image.openshift.io",
        "oauth.openshift.io",
        "project.openshift.io",
        "quota.openshift.io",
        "route.openshift.io",
        "security.openshift.io",
        "template.openshift.io",
        "user.openshift.io",
        "packages.operators.coreos.com",
        "config.openshift.io",
        "operator.openshift.io",
        "apiserver.openshift.io",
        "autoscaling.openshift.io",
        "cloudcredential.openshift.io",
        "console.openshift.io",
        "imageregistry.operator.openshift.io",
        "ingress.operator.openshift.io",
        "k8s.cni.cncf.io",
        "machineconfiguration.openshift.io",
        "monitoring.coreos.com",
        "network.openshift.io",
        "network.operator.openshift.io",
        "operators.coreos.com",
        "samples.operator.openshift.io",
        "security.internal.openshift.io",
        "snapshot.storage.k8s.io",
        "tuned.openshift.io",
        "certmanager.k8s.io",
        "controlplane.operator.openshift.io",
        "integreatly.org",
        "interconnectedcloud.github.io",
        "metal3.io",
        "migration.k8s.io",
        "whereabouts.cni.cncf.io",
        "helm.openshift.io",
        "infra.watch",
        "loki.openshift.io",
        "machine.openshift.io",
        "smartgateway.infra.watch",
        "metrics.k8s.io"
    ]
}

Adding the has_elasticsearch_api checks to the component_elasticsearch playbook and to the plays that deploy the eventing SGs results in a successful playbook execution, even when ElasticSearch is enabled in the backends parameter of a ServiceTelemetry object and event SGs are requested in the clouds parameter.
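With has_elasticsearch_api in place, the guard is a simple `when:` condition on the relevant tasks. A minimal sketch (the task names and template/include paths below are hypothetical placeholders):

```yaml
# Only request an ES instance when the ECK API is actually available.
- name: Request an ElasticSearch instance from the ECK Operator
  kubernetes.core.k8s:
    state: present
    definition: "{{ lookup('template', 'manifest_elasticsearch.j2') | from_yaml }}"
  when: has_elasticsearch_api | bool

# Likewise, skip the eventing Smart Gateways when there is no ES API.
- name: Deploy the eventing Smart Gateways
  include_tasks: component_events_smartgateway.yml
  when: has_elasticsearch_api | bool
```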

Test ServiceTelemetry manifest:

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  alerting:
    alertmanager:
      receivers:
        snmpTraps:
          enabled: true
      storage:
        strategy: persistent
  backends:
    events:
      elasticsearch:
        enabled: true
        storage:
          strategy: persistent
    logs:
      loki:
        enabled: false
        flavor: 1x.extra-small
        replicationFactor: 1
        storage:
          objectStorageSecret: test
          storageClass: standard
    metrics:
      prometheus:
        enabled: true
        storage:
          strategy: persistent
  clouds:
  - events:
      collectors:
      - collectorType: collectd
        subscriptionAddress: collectd/cloudops06-notify
      - collectorType: ceilometer
        subscriptionAddress: anycast/ceilometer/cloudops06-event.sample
    metrics:
      collectors:
      - collectorType: collectd
        subscriptionAddress: collectd/cloudops06-telemetry
      - collectorType: sensubility
        subscriptionAddress: sensubility/cloudops06-telemetry
      - collectorType: ceilometer
        subscriptionAddress: anycast/ceilometer/cloudops06-metering.sample
    name: cops06
  cloudsRemoveOnMissing: true
  highAvailability:
    enabled: false

This results in a successful deployment.

--------------------------- Ansible Task Status Event StdOut  -----------------

PLAY RECAP *********************************************************************
localhost                  : ok=58   changed=10   unreachable=0    failed=0    skipped=22   rescued=0    ignored=0   

@leifmadsen
Member Author

I added ElasticSearch back in and forced an update to the ServiceTelemetry object to kick off a new reconciliation loop. Everything seems to be coming up as I would expect now as well. I see ES and I see the event SGs!

@leifmadsen
Member Author

test

Collaborator

@csibbitt csibbitt left a comment


Assuming we are not supposed to be supporting an external ES, then this is fine for now

@@ -49,6 +49,11 @@
- this_cloud.events is defined
- this_cloud.events.collectors is defined
- this_cloud.events is iterable
# TODO: it should be possible to deploy the eventing SGs when ElasticSearch
# is not available, but currently the template for smartgateway_events
# expects to have information about a local ES instance on cluster.
Collaborator


Do we currently have any way to configure the eventing SG to use an external (non-STF-provisioned) elasticsearch instance? (I don't think we do, but) If so, this seems like it could interfere.

Member Author


We don't currently have a way to specify which ES instance to connect to; the address is populated for us when ElasticSearch is enabled. In the case where the CRD is not available to request an ES instance, the event SGs are not created.

In the future, as part of work I have planned, it should be possible to schedule SGs without a backing store, and we'll need to extend the configuration of the clouds parameter so that an external address for the storage domain can be provided.

Currently, if you want to connect an SG to an external data source, you need to create the SG directly via SGO.

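For reference, creating a Smart Gateway directly through SGO looks roughly like this. The apiVersion and spec fields below are hypothetical placeholders for illustration only; check the installed SmartGateway CRD for the actual schema:

```yaml
apiVersion: smartgateway.infra.watch/v2alpha1  # hypothetical version; verify against the installed CRD
kind: SmartGateway
metadata:
  name: cloud1-event
  namespace: service-telemetry
spec:
  # hypothetical fields: point the SG at the AMQP bus and an external ES
  amqpUrl: amqp://default-interconnect.service-telemetry.svc:5672/collectd/cloud1-notify
  elasticUrl: https://external-elasticsearch.example.com:9200
```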
Collaborator


Okay thanks, that's what I thought

@leifmadsen
Member Author

Assuming we are not supposed to be supporting an external ES, then this is fine for now

Merging this as there is currently no support for defining an external ES instance from STO.

@leifmadsen leifmadsen merged commit f481622 into master Oct 12, 2021
@leifmadsen leifmadsen deleted the lmadsen-stf-334 branch October 12, 2021 14:53
leifmadsen added a commit that referenced this pull request Oct 12, 2021
Cherry picked from commit f481622
leifmadsen added a commit that referenced this pull request Oct 12, 2021
Cherry picked from commit f481622