
Verify ElasticSearch API exists #275

Merged
merged 1 commit into from Oct 12, 2021

Conversation

leifmadsen
Member

Verify the ElasticSearch API exists before executing changes on the
cluster that rely on the Elastic Cloud on Kubernetes Operator existing.

Clear out the fact cache before looking up the available api_groups on
the cluster; otherwise adding/removing the *.k8s.elastic.co APIs can
cause issues due to fact caching in Ansible.

Wrap the logic that requests an ElasticSearch instance and the eventing
Smart Gateways to verify that the 'has_elasticsearch_api' parameter is
set to true; otherwise, skip those steps.

Closes: rhbz#1959166

@leifmadsen
Member Author

leifmadsen commented Oct 8, 2021

I ran through this and found an issue with fact caching: if you removed (or, presumably, added) the ECK Operator after the STO had started, the values in api_groups would be stale due to caching. I added a meta action to clear the fact cache, and this seems to have done the trick.
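The fix described above can be sketched roughly as follows. The module and variable names here are illustrative assumptions, not necessarily what the STO playbook actually uses:

```yaml
# Clear cached facts so the API group lookup reflects the cluster's
# current state; otherwise adding/removing the ECK Operator mid-run
# leaves stale entries behind due to Ansible fact caching.
- meta: clear_facts

# Re-query the cluster for its available APIs (illustrative; the real
# playbook may gather api_groups differently).
- name: Get cluster API information
  kubernetes.core.k8s_cluster_info:
  register: cluster_info

# Set has_elasticsearch_api only when an ECK-provided group is present.
- name: Check for the ElasticSearch API
  set_fact:
    has_elasticsearch_api: "{{ cluster_info.apis.keys() | select('search', 'k8s.elastic.co') | list | length > 0 }}"
```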

Example output from a cluster that doesn't have ECK installed (notice there are no API groups ending in *.k8s.elastic.co):

 TASK [Show existing API groups available to us] ******************************** 
ok: [localhost] => {
    "api_groups": [
        "",
        "apiregistration.k8s.io",
        "apps",
        "events.k8s.io",
        "authentication.k8s.io",
        "authorization.k8s.io",
        "autoscaling",
        "batch",
        "certificates.k8s.io",
        "networking.k8s.io",
        "extensions",
        "policy",
        "rbac.authorization.k8s.io",
        "storage.k8s.io",
        "admissionregistration.k8s.io",
        "apiextensions.k8s.io",
        "scheduling.k8s.io",
        "coordination.k8s.io",
        "node.k8s.io",
        "discovery.k8s.io",
        "flowcontrol.apiserver.k8s.io",
        "apps.openshift.io",
        "authorization.openshift.io",
        "build.openshift.io",
        "image.openshift.io",
        "oauth.openshift.io",
        "project.openshift.io",
        "quota.openshift.io",
        "route.openshift.io",
        "security.openshift.io",
        "template.openshift.io",
        "user.openshift.io",
        "packages.operators.coreos.com",
        "config.openshift.io",
        "operator.openshift.io",
        "apiserver.openshift.io",
        "autoscaling.openshift.io",
        "cloudcredential.openshift.io",
        "console.openshift.io",
        "imageregistry.operator.openshift.io",
        "ingress.operator.openshift.io",
        "k8s.cni.cncf.io",
        "machineconfiguration.openshift.io",
        "monitoring.coreos.com",
        "network.openshift.io",
        "network.operator.openshift.io",
        "operators.coreos.com",
        "samples.operator.openshift.io",
        "security.internal.openshift.io",
        "snapshot.storage.k8s.io",
        "tuned.openshift.io",
        "certmanager.k8s.io",
        "controlplane.operator.openshift.io",
        "integreatly.org",
        "interconnectedcloud.github.io",
        "metal3.io",
        "migration.k8s.io",
        "whereabouts.cni.cncf.io",
        "helm.openshift.io",
        "infra.watch",
        "loki.openshift.io",
        "machine.openshift.io",
        "smartgateway.infra.watch",
        "metrics.k8s.io"
    ]
}

Adding the has_elasticsearch_api checks to the component_elasticsearch playbook and to the plays that deploy the eventing SGs results in a successful playbook execution, even when ElasticSearch is enabled in the backends parameter of a ServiceTelemetry object and event SGs are requested in the clouds parameter.
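With has_elasticsearch_api in place, the guard is a simple `when:` condition on the relevant tasks. A minimal sketch (the task names and template/include paths below are hypothetical placeholders):

```yaml
# Only request an ES instance when the ECK API is actually available.
- name: Request an ElasticSearch instance from the ECK Operator
  kubernetes.core.k8s:
    state: present
    definition: "{{ lookup('template', 'manifest_elasticsearch.j2') | from_yaml }}"
  when: has_elasticsearch_api | bool

# Likewise, skip the eventing Smart Gateways when there is no ES API.
- name: Deploy the eventing Smart Gateways
  include_tasks: component_events_smartgateway.yml
  when: has_elasticsearch_api | bool
```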

Test ServiceTelemetry manifest:

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  alerting:
    alertmanager:
      receivers:
        snmpTraps:
          enabled: true
      storage:
        strategy: persistent
  backends:
    events:
      elasticsearch:
        enabled: true
        storage:
          strategy: persistent
    logs:
      loki:
        enabled: false
        flavor: 1x.extra-small
        replicationFactor: 1
        storage:
          objectStorageSecret: test
          storageClass: standard
    metrics:
      prometheus:
        enabled: true
        storage:
          strategy: persistent
  clouds:
  - events:
      collectors:
      - collectorType: collectd
        subscriptionAddress: collectd/cloudops06-notify
      - collectorType: ceilometer
        subscriptionAddress: anycast/ceilometer/cloudops06-event.sample
    metrics:
      collectors:
      - collectorType: collectd
        subscriptionAddress: collectd/cloudops06-telemetry
      - collectorType: sensubility
        subscriptionAddress: sensubility/cloudops06-telemetry
      - collectorType: ceilometer
        subscriptionAddress: anycast/ceilometer/cloudops06-metering.sample
    name: cops06
  cloudsRemoveOnMissing: true
  highAvailability:
    enabled: false

This results in a successful deployment.

--------------------------- Ansible Task Status Event StdOut  -----------------

PLAY RECAP *********************************************************************
localhost                  : ok=58   changed=10   unreachable=0    failed=0    skipped=22   rescued=0    ignored=0   

@leifmadsen
Member Author

I added ElasticSearch back in and forced an update to the ServiceTelemetry object to kick off a new reconciliation loop. Everything seems to be coming up as I would expect now as well. I see ES and I see the event SGs!

@leifmadsen
Member Author

test

Collaborator

@csibbitt csibbitt left a comment


Assuming we are not supposed to be supporting an external ES, then this is fine for now

@@ -49,6 +49,11 @@
- this_cloud.events is defined
- this_cloud.events.collectors is defined
- this_cloud.events is iterable
# TODO: it should be possible to deploy the eventing SGs when ElasticSearch
# is not available, but currently the template for smartgateway_events
# expects to have information about a local ES instance on cluster.
Collaborator


Do we currently have any way to configure the eventing SG to use an external (non-STF-provisioned) elasticsearch instance? (I don't think we do, but) If so, this seems like it could interfere.

Member Author


We don't currently have a way to specify which ES instance to connect to; the address is populated for us when ElasticSearch is enabled. In the case where the CRD is not available to request an ES instance, the event SGs are not created.

In the future, as part of work I have planned, it should be possible to schedule SGs without a backing store, and we'll need to extend the configuration of the clouds parameter so that an external address for the storage domain can be provided.

Currently, if you want to connect an SG to an external data source, you need to create the SG directly via SGO.

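For reference, creating a Smart Gateway directly through SGO looks roughly like this. The apiVersion and spec fields below are hypothetical placeholders for illustration only; check the installed SmartGateway CRD for the actual schema:

```yaml
apiVersion: smartgateway.infra.watch/v2alpha1  # hypothetical version; verify against the installed CRD
kind: SmartGateway
metadata:
  name: cloud1-event
  namespace: service-telemetry
spec:
  # hypothetical fields: point the SG at the AMQP bus and an external ES
  amqpUrl: amqp://default-interconnect.service-telemetry.svc:5672/collectd/cloud1-notify
  elasticUrl: https://external-elasticsearch.example.com:9200
```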
Collaborator


Okay thanks, that's what I thought

@leifmadsen
Member Author

Assuming we are not supposed to be supporting an external ES, then this is fine for now

Merging this as there is currently no support for defining an external ES instance from STO.

@leifmadsen leifmadsen merged commit f481622 into master Oct 12, 2021
@leifmadsen leifmadsen deleted the lmadsen-stf-334 branch October 12, 2021 14:53
leifmadsen added a commit that referenced this pull request Oct 12, 2021
Cherry picked from commit f481622
leifmadsen added a commit that referenced this pull request Oct 12, 2021
Cherry picked from commit f481622