Unable to connect to the ES when using K8s secrets #471

dushyant03 · 2019-06-18T11:56:27Z

Hi,

I have set up the ES cleaner CronJob for an externally hosted Elasticsearch cluster, and I am trying to use secrets stored in Kubernetes to connect to the ES cluster.

Below is my YAML:

kind: CronJob
metadata:
  name: es-jaeger-cleaner
  namespace: monitoring
  labels:
    app: es-jaeger-cleaner
    env: staging
spec:
  # every 1 PM UTC-0
  schedule: "0 13 * * *"
  jobTemplate:
    metadata:
      labels:
        app: es-jaeger-cleaner
        env: staging
    spec:
      template:
        metadata:
          labels:
            app: es-jaeger-cleaner
            env: staging
        spec:
          containers:
          - name: es-jaeger-cleaner
            image: jaegertracing/jaeger-es-index-cleaner:latest
            # clean up ES data indices older than 7 days from now
            args: ["7", "https://<esHOST:port>"]
            env:
            - name: TIMEOUT
              value: "300"
            - name: ES_USERNAME
              value: "<username>"
            - name: ES_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: ES_PASSWORD
                  name: <secret_name>
          restartPolicy: OnFailure

Somehow it does not work, and I get the error below:

File "/es-index-cleaner/esCleaner.py", line 31, in main
    client = elasticsearch.Elasticsearch(sys.argv[2:], http_auth=(username, password))
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/client/__init__.py", line 206, in __init__
    self.transport = transport_class(_normalize_hosts(hosts), **kwargs)
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py", line 127, in __init__
    self.set_connections(hosts)
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py", line 177, in set_connections
    connections = list(zip(connections, hosts))
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py", line 173, in _create_connection
    return self.connection_class(**kwargs)
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 106, in __init__
    self.headers.update(urllib3.make_headers(basic_auth=http_auth))
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/request.py", line 65, in make_headers
    b64encode(b(basic_auth)).decode('utf-8')
  File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 620, in b
    return s.encode("latin-1")
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 9-13: ordinal not in range(256)

PS: If I hardcode the username and password, it works fine.
Please let me know if I am doing something wrong.
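
The last frames of the traceback show the combined "username:password" string being encoded as Latin-1 for the Basic auth header, so any character above U+00FF in either credential raises this error. The sketch below reproduces the failure with made-up credentials and reports which characters are at fault. `find_non_latin1` is a hypothetical helper, not part of esCleaner.py, and the sample values are assumptions.

```python
def find_non_latin1(username, password):
    """Return (index, char) pairs that Latin-1 cannot encode.

    Indices refer to the combined "username:password" string, which is
    what gets encoded for the HTTP Basic Authorization header.
    """
    combined = f"{username}:{password}"
    return [(i, ch) for i, ch in enumerate(combined) if ord(ch) > 255]


if __name__ == "__main__":
    # Made-up credentials: the password holds five characters outside Latin-1.
    username, password = "elastic", "x\u20ac\u20ac\u20ac\u20ac\u20ac"
    print([i for i, _ in find_non_latin1(username, password)])
    try:
        f"{username}:{password}".encode("latin-1")
    except UnicodeEncodeError as err:
        # err.start and err.end delimit the offending run of characters.
        print(err.start, err.end, err.reason)
```

Running a check like this against the value the container actually receives (rather than the value typed into the manifest) can show whether the secret holds what was intended.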


pavolloffay · 2019-06-18T12:32:18Z

Is the secret mounted into the cleaner job?

dushyant03 · 2019-06-18T12:41:16Z

Umm, it's not; I think that is the issue. Let me try.

dushyant03 · 2019-06-18T14:39:50Z

@pavolloffay

So I mounted the secret on the cleaner CronJob, and somehow it's not picking it up.
I do get a 401 now, but it does not seem to be using the mounted secrets.

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: es-jaeger-cleaner
  namespace: monitoring
  labels:
    app: es-jaeger-cleaner
    env: staging
spec:
  # every 1 PM UTC-0
  schedule: "0 13 * * *"
  jobTemplate:
    metadata:
      labels:
        app: es-jaeger-cleaner
        env: staging
    spec:
      template:
        metadata:
          labels:
            app: es-jaeger-cleaner
            env: staging
        spec:
          containers:
          - name: es-jaeger-cleaner
            image: jaegertracing/jaeger-es-index-cleaner:latest
            # clean up ES data indices older than 7 days from now
            env:
            - name: TIMEOUT
              value: "300"
            - name: ES_USERNAME
              value: "<username>"
            - name: ES_PASSWORD
              value: /var/jaeger-es-secrets/password
            volumeMounts:
            - name: es-secrets
              mountPath: "/var/jaeger-es-secrets"
              readOnly: true
            args: ["7", "https://<ES endpoint:port>"]
          restartPolicy: Never
          volumes:
            - name: es-secrets
              secret:
                secretName: jaeger-es-secrets
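
Note that in the manifest above, ES_PASSWORD is set to the literal string /var/jaeger-es-secrets/password. Kubernetes does not substitute a file's contents into an environment variable, which may be why the cluster answers with a 401. A minimal sketch of the difference between a path and the secret value follows. The `NAME_FILE` convention and the `read_secret` helper are hypothetical and are not something esCleaner.py is known to support.

```python
import os
from pathlib import Path


def read_secret(name, env=None):
    """Return a credential from NAME_FILE (a mounted file) or NAME (plain value).

    The NAME_FILE convention is hypothetical; it only illustrates that a
    path has to be read explicitly before it yields the secret.
    """
    env = os.environ if env is None else env
    file_path = env.get(f"{name}_FILE")
    if file_path:
        # Secret volumes expose each key as a file; drop a trailing newline.
        return Path(file_path).read_text(encoding="utf-8").rstrip("\n")
    # Without a *_FILE entry the variable is used verbatim, path or not.
    return env.get(name)
```

With a plain `value:` pointing at a path, the second branch applies and the path itself is sent as the password.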

pavolloffay · 2019-06-18T15:08:51Z

A couple of remarks. Changing k8s objects manually is not supported; all changes should be made via the Jaeger CR.

JaegerEsIndexCleanerSpec injects the secret defined in the storage spec and should create env variables from it.
If the storage defines the options --es.username/--es.password, it adds ES_USERNAME/ES_PASSWORD as env vars.

jaeger-operator/pkg/cronjob/es_index_cleaner.go

Line 27 in eaa4d52

envFromSource = append(envFromSource, corev1.EnvFromSource{

secat · 2019-06-25T14:13:16Z

When using an already provisioned elasticsearch cluster with basic authentication and TLS enabled, what is the key/map format of the secret that we need to provide to Jaeger in the storage section?

    storage:
      type: elasticsearch
      options:
        es:
          server-urls: http://elasticsearch:9200
      secretName: jaeger-secrets

objectiser · 2019-06-25T14:47:21Z

As shown here, using the keys ES_USERNAME and ES_PASSWORD.
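
For illustration, a secret with those two keys could be generated as in the sketch below. It assumes placeholder credentials and the secret name from the question above; `build_es_secret` is a hypothetical helper. The only Kubernetes-specific rule it relies on is that values under `data` are base64-encoded.

```python
import base64
import json


def build_es_secret(name, username, password, namespace="default"):
    """Build a Kubernetes Secret manifest with ES_USERNAME and ES_PASSWORD keys."""

    def b64(value):
        # Values under `data` must be base64-encoded strings.
        return base64.b64encode(value.encode("utf-8")).decode("ascii")

    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            "ES_USERNAME": b64(username),
            "ES_PASSWORD": b64(password),
        },
    }


if __name__ == "__main__":
    # Placeholder credentials; kubectl accepts JSON manifests as well as YAML.
    manifest = build_es_secret("jaeger-secrets", "elastic", "changeme", "observability")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`, after which the storage section refers to it through secretName.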

secat · 2019-06-25T15:48:13Z

@objectiser thank you!

jpkrohling · 2019-07-16T13:06:33Z

@kevinearls, @jkandasa: would one of you be able to test the use case mentioned by @dushyant03? It should be possible to use an external ES, with username and password coming from secrets.

kevinearls · 2019-07-16T15:44:35Z

@jpkrohling Not in the short term, I have a few too many other things queued up. @jkandasa do you have time?

secat · 2019-08-12T12:04:48Z

@jpkrohling I am using an external ES (from the Elastic Cloud on Kubernetes (ECK) Operator) with self-signed certificates and username and password from secrets.

The resources generated by the operator seem wrong and don't work.

$ kubectl --namespace=observability get po -l job-name=local-jaeger-tracing-es-index-cleaner-1565567700
NAME                                                     READY   STATUS   RESTARTS   AGE
local-jaeger-tracing-es-index-cleaner-1565567700-5cmzh   0/1     Error    0          12h
local-jaeger-tracing-es-index-cleaner-1565567700-5qjhl   0/1     Error    0          12h
local-jaeger-tracing-es-index-cleaner-1565567700-7cgms   0/1     Error    0          12h
local-jaeger-tracing-es-index-cleaner-1565567700-d2rqp   0/1     Error    0          12h
local-jaeger-tracing-es-index-cleaner-1565567700-df7vf   0/1     Error    0          12h
local-jaeger-tracing-es-index-cleaner-1565567700-r8hfw   0/1     Error    0          12h
local-jaeger-tracing-es-index-cleaner-1565567700-w92dw   0/1     Error    0          11h

Here is the generated cronjob resource:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  creationTimestamp: "2019-08-08T13:26:27Z"
  labels:
    app: jaeger
    app.kubernetes.io/component: cronjob-es-index-cleaner
    app.kubernetes.io/instance: local-jaeger-tracing
    app.kubernetes.io/managed-by: jaeger-operator
    app.kubernetes.io/name: local-jaeger-tracing-es-index-cleaner
    app.kubernetes.io/part-of: jaeger
  name: local-jaeger-tracing-es-index-cleaner
  namespace: observability
  ownerReferences:
  - apiVersion: jaegertracing.io/v1
    controller: true
    kind: Jaeger
    name: local-jaeger-tracing
    uid: fe49d753-b9df-11e9-9c3f-00155d25521e
  resourceVersion: "1815932"
  selfLink: /apis/batch/v1beta1/namespaces/observability/cronjobs/local-jaeger-tracing-es-index-cleaner
  uid: 202af614-b9e0-11e9-9c3f-00155d25521e
spec:
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      parallelism: 1
      template:
        metadata:
          annotations:
            prometheus.io/scrape: "false"
            sidecar.istio.io/inject: "false"
          creationTimestamp: null
        spec:
          containers:
          - args:
            - "7"
            - https://invited-guppy-elasticsearch-local-es-http.observability.svc:9200
            envFrom:
            - secretRef:
                name: tracingstack-storage-45aec255
            image: jaegertracing/jaeger-es-index-cleaner
            imagePullPolicy: Always
            name: local-jaeger-tracing-es-index-cleaner
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: 55 23 * * *
  successfulJobsHistoryLimit: 3
  suspend: false
status:
  lastScheduleTime: "2019-08-11T23:55:00Z"

Here are the logs from a job:

$ kubectl --namespace=observability logs local-jaeger-tracing-es-index-cleaner-1565567700-5cmzh
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 345, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 844, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 326, in connect
    ssl_context=context)
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 325, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/local/lib/python3.7/ssl.py", line 423, in wrap_socket
    session=session
  File "/usr/local/lib/python3.7/ssl.py", line 870, in _create
    self.do_handshake()
  File "/usr/local/lib/python3.7/ssl.py", line 1139, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 232, in perform_request
    method, url, body, retries=Retry(False), headers=request_headers, **kw
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 630, in urlopen
    raise SSLError(e)
urllib3.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/curator/utils.py", line 643, in get_indices
    index='_all', params={'expand_wildcards': 'open,closed'})
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/client/utils.py", line 84, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/client/indices.py", line 643, in get_settings
    "GET", _make_path(index, "_settings", name), params=params
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py", line 353, in perform_request
    timeout=timeout,
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 241, in perform_request
    raise SSLError("N/A", str(e), e)
elasticsearch.exceptions.SSLError: ConnectionError([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076)) caused by: SSLError([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/es-index-cleaner/esCleaner.py", line 106, in <module>
    main()
  File "/es-index-cleaner/esCleaner.py", line 40, in main
    ilo = curator.IndexList(client)
  File "/usr/local/lib/python3.7/site-packages/curator/indexlist.py", line 31, in __init__
    self.__get_indices()
  File "/usr/local/lib/python3.7/site-packages/curator/indexlist.py", line 66, in __get_indices
    self.all_indices = utils.get_indices(self.client)
  File "/usr/local/lib/python3.7/site-packages/curator/utils.py", line 653, in get_indices
    raise exceptions.FailedExecution('Failed to get indices. Error: {0}'.format(e))
curator.exceptions.FailedExecution: Failed to get indices. Error: ConnectionError([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076)) caused by: SSLError([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076))

Here is the jaeger CR:

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  creationTimestamp: "2019-08-08T13:25:30Z"
  generation: 2
  name: local-jaeger-tracing
  namespace: observability
  resourceVersion: "1263119"
  selfLink: /apis/jaegertracing.io/v1/namespaces/observability/jaegers/local-jaeger-tracing
  uid: fe49d753-b9df-11e9-9c3f-00155d25521e
spec:
  agent:
    image: jaegertracing/jaeger-agent:1.13
    options: null
    resources: {}
    strategy: Sidecar
    volumeMounts: null
    volumes: null
  allInOne:
    image: docker.io/jaegertracing/all-in-one:1.13.1
    options:
      collector.zipkin.http-port: "9411"
    resources: {}
    volumeMounts: null
    volumes: null
  collector:
    image: ""
    options: null
    replicas: null
    resources: {}
    size: 0
    volumeMounts: null
    volumes: null
  ingester:
    image: ""
    options: null
    replicas: null
    resources: {}
    size: 0
    volumeMounts: null
    volumes: null
  ingress:
    enabled: false
    resources: {}
    security: none
    volumeMounts: null
    volumes: null
  query:
    image: ""
    options: null
    replicas: null
    resources: {}
    size: 0
    volumeMounts: null
    volumes: null
  resources: {}
  sampling:
    options: {}
  storage:
    cassandraCreateSchema:
      datacenter: ""
      enabled: null
      image: ""
      mode: ""
    dependencies:
      cassandraClientAuthEnabled: false
      elasticsearchClientNodeOnly: false
      elasticsearchNodesWanOnly: false
      enabled: true
      image: jaegertracing/spark-dependencies
      javaOpts: ""
      schedule: 55 23 * * *
      sparkMaster: ""
    elasticsearch:
      image: ""
      nodeCount: 1
      redundancyPolicy: ZeroRedundancy
      resources: {}
      storage: {}
    esIndexCleaner:
      enabled: true
      image: jaegertracing/jaeger-es-index-cleaner
      numberOfDays: 7
      schedule: 55 23 * * *
    esRollover:
      conditions: ""
      image: jaegertracing/jaeger-es-rollover
      readTTL: ""
      schedule: '*/30 * * * *'
    options:
      es.server-urls: https://invited-guppy-elasticsearch-local-es-http.observability.svc:9200
      es.tls.ca: /etc/ssl/certs/tls.crt
    secretName: tracingstack-storage-45aec255
    type: elasticsearch
  strategy: allInOne
  ui:
    options: {}
  volumeMounts:
  - mountPath: /etc/ssl/certs
    name: es-tls
  volumes:
  - name: es-tls
    secret:
      defaultMode: 420
      secretName: invited-guppy-elasticsearch-local-es-http-certs-public
status: {}

jpkrohling · 2019-08-12T12:09:30Z

cc @pavolloffay

pavolloffay · 2019-08-12T13:01:22Z

> @jpkrohling I am using an external ES (from the Elastic Cloud on Kubernetes (ECK) Operator) with self-signed certificates and username and password from secrets.

I guess you will have to specify TLS options for the index cleaner (https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/es/esCleaner.py#L22-L25), which is not supported at the moment.

The index cleaner CR exposes only a few options: https://godoc.org/github.com/jaegertracing/jaeger-operator/pkg/apis/jaegertracing/v1#JaegerEsIndexCleanerSpec. Other configuration is derived from storage flags, e.g. --es.username or --es.password. The TLS flags, e.g. --es.tls.*, should be used to set the TLS* env props in the index cleaner and rollover jobs.
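
A rough sketch of the derivation described above, written in Python for brevity (the operator itself is written in Go, and its real logic may differ). The environment variable names follow the ones that appear in the generated cronjobs elsewhere in this thread; the mapping table and the `tls_env_from_options` helper are assumptions for illustration only.

```python
# Hypothetical mapping from storage options to job environment variables.
TLS_OPTION_TO_ENV = {
    "es.tls": "ES_TLS",
    "es.tls.ca": "ES_TLS_CA",
    "es.tls.cert": "ES_TLS_CERT",
    "es.tls.key": "ES_TLS_KEY",
}


def tls_env_from_options(options):
    """Translate es.tls* storage options into ES_TLS* environment variables."""
    env = {}
    for option, value in options.items():
        env_name = TLS_OPTION_TO_ENV.get(option)
        if env_name is not None:
            # Options outside the table (e.g. es.server-urls) are ignored here.
            env[env_name] = str(value)
    return env


if __name__ == "__main__":
    storage_options = {
        "es.server-urls": "https://elasticsearch:9200",
        "es.tls.ca": "/etc/ssl/certs/tls.crt",
    }
    print(tls_env_from_options(storage_options))
```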

pavolloffay · 2019-08-12T13:02:12Z

@dushyant03 were you able to connect to ES using user/pass?

pavolloffay · 2019-08-12T13:10:26Z

I have created #592 for the TLS configuration with ES jobs.

malz · 2019-08-22T06:31:35Z

I'm running into a similar issue, also using the ECK operator. I've mounted the secret in the cronjob spec and supplied the arguments through the env settings, but it still appears to be throwing cert errors.

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  creationTimestamp: "2019-08-14T04:56:43Z"
  labels:
    app: jaeger
    app.kubernetes.io/component: cronjob-es-index-cleaner
    app.kubernetes.io/instance: jaeger
    app.kubernetes.io/managed-by: jaeger-operator
    app.kubernetes.io/name: jaeger-es-index-cleaner
    app.kubernetes.io/part-of: jaeger
  name: jaeger-es-index-cleaner
  namespace: observability
  ownerReferences:
  - apiVersion: jaegertracing.io/v1
    controller: true
    kind: Jaeger
    name: jaeger
    uid: e8f919ee-be4f-11e9-83f3-42010a92006a
  resourceVersion: "208844629"
  selfLink: /apis/batch/v1beta1/namespaces/observability/cronjobs/jaeger-es-index-cleaner
  uid: e93ae0b2-be4f-11e9-83f3-42010a92006a
spec:
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      parallelism: 1
      template:
        metadata:
          annotations:
            prometheus.io/scrape: "false"
            sidecar.istio.io/inject: "false"
          creationTimestamp: null
        spec:
          containers:
          - args:
            - "7"
            - https://store-es-http:9200
            env:
            - name: ES_TLS
              value: "true"
            - name: ES_TLS_CA
              value: /etc/ssl/certs/tls.crt
            envFrom:
            - secretRef:
                name: jaeger-secrets
            image: jaegertracing/jaeger-es-index-cleaner
            imagePullPolicy: Always
            name: jaeger-es-index-cleaner
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /etc/ssl/certs
              name: es-tls
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
          volumes:
          - name: es-tls
            secret:
              defaultMode: 420
              secretName: store-es-http-certs-public
  schedule: 55 23 * * *
  successfulJobsHistoryLimit: 3
  suspend: false
status:
  lastScheduleTime: "2019-08-21T23:55:00Z"

pavolloffay · 2019-08-22T08:55:43Z

@malz remember that the Jaeger operator will revert changes made manually to the objects (you can undeploy the operator to avoid that). I think you also have to use the client cert and key: https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/es/esCleaner.py#L22-L25

malz · 2019-08-23T01:53:41Z

> @malz remember that the Jaeger operator will revert changes made manually to the objects (you can undeploy the operator to avoid that). I think you also have to use the client cert and key: https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/es/esCleaner.py#L22-L25

Of course! I've tried deploying the pod using the client cert and key; however, I'm getting issues with every combination of certs produced by the ES Operator. Not sure if this is necessarily a problem with Jaeger.

pavolloffay · 2019-08-23T09:30:04Z

@malz could you please provide a link to the ES operator you are using? And, if you make it work, the ES CR and Jaeger CR you used? It might be helpful for other folks.

malz · 2019-08-25T23:38:40Z

> @malz could you please provide a link to the ES operator you are using? And, if you make it work, the ES CR and Jaeger CR you used? It might be helpful for other folks.

I'm using the official Elastic Operator: https://github.com/elastic/cloud-on-k8s version 0.0.9, running on GKE. Jaeger collector and query run perfectly but the index cleaning jobs aren't connecting.

pavolloffay · 2019-09-27T07:07:54Z

PR #614 adds TLS options to ES jobs.

If the collector and query are able to connect, then the cron jobs should be able to as well, provided the configuration is correct.

FelixRodriguezJara · 2019-11-08T13:31:27Z

Hello, I am having exactly the same issue. I have tried many things, including the recommendations in this thread, but I can't see what I'm doing wrong. Ingester and Query can connect to ES with no issues; however, the cronjobs fail with the error @secat was having: "self signed certificate in certificate chain". I am using the Jaeger Operator as well, and my storage definition is:

storage:
  type: elasticsearch
  options:
    es:
      server-urls: https://es-deployment-es-http.logging.svc:9200
      tls.ca: /es-certs/ca.crt
      tls.key: /es-certs/tls.key
      tls.cert: /es-certs/tls.crt
      tls:
        skip-host-verify: true
  secretName: es-cred

es-cred contains elastic credentials:

ES_PASSWORD: 24 bytes
ES_USERNAME: 7 bytes

I am mounting the ES certificates from a secret at /es-certs.

This is the cronjob created:

Name:                       pubsub-streaming-es-index-cleaner
Namespace:                  tracing
Labels:                     app=jaeger
                            app.kubernetes.io/component=cronjob-es-index-cleaner
                            app.kubernetes.io/instance=pubsub-streaming
                            app.kubernetes.io/managed-by=jaeger-operator
                            app.kubernetes.io/name=pubsub-streaming-es-index-cleaner
                            app.kubernetes.io/part-of=jaeger
Annotations:                <none>
Schedule:                   55 23 * * *
Concurrency Policy:         Allow
Suspend:                    False
Starting Deadline Seconds:  <unset>
Selector:                   <unset>
Parallelism:                1
Completions:                <unset>
Pod Template:
  Labels:           <none>
  Annotations:      linkerd.io/inject: disabled
                    prometheus.io/scrape: false
                    sidecar.istio.io/inject: false
  Service Account:  pubsub-streaming
  Containers:
   pubsub-streaming-es-index-cleaner:
    Image:      jaegertracing/jaeger-es-index-cleaner:1.14.0
    Port:       <none>
    Host Port:  <none>
    Args:
      7
      https://es-deployment-es-http.logging.svc:9200
    Environment Variables from:
      es-cred  Secret  Optional: false
    Environment:
      ES_TLS_CA:    /es-certs/ca.crt
      ES_TLS_CERT:  /es-certs/tls.crt
      ES_TLS_KEY:   /es-certs/tls.key
    Mounts:
      /es-certs from es-certs (rw)
      /pubsub-cred from google-cloud-key (rw)
  Volumes:
   es-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  es-deployment-es-http-certs-internal
    Optional:    false
   google-cloud-key:
    Type:            Secret (a volume populated by a Secret)
    SecretName:      local-pubsub-key
    Optional:        false
Last Schedule Time:  <unset>
Active Jobs:         <none>
Events:              <none>

Could you please help me understand why the cron jobs don't work with this configuration?

Thank you very much.

pavolloffay · 2019-11-08T15:50:45Z

I will have a look at this shortly. Many people experience problems when using TLS with ES.

In the meantime, could you please paste the logs from the cronjobs here?

Maybe we could disable the Spark job if TLS is enabled.

FelixRodriguezJara · 2019-11-10T08:48:14Z

Hello @pavolloffay. Thank you very much for your quick reaction!

Yes, I've seen many people having this issue when using the operator and, after reading some posts and threads, I haven't found any workaround or final solution; that's why I'm calling for help :).

I have tried pretty much every configuration and some of the changes proposed in the operator to make sure it mounts volumes and secrets required on the cronjob containers. Jaeger Ingester, Collector and Query connect to ES with no issue. Based on my understanding, the configuration should be the same as for the cronjobs (Operator gets it from storage options). I see username and password should be used and I'm passing these from a secret to the jobs via operator, together with the certs I've got from elasticsearch. However, the jobs seem to keep failing. Please find below the logs from the jobs, as per your request.

Regarding the point you've made about disabling the Spark job when TLS is enabled, don't we need the Spark jobs running every day, same as the index cleaner?

If we were to run the Spark and index cleaner cronjobs separately (flagging them as enabled=false in the operator config), do you have any sample YAML I could use where TLS is used to connect to ES?

Thank you very much!

[root@devvm jaeger-operator]# kubectl logs pubsub-streaming-spark-dependencies-1573343700-2jvwd -n tracing
19/11/10 08:27:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/11/10 08:27:12 INFO ElasticsearchDependenciesJob: Running Dependencies job for 2019-11-10T00:00Z, reading from jaeger-span-2019-11-10 index, result storing to jaeger-dependencies-2019-11-10
19/11/10 08:27:13 ERROR NetworkClient: Node [https://10.102.180.152:9200] failed (javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target); no other nodes left - aborting...
Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
	at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:340)
	at org.elasticsearch.hadoop.rest.RestService.findPartitions(RestService.java:220)
	at org.elasticsearch.spark.rdd.AbstractEsRDD.esPartitions$lzycompute(AbstractEsRDD.scala:79)
	at org.elasticsearch.spark.rdd.AbstractEsRDD.esPartitions(AbstractEsRDD.scala:78)
	at org.elasticsearch.spark.rdd.AbstractEsRDD.getPartitions(AbstractEsRDD.scala:48)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:75)
	at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:75)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.immutable.List.map(List.scala:285)
	at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:75)
	at org.apache.spark.rdd.RDD$$anonfun$groupBy$1.apply(RDD.scala:691)
	at org.apache.spark.rdd.RDD$$anonfun$groupBy$1.apply(RDD.scala:691)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.groupBy(RDD.scala:690)
	at org.apache.spark.api.java.JavaRDDLike$class.groupBy(JavaRDDLike.scala:243)
	at org.apache.spark.api.java.AbstractJavaRDDLike.groupBy(JavaRDDLike.scala:45)
	at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:224)
	at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:203)
	at io.jaegertracing.spark.dependencies.DependenciesSparkJob.run(DependenciesSparkJob.java:54)
	at io.jaegertracing.spark.dependencies.DependenciesSparkJob.main(DependenciesSparkJob.java:40)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[https://10.102.180.152:9200]] 
	at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:152)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:424)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:388)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:392)
	at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:168)
	at org.elasticsearch.hadoop.rest.RestClient.mainInfo(RestClient.java:735)
	at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:330)
	... 33 more

[root@devvm jaeger-operator]# kubectl logs pubsub-streaming-es-index-cleaner-1573343700-lpngp -n tracing
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 345, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 844, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 326, in connect
    ssl_context=context)
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 325, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/local/lib/python3.7/ssl.py", line 423, in wrap_socket
    session=session
  File "/usr/local/lib/python3.7/ssl.py", line 870, in _create
    self.do_handshake()
  File "/usr/local/lib/python3.7/ssl.py", line 1139, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 233, in perform_request
    method, url, body, retries=Retry(False), headers=request_headers, **kw
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 630, in urlopen
    raise SSLError(e)
urllib3.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/curator/utils.py", line 643, in get_indices
    index='_all', params={'expand_wildcards': 'open,closed'})
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/client/utils.py", line 84, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/client/indices.py", line 643, in get_settings
    "GET", _make_path(index, "_settings", name), params=params
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py", line 350, in perform_request
    timeout=timeout,
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 242, in perform_request
    raise SSLError("N/A", str(e), e)
elasticsearch.exceptions.SSLError: ConnectionError([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076)) caused by: SSLError([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/es-index-cleaner/esCleaner.py", line 106, in <module>
    main()
  File "/es-index-cleaner/esCleaner.py", line 40, in main
    ilo = curator.IndexList(client)
  File "/usr/local/lib/python3.7/site-packages/curator/indexlist.py", line 31, in __init__
    self.__get_indices()
  File "/usr/local/lib/python3.7/site-packages/curator/indexlist.py", line 66, in __get_indices
    self.all_indices = utils.get_indices(self.client)
  File "/usr/local/lib/python3.7/site-packages/curator/utils.py", line 653, in get_indices
    raise exceptions.FailedExecution('Failed to get indices. Error: {0}'.format(e))
curator.exceptions.FailedExecution: Failed to get indices. Error: ConnectionError([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076)) caused by: SSLError([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076))

Thank you very much, @pavolloffay!

pavolloffay · 2019-11-12T10:53:23Z

To my understanding, all failures in this thread occur when user/pass is combined with a CA cert or the tls.skip-host-verify option.

The ES scripts/cronjobs do not support skipping verification or using the CA cert without es.tls=true. We need to add that support there first. I will submit a PR for it.
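
As a rough illustration of what such support could look like in a Python script like esCleaner.py, here is a minimal sketch that builds an `ssl.SSLContext` from environment variables. `ES_TLS_CA`, `ES_TLS_CERT` and `ES_TLS_KEY` are the names used in this thread; `ES_TLS_SKIP_HOST_VERIFY` and the `build_ssl_context` helper are hypothetical names chosen for the example and are not necessarily what the final PR uses.

```python
import os
import ssl


def build_ssl_context(env=os.environ):
    """Return an ssl.SSLContext for the ES client, or None if no TLS option is set."""
    ca = env.get("ES_TLS_CA")      # path to a CA bundle (optional)
    cert = env.get("ES_TLS_CERT")  # client certificate for mTLS (optional)
    key = env.get("ES_TLS_KEY")    # client key for mTLS (optional)
    # Hypothetical variable name, assumed for this sketch only.
    skip_verify = env.get("ES_TLS_SKIP_HOST_VERIFY", "false").lower() == "true"

    if not (ca or cert or skip_verify):
        return None

    # With cafile=None the system trust store is used instead of a custom CA.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca)

    # A client certificate is only loaded when both halves are present,
    # so "CA only" and "user/pass plus CA" do not require mTLS.
    if cert and key:
        context.load_cert_chain(certfile=cert, keyfile=key)

    if skip_verify:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE

    return context
```

The returned context could then be handed to the client, e.g. `elasticsearch.Elasticsearch(hosts, http_auth=(username, password), ssl_context=context)`, assuming the installed client version accepts the `ssl_context` argument. Disabling verification is only sensible for test clusters.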

FelixRodriguezJara · 2019-11-12T15:37:18Z

Hello @pavolloffay.

Yes to enabling the cronjobs to support the skip-host-verify feature. Can you think of a workaround in the meantime?

Regarding es.tls=true, to rule it out as the problem, I tested creating a cronjob myself instead of using the operator (I couldn't pass this argument to the cronjobs via the operator because of what we have already discussed). The job fails for the same reason, with the same logs ("self signed certificate in certificate chain"), which makes me think that it might not be the only issue.

I've used the same credentials and certificate as for the Query, Ingester and Collector, which connect properly.

The cronjob definition I've used is the following one:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: pubsub-streaming-es-index-cleaner
  namespace: tracing
  labels:
    app: es-jaeger-cleaner
spec:
  # every 1 PM UTC-0
  schedule: "0 13 * * *"
  jobTemplate:
    metadata:
      labels:
        app: es-jaeger-cleaner
    spec:
      template:
        metadata:
          labels:
            app: es-jaeger-cleaner
        spec:
          containers:
          - name: es-jaeger-cleaner
            image: jaegertracing/jaeger-es-index-cleaner:latest
            # clean up ES data indices older than 7 days from now
            args: ["7", "https://es-deployment-es-http.logging:9200"]
            env:
            - name: ES_USERNAME
              valueFrom:
                secretKeyRef:
                  name: es-cred
                  key: ES_USERNAME
            - name: ES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: es-cred
                  key: ES_PASSWORD
            - name: ES_TLS
              value: "true"
            - name: TIMEOUT
              value: "300"
            - name: ES_TLS_CA
              value: /etc/ssl/certs/ca.crt
            - name: ES_TLS_CERT
              value: /etc/ssl/certs/tls.crt
            - name: ES_TLS_KEY
              value: /etc/ssl/certs/tls.key
            volumeMounts:
            - name: es-certs
              mountPath: /etc/ssl/certs
          restartPolicy: OnFailure
          volumes:
          - name: es-certs
            secret:
              secretName: es-deployment-es-http-certs-internal

pavolloffay · 2019-11-12T16:52:28Z

The workaround is not to use insecure TLS and rather use mTLS.

I am working on a fix to allow using insecure and CA cert in python scripts.

pavolloffay · 2019-11-12T17:33:52Z

This can be considered as a duplicate of #592

FelixRodriguezJara · 2019-11-13T09:42:45Z

> The workaround is not to use insecure TLS and rather use mTLS.
>
> I am working on a fix to allow using insecure and CA cert in python scripts.

Alright, thank you very much @pavolloffay, that's great! From a configuration perspective, do we have to make any changes, or will the cronjobs pick up the username, password and CA cert from the storage configuration?

pavolloffay · 2019-11-13T11:59:37Z

There won't be any other configuration required.

Here is a simple CR. I will also improve the docs and probably write a blog post explaining how people can use the Jaeger operator with the Elastic CO operator.

# setup an elasticsearch with `make es`
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simple-prod
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        # Note: This assumes elasticsearch is running in the "default" namespace.
        server-urls: https://quickstart-es-http.default.svc:9200
        use-aliases: true
        tls.ca: /es/secrets/ca.crt
    #        tls.skip-host-verify: true
    #        username: elastic
    #        password: ql7hbmqfzzkrtn6klcdsh8n5
    secretName: jaeger-secret
  volumeMounts:
    - name: secrets
      mountPath: /es/secrets/
      readOnly: true
  volumes:
    - name: secrets
      secret:
        secretName: quickstart-es-http-certs-public
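
The CR above reads the credentials from `jaeger-secret` instead of the commented-out `username`/`password` options. A minimal sketch of generating such a Secret manifest follows, assuming the operator expects the keys `ES_USERNAME` and `ES_PASSWORD` (the key names used earlier in this thread). The helper name `es_credentials_secret` and the password value are placeholders for illustration.

```python
import base64
import json


def es_credentials_secret(name, username, password):
    """Build a Kubernetes Secret manifest (as a dict) holding ES credentials."""

    def encode(value):
        # Values under "data" in a Secret must be base64-encoded strings.
        return base64.b64encode(value.encode("utf-8")).decode("ascii")

    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": {
            "ES_USERNAME": encode(username),
            "ES_PASSWORD": encode(password),
        },
    }


# "changeme" is a placeholder, not a real credential.
secret = es_credentials_secret("jaeger-secret", "elastic", "changeme")
print(json.dumps(secret, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts JSON manifests as well as YAML.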

jpkrohling added the Elasticsearch label (issues related to Elasticsearch storage) on Jul 16, 2019

pavolloffay mentioned this issue on Aug 12, 2019: Support TLS for Elasticsearch index cleaner and rollover jobs #592 (Closed)

malz mentioned this issue on Sep 27, 2019: Export TLS env vars and volumes to index cleaner and rollover jobs #614 (Closed)

pavolloffay closed this as completed on Nov 12, 2019

This was referenced on Nov 12, 2019:

Support TLS in Elasticsearch cron jobs #758 (Merged)

Support insecure TLS and only CA cert for Elasticsearch jaegertracing/jaeger#1918 (Merged)
