Unable to connect to the ES when using K8s secrets #471

Closed
dushyant03 opened this issue Jun 18, 2019 · 29 comments · Fixed by #758
Labels
Elasticsearch The issues related to Elasticsearch storage

Comments

@dushyant03

dushyant03 commented Jun 18, 2019

Hi,

I have set up the ES cleaner CronJob for an externally hosted Elasticsearch cluster, and I am trying to use secrets stored in Kubernetes to connect to the ES cluster.

Below is my YAML:

kind: CronJob
metadata:
  name: es-jaeger-cleaner
  namespace: monitoring
  labels:
    app: es-jaeger-cleaner
    env: staging
spec:
  # every 1 PM UTC-0
  schedule: "0 13 * * *"
  jobTemplate:
    metadata:
      labels:
        app: es-jaeger-cleaner
        env: staging
    spec:
      template:
        metadata:
          labels:
            app: es-jaeger-cleaner
            env: staging
        spec:
          containers:
          - name: es-jaeger-cleaner
            image: jaegertracing/jaeger-es-index-cleaner:latest
            # clean up ES data indices older than 7 days from now
            args: ["7", "https://<esHOST:port>"]
            env:
            - name: TIMEOUT
              value: "300"
            - name: ES_USERNAME
              value: "<username>"
            - name: ES_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: ES_PASSWORD
                  name: <secret_name>
          restartPolicy: OnFailure

It does not work, and I get the error below:

File "/es-index-cleaner/esCleaner.py", line 31, in main
    client = elasticsearch.Elasticsearch(sys.argv[2:], http_auth=(username, password))
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/client/__init__.py", line 206, in __init__
    self.transport = transport_class(_normalize_hosts(hosts), **kwargs)
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py", line 127, in __init__
    self.set_connections(hosts)
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py", line 177, in set_connections
    connections = list(zip(connections, hosts))
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py", line 173, in _create_connection
    return self.connection_class(**kwargs)
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 106, in __init__
    self.headers.update(urllib3.make_headers(basic_auth=http_auth))
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/request.py", line 65, in make_headers
    b64encode(b(basic_auth)).decode('utf-8')
  File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 620, in b
    return s.encode("latin-1")
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 9-13: ordinal not in range(256)

PS: If I hardcode the username and password, it works fine.
Please let me know if I am doing something wrong.
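
The traceback ends in the `s.encode("latin-1")` call inside urllib3, which suggests that the username or password passed to `http_auth` contained at least one character outside Latin-1 when the header was built. The sketch below reproduces that failure with a made-up password and shows one way to inspect a credential before it is used. `find_non_latin1` and `basic_auth_payload` are hypothetical helpers written for this illustration, not part of the cleaner script, and the "user:password" joining step is an assumption about what happens before the encode call.

```python
def find_non_latin1(value):
    """Return (index, character) pairs that cannot be encoded as Latin-1."""
    return [(i, ch) for i, ch in enumerate(value) if ord(ch) > 255]


def basic_auth_payload(username, password):
    """Mimic the failing step: join the credentials and encode them as Latin-1.

    Assumption: the client joins the pair as "user:password" before encoding.
    """
    return "{}:{}".format(username, password).encode("latin-1")


if __name__ == "__main__":
    # Made-up value for illustration; the real secret content is unknown.
    suspicious = "secret\u2022\u2022\u2022"
    print(find_non_latin1(suspicious))
    try:
        basic_auth_payload("elastic", suspicious)
    except UnicodeEncodeError as exc:
        print("cannot build the auth header:", exc.reason)
```

Running a check like this against the value that actually reaches the container (for example, by printing `find_non_latin1(os.environ["ES_PASSWORD"])` in a debug pod) would show which characters trigger the error.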

@pavolloffay
Member

Is the secret mounted into the cleaner job?

@dushyant03
Author

Umm, it's not. I think that is the issue. Let me try.

@dushyant03
Author

@pavolloffay

So I mounted the secret on the cleaner CronJob, and somehow it's not picking it up.
I do get a 401 now, but it does not seem to be using the mounted secret.

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: es-jaeger-cleaner
  namespace: monitoring
  labels:
    app: es-jaeger-cleaner
    env: staging
spec:
  # every 1 PM UTC-0
  schedule: "0 13 * * *"
  jobTemplate:
    metadata:
      labels:
        app: es-jaeger-cleaner
        env: staging
    spec:
      template:
        metadata:
          labels:
            app: es-jaeger-cleaner
            env: staging
        spec:
          containers:
          - name: es-jaeger-cleaner
            image: jaegertracing/jaeger-es-index-cleaner:latest
            # clean up ES data indices older than 7 days from now
            env:
            - name: TIMEOUT
              value: "300"
            - name: ES_USERNAME
              value: "<username>"
            - name: ES_PASSWORD
              value: /var/jaeger-es-secrets/password
            volumeMounts:
            - name: es-secrets
              mountPath: "/var/jaeger-es-secrets"
              readOnly: true
            args: ["7", "https://<ES endpoint:port>"]
          restartPolicy: Never
          volumes:
            - name: es-secrets
              secret:
                secretName: jaeger-es-secrets
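
In the manifest above, `ES_PASSWORD` is set to the literal string `/var/jaeger-es-secrets/password`. A program that reads the variable directly would therefore send the path itself as the password, which would be consistent with the 401. A minimal sketch of the difference, assuming the consumer reads the variable straight from the environment; `password_from_env` and `password_from_file` are hypothetical helpers, not functions from the cleaner image:

```python
import os
import tempfile


def password_from_env(env):
    """Return the value of ES_PASSWORD exactly as set in the environment."""
    return env.get("ES_PASSWORD", "")


def password_from_file(path):
    """Read a secret from a mounted file, dropping a trailing newline."""
    with open(path, encoding="utf-8") as handle:
        return handle.read().rstrip("\n")


if __name__ == "__main__":
    # Simulate a mounted secret with a temporary file.
    with tempfile.TemporaryDirectory() as mount:
        secret_path = os.path.join(mount, "password")
        with open(secret_path, "w", encoding="utf-8") as handle:
            handle.write("s3cr3t\n")
        env = {"ES_PASSWORD": secret_path}
        # The environment variable holds the path, not the secret.
        print("from env: ", password_from_env(env))
        print("from file:", password_from_file(secret_path))
```

With `secretKeyRef` or `envFrom`, Kubernetes places the decoded secret value in the variable itself, so no file read is needed.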

@pavolloffay
Member

A couple of remarks. Changing k8s objects manually is not supported; all changes should be made via the Jaeger CR.

  • JaegerEsIndexCleanerSpec injects the secret defined in the storage spec and should create env variables out of it.
  • If the storage defines the options --es.username/--es.password, it adds ES_USERNAME/ES_PASSWORD as env vars.

envFromSource = append(envFromSource, corev1.EnvFromSource{

@secat
Contributor

secat commented Jun 25, 2019

When using an already provisioned elasticsearch cluster with basic authentication and TLS enabled, what is the key/map format of the secret that we need to provide to Jaeger in the storage section?

    storage:
      type: elasticsearch
      options:
        es:
          server-urls: http://elasticsearch:9200
      secretName: jaeger-secrets

@objectiser
Contributor

As shown here, using the keys ES_USERNAME and ES_PASSWORD.
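
For illustration, a Secret with those two keys could be generated as follows. This is only a sketch: the secret name matches the `secretName` from the question, the namespace and credentials are placeholders, and `build_es_secret` is a hypothetical helper. It emits JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import base64
import json


def build_es_secret(name, namespace, username, password):
    """Build a Secret manifest holding ES_USERNAME and ES_PASSWORD."""

    def encode(value):
        # Values under "data" must be base64-encoded.
        return base64.b64encode(value.encode("utf-8")).decode("ascii")

    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            "ES_USERNAME": encode(username),
            "ES_PASSWORD": encode(password),
        },
    }


if __name__ == "__main__":
    # Placeholder namespace and credentials, for illustration only.
    manifest = build_es_secret("jaeger-secrets", "default", "elastic", "changeme")
    print(json.dumps(manifest, indent=2))
```

Encoding the values from the original strings, rather than pasting pre-encoded text, avoids storing a value that has been encoded twice.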

@secat
Contributor

secat commented Jun 25, 2019

@objectiser thank you!

jpkrohling added the Elasticsearch label Jul 16, 2019
@jpkrohling
Contributor

@kevinearls, @jkandasa: would one of you be able to test the use case mentioned by @dushyant03? It should be possible to use an external ES, with username and password coming from secrets.

@kevinearls
Contributor

@jpkrohling Not in the short term, I have a few too many other things queued up. @jkandasa do you have time?

@secat
Contributor

secat commented Aug 12, 2019

@jpkrohling I am using an external ES (from the Elastic Cloud on Kubernetes (ECK) operator) with self-signed certificates and a username and password from secrets.

The resources generated by the operator seem wrong and don't work.

$ kubectl --namespace=observability get po -l job-name=local-jaeger-tracing-es-index-cleaner-1565567700
NAME                                                     READY   STATUS   RESTARTS   AGE
local-jaeger-tracing-es-index-cleaner-1565567700-5cmzh   0/1     Error    0          12h
local-jaeger-tracing-es-index-cleaner-1565567700-5qjhl   0/1     Error    0          12h
local-jaeger-tracing-es-index-cleaner-1565567700-7cgms   0/1     Error    0          12h
local-jaeger-tracing-es-index-cleaner-1565567700-d2rqp   0/1     Error    0          12h
local-jaeger-tracing-es-index-cleaner-1565567700-df7vf   0/1     Error    0          12h
local-jaeger-tracing-es-index-cleaner-1565567700-r8hfw   0/1     Error    0          12h
local-jaeger-tracing-es-index-cleaner-1565567700-w92dw   0/1     Error    0          11h

Here is the generated cronjob resource:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  creationTimestamp: "2019-08-08T13:26:27Z"
  labels:
    app: jaeger
    app.kubernetes.io/component: cronjob-es-index-cleaner
    app.kubernetes.io/instance: local-jaeger-tracing
    app.kubernetes.io/managed-by: jaeger-operator
    app.kubernetes.io/name: local-jaeger-tracing-es-index-cleaner
    app.kubernetes.io/part-of: jaeger
  name: local-jaeger-tracing-es-index-cleaner
  namespace: observability
  ownerReferences:
  - apiVersion: jaegertracing.io/v1
    controller: true
    kind: Jaeger
    name: local-jaeger-tracing
    uid: fe49d753-b9df-11e9-9c3f-00155d25521e
  resourceVersion: "1815932"
  selfLink: /apis/batch/v1beta1/namespaces/observability/cronjobs/local-jaeger-tracing-es-index-cleaner
  uid: 202af614-b9e0-11e9-9c3f-00155d25521e
spec:
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      parallelism: 1
      template:
        metadata:
          annotations:
            prometheus.io/scrape: "false"
            sidecar.istio.io/inject: "false"
          creationTimestamp: null
        spec:
          containers:
          - args:
            - "7"
            - https://invited-guppy-elasticsearch-local-es-http.observability.svc:9200
            envFrom:
            - secretRef:
                name: tracingstack-storage-45aec255
            image: jaegertracing/jaeger-es-index-cleaner
            imagePullPolicy: Always
            name: local-jaeger-tracing-es-index-cleaner
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: 55 23 * * *
  successfulJobsHistoryLimit: 3
  suspend: false
status:
  lastScheduleTime: "2019-08-11T23:55:00Z"

Here are the logs from a job:

$ kubectl --namespace=observability logs local-jaeger-tracing-es-index-cleaner-1565567700-5cmzh
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 345, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 844, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 326, in connect
    ssl_context=context)
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 325, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/local/lib/python3.7/ssl.py", line 423, in wrap_socket
    session=session
  File "/usr/local/lib/python3.7/ssl.py", line 870, in _create
    self.do_handshake()
  File "/usr/local/lib/python3.7/ssl.py", line 1139, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 232, in perform_request
    method, url, body, retries=Retry(False), headers=request_headers, **kw
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 630, in urlopen
    raise SSLError(e)
urllib3.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/curator/utils.py", line 643, in get_indices
    index='_all', params={'expand_wildcards': 'open,closed'})
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/client/utils.py", line 84, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/client/indices.py", line 643, in get_settings
    "GET", _make_path(index, "_settings", name), params=params
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py", line 353, in perform_request
    timeout=timeout,
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 241, in perform_request
    raise SSLError("N/A", str(e), e)
elasticsearch.exceptions.SSLError: ConnectionError([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076)) caused by: SSLError([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/es-index-cleaner/esCleaner.py", line 106, in <module>
    main()
  File "/es-index-cleaner/esCleaner.py", line 40, in main
    ilo = curator.IndexList(client)
  File "/usr/local/lib/python3.7/site-packages/curator/indexlist.py", line 31, in __init__
    self.__get_indices()
  File "/usr/local/lib/python3.7/site-packages/curator/indexlist.py", line 66, in __get_indices
    self.all_indices = utils.get_indices(self.client)
  File "/usr/local/lib/python3.7/site-packages/curator/utils.py", line 653, in get_indices
    raise exceptions.FailedExecution('Failed to get indices. Error: {0}'.format(e))
curator.exceptions.FailedExecution: Failed to get indices. Error: ConnectionError([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076)) caused by: SSLError([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076))

Here is the jaeger CR:

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  creationTimestamp: "2019-08-08T13:25:30Z"
  generation: 2
  name: local-jaeger-tracing
  namespace: observability
  resourceVersion: "1263119"
  selfLink: /apis/jaegertracing.io/v1/namespaces/observability/jaegers/local-jaeger-tracing
  uid: fe49d753-b9df-11e9-9c3f-00155d25521e
spec:
  agent:
    image: jaegertracing/jaeger-agent:1.13
    options: null
    resources: {}
    strategy: Sidecar
    volumeMounts: null
    volumes: null
  allInOne:
    image: docker.io/jaegertracing/all-in-one:1.13.1
    options:
      collector.zipkin.http-port: "9411"
    resources: {}
    volumeMounts: null
    volumes: null
  collector:
    image: ""
    options: null
    replicas: null
    resources: {}
    size: 0
    volumeMounts: null
    volumes: null
  ingester:
    image: ""
    options: null
    replicas: null
    resources: {}
    size: 0
    volumeMounts: null
    volumes: null
  ingress:
    enabled: false
    resources: {}
    security: none
    volumeMounts: null
    volumes: null
  query:
    image: ""
    options: null
    replicas: null
    resources: {}
    size: 0
    volumeMounts: null
    volumes: null
  resources: {}
  sampling:
    options: {}
  storage:
    cassandraCreateSchema:
      datacenter: ""
      enabled: null
      image: ""
      mode: ""
    dependencies:
      cassandraClientAuthEnabled: false
      elasticsearchClientNodeOnly: false
      elasticsearchNodesWanOnly: false
      enabled: true
      image: jaegertracing/spark-dependencies
      javaOpts: ""
      schedule: 55 23 * * *
      sparkMaster: ""
    elasticsearch:
      image: ""
      nodeCount: 1
      redundancyPolicy: ZeroRedundancy
      resources: {}
      storage: {}
    esIndexCleaner:
      enabled: true
      image: jaegertracing/jaeger-es-index-cleaner
      numberOfDays: 7
      schedule: 55 23 * * *
    esRollover:
      conditions: ""
      image: jaegertracing/jaeger-es-rollover
      readTTL: ""
      schedule: '*/30 * * * *'
    options:
      es.server-urls: https://invited-guppy-elasticsearch-local-es-http.observability.svc:9200
      es.tls.ca: /etc/ssl/certs/tls.crt
    secretName: tracingstack-storage-45aec255
    type: elasticsearch
  strategy: allInOne
  ui:
    options: {}
  volumeMounts:
  - mountPath: /etc/ssl/certs
    name: es-tls
  volumes:
  - name: es-tls
    secret:
      defaultMode: 420
      secretName: invited-guppy-elasticsearch-local-es-http-certs-public
status: {}

@jpkrohling
Contributor

cc @pavolloffay

@pavolloffay
Member

> @jpkrohling I am using an external ES (from the Elastic Cloud on Kubernetes (ECK) operator) with self-signed certificates and a username and password from secrets.

I guess you will have to specify TLS options for the index cleaner (https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/es/esCleaner.py#L22-L25), which is not supported at the moment.

The index cleaner CR exposes only a few options: https://godoc.org/github.com/jaegertracing/jaeger-operator/pkg/apis/jaegertracing/v1#JaegerEsIndexCleanerSpec. Other configuration is derived from storage flags, e.g. --es.username or --es.password. The TLS flags, e.g. --es.tls.*, should be used to set the TLS* env props in the index cleaner and rollover jobs.
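
To make the idea concrete, here is a rough sketch of how a Python job could turn such env props into an SSL context. The variable names `ES_TLS`, `ES_TLS_CA`, `ES_TLS_CERT` and `ES_TLS_KEY` are the ones that appear in the manifests posted in this thread; how the real esCleaner.py interprets them is not shown here, so the logic below is an assumption, and `str2bool` and `tls_context_from_env` are hypothetical helpers.

```python
import os
import ssl


def str2bool(value):
    """Interpret common truthy strings such as "true" or "1"."""
    return value.strip().lower() in ("1", "true", "yes")


def tls_context_from_env(env):
    """Build an SSLContext from ES_TLS* variables, or return None when ES_TLS is off."""
    if not str2bool(env.get("ES_TLS", "false")):
        return None
    # With cafile=None the system trust store is used; a private CA has to be
    # passed explicitly, otherwise verification of a self-signed chain fails.
    context = ssl.create_default_context(cafile=env.get("ES_TLS_CA"))
    cert, key = env.get("ES_TLS_CERT"), env.get("ES_TLS_KEY")
    if cert and key:
        # Optional client certificate for mutual TLS.
        context.load_cert_chain(certfile=cert, keyfile=key)
    return context


if __name__ == "__main__":
    context = tls_context_from_env(dict(os.environ))
    print("TLS context configured:", context is not None)
```

In this sketch, setting only the CA path without enabling `ES_TLS` leaves the context unset, so the client would fall back to its default trust store.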

@pavolloffay
Member

@dushyant03 were you able to connect to ES using user/pass?

@pavolloffay
Member

I have created #592 for the TLS configuration with ES jobs.

@malz

malz commented Aug 22, 2019

I'm running into a similar issue, also using the ECK operator. I've mounted the secret in the CronJob spec and supplied the arguments through the env settings, but it still appears to be throwing cert errors.

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  creationTimestamp: "2019-08-14T04:56:43Z"
  labels:
    app: jaeger
    app.kubernetes.io/component: cronjob-es-index-cleaner
    app.kubernetes.io/instance: jaeger
    app.kubernetes.io/managed-by: jaeger-operator
    app.kubernetes.io/name: jaeger-es-index-cleaner
    app.kubernetes.io/part-of: jaeger
  name: jaeger-es-index-cleaner
  namespace: observability
  ownerReferences:
  - apiVersion: jaegertracing.io/v1
    controller: true
    kind: Jaeger
    name: jaeger
    uid: e8f919ee-be4f-11e9-83f3-42010a92006a
  resourceVersion: "208844629"
  selfLink: /apis/batch/v1beta1/namespaces/observability/cronjobs/jaeger-es-index-cleaner
  uid: e93ae0b2-be4f-11e9-83f3-42010a92006a
spec:
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      parallelism: 1
      template:
        metadata:
          annotations:
            prometheus.io/scrape: "false"
            sidecar.istio.io/inject: "false"
          creationTimestamp: null
        spec:
          containers:
          - args:
            - "7"
            - https://store-es-http:9200
            env:
            - name: ES_TLS
              value: "true"
            - name: ES_TLS_CA
              value: /etc/ssl/certs/tls.crt
            envFrom:
            - secretRef:
                name: jaeger-secrets
            image: jaegertracing/jaeger-es-index-cleaner
            imagePullPolicy: Always
            name: jaeger-es-index-cleaner
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /etc/ssl/certs
              name: es-tls
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
          volumes:
          - name: es-tls
            secret:
              defaultMode: 420
              secretName: store-es-http-certs-public
  schedule: 55 23 * * *
  successfulJobsHistoryLimit: 3
  suspend: false
status:
  lastScheduleTime: "2019-08-21T23:55:00Z"

@pavolloffay
Member

@malz remember that the Jaeger operator will revert changes made manually to the objects (you can undeploy the operator to avoid this). I think you also have to use the client cert and key: https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/es/esCleaner.py#L22-L25

@malz

malz commented Aug 23, 2019

> @malz remember that the Jaeger operator will revert changes made manually to the objects (you can undeploy the operator to avoid this). I think you also have to use the client cert and key: https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/es/esCleaner.py#L22-L25

Of course! I've tried deploying the pod using the client cert and key; however, I'm getting issues with every combination of certs produced by the ES operator. I'm not sure this is necessarily a problem with Jaeger.

@pavolloffay
Member

@malz could you please provide a link to the ES operator you are using? And if you get it working, please share the ES CR and Jaeger CR you used. It might be helpful for other folks.

@malz

malz commented Aug 25, 2019

> @malz could you please provide a link to the ES operator you are using? And if you get it working, please share the ES CR and Jaeger CR you used. It might be helpful for other folks.

I'm using the official Elastic Operator: https://github.com/elastic/cloud-on-k8s version 0.0.9, running on GKE. Jaeger collector and query run perfectly but the index cleaning jobs aren't connecting.

@pavolloffay
Member

PR #614 adds TLS options to ES jobs.

If the collector and query are able to connect, then the cron jobs should be able to as well, provided the configuration is correct.
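
One way to check this outside the job is to attempt a TLS handshake against the endpoint with the same CA file that the collector uses. The following is a sketch using only the Python standard library; `endpoint_from_url` and `check_tls` are hypothetical helpers, and the script is meant to be run from a pod that has the certificate mounted.

```python
import socket
import ssl
import sys
from urllib.parse import urlsplit


def endpoint_from_url(url):
    """Split a server URL into (host, port), defaulting the port by scheme."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def check_tls(url, ca_file=None, timeout=5.0):
    """Try a verified TLS handshake and describe the outcome as a string."""
    host, port = endpoint_from_url(url)
    try:
        context = ssl.create_default_context(cafile=ca_file)
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "handshake ok"
    except ssl.SSLCertVerificationError as exc:
        return "certificate verification failed: " + str(exc.verify_message)
    except OSError as exc:
        return "connection failed: " + str(exc)


if __name__ == "__main__":
    # Usage: python check_tls.py <server-url> [<ca-file>]
    if len(sys.argv) > 1:
        ca = sys.argv[2] if len(sys.argv) > 2 else None
        print(check_tls(sys.argv[1], ca))
```

For example, `python check_tls.py https://store-es-http:9200 /etc/ssl/certs/tls.crt` uses the URL and CA path from the manifest posted above. If the handshake succeeds with the CA file but fails without it, the job is most likely not picking up the CA.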

@FelixRodriguezJara

FelixRodriguezJara commented Nov 8, 2019

Hello, I am having exactly the same issue. I have tried many things, including the recommendations in this thread, but I can't see what I'm doing wrong. Ingester and Query can connect to ES with no issues; however, the cron jobs fail with the error @secat was having: "self signed certificate in certificate chain". I am using the Jaeger Operator as well, and my storage definition is:

  storage:
    type: elasticsearch
    options:
      es:
        server-urls: https://es-deployment-es-http.logging.svc:9200
        tls.ca: /es-certs/ca.crt
        tls.key: /es-certs/tls.key
        tls.cert: /es-certs/tls.crt
        tls:
          skip-host-verify: true
    secretName: es-cred

es-cred contains elastic credentials:

ES_PASSWORD: 24 bytes
ES_USERNAME: 7 bytes

I am mounting es certificates through a secret on top of /es-certs.

This is the cronjob created:

Name:                       pubsub-streaming-es-index-cleaner
Namespace:                  tracing
Labels:                     app=jaeger
                            app.kubernetes.io/component=cronjob-es-index-cleaner
                            app.kubernetes.io/instance=pubsub-streaming
                            app.kubernetes.io/managed-by=jaeger-operator
                            app.kubernetes.io/name=pubsub-streaming-es-index-cleaner
                            app.kubernetes.io/part-of=jaeger
Annotations:                <none>
Schedule:                   55 23 * * *
Concurrency Policy:         Allow
Suspend:                    False
Starting Deadline Seconds:  <unset>
Selector:                   <unset>
Parallelism:                1
Completions:                <unset>
Pod Template:
  Labels:           <none>
  Annotations:      linkerd.io/inject: disabled
                    prometheus.io/scrape: false
                    sidecar.istio.io/inject: false
  Service Account:  pubsub-streaming
  Containers:
   pubsub-streaming-es-index-cleaner:
    Image:      jaegertracing/jaeger-es-index-cleaner:1.14.0
    Port:       <none>
    Host Port:  <none>
    Args:
      7
      https://es-deployment-es-http.logging.svc:9200
    Environment Variables from:
      es-cred  Secret  Optional: false
    Environment:
      ES_TLS_CA:    /es-certs/ca.crt
      ES_TLS_CERT:  /es-certs/tls.crt
      ES_TLS_KEY:   /es-certs/tls.key
    Mounts:
      /es-certs from es-certs (rw)
      /pubsub-cred from google-cloud-key (rw)
  Volumes:
   es-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  es-deployment-es-http-certs-internal
    Optional:    false
   google-cloud-key:
    Type:            Secret (a volume populated by a Secret)
    SecretName:      local-pubsub-key
    Optional:        false
Last Schedule Time:  <unset>
Active Jobs:         <none>
Events:              <none>

Could you please help me understand why the cron jobs don't work with this configuration?

Thank you very much.

@pavolloffay
Member

pavolloffay commented Nov 8, 2019

I will have a look at this shortly. Many people experience problems when using TLS with ES.

In the meantime, could you please paste the logs from the cron jobs here?

  • maybe we could disable sparkjob if TLS is enabled

@FelixRodriguezJara

Hello @pavolloffay. Thank you very much for your quick reaction!

Yes, I've seen many people having this issue when using the operator and, after reading some posts and threads, I haven't found any workaround or final solution. That's why I'm calling for help :).

I have tried pretty much every configuration and some of the changes proposed in the operator to make sure it mounts volumes and secrets required on the cronjob containers. Jaeger Ingester, Collector and Query connect to ES with no issue. Based on my understanding, the configuration should be the same as for the cronjobs (Operator gets it from storage options). I see username and password should be used and I'm passing these from a secret to the jobs via operator, together with the certs I've got from elasticsearch. However, the jobs seem to keep failing. Please find below the logs from the jobs, as per your request.

Regarding the point you've made about disabling the Spark job when TLS is enabled, don't we need the Spark jobs to run every day, the same as the index cleaner?

If we were to run the Spark and index cleaner cron jobs separately (flagging them as enable=false in the operator config), do you have any sample YAML I could use where TLS is used to connect to ES?

Thank you very much!

[root@devvm jaeger-operator]# kubectl logs pubsub-streaming-spark-dependencies-1573343700-2jvwd -n tracing
19/11/10 08:27:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/11/10 08:27:12 INFO ElasticsearchDependenciesJob: Running Dependencies job for 2019-11-10T00:00Z, reading from jaeger-span-2019-11-10 index, result storing to jaeger-dependencies-2019-11-10
19/11/10 08:27:13 ERROR NetworkClient: Node [https://10.102.180.152:9200] failed (javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target); no other nodes left - aborting...
Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
	at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:340)
	at org.elasticsearch.hadoop.rest.RestService.findPartitions(RestService.java:220)
	at org.elasticsearch.spark.rdd.AbstractEsRDD.esPartitions$lzycompute(AbstractEsRDD.scala:79)
	at org.elasticsearch.spark.rdd.AbstractEsRDD.esPartitions(AbstractEsRDD.scala:78)
	at org.elasticsearch.spark.rdd.AbstractEsRDD.getPartitions(AbstractEsRDD.scala:48)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:75)
	at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:75)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.immutable.List.map(List.scala:285)
	at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:75)
	at org.apache.spark.rdd.RDD$$anonfun$groupBy$1.apply(RDD.scala:691)
	at org.apache.spark.rdd.RDD$$anonfun$groupBy$1.apply(RDD.scala:691)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.groupBy(RDD.scala:690)
	at org.apache.spark.api.java.JavaRDDLike$class.groupBy(JavaRDDLike.scala:243)
	at org.apache.spark.api.java.AbstractJavaRDDLike.groupBy(JavaRDDLike.scala:45)
	at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:224)
	at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:203)
	at io.jaegertracing.spark.dependencies.DependenciesSparkJob.run(DependenciesSparkJob.java:54)
	at io.jaegertracing.spark.dependencies.DependenciesSparkJob.main(DependenciesSparkJob.java:40)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[https://10.102.180.152:9200]] 
	at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:152)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:424)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:388)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:392)
	at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:168)
	at org.elasticsearch.hadoop.rest.RestClient.mainInfo(RestClient.java:735)
	at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:330)
	... 33 more
[root@devvm jaeger-operator]# kubectl logs pubsub-streaming-es-index-cleaner-1573343700-lpngp -n tracing
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 345, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 844, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 326, in connect
    ssl_context=context)
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 325, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/local/lib/python3.7/ssl.py", line 423, in wrap_socket
    session=session
  File "/usr/local/lib/python3.7/ssl.py", line 870, in _create
    self.do_handshake()
  File "/usr/local/lib/python3.7/ssl.py", line 1139, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 233, in perform_request
    method, url, body, retries=Retry(False), headers=request_headers, **kw
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 630, in urlopen
    raise SSLError(e)
urllib3.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/curator/utils.py", line 643, in get_indices
    index='_all', params={'expand_wildcards': 'open,closed'})
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/client/utils.py", line 84, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/client/indices.py", line 643, in get_settings
    "GET", _make_path(index, "_settings", name), params=params
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py", line 350, in perform_request
    timeout=timeout,
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 242, in perform_request
    raise SSLError("N/A", str(e), e)
elasticsearch.exceptions.SSLError: ConnectionError([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076)) caused by: SSLError([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/es-index-cleaner/esCleaner.py", line 106, in <module>
    main()
  File "/es-index-cleaner/esCleaner.py", line 40, in main
    ilo = curator.IndexList(client)
  File "/usr/local/lib/python3.7/site-packages/curator/indexlist.py", line 31, in __init__
    self.__get_indices()
  File "/usr/local/lib/python3.7/site-packages/curator/indexlist.py", line 66, in __get_indices
    self.all_indices = utils.get_indices(self.client)
  File "/usr/local/lib/python3.7/site-packages/curator/utils.py", line 653, in get_indices
    raise exceptions.FailedExecution('Failed to get indices. Error: {0}'.format(e))
curator.exceptions.FailedExecution: Failed to get indices. Error: ConnectionError([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076)) caused by: SSLError([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1076))

Thank you very much, @pavolloffay!

@pavolloffay
Member

As far as I understand, all failures in this thread occur when username/password authentication is combined with a CA cert or the tls.skip-host-verify option.

The ES scripts/cronjobs do not support skipping verification, nor using the CA without es.tls=true. We need to add that support there first. I will submit a PR for it.
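
For illustration only, here is a minimal sketch of how a Python script could honor a CA cert and a skip-verify switch independently of es.tls=true. It is not the actual esCleaner.py code. ES_TLS, ES_TLS_CA, ES_TLS_CERT and ES_TLS_KEY are the variable names used in this thread; ES_TLS_SKIP_HOST_VERIFY and the function name build_ssl_context are assumptions made up for the example. The sketch also assumes that "skip verify" disables both hostname and chain checks.

```python
import os
import ssl


def build_ssl_context(env=None):
    """Return an ssl.SSLContext for the ES client, or None when no TLS option is set."""
    env = os.environ if env is None else env
    tls_enabled = env.get("ES_TLS", "false").lower() == "true"
    ca_file = env.get("ES_TLS_CA")
    # Hypothetical variable name, not taken from the thread.
    skip_verify = env.get("ES_TLS_SKIP_HOST_VERIFY", "false").lower() == "true"

    if not (tls_enabled or ca_file or skip_verify):
        return None  # nothing TLS-related configured

    # With cafile=None the system trust store is loaded instead.
    context = ssl.create_default_context(cafile=ca_file)
    if skip_verify:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE

    # A client certificate is only loaded for mTLS (es.tls=true).
    cert, key = env.get("ES_TLS_CERT"), env.get("ES_TLS_KEY")
    if tls_enabled and cert and key:
        context.load_cert_chain(certfile=cert, keyfile=key)
    return context
```

In elasticsearch-py versions that accept an ssl_context argument, the returned context could be passed to the client together with http_auth, so that user/pass and the CA cert no longer depend on es.tls=true.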

@FelixRodriguezJara

Hello @pavolloffay.

Yes to enabling cronjobs to support the skip-host-verify feature. Can you think of a workaround in the meantime?

Regarding es.tls=true: to rule it out as the problem, I created a cronjob by hand instead of using the operator (I couldn't pass this argument to the cronjobs via the operator, as we have already discussed). The job fails for the same reason, with the same logs ("self signed certificate in certificate chain"), which makes me think that it might not be the only issue.

I've used the same credentials and certificate as for the Query, Ingester and Collector, which connect properly.

The cronjob definition I've used is the following:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: pubsub-streaming-es-index-cleaner
  namespace: tracing
  labels:
    app: es-jaeger-cleaner
spec:
  # every 1 PM UTC-0
  schedule: "0 13 * * *"
  jobTemplate:
    metadata:
      labels:
        app: es-jaeger-cleaner
    spec:
      template:
        metadata:
          labels:
            app: es-jaeger-cleaner
        spec:
          containers:
          - name: es-jaeger-cleaner
            image: jaegertracing/jaeger-es-index-cleaner:latest
            # clean up ES data indices older than 7 days from now
            args: ["7", "https://es-deployment-es-http.logging:9200"]
            env:
            - name: ES_USERNAME
              valueFrom:
                secretKeyRef:
                  name: es-cred
                  key: ES_USERNAME
            - name: ES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: es-cred
                  key: ES_PASSWORD
            - name: ES_TLS
              value: "true"
            - name: TIMEOUT
              value: "300"
            - name: ES_TLS_CA
              value: /etc/ssl/certs/ca.crt
            - name: ES_TLS_CERT
              value: /etc/ssl/certs/tls.crt
            - name: ES_TLS_KEY
              value: /etc/ssl/certs/tls.key
            volumeMounts:
            - name: es-certs
              mountPath: /etc/ssl/certs
          restartPolicy: OnFailure
          volumes:
          - name: es-certs
            secret:
              secretName: es-deployment-es-http-certs-internal
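
One way to narrow this down outside the cleaner script is to attempt a verified TLS handshake against the same endpoint with the same CA file, for example from a debug pod that mounts the same secret. The following is a minimal sketch using only the Python standard library; the function name verify_endpoint is made up for this example, and the host, port and CA path in the usage line are the ones from the cronjob definition above.

```python
import socket
import ssl
from typing import Optional, Tuple


def verify_endpoint(host: str, port: int, cafile: Optional[str] = None,
                    timeout: float = 5.0) -> Tuple[bool, str]:
    """Attempt a verified TLS handshake and report whether it succeeded."""
    try:
        # Trust the given CA file, or the system trust store when cafile is None.
        context = ssl.create_default_context(cafile=cafile)
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, "handshake ok, protocol " + str(tls.version())
    except ssl.SSLCertVerificationError as exc:
        # Same class of failure as in the traceback above.
        return False, "certificate verification failed: " + exc.verify_message
    except (ssl.SSLError, OSError) as exc:
        # Covers a missing CA file, DNS errors, refused connections and timeouts.
        return False, "connection failed: " + str(exc)


if __name__ == "__main__":
    print(verify_endpoint("es-deployment-es-http.logging", 9200,
                          "/etc/ssl/certs/ca.crt"))
```

If this reports a verification failure with the mounted ca.crt, the CA file probably does not match the chain served by the endpoint. If it succeeds, the script is more likely not picking up the CA file at all.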

@pavolloffay
Member

The workaround is not to use insecure TLS and rather use mTLS.

I am working on a fix to allow using insecure and CA cert in python scripts.

@pavolloffay
Member

This can be considered as a duplicate of #592

@FelixRodriguezJara

> The workaround is not to use insecure TLS and rather use mTLS.
>
> I am working on a fix to allow using insecure and CA cert in python scripts.

Alright, thank you very much @pavolloffay, that's great! From a configuration perspective, do we have to make any changes, or will the cronjobs pick up the username, password and CA cert from the storage configuration?

@pavolloffay
Member

There won't be any other configuration required.

Here is a simple CR. I will also improve the docs and probably write a blog post explaining how people can use the Jaeger operator with the Elastic CO operator.

# setup an elasticsearch with `make es`
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simple-prod
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        # Note: This assumes elasticsearch is running in the "default" namespace.
        server-urls: https://quickstart-es-http.default.svc:9200
        use-aliases: true
        tls.ca: /es/secrets/ca.crt
    #        tls.skip-host-verify: true
    #        username: elastic
    #        password: ql7hbmqfzzkrtn6klcdsh8n5
    secretName: jaeger-secret
  volumeMounts:
    - name: secrets
      mountPath: /es/secrets/
      readOnly: true
  volumes:
    - name: secrets
      secret:
        secretName: quickstart-es-http-certs-public
