Collector cannot push to elasticsearch behind TLS proxy #485

Closed
tanner-bruce opened this Issue Oct 20, 2017 · 17 comments

tanner-bruce commented Oct 20, 2017

We are running a remote collector that should push to an Elasticsearch cluster behind a TLS reverse proxy.

Upon startup, the Docker container dies after printing this message:
{"level":"fatal","ts":1508514781.8733246,"caller":"collector/main.go:86","msg":"Unable to set up builder","error":"health check timeout: no Elasticsearch node available","errorVerbose":"no Elasticsearch node available\ngithub.com/uber/jaeger/vendor/github.com/olivere/elastic.init\n\t/home/travis/gopath/src/github.com/uber/jaeger/vendor/github.com/olivere/elastic/client.go:84\ngithub.com/uber/jaeger/pkg/es/config.init\n\t/home/travis/gopath/src/github.com/uber/jaeger/pkg/es/config/config.go:102\ngithub.com/uber/jaeger/cmd/builder.init\n\t/home/travis/gopath/src/github.com/uber/jaeger/cmd/builder/doc.go:20\nmain.init\n\t/home/travis/gopath/src/github.com/uber/jaeger/cmd/collector/main.go:161\nruntime.main\n\t/home/travis/.gimme/versions/go1.7.linux.amd64/src/runtime/proc.go:172\nruntime.goexit\n\t/home/travis/.gimme/versions/go1.7.linux.amd64/src/runtime/asm_amd64.s:2086\nhealth check timeout\ngithub.com/uber/jaeger/vendor/github.com/olivere/elastic.(*Client).startupHealthcheck\n\t/home/travis/gopath/src/github.com/uber/jaeger/vendor/github.com/olivere/elastic/client.go:1067\ngithub.com/uber/jaeger/vendor/github.com/olivere/elastic.NewClient\n\t/home/travis/gopath/src/github.com/uber/jaeger/vendor/github.com/olivere/elastic/client.go:240\ngithub.com/uber/jaeger/pkg/es/config.(*Configuration).NewClient\n\t/home/travis/gopath/src/github.com/uber/jaeger/pkg/es/config/config.go:50\ngithub.com/uber/jaeger/cmd/collector/app/builder.(*SpanHandlerBuilder).initElasticStore\n\t/home/travis/gopath/src/github.com/uber/jaeger/cmd/collector/app/builder/span_handler_builder.go:102\ngithub.com/uber/jaeger/cmd/collector/app/builder.NewSpanHandlerBuilder\n\t/home/travis/gopath/src/github.com/uber/jaeger/cmd/collector/app/builder/span_handler_builder.go:75\nmain.main.func1\n\t/home/travis/gopath/src/github.com/uber/jaeger/cmd/collector/main.go:84\ngithub.com/uber/jaeger/vendor/github.com/spf13/cobra.(*Command).execute\n\t/home/travis/gopath/src/github.com/uber/jaeger/vendor/github.com/spf13/cobra/command.go:636\ngithub.com/uber/jaeger/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/home/travis/gopath/src/github.com/uber/jaeger/vendor/github.com/spf13/cobra/command.go:722\ngithub.com/uber/jaeger/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/home/travis/gopath/src/github.com/uber/jaeger/vendor/github.com/spf13/cobra/command.go:681\nmain.main\n\t/home/travis/gopath/src/github.com/uber/jaeger/cmd/collector/main.go:139\nruntime.main\n\t/home/travis/.gimme/versions/go1.7.linux.amd64/src/runtime/proc.go:183\nruntime.goexit\n\t/home/travis/.gimme/versions/go1.7.linux.amd64/src/runtime/asm_amd64.s:2086","stacktrace":"github.com/uber/jaeger/vendor/go.uber.org/zap.Stack\n\t/home/travis/gopath/src/github.com/uber/jaeger/vendor/go.uber.org/zap/field.go:191\ngithub.com/uber/jaeger/vendor/go.uber.org/zap.(*Logger).check\n\t/home/travis/gopath/src/github.com/uber/jaeger/vendor/go.uber.org/zap/logger.go:301\ngithub.com/uber/jaeger/vendor/go.uber.org/zap.(*Logger).Fatal\n\t/home/travis/gopath/src/github.com/uber/jaeger/vendor/go.uber.org/zap/logger.go:235\nmain.main.func1\n\t/home/travis/gopath/src/github.com/uber/jaeger/cmd/collector/main.go:86\ngithub.com/uber/jaeger/vendor/github.com/spf13/cobra.(*Command).execute\n\t/home/travis/gopath/src/github.com/uber/jaeger/vendor/github.com/spf13/cobra/command.go:636\ngithub.com/uber/jaeger/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/home/travis/gopath/src/github.com/uber/jaeger/vendor/github.com/spf13/cobra/command.go:722\ngithub.com/uber/jaeger/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/home/travis/gopath/src/github.com/uber/jaeger/vendor/github.com/spf13/cobra/command.go:681\nmain.main\n\t/home/travis/gopath/src/github.com/uber/jaeger/cmd/collector/main.go:139"}

I've filed olivere/elastic#625 for the bad error message, but would like to open the discussion up here as to whether the jaeger-collector docker image should include ca-certificates.

Member

yurishkuro commented Oct 20, 2017

Shouldn't the certificates be provided on the host and just mapped into the container? I don't know how it's typically done, but putting them in the Docker image sounds odd.

Author

tanner-bruce commented Oct 20, 2017

Keeping these two definitions from the docker site in mind:

A container image is a lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, settings.

Docker containers isolate applications from one another and from the underlying infrastructure. Docker provides the strongest default isolation to limit app issues to a single container instead of the entire machine.

I would argue that the collector should include the ca-certificates - we don't want to rely on the underlying infrastructure to have those certificates or to provide them for the container. I may not even have installed them on my server, since that's not strictly necessary. Since the collector cannot access an HTTPS server without them, I would say the current container is not meeting the goal of providing everything needed to run it.

FWIW I'm very OK with maintaining my own image, I just wanted to bring this up for discussion, as I think providing the certificates makes for a nicer out-of-the-box experience for users.

Member

jpkrohling commented Oct 21, 2017

I don't know how this would apply for "pure" Docker, but on platforms like OpenShift and Kubernetes, the CA that signs the certs for the "services" (like Elasticsearch here) is located at a well-known place (/var/run/secrets/kubernetes.io/serviceaccount/ca.crt). The CA cert itself isn't part of the image, but part of the orchestration service and is mounted as a volume at runtime.
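
To make the mounted-CA idea concrete, here is a minimal sketch of a Go client that trusts a CA file supplied at runtime in addition to any system roots. It is an illustration only, not how the collector is wired: `newClientWithCA` and `demo` are hypothetical names, and a local `httptest` TLS server with a temporary `ca.crt` stands in for Elasticsearch and the mounted volume.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
)

// newClientWithCA returns an HTTP client that trusts the system roots (if any)
// plus the PEM-encoded CA certificate(s) stored at caPath.
func newClientWithCA(caPath string) (*http.Client, error) {
	pemBytes, err := os.ReadFile(caPath)
	if err != nil {
		return nil, err
	}
	pool, err := x509.SystemCertPool()
	if err != nil || pool == nil {
		pool = x509.NewCertPool() // fall back when no system bundle is usable
	}
	if !pool.AppendCertsFromPEM(pemBytes) {
		return nil, fmt.Errorf("no PEM certificates found in %s", caPath)
	}
	return &http.Client{
		Transport: &http.Transport{TLSClientConfig: &tls.Config{RootCAs: pool}},
	}, nil
}

// demo stands in for a mounted CA: it starts a local TLS test server, writes
// that server's certificate to a temporary ca.crt, and fetches over HTTPS.
func demo() (int, error) {
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	dir, err := os.MkdirTemp("", "ca-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	caPath := filepath.Join(dir, "ca.crt")
	block := &pem.Block{Type: "CERTIFICATE", Bytes: srv.Certificate().Raw}
	if err := os.WriteFile(caPath, pem.EncodeToMemory(block), 0o600); err != nil {
		return 0, err
	}

	client, err := newClientWithCA(caPath)
	if err != nil {
		return 0, err
	}
	resp, err := client.Get(srv.URL)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	status, err := demo()
	fmt.Println(status, err)
}
```

In a real deployment the path passed to `newClientWithCA` would be wherever the orchestrator mounts the CA, for example the service-account path mentioned above.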

Member

yurishkuro commented Oct 21, 2017

actually going back to the original issue, why do we need ca certificates in the first place? The collectors do not serve HTTPS traffic, and you don't need certificates when you are the client in an HTTPS connection, only when you are the server. The exception is a setup where certificates authenticate both sides of the connection, but in that case we're clearly talking about deployment concerns for a specific installation, which cannot be part of the image.

Author

tanner-bruce commented Oct 23, 2017

The issue occurs when you run the Jaeger collector with Elasticsearch behind HTTPS. Running this with the official container will fail:

docker run -i --name collector -p 14267:14267 -p 14268:14268 -p 9411:9411 \
    jaeger-collector /go/bin/collector-linux \
    --span-storage.type=elasticsearch  \
    --es.server-urls=https://elastic.com \
    --es.username=elastic \
    --es.password=pass \
    --es.num-shards=5 \
    --es.num-replicas=1 \
    --es.sniffer=false \
    --log-level=debug

freelinuxer commented Nov 11, 2017

Hi @tanner-bruce @yurishkuro @jpkrohling, do we have a solution or workaround for this issue?
I think I am running into the same issue now.
My test k8s pod connects to a production ES (https) just fine, as @yurishkuro mentioned (the client does not need a cert).
But jaeger-collector and jaeger-query fail to connect to the prod Elasticsearch (https).

Author

tanner-bruce commented Nov 11, 2017

@freelinuxer I built my own images using bitnami/minideb as the base and installed the ca certificates necessary.

If there is interest, I can contribute a second docker build (i.e. alongside the scratch base) using minideb or alpine as the base with ca-certs installed. We tend to prefer minideb as it is lightweight but still glibc-based and gives a familiar Debian environment.

freelinuxer commented Nov 11, 2017

@tanner-bruce That will help a lot!
Could you provide a blog post or instructions on this? I am not familiar with how to add the cert and create custom images for collector and query. Thanks!

Member

yurishkuro commented Nov 12, 2017

@tanner-bruce it would be useful if you could share your Dockerfiles.

Author

tanner-bruce commented Nov 12, 2017

@freelinuxer in a pinch you can use this:

If you compile jaeger:

FROM bitnami/minideb

EXPOSE 9411
EXPOSE 14267
EXPOSE 14268
EXPOSE 14269

RUN install_packages ca-certificates

COPY collector-linux /go/bin/
ENTRYPOINT ["/go/bin/collector-linux"]

If you don't want to compile, you can use a multi-stage build to get the binary from the official container image. This is for the query service, so note that it also requires copying the /go/jaeger-ui folder. The collector does not require this.

FROM jaegertracing/jaeger-query:0.9.0 AS jaeger
FROM bitnami/minideb

EXPOSE 16686

RUN install_packages ca-certificates

COPY --from=jaeger /go/bin/query-linux /go/bin/query-linux
COPY --from=jaeger /go/jaeger-ui/ /go/jaeger-ui/
ENTRYPOINT ["/go/bin/query-linux"]

To use alpine instead, change RUN install_packages ca-certificates to RUN apk add --update --no-cache ca-certificates and update the base image accordingly.

freelinuxer commented Nov 13, 2017

@tanner-bruce I guess the jaeger-query image also needs similar changes to the ones you described above, since it also needs to talk to Elasticsearch. Correct?

Author

tanner-bruce commented Nov 13, 2017

@freelinuxer yes, it does. In the snippets I posted above, the first one is for collector, the second one, which shows how to build it without compiling, is for query.

sagikazarmark commented Dec 14, 2017

actually going back to the original issue, why do we need ca certificates in the first place?

I suppose host verification is turned on by default in the underlying elasticsearch/http client.

I'm trying to make it work by using a self-signed cert and mounting it to /etc/ssl/certs/ca-certificates.crt file (based on this), but no luck so far. :(
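
Go's net/http client verifies the server's certificate chain by default, which is why a client needs trusted CA roots even when it presents no certificate of its own. The sketch below illustrates that behaviour with a local `httptest` TLS server standing in for Elasticsearch; `getWithRoots` and `demo` are hypothetical names, and an empty certificate pool is used as an approximation of an image that ships no CA bundle.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// getWithRoots issues a GET that trusts only the certificates in roots.
func getWithRoots(url string, roots *x509.CertPool) error {
	client := &http.Client{
		Transport: &http.Transport{TLSClientConfig: &tls.Config{RootCAs: roots}},
	}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// demo returns the errors seen without and with the server's CA trusted.
func demo() (withoutCA, withCA error) {
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	}))
	defer srv.Close()

	// An empty pool approximates a scratch image with no CA bundle.
	withoutCA = getWithRoots(srv.URL, x509.NewCertPool())

	// Trusting the server's own certificate makes verification succeed.
	trusted := x509.NewCertPool()
	trusted.AddCert(srv.Certificate())
	withCA = getWithRoots(srv.URL, trusted)
	return withoutCA, withCA
}

func main() {
	withoutCA, withCA := demo()
	fmt.Println("empty trust store:", withoutCA)
	fmt.Println("CA trusted:", withCA)
}
```

The first request is expected to fail during the TLS handshake and the second to succeed. The same applies to a self-signed setup: the certificate that must be trusted is the CA that actually signed the server certificate.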

freelinuxer commented Dec 15, 2017

I tried @tanner-bruce's approach and it worked for me.
#485 (comment)
I built my own binaries and container images.

sagikazarmark commented Dec 15, 2017

I managed to make it work in the meantime (it turned out I was using the wrong CA).

Member

yurishkuro commented Dec 16, 2017

@tanner-bruce would you be able to test the fix in #598?

I tested with a mini program

Dockerfile

FROM alpine as certs
RUN apk add --update --no-cache ca-certificates

FROM scratch
COPY --from=certs /usr/share/ca-certificates/ /usr/share/ca-certificates/
COPY --from=certs /etc/ssl/ /etc/ssl/

COPY main /

CMD ["/main"]

main.go

package main

import (
	"net/http"
	"time"
)

func main() {
	for {
		resp, err := http.Get("https://google.com")
		if err != nil {
			println(err.Error())
		} else {
			println(resp.StatusCode)
		}
		time.Sleep(10 * time.Second)
	}
}

@wafflebot wafflebot bot removed the review label Dec 19, 2017

sergeyklay commented Jul 12, 2018

We have the same issue with docker compose:

version: "3"

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.6.2
    hostname: es-loadbalancer
    networks:
      - elastic-jaeger
    ports:
      - "9200:9200"
      - "9300:9300"
    restart: on-failure
    environment:
      - "xpack.security.enabled=true"
      - "discovery.type=single-node"
      - "http.host=0.0.0.0"
      - "transport.host=127.0.0.1"
    volumes:
      - esdata:/usr/share/elasticsearch/data

  jaeger-collector:
    image: jaegertracing/jaeger-collector
    hostname: jaeger-collector
    ports:
      - "14269:14269"
      - "14268:14268"
      - "14267:14267"
      - "9411:9411"
    networks:
      - elastic-jaeger
    restart: on-failure
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
    command: [
      "--es.server-urls=http://es-loadbalancer:9200",
      "--es.username=elastic",
      "--es.password=changeme",
      "--es.num-shards=1",
      "--span-storage.type=elasticsearch",
      "--log-level=error"
    ]
    depends_on:
      - elasticsearch

volumes:
  esdata:
    driver: local

networks:
  elastic-jaeger:
    driver: bridge

$ curl -sSL -u elastic:changeme 'http://127.0.0.1:9200/_nodes/http?pretty'

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "docker-cluster",
  "nodes" : {
    "S14jI0XOSDWCmgPcVyjoBA" : {
      "name" : "S14jI0X",
      "transport_address" : "172.18.0.2:9300",
      "host" : "172.18.0.2",
      "ip" : "172.18.0.2",
      "version" : "5.6.2",
      "build_hash" : "57e20f3",
      "roles" : [
        "master",
        "data",
        "ingest"
      ],
      "attributes" : {
        "ml.max_open_jobs" : "10",
        "ml.enabled" : "true"
      },
      "http" : {
        "bound_address" : [
          "0.0.0.0:9200"
        ],
        "publish_address" : "172.18.0.2:9200",
        "max_content_length_in_bytes" : 104857600
      }
    }
  }
}

Collector says:

{
"level":"fatal",
"ts":1531403934.0984566,
"caller":"collector/main.go:100",
"msg":"Failed to init storage factory",
"error":"health check timeout: no Elasticsearch node available",
"errorVerbose":"no Elasticsearch node available\ngithub.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic%2ev5.init\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic.v5/client.go:88\ngithub.com/jaegertracing/jaeger/pkg/es.init\n\t<autogenerated>:1\ngithub.com/jaegertracing/jaeger/plugin/storage/es.init\n\t<autogenerated>:1\ngithub.com/jaegertracing/jaeger/plugin/storage.init\n\t<autogenerated>:1\ngithub.com/jaegertracing/jaeger/cmd/env.init\n\t<autogenerated>:1\nmain.init\n\t<autogenerated>:1\nruntime.main\n\t/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/proc.go:186\nruntime.goexit\n\t/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/asm_amd64.s:2361\nhealth check timeout\ngithub.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic%2ev5.(*Client).startupHealthcheck\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic.v5/client.go:1114\ngithub.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic%2ev5.NewClient\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic.v5/client.go:244\ngithub.com/jaegertracing/jaeger/pkg/es/config.(*Configuration).NewClient\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/pkg/es/config/config.go:59\ngithub.com/jaegertracing/jaeger/plugin/storage/es.(*Factory).Initialize\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/plugin/storage/es/factory.go:65\ngithub.com/jaegertracing/jaeger/plugin/storage.(*Factory).Initialize\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/plugin/storage/factory.go:90\nmain.main.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/main.go:99\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).execute\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:698\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:783\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:736\nmain.main\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/main.go:190\nruntime.main\n\t/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/proc.go:198\nruntime.goexit\n\t/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/asm_amd64.s:2361",
"stacktrace":"main.main.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/main.go:100\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).execute\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:698\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:783\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:736\nmain.main\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/main.go:190\nruntime.main\n\t/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/proc.go:198"
}

Is this related to the certificates?
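
One quick way to narrow this down is to check whether TLS is involved at all, since CA certificates only matter for https URLs. The sketch below is a hypothetical helper (`usesTLS` is not a Jaeger function) applied to the `--es.server-urls` value from the compose file above and, for contrast, to the https URL from the earlier docker run example.

```go
package main

import (
	"fmt"
	"net/url"
)

// usesTLS reports whether a server URL involves certificate verification at
// all, that is, whether its scheme is https.
func usesTLS(serverURL string) (bool, error) {
	u, err := url.Parse(serverURL)
	if err != nil {
		return false, err
	}
	return u.Scheme == "https", nil
}

func main() {
	for _, s := range []string{"http://es-loadbalancer:9200", "https://elastic.com"} {
		secure, err := usesTLS(s)
		fmt.Println(s, secure, err)
	}
}
```

Because the compose file points the collector at a plain http URL, no certificate verification takes place on that connection, so the same error message would most likely have a different cause in that setup. That cause is not investigated here.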
