
Added Dockerfile to individual components #157

Merged (1 commit, May 26, 2017)

Conversation

jpkrohling (Contributor):

No description provided.

@jpkrohling (Contributor, Author):

This PR adds the following make targets:

  • docker -- to build the Docker images locally, based on the newly created Dockerfiles. It does not build docker images for the existing images, as those are handled by other tasks (all-in-one and crossdock)
  • docker-push -- to push the images built by the docker target to a repository. This requires a previous docker login.

The Travis changes were made somewhat blind, as I'm not sure how I can test them. The bash parts were tested as much as I could.

I'm also not quite sure how to remove the submodule updates from this commit, as I've never really dealt with submodules. Any hints are highly appreciated :)

@coveralls:

Coverage Status

Coverage remained the same at 100.0% when pulling fc807c5 on jpkrohling:JPK-Dockerfiles into baa60f6 on uber:master.

@@ -0,0 +1,31 @@
#!/bin/bash
Contributor (Author):

Not quite sure if this is the most appropriate location for this script. It has to be either in the same directory as the Dockerfile or in a subdirectory, but not in a parent directory.

Member:

looks ok here, given the other scripts. I would just rename it to create-schema.sh

@@ -0,0 +1,6 @@
FROM centos:7
Contributor (Author):

The images inherit from the CentOS 7 image, as it provides some tools that are useful for debugging. Once the images mature a bit in the community, we can change the base image to Alpine or similar.

Contributor:

The smallest base image that still allows basic debug abilities would be best.

Contributor (Author):

The CentOS image isn't that big (192.5 MB), and that layer is shared by all the images. But if you have a specific image to suggest, I can try it out and see if it works.

@@ -16,6 +16,9 @@ matrix:
- go: 1.7
env:
- CROSSDOCK=true
- go: 1.7
env:
- DOCKER=true
Contributor (Author):

I have to admit that I don't know for sure what this does. I guess this will trigger different executors on Travis, but I just did a copy/paste of the sections above.

Contributor:

This is a Travis build matrix: for each entry, Travis kicks off a new build with the environment variables set in that entry. We did this to speed up the Travis build by parallelizing the subcomponents. The copy-paste is good.

@@ -46,7 +47,7 @@ md-to-godoc-gen:

.PHONY: clean
clean:
rm -rf cover.out cover.html lint.log fmt.log
rm -rf cover.out cover.html lint.log fmt.log jaeger-ui-build
Contributor (Author):

This doesn't really belong in this commit. I hope it's OK to leave this change here.

rm -rf cmd/query/jaeger-ui-build

.PHONY: docker-push
docker-push:
Contributor (Author):

This target is only useful for development purposes and it's not being used for the actual builds. Should this be removed?

Contributor:

I feel like we should never be manually pushing and always have travis push the repos on a successful build.

Contributor:

Don't think we'd ever have need for manually pushing anything; should always wait for travis to push on a successful build.

Contributor (Author):

I did it quite a few times under my own namespace, to test the images on OpenShift/Kubernetes. If I add a comment to this target saying that it's intended only for local development purposes, would it be OK to leave it there?

Contributor:

sgtm

@@ -0,0 +1,7 @@
FROM jpkroehling/cassandra
Contributor (Author):

This was discussed on the mailing list. I'll either change this to use the "official" Docker Hub image for Cassandra or change it to use an image we are building for OpenShift. I don't expect to change this before this PR is merged, though.

Contributor:

add TODO here and a task once this lands

@@ -43,3 +46,4 @@ script:
- if [ "$COVERAGE" == true ]; then travis_retry goveralls -coverprofile=cover.out -service=travis-ci || true ; else echo 'skipping coverage'; fi
Contributor:

on L41 above, you probably have to change it to:

- if [[ "$ALL_IN_ONE" == true || "$DOCKER" == true ]]; then bash ./travis/install-ui-deps.sh ; fi

because we need yarn to build the ui which is needed in build-docker-images.sh

Contributor:

or we can add a new environment variable called INSTALL_YARN and add it to the matrix for ALL_IN_ONE and DOCKER


source ~/.nvm/nvm.sh
nvm use 6
DOCKER_NAMESPACE=jaegertracing make docker
Contributor:

stupid question incoming: shouldn't there be a newline before make docker?

Contributor (Author):

Not really. In bash, this sets the variable only for that one command, so make will see DOCKER_NAMESPACE as jaegertracing without the parent script needing to export it first.
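A minimal sketch demonstrating this scoping behavior (the variable name is just the one from the script above):

```shell
#!/bin/sh
# A variable assignment on the same line as a command is passed only to that
# command's environment; the parent shell never sees it.
DOCKER_NAMESPACE=jaegertracing sh -c 'echo "child sees: $DOCKER_NAMESPACE"'
echo "parent sees: '${DOCKER_NAMESPACE}'"
```

Running this prints `child sees: jaegertracing` followed by `parent sees: ''`.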

Contributor:

TIL

.PHONY: docker
docker: build_ui build-agent-linux build-collector-linux build-query-linux
cp -r jaeger-ui-build/build/ cmd/query/jaeger-ui-build
docker build -t $(DOCKER_NAMESPACE)/jaeger-cassandra-schema plugin/storage/cassandra/ ; \
Contributor:

Can we call this jaeger-cassandra instead? The schema gets applied on startup and after that it's essentially just a cassandra box. The name might throw off some people.

Contributor (Author):

There's no "after" for this image :) It's used as a Kubernetes "Job": it just runs the shell script that generates the keyspace and gracefully exits afterwards. It's derived from the Cassandra image because it needs the cqlsh tools, so I thought it would be easier (and save space) to consume an image that I knew existed, but Cassandra itself never starts in this image.
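As an aside, the run-once pattern described here might look like this hypothetical Kubernetes Job manifest (image name and env vars are assumptions, not part of this PR):

```yaml
# Hypothetical Job: runs the schema-creation container once, then exits.
apiVersion: batch/v1
kind: Job
metadata:
  name: jaeger-cassandra-schema
spec:
  template:
    spec:
      containers:
      - name: jaeger-cassandra-schema
        image: jaegertracing/jaeger-cassandra-schema
        env:
        - name: CQLSH_HOST
          value: cassandra
      restartPolicy: OnFailure
```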

Contributor:

ahhh gotcha. Maybe jaeger-cassandra-bootstrap? fine with jaeger-cassandra-schema too.

Member:

I would prefer to use the same mechanism across all artifacts.

@black-adder It sounds like this is doing something different from what e2e test is doing to init Cassandra.

Contributor:

What we do for e2e is pretty ridiculous though: we keep bringing up the collector and wait until it doesn't fatal out due to Cassandra not being ready. I prefer this new approach.

^ the above is an answer to a different C* question below.

For this question: we have the e2e handler actively probe the Cassandra cluster to see if it's ready.

Contributor (Author):

I talked to @jsanda last week about something related to this and he made me realize that this might also not be appropriate.

In any case: I think we could add a separate task for having a more reliable bootstrap method, perhaps consistent among all deployment architectures. In the meantime, I'd keep it like this, so that we can move forward.

@@ -111,6 +112,21 @@ build-query-linux:
build-collector-linux:
CGO_ENABLED=0 GOOS=linux installsuffix=cgo go build -o ./cmd/collector/collector-linux ./cmd/collector/main.go

.PHONY: docker
docker: build_ui build-agent-linux build-collector-linux build-query-linux
cp -r jaeger-ui-build/build/ cmd/query/jaeger-ui-build
Member:

I am not clear why this is needed. We already build all-in-one without copying this dir into query source, instead we just mount it to the Docker image from ./jaeger-ui-build

Contributor (Author):

That's because the Dockerfile is inside cmd/query, and Docker can only access files within its build context directory or its subdirectories, not in any parent directory (for security reasons, I suppose). So we need a copy there.
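The constraint shows up in the Dockerfile itself: COPY sources are resolved relative to the build context passed to `docker build`. A hedged sketch (file names here are assumptions, not the PR's actual Dockerfile):

```dockerfile
# Hypothetical cmd/query/Dockerfile. Both COPY sources must live inside the
# build context (cmd/query/); a path like ../../jaeger-ui-build would be
# rejected, hence the cp into cmd/query/jaeger-ui-build first.
FROM centos:7
COPY query-linux /go/bin/
COPY jaeger-ui-build/ /go/jaeger-ui-build/
CMD ["/go/bin/query-linux"]
```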

@@ -0,0 +1,6 @@
FROM centos:7
Member:

I thought we'd put all Docker files in one dir. But no strong feeling either way.

Contributor (Author):

We could, as long as the contents of the Docker images are in subdirectories of where the Dockerfiles are located. I'd rather keep each Dockerfile alongside its image's contents.


COPY collector-linux /go/bin/

CMD ["/go/bin/collector-linux"]
Member:

This comment applies to all images - don't we need to be translating env vars into command line switches, to make the images follow https://12factor.net/ ?

Ideally our binaries could be reading env vars directly, there are frameworks like spf13/Cobra that support that, but we haven't implemented it yet. So the only way currently to configure each binary is to translate env vars into command line switches.
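For illustration, the translation the reviewer describes could be sketched like this (a hedged sketch; the flag names and env vars are assumptions, not Jaeger's actual flags):

```shell
#!/bin/sh
# Hypothetical helper: map selected environment variables onto command-line
# switches. An entrypoint would then run:
#   exec /go/bin/collector-linux $(build_args)
build_args() {
  args=""
  if [ -n "$CASSANDRA_SERVERS" ]; then
    args="$args --cassandra.servers=$CASSANDRA_SERVERS"
  fi
  if [ -n "$COLLECTOR_PORT" ]; then
    args="$args --collector.port=$COLLECTOR_PORT"
  fi
  # strip the leading space added by the first append
  echo "${args# }"
}

CASSANDRA_SERVERS=cassandra COLLECTOR_PORT=14267 build_args
```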

Contributor (Author):

Not necessarily: I'm not quite sure how it would work for Docker Compose, but for Docker at the command line, Kubernetes, and OpenShift, you can override the command line and set extra env vars.

So the best approach is to specify only what's really required to get the binary running, so that you don't need to maintain default values in two places.

This is an example of how to set an extra env var:

$ docker run -e MY_ENV_VAR=test -it jpkroehling/jaeger-agent bash
[root@86fc09d20bfc /]# echo $MY_ENV_VAR
test

And here is an example of how to override the command to execute when running an image: https://github.com/jpkrohling/jaeger-openshift/blob/JPK-OpenShift-IndividualComponents/template.yml#L147-L149

echo "Generating the schema for the keyspace ${KEYSPACE} and datacenter ${DATACENTER}"
export KEYSPACE

# the `test` parameter is to force the script to use a SimpleStrategy instead of
Member:

Why? SimpleStrategy only makes sense in a test env with a single Cassandra host.

Contributor (Author):

I was actually using prod, but ran into a Cassandra issue, something like Error reading service_names from storage: Cannot achieve consistency level LOCAL_ONE.

Member:

The prod configuration defines replication factor 2, so you need at least 2 Cassandra nodes.
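For illustration, the distinction between the two modes in CQL (keyspace and datacenter names here are placeholders, not the actual Jaeger schema):

```sql
-- test mode: SimpleStrategy ignores datacenters; fine for a single node.
CREATE KEYSPACE jaeger_test WITH replication =
  {'class': 'SimpleStrategy', 'replication_factor': 1};

-- prod mode: NetworkTopologyStrategy keys the factor by datacenter name,
-- which must match cassandra-rackdc.properties; a factor of 2 needs at
-- least 2 nodes in that datacenter to satisfy reads and writes.
CREATE KEYSPACE jaeger_prod WITH replication =
  {'class': 'NetworkTopologyStrategy', 'dc1': 2};
```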

Contributor (Author):

OK, that's it then. I'm not that familiar with Cassandra and noticed that I could change the script to test and it would work :-) My deployment script creates 2 Cassandra nodes, but it seems that was still not enough to avoid this error.

Should we move all these Cassandra-related items to a new task?


CQLSH_HOST=${CQLSH_HOST:-"cassandra"}
CASSANDRA_WAIT_TIMEOUT=${CASSANDRA_WAIT_TIMEOUT:-"60"}
DATACENTER=${DATACENTER:-"openshift"}
Member:

This doesn't look like a good default. I'm actually not sure what datacenter name the official Cassandra image uses.

Contributor (Author):

The datacenter is actually required by the existing schema generator: https://github.com/uber/jaeger/blob/master/plugin/storage/cassandra/cassandra3v001-schema.sh#L23

So, it's either prod or test, and when passing prod, a datacenter has to be provided.

Member:

Yes, it's required for prod because it uses the name when defining replication strategy. That datacenter must match what Cassandra uses in its cassandra-rackdc.properties.

total_wait=0
while true
do
ping -c 1 ${CQLSH_HOST} > /dev/null 2>&1
Member:

Is this the typical way of checking that the container app is running? Seems like it might succeed as soon as the container/IP is available, while the Cassandra server may still be starting.

@black-adder what did we use for the e2e test? Didn't we wait for an actual response from Cassandra?

Contributor (Author):

Kubernetes/OpenShift will establish the connection only once the target service has a container passing its "readiness" check. We don't have such a check right now, so this is more of a hack and might eventually fail in the situation you just mentioned.

I think @jsanda can help us come up with a reliable solution, but I'd prefer to have that as part of another task.

@coveralls:

Coverage Status

Coverage remained the same at 100.0% when pulling a55c7bf on jpkrohling:JPK-Dockerfiles into 49dbd3c on uber:master.

@jpkrohling (Contributor, Author):

@yurishkuro, @black-adder, I just updated this PR, rebasing it on the latest master and adding a couple of fixes based on the review comments. If this looks good, I'll squash it and it will be ready for merging.

There are still a couple of items open for Cassandra, mostly related to tuning it to be more resilient and reliable (replication factor, schema creation, ...), but I'd like to have that as a follow-up task instead of blocking this PR. I'll then ask @jsanda for help on that task, as he's far more experienced with Cassandra than I am :-)

@coveralls:

Coverage Status

Coverage remained the same at 100.0% when pulling 972b25b on jpkrohling:JPK-Dockerfiles into 437ec41 on uber:master.

@yurishkuro (Member):

  • I am not clear why the build failed, no output in the Travis logs.
  • By default the new images are published to jaegertracing/jaeger-{component}, I assume we need to create repos on Docker hub?
  • The images are missing any sort of tags, usually we at least add -t $REPO:$COMMIT
  • Can we exclude changes to jaeger-ui from this PR?

@jsanda commented May 22, 2017:

You will certainly run into problems by just pinging the pod. Cassandra startup times will vary, particularly when you take commit log replay into consideration; that can slow down startup significantly. I suggest using nodetool statusbinary; you can grep the output for running.

To avoid overly long startup times and to avoid commit log corruption, I strongly recommend running nodetool drain on shutdown.
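A bounded wait along those lines might look like this sketch, with the probe command parameterized so that ping, a cqlsh query, or `nodetool statusbinary | grep -q running` can be plugged in (the function name and defaults are assumptions, not the PR's script):

```shell
#!/bin/sh
# Hypothetical readiness wait: retry a probe command until it succeeds or a
# timeout (in seconds) elapses.
wait_for() {
  probe=$1
  timeout=${2:-60}
  waited=0
  until eval "$probe" > /dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "ready after ${waited}s"
}

# e.g. wait_for "nodetool statusbinary | grep -q running" 120
wait_for "true" 5
```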

@coveralls:

Coverage Status

Coverage remained the same at 100.0% when pulling 99238f7 on jpkrohling:JPK-Dockerfiles into 7a965e9 on uber:master.

@coveralls:

Coverage Status

Coverage remained the same at 100.0% when pulling b122fce on jpkrohling:JPK-Dockerfiles into 7a965e9 on uber:master.

@coveralls:

Coverage Status

Coverage remained the same at 100.0% when pulling 56eb6f5 on jpkrohling:JPK-Dockerfiles into 7a965e9 on uber:master.

@coveralls:

Coverage Status

Coverage remained the same at 100.0% when pulling fee8371 on jpkrohling:JPK-Dockerfiles into 7a965e9 on uber:master.

@jpkrohling (Contributor, Author):

I updated the PR to use another command for checking Cassandra's state. I decided not to use nodetool, as it would require exposing the JMX port just for this purpose. If we do need JMX in the future that's fine, but it seemed like too much for this specific purpose.

> I am not clear why the build failed, no output in the Travis logs.

It works now. I had to add a dedicated conditional to the install section of .travis.yml for $DOCKER, instead of using a || on the $ALL_IN_ONE part.
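The resulting install fragment might look roughly like this (a hedged sketch; the script path is taken from the review discussion above, not verified against the final file):

```yaml
# Hypothetical .travis.yml install fragment: $DOCKER gets its own
# conditional instead of piggybacking on the $ALL_IN_ONE check.
install:
  - if [ "$ALL_IN_ONE" == true ]; then bash ./travis/install-ui-deps.sh ; fi
  - if [ "$DOCKER" == true ]; then bash ./travis/install-ui-deps.sh ; fi
```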

> By default the new images are published to jaegertracing/jaeger-{component}, I assume we need to create repos on Docker hub?

They are created on demand on the first push, provided the account has push permissions on the organization.

> The images are missing any sort of tags, usually we at least add -t $REPO:$COMMIT

It's actually done in the upload-to-docker.sh script, called at the end of the build; it's the same script the other builds execute.

> Can we exclude changes to jaeger-ui from this PR?

I tried every git trick I know, but the only thing that helped was to do a diff against master, create a new branch off of master, and apply the diff onto the new branch :/

I also opened #175 to track the points to improve in our usage of Cassandra.

@coveralls:

Coverage Status

Coverage remained the same at 100.0% when pulling 2abe6ef on jpkrohling:JPK-Dockerfiles into 7a965e9 on uber:master.

@jpkrohling (Contributor, Author):

Rebased with the latest master. This is ready for a re-review or eventual merge.

@coveralls:

Coverage Status

Coverage remained the same at 100.0% when pulling 585a670 on jpkrohling:JPK-Dockerfiles into 894cf6e on uber:master.
