Improved image for Docker Hub #1692

Merged
merged 1 commit into from
Dec 25, 2016

Contributor

@ankoh ankoh commented Nov 6, 2016

Should resolve #1578.
Image size: 237.1 MB

It behaves very similarly to the official Postgres image:
https://hub.docker.com/_/postgres/
https://github.com/docker-library/postgres/tree/e4942cb0f79b61024963dc0ac196375b26fa60dd/9.6

I added an easier way to bind a custom pipelinedb.conf into the image.
You can test the image with the following commands:

docker-compose -f pkg/docker/hub/tests/.yml build
docker-compose -f pkg/docker/hub/tests/.yml up

docker-compose -f pkg/docker/hub/tests/.yml rm -vf

E.g.:
docker-compose -f pkg/docker/hub/tests/custom_init.yml build
docker-compose -f pkg/docker/hub/tests/custom_init.yml up

docker-compose -f pkg/docker/hub/tests/custom_init.yml rm -vf

Please note that you need a newer version of docker-compose (for version 2 syntax).

I'd love to see an Alpine Linux based image in the future, as those are much smaller.
(Postgres: 264MB [official] vs 31MB [alpine based])
As noted in #1578, one needs to patch a tiny glibc dependency first. (Alpine ships with musl.)

@derekjn
Contributor

derekjn commented Nov 6, 2016

@ankoh thank you so much for contributing, this is awesome! In addition to my other comments, generally speaking I don't think there's a need to maintain separate images for DockerHub and development. Let's just keep one image and pull everything out of dev/ and hub/ up into docker/. Thanks again!

--ingroup pipeline \
pipeline

COPY docker-entrypoint.sh /usr/local/bin/
Contributor

Shouldn't there be a docker-entrypoint.sh in this directory?

Contributor Author

Sure! COPY docker-entrypoint.sh references pkg/docker/hub/docker-entrypoint.sh.
The source path is relative to the build context (the Dockerfile's directory when you build from it).

Contributor Author

Ah, I see, oops. It's gitignored somewhere. Sorry, I hadn't noticed that. I'll push it in a sec.

@@ -0,0 +1 @@
select version();
Contributor

I don't see hello_world.sql being used anywhere. I'm assuming it's there for convenience to manually run against a running container, but if it's not referenced anywhere let's leave it out.

Contributor Author

@ankoh ankoh Nov 6, 2016

pkg/docker/hub/tests/custom_init.yml mounts pkg/docker/hub/tests/init to /docker-entrypoint-init.d.
The entrypoint runs all scripts in this folder, so if you run custom_init.yml with docker-compose, you will see the version query in the Docker logs.
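
For readers unfamiliar with this pattern, here is a minimal sketch of how an entrypoint might walk such an init directory, loosely modelled on the official Postgres image. It is not the entrypoint from this PR: `run_init_dir` and `RUN_SQL` are hypothetical names, and using `psql` as the default SQL runner is an assumption.

```shell
#!/bin/sh
# Sketch only: run every script found in an init directory.
# run_init_dir and RUN_SQL are hypothetical names, not taken from this PR.

# Command used to execute a .sql file (assumption: a psql-compatible client).
: "${RUN_SQL:=psql -v ON_ERROR_STOP=1 -f}"

run_init_dir() {
    dir="$1"
    for f in "$dir"/*; do
        # An empty directory leaves the glob unexpanded; skip that case.
        [ -e "$f" ] || continue
        case "$f" in
            *.sh)
                echo "running $f"
                sh "$f" || return 1
                ;;
            *.sql)
                echo "running $f"
                # RUN_SQL is intentionally unquoted so its flags are split.
                $RUN_SQL "$f" || return 1
                ;;
            *)
                echo "ignoring $f"
                ;;
        esac
    done
}
```

Calling `run_init_dir /docker-entrypoint-init.d` after the server is up would then execute a mounted hello_world.sql, which is why its output shows up in the container logs.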

@ankoh
Contributor Author

ankoh commented Nov 6, 2016

@derekjn Sorry, that was my fault. I hadn't realised that the most important file was gitignored. :-)

custom_init.yml should yield something like this:

[Screenshot taken 2016-11-06 at 23:09:00]

@usmanm
Collaborator

usmanm commented Nov 7, 2016

@ankoh: Can you also keep install_extras.sh? A lot of our users prefer the Docker container because it simplifies running these extensions. I will look into the Alpine Linux distro.

@ankoh
Contributor Author

ankoh commented Nov 7, 2016

@usmanm Sure, but then I'd propose to do it via arguments in the Dockerfile
(i.e.: https://docs.docker.com/engine/reference/builder/#arg).

That might in fact be an interesting addition.
I'm thinking of something like:
ARG WITH_KAFKA=0
ARG WITH_CSTORE=0
...
or just WITH_EXTRAS=0

That way you could keep a 200 MB (or 30MB in case of alpine) base image but still A) offer an easy way to build specific images on your own and B) maintain specific docker tags (e.g. pipelinedb/pipelinedb:0.9.5-kafka or pipelinedb/pipelinedb:0.9.5-extras).

A kafka build could then look like:
docker build --tag pipelinedb/pipelinedb:0.9.6-kafka --build-arg WITH_KAFKA=1 .

What do you think?
I'd love to see the primary image being just postgres + pipelinedb stuff to keep things clean and promote a drop-in replacement.

I've already built an alpine image with all the dependencies that just fails at this single backtrace symbol. I could open a separate issue if you are interested in looking at that.

@usmanm
Collaborator

usmanm commented Nov 7, 2016

I like the build args idea more. If you could share the Dockerfile with the base image of Alpine Linux, I'll figure out how to patch it/install glibc.

@derekjn
Contributor

derekjn commented Nov 18, 2016

@ankoh any progress on this? It would be great if we could get this into the next release. We're happy to help with the finishing touches if you'd like, just let us know!

@ankoh
Contributor Author

ankoh commented Nov 19, 2016

@derekjn I'm sorry, I switched back to Postgres in my project and lost focus on this PR. I'll look at it later today or tomorrow.

@ankoh
Contributor Author

ankoh commented Nov 22, 2016

@derekjn @usmanm I will do something like this:
[Screenshot taken 2016-11-22 at 23:55:52]

The (verbose) build command would look like:
docker build -t pipelinedb/pipelinedb:latest --build-arg WITH_CONTRIB_MODULES="hstore fuzzystrmatch" --build-arg WITH_CSTORE=1 --build-arg WITH_KAFKA=1 .
while the vanilla pipelinedb is just
docker build -t pipelinedb/pipelinedb:latest .

@usmanm
Collaborator

usmanm commented Nov 25, 2016

That sounds good!

@ankoh
Contributor Author

ankoh commented Nov 27, 2016

@derekjn @usmanm Here is a first draft for the Dockerfile with build arguments.
I currently don't have the time to test everything, so you'll definitely want to check whether the contrib modules, kafka and cstore actually work.
I think the "test -z" guard is quite a neat solution to the problem of installing all these extras via build arguments.
Note, however, that for the contrib modules the overall image build does NOT fail if one of the modules fails to install.
One might need a nicer solution to break out of the for loop in line 70 for this.
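
One possible shape for such a fail-fast loop, as a rough sketch rather than the code in the Dockerfile: the function returns non-zero on the first module that fails, so a `RUN` step calling it would abort the build. `install_contrib_modules` and `INSTALL_CMD` are hypothetical names, and the default `INSTALL_CMD` is only a dry-run placeholder for whatever builds a single module.

```shell
#!/bin/sh
# Sketch only: install a space-separated list of modules, stopping at the
# first failure. install_contrib_modules and INSTALL_CMD are hypothetical.

# Placeholder for the real per-module build command (assumption: dry run).
INSTALL_CMD="${INSTALL_CMD:-echo would-install}"

install_contrib_modules() {
    modules="$1"
    # The "test -z" guard: an empty list means there is nothing to do.
    test -z "$modules" && return 0
    # $modules is intentionally unquoted so it splits on whitespace.
    for m in $modules; do
        echo "installing $m"
        $INSTALL_CMD "$m" || {
            echo "failed to install $m" >&2
            return 1
        }
    done
}
```

Inside a Dockerfile this would be invoked from a single `RUN` step with the build argument as its parameter; because the function's exit status propagates, one broken module (such as fuzzystrmatch above) would fail the whole image build instead of being skipped silently.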

The fuzzystrmatch module failed to install because it expected a different method signature for 'levenshtein_with_costs'. So I assume you changed the calculation of the levenshtein distance? That requires some investigation.

Image sizes:
(1) vanilla 237.1 MB
(2) vanilla + cstore 252.0 MB (+ 14.9 MB)
(3) vanilla + kafka 292.7 MB (+ 55.6 MB)
(4) vanilla + contrib 249.7 MB (+ 12.6 MB)
(5) vanilla + all 317.5 MB (+ 80.4 MB)

Commands:
(1) docker build -t pipelinedb/pipelinedb:latest pkg/docker
(2) docker build -t pipelinedb/pipelinedb:latest --build-arg WITH_CSTORE=1 pkg/docker
(3) docker build -t pipelinedb/pipelinedb:latest --build-arg WITH_KAFKA=1 pkg/docker
(4) docker build -t pipelinedb/pipelinedb:latest --build-arg WITH_CONTRIB_MODULES="postgres_fdw citext hstore pg_trgm file_fdw tablefunc" pkg/docker
(5) docker build -t pipelinedb/pipelinedb:latest --build-arg WITH_KAFKA=1 --build-arg WITH_CSTORE=1 --build-arg WITH_CONTRIB_MODULES="postgres_fdw citext hstore pg_trgm file_fdw tablefunc" pkg/docker
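
If several tagged variants are to be maintained, the commands above could be generated rather than typed out. The following is a small sketch under that assumption; `build_cmd` is a hypothetical helper that only prints the `docker build` invocation (it does not run it), and the tag suffixes follow the naming floated earlier in this thread.

```shell
#!/bin/sh
# Sketch only: print the docker build command for one image variant.
# build_cmd is a hypothetical helper; nothing is actually built here.

build_cmd() {
    tag="$1"
    shift
    cmd="docker build -t pipelinedb/pipelinedb:$tag"
    # Each remaining argument is one NAME=value build argument.
    for arg in "$@"; do
        # Single quotes keep values with spaces together when pasted.
        cmd="$cmd --build-arg '$arg'"
    done
    echo "$cmd pkg/docker"
}

# Print the command for the vanilla image and for a kafka-enabled variant.
build_cmd latest
build_cmd 0.9.5-kafka WITH_KAFKA=1
```

Printing instead of executing makes it easy to review the exact flags before kicking off a long build.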

@mhafellner

@ankoh I really appreciate that somebody is putting more attention into making an easy-to-use Docker container for PipelineDB! One question: is there a particular reason you left out the kinesis extension? If not, it would be awesome to see it added here as well to make this complete.

@ankoh
Contributor Author

ankoh commented Nov 28, 2016

@mhafellner Sure, I only considered the previous install_extras script.
I've added the option WITH_KINESIS.
As the AWS SDK requires a newer version of cmake, the overall build time increases quite a bit when using WITH_KINESIS.
(NOTE 1: the Dockerfile downloads the pre-compiled cmake 3.7.0 tarball. One could of course also just build from source or use a third-party PPA.)
(NOTE 2: pipelinedb_kinesis still uses 0.10.9 of the AWS SDK!)

Image size:
vanilla + kinesis 288.9 MB (+ 51.8 MB)

Command:
docker build -t pipelinedb/pipelinedb:latest --build-arg WITH_KINESIS=1 pkg/docker

With all these options enabled the build time is horribly long, not to mention the network traffic.
But you should probably go down that path anyway, because it wraps every extension in a separate minimal Docker layer.
=> People who are only interested in raw PipelineDB don't need to worry much about the other stuff.

EDIT: Compilation fails with the most recent version of the AWS SDK, so you might want to check that at some point :).

@derekjn
Contributor

derekjn commented Nov 28, 2016

@ankoh thank you so much, this is looking great! And @mhafellner thank you for your feedback and input as well. I'm going to spend some time getting a feel for using the new Dockerfile and I'll provide any final feedback before we get this thing merged, which will be very soon.

@derekjn
Contributor

derekjn commented Nov 28, 2016

Also looking into the pipeline_kinesis build failure.

@EliSnow

EliSnow commented Nov 30, 2016

👍 For an alpine linux based image

@derekjn derekjn force-pushed the master branch 2 times, most recently from 824ecb5 to 3cbeaf8 Compare December 10, 2016 03:01
@derekjn derekjn merged commit f6f00f3 into pipelinedb:master Dec 25, 2016
Successfully merging this pull request may close these issues.

Dockerfile should set listen_addresses to '*' in pipelinedb.conf
5 participants