Create Docker Image directory which deploys to Dockerhub #464

Closed
fbertsch opened this issue Apr 9, 2019 · 3 comments

Comments

@fbertsch
Contributor

fbertsch commented Apr 9, 2019

The PodOperator is great because it allows us to use specific environments for different jobs. The idea here is to store Airflow job Dockerfiles in telemetry-airflow. These images will be built as part of CI. If needed, we could also build them nightly in CI, or as an Airflow step.

These images will live in /images, which does not need to be included in the telemetry-airflow:latest docker image.

Each image will have its own directory. The image will be pushed to the telemetry-airflow repository on Dockerhub, tagged with $DIRNAME. For example:

/images
  /mozilla-schema-generator
    - Dockerfile
    - docker-compose.yml
    - requirements.txt

That image would be built and pushed to telemetry-airflow:mozilla-schema-generator. The PodOperator would use that tag to run the associated task.
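A minimal sketch of how CI could turn this layout into build steps. The helpers `find_image_dirs` and `plan_image_builds` are hypothetical; they assume every subdirectory of `images/` that contains a `Dockerfile` is one image, and that the directory name becomes the tag. The repository name is a parameter because the proposal does not say which Dockerhub namespace `telemetry-airflow` lives under. The planner only returns commands; it does not run Docker.

```python
from pathlib import Path
from typing import Iterable, List


def find_image_dirs(images_root: Path) -> List[Path]:
    """Return the subdirectories of images_root that contain a Dockerfile."""
    return sorted(
        p for p in images_root.iterdir() if p.is_dir() and (p / "Dockerfile").is_file()
    )


def plan_image_builds(
    image_dirs: Iterable[Path], repository: str = "telemetry-airflow"
) -> List[List[str]]:
    """Return docker build/push commands, tagging each image with its directory name."""
    commands: List[List[str]] = []
    for image_dir in image_dirs:
        # e.g. images/mozilla-schema-generator -> telemetry-airflow:mozilla-schema-generator
        ref = f"{repository}:{image_dir.name}"
        commands.append(["docker", "build", "-t", ref, str(image_dir)])
        commands.append(["docker", "push", ref])
    return commands
```

A CI job could call `plan_image_builds(find_image_dirs(Path("images")))` and run each command in order. Because the tag is derived from the directory name, the directory name must also be a valid Docker tag (letters, digits, `_`, `.` and `-`).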

cc @whd, @haroldwoo for feedback

@whd
Member

whd commented Apr 16, 2019

This structure would be a departure from our SOP. Why not have the Dockerfile and image-generation CI configured in each container's or project's own repository? For example, https://github.com/mozilla/mozilla-schema-generator should be built on CI, with its Dockerfile and circleci configs living in its own repo and uploading to Dockerhub at mozilla/mozilla-schema-generator:TAG. The same applies to any other project that needs to be used on Airflow via the pod operator. This is how we currently manage all Docker-based projects operated by Cloud and Data Operations.
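A minimal sketch of how a project's own CI could compute its `mozilla/<project>:TAG` reference. It assumes CircleCI's built-in `CIRCLE_TAG`, `CIRCLE_BRANCH` and `CIRCLE_SHA1` environment variables; the tagging convention itself (git tag, else `latest` on master, else the commit SHA) is purely illustrative, since the thread does not say how TAG is chosen. The function name `image_reference` is hypothetical.

```python
from typing import Mapping


def image_reference(project: str, env: Mapping[str, str], org: str = "mozilla") -> str:
    """Build <org>/<project>:<tag> from CircleCI-style environment variables.

    Illustrative tagging convention (an assumption, not an established SOP):
      - a git tag build (CIRCLE_TAG set) uses the git tag;
      - a build of the master branch uses "latest";
      - any other build uses the full commit SHA.
    """
    if env.get("CIRCLE_TAG"):
        tag = env["CIRCLE_TAG"]
    elif env.get("CIRCLE_BRANCH") == "master":
        tag = "latest"
    else:
        tag = env["CIRCLE_SHA1"]
    return f"{org}/{project}:{tag}"
```

In a CircleCI job this would be called as `image_reference("mozilla-schema-generator", os.environ)`, and the pod operator would then reference that image directly rather than a tag on telemetry-airflow.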

Containers run by the pod operator on Airflow likely have nothing intrinsically to do with Airflow itself. To me, this change seems similar to putting the telemetry-batch-view jar upload process and CI into telemetry-airflow. My understanding is that the pod operator is generic and simply executes arbitrary containers on pods. Unless we need custom images that change properties (e.g. a Dockerfile that extends mozilla-schema-generator's Dockerfile with some Airflow-specific configuration), I don't think this is a structure we should adopt. Even then, I would avoid this pattern if possible and instead make the standard image flexible enough to support both standard usage and usage with Airflow, via Docker entrypoints and additional executables/configuration.
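One way to read "flexible enough via Docker entrypoints" is a single entrypoint that dispatches on its first argument, keeping the image's normal behaviour as the default. The sketch below is hypothetical throughout: the subcommand names `run` and `airflow` and both handlers are placeholders, not part of mozilla-schema-generator.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[List[str]], int]


def run_standard(args: List[str]) -> int:
    """The image's normal behaviour (hypothetical placeholder)."""
    print("standard mode:", args)
    return 0


def run_for_airflow(args: List[str]) -> int:
    """An Airflow-oriented mode (hypothetical placeholder), e.g. one that reads
    extra configuration supplied by the pod operator before doing the work."""
    print("airflow mode:", args)
    return 0


# Subcommand name -> handler. Unknown or missing subcommands fall back
# to the standard mode, so existing usage of the image keeps working.
COMMANDS: Dict[str, Handler] = {
    "run": run_standard,
    "airflow": run_for_airflow,
}


def resolve(argv: List[str]) -> Tuple[Handler, List[str]]:
    """Pick the handler for argv (program name already stripped)."""
    if argv and argv[0] in COMMANDS:
        return COMMANDS[argv[0]], argv[1:]
    return run_standard, argv


def main(argv: List[str]) -> int:
    handler, args = resolve(argv)
    return handler(args)
```

The image's `ENTRYPOINT` would point at a script ending in `sys.exit(main(sys.argv[1:]))`. The pod operator could then pass `airflow ...` as the container arguments, while everyone else runs the image exactly as before, so no Airflow-specific image is needed.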

There is also the reliability of Docker images to address. For production workflows, we typically mirror Dockerhub images via Jenkins to our internal container registry. We would probably want a similar mechanism for containers used via the pod operator, though not necessarily mirrored via Jenkins. This is tangential, however, and can be considered separately from the main discussion.
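Whatever performs the mirroring, pod operator tasks would need to reference the mirrored copy instead of Dockerhub. A minimal sketch of that rewrite follows, assuming tag-style references only (digests are rejected). It follows Docker's naming convention: the first path component is a registry host only if it contains `.` or `:` or is `localhost`, and single-name Dockerhub images live under `library/`. The helper names and the mirror host used below are hypothetical.

```python
from typing import Tuple

DOCKERHUB = "docker.io"


def split_reference(ref: str) -> Tuple[str, str, str]:
    """Split a tag-style image reference into (registry, repository, tag)."""
    if "@" in ref:
        raise ValueError("digest references are not supported in this sketch")
    first, sep, rest = ref.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = DOCKERHUB, ref
    # The tag is whatever follows the last ":" (the registry port is already removed).
    name, colon, tag = remainder.rpartition(":")
    if not colon or "/" in tag:
        name, tag = remainder, "latest"
    if registry == DOCKERHUB and "/" not in name:
        name = f"library/{name}"  # official images, e.g. python -> library/python
    return registry, name, tag


def mirror_reference(ref: str, mirror_registry: str) -> str:
    """Map a Dockerhub image onto a mirror registry, keeping repository and tag."""
    registry, name, tag = split_reference(ref)
    if registry != DOCKERHUB:
        return ref  # already hosted elsewhere; leave unchanged
    return f"{mirror_registry}/{name}:{tag}"
```

For example, `mirror_reference("mozilla/mozilla-schema-generator:v1", "registry.example.internal")` yields `registry.example.internal/mozilla/mozilla-schema-generator:v1`.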

@fbertsch
Contributor Author

Thanks for the feedback @whd. We'll close this and figure out an alternative solution.

@haroldwoo
Contributor

Do we still need Dockerhub for this use case? We could configure circleci to push images directly to GCR (Google Container Registry) so they are readily available to pod operators.


3 participants