Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: add docker container #27

Merged
merged 26 commits into from Mar 18, 2022
Merged

chore: add docker container #27

merged 26 commits into from Mar 18, 2022

Conversation

ivan-aksamentov
Copy link
Member

@ivan-aksamentov ivan-aksamentov commented Mar 17, 2022

This adds docker container image and CI that builds and publishes it.

Notable hacks in the Dockerfile:

  • I had to download julia in advance in order to run some jula before make. I use the same command as in makefile. This should go.
  • The PATH variable contains julia version, because julia is installed into a directory containing version number. This will make it hard to upgrade the version.
  • I added instantiate and build calls.
  • I also install Conda.jl and then ete3

Work items:

  • Pangraph crashes when running. Runtime dependencies, env vars etc need to be figured, so that the image can actually do something useful.
  • Dev docs
  • User docs

@ivan-aksamentov ivan-aksamentov marked this pull request as ready for review March 18, 2022 01:17
Here we only keep `artifacts` and `conda` from `~/.julia` directory, instead of the whole thing.
fixed up recipe dependencies so that everything runs in correct order
removed unneeded elements from dockerfile after makefile simplified.
Copy link
Collaborator

@nnoll nnoll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Tested commands on my end and everything works as expected.

@nnoll
Copy link
Collaborator

nnoll commented Mar 18, 2022

@ivan-aksamentov, do you want to verify that my last commit works on your end? It's an attempt to simplify the build

@ivan-aksamentov
Copy link
Member Author

@nnoll I cleaned up some more and added a few things, notably running as non-root user:

docker run --rm -it \
--name "pangraph-$(date +%s)" \
--user="$(id -u):$(id -g)" \
--volume="$(pwd):/workdir" \
--workdir=/workdir \
neherlab/pangraph:latest \
bash -c "pangraph build --circular --alpha 0 --beta 0 /workdir/data/synthetic/test.fa | tee graph.json && pangraph polish graph.json | tee graph2.json"

This ensures that files written not as root, but as user $(id -u):$(id -g), e.g. 1000:1000 on a single-user Ubuntu.

@ivan-aksamentov
Copy link
Member Author

ivan-aksamentov commented Mar 18, 2022

Overall the current thing works. I have nothing to add.

My impression is that Julia ecosystem is not ready for production quite yet. Science is great, but if you cannot ship it or if it's a pain to use, then it's not very useful. The 30 min single-threaded compilation is awful.

Regarding the PR, if there are no defects introduced to the usual dev and non-docker experience, then feel free to merge and make a release (see docs/dev/releasing.md). Don't forget to increment the version in Project.toml. There is already a mismatch between it and GitHub Releases. Perhaps worth adding this step to the dev docs or to automate somehow.

The followup work may address the following:

  • Docker dev experience: the entire dockerfile is basically a single 30-min-long make target. This means that the docker build always runs almost fully, from scratch, and it takes 2 days to do even the simplest things with it.

    It would be nice to split the single COPY and single RUN make command into multiple COPY and RUN commands, to take advantage of docker layer caching.

    In particular, the filename-based make targets, like $(jc) need to have hardcoded names, like install-julia so that they can be referred to conveniently from outside. And there should be a well-defined order of invocation of these targets.

  • Container size is almost 1 GB. The /root/.julia/conda/3/lib directory is about 800 Mb. Removing conda entirely would be great. But if not, pruning the libraries selectively, would be another way. For example, there is entire Qt 5 distribution there for some reason. There is an excellent tool to explore layers of a docker image: dive

  • Research if there is a away to avoid the need for /root/.julia/artifacts/ (e.g. whether they can be included into the pangraph/ dir somehow).

Screenshot of dive:

01

@nnoll nnoll merged commit 36f72d5 into master Mar 18, 2022
@nnoll nnoll deleted the chore/docker branch March 18, 2022 21:27
@ivan-aksamentov
Copy link
Member Author

ivan-aksamentov commented Mar 18, 2022

CI seems to be working as expected:

https://github.com/neherlab/pangraph/runs/5607003707?check_suite_focus=true

the new tag 0.5.0 is up and the latest tag points to it - both have the same digest:

https://hub.docker.com/r/neherlab/pangraph/tags

01

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

installation issue Problems installing..
2 participants