chore: add docker container #27

Merged
merged 26 commits into from
Mar 18, 2022
Changes from 21 commits
Commits
e76e839
chore: add docker container
ivan-aksamentov Mar 17, 2022
f825867
chore(docker): fix missing ete3 python dependency
ivan-aksamentov Mar 17, 2022
1003f5b
chore: mv minimap2 wrapping code to deps/
nnoll Mar 17, 2022
ec35779
chore: stop task spawning
nnoll Mar 17, 2022
8d5113c
chore(docker): sort out the runtime docker dependencies
ivan-aksamentov Mar 17, 2022
b32ba31
chore(docker): remove docker run step
ivan-aksamentov Mar 17, 2022
8e121d7
chore(ci): tag and push docker image
ivan-aksamentov Mar 17, 2022
c5d30c3
Merge remote-tracking branch 'origin/master' into chore/docker
ivan-aksamentov Mar 17, 2022
599a4f7
docs(dev): add dev docs for building docker and releasing
ivan-aksamentov Mar 17, 2022
0ab1dee
Merge remote-tracking branch 'origin/master' into chore/docker
ivan-aksamentov Mar 17, 2022
46360ca
docs(dev): clarify docker image verification
ivan-aksamentov Mar 17, 2022
3481780
docs(readme): add docker badges
ivan-aksamentov Mar 17, 2022
259d81d
docs(dev): clarify releasing docs
ivan-aksamentov Mar 17, 2022
71678e6
docs(dev): clarify releasing docs even more
ivan-aksamentov Mar 17, 2022
4a1530c
docs(dev): clarify releasing docs again
ivan-aksamentov Mar 17, 2022
d904903
Merge remote-tracking branch 'origin/master' into chore/docker
ivan-aksamentov Mar 18, 2022
cb4bf60
docs: add user docs for docker
ivan-aksamentov Mar 18, 2022
3344b0f
docs(dev): add a note on docker image optimization
ivan-aksamentov Mar 18, 2022
710021b
chore(docker): remove extra files from julia home dir
ivan-aksamentov Mar 18, 2022
d873335
chore: reenable multithreaded
nnoll Mar 18, 2022
9fe5f9b
chore: simplify makefile and dockerfile
nnoll Mar 18, 2022
bc6e4e7
chore(docker): cleanup
ivan-aksamentov Mar 18, 2022
f86eb92
chore(docker): cleanup
ivan-aksamentov Mar 18, 2022
79fd62c
chore(docker): use the same base image for both stages in Dockerfile
ivan-aksamentov Mar 18, 2022
239b7f2
chore(docker): allow running container as non-root
ivan-aksamentov Mar 18, 2022
2501f47
chore(docker): dockerignore the dockerfile to avoid full rebuilds
ivan-aksamentov Mar 18, 2022
14 changes: 14 additions & 0 deletions .dockerignore
@@ -0,0 +1,14 @@
*
!/.env
!/Makefile
!/Manifest.toml
!/Project.toml
!/bin/
!/compile.jl
!/data/
!/deps/
!/example_datasets/
!/script/
!/src/
!/trace.jl
!/vendor/minimap2/
4 changes: 4 additions & 0 deletions .gitignore
@@ -1,3 +1,4 @@
\.cache
.dep
.local
data
@@ -13,6 +14,9 @@ pangraph.tar.gz
bin
tutorial

deps/minimap2/build
deps/minimap2/products

*.aux
*.bbl
*.blg
65 changes: 65 additions & 0 deletions Dockerfile
@@ -0,0 +1,65 @@
# Stage: builder image
# This stage pulls in many build dependencies and produces the binaries. The results will be copied
# to another image and the builder image will be discarded.
FROM ubuntu:20.04 as builder

SHELL ["bash", "-c"]


RUN set -euxo pipefail \
&& export DEBIAN_FRONTEND=noninteractive \
&& apt-get update -qq --yes \
&& apt-get install -qq --no-install-recommends --yes \
build-essential \
ca-certificates \
curl \
make \
mafft \
>/dev/null \
&& apt-get autoremove --yes >/dev/null \
&& apt-get clean autoclean >/dev/null \
&& rm -rf /var/lib/apt/lists/*

# TODO: We need to set the PATH to Julia bin dir. However the version is hardwired into the path.
# We need to install Julia to a version-neutral dir.
ENV PATH="/build_dir/bin:/build_dir/vendor/julia/bin:$PATH"

COPY bin /build_dir/bin

RUN set -euxo pipefail \
&& mkdir -p /build_dir/vendor

COPY . /build_dir/

RUN set -euxo pipefail \
&& cd /build_dir \
&& jc=$(which julia) make


# Stage: production image
# We start over from a clean Debian image and copy the binaries from the builder stage.
FROM debian:11 as prod

# Copy pangraph from the builder stage
COPY --from=builder /build_dir/pangraph/ /usr/

# Copy julia dependencies from the builder stage
COPY --from=builder /root/.julia/artifacts /root/.julia/artifacts
COPY --from=builder /root/.julia/conda/3/bin /root/.julia/conda/3/bin
COPY --from=builder /root/.julia/conda/3/lib /root/.julia/conda/3/lib

SHELL ["bash", "-c"]

RUN set -euxo pipefail \
&& export DEBIAN_FRONTEND=noninteractive \
&& apt-get update -qq --yes \
&& apt-get install -qq --no-install-recommends --yes \
mafft \
mash \
>/dev/null \
&& apt-get autoremove --yes >/dev/null \
&& apt-get clean autoclean >/dev/null \
&& rm -rf /var/lib/apt/lists/*


CMD ["/usr/bin/pangraph"]
52 changes: 42 additions & 10 deletions Makefile
@@ -5,44 +5,49 @@
version := 1.7.2

ifeq ($(jc),)
jc := ./vendor/julia-$(version)/bin/julia
jc := ./vendor/julia/bin/julia
endif

jflags := -q --project=.
julia := julia $(jflags)
jflags := --project=.
srcs := $(wildcard src/*.jl src/*/*.jl)
# julia := julia $(jflags)

datadir := data/synthetic
testdatum := $(datadir)/test.fa

all: pangraph install
all: pangraph

install: pangraph/bin/pangraph
ln -s $$(pwd)/$< bin/pangraph

environment:
bin/setup-pangraph
environment: $(jc)
$(jc) $(jflags) -e 'import Pkg; Pkg.instantiate();'
$(jc) $(jflags) -e 'import Pkg; Pkg.add(name="Conda"); import Conda; Conda.add("ete3", channel="etetoolkit")'
$(jc) $(jflags) -e 'import Pkg; Pkg.build();'

pangraph: pangraph/bin/pangraph

$(datadir):
mkdir -p $@

$(testdatum): | $(jc) $(datadir)
$(jc) $(jflags) -e 'import Pkg; Pkg.instantiate(); Pkg.build()'
$(testdatum): | environment $(jc) $(datadir)
$(jc) $(jflags) -e 'using PanGraph; PanGraph.Simulation.test()'

# TODO: look for ARM vs x86
# TODO: julia gets installed into a directory containing version number. This makes it impossible to refer to the
# installation outside of this file.
$(jc):
ifeq ($(shell uname -s),Linux)
cd vendor && \
curl -L https://julialang-s3.julialang.org/bin/linux/x64/$(basename $(version))/julia-$(version)-linux-x86_64.tar.gz -o julia-$(version)-linux-x86_64.tar.gz && \
tar xzf julia-$(version)-linux-x86_64.tar.gz
tar xzf julia-$(version)-linux-x86_64.tar.gz && \
mv julia-$(version) julia
else
ifeq ($(shell uname -s),Darwin)
cd vendor && \
curl -L https://julialang-s3.julialang.org/bin/mac/x64/$(basename $(version))/julia-$(version)-mac64.tar.gz -o julia-$(version)-mac64.tar.gz && \
tar xzf julia-$(version)-mac64.tar.gz
tar xzf julia-$(version)-mac64.tar.gz && \
mv julia-$(version) julia
else
$(error unsupported host system)
endif
@@ -61,3 +66,30 @@ clean:
rm -rf pangraph pangraph.tar.gz

include script/rules.mk


export CONTAINER_NAME=neherlab/pangraph

SHELL=bash
.ONESHELL:
docker:
set -euxo pipefail

# If $RELEASE_VERSION is set, use it as an additional docker tag
export DOCKER_TAGS="--tag $${CONTAINER_NAME}:latest"
if [ ! -z "$${RELEASE_VERSION:-}" ]; then
export DOCKER_TAGS="$${DOCKER_TAGS} --tag $${CONTAINER_NAME}:$${RELEASE_VERSION}"
fi

docker build \
--target prod \
--build-arg UID=$(shell id -u) \
--build-arg GID=$(shell id -g) \
$${DOCKER_TAGS} \
.

docker-push:
set -euxo pipefail
: "$${RELEASE_VERSION:?The RELEASE_VERSION environment variable is required.}"
docker push ${CONTAINER_NAME}:${RELEASE_VERSION}
docker push ${CONTAINER_NAME}:latest
2 changes: 2 additions & 0 deletions README.md
@@ -1,6 +1,8 @@
# pangraph

[![Documentation](https://img.shields.io/badge/Documentation-Link-blue.svg)](https://neherlab.github.io/pangraph/)
![Docker Image Version (latest semver)](https://img.shields.io/docker/v/neherlab/pangraph?label=docker)
![Docker Pulls](https://img.shields.io/docker/pulls/neherlab/pangraph)

> a bioinformatic toolkit to align large sets of closely related genomes into a graph data structure

File renamed without changes.
File renamed without changes.
33 changes: 33 additions & 0 deletions docs/dev/buiding-docker.md
@@ -0,0 +1,33 @@
## 👷 Building pangraph Docker image locally

### Install dependencies

- Install bash and make.

- Install Docker: https://docs.docker.com/get-docker/

- Optionally set up Docker so that it runs without `sudo`: https://docs.docker.com/engine/install/linux-postinstall/


### Build Docker image locally

Run:

```bash
make docker
```

This will build the Docker image tagged `neherlab/pangraph` (more precisely, `neherlab/pangraph:latest`). If an image with that tag already exists, it will be replaced. The build will take some time.

Once the build completes successfully, the image can be used right away. Refer to the user documentation, skipping the "Pull Docker image" step.


### Explore contents, layers and optimize image size

You can use the [dive](https://github.com/wagoodman/dive) tool to see what is inside an image:

```bash
dive neherlab/pangraph:<tag>
```

Each [layer](https://stackoverflow.com/questions/31222377/what-are-docker-image-layers) corresponds to a `FROM`, `COPY` or `RUN` instruction and shows the files that instruction added to the overlay file system of the image. Inspecting the layers helps you find redundant files, so that you can further optimize the `Dockerfile` and make the image smaller.
74 changes: 74 additions & 0 deletions docs/dev/releasing.md
@@ -0,0 +1,74 @@
## 🆕 Releasing pangraph

### Releasing a new version

Continuous integration (CI) will build a new version of the Docker container (see `Dockerfile`) on every pushed git tag.

Make sure you are on the correct branch and commit. Most of the time you want to release code from `master`:

```bash
git checkout master
```

In order to create and push a git tag, run:

```bash
git tag $RELEASE_VERSION
git push origin --tags
```

where `$RELEASE_VERSION` is a valid [semantic version](https://semver.org/), without a `v` prefix (for example, `1.2.3` is correct, `v1.2.3` is not).
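
The version format can be checked before tagging. The following is a minimal sketch, not part of the pangraph repository: the function name `is_release_version` is hypothetical, and it assumes that only plain `MAJOR.MINOR.PATCH` versions are released (the semantic versioning specification also permits pre-release and build metadata suffixes, which this check rejects).

```bash
# Hypothetical helper (an assumption, not part of pangraph): succeeds only for
# plain MAJOR.MINOR.PATCH versions without a leading "v" and without leading
# zeros in any numeric component.
is_release_version() {
  printf '%s\n' "$1" \
    | grep -Eq '^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$'
}

# Possible usage before tagging:
#   is_release_version "$RELEASE_VERSION" && git tag "$RELEASE_VERSION"
```

Rejecting the `v` prefix early keeps the git tag identical to the Docker tag that CI derives from it.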

The CI workflow will build the container image and push it to Docker Hub. The image will be tagged with:

- `latest` (overwriting the existing `latest` tag there)
- `$RELEASE_VERSION`

Both tags should point to the same image, i.e. their SHA hashes should be identical.
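
One way to check this locally is to compare the image IDs reported by `docker image inspect`. This is a sketch under assumptions: the function name `same_image` is hypothetical, and both tags must already have been pulled to the local machine.

```bash
# Hypothetical helper (an assumption, not part of pangraph): succeeds if two
# image references resolve to the same local image ID.
# Returns 2 if either image cannot be inspected (for example, not pulled yet).
same_image() {
  id_a=$(docker image inspect --format '{{.Id}}' "$1") || return 2
  id_b=$(docker image inspect --format '{{.Id}}' "$2") || return 2
  [ "$id_a" = "$id_b" ]
}

# Possible usage after pulling both tags:
#   same_image neherlab/pangraph:latest "neherlab/pangraph:$RELEASE_VERSION" \
#     && echo "tags match"
```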

This image version can then be referred to as:

- `neherlab/pangraph:$RELEASE_VERSION`
- `neherlab/pangraph:latest`
- `neherlab/pangraph` (which is the same as `neherlab/pangraph:latest`)

for example in `docker pull` and `docker run` commands.


### Monitoring and debugging CI build

The status of the builds can be seen on the GitHub Actions page:

https://github.com/neherlab/pangraph/actions

### Verifying CI build

After the CI build finishes successfully, check Docker Hub to ensure that the new tag is present and that the `latest` tag is updated and points to the same hash:

https://hub.docker.com/r/neherlab/pangraph

Pull and run the new version to make sure it works as expected:

```bash
docker pull neherlab/pangraph:$RELEASE_VERSION

docker run --rm -it \
--name "pangraph-$(date +%s)" \
--volume="$(pwd)/path-to-fasta:/workdir" \
--workdir=/workdir neherlab/pangraph:$RELEASE_VERSION \
bash -c "pangraph build --circular --alpha 0 --beta 0 /workdir/test.fa"
```

Here we mount the local directory `path-to-fasta` as `/workdir` so that pangraph can read the `/workdir/test.fa` file.

> 👷 TODO: implement automated tests


### Modifying continuous integration workflow

See `.github/workflows/build.yml`


### Modifying Docker image

See `Dockerfile`
49 changes: 49 additions & 0 deletions docs/src/index.md
@@ -33,6 +33,55 @@ The documentation, and source code, uses the following terminology:

There are multiple ways to install PanGraph (either the library or just the command-line interface).

### Using Docker

A Docker container image for PanGraph is available on Docker Hub: https://hub.docker.com/r/neherlab/pangraph

- Install Docker

Install Docker as described on the official website: https://docs.docker.com/get-docker/

Optionally set up Docker so that it runs without `sudo` on Linux: https://docs.docker.com/engine/install/linux-postinstall/

- Pull a version of the image

To obtain the latest version, run:

```bash
docker pull neherlab/pangraph:latest
```

To obtain a specific version, for example `1.2.3`, run:

```bash
docker pull neherlab/pangraph:1.2.3
```

- Run PanGraph container

Issue the `docker run` command:

```bash
docker run --rm -it \
--name "pangraph-$(date +%s)" \
--volume="$(pwd):/workdir" \
--workdir=/workdir neherlab/pangraph:latest \
bash -c "pangraph build --circular --alpha 0 --beta 0 /workdir/data/synthetic/test.fa"
```

Here we mount the current directory `.` (expressed as an absolute path, using the `pwd` shell command) as `/workdir` in the container so that pangraph can read the local
file `./data/synthetic/test.fa` as `/workdir/data/synthetic/test.fa`:

```
. -> /workdir
./data/synthetic/test.fa -> /workdir/data/synthetic/test.fa
```

The `--name` flag sets the name of the container, and the `date` command there ensures that a unique name is created on every run. This is optional. The `--rm` flag deletes the container (but not the image) after the run.

Replace `:latest` with a specific version if desired. The `:latest` tag can also be omitted, as it is the default.
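
The path mapping described above can be expressed as a small shell function. This is an illustrative sketch only: `container_path` is a hypothetical helper, not something shipped with PanGraph, and it assumes the file lies under the directory passed to `--volume`.

```bash
# Hypothetical helper (an assumption, not part of PanGraph): translate a host
# path under the mounted directory into the path seen inside the container.
#   $1: host directory mounted with --volume, e.g. "$(pwd)"
#   $2: mount point inside the container, e.g. /workdir
#   $3: absolute host path of the file
container_path() {
  host_root="$1"
  mount_point="$2"
  host_file="$3"
  case "$host_file" in
    "$host_root"/*)
      # Strip the host prefix and prepend the container mount point.
      printf '%s\n' "$mount_point/${host_file#"$host_root"/}"
      ;;
    *)
      echo "error: $host_file is outside $host_root" >&2
      return 1
      ;;
  esac
}

# Possible usage:
#   container_path "$(pwd)" /workdir "$(pwd)/data/synthetic/test.fa"
```

Files outside the mounted directory are not visible to the container, which is why the sketch reports an error for them rather than returning a path.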


### From Julia REPL
```julia
(@v1.x) pkg> add https://github.com/neherlab/pangraph.git