Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix devnet Mithril Docker images #1272

Closed
2 tasks done
jpraynaud opened this issue Oct 3, 2023 · 13 comments · Fixed by #1324
Closed
2 tasks done

Fix devnet Mithril Docker images #1272

jpraynaud opened this issue Oct 3, 2023 · 13 comments · Fixed by #1324
Assignees
Labels
devX 🌞 Developer experience

Comments

@jpraynaud
Copy link
Member

jpraynaud commented Oct 3, 2023

Why

The current devnet can launch Docker images for the Mithril nodes. There are several drawbacks that we would need to address to enhance the DevX.

What

  • Make build times (much) faster for the Docker images
  • Fix missing parameters that have not been implemented yet

How

  • Implement missing parameters for aggregator node
  • Accelerate build times for Docker images (use Docker.ci images instead and pack a pre-built binary)
@jpraynaud jpraynaud added devX 🌞 Developer experience to-groom 🤔 Needs grooming and removed to-groom 🤔 Needs grooming labels Oct 3, 2023
@TrevorBenson
Copy link
Contributor

TrevorBenson commented Oct 15, 2023

@jpraynaud I took a stab at this. I've been building/running cardano-node in containers for a few years, and also mithril-signer in containers on the preprod and preview networks. Unfortunately, both my preview and preprod signers have been offline for 1-2 epochs, so I wouldn't be able to simply restart them and quickly observe a certificate being signed. I figured I'd at least post the branch here for review in case anyone else wants to test building, or running the image on either network before I get a chance.

The branch of my fork it can be reviewed from is: fix-devnet-mithril-docker-images. It only has the Dockerfile for the mithril-signer image changed. If this would be welcome and not conflict with ongoing work by others I could apply this to other container images as well, though I'm mostly familiar with the signer and client as I haven't set up a full aggregator beyond the automatic devnet tests.

Mithril Signer

Observed Results

  • Build times (on my laptop at least) of around 40-60 seconds
  • Reduced container size by nearly 50%
  • Fewer image layers

Can seem counterintuitive to have fewer layers when often articles suggest more. However, the image build is slightly faster with fewer layers even with identical commands, although not the primary reason or benefit. The image size is reduced from 338MB to 173MB, or reduced ~48%.

Changes Review

  • Re-ordered commands to reduce cache invalidations.
  • Installation of release binaries combined into single layers (commands).
    • One layer for cardano-cli, compared to the original 4 layers.
    • One layer for mithril-signer.

I can answer questions about how the changes affect the cache, build times, and container sizes if anyone is interested. I haven't tried with Buildah or Buildkit yet to create a complete scratch image. That is another option that could reduce the size but could increase build times, and/or slightly complicate the build process for developers.

@TrevorBenson
Copy link
Contributor

TrevorBenson commented Oct 16, 2023

The preview network passed enough epochs to have signed a certificate, so I've stopped and restarted the mithril-signer from the new image, just waiting to confirm the next signature I participate in.

I committed Dockerfiles for both mithril-client and mithril-aggregator that use the release binaries. I tried the devnet from the current documentation, but I ran into

Error: configuration deserialize error

Caused by:
    missing field `cardano_node_version`

I don't recall if I saw this when I tested devnet long ago. It seems familiar, but if worked around this before I'm not finding my notes explaining how I did it.

@TrevorBenson
Copy link
Contributor

TrevorBenson commented Oct 16, 2023

The building takes under 60 seconds for each image. The client reduced 138MB down to 122MB, dropping 16MB or 11.5%. The aggregator reduced 367MB down to 188MB, dropping 179MB or, again, 48%.

From release binaries and adjustments to slim containers.

$ podman images
REPOSITORY                                 TAG                   IMAGE ID      CREATED        SIZE
localhost/mithril/mithril-aggregator       latest                362cbc64c029  4 hours ago    188 MB
localhost/mithril/mithril-client           latest                da7d171d8c41  4 hours ago    122 MB
localhost/mithril/mithril-signer           latest                f6831e29909a  4 hours ago    173 MB

current ghcr.io images

# podman images
REPOSITORY                                       TAG            IMAGE ID      CREATED       SIZE
ghcr.io/input-output-hk/mithril-aggregator       latest         9119b9e817c7  2 weeks ago   367 MB
ghcr.io/input-output-hk/mithril-client           latest         de66c901dc86  2 weeks ago   138 MB
ghcr.io/input-output-hk/mithril-signer           latest         c45b801379a4  2 weeks ago   338 MB

@TrevorBenson
Copy link
Contributor

Here is a certificate signed from the mithril-signer image f6831e29909a: 4b3468138da933e8b7ad80537398426197c65fd62465c9227bcd3c36b6699654 by POOL_ID pool1vezalga3ge0mt0xf4txz66ctufk6nrmemhhpshwkhedk5jf0stw.

@TrevorBenson
Copy link
Contributor

Implement missing parameters for aggregator node

I'm unsure what this part meant or would have attempted to address it.

I can open a PR with the branch as is and during the review discuss how it should be handled or if it should be split into a separate PR. As well as if I overlooked anything that may not have been spelled out in the issue.

@jpraynaud
Copy link
Member Author

@TrevorBenson thanks for your feedback 👍

We have also noticed that there is a configuration problem with the cardano_node_version that is not specified and a fix is available in this PR #1297: once it is merged, you will no longer see the missing field cardano_node_version error message.

We currently have 2 docker images for each node:

  • Dockerfile: the ones that are used in the devnet, that compiles the nodes with the Rust compiler during the Docker build phase. They are used only by developers who want to test their code on a real Cardano network locally. These are the images we want to optimize because they take way too long to build, which does not currently provide a good developer experience.
  • Dockerfile.ci: the ones that are used in the CI/CD and released here, and that are available for use on the Mithril networks (here is a GitHub Action that does this job). As you can see, the Dockerfile.ci is very close to what you have proposed on the branch main...chaincrucial:mithril:fix-devnet-mithril-docker-images#diff-35e380a47d20db39f2d7ec92bb1f15e79e55c4d710643cb7cf79596267189b36, and the build is very fast. (We intend to use this Docker image with the devnet as explained in the summary of this issue)

It looks like we can significantly optimize the size of the images we release with the Dockerfile.ci file, and a quick test showed that the one line command for the setup of the Cardano cli already removes 150MB on the image size 👏

Feel free to create a PR that implements the optimizations for the Dockerfile.ci file 🙂

@TrevorBenson
Copy link
Contributor

TrevorBenson commented Oct 16, 2023

Will do. Should the current commits be included, or dropped and let the developers do complete rust builds for devnet?

Should I presume a buildx cache for the workflow is of interest as well?

@TrevorBenson
Copy link
Contributor

TrevorBenson commented Oct 16, 2023

@jpraynaud
I cleaned up the current commits to bring them more in line with the existing Dockerfile style as well as reduce cache invalidation a bit more. I've updated the Dockerfile.ci files as well.

I haven't modified the release workflow to include the Docker Buildx cache or additional changes that might be useful when using a cache.

I did temporarily rebase onto your branch to test on the devnet, and that looks good as well. If you feel this is ready I will open a PR for further review discussion and suggestions.

TrevorBenson added a commit to chaincrucial/mithril that referenced this issue Oct 16, 2023
TrevorBenson added a commit to chaincrucial/mithril that referenced this issue Oct 16, 2023
@TrevorBenson
Copy link
Contributor

The two Dockerfiles are very close to each other at this point.

Differences:

  • adduser does not always use --no-create-home
  • sqlite3 installed for devnet Dockerfile, but not for Dockerfile.ci
  • chmod +x / chmod a+x sometimes handled in more than one line.
  • ENTRYPOINT level of verbosity is, sometimes, higher for Dockerfile than Dockerfile.ci

Otherwise, they are almost identical enough to use a single file instead of two separate ones.

TrevorBenson added a commit to chaincrucial/mithril that referenced this issue Oct 16, 2023
TrevorBenson added a commit to chaincrucial/mithril that referenced this issue Oct 16, 2023
TrevorBenson added a commit to chaincrucial/mithril that referenced this issue Oct 17, 2023
@jpraynaud jpraynaud self-assigned this Oct 18, 2023
TrevorBenson added a commit to chaincrucial/mithril that referenced this issue Oct 18, 2023
TrevorBenson added a commit to chaincrucial/mithril that referenced this issue Oct 18, 2023
TrevorBenson added a commit to chaincrucial/mithril that referenced this issue Oct 18, 2023
@jpraynaud
Copy link
Member Author

Thanks @TrevorBenson for your contribution!

The size of the Docker images of the signer and the aggregator dropped significantly👏

$ docker images | grep -e "2337.0-7d5a683" -e "2342.0-pre-e5ae362" 
ghcr.io/input-output-hk/mithril-aggregator   2342.0-pre-e5ae362   77d3ab8036ed   31 minutes ago   215MB
ghcr.io/input-output-hk/mithril-signer       2342.0-pre-e5ae362   57d864a659a1   31 minutes ago   186MB
ghcr.io/input-output-hk/mithril-client       2342.0-pre-e5ae362   05093eb75606   31 minutes ago   134MB
ghcr.io/input-output-hk/mithril-aggregator   2337.0-7d5a683       9119b9e817c7   3 weeks ago      363MB
ghcr.io/input-output-hk/mithril-signer       2337.0-7d5a683       c45b801379a4   3 weeks ago      334MB
ghcr.io/input-output-hk/mithril-client       2337.0-7d5a683       de66c901dc86   3 weeks ago      134MB

@TrevorBenson
Copy link
Contributor

TrevorBenson commented Oct 19, 2023

Thanks @TrevorBenson for your contribution!

The size of the Docker images of the signer and the aggregator dropped significantly👏

$ docker images | grep -e "2337.0-7d5a683" -e "2342.0-pre-e5ae362" 
ghcr.io/input-output-hk/mithril-aggregator   2342.0-pre-e5ae362   77d3ab8036ed   31 minutes ago   215MB
ghcr.io/input-output-hk/mithril-signer       2342.0-pre-e5ae362   57d864a659a1   31 minutes ago   186MB
ghcr.io/input-output-hk/mithril-client       2342.0-pre-e5ae362   05093eb75606   31 minutes ago   134MB
ghcr.io/input-output-hk/mithril-aggregator   2337.0-7d5a683       9119b9e817c7   3 weeks ago      363MB
ghcr.io/input-output-hk/mithril-signer       2337.0-7d5a683       c45b801379a4   3 weeks ago      334MB
ghcr.io/input-output-hk/mithril-client       2337.0-7d5a683       de66c901dc86   3 weeks ago      134MB

Great. One thing stands out as odd to me. The 2342 build is marginally smaller for signer 2MB (offset by +300K for aggregator) than the 2337 build, yet the result using the compiled version (vs. the tarball for release binaries):

  • aggregator image 12.5% larger, 215MB instead of 188MB
  • signer 8% larger, 186MB instead of 173MB
  • client zero reduction in size (134MB) vs. 122MB
$ tar tzvf mithril-2342.0-pre-linux-x64.tar.gz |grep -e mithril-aggregator -e mithril-signer
-rw-rw-rw- runner/docker 36212416 2023-10-19 00:31 mithril-aggregator
-rw-rw-rw- runner/docker 21656120 2023-10-19 00:31 mithril-signer
$ tar tzvf mithril-2337.0-linux-x64.tar.gz |grep -e mithril-aggregator -e mithril-signer
-rw-rw-rw- runner/docker 36543824 2023-09-21 05:20 mithril-aggregator
-rw-rw-rw- runner/docker 21851360 2023-09-21 05:20 mithril-signer

I wouldn't expect that to be the final result given the binaries should be identical to what is in the pre release tarball. I'm not overly concerned since we at least made progress on shrinking them, however it does make me go 🤔

EDIT: I use podman, and subsequently buildah. I don't recall that resulting in any real size differences in the past, but suppose it is possible it is something different on my side and not actually a difference between compiled and tarball.

@jpraynaud
Copy link
Member Author

Thanks @TrevorBenson for your contribution!
The size of the Docker images of the signer and the aggregator dropped significantly👏

$ docker images | grep -e "2337.0-7d5a683" -e "2342.0-pre-e5ae362" 
ghcr.io/input-output-hk/mithril-aggregator   2342.0-pre-e5ae362   77d3ab8036ed   31 minutes ago   215MB
ghcr.io/input-output-hk/mithril-signer       2342.0-pre-e5ae362   57d864a659a1   31 minutes ago   186MB
ghcr.io/input-output-hk/mithril-client       2342.0-pre-e5ae362   05093eb75606   31 minutes ago   134MB
ghcr.io/input-output-hk/mithril-aggregator   2337.0-7d5a683       9119b9e817c7   3 weeks ago      363MB
ghcr.io/input-output-hk/mithril-signer       2337.0-7d5a683       c45b801379a4   3 weeks ago      334MB
ghcr.io/input-output-hk/mithril-client       2337.0-7d5a683       de66c901dc86   3 weeks ago      134MB

Great. One thing stands out as odd to me. The 2342 build is marginally smaller for signer 2MB (offset by +300K for aggregator) than the 2337 build, yet the result using the compiled version (vs. the tarball for release binaries):

* aggregator image 12.5% larger, 215MB instead of 188MB

* signer 8% larger, 186MB instead of 173MB

* client zero reduction in size (134MB) vs. 122MB
$ tar tzvf mithril-2342.0-pre-linux-x64.tar.gz |grep -e mithril-aggregator -e mithril-signer
-rw-rw-rw- runner/docker 36212416 2023-10-19 00:31 mithril-aggregator
-rw-rw-rw- runner/docker 21656120 2023-10-19 00:31 mithril-signer
$ tar tzvf mithril-2337.0-linux-x64.tar.gz |grep -e mithril-aggregator -e mithril-signer
-rw-rw-rw- runner/docker 36543824 2023-09-21 05:20 mithril-aggregator
-rw-rw-rw- runner/docker 21851360 2023-09-21 05:20 mithril-signer

I wouldn't expect that to be the final result given the binaries should be identical to what is in the pre release tarball. I'm not overly concerned since we at least made progress on shrinking them, however it does make me go 🤔

I think you misread the results 🙂

  • aggregator size is -40% : 363MB to 215MB,
  • signer size is -45%: 334MB to 186MB
  • client same size: 134MB to 3134MB (normal as there are no layer optimization for the cardano cli)

@TrevorBenson
Copy link
Contributor

@jpraynaud I'm referring to the original sizes I observed with the Dockerfile builds:

#1272 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
devX 🌞 Developer experience
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants