Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shrink container image with --no-cache option on apk #459

Merged
merged 1 commit into from
Jun 26, 2023
Merged

Shrink container image with --no-cache option on apk #459

merged 1 commit into from
Jun 26, 2023

Conversation

OctopusET
Copy link
Contributor

This option shrinks container image more than half.
result: 1.95 GB -> 882 MB

@apyrgio apyrgio self-requested a review June 26, 2023 10:58
@apyrgio apyrgio added this to the 0.4.2 milestone Jun 26, 2023
@apyrgio
Copy link
Contributor

apyrgio commented Jun 26, 2023

Hey @OctopusET, thanks a lot for the contribution. We really want to reduce the size of the container image, so I tested it locally as well. I found that the original image from main was 838MiB, and with your changes, it got reduced to 836MiB.

I was confused a bit so I checked out the size of the APK index under /var/cache/apk, and in my case, it was just 2MiB, which explains the size difference:

$ ls -lh /var/cache/apk/
total 2M     
-rw-r--r--    1 root     root      422.1K Jun 26 11:02 APKINDEX.49104001.tar.gz
-rw-r--r--    1 root     root        1.4M Jun 26 11:02 APKINDEX.884188d8.tar.gz

So, I have two questions here:

  1. Could it be that the size difference that you're seeing is attributed to disabling image compression the first time? 1.95 GiB is the size of the uncompressed image, that's why I'm asking.
  2. Using --no-cache is indeed a reasonable setting for container images, and it uncovered an issue in our Dockerfile, if we also add it to apk add. If you're ok with the following code, can you add it to your branch, so that I can merge it?
diff --git a/Dockerfile b/Dockerfile
index c247982..5e0f1c6 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -4,7 +4,7 @@ ARG TESSDATA_CHECKSUM=990fffb9b7a9b52dc9a2d053a9ef6852ca2b72bd8dfb22988b0b990a70
 
 # Install dependencies
 RUN apk --no-cache -U upgrade && \
-    apk add \
+    apk --no-cache add \
     ghostscript \
     graphicsmagick \
     libreoffice \
@@ -26,7 +26,6 @@ RUN apk --no-cache -U upgrade && \
 RUN mkdir tessdata && cd tessdata \
     && TESSDATA_VERSION=$(wget -O- -nv https://api.github.com/repos/tesseract-ocr/tessdata/releases/latest \
         | sed -n 's/^.*"tag_name": "\([0-9.]\+\)".*$/\1/p') \
-    && apk --purge del jq \
     && wget https://github.com/tesseract-ocr/tessdata/archive/$TESSDATA_VERSION/tessdata-$TESSDATA_VERSION.tar.gz \
     && echo "$TESSDATA_CHECKSUM  tessdata-$TESSDATA_VERSION.tar.gz" | sha256sum -c \
     && tar -xzvf tessdata-$TESSDATA_VERSION.tar.gz -C . \

@OctopusET
Copy link
Contributor Author

Hello, thank you for the comment.

  1. I'm not sure, but I checked the size with podman images. I just tried re-build whole image without/with my PR. It still shows 1.9GB. I think --no-cache option doesn't help shrinking size that much. I think I got around 900MB because podman somehow used the compressed image or it's just part of the image. But still I think --no-cache option would be good for the container image maintenance.

  2. Thank you for the changes, I forgot to add it. I just added it.

It's not related to this PR.

This PR was made during I'm working on the adding a hwp supports.
But I have a problem, I managed to convert hwp, hwpx to pdf with extension (I mentioned on this comment.) in the container. But dangerzone fails when it's running pdfinfo.
I'd like to discuss about this, should I just open Draft PR first?

@apyrgio
Copy link
Contributor

apyrgio commented Jun 26, 2023

This PR was made during I'm working on the adding a hwp supports.

That's exciting!

But I have a problem, I managed to convert hwp, hwpx to pdf with extension (I mentioned on this #243 (comment).) in the container. But dangerzone fails when it's running pdfinfo.
I'd like to discuss about this, should I just open Draft PR first?

Sure, go ahead.

@apyrgio apyrgio merged commit 5b58576 into freedomofpress:main Jun 26, 2023
26 of 28 checks passed
@OctopusET
Copy link
Contributor Author

I just posted my draft PR #460!

Just curious, what was that line for?

&& apk --purge del jq

@apyrgio
Copy link
Contributor

apyrgio commented Jun 27, 2023

I just posted my draft PR #460!

Great, thanks! I'll comment there soon.

Just curious, what was that line for?

&& apk --purge del jq

Oh, good question. In order to get the Tesseract data version, I initially installed jq, parsed the version with it, and then deleted it. Once I figured out we can use sed instead of jq, I ditched the jq approach, but not the line that removes it.

This didn't cause any problem, up to the point where we removed the APK index. By doing so, this line would fail. So, I removed it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants