Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dockerfile enhancements #703

Merged
merged 4 commits into from
Feb 21, 2021

Conversation

superdude264
Copy link

#697

Shrunk the container from 1.66 GB to 912.5 MB. Most of the change is achieved in the first commit, which maintains total parity of output with the previous version. Later commits are remove unused artifacts, volume, & disable logging into the container (which in production environments would cause it to fail over after running for long periods).

Rob Lewis added 2 commits January 28, 2021 15:40
Maintains all artifacts & volumes from previous version.

* Move unzip to builder.
* Clear apt-cache in runner.
This artifact is not needed to run the service.
Copy link
Collaborator

@lfoppiano lfoppiano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, although I would not change the config file

grobid-service/config/config.yaml Outdated Show resolved Hide resolved
@coveralls
Copy link

coveralls commented Jan 29, 2021

Coverage Status

Coverage remained the same at 38.164% when pulling 23fe7fe on superdude264:dockerfile-enhancements into bfc10f7 on kermitt2:master.

@superdude264
Copy link
Author

Please let me know if there are any other changes you guys would like me to make to this MR.

Copy link
Collaborator

@lfoppiano lfoppiano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me

@superdude264
Copy link
Author

Should I upgrade to the base images to Java 11? I was able to do that locally with no issues. Java version >11 had issues though.

@superdude264
Copy link
Author

Additionally, tini is no longer required for Docker 1.13+: https://github.com/krallin/tini

You can see in the screenshot below that docker-init (built-in tini) is forked to tini.

image

@lfoppiano @kermitt2 Do you guys want me to continue pushing updates here?

@lfoppiano
Copy link
Collaborator

lfoppiano commented Feb 5, 2021

@superdude264 For backward compatibility maybe we can keep tiny for the time being and remove it later on. Regarding further improvements, is fine for me, indeed it depends on the changes.

@superdude264
Copy link
Author

@lfoppiano The only other change I have to upgrade the runtime image to Java 11.

@lfoppiano
Copy link
Collaborator

@superdude264 you can include the update to java 11 in the current PR I think

@superdude264
Copy link
Author

Here you go @lfoppiano. I left the builder image as-is and upgraded the runtime image to Java 11. The JVM now utilizes the memory assigned to the container, so there isn't a need to set the memory size in $JAVA_OPTS. There are some new informational warnings on startup:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.inject.internal.cglib.core.$ReflectUtils$1 (file:/opt/grobid/grobid-service/lib/guice-4.1.0.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of com.google.inject.internal.cglib.core.$ReflectUtils$1
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

I have not found a way to correct or silence these warning in our production setup, so I just left them.

Final image size is ~950 MB.

@kermitt2
Copy link
Owner

Thank you @superdude264 (and Luca!), it works fine for me too.

I started to adapt the changes to Dockerfile.delft to have something consistent (the DL dockerfile is more complicated and leads to a considerably larger image), which I will commit separately. If you can have a look to it too maybe, it would be great, otherwise no problem - it will be a very fat image anyway ;)

@kermitt2 kermitt2 merged commit 870c305 into kermitt2:master Feb 21, 2021
@kermitt2
Copy link
Owner

@superdude264 I have a question, you removed the volume for the GROBID tmp space:

VOLUME ["/opt/grobid/grobid-home/tmp"]

I've read somewhere that writing in a volume is faster than in the container directly, but maybe this is not the case. Do you have any hints on this?

@superdude264 superdude264 deleted the dockerfile-enhancements branch February 21, 2021 23:57
@superdude264
Copy link
Author

@kermitt2 Re: Dockerfile.delf, point me to the branch or tag me in an MR and I will take a look.

I will look into the question regarding write speed to a container vs a volume.

kermitt2 added a commit that referenced this pull request Feb 22, 2021
@kermitt2
Copy link
Owner

Thanks! see #721

@superdude264
Copy link
Author

RE: Removal of VOLUME ["/opt/grobid/grobid-home/tmp"]

The VOLUME command creates a volume on the host (most of the time; see caveats below) and the Docker documentation advocates using volumes for write-heavy paths here & here.

However, I would recommend using a tmpfs mount by adding --tmpfs /opt/grobid/grobid-home/tmp to the docker run commands in the GROBID with containers documentation. This will back that directory using tmpfs and result in better performance than writing to disk.

Additionally, it may be worthwhile to have grobid default to writing to the system temporary directory (using Files.createTempDirectory()), as this will normally be setup w/ tmpfs (in most Linux distributions running on bare metal) or container operators will create a tmpfs mount there as a best practice.

Caveats

I have not tested the above because I run grobid in AWS Fargate, which only supports network-mounted volumes (AWS Elastic File System) and so ignores VOLUME directives in Dockerfiles. Additionally, I do my development on macOS and tmpfs is Linux-only.

kermitt2 added a commit that referenced this pull request Mar 19, 2021
Aligned Deep Learning Dockerfile with PR #703
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants