Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aligned Deep Learning Dockerfile with PR #703 #721

Merged
merged 3 commits into from Mar 19, 2021

Conversation

kermitt2
Copy link
Owner

@kermitt2 kermitt2 commented Feb 22, 2021

I tried to adapt the change to the CRF-only dockerfile to the one using DeLFT, see PR #703 . This is reducing the image in a similar manner from 10.3GB to 9.8GB, but of course this is less impressive at this scale :)

Still using JRE 8 in the current form.

@kermitt2 kermitt2 mentioned this pull request Feb 22, 2021
@coveralls
Copy link

coveralls commented Feb 22, 2021

Coverage Status

Coverage decreased (-0.2%) to 38.94% when pulling d225477 on improved-dockerfile-delft into 14fe601 on master.

Comment on lines 73 to 77
RUN rm -rf /opt/grobid/grobid-home/pdf2xml/lin-32
RUN rm -rf /opt/grobid/grobid-home/pdf2xml/mac-64
RUN rm -rf /opt/grobid/grobid-home/pdf2xml/win-*
RUN rm -rf /opt/grobid/grobid-home/lib/lin-32
RUN rm -rf /opt/grobid/grobid-home/lib/win-*

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this up into the builder.

In Docker, each RUN, ADD, or COPY command create a tarball behind the scenes and all of these are bundled together to form a stage (e.g builder or runner). The final stage is what determines the size of the resulting container. Thus it's better to remove files in the builder, then copy to the runner.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what a good advice, this is brilliant ! I decreased the size of the crf image to 650MB doing that.

Dockerfile.delft Outdated
Comment on lines 79 to 80
# below to allow logs to be written in the container (not advised for production!)
# RUN mkdir -p logs

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've found that even if this directory is not present, log files will still be written to the container. To not write the logs /opt/grobid/grobid-service/config/config.yaml has to be updated so that the only appender is CONSOLE. This was discussed in #703, but that change was not made.

Dockerfile.delft Outdated
Comment on lines 82 to 83
# the following might not be necessary - but should result in faster temp file write than directly in the container
VOLUME ["/opt/grobid/grobid-home/tmp"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you ! I am removing the VOLUME and opening an issue for using Files.createTempDirectory().
The GROBID tmp is not really read/write intensive - just write/read/delete the PDF and the ALTO file, which are negligible as compared to the whole process.

@lfoppiano lfoppiano added this to the 0.6.2 milestone Mar 15, 2021
@kermitt2
Copy link
Owner Author

Reduction to 9.56GB so far... I think we can decrease by a bit more than 2GB by managing better the embeddings, but I leave that for a future release!

@kermitt2 kermitt2 merged commit a9a2aa5 into master Mar 19, 2021
@lfoppiano lfoppiano deleted the improved-dockerfile-delft branch March 21, 2021 01:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants