New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Aligned Deep Learning Dockerfile with PR #703 #721
Conversation
RUN rm -rf /opt/grobid/grobid-home/pdf2xml/lin-32 | ||
RUN rm -rf /opt/grobid/grobid-home/pdf2xml/mac-64 | ||
RUN rm -rf /opt/grobid/grobid-home/pdf2xml/win-* | ||
RUN rm -rf /opt/grobid/grobid-home/lib/lin-32 | ||
RUN rm -rf /opt/grobid/grobid-home/lib/win-* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this up into the builder.
In Docker, each RUN, ADD, or COPY command create a tarball behind the scenes and all of these are bundled together to form a stage (e.g builder or runner). The final stage is what determines the size of the resulting container. Thus it's better to remove files in the builder, then copy to the runner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what a good advice, this is brilliant ! I decreased the size of the crf image to 650MB doing that.
Dockerfile.delft
Outdated
# below to allow logs to be written in the container (not advised for production!) | ||
# RUN mkdir -p logs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've found that even if this directory is not present, log files will still be written to the container. To not write the logs /opt/grobid/grobid-service/config/config.yaml
has to be updated so that the only appender is CONSOLE
. This was discussed in #703, but that change was not made.
Dockerfile.delft
Outdated
# the following might not be necessary - but should result in faster temp file write than directly in the container | ||
VOLUME ["/opt/grobid/grobid-home/tmp"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #703 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you ! I am removing the VOLUME and opening an issue for using Files.createTempDirectory()
.
The GROBID tmp is not really read/write intensive - just write/read/delete the PDF and the ALTO file, which are negligible as compared to the whole process.
Reduction to 9.56GB so far... I think we can decrease by a bit more than 2GB by managing better the embeddings, but I leave that for a future release! |
I tried to adapt the change to the CRF-only dockerfile to the one using DeLFT, see PR #703 . This is reducing the image in a similar manner from 10.3GB to 9.8GB, but of course this is less impressive at this scale :)
Still using JRE 8 in the current form.