Merge pull request #721 from kermitt2/improved-dockerfile-delft
Aligned Deep Learning Dockerfile with PR #703
kermitt2 committed Mar 19, 2021
2 parents a998649 + d225477 commit a9a2aa5
Showing 2 changed files with 22 additions and 33 deletions.
49 changes: 19 additions & 30 deletions Dockerfile.delft
@@ -21,13 +21,10 @@ FROM openjdk:8u275-jdk as builder
USER root

RUN apt-get update && \
apt-get -y --no-install-recommends install apt-utils libxml2
apt-get -y --no-install-recommends install unzip

WORKDIR /opt/grobid-source

RUN mkdir -p .gradle
VOLUME /opt/grobid-source/.gradle

# gradle
COPY gradle/ ./gradle/
COPY gradlew ./
@@ -41,9 +38,22 @@ COPY grobid-core/ ./grobid-core/
COPY grobid-service/ ./grobid-service/
COPY grobid-trainer/ ./grobid-trainer/

# cleaning unused native libraries before packaging
RUN rm -rf grobid-home/pdf2xml/lin-32
RUN rm -rf grobid-home/pdf2xml/mac-64
RUN rm -rf grobid-home/pdf2xml/win-*
RUN rm -rf grobid-home/lib/lin-32
RUN rm -rf grobid-home/lib/win-*
RUN rm -rf grobid-home/lib/mac-64

RUN ./gradlew clean assemble --no-daemon --info --stacktrace

WORKDIR /opt
WORKDIR /opt/grobid
RUN unzip -o /opt/grobid-source/grobid-service/build/distributions/grobid-service-*.zip && \
mv grobid-service* grobid-service
RUN unzip -o /opt/grobid-source/grobid-home/build/distributions/grobid-home-*.zip && \
chmod -R 755 /opt/grobid/grobid-home/pdf2xml
RUN rm -rf grobid-source

# -------------------
# build runtime image
@@ -60,35 +70,21 @@ ENV LANG C.UTF-8
RUN apt-get update && \
apt-get -y --no-install-recommends install apt-utils build-essential gcc libxml2 unzip curl \
openjdk-8-jre-headless ca-certificates-java \
# git \
musl gfortran \
python3 python3-pip python3-setuptools python3-dev \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

WORKDIR /opt

COPY --from=builder /opt/grobid-source/grobid-core/build/libs/grobid-core-*-onejar.jar ./grobid/grobid-core-onejar.jar
COPY --from=builder /opt/grobid-source/grobid-service/build/distributions/grobid-service-*.zip ./grobid-service.zip
COPY --from=builder /opt/grobid-source/grobid-home/build/distributions/grobid-home-*.zip ./grobid-home.zip
WORKDIR /opt/grobid

RUN unzip -o ./grobid-service.zip -d ./grobid && \
mv ./grobid/grobid-service-* ./grobid/grobid-service
COPY --from=builder /opt/grobid .

RUN unzip ./grobid-home.zip -d ./grobid && \
mkdir -p /opt/grobid/grobid-home/tmp
RUN rm *.zip
RUN rm -rf /opt/grobid/grobid-home/pdf2xml/lin-32
RUN rm -rf /opt/grobid/grobid-home/pdf2xml/mac-64
RUN rm -rf /opt/grobid/grobid-home/pdf2xml/win-*
RUN rm -rf /opt/grobid/grobid-home/lib/lin-32
RUN rm -rf /opt/grobid/grobid-home/lib/win-*

# below to allow logs to be written in the container
# RUN mkdir -p logs

VOLUME ["/opt/grobid/grobid-home/tmp"]

RUN python3 -m pip install pip --upgrade

# install DeLFT via pypi
@@ -99,13 +95,6 @@ RUN mkdir -p /data \
&& ln -s /data /opt/grobid/data \
&& ln -s /data ./data

# install DeLFT by cloning the repo - only for dev time!
#RUN git clone https://github.com/kermitt2/delft
#WORKDIR /opt/delft
#RUN pip3 install -r requirements.txt
# cleaning useless delft data
#RUN rm -rf data/sequenceLabelling data/textClassification data/test data/models/sequenceLabelling data/models/textClassification .git

# disable python warnings (and fix logging)
ENV PYTHONWARNINGS="ignore"

@@ -119,8 +108,8 @@ ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
RUN chmod +x /tini
ENTRYPOINT ["/tini", "-s", "--"]

RUN chmod -R 755 /opt/grobid/grobid-home/pdf2xml
RUN chmod 777 /opt/grobid/grobid-home/tmp
#RUN chmod -R 755 /opt/grobid/grobid-home/pdf2xml
#RUN chmod 777 /opt/grobid/grobid-home/tmp

# install jep (and temporarily the matching JDK)
ENV TEMP_JDK_HOME=/tmp/jdk-${JAVA_VERSION}
6 changes: 3 additions & 3 deletions doc/Grobid-docker.md
@@ -50,7 +50,7 @@ The process for retrieving and running the image is as follow:
> docker pull grobid/grobid:${latest_grobid_version}
```

- Run the container (note the new version running on 8070, however it will be mapped on the 8080 of your host):
- Run the container:

```bash
> docker run --rm --gpus all --init grobid/grobid:${latest_grobid_version}
@@ -61,7 +61,7 @@ The image will automatically uses the GPU and CUDA version available on your hos
To specify to use only certain GPUs (see the [nvidia container toolkit user guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html#gpu-enumeration) for more details):

```bash
> docker run --rm --gpus '"device=1,2"' --init -p 8070:8080 -p 8071:8081 grobid/grobid:${latest_grobid_version}
> docker run --rm --gpus '"device=1,2"' --init -p 8080:8070 -p 8081:8071 grobid/grobid:${latest_grobid_version}
```

You can run the image on CPU by omitting the `-gpus` parameters.
@@ -83,7 +83,7 @@ Grobid web services are then available as described in the [service documentatio
The simplest way to pass a modified configuration to the docker image is to mount the property file `grobid.properties` when running the image. Modify the config file `grobid/grobid-home/config/grobid.properties` according to your requirements on the host machine and mount it when running the image as follow:

```bash
docker run --rm --gpus all --init -p 8070:8080 -p 8071:8081 -v /home/lopez/grobid/grobid-home/config/grobid.properties:/opt/grobid/grobid-home/config/grobid.properties:ro grobid/grobid:0.6.2-SNAPSHOT
docker run --rm --gpus all --init -p 8080:8070 -p 8081:8071 -v /home/lopez/grobid/grobid-home/config/grobid.properties:/opt/grobid/grobid-home/config/grobid.properties:ro grobid/grobid:0.6.2-SNAPSHOT
```

You need to use an absolute path to specify your modified `grobid.properties` file.
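
Note on the port and mount arguments changed above: `-p` takes `HOST:CONTAINER`, so `-p 8080:8070` publishes container port 8070 on host port 8080, and `-v` needs an absolute host path. The sketch below is only an illustration under stated assumptions: the helper name `grobid_run_cmd` and the variables `GROBID_CONFIG` and `GROBID_IMAGE` are hypothetical and not part of this commit. It prints the command rather than running it, so it can be inspected without Docker or a GPU.

```bash
#!/bin/sh
# Sketch only: prints the docker command instead of executing it.
# grobid_run_cmd, GROBID_CONFIG and GROBID_IMAGE are hypothetical names.

# Print a `docker run` command line for a given config file and image tag.
grobid_run_cmd() {
    config=$1
    image=$2
    # docker -v expects an absolute host path; resolve a relative one against $PWD
    case $config in
        /*) abs=$config ;;
        *)  abs=$PWD/$config ;;
    esac
    # -p HOST:CONTAINER: host 8080 -> container 8070, host 8081 -> container 8071
    printf 'docker run --rm --gpus all --init -p 8080:8070 -p 8081:8071 -v %s:/opt/grobid/grobid-home/config/grobid.properties:ro %s\n' \
        "$abs" "$image"
}

# Defaults mirror the example in the documentation diff above.
grobid_run_cmd \
    "${GROBID_CONFIG:-/home/lopez/grobid/grobid-home/config/grobid.properties}" \
    "${GROBID_IMAGE:-grobid/grobid:0.6.2-SNAPSHOT}"
```

Dropping `--gpus all` from the printed command gives the CPU variant described in the documentation.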
