Merge pull request #482 from elifesciences/support-environment-variable-override

added support for environment variable overrides
kermitt2 committed Apr 21, 2020
2 parents c259743 + 56728fe commit 65c8110
Showing 5 changed files with 160 additions and 20 deletions.
54 changes: 34 additions & 20 deletions doc/Grobid-docker.md
@@ -1,13 +1,13 @@
#GROBID and containers (Docker)
# GROBID and containers (Docker)

Docker is an open-source project that automates the deployment of applications inside software containers.
The documentation on how to install it and start using it can be found [here](https://docs.docker.com/engine/understanding-docker/).
Docker is an open-source project that automates the deployment of applications inside software containers.
The documentation on how to install it and start using it can be found [here](https://docs.docker.com/engine/understanding-docker/).

GROBID can be instantiated and run using Docker. The image information can be found [here](https://hub.docker.com/r/lfoppiano/grobid/).

We assume in the following that docker is installed and working on your system. Note that the default memory available for your container might need to be increased for using all the available Grobid services, in particular on `macos`, see the Troubleshooting section below.
We assume in the following that docker is installed and working on your system. Note that the default memory available for your container might need to be increased for using all the available Grobid services, in particular on `macos`, see the Troubleshooting section below.

The process for fetching and running the image is as follow:
The process for fetching and running the image is as follows:

- Pull the image from docker HUB

@@ -28,36 +28,48 @@ For instance, latest stable version:
```

(alternatively you can also get the image ID)

```bash
> docker images | grep lfoppiano/grobid | grep ${latest_grobid_version}
> docker run -t --rm --init -p 8080:8070 -p 8081:8071 $image_id_from_previous_command
```

- Access the service:
- Access the service:
- open the browser at the address `http://localhost:8080`
- the health check will be accessible at the address `http://localhost:8081`

Grobid web services are then available as described in the [service documentation](https://grobid.readthedocs.io/en/latest/Grobid-service/).

##Troubleshooting
## Configuration using Environment Variables

Properties from the `grobid-home/config/grobid.properties` can be overridden using environment variables.
Given a property key, the corresponding environment variable is the property key converted to upper case and the dot (`.`) replaced by two underscores `__`. (Property keys must be all lower case)

e.g. to configure `grobid.nb_threads` use `GROBID__NB_THREADS`.

###Out of memory or container being killed while processing
```bash
> docker run -t --rm --init -p 8080:8070 -p 8081:8071 \
--env GROBID__NB_THREADS=10 \
lfoppiano/grobid:${latest_grobid_version}
```

This is usually be due to insufficient memory allocated to the docker machine. Depending on the intended usage, we recommend to allocate 4GB of RAM to structure entirely all the PDF content (`/api/processFulltextDocument`), otherwise 2GB are sufficient to extract only header information, and 3GB for citations. In case of more intensice usage and batch parallel processing, allocating 6 or 8GB is recommended.
## Troubleshooting

### Out of memory or container being killed while processing

On `macos`, see for instance [here](https://stackoverflow.com/questions/32834082/how-to-increase-docker-machine-memory-mac/39720010#39720010) on how to increase the RAM from the Docker UI.
This is usually due to insufficient memory allocated to the docker machine. Depending on the intended usage, we recommend allocating 4GB of RAM to fully structure all the PDF content (`/api/processFulltextDocument`); otherwise 2GB are sufficient to extract only header information, and 3GB for citations. For more intensive usage and parallel batch processing, allocating 6 or 8GB is recommended.

On `macos`, see for instance [here](https://stackoverflow.com/questions/32834082/how-to-increase-docker-machine-memory-mac/39720010#39720010) on how to increase the RAM from the Docker UI.

The memory can be verified directly using the docker desktop application or via CLI:

```
```bash
> docker-machine inspect
```

You should see something like:
You should see something like:

```
```json
{
"ConfigVersion": 3,
"Driver": {
@@ -73,7 +85,7 @@ You should see something like:
"VBoxManager": {},
"HostInterfaces": {},
"CPU": 1,
"Memory": 2048, #<---- Memory: 2Gb
"Memory": 2048, #<---- Memory: 2Gb
"DiskSize": 204800,
"NatNicType": "82540EM",
"Boot2DockerURL": "",
@@ -105,17 +117,17 @@ See for instance [here](https://stackoverflow.com/a/36982696) for allocating to

For more information see the [GROBID main page](https://github.com/kermitt2/grobid/blob/master/Readme.md).

###pdfalto zombie processes
### pdfalto zombie processes

When running docker without an init process, the pdfalto processes will hang as zombies, eventually filling up the machine. The docker solution is to use `--init` as a parameter when running the image; however, we are discussing a more long-term solution that would be compatible with Kubernetes, for example.

The solution shipped with the current Dockerfile, using tini (https://github.com/krallin/tini) should provide the correct init process to cleanup
killed processes.
The solution shipped with the current Dockerfile, using [tini](https://github.com/krallin/tini) should provide the correct init process to cleanup
killed processes.

##Building an image
## Building an image

The following part is normally only for development purposes. You can use the official stable docker images from the docker HUB as described above.
However if you are interested in using the master version of Grobid in container, building a new image is the way to go.
However if you are interested in using the master version of Grobid in container, building a new image is the way to go.

The docker build for a particular version (here for example the latest stable version `0.5.6`) will clone the repository using git, so there is no need for custom builds. The only important information is the version, which will be checked out from the tags.

@@ -129,12 +141,14 @@ Similarly, if you want to create a docker image from the current master, develop
> docker build -t grobid/grobid:0.6.0-SNAPSHOT --build-arg GROBID_VERSION=0.6.0-SNAPSHOT .
```

In order to run the container of the newly created image for version `0.5.6`:
In order to run the container of the newly created image for version `0.5.6`:

```bash
> docker run -t --rm --init -p 8080:8070 -p 8081:8071 grobid/grobid:0.5.6
```

For testing or debugging purposes, you can connect to the container with a bash shell (logs are under `/opt/grobid/logs/`):

```bash
> docker exec -i -t {container_name} /bin/bash
```
@@ -0,0 +1,31 @@
package org.grobid.core.utilities;

import java.util.HashMap;
import java.util.Map;

public class EnvironmentVariableProperties {

    private final Map<String, String> properties = new HashMap<>();

    public EnvironmentVariableProperties(String prefix) {
        this(System.getenv(), prefix);
    }

    public EnvironmentVariableProperties(Map<String, String> environmentVariablesMap, String prefix) {
        for (Map.Entry<String, String> entry: environmentVariablesMap.entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                continue;
            }
            String propertiesKey = getPropertiesKeyForEnvironmentVariableName(entry.getKey());
            this.properties.put(propertiesKey, entry.getValue());
        }
    }

    private static String getPropertiesKeyForEnvironmentVariableName(String name) {
        return name.replace("__", ".").toLowerCase();
    }

    public Map<String, String> getProperties() {
        return this.properties;
    }
}
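
The documentation change states the rule from the property side (upper-case the key, replace each dot with two underscores), while `EnvironmentVariableProperties` implements the reverse direction. As a minimal sketch of both directions: the class `EnvNameSketch` and its method names are hypothetical and not part of this commit, and `Locale.ROOT` is an assumption added here to keep case conversion locale-independent, whereas the committed code calls `toLowerCase()` without a locale.

```java
import java.util.Locale;

// Hypothetical helper, not part of the commit: it only illustrates the naming rule.
public class EnvNameSketch {

    // Property key -> environment variable name, as described in the docs:
    // upper-case the key and replace each dot with two underscores.
    public static String toEnvName(String propertyKey) {
        return propertyKey.replace(".", "__").toUpperCase(Locale.ROOT);
    }

    // Environment variable name -> property key, mirroring the conversion in
    // the commit (Locale.ROOT is an addition made for this sketch).
    public static String toPropertyKey(String envName) {
        return envName.replace("__", ".").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toEnvName("grobid.nb_threads"));      // GROBID__NB_THREADS
        System.out.println(toPropertyKey("GROBID__NB_THREADS")); // grobid.nb_threads
    }
}
```

The round trip is lossless only for keys that are entirely lower case, which is presumably why the documentation requires lower-case property keys: a key containing an upper-case letter would come back lower-cased.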
@@ -16,6 +16,7 @@
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.Map;
import java.util.Properties;

/**
@@ -303,6 +304,8 @@ private void init() {
throw new GrobidPropertyException("Cannot open file of grobid properties" + getGrobidPropertiesPath().getAbsolutePath(), exp);
}

getProps().putAll(getEnvironmentVariableOverrides(System.getenv()));

initializePaths();
//checkProperties();
loadPdf2XMLPath();
@@ -336,6 +339,14 @@ public static String getVersion() {
return GROBID_VERSION;
}

protected static Map<String, String> getEnvironmentVariableOverrides(Map<String, String> environmentVariablesMap) {
Map<String, String> properties = new EnvironmentVariableProperties(
environmentVariablesMap, "GROBID__"
).getProperties();
LOGGER.info("environment variables overrides: {}", properties);
return properties;
}

/**
* Initialize the different paths set in the configuration file
* grobid.properties.
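
The `putAll` call added to `init()` runs after the properties file has been loaded, so a value supplied through the environment replaces the value read from the file. A minimal sketch of that ordering, assuming the loaded properties behave like a plain `java.util.Properties`: the class `OverrideOrderSketch`, the method `applyOverrides` and the values `4` and `10` are illustrative and not taken from the commit. Unlike the commit, which mutates the loaded properties in place, the sketch copies into a new object.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

// Hypothetical illustration of override precedence; not part of the commit.
public class OverrideOrderSketch {

    // Copies the file-based properties, then applies the overrides on top,
    // so that an override wins whenever both define the same key.
    public static Properties applyOverrides(Properties fileProps, Map<String, String> overrides) {
        Properties merged = new Properties();
        merged.putAll(fileProps);
        merged.putAll(overrides);
        return merged;
    }

    public static void main(String[] args) {
        Properties fileProps = new Properties();
        fileProps.setProperty("grobid.nb_threads", "4"); // illustrative file value

        // What GROBID__NB_THREADS=10 would become after key conversion.
        Map<String, String> overrides = Collections.singletonMap("grobid.nb_threads", "10");

        Properties merged = applyOverrides(fileProps, overrides);
        System.out.println(merged.getProperty("grobid.nb_threads")); // 10
    }
}
```

Keys that appear only in the file keep their file values, and keys that appear only in the environment are added as new properties.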
@@ -0,0 +1,73 @@
package org.grobid.core.utilities;

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.junit.Test;

import static org.junit.Assert.assertEquals;


public class EnvironmentVariablePropertiesTest {
    private Map<String, String> environmentVariables = new HashMap<>();

    @Test
    public void shouldReturnEmptyPropertiesWithEmptyEnvironmentVariables() {
        assertEquals(
            Collections.emptyMap(),
            new EnvironmentVariableProperties(
                environmentVariables,
                "APP__"
            ).getProperties()
        );
    }

    @Test
    public void shouldReturnEmptyPropertiesWithNotMatchingEnvironmentVariables() {
        environmentVariables.put("OTHER__ABC", "value1");
        assertEquals(
            Collections.emptyMap(),
            new EnvironmentVariableProperties(
                environmentVariables,
                "APP__"
            ).getProperties()
        );
    }

    @Test
    public void shouldReturnAndConvertMatchingEnvironmentVariable() {
        environmentVariables.put("APP__ABC", "value1");
        assertEquals(
            Collections.singletonMap("app.abc", "value1"),
            new EnvironmentVariableProperties(
                environmentVariables,
                "APP__"
            ).getProperties()
        );
    }

    @Test
    public void shouldReturnAndConvertMatchingNestedEnvironmentVariable() {
        environmentVariables.put("APP__ABC__XYZ", "value1");
        assertEquals(
            Collections.singletonMap("app.abc.xyz", "value1"),
            new EnvironmentVariableProperties(
                environmentVariables,
                "APP__"
            ).getProperties()
        );
    }

    @Test
    public void shouldReturnAndConvertMatchingEnvironmentVariableWithUnderscore() {
        environmentVariables.put("APP__ABC_XYZ", "value1");
        assertEquals(
            Collections.singletonMap("app.abc_xyz", "value1"),
            new EnvironmentVariableProperties(
                environmentVariables,
                "APP__"
            ).getProperties()
        );
    }
}
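
The tests above cover the empty, non-matching, nested and single-underscore cases. A few further edge cases follow from how `String.startsWith` and `String.replace` behave. The sketch below re-implements the filtering and conversion so that it is self-contained: the class `EnvOverrideEdgeCases` and the variable names used as input are hypothetical, and `Locale.ROOT` is an addition not present in the commit.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical edge-case illustration; re-implements the commit's logic locally.
public class EnvOverrideEdgeCases {

    // Keeps only variables starting with the prefix and converts their names
    // to property keys ("__" becomes ".", then everything is lower-cased).
    public static Map<String, String> overrides(Map<String, String> env, String prefix) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> entry : env.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                String key = entry.getKey().replace("__", ".").toLowerCase(Locale.ROOT);
                result.put(key, entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("app__abc", "v1");   // ignored: the prefix match is case-sensitive
        env.put("APP__A___B", "v2"); // three underscores: becomes "app.a._b"
        env.put("APP__Xyz", "v3");   // mixed case: becomes "app.xyz"

        Map<String, String> result = overrides(env, "APP__");
        System.out.println(result.get("app.a._b")); // v2
        System.out.println(result.get("app.xyz"));  // v3
        System.out.println(result.size());          // 2
    }
}
```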
@@ -10,6 +10,7 @@

import java.io.File;
import java.io.IOException;
import java.util.Collections;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
@@ -55,6 +56,16 @@ public void testLoadGrobidProperties_PathNoContext_shouldThrowException() throws
GrobidProperties.loadGrobidPropertiesPath();
}

@Test
public void shouldReturnAndConvertMatchingEnvironmentVariable() throws Exception {
assertEquals(
Collections.singletonMap("grobid.abc", "value1"),
GrobidProperties.getEnvironmentVariableOverrides(
Collections.singletonMap("GROBID__ABC", "value1")
)
);
}

@Test
public void testNativeLibraryPath() throws IOException {
// File expectedFile = new File(MockContext.GROBID_HOME_PATH