Release GN v4.4.0 (#107)
* GN v4.4.0
* Allow setting the application web context path using the new `WEBAPP_CONTEXT_PATH` variable (defaults to `/geonetwork`)
* Move the exploded war to `/opt/geonetwork`
* Add a new `GN_CONFIG_PROPERTIES` variable for passing additional configuration options to GN in the form of
  Java properties, for example `GN_CONFIG_PROPERTIES=-Dldap.base.provider.url=ldap://ldap:389`
* `ES_HOST` is no longer mandatory and now defaults to `localhost`

* Update
  * Java 11
  * Fix Config properties
  * Use OGC API records 4.2.5 for now
  * Fix OGC API records fail to start because of empty GN database
  * Move config to JVM args
  * Add Jetty http forward module for X-Forwarded header.
  * Move env settings to variables to make configuring multiple instances easier in the future.
  * Jetty / Increase max form keys

This parameter may need customizing depending on the size of metadata records (e.g. many languages or many contacts).
* Jetty config / Allows larger form.
* Java options to be able to return metrics in Java 11.
* Clustering / Add instruction for testing and add a simple load balancer.
* Nginx / Add body size parameter and fix hard coded scheme.
* Move to traefik.
The main idea was to make load balancing with sticky sessions easier to set up.

Co-authored-by: Joachim Nielandt <joachim.nielandt@vlaanderen.be>

* Add health check.
The main idea is to avoid errors in some cases (e.g. OGC API Records failing to start on an empty database) and to start services in order.

Co-authored-by: Joachim Nielandt <joachim.nielandt@vlaanderen.be>

* Readme update for traefik change.
* Monitoring / Load traefik log using filebeat. Removing Apache and Nginx config.
* Update multiple instances limitations.
* Jetty / Update version and fix sending mail on java 11.

Co-authored-by: Joachim Nielandt <joachim.nielandt@vlaanderen.be>

* Add timezone config and use a separate schemapublication dir (avoids an issue
when starting multiple instances using the same data dir: XSDs are
copied to this folder on startup and the copies may clash)

---------

Co-authored-by: Francois Prunayre <fx.prunayre@gmail.com>
Co-authored-by: Joachim Nielandt <joachim.nielandt@vlaanderen.be>
Co-authored-by: joachimnielandt <joachim.nielandt@gmail.com>
4 people committed Oct 25, 2023
1 parent 3dd8a6f commit 317a9e7
Showing 18 changed files with 1,187 additions and 0 deletions.
45 changes: 45 additions & 0 deletions 4.4.0/Dockerfile
@@ -0,0 +1,45 @@
FROM jetty:9-jdk11

ENV DATA_DIR /catalogue-data
ENV WEBAPP_CONTEXT_PATH /geonetwork
ENV GN_CONFIG_PROPERTIES -Dgeonetwork.dir=${DATA_DIR} \
-Dgeonetwork.formatter.dir=${DATA_DIR}/data/formatter \
-Dgeonetwork.schema.dir=/opt/geonetwork/WEB-INF/data/config/schema_plugins \
-Dgeonetwork.indexConfig.dir=/opt/geonetwork/WEB-INF/data/config/index


ENV JAVA_OPTS -Djava.security.egd=file:/dev/./urandom -Djava.awt.headless=true \
-Xms512M -Xss512M -Xmx2G -XX:+UseConcMarkSweepGC

USER root
RUN apt-get -y update && \
    apt-get -y install --no-install-recommends \
        curl \
        unzip && \
    rm -rf /var/lib/apt/lists/* && \
    mkdir -p ${DATA_DIR} && \
    chown -R jetty:jetty ${DATA_DIR} && \
    mkdir -p /opt/geonetwork && \
    chown -R jetty:jetty /opt/geonetwork

USER jetty
ENV GN_FILE geonetwork.war
ENV GN_VERSION 4.4.0
ENV GN_DOWNLOAD_MD5 36638cfd380942801ff2038792ee54a9

RUN cd /opt/geonetwork/ && \
    curl -fSL -o geonetwork.war \
        https://sourceforge.net/projects/geonetwork/files/GeoNetwork_opensource/v${GN_VERSION}/${GN_FILE}/download && \
    echo "${GN_DOWNLOAD_MD5} *geonetwork.war" | md5sum -c && \
    unzip -q geonetwork.war && \
    rm geonetwork.war

COPY jetty/geonetwork_context_template.xml /usr/local/share/geonetwork/geonetwork_context_template.xml
COPY ./docker-entrypoint.sh /geonetwork-entrypoint.sh

RUN java -jar /usr/local/jetty/start.jar --create-startd --add-module=http-forwarded

ENTRYPOINT ["/geonetwork-entrypoint.sh"]
CMD ["java","-jar","/usr/local/jetty/start.jar"]

VOLUME [ "${DATA_DIR}" ]
54 changes: 54 additions & 0 deletions 4.4.0/Dockerfile.local
@@ -0,0 +1,54 @@
FROM jetty:9-jdk11 as base

USER root
RUN apt-get update && apt-get install -y --no-install-recommends unzip \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && mkdir -p /opt/geonetwork \
    && chown -R jetty:jetty /opt/geonetwork

COPY geonetwork.war /tmp

USER jetty
RUN unzip /tmp/geonetwork.war -d /opt/geonetwork



FROM jetty:9-jdk11 as final

ENV GN_FILE geonetwork.war
ENV GN_VERSION 4.4.0

ENV DATA_DIR /catalogue-data
ENV WEBAPP_CONTEXT_PATH /geonetwork


# This variable can be used to define additional config options as Java system properties
# (e.g. "-Des.protocol=http -Des.port=9200 -Des.index.records=geo-records")
ENV GN_CONFIG_PROPERTIES -Dgeonetwork.dir=${DATA_DIR} \
-Dgeonetwork.formatter.dir=${DATA_DIR}/data/formatter \
-Dgeonetwork.schema.dir=/opt/geonetwork/WEB-INF/data/config/schema_plugins \
-Dgeonetwork.indexConfig.dir=/opt/geonetwork/WEB-INF/data/config/index

# JAVA_OPTS can be used to configure JVM-specific options, like max memory, debugger port and method...
ENV JAVA_OPTS -Djava.security.egd=file:/dev/./urandom -Djava.awt.headless=true \
-Xms512M -Xss512M -Xmx2G -XX:+UseConcMarkSweepGC

USER root
RUN apt-get update && apt-get install -y --no-install-recommends unzip \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
RUN mkdir -p /catalogue-data \
    && chown -R jetty:jetty /catalogue-data

USER jetty

COPY jetty/geonetwork_context_template.xml /usr/local/share/geonetwork/geonetwork_context_template.xml
COPY --from=base /opt/geonetwork /opt/geonetwork

COPY ./docker-entrypoint.sh /geonetwork-entrypoint.sh

RUN java -jar /usr/local/jetty/start.jar --create-startd --add-to-start=http-forwarded

ENTRYPOINT ["/geonetwork-entrypoint.sh"]
CMD ["java","-jar","/usr/local/jetty/start.jar"]
244 changes: 244 additions & 0 deletions 4.4.0/README.md
@@ -0,0 +1,244 @@
# Version 4.4.0

## Running with integrated Elasticsearch

1. Clone this repository

```bash
git clone https://github.com/geonetwork/docker-geonetwork.git
cd docker-geonetwork/4.4.0
```

2. Run the docker-composition from the current directory:

```bash
docker-compose up
```

3. Open http://geonetwork.localhost/geonetwork/ in a browser

## Build docker image

If not published, you can build the image locally using:

```bash
docker build . -t geonetwork:4.4.0
```
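The `Dockerfile` checks the downloaded WAR with `md5sum -c` against `GN_DOWNLOAD_MD5`. When adapting it to another release, a small helper can compute the value to store in `GN_DOWNLOAD_MD5` and reproduce the same check locally. This is a sketch that assumes GNU coreutils `md5sum`; the function names `war_md5` and `check_war` are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print only the MD5 digest of a file,
# i.e. the value expected in GN_DOWNLOAD_MD5.
war_md5() {
  md5sum "$1" | cut -d ' ' -f 1
}

# Hypothetical helper: verify a file against an expected digest using the
# same "<md5> *<file>" line format as the Dockerfile's RUN instruction.
# Returns non-zero on mismatch.
check_war() {
  local expected="$1" war="$2"
  echo "${expected} *${war}" | md5sum -c --quiet -
}
```

For example, `war_md5 geonetwork.war` prints the digest to paste into `GN_DOWNLOAD_MD5`, and `check_war "<digest>" geonetwork.war` fails the same way the image build would on a corrupted download.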

## Running with custom geonetwork.war

This directory includes two Dockerfiles:

* `Dockerfile` is the canonical one used to generate the official Docker Hub
image. It downloads the GeoNetwork 4.4.0 WAR file from SourceForge.
* `Dockerfile.local` needs a `geonetwork.war` file next to it to build
the image.

It also includes two docker-compose configuration files:

* `docker-compose.yml` uses the official GeoNetwork image from Docker Hub.

* `docker-compose.dev.yml` can be applied to override the image used in
`docker-compose.yml` and build the GeoNetwork image using `Dockerfile.local`.

### Pre-built image

To use the pre-built image you can use the `docker-compose.yml` file provided
in this directory:

```bash
docker-compose up
```

### Local image

To generate an Elasticsearch-ready Docker image, you will have to:

1. Build your geonetwork.war (https://geonetwork-opensource.org/manuals/trunk/en/maintainer-guide/installing/installing-from-source-code.html#the-quick-way)

2. Clone this repository

```bash
git clone https://github.com/geonetwork/docker-geonetwork.git
cd docker-geonetwork/4.4.0
```

3. Copy the generated webapp into the current directory and name it `geonetwork.war`

```shell
cp ../../core-geonetwork/web/target/geonetwork.war .
```

4. Run the docker-composition from the current directory:

```bash
docker-compose -f docker-compose.yml -f docker-compose.dev.yml up --build
```

5. Open http://geonetwork.localhost/geonetwork/ in a browser

## Running with a custom Database

See "Connecting to a postgres database" in <https://hub.docker.com/_/geonetwork>.

```bash
docker run --name geonetwork -d -p 8080:8080 \
-e GEONETWORK_DB_TYPE=postgres \
-e GEONETWORK_DB_HOST=my-db-host \
-e GEONETWORK_DB_PORT=5434 \
-e GEONETWORK_DB_USERNAME=postgres \
-e GEONETWORK_DB_PASSWORD=mysecretpassword \
-e GEONETWORK_DB_NAME=mydbname \
geonetwork:4.4.0
```

## Running with remote Elasticsearch

```bash
docker run --name geonetwork -d -p 8080:8080 \
-e "GN_CONFIG_PROPERTIES=-Des.host=elasticsearch \
-Des.protocol=http \
-Des.port=9200 \
-Des.url=http://elasticsearch:9200 \
-Dgeonetwork.ESFeaturesProxy.targetUri=http://elasticsearch:9200/gn-features/{_} " \
geonetwork:4.4.0
```
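Docker's `docker run -e` replaces the image's `GN_CONFIG_PROPERTIES` value as a whole, so the defaults set in the `Dockerfile` (`geonetwork.dir`, `geonetwork.formatter.dir`, `geonetwork.schema.dir` and `geonetwork.indexConfig.dir`) are dropped unless they are passed again, assuming the entrypoint does not re-add them. The following sketch assembles the full value in a Bash array, keeping those defaults next to the remote Elasticsearch settings. `ES_REMOTE_HOST` and `ES_REMOTE_PORT` are hypothetical variable names, and the command is printed rather than executed.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical inputs describing the remote Elasticsearch service.
ES_REMOTE_HOST="${ES_REMOTE_HOST:-elasticsearch}"
ES_REMOTE_PORT="${ES_REMOTE_PORT:-9200}"
ES_REMOTE_URL="http://${ES_REMOTE_HOST}:${ES_REMOTE_PORT}"

# The first four entries repeat the image defaults from the Dockerfile;
# the others are the remote Elasticsearch options from the example above.
DATA_DIR=/catalogue-data
props=(
  "-Dgeonetwork.dir=${DATA_DIR}"
  "-Dgeonetwork.formatter.dir=${DATA_DIR}/data/formatter"
  "-Dgeonetwork.schema.dir=/opt/geonetwork/WEB-INF/data/config/schema_plugins"
  "-Dgeonetwork.indexConfig.dir=/opt/geonetwork/WEB-INF/data/config/index"
  "-Des.host=${ES_REMOTE_HOST}"
  "-Des.protocol=http"
  "-Des.port=${ES_REMOTE_PORT}"
  "-Des.url=${ES_REMOTE_URL}"
  "-Dgeonetwork.ESFeaturesProxy.targetUri=${ES_REMOTE_URL}/gn-features/{_}"
)
# Join the array elements with single spaces into one value.
GN_CONFIG_PROPERTIES="${props[*]}"

cmd=(docker run --name geonetwork -d -p 8080:8080
     -e "GN_CONFIG_PROPERTIES=${GN_CONFIG_PROPERTIES}"
     geonetwork:4.4.0)

# Print the shell-quoted command; run "${cmd[@]}" instead to start the container.
printf '%q ' "${cmd[@]}"
echo
```

Keeping each option as a separate array element makes it easy to add or drop one without breaking the backslash continuations of a single long string.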

If you get errors connecting to the remote Elasticsearch, check the configuration in `config/elasticsearch.yml`:

```yaml
network.host: my-elasticsearch-host
discovery.seed_hosts: []
```

## Running with custom Elasticsearch index names

Add the following options to `GN_CONFIG_PROPERTIES`:

```bash
-Des.index.records=geo-records
-Des.index.features=geo-features
-Des.index.searchlogs=geo-searchlogs
-Dgeonetwork.ESFeaturesProxy.targetUri=http://elasticsearch:9200/geo-features/{_}
```
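The four options above follow one naming scheme, and the `ESFeaturesProxy` target must point at the same features index as `es.index.features`. As a sketch, a hypothetical `es_index_props` Bash function can derive all of them from a single prefix so the two cannot drift apart; the prefix and URL arguments are illustrative.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the index-name options for a prefix and an
# Elasticsearch URL. Usage: es_index_props <prefix> <es-url>
es_index_props() {
  local prefix="$1" es_url="$2"
  # The proxy target reuses the same "<prefix>-features" index name.
  local props=(
    "-Des.index.records=${prefix}-records"
    "-Des.index.features=${prefix}-features"
    "-Des.index.searchlogs=${prefix}-searchlogs"
    "-Dgeonetwork.ESFeaturesProxy.targetUri=${es_url}/${prefix}-features/{_}"
  )
  echo "${props[*]}"
}
```

With `es_index_props geo http://elasticsearch:9200`, it prints the same four options as the example above, on one line ready to append to `GN_CONFIG_PROPERTIES`.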

## Running with remote Elasticsearch with authentication

Add the `-Des.username=esUserName -Des.password=esPassword` options to `GN_CONFIG_PROPERTIES`.

If you use WFS features harvesting, also add the
`-Dgeonetwork.ESFeaturesProxy.username=esReadOnlyUsername -Dgeonetwork.ESFeaturesProxy.password=esPassword` options to `GN_CONFIG_PROPERTIES`.
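To keep credentials out of versioned compose files, they can be read from the environment when building `GN_CONFIG_PROPERTIES`. This sketch uses hypothetical `ES_USER`/`ES_PASS` (read-write) and `ES_RO_USER`/`ES_RO_PASS` (read-only, for WFS features harvesting) variables and emits only the options whose user name is set.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the Elasticsearch credential options, taking values
# from the hypothetical ES_USER/ES_PASS and ES_RO_USER/ES_RO_PASS variables.
es_auth_props() {
  local props=()
  if [[ -n "${ES_USER:-}" ]]; then
    props+=("-Des.username=${ES_USER}"
            "-Des.password=${ES_PASS:?ES_PASS must be set with ES_USER}")
  fi
  if [[ -n "${ES_RO_USER:-}" ]]; then
    props+=("-Dgeonetwork.ESFeaturesProxy.username=${ES_RO_USER}"
            "-Dgeonetwork.ESFeaturesProxy.password=${ES_RO_PASS:?ES_RO_PASS must be set with ES_RO_USER}")
  fi
  # "${props[*]-}" expands to an empty string when no option was added.
  echo "${props[*]-}"
}
```

This only keeps the secrets out of files: the resulting `GN_CONFIG_PROPERTIES` value is still visible through `docker inspect`, and likely also on the JVM command line inside the container.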

## Running with remote Kibana

Add the `-Dgeonetwork.HttpDashboardProxy.targetUri=http://kibana:5601` option to `GN_CONFIG_PROPERTIES`.

## Running with remote OGC API Records

Add the `-Dgeonetwork.MicroServicesProxy.targetUri=http://ogc-api-records-service:8080` option to `GN_CONFIG_PROPERTIES`.

## Running with custom security mode

Add the `-Dgeonetwork.security.type=` option to `GN_CONFIG_PROPERTIES` to set the authentication mode. See the available security modes in <https://github.com/geonetwork/core-geonetwork/blob/main/web/src/main/webapp/WEB-INF/config-security/config-security.xml#L43-L64> and the configuration options in <https://github.com/geonetwork/core-geonetwork/blob/main/web/src/main/webapp/WEB-INF/config-security/config-security.properties>. See also <https://geonetwork-opensource.org/manuals/4.0.x/en/administrator-guide/managing-users-and-groups/authentication-mode.html>.

For example, an LDAP configuration:

```bash
-Dgeonetwork.security.type=ldap
-Dldap.host=ldap
-Dldap.port=389
-Dldap.base=dc=geonetwork-opensource,dc=org
-Dldap.base.dn=dc=geonetwork-opensource,dc=org
-Dldap.security.principal=cn=admin,dc=geonetwork-opensource,dc=org
-Dldap.security.credentials=secret
-Dldap.base.search.base=ou=directory
-Dldap.sync.user.search.base=ou=directory
-Dldap.base.dn.pattern=uid={0},ou=directory
```
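Most of the LDAP values above repeat the same base DN and user OU. A sketch of a hypothetical `ldap_props` Bash function that builds them from a host, port, base DN and OU, reading the bind password from a hypothetical `LDAP_ADMIN_PASSWORD` variable; like the example, it assumes the bind account is `cn=admin` directly under the base DN.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the LDAP options for GN_CONFIG_PROPERTIES.
# Usage: ldap_props <host> <port> <base-dn> <user-ou>
# The bind password comes from the hypothetical LDAP_ADMIN_PASSWORD variable.
ldap_props() {
  local host="$1" port="$2" base_dn="$3" ou="$4"
  local props=(
    "-Dgeonetwork.security.type=ldap"
    "-Dldap.host=${host}"
    "-Dldap.port=${port}"
    "-Dldap.base=${base_dn}"
    "-Dldap.base.dn=${base_dn}"
    "-Dldap.security.principal=cn=admin,${base_dn}"
    "-Dldap.security.credentials=${LDAP_ADMIN_PASSWORD:?LDAP_ADMIN_PASSWORD must be set}"
    "-Dldap.base.search.base=ou=${ou}"
    "-Dldap.sync.user.search.base=ou=${ou}"
    "-Dldap.base.dn.pattern=uid={0},ou=${ou}"
  )
  echo "${props[*]}"
}
```

Called as `ldap_props ldap 389 dc=geonetwork-opensource,dc=org directory` with `LDAP_ADMIN_PASSWORD=secret`, it prints the ten options of the example above on one line.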

For example, a CAS configuration:

```bash
-Dcas.baseURL=http://localhost:8080/cas
-Dcas.login.url=http://localhost:8080/cas/login
-Dcas.ticket.validator.url=http://cas:8080/cas
-Dgeonetwork.https.url=http://localhost:8080/geonetwork
```

## Running with a custom context path

To run the application in a custom context path, for example in <http://geonetwork.localhost/catalogue> instead of the default <http://geonetwork.localhost/geonetwork> use the `WEBAPP_CONTEXT_PATH` environment variable:

```yaml
environment:
  WEBAPP_CONTEXT_PATH: /catalogue
```

## Configure the default language

To configure the default application language and bypass browser language detection when redirecting from the base URL, add the following options to `GN_CONFIG_PROPERTIES`:

```bash
-Dlanguage.default=fre
-Dlanguage.forceDefault=true
```

## Running behind a proxy

If the catalogue needs to use a proxy for outgoing HTTP calls, set the following Java system properties (for example in `GN_CONFIG_PROPERTIES`):

```bash
-Dhttp.proxyHost=<proxyAddress>
-Dhttp.proxyPort=<proxyPort>
-Dhttps.proxyHost=<proxyAddress>
-Dhttps.proxyPort=<proxyPort>
-Dhttp.nonProxyHosts=<nonProxyHosts>
-Dhttp.proxyUser=<proxyUser>
-Dhttp.proxyPassword=<proxyPassword>
```
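When the host already describes its proxy in the common `HTTP_PROXY`/`NO_PROXY` form, the Java properties above can be derived from it. This sketch (the `proxy_props` name is hypothetical) assumes a URL of the form `http://host:port` with an explicit port and no credentials, and converts the comma-separated exclusion list into the `|`-separated list that `http.nonProxyHosts` expects.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: convert a proxy URL and a comma-separated no-proxy list
# into Java proxy options. Usage: proxy_props <http://host:port> [<host1,host2,...>]
proxy_props() {
  local proxy_url="$1" no_proxy="${2:-}"
  local hostport="${proxy_url#*://}"   # drop the scheme
  hostport="${hostport%%/*}"           # drop any trailing path
  if [[ "$hostport" != *:* ]]; then
    echo "proxy URL must include a port: ${proxy_url}" >&2
    return 1
  fi
  local host="${hostport%:*}" port="${hostport##*:}"
  local props=(
    "-Dhttp.proxyHost=${host}" "-Dhttp.proxyPort=${port}"
    "-Dhttps.proxyHost=${host}" "-Dhttps.proxyPort=${port}"
  )
  if [[ -n "$no_proxy" ]]; then
    # Java separates non-proxy hosts with '|' instead of ','.
    props+=("-Dhttp.nonProxyHosts=${no_proxy//,/|}")
  fi
  echo "${props[*]}"
}
```

Wildcard conventions differ between the two forms (`.example.org` in many `NO_PROXY` lists versus `*.example.org` for Java), so such entries may still need manual adjustment.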

## Clustering (experimental)

The clustering mode allows starting more than one GeoNetwork instance.
To enable it, use the `scaled` profile. In this mode:

* only one node is in charge of the harvester scheduler and processes the scheduled harvesting tasks
* any node can take a harvesting task manually triggered from the harvesting console
* the web server is configured with sticky sessions (i.e. a user stays on the same node)

First, start the main composition which will start all services (including the main node). Then start new instances with:

```bash
docker-compose --profile scaled up --scale geonetwork-replica=2 -d
```

Known limitations:

* Harvester / Scheduler needs to be refreshed when the database harvester configuration is modified
  (the harvesting node refreshes the schedule every 2 minutes as a stopgap solution)
* Harvester / Replica can't access the main node harvester log files
* Harvester / Running state is not visible on other nodes
* Settings / When saving application settings, some modules need to be updated:
  * log level configuration,
  * DOI configuration,
  * proxy configuration (use Java system properties instead of database configuration)
* Thesaurus / Local thesauri modified on one node are not updated on the others.

## Monitoring

A composition is also available for monitoring metrics and logs for the webserver and the database.

First, start the composition without the monitoring containers.
In Kibana, go to `Manage space` and create a `catalogue-monitor` space.
This space will be populated with default dashboards by Metricbeat and Filebeat.

Once the space is created, use the following command to start Metricbeat and Filebeat:

```bash
docker-compose -f docker-compose.yml -f docker-compose.monitoring.yml up --build
```

Metricbeat and Filebeat need to authenticate to push dashboards into Kibana (GeoNetwork checks access). If needed,
adapt the password in their configuration files (`setup.kibana.username` and `setup.kibana.password`).

Once started, sample dashboards analyzing the GeoNetwork API usage are available in `catalogue-log-dashboard.ndjson`.

![Dashboard](catalogue-log-dashboard.png)