ElasticSearch + Facetview complete Docker Stack Orchestration
This repo is DEPRECATED: We now deploy via EEA Rancher catalog templates.
1.1 eeasearch [repo], [docker] - Node.js frontend to an ElasticSearch cluster
- This container listens on port 3000 and provides a read-only API endpoint to the ElasticSearch cluster.
- The rendering is done by using jquery.facetview.js
- The base image has support for automatic sync jobs and for running index management commands
More details on the source repository
1.2 pam [repo] - Node.js frontend to an ElasticSearch cluster
- This container listens on port 3010 and provides a read-only API endpoint to the ElasticSearch cluster.
- The rendering is done by using jquery.facetview.js
- The base image has support for running index management commands
More details on the source repository
1.3 aide [repo] - Node.js frontend to an ElasticSearch cluster
- This container listens on port 3020 and provides a read-only API endpoint to the ElasticSearch cluster.
- The rendering is done by using jquery.facetview.js
- The base image has support for running index management commands
More details on the source repository
1.4 esmaster [repo], [docker] - Elastic master configured node
- This node does nothing besides cluster management, so it has a low chance of going down.
1.5 esclient [repo] [docker] - Elastic HTTP client configured node
- This node is the only one that can accept, parse, scatter and gather HTTP query requests.
- The actual work is performed by the esworkers
1.6 esworker [repo], [docker] - Elastic Data storage nodes
- Two configured nodes for data replication.
- These nodes hold the data and execute the actual queries received from the esclient.
- In addition, these are the only nodes that can run the River process. Thus, if the river process brings down a node (e.g. by consuming too much memory), the other node will still be able to serve the data.
1.7 dataw[1|2] - Data Volume Containers
- Lightweight containers holding the data stored in the workers.
- These containers make the data easy to back up and restore, independent of the esworker container's fate.
1.8 datam - Data Container for Master node
- This container doesn't store any indexed data, but it stores information about the worker nodes, required by the master node
git clone --recursive https://github.com/eea/eea.docker.searchservices
cd eea.docker.searchservices
docker-compose up -d
To see all the commands an elastic app supports, run docker-compose run --rm eeasearch help
Troubleshooting: Data is not indexed? Sometimes, during indexing or even after it finishes, queries on the new index throw an error. Restarting ElasticSearch solves the problem:
# Restarting the elastic workers if the index is not built
docker-compose restart esworker1
docker-compose restart esworker2
Now go to <serverip>:9200/_plugin/head/ to see whether the index is being built.
You can also try increasing ES_HEAP_SIZE for the clients in docker-compose.yml.
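As an alternative to the head plugin, you can check cluster and index state from the command line. This is a sketch assuming esclient's HTTP port 9200 is published on the host (replace <serverip> as above); _cluster/health and _cat/indices are standard ElasticSearch endpoints:

```shell
# Overall cluster health: status should reach yellow or green
curl -s "http://<serverip>:9200/_cluster/health?pretty"

# List indices with their document counts to see whether indexing progresses
curl -s "http://<serverip>:9200/_cat/indices?v"
```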
All ElasticSearch apps create their index at startup if they have no index or no data. You can disable this behavior by adding AUTO_INDEXING=false to the environment section of docker-compose.yml
...
environment:
- AUTO_INDEXING=false
...
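In context, the fragment sits under the service definition; a minimal sketch, assuming the service is named eeasearch as in this repo's docker-compose.yml:

```yaml
eeasearch:
  # ...image, ports and links stay as in the existing service definition...
  environment:
    - AUTO_INDEXING=false
```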
Afterwards, you can run the following steps to index:
# Wait a while for the elastic cluster to get initialized
# Start indexing data
docker-compose run --rm eeasearch create_index
# Check the logs
docker-compose logs
# If the river is not indexing just perform a couple of reindex commands
docker-compose run --rm eeasearch reindex
# Go to this host:3000 to see that data is being harvested
# And the same for PAM
# Start indexing data
docker-compose run --rm pam create_index
# And the same for AIDE
# Start indexing data
docker-compose run --rm aide create_index
# Check the logs
docker-compose logs
# If the river is not indexing just perform a couple of reindex commands
docker-compose run --rm pam reindex
# Go to this host:3010 to see that data is being harvested for pam
# Go to this host:3020 to see that data is being harvested for aide
The data is kept persistent by using two explicit data containers.
The data is mounted in /usr/share/elasticsearch/data
Follow the steps from the "Backup, restore, or migrate data volumes" section
in the Docker documentation
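A sketch of that procedure for this stack, assuming the data container is named dataw1 (see section 1.7) and using the data path mentioned above; verify the container names against your running setup before relying on this:

```shell
# Back up the worker data volume into ./esdata-backup.tar
docker run --rm --volumes-from dataw1 -v "$(pwd)":/backup busybox \
    tar cvf /backup/esdata-backup.tar /usr/share/elasticsearch/data

# Restore the archive into the (possibly recreated) data container
docker run --rm --volumes-from dataw1 -v "$(pwd)":/backup busybox \
    tar xvf /backup/esdata-backup.tar -C /
```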
Change the tags in this repo to match the image version you want to upgrade to. Then, push the changes to this repo. On the host running this compose file, do:
docker-compose stop # stop the running containers
git pull origin master # and get the docker-compose-prod.yml containing the latest tags
# Before this step you should back up the data containers in case the update procedure fails
docker-compose pull # get the images and their tags
docker images | grep eeacms # inspect that the new images have been downloaded
docker-compose rm -vf eeasearch aide pam # remove the old containers before starting
docker-compose up -d --no-recreate # start the running containers
Possible problems
In some cases the containers cannot be stopped because for some reason they have no names. This happens mostly for the elastic containers. Running
docker ps -a
Displays the list of containers; some of them have no names. First, remove these containers with
docker rm --force <container_id>
Then, rebuild the containers with
docker-compose up -d --no-recreate
If you can access esclient from your office, you can reindex the data or force a sync for a given webapp using the commands below.
Assuming that esclient:9200 is available at http://some-staging:80/elasticsearch/
and you have permission to perform PUT, POST and DELETE requests on that endpoint from your office, you can run
this one-liner to reindex the data for a given app:
docker run --rm -e elastic_host=some-staging -e elastic_path=/elasticsearch/ -e elastic_port=80 eeacms/eeasearch reindex
To see a list of all available commands run:
docker run --rm -e elastic_host=some-staging -e elastic_path=/elasticsearch/ -e elastic_port=80 eeacms/eeasearch help
By default, elastic_path is / and elastic_port is 9200, so you can omit them if esclient is accessible on port 9200 at path /.
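For example, if esclient is reachable directly on some-staging at port 9200 and path /, the defaults apply and the reindex one-liner shrinks to:

```shell
docker run --rm -e elastic_host=some-staging eeacms/eeasearch reindex
```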
TL;DR - it won't work with docker-compose scale, because the load is on the worker nodes, which need additional operations to be scaled safely.
By default, ElasticSearch breaks an index into 5 shards (each holding a different part of the data). Each shard has one replica. If we have 4 workers with this setup, the shards could be distributed as follows:
- Node1: Shard 0 Primary, Shard 1 Replica, Shard 3 Primary
- Node2: Shard 0 Replica, Shard 1 Primary, Shard 2 Primary
- Node3: Shard 4 Replica, Shard 3 Replica
- Node4: Shard 4 Primary, Shard 2 Replica
If Node3 and Node4 are scaled down, Shard 4 will be lost and hard to recover.
- Scaling up will not automatically move shards to other nodes in order to better distribute the work.
- Scaling down will not move shards to the remaining nodes to keep availability.
- Running on the same host would increase the number of parallel disk accesses, which can thrash the cache, resulting in poor performance.
- Worker nodes perform most of the work. If something runs slow, there is a high chance that something is taking too long on the workers, not on the client or the master.
Maintaining a more complex ElasticSearch cluster means distributing it over more hosts and performing careful scaling operations so data is not lost. Just don't run docker-compose scale on elastic nodes.
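The arithmetic behind the example above can be sketched as follows (5 shards and 1 replica are the defaults mentioned earlier; the node layout is the hypothetical one from the list):

```shell
# 5 primary shards, each with 1 replica => 10 shard copies to place
SHARDS=5
REPLICAS=1
COPIES=$(( SHARDS * (1 + REPLICAS) ))
echo "shard copies to place: $COPIES"
# In the 4-node layout above, both copies of Shard 4 live on Node3 and
# Node4, so removing those two nodes loses Shard 4 entirely.
```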
The provided docker-compose-prod.yml in this repo is already configured to run within Rancher PaaS.
Make sure you have the appropriate labels on the docker hosts in your Rancher cluster. See docker-compose-prod.yml and look for labels io.rancher.scheduler.affinity:host_label.
Go to your Rancher Web interface and generate your API key (API & Keys for "..." Environment):
$ export RANCHER_URL=<(Endpoint URL)>
$ export RANCHER_ACCESS_KEY=<(ACCESS KEY)>
$ export RANCHER_SECRET_KEY=<(SECRET KEY)>
$ git clone https://github.com/eea/eea.docker.searchservices.git
$ cd eea.docker.searchservices
$ rancher-compose up
The above will automatically create a stack named eea-docker-searchservices and run it. Now look at the exposed Rancher load balancer and configure your DNS/proxy to point to it.
Perform these steps to be able to easily make changes to any of the EEA-maintained parts of this stack.
- bash :)
- python (>= 2)
- git :)
- maven (for building the EEA RDF River plugin) and a Java environment: sudo apt-get install maven
- npm (>= 2.8.4) for building and publishing the base node.js webapp module
- Follow these steps to install the needed versions on a Debian based system [TODO]
- Docker (>= 1.5) and docker-compose (>= 1.3.0)
- Follow these steps to install them [TODO]
- To easily run the commands, add your user to the docker group and re-login for the changes to take effect.
This repository glues together all the components of the stack and also offers a template for a development docker-compose file. Change directory to your home or working folder and clone the project using:
user@host ~/ $ git clone --recursive git@github.com:eea/eea.docker.searchservices.git
Building the elastic containers from source is rarely needed and takes a lot of time, so we have 2 options:
- use the elastic images from dockerhub
- build the images from sources
Run docker-compose -f docker-compose-dev.yml up
to start all services.
Check http://localhost:9200 or http://localhost:9200/_plugin/head/ to see if elastic is up and running. When it's up, you can go to http://localhost:3000, http://localhost:3010 and http://localhost:3020. Then make yourself a coffee; everything works now.
Run docker-compose -f docker-compose-dev-elastic.yml up
to start all services.
Run docker-compose -f docker-compose-dev.yml run --rm eeasearch create_index
to create the index for EEASearch
Run docker-compose -f docker-compose-dev.yml run --rm pam create_index
to create the index for PAM
Run docker-compose -f docker-compose-dev.yml run --rm aide create_index
to create the index for AIDE
Assuming you have tested locally and implemented the needed features, depending on the code you changed, perform the following steps to make the changes available in Docker Registry.
You can also use repo specific docker-compose.yml files if the changes affect only a part of the stack.
Note: make sure that all the applications using this package work with your new changes before publishing anything.
First, you need to publish the new version of the package.
- Open package.json and increment the version
- Commit your changes
- Create and push a new tag
This repository will not automatically build the eeacms/eeasearch (and other apps) Docker images.
- Go to https://registry.hub.docker.com/u/eeacms/eeasearch/ and trigger a build.
- Wait for the build to complete
- Perform these steps to deploy
Note: make sure that all the applications using the river work with your new changes before publishing anything.
First, you need to add a new release of the river.
- Open pom.xml and increment the version
- Run mvn clean install to make a new build
- Commit your changes
- Go to the releases tab
- Click on "Draft a new release"
- Fill in the tag version and release name as the version you added in pom.xml. This is needed because the Dockerfile expects this naming scheme
- Attach eea.elasticsearch.river.rdf/target/releases/eea-rdf-river-plugin-version.zip as a binary release
- Complete the release
This repository will not automatically build the eeacms/elastic Docker images.
- Go to https://registry.hub.docker.com/u/eeacms/elastic/ and trigger a build.
- Current naming scheme for the tags is $ES_VERSION-$RIVER_VERSION
- Wait for the build to complete
- Perform these steps to deploy
Pushing to master will automatically trigger a build with the :latest tag. Make sure that you are building with the correct tags and wait for the builds to complete before performing these steps.
All elastic applications will display in the page footer information about the current index and container, like below:
Application data last refreshed 05 April 2016 12:52 PM. Version info eeacms/pam:v2.7.3 and git tag number v2.8 on 718b1e09d6a0.
- 05 April 2016 12:52 PM - the date when index was updated/rebuilt
- eeacms/pam:v2.7.3 - current image version used; this is an optional value that can be specified in the docker compose file like below:
environment:
- VERSION_INFO=eeacms/pam:v2.7.3
- v2.8 - current git tag number (based on git describe --tags)
- 718b1e09d6a0 - container id (the HOSTNAME environment variable)