Infinispan Flink demo

This folder contains a sample job and scripts that demonstrate how to use org.infinispan.hadoop.InfinispanInputFormat and org.infinispan.hadoop.InfinispanOutputFormat outside Hadoop MapReduce, by running an Apache Flink job against data stored in an Infinispan cache.

Requirements


  • Linux or MacOS X
  • Docker installed and running. Check it with docker --version
  • Samples built: run mvn clean install in the samples/ directory
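
The requirements above can be checked before starting with a small preflight script. This is only a sketch: check_tool is a hypothetical helper, not part of the sample, and it assumes mvn and docker-compose are the executables used for the build and for managing the containers. Since docker --version only confirms that the client is installed, the sketch also calls docker info, which fails when the daemon cannot be reached.

```shell
#!/bin/sh
# Hypothetical helper (not part of the sample): report whether a tool is on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

status=0
for tool in docker docker-compose mvn; do
    check_tool "$tool" || status=1
done

# "docker --version" succeeds even when the daemon is down; "docker info" does not.
if command -v docker >/dev/null 2>&1 && ! docker info >/dev/null 2>&1; then
    echo "docker is installed but the daemon is not reachable"
    status=1
fi

[ "$status" -eq 0 ] || echo "Some requirements are not met."
```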

Note for MacOS users

Add a route so that containers can be reached via their IPs:

sudo route -n add `docker-machine ip default`

Preparing the Infinispan cluster

Run the script ./run-clusters.sh to launch a two-node Infinispan cluster and a two-node Flink cluster.

The Flink admin console can be found at:


Populating the cache

A simple file with 1,000 random phrases can be generated using:

docker exec -it master /usr/local/sample/target/scripts/generate.sh 1000

Inspect it using:

docker exec -it master more /file.txt

Then populate the cache from the command line:

docker exec -it master sh -c "java -cp /usr/local/sample/target/app.jar org.infinispan.hadoop.sample.util.ControllerCache --host ispn-1 --cachename phrases --populate --file /file.txt"

Executing the job

To execute the job org.infinispan.hadoop.flink.sample.WordFrequency, which reads data from the phrases cache and prints a histogram of the number of words per phrase, run:

docker exec -it master sh -c "/usr/local/flink/bin/flink run /usr/local/sample/target/app.jar ispn-1"
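
To sanity-check the job's result without Flink, the same kind of histogram can be computed directly from the input file. The sketch below is not the job's implementation: words_per_phrase_histogram is a hypothetical helper, it assumes the file holds one phrase per line with words separated by whitespace, and its output format is not meant to match what WordFrequency prints.

```shell
#!/bin/sh
# Hypothetical helper (not part of the sample).
# Usage: words_per_phrase_histogram FILE
# Prints one "<words in phrase> <number of phrases>" pair per line,
# sorted by word count. Assumes one phrase per line.
words_per_phrase_histogram() {
    # NF is the number of whitespace-separated fields (words) on the line;
    # empty lines are counted as phrases with 0 words.
    awk '{ counts[NF]++ } END { for (n in counts) print n, counts[n] }' "$1" | sort -n
}
```

For example, after copying the generated file out of the container with docker exec master cat /file.txt > phrases.txt, running words_per_phrase_histogram phrases.txt gives numbers that can be compared against the job's output.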

Changing the job

The master Docker container automatically maps the current folder to /usr/local/sample/ inside the container. To change the job, rebuild the uber jar and re-run the job; the new jar is picked up without restarting the containers.
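
The rebuild-and-rerun cycle can be wrapped in a small script. This is a sketch under assumptions: run and rebuild_and_run are hypothetical helpers, and it assumes that running mvn clean install from the current folder regenerates target/app.jar. The two commands themselves are the ones used earlier in this document.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the sample).
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Rebuild the uber jar, then resubmit the job only if the build succeeded.
# Assumption: the build is started from the folder mapped into the container.
rebuild_and_run() {
    run mvn clean install &&
    run docker exec -it master sh -c \
        "/usr/local/flink/bin/flink run /usr/local/sample/target/app.jar ispn-1"
}
```

Calling rebuild_and_run after editing the job source runs both steps; with DRY_RUN=1 set, it only prints them, which is handy for checking the commands first.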


To stop all the Docker containers created in this sample and remove its network:

docker-compose stop
docker network rm sample