Demonstration of MapR for Industrial IoT
Latest commit b86e955 Feb 26, 2019

MapR for Predictive Maintenance

This project is intended to show how to build Predictive Maintenance applications on MapR. Predictive Maintenance applications place high demands on data streaming, time-series data storage, and machine learning. Therefore, this project focuses on data ingest with MapR Streams, time-series data storage with MapR-DB and OpenTSDB, and feature engineering with MapR-DB and Apache Spark.


Predictive Maintenance requires a cutting-edge data platform to handle fast streams of IoT data, with the processing required for on-the-fly feature engineering and the flexibility required for data science and machine learning.

Ingesting Factory IoT Data

Predictive Maintenance applications rely heavily on ingesting multiple data sources, each with their own format and throughput. MapR Streams can ingest data, regardless of format or speed, with standard Kafka and RESTful APIs.

Machine Learning on Factory IoT Data

The "predictive" aspects of Predictive Maintenance applications are usually realized through machine learning. Feature engineering is often considered the most important aspect of machine learning (as opposed to neural network design, for example). Feature engineering places high demands on the data layer because of the amount of data that IoT data streams generate. Because failures tend to occur infrequently and without warning, vast amounts of raw time-series data must be stored. Not only must the data be stored, but it must also be possible to retroactively update the lagging features needed to label failures for supervised machine learning. MapR-DB and Spark can work together to provide the capabilities required to put machine learning into practice for predictive maintenance.

In summary:

  • MapR Streams provides a convenient way to ingest IoT data because it is scalable and offers standard interfaces.
  • The integration of MapR-DB with Spark provides a convenient way to label lagging features needed for predicting failures via supervised machine learning.
  • Drill provides a convenient way to load ML data sets into Tensorflow for unsupervised and supervised machine learning.

Implementation Summary

There are two objectives relating to predictive maintenance implemented in this project. The first objective is to visualize time-series data in an interactive real-time dashboard in Grafana. The second objective is to make raw data streams and derived features available to machine learning frameworks, such as Tensorflow, in order to develop algorithms for anomaly detection and predictive maintenance. These two objectives are realized using two separate data flows:

  1. The first flow, located on the top half of the image below, is intended to persist IoT data and label training data for sequence prediction and anomaly detection of time-series data in Tensorflow.

  2. The second flow, located on the bottom half, is intended to persist time-series IoT data in OpenTSDB for visualization in a Grafana dashboard.

Put together, these data pipelines look like this. The APIs used for reading and writing data are shown in red.

Preliminary Steps

These steps explain how to set up this tutorial using the MapR Container for Developers on MacOS.

Allocate 12GB to Docker

This tutorial requires a lot of memory. We recommend allocating 12GB RAM, 4GB swap, and 2 CPUs to the Docker Community Edition for MacOS.

Start the MapR sandbox

Download and run the ./ script.

git clone
cd mapr-db-60-getting-started

Run the script

Run the script to install Spark, OpenTSDB, Grafana, and some other things necessary to use the sample applications in this tutorial. SSH to the sandbox container with password "mapr", then run the following commands. This should take about 20 minutes.

ssh -p 2222 root@localhost
chmod 700 ./
sudo ./

Import the Grafana dashboard

sudo /opt/mapr/server/ -R -OT `hostname -f`
sudo /opt/mapr/opentsdb/opentsdb-2.4.0/etc/init.d/opentsdb start

Open Grafana data sources, with a URL like http://maprdemo:3000/datasources/edit/1, and add OpenTSDB as a new data source.

Load the Grafana/IoT_dashboard.json file using Grafana's dashboard import functionality, and specify "MaprMonitoringOpenTSDB" as the data source, as shown below:

Predictive Maintenance Demo Procedure

For learning or debugging purposes, you should run each of the following steps manually. If you just want to see data show up in Grafana, run ./

Step 1 - Simulate HVAC data stream:

We have provided a dataset captured from a real-world heating, ventilation, and air conditioning (HVAC) system in the sample_dataset folder. Run the following command to replay that HVAC stream to /apps/factory:mqtt.

cat ~/predictive-maintenance/sample_dataset/mqtt.json | while read line; do echo $line | sed 's/{/{"timestamp":"'$(date +%s)'",/g' | /opt/mapr/kafka/kafka-*/bin/ --topic /apps/factory:mqtt --broker-list; sleep 1; done
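
The sed expression in that pipeline injects a "timestamp" field at the front of each JSON record before it is published. As a rough illustration, the same transformation might be sketched in Python as follows. The function name `add_timestamp` and the sample field `OutsideAirTemp` are hypothetical, and publishing to the stream is deliberately left out.

```python
import json
import time


def add_timestamp(line, now=None):
    """Return the JSON record in `line` with a leading "timestamp" field."""
    record = json.loads(line)
    epoch = int(time.time()) if now is None else int(now)
    # The shell pipeline writes epoch seconds as a quoted string,
    # so this sketch keeps the value a string as well.
    stamped = {"timestamp": str(epoch)}
    # Assumption: the source records do not already carry a "timestamp" key;
    # if they did, the original value would win here.
    stamped.update(record)
    return json.dumps(stamped)
```

Calling `add_timestamp('{"OutsideAirTemp": 21.5}')` returns the same record with the current epoch time as its first field. Passing `now` explicitly makes the function easy to test.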

Step 2 - Save IoT data stream to MapR-DB:

In the next step we'll save the IoT data stream to OpenTSDB so we can visualize it in Grafana, but in this step we save that stream to MapR-DB so we can apply labels necessary for supervised machine learning, as discussed in Step 4.

Run the following command to persist messages from stream /apps/factory:mqtt to MapR-DB table /apps/mqtt_records.

/opt/mapr/spark/spark-*/bin/spark-submit --class com.mapr.examples.MqttConsumer ~/predictive-maintenance/target/predictive-maintenance-1.0-jar-with-dependencies.jar /apps/factory:mqtt /apps/mqtt_records

Run this command to see how the row count increases:

/opt/mapr/drill/drill-*/bin/sqlline -u jdbc:drill: -n mapr
    select count(*) from dfs.`/apps/mqtt_records`;

Step 3 - Save IoT data stream to OpenTSDB:

In this step we save the IoT data stream to OpenTSDB so we can visualize it in Grafana.

Update localhost:4242 with the hostname and port of your OpenTSDB server before running the following command:

/opt/mapr/kafka/kafka-*/bin/ --new-consumer --topic /apps/factory:mqtt --bootstrap-server not.applicable:0000 | while read line; do echo $line | jq -r "to_entries | map(\"\(.key) \(.value | tostring)\") | {t: .[0], x: .[]} | .[]" | paste -d ' ' - - | awk '{system("curl -X POST --data \x27{\"metric\": \""$3"\", \"timestamp\": "$2", \"value\": "$4", \"tags\": {\"host\": \"localhost\"}}\x27 http://localhost:4242/api/put")}'; echo -n "."; done
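
The jq and awk stages above turn each JSON record into one OpenTSDB data point per field. The sketch below shows that reshaping step in Python, without the stream consumer or the HTTP call. The function name `to_opentsdb_points` and the sample field names are hypothetical, and skipping non-numeric fields is an assumption of this sketch rather than something the shell pipeline does.

```python
import json


def to_opentsdb_points(line, host="localhost"):
    """Convert one JSON record into a list of OpenTSDB data points."""
    record = json.loads(line)
    timestamp = int(record["timestamp"])
    points = []
    for name, value in record.items():
        if name == "timestamp":
            continue  # used as the time axis, not stored as a metric here
        try:
            numeric = float(value)
        except (TypeError, ValueError):
            continue  # assumption: silently skip values that are not numbers
        points.append({
            "metric": name,
            "timestamp": timestamp,
            "value": numeric,
            "tags": {"host": host},
        })
    return points
```

The resulting list can be serialized with `json.dumps` and sent to the `/api/put` endpoint in a single POST, which avoids starting one curl process per metric.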

After you have run that command you should be able to visualize the streaming IoT data in Grafana. Depending on where you have installed Grafana, this can be opened with a URL like http://maprdemo:3000:

Step 4 - Update lagging features in MapR-DB for each failure event:

This process will listen for failure events on a MapR Streams topic and retroactively label lagging features in MapR-DB when failures occur, as well as render the failure event in Grafana. Update "http://localhost:3000" with the hostname and port for your Grafana instance.

/opt/mapr/spark/spark-*/bin/spark-submit --class com.mapr.examples.UpdateLaggingFeatures ~/predictive-maintenance/target/predictive-maintenance-1.0-jar-with-dependencies.jar /apps/factory:failures /apps/mqtt_records http://localhost:3000

This step probably says the most about the value of MapR. Consider a factory instrumented with IoT devices that report hundreds of metrics per machine, per second. You are tasked with saving all that data until one day, often months later, a machine finally fails. At that point, you have to go back and retroactively label all those records as "about to fail" or "x days to failure" so that you can use the data to train models that predict those lagging features. That's one heck of a DB update, right? The only way to store all that data is with a distributed database. This is what makes Spark and MapR-DB such a great fit: Spark, the distributed processing engine for big data, and MapR-DB, the distributed data store for big data, work together to process and store lots of data with speed and scalability.
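
To make the idea of retroactive labeling concrete, here is a minimal in-memory sketch in Python. It is not the Spark job from this repo. The function name `label_lagging_features`, the 30-second "about to fail" window, and the choice to express remaining useful life in seconds are all assumptions made for illustration. The field names follow the `_Chiller1AboutToFail` and `_Chiller2RemainingUsefulLife` pattern used in the queries in this tutorial.

```python
def label_lagging_features(records, failure_ts, device="Chiller1", window_s=30):
    """Label records that precede a failure of `device` at `failure_ts`.

    `records` is a list of dicts with a "timestamp" field (epoch seconds).
    Returns the number of records that were updated in place.
    """
    about_to_fail = "_%sAboutToFail" % device
    remaining = "_%sRemainingUsefulLife" % device
    updated = 0
    for record in records:
        ts = int(record["timestamp"])
        if ts > failure_ts:
            continue  # recorded after the failure; leave untouched
        seconds_left = failure_ts - ts
        record[remaining] = seconds_left
        # Stored as the strings "true"/"false" to match the queries in this tutorial.
        record[about_to_fail] = "true" if seconds_left <= window_s else "false"
        updated += 1
    return updated
```

On a real cluster the same update has to touch every stored record that precedes the failure, which is why the work is handed to a distributed engine instead of a single process.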

Step 5 - Simulate a failure event:

To simulate a device failure, run this command:

echo "{\"timestamp\":"$(date +%s -d '60 sec ago')",\"deviceName\":\"Chiller1\"}" | /opt/mapr/kafka/kafka-*/bin/ --topic /apps/factory:failures --broker-list
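
The quoting in that echo command is easy to get wrong, and the `-d '60 sec ago'` option is specific to GNU date. If you prefer to build the event in a script, a sketch like the following produces an equivalent JSON message. The function name `failure_event` is hypothetical, and sending the message to the stream is left out.

```python
import json
import time


def failure_event(device_name, seconds_ago=60, now=None):
    """Build a failure event with a timestamp `seconds_ago` in the past."""
    current = time.time() if now is None else now
    return json.dumps({
        "timestamp": int(current) - seconds_ago,  # epoch seconds, as a number
        "deviceName": device_name,
    })
```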

This will trigger the Spark process you ran in the previous step to update lagging features in the MapR-DB table "/apps/mqtt_records". Once it sees the event you simulated, you should see it output information about the lagging features it labeled, like this:

Update Lagging Features screenshot

Step 6 - Validate that lagging features have been updated:

$ mapr dbshell
find /apps/mqtt_records --where '{ "$eq" : {"_Chiller1AboutToFail":"true"} }' --f _id,_Chiller1AboutToFail,timestamp

Here are a few example commands for looking at that table with mapr dbshell:

$ mapr dbshell
find /apps/mqtt_records --f timestamp
find --table /apps/mqtt_records --orderby _id --fields _Chiller2RemainingUsefulLife,_id
find --table /apps/mqtt_records --where '{"$gt" : {"_id" : "1523079964"}}' --orderby _id --fields _Chiller2RemainingUsefulLife,_id
find --table /apps/mqtt_records --where '{"$gt" : {"timestamp" : "1523079964"}}' --fields _Chiller2RemainingUsefulLife,timestamp

Here's an example of querying IoT data records table with Drill:

/opt/mapr/drill/drill-*/bin/sqlline -u jdbc:drill: -n mapr
    select * from dfs.`/apps/mqtt_records` limit 2;

Step 7 - Synthesize a high speed data stream:

This command simulates a high-speed data stream from a vibration sensor that is sampled once every 10 ms.

java -cp ~/predictive-maintenance/target/predictive-maintenance-1.0-jar-with-dependencies.jar com.mapr.examples.HighSpeedProducer /apps/fastdata:vibrations 10

Why are we simulating a vibration sensor?

Degradation in machines often manifests itself as a low rumble or a small shake. These unusual vibrations give you the first clue that a machine is nearing the end of its useful life, so it's very important to detect those anomalies. Vibration sensors measure the displacement or velocity of motion thousands of times per second. Analyzing those signals is typically done in the frequency domain. An algorithm called the fast Fourier transform (FFT) can take sampled time-series vibration data and identify its component frequencies. In the next step you will run a command that converts the simulated vibration data to the frequency domain with an FFT and raises alarms when vibration frequencies vary by more than a predefined threshold.
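
For readers who have not used an FFT before, the following pure-Python sketch shows how a block of samples is converted to the frequency domain and how the dominant frequency is read off. It is a textbook recursive radix-2 FFT, not the code used by this project. The 12.5 Hz test signal and the names `fft`, `dominant_frequency`, and `simulated_signal` are assumptions chosen for illustration.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; the length must be a power of two."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_frequency(samples, sample_rate_hz):
    """Return the frequency (Hz) of the strongest non-DC component."""
    spectrum = fft(samples)
    half = len(samples) // 2
    # Skip bin 0 (the DC offset) and the mirrored upper half of the spectrum.
    magnitudes = [abs(c) for c in spectrum[1:half]]
    peak_bin = 1 + magnitudes.index(max(magnitudes))
    return peak_bin * sample_rate_hz / len(samples)


# Simulated sensor: a 12.5 Hz vibration sampled every 10 ms (100 Hz).
RATE_HZ = 100
simulated_signal = [math.sin(2 * math.pi * 12.5 * i / RATE_HZ) for i in range(128)]
```

Calling `dominant_frequency(simulated_signal, RATE_HZ)` recovers the 12.5 Hz component. In practice a library routine such as NumPy's FFT would be used instead of a hand-written one.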

vibration analysis

This demonstrates the capacity of MapR to ingest and process high speed streaming data. Depending on hardware, you will probably see MapR Streams processing more than 40,000 messages per second in this step.

Step 8 - Process high speed data stream:

This will calculate FFTs on the fly for the high-speed streaming data, and render an event in Grafana when the FFTs change by more than 25% over a rolling window. This simulates anomaly detection for a vibration signal. Update "http://localhost:3000" with the hostname and port for your Grafana instance.

/opt/mapr/spark/spark-*/bin/spark-submit --class com.mapr.examples.StreamingFourierTransform ~/predictive-maintenance/target/predictive-maintenance-1.0-jar-with-dependencies.jar /apps/fastdata:vibrations 25.0 http://localhost:3000
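
The exact comparison performed by StreamingFourierTransform is not described here, so the following Python sketch shows just one plausible way to flag a spectrum that changes by more than a percentage threshold relative to a rolling window. The class name `SpectrumChangeDetector`, the window size, and the choice of summed absolute difference as the change measure are all assumptions.

```python
from collections import deque


class SpectrumChangeDetector:
    """Flags a spectrum that deviates from a rolling baseline."""

    def __init__(self, threshold_pct=25.0, window=5):
        self.threshold_pct = threshold_pct
        self.history = deque(maxlen=window)  # most recent spectra only

    def update(self, magnitudes):
        """Return True if `magnitudes` differs from the rolling mean
        spectrum by more than the threshold percentage."""
        anomalous = False
        if self.history:
            count = len(self.history)
            # Per-bin mean over the spectra currently in the window.
            baseline = [sum(bins) / count for bins in zip(*self.history)]
            total = sum(baseline)
            if total > 0:
                diff = sum(abs(m - b) for m, b in zip(magnitudes, baseline))
                anomalous = 100.0 * diff / total > self.threshold_pct
        self.history.append(list(magnitudes))
        return anomalous
```

Each call to `update` takes the FFT magnitudes of the latest block of samples. Note that this sketch also adds anomalous spectra to the window, so a persistent change eventually becomes the new baseline.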

Step 9 - Visualize data in Grafana

By now you should be able to see streaming IoT data, vibration faults, and device failures in the Grafana dashboard.

Step 10 - Explore IoT data with Apache Drill

Apache Drill is a distributed SQL query engine that unifies access to data across a variety of data formats and sources. MapR is the primary contributor to Apache Drill, so naturally it is included in the MapR platform. Here's how you can use it to explore some of the IoT data we've gathered so far in this tutorial.

Open the Drill web interface. If you're running MapR on your laptop then that's probably at http://localhost:8047. Here are a couple of useful queries:

Show how many messages have arrived cumulatively over days of the week:

SELECT _day_of_week_long, count(_day_of_week_long) FROM dfs.`/apps/mqtt_records` group by _day_of_week_long;

Count how many faults have been detected:

WITH x AS (SELECT _id, _day_of_week_long, _Chiller1AboutToFail, ROW_NUMBER() OVER (PARTITION BY _Chiller1AboutToFail ORDER BY _id) as fault FROM dfs.`/apps/mqtt_records`)
SELECT * from x WHERE _Chiller1AboutToFail = 'true' and fault = 1;

Drill can also be used to load data from MapR-DB into data science notebooks. Examples of this are shown in the following section.
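
Outside of the notebooks, one lightweight way to pull Drill results into Python is Drill's REST API, which accepts a SQL query as a JSON POST to `/query.json`. The sketch below uses only the standard library. The function names are hypothetical, the base URL assumes Drill is reachable at localhost:8047, and it assumes no authentication is required, which may not hold on a secured cluster.

```python
import json
import urllib.request


def build_drill_request(sql, base_url="http://localhost:8047"):
    """Build (but do not send) a POST request for Drill's REST query endpoint."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_drill_query(sql, base_url="http://localhost:8047"):
    """Send the query and return the result rows as a list of dicts."""
    with urllib.request.urlopen(build_drill_request(sql, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))["rows"]
```

For example, `run_drill_query("select * from dfs.`/apps/mqtt_records` limit 2")` would return the same two records as the sqlline example earlier in this tutorial, provided the Drill service is running.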

References for Machine Learning techniques for Predictive Maintenance

This tutorial focuses on data engineering - i.e. getting data in the right format and in the right place in order to take advantage of machine learning (ML) for predictive maintenance applications. The details of ML are beyond the scope of this tutorial but to better understand ML techniques commonly used for predictive maintenance, check out the provided Jupyter notebook for LSTM predictions for About To Fail. This notebook not only talks about how to use LSTM but also how to generate a sample dataset with logsynth that resembles what you might see in a real factory. This notebook is great because it explains how to experiment with LSTMs entirely on your laptop.


Here are some of the other notebooks included in this repo that demonstrate ML concepts for predictive maintenance:

  • LSTM predictions for Remaining Useful Life shows how to train a point regression model using an LSTM neural network in Keras to predict the point in time when an airplane engine will fail.
  • LSTM time series predictions from OpenTSDB shows how to train a model to predict the next value in a sequence of numbers, which in this case, is a time series sequence of numbers stored in OpenTSDB. This notebook also shows how to load training data into Tensorflow using the REST API for OpenTSDB.
  • RNN time series predictions from OpenTSDB also shows how to train a model to predict the next value in a sequence of numbers, except it uses an RNN model instead of an LSTM.
  • RNN predictions for MapR-DB data via Drill also shows how to train a model to predict the next value in a sequence of numbers, except it reads time-series data from MapR-DB using Drill as a SQL engine.
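
Several of these notebooks read a time series from OpenTSDB and reshape it into training examples for next-value prediction. The following sketch outlines that preparation step with the standard library only. It is not taken from the notebooks: the function names, the metric name `OutsideAirTemp`, and the default base URL are assumptions, and `fetch_series` expects the query to return at least one series.

```python
import json
import urllib.parse
import urllib.request


def opentsdb_query_url(metric, start="1h-ago", aggregator="sum",
                       base_url="http://localhost:4242"):
    """Build a URL for OpenTSDB's /api/query endpoint."""
    params = urllib.parse.urlencode(
        {"start": start, "m": "%s:%s" % (aggregator, metric)})
    return "%s/api/query?%s" % (base_url, params)


def fetch_series(url):
    """Fetch the first series and return its values ordered by timestamp."""
    with urllib.request.urlopen(url) as resp:
        dps = json.loads(resp.read().decode("utf-8"))[0]["dps"]
    return [dps[ts] for ts in sorted(dps, key=int)]


def sliding_windows(values, size):
    """Split a series into (inputs, next_value) pairs for sequence prediction."""
    return [(values[i:i + size], values[i + size])
            for i in range(len(values) - size)]
```

A typical use would be `sliding_windows(fetch_series(opentsdb_query_url("OutsideAirTemp")), 10)`, which yields pairs of ten consecutive readings and the reading that follows them, ready to be converted to arrays for an RNN or LSTM.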

StreamSets Demonstration

To give you an idea of what dataflow management tools do, I’ve prepared a simple StreamSets project that you can run on a laptop with Docker. This project demonstrates a pipeline that streams time-series data recorded from an industrial HVAC system into OpenTSDB for visualization in Grafana.

Create a docker network to bridge containers:

docker network create mynetwork

Start StreamSets, OpenTSDB, and Grafana:

docker run -it -p 18630:18630 -d --name sdc --network mynetwork \
docker run -dp 4242:4242 --name hbase --network mynetwork \
docker run -d -p 3000:3000 --name grafana --network mynetwork \

Open Grafana at http://localhost:3000 and log in with admin / admin

Add http://hbase:4242 as an OpenTSDB datasource to Grafana. If you don’t know how to add a data source, refer to Grafana docs. Your datasource definition should look like this:

Download the following Grafana dashboard file:


Import that file into Grafana. If you don’t know how to import a dashboard, see Grafana docs. The Grafana import dialog should look like this:

Download, unzip, and copy the MQTT dataset to the StreamSets container:

gunzip mqtt.json.gz
docker cp mqtt.json sdc:/tmp/mqtt.json

Open StreamSets at http://localhost:18630 and log in with admin / admin

Download and import the following pipeline into StreamSets. If you don’t know how to import a pipeline, refer to StreamSets docs.


You will see a warning about a missing library in the “Parse MQTT JSON” stage. Click that stage and follow the instructions to install the Jython library.

Finally, run the StreamSets pipeline.

Hopefully, by setting up this pipeline and exploring StreamSets you’ll get the gist of what dataflow management tools can do.
