Data Exploration for PI System


Projects built around PI System must be treated as data-driven projects.

The functional team is not fully qualified to test the data delivery on its own. It needs help to challenge the data quality, it needs some Continuous Control Monitoring, and it needs to accept and integrate failure.

Process Representation


  1. Install Anaconda. TODO: change this to pure Python.

  2. Run conda create --name dataexploration matplotlib pandas. TODO: change this to pure Python.

  3. A PI System that is up, running, and reachable, with PI Web API.

  4. PI Web API should be configured for basic authentication (username/password).

  5. conf/credentials.yml has not been pushed, for security reasons. Copy the provided conf/credentials.yaml.template, rename the copy to conf/credentials.yml, and fill in the username and password used for basic authentication.


  • PI-Web-API-Client-Python - PI client for Python
  • csv - read CSV files
  • pandas - work with data structures, including missing data
  • numpy - perform calculations over the data

Data Exploration description

This model consists of several steps/scripts.

Get the Data from PI System

This script gets the data from Asset Framework and PI Data Archive. To keep the tests independent of any particular environment and to make the testing rules easy to prepare, we first insert some AFElements and PIPoints, together with appropriate data, and then run our models against them.
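As a rough illustration of the retrieval step, the sketch below builds a basic-authentication request for the recorded values of one stream. It uses only the standard library rather than PI-Web-API-Client-Python, so it is not the project's own code. The host name, the WebId placeholder, the credentials, and the helper name build_recorded_request are all hypothetical; the streams/{webId}/recorded path and the startTime/endTime parameters are assumed to match your PI Web API version.

```python
import base64
import urllib.parse
import urllib.request


def build_recorded_request(base_url, web_id, username, password,
                           start_time="*-1d", end_time="*"):
    """Build (but do not send) a GET request for one stream's recorded values.

    Hypothetical helper: the endpoint layout is an assumption about PI Web API.
    """
    query = urllib.parse.urlencode({"startTime": start_time, "endTime": end_time})
    url = f"{base_url.rstrip('/')}/streams/{web_id}/recorded?{query}"

    # Basic authentication: base64 of "username:password" in the Authorization header.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder values; in practice they would come from conf/credentials.yml.
req = build_recorded_request(
    "https://pi.example.com/piwebapi", "WEBID_PLACEHOLDER", "user", "secret"
)
print(req.full_url)
```

Sending the request (for example with urllib.request.urlopen) is left out on purpose, since it needs a reachable PI Web API server.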

Data Preparation

This script cleans and pre-processes the data from your sensor values. Before doing any other task, you have to format your file with this script. Before executing it, add the header "date time value", separated by spaces, as the top row of your file; otherwise it won't work.
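A minimal sketch of what such a cleaning pass could look like, assuming space-separated rows under a "date time value" header and timestamps in YYYY-MM-DD HH:MM:SS form. The timestamp format and the function name parse_rows are assumptions, not taken from the project's script.

```python
from datetime import datetime


def parse_rows(lines, timestamp_format="%Y-%m-%d %H:%M:%S"):
    """Return sorted (timestamp, value) pairs, dropping malformed rows."""
    rows = []
    for line in lines[1:]:  # the first line is the "date time value" header
        parts = line.split()
        if len(parts) != 3:
            continue  # blank line or wrong number of columns
        date_text, time_text, value_text = parts
        try:
            stamp = datetime.strptime(f"{date_text} {time_text}", timestamp_format)
            value = float(value_text)
        except ValueError:
            continue  # unparseable timestamp or non-numeric value
        rows.append((stamp, value))
    rows.sort(key=lambda row: row[0])
    return rows


sample = [
    "date time value",
    "2020-01-01 00:10:00 2.5",
    "2020-01-01 00:00:00 1.0",
    "2020-01-01 00:20:00 bad",
    "",
]
cleaned = parse_rows(sample)
print(cleaned)
```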

Decision Tree Model

This script uses an unsupervised decision tree model to classify your time-series values.

It generates a file named leak.csv containing the values identified as leaks.

Generate some missing data

Pass a CSV file in "date time value" format. The script identifies the sampling frequency and then writes the missing rows to a CSV file, autofill-output.csv, so that you can fill in the values and merge them.
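One way to detect the frequency and list the gaps is sketched below. It assumes the sampling step is the most common interval between consecutive timestamps, which may differ from how the actual script decides; find_missing_timestamps is a hypothetical name.

```python
from collections import Counter
from datetime import datetime, timedelta


def find_missing_timestamps(stamps):
    """Return (step, missing) where step is the most common interval."""
    ordered = sorted(stamps)
    if len(ordered) < 2:
        return None, []
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    step = Counter(gaps).most_common(1)[0][0]
    if step <= timedelta(0):
        return step, []  # duplicate timestamps only; nothing sensible to generate

    present = set(ordered)
    missing = []
    current = ordered[0] + step
    while current < ordered[-1]:
        if current not in present:
            missing.append(current)
        current += step
    return step, missing


start = datetime(2020, 1, 1, 0, 0)
received = [start + timedelta(minutes=m) for m in (0, 10, 20, 50, 60)]
step, missing = find_missing_timestamps(received)
print(step, missing)
```

The missing timestamps could then be written out with an empty value column for manual filling.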

Merge two files together

This script takes two CSV files in "date time value" format and inserts the filled values into the original file at the required positions.
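The merge can be pictured as follows. This sketch assumes rows are keyed by their "date time" text in an ISO-like format (so that string order equals chronological order) and that an original value wins when both files contain the same timestamp; both points are assumptions, and merge_filled is a hypothetical name.

```python
def merge_filled(original, filled):
    """Merge two lists of (timestamp_text, value) pairs, sorted by timestamp."""
    merged = dict(original)
    for stamp, value in filled:
        # Only add rows the original file does not already contain.
        merged.setdefault(stamp, value)
    return sorted(merged.items())


original_rows = [
    ("2020-01-01 00:00:00", 1.0),
    ("2020-01-01 00:20:00", 3.0),
]
filled_rows = [
    ("2020-01-01 00:10:00", 2.0),
    ("2020-01-01 00:20:00", 99.0),  # conflicts with the original, so it is ignored
]
merged_rows = merge_filled(original_rows, filled_rows)
print(merged_rows)
```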

Statistics on data received

This script generates periodic statistics on the data received. The results are written to CSV files. Statistics implemented:

  • Percentage of data received per hour. A low threshold and a high threshold are then calculated from the data received.
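To make the hourly statistic concrete, here is a small sketch that computes the percentage of expected samples received in each hour. The threshold rule shown (mean plus or minus k population standard deviations) is purely an illustrative assumption, since the rule used by the project is not stated here; the function names are hypothetical as well.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import mean, pstdev


def hourly_received_percentage(stamps, step):
    """Map each hour that has data to the percentage of expected samples received."""
    expected_per_hour = timedelta(hours=1) / step  # timedelta / timedelta gives a float
    counts = Counter(s.replace(minute=0, second=0, microsecond=0) for s in stamps)
    # Hours with no data at all do not appear in the result.
    return {hour: 100.0 * n / expected_per_hour for hour, n in sorted(counts.items())}


def thresholds(percentages, k=1.0):
    """Assumed rule: low/high = mean -/+ k population standard deviations."""
    values = list(percentages.values())
    centre = mean(values)
    spread = pstdev(values)
    return centre - k * spread, centre + k * spread


start = datetime(2020, 1, 1, 0, 0)
# Hour 0 is complete (6 samples at 10-minute steps); hour 1 has only 3 of 6.
stamps = [start + timedelta(minutes=m) for m in (0, 10, 20, 30, 40, 50, 60, 70, 80)]
percentages = hourly_received_percentage(stamps, timedelta(minutes=10))
low, high = thresholds(percentages)
print(percentages, low, high)
```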