Based on PredictionIO Classification Engine Template (Scala-based parallelized engine)

Sparkling Water-Deep Learning Engine Template

This engine template integrates Sparkling Water's Deep Learning model by default.

NOTE: This template is compatible with PredictionIO 0.9.2 and does not currently support PredictionIO 0.9.3 or above, because the Sparkling Water library in this template uses Apache Spark 1.2.0 while PredictionIO 0.9.3 uses Apache Spark 1.3.0.

Overview

This engine template demonstrates a prediction engine for circuit end use. It uses Deep Learning from the Sparkling Water library to analyze energy data. Query it with a circuit ID to obtain the predicted end use of that circuit.

Usage

Event Data Requirements

By default, the engine requires the following events to be collected:

  • Circuit ID (with name "id")
  • Mean of all energy data of this circuit (with name "mean")
  • Variance of all energy data of this circuit (with name "variance")
  • (optional) The end usage of this circuit, if available (with name "enduse")

Input Query

  • Circuit ID

Output PredictedResult

  • End Use of the circuit

Dataset Format

There should be two CSV files. The first is the source data file, with the following constraints:

  • Row 0 of the dataset must contain integers representing Circuit IDs.
  • Column 0 of the dataset must contain integers representing Time.
  • All other rows and columns should contain integers or doubles representing Energy data. Empty cells are ignored.

The file data/sample_data.csv is included for reference.
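The layout above can be read into the per-circuit mean and variance that the engine expects. The following is a minimal sketch, not the template's own import code: the function name `circuit_stats` is hypothetical, the top-left cell is assumed to be a placeholder, and population variance is assumed (the template may use a different estimator).

```python
import csv
import statistics


def circuit_stats(lines):
    """Return {circuit_id: (mean, variance)} from source data rows.

    Assumptions (not confirmed by the template): the top-left cell is a
    placeholder and is skipped, and variance means population variance.
    """
    reader = csv.reader(lines)
    header = next(reader)
    # Row 0 holds the circuit IDs; its first cell belongs to the time column.
    circuit_ids = [int(cell) for cell in header[1:]]
    readings = {cid: [] for cid in circuit_ids}
    for row in reader:
        # row[0] is the time value; the other cells line up with circuit_ids.
        for cid, cell in zip(circuit_ids, row[1:]):
            if cell.strip():  # empty cells are ignored
                readings[cid].append(float(cell))
    return {
        cid: (statistics.mean(values), statistics.pvariance(values))
        for cid, values in readings.items()
        if values  # skip circuits with no readings at all
    }
```

Any iterable of CSV lines works as input, so an open file can be passed directly, for example `circuit_stats(open("data/sample_data.csv", newline=""))`.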

The second is the end usage file, with the following constraints:

  • Column 0 of the dataset consists of circuit IDs that exist in the source data file
  • Column 1 of the dataset is the corresponding zip code of the circuit
  • Column 2 of the dataset is the corresponding end use of each circuit

The file data/sample_enduse.csv is included for reference.
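A similar sketch can load the end usage file into a lookup table. The function name `load_enduses` is hypothetical, and whether the file has a header row is not stated here, so the sketch assumes that any row whose first column is not an integer can be skipped.

```python
import csv


def load_enduses(lines):
    """Return {circuit_id: (zip_code, end_use)} from end usage rows.

    Assumption: rows whose first column is not an integer (for example a
    header row, if the file has one) are skipped rather than rejected.
    """
    enduses = {}
    for row in csv.reader(lines):
        if len(row) < 3:
            continue  # not enough columns to be a data row
        try:
            circuit_id = int(row[0])
        except ValueError:
            continue  # non-numeric first column, treated as a header
        # Zip codes are kept as strings so that leading zeros survive.
        enduses[circuit_id] = (row[1].strip(), row[2].strip())
    return enduses
```

Circuit IDs returned here should also appear in row 0 of the source data file, which is easy to check by comparing the dictionary keys against that header.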

1. Run PredictionIO

If PredictionIO is not installed, install it first by following the PredictionIO installation guide.

Start all components (Event Server, Elasticsearch, and HBase).

Note: If pio-start-all is not recognized, upgrade to the latest version of PredictionIO.

$ pio-start-all

Verify the status of components:

$ pio status

2. Download the Engine Template

$ git clone https://github.com/harry5z/template-circuit-classification-sparkling-water

3. Create a new application

$ pio app new [YourAppName]

The console output should include the App Name, App ID, and Access Key. You will need the App ID and Access Key in future steps. You can view your applications by entering pio app list.

4. Import Data to the Event Server

Install the PredictionIO Python SDK:

$ pip install predictionio

or

$ easy_install predictionio

From the root directory of your engine, run:

$ python data/import_data.py --access_key [YourAccessKeyFromStep3] --data [/path/to/your/source/file] --result [/path/to/your/enduse/file]
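For orientation, the upload step performed by an import script of this kind might look roughly like the sketch below. It is not the contents of data/import_data.py: the function name `send_circuit_events`, the `$set` event name, and the `circuit` entity type are assumptions, and only the property names ("id", "mean", "variance", "enduse") come from the event data requirements. The client is passed in so the function works with any object that offers the SDK's `create_event` method.

```python
def send_circuit_events(client, stats, labels):
    """Send one event per circuit and return the number of events sent.

    stats maps circuit ID -> (mean, variance); labels maps circuit ID -> end
    use for the circuits whose end use is known. The "$set" event name and
    "circuit" entity type are illustrative assumptions.
    """
    sent = 0
    for circuit_id, (mean, variance) in stats.items():
        properties = {"id": circuit_id, "mean": mean, "variance": variance}
        if circuit_id in labels:
            # "enduse" is optional and only attached when a label exists.
            properties["enduse"] = labels[circuit_id]
        client.create_event(
            event="$set",
            entity_type="circuit",
            entity_id=str(circuit_id),
            properties=properties,
        )
        sent += 1
    return sent
```

With the SDK installed, a client could be created with `predictionio.EventClient(access_key="[YourAccessKeyFromStep3]", url="http://localhost:7070")`, where 7070 is the Event Server's default port.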

5. Build, Train, and Deploy the Engine

From the root directory of your engine, open engine.json and verify that the appId matches the App ID of your application from Step 3.

  ...
  "datasource": {
    "params": {
      "appId": 1
    }
  },
  ...
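If you prefer to check the value programmatically, a small helper along these lines would do. The function name `engine_app_id` is hypothetical, and the sketch assumes only the `datasource.params.appId` path shown in the fragment above.

```python
import json


def engine_app_id(engine_json_text):
    """Return the appId stored under datasource.params in engine.json text."""
    config = json.loads(engine_json_text)
    return config["datasource"]["params"]["appId"]
```

For example, `engine_app_id(open("engine.json").read())` should equal the App ID printed by `pio app new` in Step 3.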

Build the engine.

$ pio build

Train the engine. This may take several minutes.

$ pio train

Deploy the engine. This may take several minutes.

$ pio deploy

After deploying successfully, you can view the status of your engine at http://localhost:8000.

6. Using the Engine

To run a sample query, run python query.py from the root directory of your engine. Customize the query by modifying the JSON {"circuit": 0} in query.py. The engine returns a JSON object containing the predicted end use of the circuit.
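The same query can also be sent without query.py by posting JSON to the deployed engine's queries.json endpoint. The sketch below uses only the Python standard library; the function name `build_query_request` is hypothetical, and the bundled query.py may use the PredictionIO SDK instead.

```python
import json
import urllib.request


def build_query_request(circuit_id, url="http://localhost:8000/queries.json"):
    """Build a POST request asking the deployed engine about one circuit."""
    body = json.dumps({"circuit": circuit_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Once the engine is deployed, `urllib.request.urlopen(build_query_request(0))` sends the query, and the response body can be decoded with `json.load`.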