PredictionIo: Multi Label Text Category Classifier Template
Java Scala Shell
Hari Charan Ayada
Latest commit 9aad11d Dec 19, 2016

## Multi Label Text Category Classifier Template

This engine template is an almost-complete implementation of an engine meant to be used with PredictionIO.

This multi-label classification engine template integrates the LingPipe algorithm by default.

Quick Start

Check the prerequisites below before setup; they will inform the choices you make.

  1. Install the PredictionIO framework, and be sure to choose HBase and Elasticsearch for storage. This template requires Elasticsearch.
  2. Make sure the PIO console and services are running; check with pio status.
  3. [Install this template](git pull TEMPLATEFOLDERLOCATION)


This engine template uses the LingPipe library (DynamicLMClassifier) to classify text based on training data.

read more at:

LingPipe Licensing :

Import Sample Data

  1. Choose a new app name and set it as appName in engine.json.
  2. Run pio app new **your-new-app-name**
  3. Import sample events by running pio import --appid **your-app-id** --input **your-eventfile.json**, where the app ID can be retrieved with pio app list.
  4. The engine.json file in the root directory of your new template is set up for the sample data you just imported; create a new one for your own data. Edit this file and change the appName parameter to match the app name you chose in step 2.
  5. Run pio build, pio train, and pio deploy, in that order.
  6. To execute a sample query, run curl -H "Content-Type: application/json" -d '{"text": "this is a good one", "locale":"EN"}' http://localhost:8000/queries.json
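
The curl query in the last step can also be issued from a script. The following is a minimal Python sketch using only the standard library; the helper names `build_query_request` and `send_query` are hypothetical, and the URL assumes the engine is deployed on localhost port 8000 as in the curl example.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust if the engine is deployed elsewhere.
QUERY_URL = "http://localhost:8000/queries.json"


def build_query_request(text, locale, url=QUERY_URL):
    """Build a POST request whose body is the JSON query {"text", "locale"}."""
    body = json.dumps({"text": text, "locale": locale}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_query(text, locale, url=QUERY_URL):
    """Send the query to a deployed engine and return the decoded JSON reply."""
    with urllib.request.urlopen(build_query_request(text, locale, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `send_query("this is a good one", "EN")` only works once pio deploy has completed and the engine is listening; building the request separately makes it possible to inspect the payload without a running server.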


Event Data Requirements

Be sure to read:

Events Training Data

    1. eventTime: String
    2. entityId: GUID
    3. event: String
    4. properties
        * Query: String
        * Category: String
        * Locale: String
    5. entityType: String

{"eventTime":"2016-03-02T09:52:49+0000","entityId": "5ec59686-84fe-4fe0-b343-27794f6a2645","event":"Autocategory","properties":{"Query":"Apple","Category":"Fruit","Locale":"en"},"entityType":"content"}
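
A training file for pio import can be generated rather than written by hand. The sketch below is one way to do it in Python, assuming the event name "Autocategory" and entity type "content" from the sample event above, and assuming the import file holds one JSON event per line. The function names `make_training_event` and `write_events` are hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone


def make_training_event(query, category, locale, event_time=None):
    """Build one training event shaped like the sample event above."""
    when = event_time or datetime.now(timezone.utc)
    return {
        # Timezone-aware datetimes format %z as e.g. +0000, matching the sample.
        "eventTime": when.strftime("%Y-%m-%dT%H:%M:%S%z"),
        "entityId": str(uuid.uuid4()),  # fresh GUID per event
        "event": "Autocategory",
        "properties": {"Query": query, "Category": category, "Locale": locale},
        "entityType": "content",
    }


def write_events(events, path):
    """Write events to a file, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for event in events:
            handle.write(json.dumps(event) + "\n")
```

For example, `write_events([make_training_event("Apple", "Fruit", "en")], "events.json")` would produce a one-line file that could be passed as the --input argument.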

StopWords data

    1. eventTime: String
    2. entityId: GUID
    3. event: String
    4. properties
        * word: String
        * Locale: String
    5. entityType: String
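
No sample stop word event is given here, so the following Python sketch is an assumption-heavy illustration of what one might look like. It assumes the event name "stopwords", the entity type "resource", and the property key "word" from the default engine.json values. The locale key appears as "Locale" in the list above but as "locale" in those defaults, so the hypothetical helper `make_stopword_event` takes it as a parameter instead of guessing.

```python
import uuid
from datetime import datetime, timezone


def make_stopword_event(word, locale, locale_field="locale", event_time=None):
    """Build one stop word event; event and entity names are assumed defaults."""
    when = event_time or datetime.now(timezone.utc)
    return {
        "eventTime": when.strftime("%Y-%m-%dT%H:%M:%S%z"),
        "entityId": str(uuid.uuid4()),  # fresh GUID per event
        "event": "stopwords",            # assumed: stopwordeventType default
        "properties": {"word": word, locale_field: locale},
        "entityType": "resource",        # assumed: stopwordentityType default
    }
```

Whichever key is used for the locale should match the stopwordlocaleField value configured in engine.json.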


Input Query

* text: String
* locale: String

**Example**: {"text": "Apple", "locale":"EN"}

Output: A List of PredictedResult

* Category: String
* Score: String

**Example**: {"items":[{"Category":"Fruit","Score":"0.845175688166216"},{"Category":"Tree","Score":"0.7922164503364719"}]}
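
Because Score is returned as a String, a client has to convert it before ranking the predicted categories. The Python sketch below shows one way to do that, assuming the reply is a JSON object with an "items" list whose entries carry "Category" and "Score" keys, as in the example output. The function name `ranked_categories` is hypothetical.

```python
import json


def ranked_categories(response_text):
    """Return (category, score) pairs sorted by descending numeric score."""
    items = json.loads(response_text).get("items", [])
    # Scores arrive as strings, so convert them to floats before sorting.
    pairs = [(item["Category"], float(item["Score"])) for item in items]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


sample = (
    '{"items":[{"Category":"Tree","Score":"0.7922164503364719"},'
    '{"Category":"Fruit","Score":"0.845175688166216"}]}'
)
top_category, top_score = ranked_categories(sample)[0]
```

Sorting on the converted float avoids the pitfalls of comparing scores as text, where for instance "9e-05" would sort above "0.84".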


The engine.json file allows the user to describe and set parameters that control the engine's operation. Many values have defaults, so the following can be seen as the minimum. Start with these defaults, then add tunings or new event types and item property fields as you become more familiar.

Simple Default Values

{
  "id": "default",
  "description": "Default settings",
  "engineFactory": "io.prediction.lingpipe.AutoCatEngine",
  "datasource": {
    "params": {
      "appName": "Classify",
      "appId": 1,
      "entityType": "content",
      "eventType": "Autocategory",
      "categoryField": "Category",
      "queryField": "Query",
      "localeField": "Locale",
      "stopwordentityType": "resource",
      "stopwordeventType": "stopwords",
      "stopwordField": "word",
      "stopwordlocaleField": "locale"
    }
  },
  "algorithms": [
    {
      "name": "algo",
      "params": {
        "modelbuildfilepath": "/opt/Templates/BuildModels/",
        "modelbuildfilename": "Lingpipe_Model"
      }
    }
  ]
}
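
Changing appName, as the import steps require, can be scripted instead of edited by hand. The Python sketch below assumes engine.json is valid JSON with appName nested under datasource.params, as the default values suggest. The function names `set_app_name` and `update_engine_json` are hypothetical, and whether appId must be changed as well is not stated here, so the sketch leaves it untouched.

```python
import json


def set_app_name(config, app_name):
    """Return a copy of the engine config with datasource.params.appName replaced."""
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    updated["datasource"]["params"]["appName"] = app_name
    return updated


def update_engine_json(path, app_name):
    """Rewrite the engine.json file at `path` with the new app name."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(set_app_name(config, app_name), handle, indent=2)
        handle.write("\n")
```

A call such as `update_engine_json("engine.json", "your-new-app-name")` would then be run before pio build.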