
Import

vruusmann committed May 5, 2016
0 parents commit 5e773090cecb1e81efffe9fb346d81799615ebd6
Showing with 12,567 additions and 0 deletions.
  1. +661 −0 LICENSE.txt
  2. +122 −0 README.md
  3. +222 −0 pom.xml
  4. +41 −0 src/main/java/org/jpmml/sparkml/CategoricalFeature.java
  5. +28 −0 src/main/java/org/jpmml/sparkml/ContinuousFeature.java
  6. +40 −0 src/main/java/org/jpmml/sparkml/ConverterUtil.java
  7. +40 −0 src/main/java/org/jpmml/sparkml/DecisionTreeClassificationModelConverter.java
  8. +38 −0 src/main/java/org/jpmml/sparkml/DecisionTreeRegressionModelConverter.java
  9. +40 −0 src/main/java/org/jpmml/sparkml/Feature.java
  10. +198 −0 src/main/java/org/jpmml/sparkml/FeatureSchema.java
  11. +118 −0 src/main/java/org/jpmml/sparkml/FeatureSchemaUtil.java
  12. +82 −0 src/main/java/org/jpmml/sparkml/GBTRegressionModelConverter.java
  13. +44 −0 src/main/java/org/jpmml/sparkml/LinearRegressionModelConverter.java
  14. +58 −0 src/main/java/org/jpmml/sparkml/LogisticRegressionModelConverter.java
  15. +126 −0 src/main/java/org/jpmml/sparkml/Main.java
  16. +33 −0 src/main/java/org/jpmml/sparkml/ModelConverter.java
  17. +71 −0 src/main/java/org/jpmml/sparkml/PipelineModelUtil.java
  18. +33 −0 src/main/java/org/jpmml/sparkml/PredictionModelConverter.java
  19. +52 −0 src/main/java/org/jpmml/sparkml/RandomForestClassificationModelConverter.java
  20. +51 −0 src/main/java/org/jpmml/sparkml/RandomForestRegressionModelConverter.java
  21. +76 −0 src/main/java/org/jpmml/sparkml/RegressionModelUtil.java
  22. +40 −0 src/main/java/org/jpmml/sparkml/TransformerConverter.java
  23. +236 −0 src/main/java/org/jpmml/sparkml/TreeModelUtil.java
  24. +31 −0 src/test/java/org/jpmml/sparkml/ClassificationTest.java
  25. +42 −0 src/test/java/org/jpmml/sparkml/ConverterTest.java
  26. +26 −0 src/test/java/org/jpmml/sparkml/RegressionTest.java
  27. +1,900 −0 src/test/resources/csv/Audit.csv
  28. +393 −0 src/test/resources/csv/Auto.csv
  29. +1,900 −0 src/test/resources/csv/DecisionTreeAudit.csv
  30. +393 −0 src/test/resources/csv/DecisionTreeAuto.csv
  31. +151 −0 src/test/resources/csv/DecisionTreeIris.csv
  32. +393 −0 src/test/resources/csv/GBTAuto.csv
  33. +151 −0 src/test/resources/csv/Iris.csv
  34. +393 −0 src/test/resources/csv/LinearRegressionAuto.csv
  35. +1,900 −0 src/test/resources/csv/LogisticRegressionAudit.csv
  36. +1,900 −0 src/test/resources/csv/RandomForestAudit.csv
  37. +393 −0 src/test/resources/csv/RandomForestAuto.csv
  38. +151 −0 src/test/resources/csv/RandomForestIris.csv
  39. BIN src/test/resources/ser/DecisionTreeAudit.ser
  40. BIN src/test/resources/ser/DecisionTreeAuto.ser
  41. BIN src/test/resources/ser/DecisionTreeIris.ser
  42. BIN src/test/resources/ser/GBTAuto.ser
  43. BIN src/test/resources/ser/LinearRegressionAuto.ser
  44. BIN src/test/resources/ser/LogisticRegressionAudit.ser
  45. BIN src/test/resources/ser/RandomForestAudit.ser
  46. BIN src/test/resources/ser/RandomForestAuto.ser
  47. BIN src/test/resources/ser/RandomForestIris.ser

122 README.md
@@ -0,0 +1,122 @@
JPMML-SparkML
=============
Java library and command-line application for converting Spark ML pipelines to PMML.
# Features #
* Supported Transformer types:
  * Feature transformers:
    * [`feature.OneHotEncoder`](https://spark.apache.org/docs/latest/api/java/org/apache/spark/ml/feature/OneHotEncoder.html)
    * [`feature.VectorAssembler`](https://spark.apache.org/docs/latest/api/java/org/apache/spark/ml/feature/VectorAssembler.html)
  * Fitted feature transformers:
    * [`feature.StringIndexerModel`](https://spark.apache.org/docs/latest/api/java/org/apache/spark/ml/feature/StringIndexerModel.html)
  * Prediction models:
    * [`classification.DecisionTreeClassificationModel`](https://spark.apache.org/docs/latest/api/java/org/apache/spark/ml/classification/DecisionTreeClassificationModel.html)
    * [`classification.LogisticRegressionModel`](https://spark.apache.org/docs/latest/api/java/org/apache/spark/ml/classification/LogisticRegressionModel.html)
    * [`classification.RandomForestClassificationModel`](https://spark.apache.org/docs/latest/api/java/org/apache/spark/ml/classification/RandomForestClassificationModel.html)
    * [`regression.DecisionTreeRegressionModel`](https://spark.apache.org/docs/latest/api/java/org/apache/spark/ml/regression/DecisionTreeRegressionModel.html)
    * [`regression.GBTRegressionModel`](https://spark.apache.org/docs/latest/api/java/org/apache/spark/ml/regression/GBTRegressionModel.html)
    * [`regression.LinearRegressionModel`](https://spark.apache.org/docs/latest/api/java/org/apache/spark/ml/regression/LinearRegressionModel.html)
    * [`regression.RandomForestRegressionModel`](https://spark.apache.org/docs/latest/api/java/org/apache/spark/ml/regression/RandomForestRegressionModel.html)
* Production quality:
  * Complete test coverage.
  * Fully compliant with the [JPMML-Evaluator](https://github.com/jpmml/jpmml-evaluator) library.
# Prerequisites #
* Apache Spark version 1.6.0 or newer.
# Installation #
Enter the project root directory and build using [Apache Maven](http://maven.apache.org/):
```
mvn clean install
```
The build produces two JAR files:
* `target/jpmml-sparkml-1.0-SNAPSHOT.jar` - Library JAR file.
* `target/converter-executable-1.0-SNAPSHOT.jar` - Example application JAR file.
# Usage #
## Library ##
Adding the JPMML-SparkML dependency to the project:
```xml
<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>jpmml-sparkml</artifactId>
	<version>1.0-SNAPSHOT</version>
</dependency>
```
Fitting a Spark ML pipeline that only makes use of supported Transformer types:
```java
DataFrame irisData = ...;

StringIndexerModel speciesIndexer = new StringIndexer()
	.setInputCol("Species")
	.setOutputCol("speciesIndex")
	.fit(irisData);

VectorAssembler vectorAssembler = new VectorAssembler()
	.setInputCols(new String[]{"Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width"})
	.setOutputCol("featureVector");

DecisionTreeClassifier classifier = new DecisionTreeClassifier()
	.setLabelCol(speciesIndexer.getOutputCol())
	.setFeaturesCol(vectorAssembler.getOutputCol());

IndexToString labelConverter = new IndexToString()
	.setInputCol(classifier.getPredictionCol())
	.setOutputCol("predictedSpecies")
	.setLabels(speciesIndexer.labels());

Pipeline pipeline = new Pipeline()
	.setStages(new PipelineStage[]{speciesIndexer, vectorAssembler, classifier, labelConverter});

PipelineModel pipelineModel = pipeline.fit(irisData);
```
Converting the Spark ML pipeline to PMML using the `org.jpmml.sparkml.PipelineModelUtil#toPMML(PipelineModel)` utility method:
```java
PMML pmml = PipelineModelUtil.toPMML(pipelineModel);
// Viewing the result
JAXBUtil.marshalPMML(pmml, new StreamResult(System.out));
```
Saving the Spark ML pipeline in Java serialization data format to a file `pipeline.ser` for conversion with the example application:
```java
try(OutputStream os = new FileOutputStream("pipeline.ser")){

	try(ObjectOutputStream oos = new ObjectOutputStream(os)){
		oos.writeObject(pipelineModel);
	}
}
```
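Checking that a serialization file can be read back before handing it to the example application. The following is a minimal sketch, not part of JPMML-SparkML: the class `SerializationRoundTrip` and its `write`/`read` methods are hypothetical names, and a plain `ArrayList` stands in for the `PipelineModel` so that the example runs without Apache Spark on the classpath:
```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical helper class; the name and methods are illustrative only.
public class SerializationRoundTrip {

	// Writes any Serializable object to a file in Java serialization data format.
	public static void write(Serializable object, File file) throws IOException {

		try(OutputStream os = new FileOutputStream(file)){

			try(ObjectOutputStream oos = new ObjectOutputStream(os)){
				oos.writeObject(object);
			}
		}
	}

	// Reads the object back; the caller casts it to the expected type.
	public static Object read(File file) throws IOException, ClassNotFoundException {

		try(InputStream is = new FileInputStream(file)){

			try(ObjectInputStream ois = new ObjectInputStream(is)){
				return ois.readObject();
			}
		}
	}

	public static void main(String[] args) throws Exception {
		// Stand-in for a PipelineModel (assumption): any Serializable object works here.
		ArrayList<String> standIn = new ArrayList<String>(Arrays.asList("speciesIndexer", "vectorAssembler", "classifier"));

		File file = File.createTempFile("pipeline", ".ser");
		file.deleteOnExit();

		write(standIn, file);

		Object restored = read(file);

		// Reports whether the restored object equals the original one.
		System.out.println("Round trip preserved the object: " + standIn.equals(restored));
	}
}
```
With a real pipeline, the result of `read` would be cast to `org.apache.spark.ml.PipelineModel`; this assumes that the Apache Spark classes are on the classpath at read time.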
## Example application ##
The example application JAR file contains an executable class `org.jpmml.sparkml.Main`, which can be used to convert serialized `org.apache.spark.ml.PipelineModel` objects to PMML.
The example application JAR file does not include Apache Spark runtime libraries. Therefore, this executable class must be executed using Apache Spark's `spark-submit` helper script.
Converting the Spark ML pipeline serialization file `pipeline.ser` to a PMML file `pipeline.pmml`:
```
spark-submit --master local[1] --class org.jpmml.sparkml.Main target/converter-executable-1.0-SNAPSHOT.jar --ser-input pipeline.ser --pmml-output pipeline.pmml
```
Getting help:
```
spark-submit --master local[1] --class org.jpmml.sparkml.Main target/converter-executable-1.0-SNAPSHOT.jar --help
```
# License #
JPMML-SparkML is licensed under the [GNU Affero General Public License (AGPL) version 3.0](http://www.gnu.org/licenses/agpl-3.0.html). Other licenses are available on request.
# Additional information #
Please contact [info@openscoring.io](mailto:info@openscoring.io).
222 pom.xml
@@ -0,0 +1,222 @@
<?xml version="1.0" ?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>org.jpmml</groupId>
	<artifactId>jpmml-sparkml</artifactId>
	<version>1.0-SNAPSHOT</version>

	<name>JPMML-SparkML</name>
	<description>Java library and command-line application for converting Spark ML pipelines to PMML</description>
	<url>http://www.jpmml.org</url>

	<licenses>
		<license>
			<name>GNU Affero General Public License (AGPL) version 3.0</name>
			<url>http://www.gnu.org/licenses/agpl-3.0.html</url>
			<distribution>repo</distribution>
		</license>
	</licenses>

	<developers>
		<developer>
			<id>villu.ruusmann</id>
			<name>Villu Ruusmann</name>
		</developer>
	</developers>

	<scm>
		<connection>scm:git:git@github.com:jpmml/jpmml-sparkml.git</connection>
		<developerConnection>scm:git:git@github.com:jpmml/jpmml-sparkml.git</developerConnection>
		<url>git://github.com/jpmml/jpmml-sparkml.git</url>
		<tag>HEAD</tag>
	</scm>
	<issueManagement>
		<system>GitHub</system>
		<url>https://github.com/jpmml/jpmml-sparkml/issues</url>
	</issueManagement>

	<dependencies>
		<dependency>
			<groupId>com.beust</groupId>
			<artifactId>jcommander</artifactId>
			<version>1.48</version>
		</dependency>

		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-core_2.10</artifactId>
			<version>1.6.0</version>
			<scope>provided</scope>
		</dependency>
		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-mllib_2.10</artifactId>
			<version>1.6.0</version>
			<scope>provided</scope>
			<exclusions>
				<exclusion>
					<groupId>org.jpmml</groupId>
					<artifactId>pmml-model</artifactId>
				</exclusion>
			</exclusions>
		</dependency>

		<dependency>
			<groupId>org.jpmml</groupId>
			<artifactId>jpmml-converter</artifactId>
			<version>1.0.4</version>
			<exclusions>
				<exclusion>
					<groupId>com.sun.xml.fastinfoset</groupId>
					<artifactId>FastInfoset</artifactId>
				</exclusion>
				<exclusion>
					<groupId>javax.xml.bind</groupId>
					<artifactId>jaxb-api</artifactId>
				</exclusion>
				<exclusion>
					<groupId>org.glassfish.jaxb</groupId>
					<artifactId>txw2</artifactId>
				</exclusion>
				<exclusion>
					<groupId>org.jvnet.staxex</groupId>
					<artifactId>stax-ex</artifactId>
				</exclusion>
			</exclusions>
		</dependency>

		<dependency>
			<groupId>junit</groupId>
			<artifactId>junit</artifactId>
			<version>4.12</version>
			<scope>test</scope>
		</dependency>

		<dependency>
			<groupId>org.jpmml</groupId>
			<artifactId>pmml-evaluator</artifactId>
			<version>1.2.13</version>
			<scope>test</scope>
		</dependency>
		<dependency>
			<groupId>org.jpmml</groupId>
			<artifactId>pmml-evaluator</artifactId>
			<version>1.2.13</version>
			<type>test-jar</type>
			<scope>test</scope>
		</dependency>
	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-compiler-plugin</artifactId>
				<version>3.3</version>
				<configuration>
					<source>1.7</source>
					<target>1.7</target>
				</configuration>
			</plugin>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-shade-plugin</artifactId>
				<version>2.4.3</version>
				<executions>
					<execution>
						<phase>package</phase>
						<goals>
							<goal>shade</goal>
						</goals>
						<configuration>
							<finalName>converter-executable-${project.version}</finalName>
							<filters>
								<filter>
									<artifact>*:*</artifact>
									<excludes>
										<exclude>META-INF/*.SF</exclude>
										<exclude>META-INF/*.DSA</exclude>
										<exclude>META-INF/*.RSA</exclude>
									</excludes>
								</filter>
							</filters>
							<relocations>
								<relocation>
									<pattern>com.google.common</pattern>
									<shadedPattern>com.shaded.google.common</shadedPattern>
								</relocation>
								<relocation>
									<pattern>org.dmg.pmml</pattern>
									<shadedPattern>org.shaded.dmg.pmml</shadedPattern>
								</relocation>
								<relocation>
									<pattern>org.jpmml.agent</pattern>
									<shadedPattern>org.shaded.jpmml.agent</shadedPattern>
								</relocation>
								<relocation>
									<pattern>org.jpmml.model</pattern>
									<shadedPattern>org.shaded.jpmml.model</shadedPattern>
								</relocation>
								<relocation>
									<pattern>org.jpmml.schema</pattern>
									<shadedPattern>org.shaded.jpmml.schema</shadedPattern>
								</relocation>
							</relocations>
							<transformers>
								<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
									<mainClass>org.jpmml.sparkml.Main</mainClass>
								</transformer>
							</transformers>
						</configuration>
					</execution>
				</executions>
			</plugin>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-source-plugin</artifactId>
				<version>2.4</version>
				<executions>
					<execution>
						<id>attach-sources</id>
						<goals>
							<goal>jar</goal>
						</goals>
					</execution>
				</executions>
			</plugin>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-surefire-plugin</artifactId>
				<version>2.19.1</version>
				<configuration>
					<argLine>${jacoco.agent}</argLine>
					<trimStackTrace>false</trimStackTrace>
				</configuration>
			</plugin>
			<plugin>
				<groupId>org.jacoco</groupId>
				<artifactId>jacoco-maven-plugin</artifactId>
				<version>0.7.6.201602180812</version>
				<executions>
					<execution>
						<id>pre-unit-test</id>
						<goals>
							<goal>prepare-agent</goal>
						</goals>
						<configuration>
							<propertyName>jacoco.agent</propertyName>
						</configuration>
					</execution>
					<execution>
						<id>post-unit-test</id>
						<phase>prepare-package</phase>
						<goals>
							<goal>report</goal>
						</goals>
					</execution>
				</executions>
			</plugin>
		</plugins>
	</build>
</project>
41 src/main/java/org/jpmml/sparkml/CategoricalFeature.java
@@ -0,0 +1,41 @@
/*
 * Copyright (c) 2016 Villu Ruusmann
 *
 * This file is part of JPMML-SparkML
 *
 * JPMML-SparkML is free software: you can redistribute it and/or modify
 * it under the terms of the GNU Affero General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * JPMML-SparkML is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU Affero General Public License for more details.
 *
 * You should have received a copy of the GNU Affero General Public License
 * along with JPMML-SparkML. If not, see <http://www.gnu.org/licenses/>.
 */
package org.jpmml.sparkml;

import org.dmg.pmml.FieldName;

public class CategoricalFeature<V> extends Feature {

	private V value = null;


	public CategoricalFeature(FieldName name, V value){
		super(name);

		setValue(value);
	}

	public V getValue(){
		return this.value;
	}

	private void setValue(V value){
		this.value = value;
	}
}