Skip to content
master
Go to file
Code

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 
 
 
 
 

README.md

PySpark2PMML

Python library for converting Apache Spark ML pipelines to PMML.

Features

This package provides Python wrapper classes and functions for the JPMML-SparkML library. For the full list of supported Apache Spark ML Estimator and Transformer types, please refer to JPMML-SparkML documentation.

Prerequisites

  • Apache Spark 2.0.X, 2.1.X, 2.2.X, 2.3.X, 2.4.X or 3.0.X.
  • Python 2.7, 3.4 or newer.

Installation

Install a release version from PyPI:

pip install pyspark2pmml

Alternatively, install the latest snapshot version from GitHub:

pip install --upgrade git+https://github.com/jpmml/pyspark2pmml.git

Configuration and usage

PySpark2PMML must be paired with JPMML-SparkML based on the following compatibility matrix:

Apache Spark version JPMML-SparkML branch JPMML-SparkML uber-JAR file
2.0.X 1.1.X (Archived) 1.1.23
2.1.X 1.2.X (Archived) 1.2.15
2.2.X 1.3.X (Archived) 1.3.15
2.3.X 1.4.X 1.4.15
2.4.X 1.5.X 1.5.8
3.0.X master 1.6.0

Launch PySpark; use the --jars command-line option to specify the location of the JPMML-SparkML uber-JAR file:

pyspark --jars /path/to/jpmml-sparkml-executable-${version}.jar

Fitting an example pipeline model:

from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import RFormula

df = spark.read.csv("Iris.csv", header = True, inferSchema = True)

formula = RFormula(formula = "Species ~ .")
classifier = DecisionTreeClassifier()
pipeline = Pipeline(stages = [formula, classifier])
pipelineModel = pipeline.fit(df)

Exporting the fitted example pipeline model to a PMML file:

from pyspark2pmml import PMMLBuilder

pmmlBuilder = PMMLBuilder(sc, df, pipelineModel) \
	.putOption(classifier, "compact", True)

pmmlBuilder.buildFile("DecisionTreeIris.pmml")

License

PySpark2PMML is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.

If you would like to use PySpark2PMML in a proprietary software project, then it is possible to enter into a licensing agreement which makes PySpark2PMML available under the terms and conditions of the BSD 3-Clause License instead.

Additional information

PySpark2PMML is developed and maintained by Openscoring Ltd, Estonia.

Interested in using Java PMML API software in your company? Please contact info@openscoring.io

About

Python library for converting Apache Spark ML pipelines to PMML

Resources

License

Packages

No packages published

Languages

You can’t perform that action at this time.