InvalidFeatureException from spark context #26

Closed
acrisci opened this issue May 3, 2016 · 2 comments

acrisci commented May 3, 2016

Hi,

When I try to use the evaluator from a Spark context, it fails to create the model manager because of PMML validation problems.

Exception in thread "main" org.jpmml.evaluator.InvalidFeatureException (at or around line 8): DataDictionary
        at org.jpmml.evaluator.CacheUtil.getValue(CacheUtil.java:58)
        at org.jpmml.evaluator.ModelEvaluator.<init>(ModelEvaluator.java:113)
        at org.jpmml.evaluator.TreeModelEvaluator.<init>(TreeModelEvaluator.java:54)
        at org.jpmml.evaluator.ModelEvaluatorFactory.newModelManager(ModelEvaluatorFactory.java:101)
        at org.jpmml.evaluator.ModelEvaluatorFactory.newModelManager(ModelEvaluatorFactory.java:45)
        at org.jpmml.evaluator.ModelManagerFactory.newModelManager(ModelManagerFactory.java:66)
        at org.jpmml.evaluator.ModelManagerFactory.newModelManager(ModelManagerFactory.java:46)
        at com.example.Main.main(Main.java:23)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassCastException: org.dmg.pmml.DataField cannot be cast to org.dmg.pmml.Indexable
        at org.jpmml.evaluator.IndexableUtil.ensureKey(IndexableUtil.java:78)
        at org.jpmml.evaluator.IndexableUtil.buildMap(IndexableUtil.java:64)
        at org.jpmml.evaluator.ModelEvaluator$1.load(ModelEvaluator.java:538)
        at org.jpmml.evaluator.ModelEvaluator$1.load(ModelEvaluator.java:534)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3542)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2323)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2286)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201)
        at com.google.common.cache.LocalCache.get(LocalCache.java:3953)
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3957)
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4875)
        at org.jpmml.evaluator.CacheUtil.getValue(CacheUtil.java:50)
        ... 16 more

Here is the Java class I am submitting:

package com.example;

import org.dmg.pmml.PMML;
import org.jpmml.evaluator.Evaluator;
import org.jpmml.evaluator.ModelEvaluatorFactory;
import org.jpmml.model.ImportFilter;
import org.jpmml.model.JAXBUtil;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.bind.JAXBException;
import javax.xml.transform.Source;

public final class Main {
    public static void main(String[] args) {
        System.out.println("hello world");

        try {
            Source transformedSource = ImportFilter.apply(new InputSource(Main.class.getResourceAsStream("/DecisionTreeIris.pmml")));
            PMML pmml = JAXBUtil.unmarshalPMML(transformedSource);
            ModelEvaluatorFactory modelEvaluatorFactory = ModelEvaluatorFactory.newInstance();
            Evaluator evaluator = modelEvaluatorFactory.newModelManager(pmml);
            evaluator.verify();
        } catch (SAXException | JAXBException e) {
            // could not parse pmml as xml
            throw new RuntimeException(e);
        }
    }
}

Submitting with spark-submit --class com.example.Main /path/to/example-assembly.jar.

It does not throw the error when I run the assembled jar like java -jar /path/to/example-assembly.jar.

DecisionTreeIris.pmml is from here.

Thanks for the project. Any help is appreciated.

@vruusmann
Member

That's a class loading conflict.

Namely, if you load the JPMML-Evaluator library in "standalone mode", it looks up the latest and greatest version of the JPMML-Model library. However, if you load it inside Apache Spark, it is forcibly paired with a legacy version of the JPMML-Model library (included in the Apache Spark distribution for MLlib PMML export needs), which fails to meet several expectations (e.g. it doesn't know about the org.dmg.pmml.Indexable interface).

Besides the JPMML-Model library, there's a similar class loading conflict (waiting just around the corner) involving the Google Guava library.
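One way to confirm this kind of conflict is to print where each suspect class was actually loaded from, once under java -jar and once under spark-submit. The following is a minimal sketch that uses only the JDK; the class name ClassOrigin and the method locate are hypothetical helpers, and the three class names it checks are taken from the stack trace above.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of JPMML or Spark.
final class ClassOrigin {

    // Reports the JAR or directory a class was loaded from.
    // The class is looked up by name without being initialized.
    static String locate(String className) {
        try {
            Class<?> clazz = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = clazz.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // Typical for classes loaded by the bootstrap class loader.
                return className + " <- (bootstrap or unknown source)";
            }
            return className + " <- " + source.getLocation();
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " <- (not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack trace in this issue.
        String[] suspects = {
            "org.dmg.pmml.DataField",
            "org.dmg.pmml.Indexable",
            "com.google.common.cache.LocalCache"
        };
        for (String name : suspects) {
            System.out.println(locate(name));
        }
    }
}
```

If the reported location differs between the two launch modes, or a class is missing in one of them, the two runs are resolving different library versions.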

To overcome this problem, you should use the "package relocation" mechanism of the Apache Maven Shade plugin. Please see the POM file of the pmml-spark-example module of the JPMML-Spark project: https://github.com/jpmml/jpmml-spark/blob/master/pmml-spark-example/pom.xml

Author

acrisci commented May 4, 2016

Thanks. Using the package relocations like in that project fixed the issue 👍.
