Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integer constants that don't fit into java.lang.Integer value range #107

Closed
pellmont opened this issue Mar 23, 2018 · 5 comments
Closed

Comments

@pellmont
Copy link

pellmont commented Mar 23, 2018

Hi Villu

I found a small inconsistency in TypeUtil which causes Problems with with large numeric String constants.

In the method getConstantDataType (Line 862) the String is checked against Long and Returns INTEGER aus Datatype.
Int the methods cast/toInteger the DataType INTEGER a range check for integer is done and if whithin range it will be transformed to an Integer.

In my opinion getConstantDataType should check against Integer (Integer.parseInt instead of Long.parseLong).

JPMML-Evaluator used: 1.4.0 (the official one, not my java6 backport)
As for the PMML: this has been generated with jpmml-sparkml 1.3.3

The PMML Fragment causing trouble looks like this:
<DerivedField name="handleInvalid(my_field)" optype="categorical" dataType="string"> <Apply function="if"> <Apply function="isIn"> <FieldRef field="my_field"/> <Constant>7680517950697</Constant>

it's generated by a StringIndexer with setHandleInvalid("keep").

btw: it would also be nice if setHandleInvalid("skip") would be supported by jpmml-sparkml

@vruusmann
Copy link
Member

vruusmann commented Mar 23, 2018

The PMML Fragment causing trouble looks like this

If I read this PMML fragment correctly, then this Constant element aims to represent a string value (not an integer value) in the first place?

<Constant dataType="string">7680517950697</Constant>

The JPMML-Converter library currently sets the Constant@dataType attribute when there's a need to distinguish between float and double values, but it should be setting it for all values. Also, that would relieve the JPMML-Evaluator library from guessing the data type over and over using the TypeUtil#getConstantDataType(String) utility method, which is rather expensive.

In my opinion getConstantDataType should check against Integer (Integer.parseInt instead of Long.parseLong).

In PMML there is only one integer data type. It was my arbitrary decision to map it to java.lang.Integer (not java.lang.Long).

This decision can be changed any time, but based on the "principle of least astonishment", it should happen during a major version upgrade (eg. from 1.4.X to 1.5.X), not between minor version upgrades (eg. from 1.4.1 to 1.4.2).

it would also be nice if setHandleInvalid("skip") would be supported by jpmml-sparkml

Moved to a separate issue jpmml/jpmml-sparkml#41

@vruusmann vruusmann changed the title Inconsistent Types in TypeUtil Integer constants that don't fit into java.lang.Integer value range Mar 23, 2018
@pellmont
Copy link
Author

Yes, the constant element would be String (from StringIndexerModel).

Setting the data type in jpmml-sparkml for constants would also definitely make sense. My first thought was this would bloat the PMML file, but considering those constants are probably limited in numbers it wouldn't matter that much. This would probably solve my problem best.
For Implementation I would probably add a Method createConstant(Object, DataType) and use this one wherever the DataType is known (actually in most places it's either always Double or always String)

The decision to take Integer rather than Long is OK for me, I didn't find any Information about the length of Integer in the PMML-Spec, so it is perfectly valid and no problem for me. But it should be consistent.

@vruusmann
Copy link
Member

You can always post-process PMML class model objects using the Visitor API. For example, setting the Constant@dataType attribute:

Visitor visitor = new org.jpmml.model.visitors.AbstractSimpleVisitor(){

  @Override
  public VisitorAction visit(Constant constant){
    if(constant.getDataType() == null){
      if(isStringValue(constant.getValue())){
        constant.setDataType(DataType.STRING);
      }
    }
  }
};

PMML pmml = ...;
visitor.applyTo(pmml);

@pellmont
Copy link
Author

Oh that's a good idea, I'll try this on Monday! Thanks!

@pellmont
Copy link
Author

pellmont commented Apr 2, 2018

Thanks Villu!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants