
LookupTransformer TypeError when default value is not exactly the type from the mapping #403

Closed
danmar3 opened this issue Dec 19, 2023 · 2 comments

@danmar3

danmar3 commented Dec 19, 2023

Hi,
Not sure if this is intended behavior, but the following initialization fails with a TypeError:

import numpy as np
from sklearn2pmml.preprocessing import LookupTransformer

LookupTransformer({'a': 1.1, 'b': 2.0, 'c': 3.0}, np.float64(0))

This fails with:

sklearn2pmml/preprocessing/__init__.py:345, in LookupTransformer.__init__(self, mapping, default_value)
    343 	if v_type is not None:
    344 		if type(default_value) != v_type:
--> 345 			raise TypeError("Default value is not a {0}".format(v_type.__name__))
    346 self.default_value = default_value

TypeError: Default value is not a float

I think this can be fixed by relaxing the strict equality check on this line to not isinstance(default_value, v_type)
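A minimal sketch of why the suggested relaxation would help here: numpy.float64 is a subclass of Python's built-in float, so a strict type comparison rejects it while isinstance accepts it.

```python
import numpy as np

x = np.float64(0)

# Strict type equality fails: numpy.float64 is not exactly float...
print(type(x) == float)      # False

# ...but it IS a subclass of float, so isinstance accepts it
print(isinstance(x, float))  # True
```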

Thank you

@vruusmann
Member

This type check is intended to block situations where the mapped values and the default value have completely different data types. For example, the former are floats, while the latter is an int or str.

Mismatched data types can cause hard-to-debug issues, because they render differently when printed to text documents. This is rare (it affects only a small number of values), but when it does happen, it is very tricky to debug.

However, in this issue, you are suggesting that the type check should be relaxed so that, in addition to the strict type equality check, type-is-a-subtype-of checks are also permitted. The case in point: should numpy.float64 be considered a subclass of builtins.float?

import numpy

print(type(1.0)) # <class 'float'>
print(type(numpy.float64(1.0))) # <class 'numpy.float64'>

print(type(1.0) == type(numpy.float64(1.0))) # False

Makes sense. Will fix in the next release, but I need to study the compatibility between Python's built-in types and NumPy scalar types first.
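One compatibility wrinkle worth noting (my own observation, not from the thread): the subclass relationship is not symmetric across NumPy scalar types. On Python 3, numpy.float64 inherits from float, but numpy.int64 does not inherit from int.

```python
import numpy as np

# numpy.float64 inherits from the built-in float...
print(issubclass(np.float64, float))  # True

# ...but numpy.int64 does NOT inherit from the built-in int (Python 3)
print(issubclass(np.int64, int))      # False
```

So an isinstance-based relaxation would accept numpy.float64 defaults for float mappings, but would still reject numpy.int64 defaults for int mappings.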

@jpmml jpmml deleted a comment from SimonRbk95 Mar 18, 2024
@vruusmann
Member

I've been thinking about this issue, and I've decided to keep the current behaviour unchanged (i.e. "won't fix").

It still makes sense to require absolute type identity both on the key side and on the value side. Once we start mixing types, it becomes possible that the dict lookup behaviour breaks down (i.e. a float dict key may or may not match a numpy.float64 lookup key), or that the returned values become variable (i.e. sometimes you get float values, other times numpy.float64 values). The effects cascade down the pipeline and break some unsuspecting step much farther down.
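The "variable returned values" hazard can be illustrated with a plain dict (my own example, not from the thread): a mapping whose values mix Python and NumPy scalars hands different types to downstream steps depending on which key was looked up.

```python
import numpy as np

# A mapping with mixed value types: plain float vs. numpy.float64
mapping = {"a": 1.0, "b": np.float64(2.0)}

# The lookup result type now depends on the key
print(type(mapping["a"]))  # <class 'float'>
print(type(mapping["b"]))  # <class 'numpy.float64'>
```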

If there's a chance of mixed type objects - most notably a mix of Python and Numpy scalars - then they should be normalized to either representation. I'd personally unpack Numpy scalars using the following helper:

def _normalize(x):
    # NumPy scalars expose an item() method that unpacks them
    # to the equivalent built-in Python type
    if hasattr(x, "item"):
        return x.item()
    return x

raw_mapping = ...
raw_default_value = ...

transformer = LookupTransformer(mapping = {k : _normalize(v) for k, v in raw_mapping.items()}, default_value = _normalize(raw_default_value))
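A quick check of this normalization approach (the helper is repeated here so the snippet is self-contained): both NumPy scalars and plain Python scalars come out as built-in types.

```python
import numpy as np

def _normalize(x):
    # NumPy scalars expose item(); plain Python objects pass through unchanged
    if hasattr(x, "item"):
        return x.item()
    return x

print(type(_normalize(np.float64(0.5))))  # <class 'float'> (unpacked)
print(type(_normalize(0.5)))              # <class 'float'> (passed through)
print(type(_normalize(np.int64(3))))      # <class 'int'> (unpacked)
```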
