
LookupTransformer TypeError when default value is not exactly the type from the mapping #403

Closed
danmar3 opened this issue Dec 19, 2023 · 2 comments

@danmar3

danmar3 commented Dec 19, 2023

Hi,
Not sure if this is intended behavior, but the following initialization fails with a TypeError:

import numpy as np
from sklearn2pmml.preprocessing import LookupTransformer

LookupTransformer({'a': 1.1, 'b': 2.0, 'c': 3.0}, np.float64(0))

This fails with:

sklearn2pmml/preprocessing/__init__.py:345, in LookupTransformer.__init__(self, mapping, default_value)
    343 	if v_type is not None:
    344 		if type(default_value) != v_type:
--> 345 			raise TypeError("Default value is not a {0}".format(v_type.__name__))
    346 self.default_value = default_value

TypeError: Default value is not a float

I think this can be fixed by relaxing the strict equality check on this line to not isinstance(default_value, v_type)
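A minimal sketch of why the suggested relaxation would help here: numpy.float64 is a subclass of Python's built-in float, so a strict type comparison rejects it while isinstance accepts it.

```python
import numpy as np

x = np.float64(0)

# Strict type equality fails: numpy.float64 is not exactly float...
print(type(x) == float)      # False

# ...but it IS a subclass of float, so isinstance accepts it
print(isinstance(x, float))  # True
```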

Thank you

@vruusmann
Member

This type check is intended to block situations where the mapped values and the default value have completely different data types. For example, the former are floats, while the latter is an int or str.

Mismatched data types can cause hard-to-debug issues, because they render differently when printed to text documents. This is rare (it affects only a small number of values), but when it does happen, it is very tricky to debug.

However, in this issue, you are suggesting that the type check should be relaxed so that, in addition to the strict type equality check, type-is-a-subtype-of checks are also permitted. The case in point: should numpy.float64 be considered a subclass of builtins.float?

import numpy

print(type(1.0)) # <class 'float'>
print(type(numpy.float64(1.0))) # <class 'numpy.float64'>

print(type(1.0) == type(numpy.float64(1.0))) # False

Makes sense. Will fix in the next release, but I need to study the compatibility between Python's built-in types and NumPy scalar types first.
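One compatibility wrinkle worth noting (my own observation, not from the thread): the subclass relationship is not symmetric across NumPy scalar types. On Python 3, numpy.float64 inherits from float, but numpy.int64 does not inherit from int.

```python
import numpy as np

# numpy.float64 inherits from the built-in float...
print(issubclass(np.float64, float))  # True

# ...but numpy.int64 does NOT inherit from the built-in int (Python 3)
print(issubclass(np.int64, int))      # False
```

So an isinstance-based relaxation would accept numpy.float64 defaults for float mappings, but would still reject numpy.int64 defaults for int mappings.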

@jpmml jpmml deleted a comment from SimonRbk95 Mar 18, 2024
@vruusmann
Member

I've been thinking about this issue, and I've decided to keep the current behaviour unchanged (i.e. "won't fix").

It still makes sense to require absolute type identity both on the key side and on the value side. Once we start mixing types, it becomes possible that the dict lookup behaviour breaks down (i.e. a float dict key may or may not match a numpy.float64 lookup key), or that the returned values become variable (i.e. sometimes you get float values, other times numpy.float64 values). The effects cascade down the pipeline and break some unsuspecting step much farther down.
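The "variable returned values" hazard can be illustrated with a plain dict (my own example, not from the thread): a mapping whose values mix Python and NumPy scalars hands different types to downstream steps depending on which key was looked up.

```python
import numpy as np

# A mapping with mixed value types: plain float vs. numpy.float64
mapping = {"a": 1.0, "b": np.float64(2.0)}

# The lookup result type now depends on the key
print(type(mapping["a"]))  # <class 'float'>
print(type(mapping["b"]))  # <class 'numpy.float64'>
```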

If there's a chance of mixed type objects - most notably a mix of Python and Numpy scalars - then they should be normalized to either representation. I'd personally unpack Numpy scalars using the following helper:

def _normalize(x):
    # NumPy scalars expose an item() method that unpacks them
    # to the equivalent built-in Python type
    if hasattr(x, "item"):
        return x.item()
    return x

raw_mapping = ...
raw_default_value = ...

transformer = LookupTransformer(mapping = {k : _normalize(v) for k, v in raw_mapping.items()}, default_value = _normalize(raw_default_value))
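A quick check of this normalization approach (the helper is repeated here so the snippet is self-contained): both NumPy scalars and plain Python scalars come out as built-in types.

```python
import numpy as np

def _normalize(x):
    # NumPy scalars expose item(); plain Python objects pass through unchanged
    if hasattr(x, "item"):
        return x.item()
    return x

print(type(_normalize(np.float64(0.5))))  # <class 'float'> (unpacked)
print(type(_normalize(0.5)))              # <class 'float'> (passed through)
print(type(_normalize(np.int64(3))))      # <class 'int'> (unpacked)
```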
