Description
Materializing feature views that contain Array(String) columns using the Athena offline store fails intermittently with one of two errors:
Error 1: ValueError: The truth value of an empty array is ambiguous
Triggered when an entity row has no values set for an array column (e.g. tags = []).
File ".../feast/type_map.py", line 772, in _python_value_to_proto_value
elif not pd.isnull(value):
ValueError: The truth value of an empty array is ambiguous. Use `array.size > 0` to check that an array is not empty.
Error 2: TypeError: bad argument type for built-in operation
Triggered when any entity row has values in an array column.
File ".../feast/type_map.py", line 905, in <listcomp>
ProtoValue(**{field_name: proto_type(val=value)})
TypeError: bad argument type for built-in operation
Root Cause
Arrow/Athena deserializes Array(String) feature columns as numpy.ndarray with object dtype rather than plain Python lists. Three code paths in type_map.py do not handle this:
- Scalar null-check (_convert_scalar_values_to_proto): The line elif not pd.isnull(value) calls pd.isnull() on a numpy array, which returns an array of bools. Evaluating not <array> then raises ValueError because the truth value is ambiguous.
- Generic list conversion (_convert_list_values_to_proto): The call proto_type(val=value) passes the raw numpy.ndarray directly to the protobuf constructor. Protobuf rejects non-list types and raises TypeError. Additionally, Arrow nullable columns can produce None elements inside the ndarray, which protobuf StringList also rejects.
- Type validation (_validate_collection_item_types): None elements inside an ndarray fail the type(item) in valid_types check before they can be sanitized downstream.
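The null-check failure can be reproduced without Feast or Athena. The following is a minimal sketch, not the code from type_map.py: is_missing_scalar is a hypothetical helper name, and the sample array stands in for a value deserialized from Arrow. It uses a two-element array because the truth value of a multi-element array raises ValueError regardless of NumPy version.

```python
import numpy as np
import pandas as pd


def is_missing_scalar(value) -> bool:
    """Hypothetical helper: True only for scalar nulls (None, NaN, pd.NA).

    Array-like values are never reported as a missing scalar, so the
    result is always a plain bool that is safe to use with `not`.
    """
    if isinstance(value, (np.ndarray, list, tuple)):
        return False
    return bool(pd.isnull(value))


# Stand-in for an Array(String) value as it arrives from Arrow.
tags = np.array(["a", "b"], dtype=object)

# pd.isnull on an ndarray is elementwise and returns a boolean ndarray.
mask = pd.isnull(tags)
print(type(mask).__name__, mask.shape)

# Negating that ndarray forces a truth-value conversion, which fails.
try:
    not mask
except ValueError as exc:
    print("ValueError:", exc)

# The guarded check returns a plain bool for arrays and scalars alike.
print(is_missing_scalar(tags), is_missing_scalar(None))
```

Checking the container type before calling pd.isnull keeps the scalar path unchanged while routing arrays away from the ambiguous truth test.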
Steps to Reproduce
- Define a FeatureView with an Array(String) field: Field(name="tags", dtype=Array(String))
- Materialize from Athena where some rows have non-empty arrays, some have empty arrays, and some have NULL values in array elements.
- Observe ValueError or TypeError in feast/type_map.py.
Expected Behavior
Materialization completes successfully. Array columns from Arrow/Athena are converted to proto-safe Python lists, with None elements replaced by an empty string.
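The expected conversion can be sketched as follows. This is an illustration under stated assumptions, not the change made in the PR: to_proto_safe_str_list is a hypothetical name, and it only covers array-like values (a NULL column value is out of scope here).

```python
import numpy as np


def to_proto_safe_str_list(value) -> list:
    """Hypothetical helper: turn an array-like of strings into a plain list.

    numpy arrays are converted with tolist(), and None elements are
    replaced by an empty string so every item is a str.
    """
    if isinstance(value, np.ndarray):
        value = value.tolist()
    return ["" if item is None else item for item in value]


# Stand-ins for the three row shapes described in this issue.
non_empty = np.array(["red", "blue"], dtype=object)
empty = np.array([], dtype=object)
with_null = np.array(["red", None], dtype=object)

for row in (non_empty, empty, with_null):
    print(to_proto_safe_str_list(row))
```

Applying the conversion before type validation would also keep None elements from reaching the type(item) in valid_types check.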
Environment
- Feast version: (any version with the generic list proto conversion)
- Offline store: Athena
- Python: 3.11
- Feature column type: Array(String) (maps to ValueType.STRING_LIST)
Fix
I opened a PR with a fix for this issue: #6324