
fix: Handle numpy ndarray in Array(String) materialization#6327

Open
SIDDHESH1564 wants to merge 6 commits into feast-dev:master from SIDDHESH1564:fix/ndarray-array-string-materialization


SIDDHESH1564 commented Apr 24, 2026

What this PR does / why we need it:

Materializing feature views that contain Array(String) columns with the Athena offline store fails with a TypeError or ValueError.

Root cause:
Arrow/Athena deserializes array columns as numpy.ndarray (object dtype) instead of Python lists, which breaks the list-based assumptions in feast/type_map.py.
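To make the root cause concrete, the snippet below builds a stand-in for one Array(String) cell as it is assumed to arrive after the Arrow-to-pandas conversion: an object-dtype ndarray that may contain None. It uses only NumPy and is an illustration, not code from the PR.

```python
import numpy as np

# Hypothetical stand-in for one Array(String) cell after Arrow -> pandas
# conversion: an object-dtype ndarray rather than a Python list.
cell = np.array(["a", None, "b"], dtype=object)

print(isinstance(cell, list))  # False: checks that expect a list do not match
print(cell.dtype)              # object
print(cell.tolist())           # ['a', None, 'b']: a plain list, but None is still present
```

Converting with `.tolist()` fixes the container type, but the None element still has to be dealt with before it reaches a protobuf repeated field.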

Issues observed:

  • _validate_collection_item_types: None elements inside an ndarray fail the strict type validation.
  • _convert_list_values_to_proto: Passing a numpy.ndarray directly to protobuf constructors (e.g. StringList) raises TypeError. None elements are also invalid values for protobuf repeated fields.
  • _convert_scalar_values_to_proto: pd.isnull(ndarray) returns an array of booleans rather than a single bool, so applying not to it raises ValueError ("truth value of an empty array is ambiguous" for empty arrays on recent NumPy versions, "truth value of an array with more than one element is ambiguous" for longer ones).
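The third failure mode comes from evaluating an element-wise result in a boolean context. A minimal sketch of a null check that avoids this, assuming a hypothetical helper `is_missing` (not Feast's actual code) and treating an empty collection as missing, consistent with the "empty ndarray produces a null ProtoValue" behavior described in this PR:

```python
import math

import numpy as np


def is_missing(value) -> bool:
    """Hypothetical null check that never calls bool() on an array."""
    if value is None:
        return True
    if isinstance(value, np.ndarray):
        # Collections are "missing" only when empty; None elements are
        # handled later during list conversion, not here.
        return value.size == 0
    if isinstance(value, (list, tuple)):
        return len(value) == 0
    if isinstance(value, float):
        # Covers float("nan") and np.float64 NaN (np.float64 subclasses float).
        return math.isnan(value)
    return False


# The failure mode: pd.isnull(np.array(["a", None], dtype=object)) returns
# array([False, True]), and `not` on a multi-element array raises ValueError.
flags = np.array([False, True])
try:
    _ = not flags
except ValueError as exc:
    print("ValueError:", exc)

print(is_missing(np.array(["a", None], dtype=object)))  # False
print(is_missing(np.array([], dtype=object)))           # True
```

Checking `size` or `len` first keeps the scalar NaN test away from array values entirely, so no element-wise result is ever negated.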

Fix implemented:

  • Convert numpy.ndarray → Python list using .tolist() before proto conversion
  • Replace None elements with type-appropriate defaults:
    • "" (string), 0 (int), 0.0 (float), False (bool)
  • Allow None during intermediate validation in _validate_collection_item_types
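The fix steps above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `normalize_list_value`, `validate_items`, and the string kind keys (`"string"`, `"int"`, `"float"`, `"bool"`) are hypothetical, whereas the PR works in terms of Feast value types inside feast/type_map.py.

```python
from typing import Any, List, Optional

import numpy as np

# Hypothetical kind keys; defaults follow the PR description.
DEFAULTS = {"string": "", "int": 0, "float": 0.0, "bool": False}
# Assumption: ints are accepted in float lists.
PY_TYPES = {"string": str, "int": int, "float": (float, int), "bool": bool}


def validate_items(items: List[Any], kind: str) -> None:
    """Relaxed check: None is tolerated because it is replaced afterwards."""
    expected = PY_TYPES[kind]
    for item in items:
        if item is not None and not isinstance(item, expected):
            raise TypeError(f"expected {kind} element, got {type(item).__name__}")


def normalize_list_value(value: Any, kind: str) -> Optional[List[Any]]:
    """Turn an ndarray (or list) cell into a plain list for a repeated proto field."""
    if value is None:
        return None
    if isinstance(value, np.ndarray):
        # .tolist() also converts NumPy scalars (np.int64, np.bool_, ...) to Python scalars.
        value = value.tolist()
    items = list(value)
    validate_items(items, kind)
    default = DEFAULTS[kind]
    return [default if item is None else item for item in items]


print(normalize_list_value(np.array(["a", None, "b"], dtype=object), "string"))  # ['a', '', 'b']
print(normalize_list_value(np.array([1, None, 3], dtype=object), "int"))        # [1, 0, 3]
```

The resulting plain list is the kind of value a repeated-field constructor such as Feast's `StringList(val=...)` expects. Running validation before default substitution keeps genuinely mistyped elements visible as errors, so only None is silently replaced.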

Which issue(s) this PR fixes:

Fixes #6325


Checks

  • I've made sure the tests are passing.
  • My commits are signed off (git commit -s)
  • My PR title follows conventional commits format

Testing Strategy

  • Unit tests
  • Integration tests
  • Manual tests
  • Testing is not required for this change

Details:
Added TestNdarrayListConversion in test_type_map.py with regression coverage for:

  • ndarray string/int/double/bool list conversions
  • Replacement of None elements with defaults
  • Empty ndarray producing null ProtoValue
  • Mixed batch scenarios (populated array + None + empty array)
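The regression cases listed above could look roughly like this pytest-style sketch. `convert_batch` and its string kind keys are hypothetical stand-ins, not the functions exercised by the PR's `TestNdarrayListConversion`; a `None` in the output stands in for a null `ProtoValue`.

```python
from typing import Any, List, Optional

import numpy as np

DEFAULTS = {"string": "", "int": 0, "float": 0.0, "bool": False}  # hypothetical keys


def convert_batch(values: List[Any], kind: str) -> List[Optional[List[Any]]]:
    """Map each cell to a plain list, or None where a null ProtoValue would be emitted."""
    out: List[Optional[List[Any]]] = []
    default = DEFAULTS[kind]
    for value in values:
        if value is None or len(value) == 0:
            out.append(None)  # missing or empty cell -> null value
            continue
        items = value.tolist() if isinstance(value, np.ndarray) else list(value)
        out.append([default if item is None else item for item in items])
    return out


def test_mixed_batch() -> None:
    # Populated array + None + empty array, as in the PR's mixed-batch case.
    batch = [np.array(["x", None], dtype=object), None, np.array([], dtype=object)]
    assert convert_batch(batch, "string") == [["x", ""], None, None]


def test_int_defaults() -> None:
    assert convert_batch([np.array([7, None], dtype=object)], "int") == [[7, 0]]


if __name__ == "__main__":
    test_mixed_batch()
    test_int_defaults()
    print("ok")
```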

Misc

Adds regression coverage to ensure stable handling of ndarray-backed array feature columns from Athena/Arrow going forward.



SIDDHESH1564 requested a review from a team as a code owner April 24, 2026 18:25
devin-ai-integration[bot]

This comment was marked as resolved.

Signed-off-by: Siddhesh Khairnar <khairnarsiddhesh4057@gmail.com>
Signed-off-by: Siddhesh Khairnar <khairnarsiddhesh4057@gmail.com>
SIDDHESH1564 force-pushed the fix/ndarray-array-string-materialization branch from db3a30b to e754d58 April 24, 2026 18:39
ntkathole (Member) commented Apr 25, 2026

@SIDDHESH1564 dupe of #6324? Feel free to review the other PR.

Signed-off-by: Siddhesh Khairnar <khairnarsiddhesh4057@gmail.com>
devin-ai-integration[bot]

This comment was marked as resolved.

…change

Signed-off-by: Siddhesh Khairnar <khairnarsiddhesh4057@gmail.com>
SIDDHESH1564 force-pushed the fix/ndarray-array-string-materialization branch from a7d3fc5 to c77302a April 25, 2026 18:43
devin-ai-integration[bot]

This comment was marked as resolved.


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: TypeError / ValueError when materializing Array(String) feature views with Athena offline store

2 participants