fix: Bugfix for grabbing historical data from Snowflake with array type features. (#3964)

Fix retrieval of historical data from Snowflake when array type features are null for an entity.



Update docs to reflect array support in Snowflake

Signed-off-by: john.lemmon <john.lemmon@medely.com>
JohnLemmonMedely committed Feb 22, 2024
1 parent b83a702 commit 1cc94f2
Showing 4 changed files with 40 additions and 12 deletions.
20 changes: 10 additions & 10 deletions docs/reference/data-sources/overview.md
@@ -19,13 +19,13 @@ Details for each specific data source can be found [here](README.md).
Below is a matrix indicating which data sources support which types.

| | File | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino |
-| :-------------------------------- | :-- | :-- | :-- | :-- | :-- | :-- | :-- |
-| `bytes` | yes | yes | yes | yes | yes | yes | yes |
-| `string` | yes | yes | yes | yes | yes | yes | yes |
-| `int32` | yes | yes | yes | yes | yes | yes | yes |
-| `int64` | yes | yes | yes | yes | yes | yes | yes |
-| `float32` | yes | yes | yes | yes | yes | yes | yes |
-| `float64` | yes | yes | yes | yes | yes | yes | yes |
-| `bool` | yes | yes | yes | yes | yes | yes | yes |
-| `timestamp` | yes | yes | yes | yes | yes | yes | yes |
-| array types | yes | yes | no | no | yes | yes | no |
+| :-------------------------------- | :-- | :-- |:----------| :-- | :-- | :-- | :-- |
+| `bytes` | yes | yes | yes | yes | yes | yes | yes |
+| `string` | yes | yes | yes | yes | yes | yes | yes |
+| `int32` | yes | yes | yes | yes | yes | yes | yes |
+| `int64` | yes | yes | yes | yes | yes | yes | yes |
+| `float32` | yes | yes | yes | yes | yes | yes | yes |
+| `float64` | yes | yes | yes | yes | yes | yes | yes |
+| `bool` | yes | yes | yes | yes | yes | yes | yes |
+| `timestamp` | yes | yes | yes | yes | yes | yes | yes |
+| array types | yes | yes | yes | no | yes | yes | no |
2 changes: 1 addition & 1 deletion docs/reference/data-sources/snowflake.md
@@ -46,5 +46,5 @@ The full set of configuration options is available [here](https://rtd.feast.dev/

## Supported Types

-Snowflake data sources support all eight primitive types, but currently do not support array types.
+Snowflake data sources support all eight primitive types. Array types are also supported, but type inference is not available for them.
For a comparison against other batch data sources, please see [here](overview.md#functionality-matrix).
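Because inference does not extend to arrays, an array-typed column has to be declared explicitly in the feature view schema. A hypothetical sketch of such a declaration (the database, table, entity, and column names here are invented; the `Field`/`Array` declaration mirrors the one used in this commit's unit test):

```python
from feast import Entity, FeatureView, Field, SnowflakeSource
from feast.types import Array, String

# Hypothetical Snowflake table with an array-typed column "tags".
source = SnowflakeSource(
    database="MY_DB",  # invented names; substitute your own
    schema="PUBLIC",
    table="DRIVER_FEATURES",
    timestamp_field="event_timestamp",
)

driver = Entity(name="driver_id")

driver_features = FeatureView(
    name="driver_features",
    entities=[driver],
    schema=[
        # Declared explicitly because array types are not inferred.
        Field(name="tags", dtype=Array(String)),
    ],
    source=source,
)
```

This is a declaration-only fragment; running it against a real warehouse requires a configured Feast repo.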
4 changes: 3 additions & 1 deletion sdk/python/feast/infra/offline_stores/snowflake.py
@@ -463,7 +463,9 @@ def _to_df_internal(self, timeout: Optional[int] = None) -> pd.DataFrame:
                Array(Float32),
                Array(Bool),
            ]:
-                df[feature.name] = [json.loads(x) for x in df[feature.name]]
+                df[feature.name] = [
+                    json.loads(x) if x else None for x in df[feature.name]
+                ]

        return df

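The guard added above matters because the Snowflake connector hands back `None` for array cells that are NULL, and `json.loads` rejects `None`. A minimal stdlib-only sketch of the failure and the fix (the sample column values mirror the ones used in this commit's unit test):

```python
import json

# Simulated values of an Array(String) feature column:
# a populated array, a NULL row, and an empty array.
column = ['["1", "2", "3"]', None, "[]"]

# The unguarded version fails on the NULL row:
# json.loads(None) raises TypeError.
try:
    [json.loads(x) for x in column]
except TypeError as err:
    print(f"unguarded parse failed: {err}")

# The guarded version maps falsy cells (None or "") to None
# while still parsing "[]" into an empty list.
parsed = [json.loads(x) if x else None for x in column]
print(parsed)  # [['1', '2', '3'], None, []]
```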
26 changes: 26 additions & 0 deletions sdk/python/tests/unit/infra/offline_stores/test_snowflake.py
@@ -1,14 +1,18 @@
import re
from unittest.mock import ANY, MagicMock, patch

+import pandas as pd
import pytest
+from pytest_mock import MockFixture

+from feast import FeatureView, Field, FileSource
from feast.infra.offline_stores.snowflake import (
SnowflakeOfflineStoreConfig,
SnowflakeRetrievalJob,
)
from feast.infra.online_stores.sqlite import SqliteOnlineStoreConfig
from feast.repo_config import RepoConfig
+from feast.types import Array, String


@pytest.fixture(params=["s3", "s3gov"])
@@ -55,3 +59,25 @@ def test_to_remote_storage(retrieval_job):
mock_get_file_names_from_copy.assert_called_once_with(ANY, ANY)
native_path = mock_get_file_names_from_copy.call_args[0][1]
assert re.match("^s3://.*", native_path), "path should be s3://*"


+def test_snowflake_to_df_internal(
+    retrieval_job: SnowflakeRetrievalJob, mocker: MockFixture
+):
+    mock_execute = mocker.patch(
+        "feast.infra.offline_stores.snowflake.execute_snowflake_statement"
+    )
+    mock_execute.return_value.fetch_pandas_all.return_value = pd.DataFrame.from_dict(
+        {"feature1": ['["1", "2", "3"]', None, "[]"]}  # For Valid, Null, and Empty
+    )
+
+    feature_view = FeatureView(
+        name="my-feature-view",
+        entities=[],
+        schema=[
+            Field(name="feature1", dtype=Array(String)),
+        ],
+        source=FileSource(path="dummy.path"),  # Dummy value
+    )
+    retrieval_job._feature_views = [feature_view]
+    retrieval_job._to_df_internal()
