
Feast push (Redshift/DynamoDB) does not work with PushMode.ONLINE_AND_OFFLINE when there are more than 500 columns #3282

Closed
beubeu13220 opened this issue Oct 10, 2022 · 2 comments · Fixed by #3377

Comments

@beubeu13220
Contributor

Expected Behavior

Currently, we have a push source with a Redshift offline store and a DynamoDB online store.
We built our view with more than 500 columns (around 750).

We expect data to be ingested into both DynamoDB and Redshift when we run
fs.push("push_source", df, to=PushMode.ONLINE_AND_OFFLINE)

Current Behavior

The push command raises an error like [ERROR] ValueError: The input dataframe has columns ..
The error originates in the get_table_column_names_and_types method used by write_to_offline_store.
In write_to_offline_store, we check if set(input_columns) != set(source_columns) and raise the above error when they differ.

With more than 500 columns we get a spurious diff, because source_columns comes from the get_table_column_names_and_types result, which is truncated according to the MaxResults parameter.
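The failure mode can be illustrated with plain sets (the column names and the 500-column cap below are illustrative; the actual check lives in Feast's write_to_offline_store):

```python
# Illustration only: why the column check fails when the catalog query is truncated.
# The cap of 500 mirrors the describe_table result being limited by MaxResults.
input_columns = [f"field_{i}" for i in range(750)]   # columns in the pushed dataframe
source_columns = input_columns[:500]                 # truncated describe_table result

# This is the shape of the check in write_to_offline_store:
missing = set(input_columns) - set(source_columns)
if set(input_columns) != set(source_columns):
    # Feast raises ValueError here even though the table really has all 750 columns
    print(f"ValueError would be raised; {len(missing)} columns appear 'missing'")
```

So the table is fine; only the metadata query is incomplete.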

Steps to reproduce

entity = Entity(
    name="entity",
    join_keys=["entity_id"],
    value_type=ValueType.INT64,
)

push_source = PushSource(
    name="push_source",
    batch_source=RedshiftSource(
        table="fs_push_view",
        timestamp_field="datecreation",
        created_timestamp_column="created_at",
    ),
)

besoin_embedding_push_view = FeatureView(
    name="push_view",
    entities=[entity],
    schema=[Field(name=f"field_{dim}", dtype=types.Float64) for dim in range(768)],
    source=push_source,
)

fs.push("push_source", df, to=PushMode.ONLINE_AND_OFFLINE)

Specifications

  • Version: 0.25.0
  • Platform: AWS
  • Subsystem:

Possible Solution

In my mind, we have two solutions:

  • Set a higher MaxResults in the describe_table call
  • Use NextToken to iterate through all pages of results
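A minimal sketch of the NextToken approach. The helper name and the pagination keys are assumptions based on the boto3 redshift-data describe_table response shape (a ColumnList plus an optional NextToken); the real call also needs cluster/credential parameters omitted here:

```python
def get_all_columns(client, database, table):
    """Page through describe_table results until NextToken is exhausted.

    `client` is assumed to expose a boto3-style describe_table(...) that
    returns {"ColumnList": [...], "NextToken": "..."} when more pages remain.
    """
    columns = []
    next_token = None
    while True:
        kwargs = {"Database": database, "Table": table}
        if next_token:
            kwargs["NextToken"] = next_token
        page = client.describe_table(**kwargs)
        columns.extend(page["ColumnList"])
        next_token = page.get("NextToken")
        if not next_token:  # no more pages to fetch
            break
    return columns
```

With this loop, source_columns would contain all 750 columns regardless of the per-page MaxResults limit, so the set comparison in write_to_offline_store would pass.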
@achals
Member

achals commented Oct 12, 2022

Hi @beubeu13220, I think either of the two solutions is a good option. I'd prefer the NextToken approach, simply because it's probably the more stable one.

Would you like to make a PR to add this functionality? We'd be happy to review!

@achals achals added the good first issue Good for newcomers label Oct 12, 2022
@beubeu13220
Contributor Author

beubeu13220 commented Oct 26, 2022

Hi @achals,

Yes, I'll do that as soon as I have time.
For the moment, we use a custom write_to_offline_redshift function.
