The function FileOfflineStore._merge consumes unreasonable RAM when working with feature views without entities #3863

Open
haggaishachar opened this issue Dec 8, 2023 · 0 comments


Expected Behavior

Assuming the following file:

import pandas as pd
import numpy as np

dates = pd.date_range(pd.to_datetime("1/1/2000", utc=True),
                      pd.to_datetime("1/1/2023", utc=True),
                      freq='1H')

feature = np.random.rand(len(dates))

df = pd.DataFrame(data={'event_timestamp': dates, 'feature': feature})
df.to_parquet('dataset.parquet')

With the following feature store definition:

from feast import FileSource, FeatureView, Field
from datetime import timedelta
from feast.types import Float32

source = FileSource(name="source",
                    path="dataset.parquet",
                    timestamp_field="event_timestamp")


fv = FeatureView(
    name="fv",
    schema=[
        Field(name="feature", dtype=Float32)
    ],
    source=source,
)

The snippet below should return the latest 1000 records:

from feast import FeatureStore

# fetch the last 1000 records
entity_df = df[-1000:]

store = FeatureStore('.')

hist = store.get_historical_features(
    entity_df = entity_df,
    features = ['fv:feature']
)

# this won't work, as it consumes huge RAM in the FileOfflineStore._merge function
hist.to_df()

Current Behavior

Calling hist.to_df() crashes: FileOfflineStore._merge builds a huge data frame and exhausts the available RAM.
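To illustrate why the intermediate frame gets so large, here is a minimal pandas-only sketch. It assumes that, with no join keys, the merge behaves like a join on a constant placeholder column, which pairs every entity row with every feature row. The column name `DUMMY_KEY` and the miniature data are hypothetical and are not taken from the Feast source.

```python
import pandas as pd

hour = pd.Timedelta(hours=1)

# Miniature stand-in for dataset.parquet: 48 hourly feature rows.
dates = pd.date_range("2000-01-01", periods=48, freq=hour, tz="UTC")
features = pd.DataFrame({"event_timestamp": dates, "feature": range(48)})

# Entity frame: the timestamps of the last 5 rows.
entity_df = features[["event_timestamp"]].tail(5)

# Assumption: without join keys, both sides are joined on a constant
# placeholder column (hypothetical name), i.e. effectively a cross join.
DUMMY_KEY = "__dummy"
left = entity_df.assign(**{DUMMY_KEY: 0})
right = features.rename(columns={"event_timestamp": "feature_timestamp"})
right = right.assign(**{DUMMY_KEY: 0})
merged = left.merge(right, on=DUMMY_KEY, how="left")

# Every entity row is paired with every feature row: 5 * 48 rows.
print(len(merged))

# The same arithmetic at the scale of the report.
n_feature_rows = len(
    pd.date_range("2000-01-01", "2023-01-01", freq=hour, tz="UTC")
)
n_entity_rows = 1000
full_rows = n_feature_rows * n_entity_rows

# Rough lower bound: three 8-byte columns per row, ignoring all overhead.
approx_gib = full_rows * 3 * 8 / 2**30
print(n_feature_rows, full_rows, round(approx_gib, 1))
```

Under that assumption, the 1000-row entity frame from the report would expand to roughly 200 million rows before any timestamp filtering, which is consistent with the memory exhaustion seen on Colab.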

Steps to reproduce

https://colab.research.google.com/drive/1yTSQPK6H2zMq3HEswLkGUShlFIuMniS8?usp=sharing

Specifications

  • Version: v0.34.1
  • Platform: Ubuntu / Colab

Possible Solution

The "if not join_keys" logic in FileOfflineStore._merge should be reconsidered.
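One possible direction, sketched here with plain pandas rather than Feast's internals, is an as-of join: for each entity row, pick the most recent feature row at or before its timestamp. This is only an illustration of the idea, not the maintainers' fix. The data below is hypothetical, and whether `pd.merge_asof` fits the dask-based code path in `FileOfflineStore` is an open question.

```python
import pandas as pd

hour = pd.Timedelta(hours=1)

# Miniature feature table: 48 hourly rows.
dates = pd.date_range("2000-01-01", periods=48, freq=hour, tz="UTC")
features = pd.DataFrame(
    {"event_timestamp": dates, "feature": [float(i) for i in range(48)]}
)

# Entity rows offset by 30 minutes, so no timestamp matches exactly.
entity_df = pd.DataFrame(
    {"event_timestamp": dates[-5:] + pd.Timedelta(minutes=30)}
)

# As-of join: both inputs must be sorted on their time columns.
# direction="backward" selects the latest feature row that is not newer
# than the entity timestamp.
joined = pd.merge_asof(
    entity_df.sort_values("event_timestamp"),
    features.sort_values("event_timestamp").rename(
        columns={"event_timestamp": "feature_timestamp"}
    ),
    left_on="event_timestamp",
    right_on="feature_timestamp",
    direction="backward",
)

# One output row per entity row, with no intermediate cross product.
print(joined)
```

The result has as many rows as the entity frame, so memory grows with the sum of the two inputs rather than their product. The `tolerance` argument of `pd.merge_asof` could be used to approximate a TTL window if one is needed.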
