Feature View's ttl is misleading #4133
I need to spend more time thinking about this, but I do agree the `ttl` name is misleading.
I'm not a big fan of a rename. I think my preferred approach would be to fix the underlying issue rather than change the parameter name. The main problem here is that, as the issue indicates, `ttl` does not actually expire records in the online store.
Fixing the underlying behavior works for me.
If you look at the documentation, Wikipedia's definition of Time to Live, and TTL semantics in HTTP, they all describe a mechanism that limits the lifespan of data: once the TTL elapses, the data is discarded. So it would be rational for the `ttl` option to actually expire data. Roughly, the options are: (1) keep the behavior as is and only clarify the documentation, (2) make `ttl` actually expire records in the online store, or (3) rename the lookback parameter to describe what it really does.
Regarding option 2, there are situations where features will never be fetched again for a given entity key. Example: imagine that you have features calculated for a customer entity to be used in your product. However, some customers cancel their accounts. You no longer need to run inference or generate features for those customers: materialization will stop updating their features, and inference endpoints will never call the get_online_features function for them again. The old data will then remain in the online store forever unless some cleanup is done.
I'm in favor of a mix between options 2 and 3.
Yeah, agreed. Approach (2) + (3) is the right one. The only thing left to decide is naming conventions... do you all have any opinions here? For example, we could continue to keep the name `ttl` and just fix its behavior, or we could go the route of renaming it to something like `lookback_interval`. And I do like the rename option.
I would recommend we store the event timestamp alongside the feature values in the online store, so the online store can decide at read time whether a value has expired.
@franciscojavierarceo For some reason I assumed an entity timestamp was not part of the online store; my bad. If we already have an entity timestamp in there and there's a `ttl` field in the feature view, that first point is redundant: the online store can decide whether values are expired or not based on those two alone. A sketch of that check follows below.
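For illustration, a minimal sketch of that check, assuming a read path that sees the stored entity timestamp and the feature view's `ttl` (this is not Feast's actual implementation):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def is_expired(event_timestamp: datetime, ttl: Optional[timedelta]) -> bool:
    """Treat a stored value as expired once event_timestamp + ttl has passed."""
    if ttl is None:
        # No ttl configured: values never expire.
        return False
    return event_timestamp + ttl < datetime.now(timezone.utc)

# A value written three days ago with a one-day ttl is filtered out at read time.
written_at = datetime.now(timezone.utc) - timedelta(days=3)
print(is_expired(written_at, timedelta(days=1)))  # True
```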
I believe it probably would be an improvement to permit the offline data to include an explicit "end_event_timestamp" column, which users could leverage to achieve a TTL effect if they want it. I do not think it would be good to continue calculating TTL by adding a duration to the timestamp that is currently used to keep track of processing-time logic such as incremental materializations.

I think the current "TTL" based mechanic for the offline store data (including historical queries and materialization) should be regarded as a quirk or compromise of particular offline storage implementations, and should therefore be eventually moved out of Feast's general API contract in order to make room for more capable implementations. Since it operates on the same timestamp as the incremental materializations do, the concept of processing time (the time that the system handled the event, for example the time that the row was written to the store) is unrecoverably mixed with event time (the time the event actually occurred in the real world). An incremental materialization job is interested in knowing what has changed since the processing time at which the previous materialization job completed. But from a query perspective, we may not care about the order in which the data arrived, or the particular delays that each row was subject to; the querier may only care about the event time.

The existing TTL setting for the offline store is also inconsistent with the TTL configuration/functionality (if supported) of the online store. This can be confusing, and it also means that the online store and offline store disagree in their query responses, which ideally should not happen. Look, for example, at the Redis implementation.

I am hopeful that by adding support for newer offline store technology such as Apache Iceberg (which is capable of efficiently representing non-destructive deletions and responding to time-travel queries for things like incremental materialization), and by carefully implementing Feast's corresponding online and offline stores, we can work toward a more accurate and consistent data representation (one which, for example, supports deletes, not just inserts, updates, and TTLs).

Basically, some data sets might find a use for TTL (to expire data in the future). But we should regard TTL as a special case of a more general idea of facts ceasing to be true.
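To make the distinction concrete, here is a hedged sketch; the `end_event_timestamp` column and the sample data are assumptions for illustration, not existing Feast fields:

```python
import pandas as pd

rows = pd.DataFrame({
    "driver_id": [1, 1, 2],
    "event_timestamp": pd.to_datetime(["2024-01-01", "2024-01-10", "2024-01-05"]),
    # Hypothetical column: when each fact stops being true (NaT = still true).
    "end_event_timestamp": pd.to_datetime(["2024-01-10", None, "2024-01-07"]),
    "trips_today": [5, 7, 3],
})

query_time = pd.Timestamp("2024-01-08")

# Event-time validity: a row is in effect at query_time if it started before
# query_time and has not yet ended. Expiry is an explicit fact about the data.
valid = rows[
    (rows["event_timestamp"] <= query_time)
    & (rows["end_event_timestamp"].isna() | (rows["end_event_timestamp"] > query_time))
]

# Duration-based ttl on the same timestamp column, by contrast, conflates
# "how long ago was this row recorded" with "when did it stop being true".
ttl = pd.Timedelta(days=7)
not_stale = rows[
    (rows["event_timestamp"] > query_time - ttl)
    & (rows["event_timestamp"] <= query_time)
]
```

Note that the two filters disagree: the row recorded on 2024-01-01 is still valid under its explicit end timestamp, but the seven-day ttl drops it.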
Just to push back a little on the processing time vs. event time point. Having said that, I don't think I disagree with the practical implications of what you're saying. For example, it might be a good idea to redesign incremental materializations in a way that takes into account the processing-time travel capabilities of some data sources (Iceberg, Delta) while disallowing their usage in other scenarios (or, if disallowing is too drastic, at least noting the limitations very explicitly).
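As a rough sketch of that separation, assuming toy column names like `created_at` for processing time (none of this is Feast's actual materialization code):

```python
import pandas as pd

# Toy offline table carrying both an event-time and a processing-time column.
offline = pd.DataFrame({
    "driver_id": [1, 2, 1],
    "event_timestamp": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    "created_at": pd.to_datetime(["2024-01-05", "2024-01-02", "2024-01-03"]),
    "trips": [5, 3, 7],
})

# Incremental materialization asks what ARRIVED since the last run (processing
# time), so the late-arriving 2024-01-01 event is still picked up.
last_run = pd.Timestamp("2024-01-04")
newly_arrived = offline[offline["created_at"] > last_run]

# A point-in-time query asks what was TRUE as of some moment (event time),
# regardless of when the rows happened to arrive.
as_of = pd.Timestamp("2024-01-02")
known_facts = offline[offline["event_timestamp"] <= as_of]
```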
Is your feature request related to a problem? Please describe.
The field name `ttl` used in `FeatureView` is misleading. Time to Live (TTL) is a well-known term for a mechanism that automatically expires database records. In Redis, for example, the `EXPIRE` command sets a time to live for a given key. When I used the `FeatureView.ttl` configuration for the first time, I expected to set an expiration time (or TTL) for keys written into online stores; the documentation states something like this here.

However, FeatureView's TTL is not used in that way. It is used to derive the `start_date` for extracting data from the offline store during the materialization step (here). The actual key TTL configuration used to expire entities is `key_ttl_seconds` at the online store level (here). I have had to explain this to different people several times because of the misunderstanding; I think it is also mentioned here and here. The sketch below contrasts the two settings.
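A minimal sketch of the two settings, assuming the current Feast API (values are illustrative):

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Int64

driver = Entity(name="driver", join_keys=["driver_id"])

driver_stats = FeatureView(
    name="driver_stats",
    entities=[driver],
    # Despite the name, this bounds how far back materialization and
    # point-in-time joins look into the offline store; it does NOT expire
    # keys in the online store.
    ttl=timedelta(days=1),
    schema=[Field(name="trips_today", dtype=Int64)],
    source=FileSource(path="driver_stats.parquet", timestamp_field="event_timestamp"),
)
```

Actual online-store expiry lives in feature_store.yaml, e.g. for Redis:

```yaml
online_store:
  type: redis
  connection_string: localhost:6379
  key_ttl_seconds: 86400  # this is what actually expires online records
```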
Describe the solution you'd like
My suggestion is to rename this field to something like `materialization_lookback_interval`, `lookback_interval`, or `lookback_delta`, since that is the actual meaning of this configuration as far as I understand it, and to update the docs accordingly.
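Purely to illustrate the suggestion, a hypothetical sketch reusing the feature view from the earlier snippet; `lookback_delta` is a proposed name, not part of Feast's API today:

```python
# Hypothetical rename: same behavior, clearer name (NOT a real Feast parameter).
driver_stats = FeatureView(
    name="driver_stats",
    entities=[driver],
    lookback_delta=timedelta(days=1),  # bounds offline lookback during materialization
    schema=[Field(name="trips_today", dtype=Int64)],
    source=FileSource(path="driver_stats.parquet", timestamp_field="event_timestamp"),
)
```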