Feature View's ttl is misleading #4133

Open
breno-costa opened this issue Apr 23, 2024 · 11 comments
Labels
kind/feature New feature or request

Comments

@breno-costa
Contributor

breno-costa commented Apr 23, 2024

Is your feature request related to a problem? Please describe.
The field name ttl used in FeatureView is misleading.
Time to Live (TTL) is a well-known term for a mechanism that automatically expires database records. In Redis, for example, the EXPIRE command sets a time to live for a given key.

When I used the FeatureView.ttl configuration for the first time, I expected it to set an expiration time (or TTL) for keys written into online stores. The documentation states something to that effect here.

The amount of time this group of features lives. A ttl of 0 indicates that this group of features lives forever.

However, FeatureView's ttl is not used in that way. It's used to define the start_date for extracting data from the offline store during the materialization step here. The actual key TTL configuration used to expire entities is key_ttl_seconds at the online store level here.

I've had to explain this to several different people because of this misunderstanding. I think it's also touched on here and here.
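
To make the distinction concrete, here's a minimal sketch (entity and feature names are illustrative; the online store snippet assumes the Redis store, which is where key_ttl_seconds lives):

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32

customer = Entity(name="customer", join_keys=["customer_id"])

source = FileSource(
    path="data/customer_features.parquet",
    timestamp_field="event_timestamp",
)

# This ttl only bounds how far back Feast looks from a given timestamp when
# building historical datasets / materializing; it does NOT expire online keys.
customer_fv = FeatureView(
    name="customer_features",
    entities=[customer],
    ttl=timedelta(days=2),
    schema=[Field(name="total_purchases", dtype=Float32)],
    source=source,
)

# Online expiry is configured separately, per online store, e.g. for Redis in
# feature_store.yaml:
#
#   online_store:
#     type: redis
#     key_ttl_seconds: 172800
```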

Describe the solution you'd like
My suggestion is to rename this field to something like materialization_lookback_interval, lookback_interval, or lookback_delta, since that is the actual meaning of this configuration as far as I understand it, and to update the docs accordingly.

@breno-costa breno-costa added the kind/feature New feature or request label Apr 23, 2024
@franciscojavierarceo
Member

I need to spend more time thinking about this, but I do agree the ttl at the FeatureView level is misleading; I had this exact experience in my last role and it caused me some headaches. I think renaming it to follow industry conventions would be good.

@tokoko
Collaborator

tokoko commented Apr 23, 2024

I'm not a big fan of a lookback name, tbh. I think it only makes sense if the online store is the only component we're focusing on. If we set aside online stores for a minute, the way the offline store handles rows during a get_historical_features call sounds compatible with the meaning of ttl to me. When a client asks for a feature value at a specific point in time, only the values that haven't expired at that time are considered. The fact that it's doing a "lookback" to do that is an implementation detail.

I think my preferred approach would be to fix the underlying issue rather than change the parameter name. The main problem here is that, as the issue points out, ttl is not handled the same way in the offline and online flows. It's ignored in online stores 😄. I understand there are good reasons why it might be hard to actually delete expired rows from online storage during materialization, but what we can do is discard expired values afterwards in the online store logic itself. Once we have that, ttl wouldn't be so misleading anymore.
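
Roughly, the semantics I have in mind could look like this (an illustrative sketch only, not current Feast code):

```python
from datetime import datetime, timedelta, timezone

# Offline / point-in-time view: for a requested timestamp, only rows whose
# event_timestamp falls inside [requested_ts - ttl, requested_ts] are eligible.
def eligible(event_ts: datetime, requested_ts: datetime, ttl: timedelta) -> bool:
    return requested_ts - ttl <= event_ts <= requested_ts

# Online view with the same semantics: values that have outlived ttl relative
# to "now" are discarded after reading, as if they were never stored.
def filter_online(rows, ttl: timedelta):
    """rows: iterable of (event_timestamp, value) pairs read from the online store."""
    now = datetime.now(timezone.utc)
    return [(ts, value) for ts, value in rows if eligible(ts, now, ttl)]
```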

@franciscojavierarceo
Member

Fixing ttl to behave as expected would be ideal. I haven't used the offline store as much but if it's using the ttl as expected then I agree with your approach.

@franciscojavierarceo
Member

franciscojavierarceo commented Apr 23, 2024

If you look at the documentation it says:

Feature views consist of:
...
(optional) a TTL, which limits how far back Feast will look when generating historical datasets

According to Wikipedia:

Time to live (TTL) or hop limit is a mechanism which limits the lifespan or lifetime of data in a computer or network...The Time to Live is an indication of an upper bound on the lifetime of an internet datagram.

And in HTTP:

Time to live may also be expressed as the date and time on which a record expires. The Expires: header in HTTP responses, the Cache-Control: max-age header field in both requests and responses and the expires field in HTTP cookies express time-to-live in this way.

So it would be rational for the ttl for an online Feature View to behave as an "upper bound on the lifetime of data in a database."

Options

  1. We could change ttl to offline_store_ttl or offline_ttl to make this name more intuitive and explicit
  2. We could add another parameter called online_store_ttl or online_ttl and replicate the HTTP behavior (sketched below) by:
    • returning None or Expired along with some metadata when calling get_online_features
    • dropping the record in the database after the read is received when calling get_online_features
  3. We could create a command to expire offline data or online data in batch and call it something like feast expire feature_view that users could run on some schedule
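
For the second option, a rough illustration of what a caller might see (field names are hypothetical, not existing Feast API):

```python
# Hypothetical shape of a get_online_features result under option 2: expired
# values come back as None plus status metadata instead of stale data.
response = {
    "customer_id": [1001, 1002],
    "total_purchases": [7.0, None],
    "total_purchases__status": ["PRESENT", "EXPIRED"],  # hypothetical metadata
}
# Per the second bullet, the store could then delete the expired row for
# customer 1002 once the read has been served.
```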

Thoughts @tokoko @HaoXuAI?

@breno-costa
Contributor Author

breno-costa commented Apr 23, 2024

Regarding option 2, there are situations where features will never be fetched again for a given entity key.

Example: imagine you have features calculated for a customer entity to be used in your product. However, some customers cancel their accounts. You no longer need to run inference or generate features for those customers.

Materialization will no longer update their features, and inference endpoints will no longer call get_online_features for those customers, so the old data will remain in the online store forever unless some cleanup is done.

@tokoko
Collaborator

tokoko commented Apr 27, 2024

I'm in favor of a mix between options 2 and 3:

  • We store ttl information (expire_date for example) in the online store for every entity in the feature view during materialization.
  • The online store reader treats expired feature values as if they're not there, discarding them after reading from the database.
  • Introduce a feast cleanup command that will physically remove all expired data from an online store (this is only relevant for the online store; in the offline store nothing is ever really expired, since the user might always want to query past information). This command should be for housekeeping only and not affect online store behavior at all; in other words, it will only remove the feature values that the online store read method would discard anyway. (A sketch follows.)
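
Something like this for the housekeeping pass (pseudocode-level sketch; online_rows and delete_row stand in for whatever scan/delete primitives a given online store exposes):

```python
from datetime import datetime, timezone

# Deletes only rows the online read path already treats as expired, so running
# (or not running) this never changes serving results.
def cleanup(online_rows, delete_row):
    now = datetime.now(timezone.utc)
    for row in online_rows:
        if row.expire_date is not None and row.expire_date <= now:
            delete_row(row.key)
```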

@franciscojavierarceo
Member

franciscojavierarceo commented Apr 27, 2024

Yeah, agreed. Approach (2) + (3) is the right one.

The only thing left to decide is naming conventions... do you all have any opinions here?

For example, we could keep the name ttl and just make the behavior more obvious (and document it) within each respective function call (i.e., making ttl behave as expected for get_online_features and get_offline_features).

Or we could go the route of online_ttl and renaming the current ttl to offline_ttl.

And I do like feast cleanup but that may also make the user think more is happening than dropping records. Not sure.

@franciscojavierarceo
Member

franciscojavierarceo commented Apr 29, 2024

  • We store ttl information (expire_date for example) in the online store for every entity in the feature view during materialization.

I would recommend we store the expire_date in the feature view as metadata. Changes to the expiration will be more straightforward that way.

@tokoko
Collaborator

tokoko commented Apr 29, 2024

@franciscojavierarceo For some reason I assumed the entity timestamp was not part of the online store, my bad. If we already have an entity timestamp in there and there's a ttl field in the feature view, that first point is redundant; the online store can decide whether values are expired or not based on those two.

@rehevkor5

I believe it probably would be an improvement to permit the offline data to include an explicit "end_event_timestamp" column, which users could leverage to achieve a TTL effect if they want it.
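
For example (the column name and rows below are purely illustrative of the idea, not an existing Feast convention):

```python
import pandas as pd

# Offline rows carrying an explicit validity-end column. A point-in-time join
# at time T would pick the row where event_timestamp <= T and either
# end_event_timestamp is null or T < end_event_timestamp.
rows = pd.DataFrame(
    {
        "customer_id": [1001, 1001],
        "event_timestamp": pd.to_datetime(["2024-01-01", "2024-02-01"]),
        "end_event_timestamp": pd.to_datetime(["2024-02-01", None]),  # NaT = still valid
        "total_purchases": [3, 7],
    }
)
```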

I do not think it would be good to continue calculating TTL by adding a duration to the timestamp that is currently used to keep track of processing-time logic such as incremental materializations. I think the current "TTL"-based mechanic for the offline store data (including historical queries and materialization) should be regarded as a quirk or compromise of particular offline storage implementations, and should therefore eventually be moved out of Feast's general API contract in order to make room for more capable implementations. Since it operates on the same timestamp as the incremental materializations do, the concept of processing time (the time that the system handled the event - for example, the time that the row was written to the store) is irrecoverably conflated with event time (the time the event actually occurred in the real world).

For example, an incremental materialization job is interested in knowing what has changed since the last processing time that the previous materialization job completed. But, from a query perspective, we may not care about the order in which the data arrived, or the particular delays that each row was subject to. The querier may only care about the event time.

The existing TTL setting for the offline store is also inconsistent with the TTL configuration/functionality (if supported) of the online store. This can be confusing, and it also means that the online store and offline store disagree in their query responses, which ideally should not happen. Look, for example, at the Redis implementation.

I am hopeful that by adding support for newer offline store technology such as Apache Iceberg (which is capable of efficiently representing non-destructive deletions and responding to time-travel queries for things like incremental materialization) and by carefully implementing Feast's corresponding online & offline stores, we can work toward a more accurate and consistent data representation (one which, for example, supports deletes not just inserts, updates, and TTLs). Basically, some data sets might find a use for TTL (to expire data in the future). But, we should regard TTL as a special case of a more general idea of facts ceasing to be true.

@tokoko
Collaborator

tokoko commented Jul 17, 2024

Just to push back a little on the event_timestamp interpretation: the way I think about it, it is very much an event time rather than a processing time, just one that disallows late-arriving data and deletions, at least as far as incremental materialization is concerned. I might be wrong, but I think it was originally designed with streaming feature generation in mind, a case in which using processing time as a proxy for event time is probably almost always acceptable, but I still wouldn't go as far as calling it a processing-time field. A user (having full control of an offline store table) can repopulate past events and make deletions; there's even a feature in feast where an offline store will only look at the latest row (in terms of load_timestamp) if multiple rows for the same entity with exactly the same event_timestamp are present in the offline dataset. Just the fact that we have both load_timestamp and event_timestamp is probably enough to know what the intention behind the fields is (at least). I fully admit that most of this is blatantly ignored by incremental materialization, just like ttl is ignored by the online stores, but I don't think it's a good idea to reinterpret event_timestamp because of those behaviors.
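
As a sketch of that tie-breaking behavior (pandas, with column names following the wording above rather than any particular source's declared fields):

```python
import pandas as pd

# When several rows share an entity key and the exact same event_timestamp,
# keep only the one with the latest load_timestamp.
def latest_rows(df: pd.DataFrame) -> pd.DataFrame:
    return df.sort_values("load_timestamp").drop_duplicates(
        subset=["customer_id", "event_timestamp"], keep="last"
    )
```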

Having said all that, I don't think I disagree with the practical implications of what you're saying. For example, it might be a good idea to redesign incremental materializations in a way that takes into account the processing-time travel capabilities of some data sources (iceberg, delta) while disallowing their usage in other scenarios (or very explicitly noting the limitations if disallowing is too drastic).
