
Phase out Provider interface #4057

Open
tokoko opened this issue Mar 30, 2024 · 6 comments
tokoko (Collaborator) commented Mar 30, 2024

The concept of providers has been in Feast since the early days, but the latest versions of Feast have barely any use for it. There are four providers in the project: local, aws, gcp, and azure (contrib), all of which extend the passthrough provider. None of them except aws makes any meaningful changes to the parent functionality; the aws provider adds additional logic that manages the deployment of the feature server to AWS Lambda.

The primary reason providers have become irrelevant is that most of the components they provide are extensible, configurable, and pluggable anyway. For example, there's no point in specifying gcp as a provider when the user still has to configure all the relevant components (online_store, offline_store) separately. In fact, I think it can be more confusing to users: setting the provider to gcp while choosing AWS technologies as the offline and online stores still functions exactly the same as with the aws provider.
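To illustrate, here is a hypothetical feature_store.yaml (the project name and store options are placeholders, not from the issue) that declares provider: gcp yet wires up AWS-backed stores; it behaves the same as if provider were aws:

```yaml
project: my_project          # hypothetical project name
registry: data/registry.db
provider: gcp                # declared provider is gcp...
online_store:
  type: dynamodb             # ...but both stores are AWS services
  region: us-east-1
offline_store:
  type: redshift
  region: us-east-1
```

The provider field adds no information here; the store configuration alone determines the behavior.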

Another intended use case for providers was that some default online/offline store choices would be made for the user, but this doesn't make much sense either: in virtually all scenarios (except perhaps local), the user still has to configure the individual components to provide additional information (e.g. the AWS region, names of the service instances, and so on).

I think we should start phasing out all usage of providers and eventually remove them from the codebase and docs. The only functionality the aws provider provides right now (AWS Lambda deployment) can easily be made available without restricting it to a single provider.

tokoko (Collaborator, Author) commented Jun 13, 2024

@EXPEbdodla Making sure you guys are in the loop as well. Do you think provider removal will pose any problems for you?

EXPEbdodla (Contributor) commented:
Thanks @tokoko for adding us to this thread. Currently we are using our own provider for the following use cases:

  1. Enforcing Spark as the default batch materialization engine and Spark as the offline store.
  2. For streaming ingestion, we delete unused fields from the dataframe if they are not part of the schema. This could be added to the main flow.
  3. We were using the aws provider before, but on feature view deletes it triggers data deletes from the online store (ElastiCache Redis). Deletes during the apply phase are not efficient, so we override that in our own provider. I'm not sure if there are any plans for refining the data deletion experience when feature views are deleted.
  4. We added support for the Go feature server to invoke the Python transformation server for ODFV use cases. When we tested it, the current implementation didn't work at very high scale: it hangs, and all further requests to ODFVs end up in a hung state. For this we may use a provider to redirect all our clients to default to the external transformation server rather than using the gopy binding for ODFV calls.
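The field-filtering behavior in item 2 could be folded into the main ingestion flow with something like the sketch below (the function name and record shapes are hypothetical; Feast's actual streaming path works on dataframes, not plain dicts):

```python
def filter_to_schema(row: dict, schema_fields: set) -> dict:
    """Drop any keys from an incoming record that are not declared in the
    feature view's schema, so unexpected fields never reach the online store."""
    return {k: v for k, v in row.items() if k in schema_fields}

# Example: a record carrying an extra, undeclared field
record = {"driver_id": 1001, "conv_rate": 0.85, "debug_blob": "..."}
schema = {"driver_id", "conv_rate", "avg_daily_trips"}
print(filter_to_schema(record, schema))  # {'driver_id': 1001, 'conv_rate': 0.85}
```

Doing this in the shared ingestion path, rather than in a custom provider, would give every store the same guarantee.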

tokoko (Collaborator, Author) commented Jun 13, 2024

@EXPEbdodla thanks for the detailed reply.

  2. For streaming ingestion, we delete unused fields from the dataframe if they are not part of the schema. This could be added to the main flow.

Agreed, this should probably be the default behavior.

  3. We were using the aws provider before, but on feature view deletes it triggers data deletes from the online store (ElastiCache Redis). Deletes during the apply phase are not efficient, so we override that in our own provider. I'm not sure if there are any plans for refining the data deletion experience when feature views are deleted.

I think that was already made optional in #4189.

  4. We added support for the Go feature server to invoke the Python transformation server for ODFV use cases. When we tested it, the current implementation didn't work at very high scale: it hangs, and all further requests to ODFVs end up in a hung state. For this we may use a provider to redirect all our clients to default to the external transformation server rather than using the gopy binding for ODFV calls.

The Go feature server was effectively (partially) removed upstream mostly for similar reasons; Go/Python interop with Arrow seems too difficult to nail down. If that's your experience as well, we should probably consider bringing it back with the transformation server as the default ODFV backend, similar to Java (although the transformation server has its own quirks that need to be worked on). Can you also take a look at #4266? It would be really useful to know more about your experience with the Go server/SDK.

To sum up, I think none of the use cases except the first really necessitates a pluggable provider... and the first one doesn't feel important enough to warrant keeping providers around either. Does that sound fair?

EXPEbdodla (Contributor) commented:
@tokoko One useful capability (though it's not currently supported) would be validation of Feast objects (Entity, DataSource, FeatureView) during the apply() phase. In our case, we want to enforce certain tags and support only certain data sources. Not sure if there is a way to do that currently.

tokoko (Collaborator, Author) commented Jun 20, 2024

It's currently not supported, but you should be able to do that with security rules in the future (#4198). It wouldn't be an explicit project-wide restriction, but you will be able to grant permissions based on object types, tags, source types, etc.

robhowley (Contributor) commented:
@tokoko One useful capability (though it's not currently supported) would be validation of Feast objects (Entity, DataSource, FeatureView) during the apply() phase. In our case, we want to enforce certain tags and support only certain data sources. Not sure if there is a way to do that currently.

@EXPEbdodla we do this, but in CI before we run apply: we build the registry and run a registry linting step that inspects all the defined objects for naming conventions, tagging, etc. It makes for a nice separation of concerns in the pipeline.
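A rough sketch of that kind of lint step (the required-tag names, the allow-list, and the example objects are all hypothetical; in a real pipeline the objects would come from the parsed registry rather than hard-coded tuples):

```python
REQUIRED_TAGS = {"team", "owner"}                          # hypothetical org-wide tag policy
ALLOWED_SOURCE_TYPES = {"BigQuerySource", "SparkSource"}   # placeholder allow-list

def lint_object(name: str, tags: dict, source_type: str = None) -> list:
    """Return human-readable policy violations for one registry object."""
    problems = [f"{name}: missing required tag '{t}'"
                for t in sorted(REQUIRED_TAGS - tags.keys())]
    if source_type is not None and source_type not in ALLOWED_SOURCE_TYPES:
        problems.append(f"{name}: disallowed source type '{source_type}'")
    return problems

# Example objects as (name, tags, source_type), standing in for parsed registry entries
objects = [
    ("driver_stats", {"team": "ml", "owner": "alice"}, "SparkSource"),
    ("orders", {"team": "ml"}, "FileSource"),
]
violations = [p for name, tags, src in objects for p in lint_object(name, tags, src)]
for v in violations:
    print(v)  # in a real CI step you would exit non-zero when this list is non-empty
```

Because the check runs before apply, no provider hook is needed; bad objects never reach the registry.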
