
Phase out Provider interface #4057

Open
tokoko opened this issue Mar 30, 2024 · 6 comments
tokoko (Collaborator) commented Mar 30, 2024

The concept of providers has been in Feast since the early days, but the latest versions of Feast have barely any use for it. There are four providers in the project: local, aws, gcp, and azure (contrib), all of which extend the passthrough provider. None of them except aws makes any meaningful changes to the parent functionality; the aws provider adds additional logic that manages the deployment of the feature server to AWS Lambda.

The primary reason providers have become irrelevant is that most of the components they provide are extensible, configurable, and pluggable anyway. For example, there's no point in specifying gcp as a provider when the user still has to configure all the relevant components (online_store, offline_store) separately. In fact, I think it can be more confusing to users: setting the provider to gcp while choosing AWS technologies as the offline and online stores still functions exactly the same as with the aws provider.
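To illustrate, here is a hypothetical feature_store.yaml (the project name and store options are placeholders, not from the issue) that declares provider: gcp yet wires up AWS-backed stores; it behaves the same as if provider were aws:

```yaml
project: my_project          # hypothetical project name
registry: data/registry.db
provider: gcp                # declared provider is gcp...
online_store:
  type: dynamodb             # ...but both stores are AWS services
  region: us-east-1
offline_store:
  type: redshift
  region: us-east-1
```

The provider field adds no information here; the store configuration alone determines the behavior.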

Another intended use case for providers was that some default online/offline store choices would be made for the user, but this doesn't make much sense either: in virtually all scenarios (except perhaps local), the user still has to configure the individual components to provide additional information (e.g. the AWS region, names of the service instances, and so on).

I think we should start phasing out all usage of providers and eventually remove them from the codebase and docs. The only functionality the aws provider provides right now (AWS Lambda deployment) can easily be made available without restricting it to a single provider.

tokoko (Collaborator, Author) commented Jun 13, 2024

@EXPEbdodla Making sure you guys are in the loop as well. Do you think provider removal will pose any problems for you?

EXPEbdodla (Contributor) commented:
Thanks @tokoko for adding us to this thread. Currently we are using our own provider for the following use cases:

  1. Enforcing Spark as the default batch materialization engine and Spark as the offline store.
  2. For streaming ingestion, we delete unused fields from the dataframe if they are not part of the schema. This could be added to the main flow.
  3. We were using the aws provider before, but on feature view deletes it triggers data deletes from the online store (ElastiCache Redis). Deletes during the apply phase are not efficient, so we override that in our own provider. I'm not sure if there are any plans for refining the data deletion experience when feature views are deleted.
  4. We added support for the Go feature server to invoke the Python transformation server for ODFV use cases. When we tested it, the current implementation didn't work at very high scale: it hangs, and all further requests to ODFVs end up in a hung state. For this we may use a provider to redirect all our clients to default to the external transformation server rather than using the gopy binding for ODFV calls.
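The field-filtering behavior in item 2 could be folded into the main ingestion flow with something like the sketch below (the function name and record shapes are hypothetical; Feast's actual streaming path works on dataframes, not plain dicts):

```python
def filter_to_schema(row: dict, schema_fields: set) -> dict:
    """Drop any keys from an incoming record that are not declared in the
    feature view's schema, so unexpected fields never reach the online store."""
    return {k: v for k, v in row.items() if k in schema_fields}

# Example: a record carrying an extra, undeclared field
record = {"driver_id": 1001, "conv_rate": 0.85, "debug_blob": "..."}
schema = {"driver_id", "conv_rate", "avg_daily_trips"}
print(filter_to_schema(record, schema))  # {'driver_id': 1001, 'conv_rate': 0.85}
```

Doing this in the shared ingestion path, rather than in a custom provider, would give every store the same guarantee.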

tokoko (Collaborator, Author) commented Jun 13, 2024

@EXPEbdodla thanks for the detailed reply.

  2. For streaming ingestion, we delete unused fields from the dataframe if they are not part of the schema. This could be added to the main flow.

Agreed, this should probably be the default behavior.

  3. We were using the aws provider before, but on feature view deletes it triggers data deletes from the online store (ElastiCache Redis). Deletes during the apply phase are not efficient, so we override that in our own provider. I'm not sure if there are any plans for refining the data deletion experience when feature views are deleted.

I think that was already made optional in #4189.

  4. We added support for the Go feature server to invoke the Python transformation server for ODFV use cases. When we tested it, the current implementation didn't work at very high scale: it hangs, and all further requests to ODFVs end up in a hung state. For this we may use a provider to redirect all our clients to default to the external transformation server rather than using the gopy binding for ODFV calls.

The Go feature server was effectively (partially) removed upstream mostly for similar reasons; Go/Python interop with Arrow seems too difficult to nail down. If that's your experience as well, we should probably consider bringing it back with the transformation server as the default ODFV backend, similar to Java (although the transformation server has its own quirks that need to be worked on). Can you also take a look at #4266? It would be really useful to know more about your experience with the Go server/SDK.

To sum up, I think none of the use cases except the first really necessitates a pluggable provider... and the first one doesn't feel important enough to warrant keeping providers around either. Does that sound fair?

EXPEbdodla (Contributor) commented:
@tokoko One useful capability (though it's not currently supported) would be validation of Feast objects (Entity, DataSource, FeatureView) during the apply() phase. In our case, we want to enforce certain tags and support only certain data sources. Not sure if there is a way to do that currently.

tokoko (Collaborator, Author) commented Jun 20, 2024

It's currently not supported, but you should be able to do that with security rules in the future (#4198). It wouldn't be an explicit project-wide restriction, but you will be able to grant permissions based on object types, tags, source types, etc.

robhowley (Contributor) commented:
@tokoko One useful capability (though it's not currently supported) would be validation of Feast objects (Entity, DataSource, FeatureView) during the apply() phase. In our case, we want to enforce certain tags and support only certain data sources. Not sure if there is a way to do that currently.

@EXPEbdodla we do this, but in CI before we run apply: we build the registry and run a registry linting step that inspects all the defined objects for naming conventions, tagging, etc. It makes for a nice separation of concerns in the pipeline.
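A rough sketch of that kind of lint step (the required-tag names, the allow-list, and the example objects are all hypothetical; in a real pipeline the objects would come from the parsed registry rather than hard-coded tuples):

```python
REQUIRED_TAGS = {"team", "owner"}                          # hypothetical org-wide tag policy
ALLOWED_SOURCE_TYPES = {"BigQuerySource", "SparkSource"}   # placeholder allow-list

def lint_object(name: str, tags: dict, source_type: str = None) -> list:
    """Return human-readable policy violations for one registry object."""
    problems = [f"{name}: missing required tag '{t}'"
                for t in sorted(REQUIRED_TAGS - tags.keys())]
    if source_type is not None and source_type not in ALLOWED_SOURCE_TYPES:
        problems.append(f"{name}: disallowed source type '{source_type}'")
    return problems

# Example objects as (name, tags, source_type), standing in for parsed registry entries
objects = [
    ("driver_stats", {"team": "ml", "owner": "alice"}, "SparkSource"),
    ("orders", {"team": "ml"}, "FileSource"),
]
violations = [p for name, tags, src in objects for p in lint_object(name, tags, src)]
for v in violations:
    print(v)  # in a real CI step you would exit non-zero when this list is non-empty
```

Because the check runs before apply, no provider hook is needed; bad objects never reach the registry.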
