Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implement gRPC server to ingest streaming features #3687

Merged
merged 15 commits into from Sep 7, 2023

Conversation

mehmettokgoz
Copy link
Contributor

What this PR does / why we need it:

Currently only way to ingest streaming features to Feast is using the Spark/Kafka processor. This processor has several limitations. It retrieves data in batches and forces to run all transformations in Python runtime.

The gRPC server that listens for the streaming features allows users to apply transformation on more powerful streaming engines and send it to Feast then. This feature introduces new streaming sources for Feast, in addition to Kafka and Kinesis.

Our main purpose for the gRPC ingestion service is to integrate Hazelcast streaming engine with Feast. Hazelcast provides various connectors, and supports to process those sources in real-time. This server makes possible to sink transformed data to Feast.

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mehmettokgoz
To complete the pull request process, please assign kevjumba after the PR has been reviewed.
You can assign the PR to them by writing /assign @kevjumba in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mehmettokgoz mehmettokgoz changed the title feat: gRPC server implementation for ingesting streaming features feat: Implement gRPC server to ingest streaming features Jul 17, 2023
@mehmettokgoz mehmettokgoz requested a review from adchia July 27, 2023 20:18
@mehmettokgoz
Copy link
Contributor Author

Thanks for review @adchia. Waiting for your answers.

sdk/python/feast/infra/contrib/grpc_server.py Outdated Show resolved Hide resolved
sdk/python/feast/infra/contrib/grpc_server.py Show resolved Hide resolved
sdk/python/feast/infra/contrib/grpc_server.py Outdated Show resolved Hide resolved
sdk/python/feast/infra/contrib/grpc_server.py Outdated Show resolved Hide resolved
sdk/python/feast/infra/contrib/grpc_server.py Outdated Show resolved Hide resolved
@mehmettokgoz
Copy link
Contributor Author

@adchia addressed review comments. ✅

Changes:

  • Do not force only one feature view. Proto object contains a feature_view_name field.
  • Push API is supported.
  • Health check service is added to server.
  • grpcio packages are upgraded to 1.56.2
  • CLI is refactored.

@crispin-ki
Copy link
Contributor

I saw a comment about tests failing in this PR in the community notes.

Failing python unit tests should be fixed by #3734. Once that PR is merged you can rebase this branch on top of master and the failing tests should pass.

@mehmettokgoz
Copy link
Contributor Author

Thanks @crispin-ki

Do you have any idea why linter might be failing? The errors are not related to the changes in this PR and I see it's green for your PR.

@crispin-ki
Copy link
Contributor

crispin-ki commented Aug 17, 2023

Thanks @crispin-ki

Do you have any idea why linter might be failing? The errors are not related to the changes in this PR and I see it's green for your PR.

Yeah it's a bit weird that these are failing now, seeing as you didn't touch them. I think you have a few options though:

  • logs for failing workflow (pasted at bottom of this comment too) are all E721s, so you could add E721 here so those errors are now ignored altogether
  • alternatively you could go to each line that is failing and add # noqa: E721 at the end of line similar to here but with E721 instead, so flake doesn't check that line for E721
  • alternatively you could go to each line that is failing and change them so flake is happy (in most cases change == to is)

In terms of which is preferred, I think all have pros and cons. I think just go with whichever you feel is most appropriate (or if unsure, might be one for maintainers to decide - they might in fact have a better solution )

Logs for the failing workflow:

cd /home/runner/work/feast/feast/sdk/python; python -m flake8 feast/ tests/
feast/diff/registry_diff.py:[13](https://github.com/feast-dev/feast/actions/runs/5887487203/job/15966956374?pr=3687#step:9:14)6:16: E721 do not compare types, for exact checks use `is` / `is not`, for instance checks use `isinstance()`
feast/feature_store.py:1109:12: E721 do not compare types, for exact checks use `is` / `is not`, for instance checks use `isinstance()`
feast/field.py:65:12: E721 do not compare types, for exact checks use `is` / `is not`, for instance checks use `isinstance()`
feast/infra/offline_stores/bigquery.py:728:12: E721 do not compare types, for exact checks use `is` / `is not`, for instance checks use `isinstance()`
feast/type_map.py:748:8: E721 do not compare types, for exact checks use `is` / `is not`, for instance checks use `isinstance()`
tests/unit/local_feast_tests/test_local_feature_store.py:472:20: E721 do not compare types, for exact checks use `is` / `is not`, for instance checks use `isinstance()`
tests/unit/test_feature_views.py:183:[16](https://github.com/feast-dev/feast/actions/runs/5887487203/job/15966956374?pr=3687#step:9:17): E721 do not compare types, for exact checks use `is` / `is not`, for instance checks use `isinstance()`

@mehmettokgoz
Copy link
Contributor Author

@crispin-ki

I found that it is related to new version of flake8 (which upgrades pycodestyle). Now pycodetype check for EE721 (https://github.com/PyCQA/pycodestyle/blob/main/CHANGES.txt#L12). I set flake version to >=6.0.0,<6.1.0.

mehmettokgoz and others added 12 commits September 5, 2023 01:09
Signed-off-by: mehmettokgoz <mehmet.tokgoz@hazelcast.com>
Signed-off-by: Danny C <d.chiao@gmail.com>
Signed-off-by: mehmettokgoz <mehmet.tokgoz@hazelcast.com>
Signed-off-by: Danny C <d.chiao@gmail.com>
Signed-off-by: mehmettokgoz <mehmet.tokgoz@hazelcast.com>
Signed-off-by: Danny C <d.chiao@gmail.com>
Signed-off-by: mehmettokgoz <mehmet.tokgoz@hazelcast.com>
Signed-off-by: Danny C <d.chiao@gmail.com>
Signed-off-by: mehmettokgoz <mehmet.tokgoz@hazelcast.com>
Signed-off-by: Danny C <d.chiao@gmail.com>
Signed-off-by: mehmettokgoz <mehmet.tokgoz@hazelcast.com>
Signed-off-by: Danny C <d.chiao@gmail.com>
Signed-off-by: mehmettokgoz <mehmet.tokgoz@hazelcast.com>
Signed-off-by: Danny C <d.chiao@gmail.com>
Signed-off-by: mehmettokgoz <mehmet.tokgoz@hazelcast.com>
Signed-off-by: Danny C <d.chiao@gmail.com>
Signed-off-by: mehmettokgoz <mehmet.tokgoz@hazelcast.com>
Signed-off-by: Danny C <d.chiao@gmail.com>
Signed-off-by: mehmettokgoz <mehmet.tokgoz@hazelcast.com>
Signed-off-by: Danny C <d.chiao@gmail.com>
Signed-off-by: mehmettokgoz <mehmet.tokgoz@hazelcast.com>
Signed-off-by: Danny C <d.chiao@gmail.com>
Signed-off-by: Danny C <d.chiao@gmail.com>
@adchia
Copy link
Collaborator

adchia commented Sep 5, 2023

/lgtm

@adchia
Copy link
Collaborator

adchia commented Sep 5, 2023

/lgtm

@adchia
Copy link
Collaborator

adchia commented Sep 5, 2023

/lgtm /approve

Signed-off-by: Danny C <d.chiao@gmail.com>
Signed-off-by: Danny C <d.chiao@gmail.com>
Signed-off-by: Danny C <d.chiao@gmail.com>
@adchia adchia merged commit a3fcd1f into feast-dev:master Sep 7, 2023
16 checks passed
adchia pushed a commit that referenced this pull request Sep 7, 2023
# [0.34.0](v0.33.0...v0.34.0) (2023-09-07)

### Bug Fixes

* Add NUMERIC to bq_to_feast type map ([#3719](#3719)) ([6474b4b](6474b4b))
* Fix python unit tests ([#3734](#3734)) ([e81684d](e81684d))
* Handle unknown postgres source types gracefully ([#3634](#3634)) ([d7041f4](d7041f4))
* Pin protobuf version to avoid seg fault on some machines ([028cc20](028cc20))
* Remove unwanted excessive splitting of gcs path, so expected gcs parquet paths are returned from BigQueryRetrievalJob.to_remote_storage() ([#3730](#3730)) ([f2c5988](f2c5988))
* Run store.plan() only when need it. ([#3708](#3708)) ([7bc7c47](7bc7c47))
* Saved datasets no longer break CLI registry-dump command ([#3717](#3717)) ([f28ccc2](f28ccc2))
* Update py3.8 ci requirements for cython 3.0 release ([#3735](#3735)) ([1695c13](1695c13))

### Features

* Enhance customization of Trino connections when using Trino-based Offline Stores ([#3699](#3699)) ([ed7535e](ed7535e))
* Implement gRPC server to ingest streaming features ([#3687](#3687)) ([a3fcd1f](a3fcd1f))
james-crabtree-sp pushed a commit to sailpoint/feast that referenced this pull request Sep 14, 2023
)

* Implemented gRPC server for ingesting streaming features.

Signed-off-by: mehmettokgoz <mehmet.tokgoz@hazelcast.com>
Signed-off-by: Danny C <d.chiao@gmail.com>
james-crabtree-sp pushed a commit to sailpoint/feast that referenced this pull request Sep 14, 2023
# [0.34.0](feast-dev/feast@v0.33.0...v0.34.0) (2023-09-07)

### Bug Fixes

* Add NUMERIC to bq_to_feast type map ([feast-dev#3719](feast-dev#3719)) ([6474b4b](feast-dev@6474b4b))
* Fix python unit tests ([feast-dev#3734](feast-dev#3734)) ([e81684d](feast-dev@e81684d))
* Handle unknown postgres source types gracefully ([feast-dev#3634](feast-dev#3634)) ([d7041f4](feast-dev@d7041f4))
* Pin protobuf version to avoid seg fault on some machines ([028cc20](feast-dev@028cc20))
* Remove unwanted excessive splitting of gcs path, so expected gcs parquet paths are returned from BigQueryRetrievalJob.to_remote_storage() ([feast-dev#3730](feast-dev#3730)) ([f2c5988](feast-dev@f2c5988))
* Run store.plan() only when need it. ([feast-dev#3708](feast-dev#3708)) ([7bc7c47](feast-dev@7bc7c47))
* Saved datasets no longer break CLI registry-dump command ([feast-dev#3717](feast-dev#3717)) ([f28ccc2](feast-dev@f28ccc2))
* Update py3.8 ci requirements for cython 3.0 release ([feast-dev#3735](feast-dev#3735)) ([1695c13](feast-dev@1695c13))

### Features

* Enhance customization of Trino connections when using Trino-based Offline Stores ([feast-dev#3699](feast-dev#3699)) ([ed7535e](feast-dev@ed7535e))
* Implement gRPC server to ingest streaming features ([feast-dev#3687](feast-dev#3687)) ([a3fcd1f](feast-dev@a3fcd1f))
james-crabtree-sp pushed a commit to sailpoint/feast that referenced this pull request Sep 14, 2023
)

* Implemented gRPC server for ingesting streaming features.

Signed-off-by: mehmettokgoz <mehmet.tokgoz@hazelcast.com>
Signed-off-by: Danny C <d.chiao@gmail.com>
james-crabtree-sp pushed a commit to sailpoint/feast that referenced this pull request Sep 14, 2023
# [0.34.0](feast-dev/feast@v0.33.0...v0.34.0) (2023-09-07)

### Bug Fixes

* Add NUMERIC to bq_to_feast type map ([feast-dev#3719](feast-dev#3719)) ([6474b4b](feast-dev@6474b4b))
* Fix python unit tests ([feast-dev#3734](feast-dev#3734)) ([e81684d](feast-dev@e81684d))
* Handle unknown postgres source types gracefully ([feast-dev#3634](feast-dev#3634)) ([d7041f4](feast-dev@d7041f4))
* Pin protobuf version to avoid seg fault on some machines ([028cc20](feast-dev@028cc20))
* Remove unwanted excessive splitting of gcs path, so expected gcs parquet paths are returned from BigQueryRetrievalJob.to_remote_storage() ([feast-dev#3730](feast-dev#3730)) ([f2c5988](feast-dev@f2c5988))
* Run store.plan() only when need it. ([feast-dev#3708](feast-dev#3708)) ([7bc7c47](feast-dev@7bc7c47))
* Saved datasets no longer break CLI registry-dump command ([feast-dev#3717](feast-dev#3717)) ([f28ccc2](feast-dev@f28ccc2))
* Update py3.8 ci requirements for cython 3.0 release ([feast-dev#3735](feast-dev#3735)) ([1695c13](feast-dev@1695c13))

### Features

* Enhance customization of Trino connections when using Trino-based Offline Stores ([feast-dev#3699](feast-dev#3699)) ([ed7535e](feast-dev@ed7535e))
* Implement gRPC server to ingest streaming features ([feast-dev#3687](feast-dev#3687)) ([a3fcd1f](feast-dev@a3fcd1f))
cburroughs added a commit to cburroughs/feast that referenced this pull request Jan 24, 2024
PR feast-dev#3687 added a spiffy feature to ingest streaming features, but this
came along with a large batch of depdencies.  Notable this induces a
core dependency on `protobuf>=4.21.6` while Feast itself is on
`protobuf<4.23.4,>3.20`.  This is a fiddly narrow range and excludes
all 3.x uses.

Signed-off-by: Chris Burroughs <chris.burroughs@gmail.com>
zseta pushed a commit to zseta/feast that referenced this pull request Feb 7, 2024
)

* Implemented gRPC server for ingesting streaming features.

Signed-off-by: mehmettokgoz <mehmet.tokgoz@hazelcast.com>
Signed-off-by: Danny C <d.chiao@gmail.com>
Signed-off-by: Attila Toth <hello@attilatoth.dev>
zseta pushed a commit to zseta/feast that referenced this pull request Feb 7, 2024
# [0.34.0](feast-dev/feast@v0.33.0...v0.34.0) (2023-09-07)

### Bug Fixes

* Add NUMERIC to bq_to_feast type map ([feast-dev#3719](feast-dev#3719)) ([6474b4b](feast-dev@6474b4b))
* Fix python unit tests ([feast-dev#3734](feast-dev#3734)) ([e81684d](feast-dev@e81684d))
* Handle unknown postgres source types gracefully ([feast-dev#3634](feast-dev#3634)) ([d7041f4](feast-dev@d7041f4))
* Pin protobuf version to avoid seg fault on some machines ([028cc20](feast-dev@028cc20))
* Remove unwanted excessive splitting of gcs path, so expected gcs parquet paths are returned from BigQueryRetrievalJob.to_remote_storage() ([feast-dev#3730](feast-dev#3730)) ([f2c5988](feast-dev@f2c5988))
* Run store.plan() only when need it. ([feast-dev#3708](feast-dev#3708)) ([7bc7c47](feast-dev@7bc7c47))
* Saved datasets no longer break CLI registry-dump command ([feast-dev#3717](feast-dev#3717)) ([f28ccc2](feast-dev@f28ccc2))
* Update py3.8 ci requirements for cython 3.0 release ([feast-dev#3735](feast-dev#3735)) ([1695c13](feast-dev@1695c13))

### Features

* Enhance customization of Trino connections when using Trino-based Offline Stores ([feast-dev#3699](feast-dev#3699)) ([ed7535e](feast-dev@ed7535e))
* Implement gRPC server to ingest streaming features ([feast-dev#3687](feast-dev#3687)) ([a3fcd1f](feast-dev@a3fcd1f))

Signed-off-by: Attila Toth <hello@attilatoth.dev>
cburroughs added a commit to cburroughs/feast that referenced this pull request Feb 27, 2024
PR feast-dev#3687 added a spiffy feature to ingest streaming features, but this
came along with a large batch of depdencies.  Notable this induces a
core dependency on `protobuf>=4.21.6` while Feast itself is on
`protobuf<4.23.4,>3.20`.  This is a fiddly narrow range and excludes
all 3.x uses.

Signed-off-by: Chris Burroughs <chris.burroughs@gmail.com>
cburroughs added a commit to cburroughs/feast that referenced this pull request Mar 5, 2024
PR feast-dev#3687 added a spiffy feature to ingest streaming features, but this
came along with a large batch of depdencies.  Notable this induces a
core dependency on `protobuf>=4.21.6` while Feast itself is on
`protobuf<4.23.4,>3.20`.  This is a fiddly narrow range and excludes
all 3.x uses.

Signed-off-by: Chris Burroughs <chris.burroughs@gmail.com>
cburroughs added a commit to cburroughs/feast that referenced this pull request Mar 6, 2024
PR feast-dev#3687 added a spiffy feature to ingest streaming features, but this
came along with a large batch of depdencies.  Notable this induces a
core dependency on `protobuf>=4.21.6` while Feast itself is on
`protobuf<4.23.4,>3.20`.  This is a fiddly narrow range and excludes
all 3.x uses.

Signed-off-by: Chris Burroughs <chris.burroughs@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants