Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add online_read_async for dynamodb #4244

Merged
merged 16 commits into from
Jun 5, 2024

Conversation

robhowley
Copy link
Contributor

@robhowley robhowley commented May 31, 2024

What this PR does / why we need it:

  • adds aiobotocore to aws dependencies
  • adds online_read_async impl for dynamdb online reader
  • the use of the standard blocking boto client prevents the feast sdk from being able to handle real time web traffic
  • most of the diff is in the py3.X-[ci-]requirements.txt files

Which issue(s) this PR fixes:

Fixes

robhowley and others added 6 commits May 31, 2024 09:12
Signed-off-by: robhowley <rhowley@seatgeek.com>
Signed-off-by: robhowley <rhowley@seatgeek.com>
Signed-off-by: robhowley <rhowley@seatgeek.com>
Signed-off-by: robhowley <rhowley@seatgeek.com>
feat: online_read_async for dynamodb
@robhowley robhowley changed the title feat: online_read_async for dynamodb feat: Add online_read_async for dynamodb May 31, 2024
@robhowley robhowley marked this pull request as draft May 31, 2024 13:36
Signed-off-by: robhowley <rhowley@seatgeek.com>
@@ -476,7 +476,7 @@ def test_online_retrieval_with_event_timestamps(environment, universal_data_sour


@pytest.mark.integration
@pytest.mark.universal_online_stores(only=["redis"])
@pytest.mark.universal_online_stores(only=["redis", "dynamodb"])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add dynamo to the async online store tests

@robhowley robhowley marked this pull request as ready for review May 31, 2024 13:45
@robhowley
Copy link
Contributor Author

robhowley commented May 31, 2024

fyi @tokoko @breno-costa

robhowley added 2 commits May 31, 2024 11:18
Signed-off-by: robhowley <rhowley@seatgeek.com>
Signed-off-by: robhowley <rhowley@seatgeek.com>
@robhowley robhowley marked this pull request as draft May 31, 2024 15:23
Signed-off-by: robhowley <rhowley@seatgeek.com>
robhowley added 3 commits May 31, 2024 13:20
Signed-off-by: robhowley <rhowley@seatgeek.com>
Signed-off-by: robhowley <rhowley@seatgeek.com>
…response

Signed-off-by: robhowley <rhowley@seatgeek.com>
@@ -20,32 +20,38 @@ charset-normalizer==3.3.2
# via requests
click==8.1.7
# via
# feast (setup.py)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious, what command did you run to lock these files? I'm not sure why this feast (setup.py) lines are being added here...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make lock-python-dependencies PYTHON=3.xx

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@robhowley
Copy link
Contributor Author

robhowley commented May 31, 2024

unclear what the failure in pr-local-integration-tests / integration-test-python-local is all about since it's unrelated to my changes. it has flaked on me a few times, is there a way to just rerun it?

OSError: Generic S3 error: Error after 0 retries in 19.835973ms, max_retries:10, 
retry_timeout:180s, source:error sending request for url (
http://localhost:32798/test/2622baae-5bf8-42f0-b485-15b338730efd/integration_test_gh_run_9323007162_e7eafe_5258.customer_profile/_delta_log/_last_checkpoint): 
connection error: Connection reset by peer (os error 104)

@franciscojavierarceo im assuming you have perms for a rerun? 🙏

@robhowley robhowley marked this pull request as ready for review May 31, 2024 19:21
Comment on lines +292 to +297
def to_tbl_resp(raw_client_response):
return {
"entity_id": deserialize(raw_client_response["entity_id"]),
"event_ts": deserialize(raw_client_response["event_ts"]),
"values": deserialize(raw_client_response["values"]),
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the dynamo client returns stuff in the form

{ "field_name": {"TYPE": "value"}}

as opposed to the resource which just returns

{ "field_name": value}

dynamodb has a deserializer utility to do the conversion

@breno-costa
Copy link
Contributor

@robhowley I created a benchmark repo to test redis async implementation. If you want to make use of it to test your changes. You should be able to install your Feast version here and configure the dynamodb here.

@franciscojavierarceo
Copy link
Member

I reran and it looks like a bucket issue failed. will have to dig into this more later. CC @lokeshrangineni @jeremyary @tokoko

return self._aioboto_session

def _get_aiodynamodb_client(self, region: str):
return self._get_aioboto_session().create_client("dynamodb", region_name=region)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not too sure about this, but is it a good idea to recreate client object on every call? Isn't there a performance penalty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the docs demonstrate and recommend usage within an async context manager. it is possible manually work with _exit_stack.enter_async_context at app startup/shutdown but wanted to avoid using protected methods and adding app complexity without first knowing it was really needed. seemed like an "as needed follow up" type of investigation.

@tokoko
Copy link
Collaborator

tokoko commented Jun 2, 2024

Definitely flaky, this just errored out other docs-only PR as well. I think we were missing this before because test-python-integration-local status code wasn't being reported properly due to docker info checking.

@robhowley Can you comment out this line here for now? I'll try to reenable it later...

Signed-off-by: robhowley <rhowley@seatgeek.com>
Signed-off-by: robhowley <rhowley@seatgeek.com>
setup.py Outdated
@@ -84,7 +84,7 @@
"hiredis>=2.0.0,<3",
]

AWS_REQUIRED = ["boto3>=1.17.0,<2", "docker>=5.0.2", "fsspec<=2024.1.0"]
AWS_REQUIRED = ["boto3>=1.17.0,<2", "docker>=5.0.2", "fsspec<=2024.1.0", "aiobotocore>=2.13.0"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the lower bound necessary here? maybe just put a >2,<3 unless there's some specific reason

Signed-off-by: robhowley <rhowley@seatgeek.com>
@robhowley robhowley requested a review from tokoko June 5, 2024 15:31
Copy link
Collaborator

@tokoko tokoko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks

@franciscojavierarceo franciscojavierarceo merged commit b5ef384 into feast-dev:master Jun 5, 2024
16 checks passed
shuchu pushed a commit to shuchu/feast that referenced this pull request Jun 8, 2024
franciscojavierarceo pushed a commit that referenced this pull request Jun 18, 2024
# [0.39.0](v0.38.0...v0.39.0) (2024-06-18)

### Bug Fixes

* Feast UI importlib change ([#4248](#4248)) ([5d486b8](5d486b8))
* Feature server no_feature_log argument error ([#4255](#4255)) ([15524ce](15524ce))
* Feature UI Server image won't start in an OpenShift cluster ([#4250](#4250)) ([4891f76](4891f76))
* Handles null values in data during GO Feature retrieval ([#4274](#4274)) ([c491e57](c491e57))
* Make Java gRPC client use timeouts as expected ([#4237](#4237)) ([f5a37c1](f5a37c1))
* Remove self assignment code line. ([#4238](#4238)) ([e514f66](e514f66))
* Set default values for feature_store.serve() function ([#4225](#4225)) ([fa74438](fa74438))

### Features

* Add online_read_async for dynamodb ([#4244](#4244)) ([b5ef384](b5ef384))
* Add the ability to list objects by `tags` ([#4246](#4246)) ([fbf92da](fbf92da))
* Added deadline to gRPC Java client ([#4217](#4217)) ([ff429c9](ff429c9))
* Adding vector search for sqlite ([#4176](#4176)) ([2478831](2478831))
* Change get_online_features signature, move online retrieval functions to utils ([#4278](#4278)) ([7287662](7287662))
* Feature/adding remote online store ([#4226](#4226)) ([9454d7c](9454d7c))
* List all feature views ([#4256](#4256)) ([36a574d](36a574d))
* Make RegistryServer writable ([#4231](#4231)) ([79e1143](79e1143))
* Remote offline Store  ([#4262](#4262)) ([28a3d24](28a3d24))
* Set optional full-scan for deletion ([#4189](#4189)) ([b9cadd5](b9cadd5))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants