Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Feast AWS Athena offline store (again) #3044

Merged
merged 11 commits into from Aug 10, 2022

Conversation

toping4445
Copy link
Collaborator

What this PR does / why we need it:

It enables Feast users to use S3+AWS Athena as an offline store.

  1. All unit tests passed (except for tests in test_sql_registry.py.)
    The tests above failed in my environment because of MySQLdb package-M1 compatibility issues.
  2. some integration tests passed. (passed:109, failed:19, ignored:16)
    I didn't implement some methods related to feature write. and tests with fixed S3 bucket name and related to GCP failed.
    However, all tests in test_universal_historical_retrieval.py and test_univeral_types.py related to feature extraction have passed.

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:

AWS Users can choose between Redshift and S3+Athena for an offline store.

@adchia
Copy link
Collaborator

adchia commented Aug 9, 2022

I noticed there were a bunch of weird merge artifacts so I fixed them. Also removed Athena from the integration tests that run as part of CI since this is a contrib plugin.

can you sign your commits? None of them seem signed.

Also: I have no way of reproducing the test results because I don't have Athena setup. It's ok for this PR, but maybe leave a comment in the Makefile to specify that it needs the user to have their own custom athena setup?

@adchia adchia self-assigned this Aug 9, 2022
@codecov-commenter
Copy link

codecov-commenter commented Aug 9, 2022

Codecov Report

Merging #3044 (ce1710c) into master (63d541d) will increase coverage by 8.08%.
The diff coverage is 39.95%.

@@            Coverage Diff             @@
##           master    #3044      +/-   ##
==========================================
+ Coverage   67.64%   75.72%   +8.08%     
==========================================
  Files         167      202      +35     
  Lines       14696    16776    +2080     
==========================================
+ Hits         9941    12704    +2763     
+ Misses       4755     4072     -683     
Flag Coverage Δ
integrationtests 67.05% <39.51%> (-0.59%) ⬇️
unittests 58.13% <39.72%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdk/python/feast/batch_feature_view.py 100.00% <ø> (+13.33%) ⬆️
sdk/python/feast/repo_config.py 88.80% <ø> (+5.40%) ⬆️
sdk/python/feast/type_map.py 61.72% <15.38%> (-1.55%) ⬇️
sdk/python/feast/infra/utils/aws_utils.py 62.85% <30.13%> (-14.03%) ⬇️
...line_stores/contrib/athena_offline_store/athena.py 38.12% <38.12%> (ø)
...ores/contrib/athena_offline_store/athena_source.py 45.61% <45.61%> (ø)
.../contrib/athena_offline_store/tests/data_source.py 48.83% <48.83%> (ø)
...ffline_stores/contrib/athena_repo_configuration.py 50.00% <50.00%> (ø)
sdk/python/feast/__init__.py 85.71% <100.00%> (+0.52%) ⬆️
sdk/python/feast/data_source.py 81.68% <100.00%> (+23.75%) ⬆️
... and 107 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

toping4445 and others added 10 commits August 10, 2022 02:39
Signed-off-by: Youngkyu OH <toping4445@gmail.com>
…al_retrieval - 100% passed

Signed-off-by: Youngkyu OH <toping4445@gmail.com>
Signed-off-by: Youngkyu OH <toping4445@gmail.com>
Signed-off-by: Youngkyu OH <toping4445@gmail.com>
…bucket_name hardcoding to variable in AthenaDataSourceCreator

Signed-off-by: Youngkyu OH <toping4445@gmail.com>
Signed-off-by: Youngkyu OH <toping4445@gmail.com>
Signed-off-by: Danny Chiao <danny@tecton.ai>
Signed-off-by: Danny Chiao <danny@tecton.ai>
Signed-off-by: Danny Chiao <danny@tecton.ai>
Signed-off-by: Youngkyu OH <toping4445@gmail.com>
@adchia
Copy link
Collaborator

adchia commented Aug 9, 2022

/ok-to-test

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adchia, toping4445

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Youngkyu OH <toping4445@gmail.com>
@adchia
Copy link
Collaborator

adchia commented Aug 10, 2022

it’d be good to have some documentation, but this is good for an initial cut. Thanks!

@adchia
Copy link
Collaborator

adchia commented Aug 10, 2022

/lgtm

@toping4445
Copy link
Collaborator Author

Is there a documentation guide? If you share it, I will refer to it and fill it out when I have time.

@feast-ci-bot feast-ci-bot merged commit 989ce08 into feast-dev:master Aug 10, 2022
franciscojavierarceo pushed a commit to franciscojavierarceo/feast that referenced this pull request Aug 13, 2022
* fixed bugs, cleaned code, added AthenaDataSourceCreator

Signed-off-by: Youngkyu OH <toping4445@gmail.com>

* fixed bugs, cleaned code, added some methods. test_universal_historical_retrieval - 100% passed

Signed-off-by: Youngkyu OH <toping4445@gmail.com>

* fixed bugs to pass test_validation

Signed-off-by: Youngkyu OH <toping4445@gmail.com>

* changed boolean data type mapping

Signed-off-by: Youngkyu OH <toping4445@gmail.com>

* 1.added test-python-universal-athena in Makefile  2.replaced database,bucket_name hardcoding to variable in AthenaDataSourceCreator

Signed-off-by: Youngkyu OH <toping4445@gmail.com>

* format,run lint

Signed-off-by: Youngkyu OH <toping4445@gmail.com>

* revert merge changes

Signed-off-by: Danny Chiao <danny@tecton.ai>

* add entity_key_serialization

Signed-off-by: Danny Chiao <danny@tecton.ai>

* restore deleted file

Signed-off-by: Danny Chiao <danny@tecton.ai>

* modified confusing environment variable names, added how to use Athena

Signed-off-by: Youngkyu OH <toping4445@gmail.com>

* enforce AthenaSource to have a name

Signed-off-by: Youngkyu OH <toping4445@gmail.com>

Co-authored-by: toping4445 <yelo.blood@kakaopaycorp.com>
Co-authored-by: Danny Chiao <danny@tecton.ai>
Signed-off-by: Francisco Javier Arceo <arceofrancisco@gmail.com>
kevjumba pushed a commit that referenced this pull request Aug 25, 2022
# [0.24.0](v0.23.0...v0.24.0) (2022-08-25)

### Bug Fixes

* Check if on_demand_feature_views is an empty list rather than None for snowflake provider ([#3046](#3046)) ([9b05e65](9b05e65))
* FeatureStore.apply applies BatchFeatureView correctly ([#3098](#3098)) ([41be511](41be511))
* Fix Feast Java inconsistency with int64 serialization vs python ([#3031](#3031)) ([4bba787](4bba787))
* Fix feature service inference logic ([#3089](#3089)) ([4310ed7](4310ed7))
* Fix field mapping logic during feature inference ([#3067](#3067)) ([cdfa761](cdfa761))
* Fix incorrect on demand feature view diffing and improve Java tests ([#3074](#3074)) ([0702310](0702310))
* Fix Java helm charts to work with refactored logic. Fix FTS image ([#3105](#3105)) ([2b493e0](2b493e0))
* Fix on demand feature view output in feast plan + Web UI crash ([#3057](#3057)) ([bfae6ac](bfae6ac))
* Fix release workflow to release 0.24.0 ([#3138](#3138)) ([a69aaae](a69aaae))
* Fix Spark offline store type conversion to arrow ([#3071](#3071)) ([b26566d](b26566d))
* Fixing Web UI, which fails for the SQL registry ([#3028](#3028)) ([64603b6](64603b6))
* Force Snowflake Session to Timezone UTC ([#3083](#3083)) ([9f221e6](9f221e6))
* Make infer dummy entity join key idempotent ([#3115](#3115)) ([1f5b1e0](1f5b1e0))
* More explicit error messages ([#2708](#2708)) ([e4d7afd](e4d7afd))
* Parse inline data sources ([#3036](#3036)) ([c7ba370](c7ba370))
* Prevent overwriting existing file during `persist` ([#3088](#3088)) ([69af21f](69af21f))
* Register BatchFeatureView in feature repos correctly ([#3092](#3092)) ([b8e39ea](b8e39ea))
* Return an empty infra object from sql registry when it doesn't exist ([#3022](#3022)) ([8ba87d1](8ba87d1))
* Teardown tables for Snowflake Materialization testing ([#3106](#3106)) ([0a0c974](0a0c974))
* UI error when saved dataset is present in registry. ([#3124](#3124)) ([83cf753](83cf753))
* Update sql.py ([#3096](#3096)) ([2646a86](2646a86))
* Updated snowflake template ([#3130](#3130)) ([f0594e1](f0594e1))

### Features

* Add authentication option for snowflake connector ([#3039](#3039)) ([74c75f1](74c75f1))
* Add Cassandra/AstraDB online store contribution ([#2873](#2873)) ([feb6cb8](feb6cb8))
* Add Snowflake materialization engine ([#2948](#2948)) ([f3b522b](f3b522b))
* Adding saved dataset capabilities for Postgres  ([#3070](#3070)) ([d3253c3](d3253c3))
* Allow passing repo config path via flag ([#3077](#3077)) ([0d2d951](0d2d951))
* Contrib azure provider with synapse/mssql offline store and Azure registry store ([#3072](#3072)) ([9f7e557](9f7e557))
* Custom Docker image for Bytewax batch materialization ([#3099](#3099)) ([cdd1b07](cdd1b07))
* Feast AWS Athena offline store (again) ([#3044](#3044)) ([989ce08](989ce08))
* Implement spark offline store `offline_write_batch` method ([#3076](#3076)) ([5b0cc87](5b0cc87))
* Initial Bytewax materialization engine ([#2974](#2974)) ([55c61f9](55c61f9))
* Refactor feature server helm charts to allow passing feature_store.yaml in environment variables ([#3113](#3113)) ([85ee789](85ee789))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants