Accept Pandas dataframe as input for historical feature retrieval #1071

Merged

Conversation

khorshuheng
Collaborator

@khorshuheng khorshuheng commented Oct 19, 2020

Signed-off-by: Khor Shu Heng khor.heng@gojek.com

What this PR does / why we need it:
Currently, historical feature retrieval accepts only a Datasource as input. This PR lets the user supply a Pandas DataFrame instead, which is then uploaded to a staging location.

The staging location configuration for the different Spark clusters is also merged into a single key in this PR.
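
A minimal sketch of the flow described above, not the code from this PR: the DataFrame is written as a Parquet file under a staging URI, and the full URI of that file is what a retrieval job would then read. The helper name `stage_dataframe`, the file naming, and the restriction to local `file://` staging are assumptions made for illustration; remote staging is omitted.

```python
import os
import tempfile
import uuid
from urllib.parse import urlparse


def stage_dataframe(entity_df, staging_location: str) -> str:
    """Write entity_df as Parquet under staging_location and return its full URI.

    Hypothetical helper: only local file:// staging is handled here.
    entity_df is expected to be a pandas DataFrame; the function relies
    only on its to_parquet(path) method.
    """
    uri = urlparse(staging_location)
    if uri.scheme != "file":
        raise ValueError(f"unsupported staging scheme: {uri.scheme!r}")

    # Create the staging directory if needed and pick a unique file name.
    os.makedirs(uri.path, exist_ok=True)
    local_path = os.path.join(uri.path, f"{uuid.uuid4().hex}.parquet")

    entity_df.to_parquet(local_path)

    # Return the full URI (scheme included), not just the path.
    return f"file://{local_path}"


if __name__ == "__main__":
    import pandas as pd  # to_parquet needs a Parquet engine such as pyarrow

    entity_df = pd.DataFrame(
        {
            "customer_id": [1001, 1002],  # made-up entity column
            "event_timestamp": pd.to_datetime(["2020-10-01", "2020-10-02"]),
        }
    )
    print(stage_dataframe(entity_df, f"file://{tempfile.mkdtemp()}"))
```

Returning the full URI rather than the bare path keeps the scheme available to whatever reads the staged file.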

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:

Added support for supplying Pandas DataFrames when running historical retrieval

Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>
@khorshuheng
Collaborator Author

/test test-end-to-end-auth

@feast-ci-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: khorshuheng, woop

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@woop
Member

woop commented Oct 19, 2020

/lgtm

@woop woop added the kind/feature New feature or request label Oct 19, 2020
@feast-ci-bot feast-ci-bot merged commit 029d9ab into feast-dev:master Oct 19, 2020
"event_timestamp",
"created_timestamp",
ParquetFormat(),
entity_staging_uri.path,
Collaborator

should be full url

entity_source.to_parquet(df_export_path.name)
bucket = (
    None
    if entity_staging_uri.scheme == "fs"
Collaborator

what's fs scheme? maybe file?
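
For context on the two review comments, here is a small standard-library sketch of how a staging URI breaks down. `urlparse` exposes `scheme`, `netloc`, and `path` separately, so passing `.path` alone drops the scheme and bucket, and local paths conventionally use the `file` scheme rather than `fs`. The helper name `split_staging_uri` and the example URIs are made up for illustration and are not taken from the PR.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def split_staging_uri(staging_uri: str) -> Tuple[str, Optional[str], str]:
    """Return (scheme, bucket, path) for a staging URI.

    Hypothetical helper: a local file:// URI has no bucket, so None is
    returned for it; for other schemes the network location is the bucket.
    """
    uri = urlparse(staging_uri)
    bucket = None if uri.scheme == "file" else uri.netloc
    return uri.scheme, bucket, uri.path


print(split_staging_uri("file:///tmp/staging"))
# ('file', None, '/tmp/staging')
print(split_staging_uri("gs://my-bucket/staging/entities"))
# ('gs', 'my-bucket', '/staging/entities')
```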

4 participants