
[FEATURE] DataAsset uses partitioner from BatchConfig #9499

Merged


@joshua-stauffer (Member) commented Feb 20, 2024

This PR updates the DataAsset to use Partitioners defined in the BatchRequest rather than the Partitioner defined on the DataAsset. This change is required since a single asset can now have multiple Partitioners, as defined by multiple BatchConfigs.
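A minimal sketch of the idea, using simplified hypothetical dataclasses rather than the project's real classes: the partitioner lives on each BatchConfig and is copied into the BatchRequest that config builds, so one asset can serve several partitioning schemes. ColumnValuePartitioner and all field names here are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ColumnValuePartitioner:
    """Hypothetical stand-in for a partitioner attached to a BatchConfig."""

    column_name: str
    method_name: str = "partition_on_column_value"


@dataclass
class BatchRequest:
    """Simplified request: the partitioner travels with the request."""

    datasource_name: str
    data_asset_name: str
    options: Dict[str, Any]
    partitioner: Optional[ColumnValuePartitioner] = None


@dataclass
class DataAsset:
    """Hypothetical asset: it no longer stores a partitioner of its own."""

    datasource_name: str
    name: str


@dataclass
class BatchConfig:
    name: str
    data_asset: DataAsset
    partitioner: Optional[ColumnValuePartitioner] = None

    def build_batch_request(
        self, batch_request_options: Optional[Dict[str, Any]] = None
    ) -> BatchRequest:
        # Each BatchConfig supplies its own partitioner to the request it builds.
        return BatchRequest(
            datasource_name=self.data_asset.datasource_name,
            data_asset_name=self.data_asset.name,
            options=dict(batch_request_options or {}),
            partitioner=self.partitioner,
        )


# One asset, two BatchConfigs, two different partitioners.
events = DataAsset(datasource_name="my_datasource", name="events")
by_type = BatchConfig("by_event_type", events, ColumnValuePartitioner("event_type"))
by_user = BatchConfig("by_user", events, ColumnValuePartitioner("user_id"))
```

Because the asset itself holds no partitioner, adding a third partitioning scheme only means adding another BatchConfig.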

Changes

  • Deprecates DataAsset.batch_request_options. This functionality is now available via BatchConfig.get_batch_request_options(), which invokes DataAsset.get_batch_request_options_keys(...). It has been changed from a property to a method because it can trigger a query of the data.

  • PartitionOnConvertedDatetime is moved alongside the rest of the SQL partitioners so that it can join the SqlPartitioner union, with the expectation that it will be implemented for other SQL backends in the future. This works around pydantic not allowing child classes to override field types declared on parent classes: specifically, SqliteTableAsset and SqliteQueryAsset need to be able to add their implementation of PartitionerConvertedDateTime to the _partitioner_implementation_map.

  • Description of PR changes above includes a link to an existing GitHub issue

  • PR title is prefixed with one of: [BUGFIX], [FEATURE], [DOCS], [MAINTENANCE], [CONTRIB]

  • Code is linted - run invoke lint (uses black + ruff)

  • Appropriate tests and docs have been updated
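The batch_request_options deprecation in the first bullet above can be sketched as follows. This is a hypothetical, simplified model, not the real implementation: the Partitioner class, its param_names field, and the warning text are assumptions. The old property emits a DeprecationWarning and delegates, while the new method takes the partitioner explicitly, since the asset no longer knows which one a given BatchConfig will use.

```python
import warnings
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Partitioner:
    """Hypothetical partitioner: names the options it adds to a batch request."""

    param_names: Tuple[str, ...]


class DataAsset:
    """Hypothetical, simplified asset."""

    def get_batch_request_options_keys(
        self, partitioner: Optional[Partitioner]
    ) -> Tuple[str, ...]:
        # A real asset may need to inspect (query) the data to answer this,
        # which is why it is a method rather than a property.
        if partitioner is None:
            return ()
        return partitioner.param_names

    @property
    def batch_request_options(self) -> Tuple[str, ...]:
        # Deprecated: without a BatchConfig there is no partitioner to consult.
        warnings.warn(
            "DataAsset.batch_request_options is deprecated; "
            "use BatchConfig.get_batch_request_options() instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.get_batch_request_options_keys(partitioner=None)


@dataclass
class BatchConfig:
    data_asset: DataAsset
    partitioner: Optional[Partitioner] = None

    def get_batch_request_options(self) -> Tuple[str, ...]:
        # Delegates to the asset, passing this config's own partitioner.
        return self.data_asset.get_batch_request_options_keys(self.partitioner)
```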


netlify bot commented Feb 20, 2024

Deploy Preview for niobium-lead-7998 canceled.

🔨 Latest commit: 9d42d57
🔍 Latest deploy log: https://app.netlify.com/sites/niobium-lead-7998/deploys/65dcb97d24162a0008ab41d7


```python
method_name: Literal[
    "partition_on_converted_datetime"
] = "partition_on_converted_datetime"
date_format_string: str
```
Member: Should we do any validation on this string?

Member Author: The current behavior is not to validate the string, so I think it's likely okay not to. I'm also not 100% sure how we would validate it, since its correctness depends on the shape of the source data.
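A small illustration of why a format string's correctness depends on the data rather than on the string alone. Python's datetime.strptime is used here as an assumed stand-in for whatever date parsing the SQL backend actually performs; format_matches is a hypothetical helper, not part of the codebase.

```python
from datetime import datetime


def format_matches(value: str, date_format_string: str) -> bool:
    """Return True if `value` parses under `date_format_string`."""
    try:
        datetime.strptime(value, date_format_string)
    except ValueError:
        return False
    return True


# The same, syntactically fine format string is right for one column and wrong for another.
print(format_matches("2024-02-20", "%Y-%m-%d"))  # True
print(format_matches("20/02/2024", "%Y-%m-%d"))  # False
```

Any up-front check could only confirm the string is well-formed; whether it matches the column is only known once real values are parsed.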

```python
    raise ValueError(
        f"Requested Partitioner `{abstract_partitioner.method_name}` is not implemented for this DataAsset. "
    )
assert PartitionerClass is not None
```
Member: Do we need this assert?

Member Author: I originally added this because mypy wasn't recognizing that the exception above narrows the type to non-null, but I'll double-check 🙇

Member Author: Removed in 4b0651e.
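A sketch, with hypothetical class names, of the lookup discussed in this thread together with the per-asset _partitioner_implementation_map from the PR description. When the None branch of an `if ... is None:` guard always raises, mypy narrows the Optional result on the following lines, so a trailing assert is redundant. The assets here are plain Python classes with a ClassVar map; the real assets are pydantic models, whose restriction on overriding parent field types this sketch does not reproduce.

```python
from typing import ClassVar, Dict, Optional, Type


class Partitioner:
    """Hypothetical abstract partitioner, as it might be declared on a BatchConfig."""

    method_name: ClassVar[str] = "abstract"


class PartitionerYear(Partitioner):
    method_name = "partition_on_year"


class PartitionerConvertedDatetime(Partitioner):
    method_name = "partition_on_converted_datetime"


# Hypothetical backend-specific implementations.
class SqlPartitionerYear:
    pass


class SqlitePartitionerConvertedDatetime:
    pass


class SqlTableAsset:
    # Maps an abstract partitioner type to the class implementing it for this backend.
    _partitioner_implementation_map: ClassVar[Dict[Type[Partitioner], type]] = {
        PartitionerYear: SqlPartitionerYear,
    }

    def resolve_partitioner(self, abstract_partitioner: Partitioner) -> type:
        PartitionerClass: Optional[type] = self._partitioner_implementation_map.get(
            type(abstract_partitioner)
        )
        if PartitionerClass is None:
            raise ValueError(
                f"Requested Partitioner `{abstract_partitioner.method_name}` "
                "is not implemented for this DataAsset."
            )
        # mypy narrows Optional[type] to type here because the None branch always
        # raises, so no `assert PartitionerClass is not None` is needed.
        return PartitionerClass


class SqliteTableAsset(SqlTableAsset):
    # The SQLite asset extends the inherited map with its converted-datetime implementation.
    _partitioner_implementation_map = {
        **SqlTableAsset._partitioner_implementation_map,
        PartitionerConvertedDatetime: SqlitePartitionerConvertedDatetime,
    }
```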

```
@@ -45,5 +45,10 @@ def build_batch_request(
            options=batch_request_options, partitioner=self.partitioner
        )

    def batch_request_option_keys(self) -> tuple[str, ...]:
```
Member: Nit: should this be prefixed with get_ for consistency?

Member Author: I'll remove this method entirely. It was meant as a user-facing replacement for the asset property batch_request_options, but let's evaluate whether we actually want it and then add it in a follow-up PR.

Member Author: Removed in 9d42d57.

```diff
-).add_partitioner_column_value("event_type")
+)
+# add_partitioner_column_value("event_type")
```
Member: Should this be updated to add an actual partitioner?

Member Author: We can no longer add partitioners at the asset level, so the partitioner needs to be added wherever this fixture is used. I can remove this 👍

Member Author: Removed in 43888ed.

@joshua-stauffer joshua-stauffer added this pull request to the merge queue Feb 26, 2024
Merged via the queue into develop with commit ea65c96 Feb 26, 2024
67 checks passed
@joshua-stauffer joshua-stauffer deleted the f/v1-175/asset_uses_partitioner_from_batch_request branch February 26, 2024 17:06

2 participants