[FEATURE] Batch definition sorting #9720

Merged: 42 commits merged into develop from f/v1_21/batch-definition-sorting on Apr 12, 2024

@tyler-hoffman (Contributor) commented Apr 8, 2024

Overview

Adds a sort_batches_ascending property to Partitioners, including both the (soon-to-be-removed) generic ones and the asset-specific ones.

Most of the changes here are to schemas, because of this new property.

Important things to look for

  • DataAsset::sort_batches now looks only at the partitioner tuples. I may circle back to simplify the logic, but for now I've kept the changes minimal.
  • I moved and updated some get_batch_list_from_batch_request tests into separate test files specific to SQL and to file paths.
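To illustrate what sorting batches by their partitioner tuples means, here is a minimal Python sketch. It is not the code from this PR: it assumes each batch is identified by a plain dict of values, and SketchPartitioner, its param_names field, and the sort_batches function are hypothetical names. Only the sort_batches_ascending flag comes from the PR description.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


# Hypothetical stand-in for a partitioner; NOT the Great Expectations class.
@dataclass(frozen=True)
class SketchPartitioner:
    # Ordered batch identifier keys that define the partition, e.g. ("year", "month").
    param_names: Tuple[str, ...]
    # Mirrors the sort_batches_ascending property described in this PR.
    sort_batches_ascending: bool = True


def sort_batches(
    batches: List[Dict[str, Any]], partitioner: SketchPartitioner
) -> List[Dict[str, Any]]:
    """Return batches ordered by their partitioner tuple, e.g. (year, month)."""

    def partition_tuple(batch: Dict[str, Any]) -> Tuple[Any, ...]:
        # Build the comparison key from the partitioner's keys, in order,
        # so tuples compare year first, then month.
        return tuple(batch[name] for name in partitioner.param_names)

    # reverse=True flips the order for descending sorts; sorted() stays stable,
    # so batches with equal keys keep their original relative order.
    return sorted(
        batches,
        key=partition_tuple,
        reverse=not partitioner.sort_batches_ascending,
    )


batches = [
    {"year": 2024, "month": 2},
    {"year": 2023, "month": 12},
    {"year": 2024, "month": 1},
]
by_month = SketchPartitioner(param_names=("year", "month"))
print([(b["year"], b["month"]) for b in sort_batches(batches, by_month)])
# [(2023, 12), (2024, 1), (2024, 2)]
```

Driving the order entirely from the tuple of partition values keeps a single sort key per batch, which matches the idea of having sort_batches look only at the partitioner tuples; setting sort_batches_ascending=False on the sketch partitioner returns the same batches newest first.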

What this does not do

This PR does not touch DataAsset::add_sorters or the DataAsset::order_by that it sets. I plan to remove those next, but this PR is already large enough.

  • Description of PR changes above includes a link to an existing GitHub issue
  • PR title is prefixed with one of: [BUGFIX], [FEATURE], [DOCS], [MAINTENANCE], [CONTRIB]
  • Code is linted - run invoke lint (uses ruff format + ruff check)
  • Appropriate tests and docs have been updated

For more information about contributing, see Contribute.

After you submit your PR, keep the page open and monitor the statuses of the various checks made by our continuous integration process at the bottom of the page. Please fix any issues that come up and reach out on Slack if you need help. Thanks for contributing!

netlify bot commented Apr 8, 2024

Deploy Preview for niobium-lead-7998 canceled.

🔨 Latest commit: ec4e8bc
🔍 Latest deploy log: https://app.netlify.com/sites/niobium-lead-7998/deploys/66197a9b7ffd210008a9ce9f

@tyler-hoffman changed the title from "F/v1 21/batch definition sorting" to "[FEATURE] Batch definition sorting" Apr 9, 2024

codecov bot commented Apr 10, 2024

Codecov Report

Attention: Patch coverage is 81.81818%, with 6 lines in your changes missing coverage. Please review.

Project coverage is 82.69%. Comparing base (3e5c57d) to head (ec4e8bc).

File | Patch % | Lines
great_expectations/datasource/fluent/interfaces.py | 72.22% | 5 Missing ⚠️
...pectations/datasource/fluent/invalid_datasource.py | 50.00% | 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9720      +/-   ##
===========================================
+ Coverage    82.64%   82.69%   +0.04%     
===========================================
  Files          512      512              
  Lines        46782    46813      +31     
===========================================
+ Hits         38663    38712      +49     
+ Misses        8119     8101      -18     
Flag | Coverage Δ
3.10 | 65.00% <72.72%> (+0.01%) ⬆️
3.10 athena or clickhouse or openpyxl or pyarrow or project or sqlite or aws_creds | ?
3.10 aws_deps | ?
3.10 big | ?
3.10 databricks | ?
3.10 filesystem | ?
3.10 mssql | ?
3.10 mysql | ?
3.10 postgresql | ?
3.10 snowflake | ?
3.10 spark | ?
3.10 trino | ?
3.11 | 65.00% <72.72%> (+0.01%) ⬆️
3.11 athena or clickhouse or openpyxl or pyarrow or project or sqlite or aws_creds | 53.80% <69.69%> (+0.01%) ⬆️
3.11 aws_deps | 48.89% <66.66%> (+<0.01%) ⬆️
3.11 big | 64.01% <63.63%> (+<0.01%) ⬆️
3.11 databricks | 48.08% <63.63%> (+<0.01%) ⬆️
3.11 filesystem | 63.77% <81.81%> (+0.08%) ⬆️
3.11 mssql | 47.29% <63.63%> (+0.01%) ⬆️
3.11 mysql | 47.34% <63.63%> (+0.01%) ⬆️
3.11 postgresql | 54.08% <69.69%> (+<0.01%) ⬆️
3.11 snowflake | 48.60% <63.63%> (+<0.01%) ⬆️
3.11 spark | 60.47% <66.66%> (+<0.01%) ⬆️
3.11 trino | 53.71% <63.63%> (+<0.01%) ⬆️
3.8 | 65.01% <72.72%> (+0.01%) ⬆️
3.8 athena or clickhouse or openpyxl or pyarrow or project or sqlite or aws_creds | 53.80% <69.69%> (+0.01%) ⬆️
3.8 aws_deps | 48.90% <66.66%> (+<0.01%) ⬆️
3.8 big | 64.01% <63.63%> (+<0.01%) ⬆️
3.8 databricks | 48.09% <63.63%> (+<0.01%) ⬆️
3.8 filesystem | 63.77% <78.78%> (+0.08%) ⬆️
3.8 mssql | 47.27% <63.63%> (+0.01%) ⬆️
3.8 mysql | 47.32% <63.63%> (+0.01%) ⬆️
3.8 postgresql | 54.07% <69.69%> (+<0.01%) ⬆️
3.8 snowflake | 48.62% <63.63%> (+<0.01%) ⬆️
3.8 spark | 60.43% <66.66%> (+<0.01%) ⬆️
3.8 trino | 53.70% <63.63%> (+<0.01%) ⬆️
3.9 | 65.00% <72.72%> (+0.01%) ⬆️
3.9 athena or clickhouse or openpyxl or pyarrow or project or sqlite or aws_creds | ?
3.9 aws_deps | ?
3.9 big | ?
3.9 databricks | ?
3.9 filesystem | ?
3.9 mssql | ?
3.9 mysql | ?
3.9 postgresql | ?
3.9 snowflake | ?
3.9 spark | ?
3.9 trino | ?
cloud | 0.00% <0.00%> (ø)
docs-basic | 54.36% <72.72%> (+0.01%) ⬆️
docs-creds-needed | 54.93% <72.72%> (+0.01%) ⬆️
docs-spark | 54.46% <72.72%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.


@tyler-hoffman marked this pull request as ready for review April 10, 2024 14:39
@billdirks (Contributor) left a comment

Thanks, LGTM! A couple of very minor comments.

@tyler-hoffman added this pull request to the merge queue Apr 12, 2024
@tyler-hoffman removed this pull request from the merge queue due to a manual request Apr 12, 2024
@tyler-hoffman added this pull request to the merge queue Apr 12, 2024
Merged via the queue into develop with commit 5bc2545 Apr 12, 2024
69 of 70 checks passed
@tyler-hoffman deleted the f/v1_21/batch-definition-sorting branch April 12, 2024 18:54
Labels: None yet
Projects: None yet
Development (issues this pull request may close): None yet

2 participants