[FEATURE] Directory Asset BatchDefinition API #9874
This reverts commit 362b4f0.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9874      +/-   ##
===========================================
- Coverage    78.25%   77.99%   -0.26%
===========================================
  Files          484      495      +11
  Lines        42394    42537     +143
===========================================
+ Hits         33174    33176       +2
- Misses        9220     9361     +141
Flags with carried forward coverage won't be shown.
from __future__ import annotations

from typing import Type

from great_expectations.datasource.fluent.data_asset.path.file_asset import FileDataAsset
from great_expectations.datasource.fluent.dynamic_pandas import _generate_pandas_data_asset_models

_PANDAS_FILE_TYPE_READER_METHOD_UNSUPPORTED_LIST = (
    # "read_csv",
    # "read_json",
    # "read_excel",
    # "read_parquet",
    "read_clipboard",  # not path based
    # "read_feather",
    # "read_fwf",
    "read_gbq",  # not path based
    # "read_hdf",
    # "read_html",
    # "read_orc",
    # "read_pickle",
    # "read_sas",  # invalid json schema
    # "read_spss",
    "read_sql",  # not path based & type-name conflict
    "read_sql_query",  # not path based
    "read_sql_table",  # not path based
    "read_table",  # type-name conflict
    # "read_xml",
)

_FILE_PATH_ASSET_MODELS = _generate_pandas_data_asset_models(
    FileDataAsset,
    blacklist=_PANDAS_FILE_TYPE_READER_METHOD_UNSUPPORTED_LIST,
    use_docstring_from_method=True,
    skip_first_param=True,
)

CSVAsset: Type[FileDataAsset] = _FILE_PATH_ASSET_MODELS.get("csv", FileDataAsset)
ExcelAsset: Type[FileDataAsset] = _FILE_PATH_ASSET_MODELS.get("excel", FileDataAsset)
FWFAsset: Type[FileDataAsset] = _FILE_PATH_ASSET_MODELS.get("fwf", FileDataAsset)
JSONAsset: Type[FileDataAsset] = _FILE_PATH_ASSET_MODELS.get("json", FileDataAsset)
ORCAsset: Type[FileDataAsset] = _FILE_PATH_ASSET_MODELS.get("orc", FileDataAsset)
ParquetAsset: Type[FileDataAsset] = _FILE_PATH_ASSET_MODELS.get("parquet", FileDataAsset)
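
For reviewers unfamiliar with the pattern in this module, here is a minimal standalone sketch of what "generate one asset model per reader method, skip the blacklisted ones, and fall back to the base class on lookup" could look like. It is not the real `_generate_pandas_data_asset_models`: `BaseAsset`, `generate_asset_models`, and the `reader_method` attribute are hypothetical names invented for illustration, and the real function also handles docstrings and parameter signatures.

```python
from __future__ import annotations

from typing import Dict, Iterable, Tuple, Type


class BaseAsset:
    """Hypothetical stand-in for the base asset class (not a GX class)."""

    reader_method: str = ""


def generate_asset_models(
    base: Type[BaseAsset],
    reader_methods: Iterable[str],
    blacklist: Tuple[str, ...] = (),
) -> Dict[str, Type[BaseAsset]]:
    """Build one subclass of `base` per reader method that is not blacklisted.

    Keys are the method names with the "read_" prefix stripped.
    """
    prefix = "read_"
    models: Dict[str, Type[BaseAsset]] = {}
    for method in reader_methods:
        if method in blacklist:
            continue  # e.g. readers that are not path based
        type_name = method[len(prefix):] if method.startswith(prefix) else method
        # Create the subclass dynamically and record which reader it wraps.
        models[type_name] = type(
            f"{type_name.upper()}Asset", (base,), {"reader_method": method}
        )
    return models


MODELS = generate_asset_models(
    BaseAsset,
    reader_methods=("read_csv", "read_parquet", "read_sql"),
    blacklist=("read_sql",),
)

# Same lookup-with-fallback shape as the module-level aliases in the diff.
CSVAsset: Type[BaseAsset] = MODELS.get("csv", BaseAsset)
SQLAsset: Type[BaseAsset] = MODELS.get("sql", BaseAsset)  # blacklisted: falls back
```

The `.get(name, base)` fallback means a typed alias always resolves to some class, even when the underlying reader was filtered out or is unavailable, at the cost of silently returning the bare base class in that case.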
rename module to generated_assets.py
This PR is prework to add a fluent-style batch definition API to directory data assets. To support this, `_FilePathDataAsset` and its direct descendants have been refactored so that regex-based assets can use RegexPartitioners and directory-based assets can use column partitioners. `SparkPartitioners` have been brought back as `DataframePartitioners`.

other refactors
- `_FilePathDataAsset` has been renamed to `PathDataAsset`
- `great_expectations.datasource.fluent.data_asset.data_connector` package has been moved out of `data_asset` to the `fluent` package.
- code moved out of `spark_file_path_datasource.py` and into the `data_asset.spark` package
- code moved out of `pandas_file_path_datasource.py` and into the `data_asset.pandas` package
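
To make the distinction in the description concrete, the following is a rough sketch of the two partitioning styles: grouping files into batches by regex capture groups, versus grouping rows into batches by a column value. It uses only the standard library. The function names `partition_paths_by_regex` and `partition_rows_by_column`, the file names, and the sample rows are all assumptions made up for illustration; none of this is the partitioner API introduced by the PR.

```python
from __future__ import annotations

import re
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping, Tuple


def partition_paths_by_regex(
    paths: Iterable[str], pattern: str
) -> Dict[Tuple[str, ...], List[str]]:
    """Group file paths by the values of the pattern's named groups."""
    compiled = re.compile(pattern)
    # Order the group names by their position in the pattern.
    names = sorted(compiled.groupindex, key=compiled.groupindex.get)
    batches: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for path in paths:
        match = compiled.fullmatch(path)
        if match is None:
            continue  # path does not belong to this asset
        batches[tuple(match.group(name) for name in names)].append(path)
    return dict(batches)


def partition_rows_by_column(
    rows: Iterable[Mapping[str, Any]], column: str
) -> Dict[Any, List[Mapping[str, Any]]]:
    """Group already-loaded rows by the value of a single column."""
    batches: Dict[Any, List[Mapping[str, Any]]] = defaultdict(list)
    for row in rows:
        batches[row[column]].append(row)
    return dict(batches)


# Regex-based asset: each matching file becomes part of a batch keyed by (year, month).
file_batches = partition_paths_by_regex(
    ["taxi_2023_01.csv", "taxi_2023_02.csv", "README.md"],
    r"taxi_(?P<year>\d{4})_(?P<month>\d{2})\.csv",
)

# Directory-based asset: the whole directory is read as one dataframe-like
# collection, then split into batches on a column.
row_batches = partition_rows_by_column(
    [
        {"year": 2023, "fare": 9.5},
        {"year": 2024, "fare": 12.0},
        {"year": 2023, "fare": 7.0},
    ],
    "year",
)
```

The key difference is where the batch identity comes from: the file name for regex-based assets, and the data itself for directory-based assets, which is why the latter need a dataframe-level partitioner.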