[FEATURE] Support to include ID/PK in validation result for each row - SQL #6448
…ithub.com/great-expectations/great_expectations into b/dx-67/bugfix-metrics-return-empty-value * 'b/dx-67/bugfix-metrics-return-empty-value' of https://github.com/great-expectations/great_expectations: [MAINTENANCE] Migrate additional methods from `BaseDataContext` to other parts of context hierarchy (#6388) [MAINTENANCE] move `zep` -> `experimental` package (#6378)
* develop: (22 commits) [BUGFIX] issue-4295-fix-issue (#6164) [DOCS] add boto3 explanations on document (#6407) [FEATURE] add multiple column metric (#6372) [MAINTENANCE] Small refactor (#6422) [MAINTENANCE] Sorting batch IDs and typehints clean up (#6421) [MAINTENANCE] Clean Up Type Hints and Minor Refactoring For Better Code Elegance/Readability (#6418) [MAINTENANCE] Implement `RendererConfiguration` (#6412) [BUGFIX] updated capitalone setup.py file (#6410) [FEATURE]: DataProfilerUnstructuredDataAssistant Integration (#6400) [FEATURE] add new metric - query template values (#5994) [MAINTENANCE] Cleanup For Better Code Elegance/Readability (#6406) [MAINTENANCE] ZEP - `GxConfig` cleanup (#6404) [MAINTENANCE] Migrate remaining methods from `BaseDataContext` (#6403) [BUGFIX] Patch key-generation issue with `DataContext.save_profiler()` (#6405) [MAINTENANCE] Migrate additional CRUD methods from `BaseDataContext` to `AbstractDataContext` (#6395) [MAINTENANCE] ZEP add yaml methods to all experimental models (#6401) [FEATURE] ZEP Config serialize as YAML (#6398) [MAINTENANCE] Remove call to verify_library_dependent_modules for pybigquery (#6394) [MAINTENANCE] Make "IDDict.to_id()" serialization more efficient. (#6389) [RELEASE] 0.15.34 (#6397) ...
* develop: [MAINTENANCE] Additional `sqlite` database fixture for `taxi_data` - All 2020 data in single table (#6455) [BUGFIX] Metrics return value no longer returns None for `unexpected_index_list` - Sql and Spark (#6392) [DOCS] add configuration of anonymous_usage_statistics for documentati… (#6293) [BUGFIX] Fix for `mssql` tests that depend on `datetime` to `string` conversion (#6449) [FEATURE] add multiple input metric (#6373) [CONTRIB] add expectation - check gaps in SCD tables (#6433) [CONTRIB] Add no days missing expectation (#6432) [CONTRIB] Feature/add two tables expectation (#6429) [CONTRIB] Add number of unique values expectation (#6425) [MAINTENANCE] Clean Up Variable Names In Test Modules, Type Hints, and Minor Refactoring For Better Code Elegance/Readability (#6444)
if dialect_name in ["sqlite", "trino", "mssql"]:
    params = (repr(compiled.params[name]) for name in compiled.positiontup)
    query_as_string = re.sub(r"\?", lambda m: next(params), str(compiled))

else:
    params = (repr(compiled.params[name]) for name in list(compiled.params.keys()))
    query_as_string = re.sub(r"%\(.*?\)s", lambda m: next(params), str(compiled))

# bigquery inserts extra '`' character for compiled statement.
# clean up string before returning
if dialect_name == "bigquery":
    query_as_string = re.sub(r"`", "", query_as_string)

return query_as_string
Return SQL query according to backend
# bigquery inserts extra '`' character for compiled statement.
# clean up string before returning
if dialect_name == "bigquery":
TODO: adjust test to compare the query from bigquery, rather than adjust the query from bigquery to match the test
LGTM! Thank you for the synchronous review and for adding the tests for param substitution for each backend 🙇
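The per-backend parameter substitution discussed in this review can be illustrated with a small standalone sketch. It uses plain strings and a dict in place of a SQLAlchemy compiled statement, so `inline_params`, the sample SQL, and the parameter names are all hypothetical. Unlike the reviewed hunk, the named-placeholder branch looks each value up by the name inside the placeholder rather than relying on dict order; that is a choice made for this sketch, not the PR's code.

```python
import re


def inline_params(sql, params, positiontup=None):
    """Replace bound-parameter placeholders in a compiled SQL string with repr() values.

    Hypothetical helper for illustration only; the output is meant for display,
    not for safe execution.
    """
    if positiontup is not None:
        # Positional "?" placeholders: consume values in the order given by positiontup.
        values = (repr(params[name]) for name in positiontup)
        return re.sub(r"\?", lambda m: next(values), sql)
    # Named "%(name)s" placeholders: look each one up by the name it carries.
    return re.sub(r"%\((.*?)\)s", lambda m: repr(params[m.group(1)]), sql)


# Assumed parameter names and values, chosen to match the PR's animal example.
params = {"animals_1": "cat", "animals_2": "fish", "animals_3": "dog"}

qmark_sql = "SELECT pk_1 FROM animal_names WHERE animals NOT IN (?, ?, ?)"
named_sql = (
    "SELECT pk_1 FROM animal_names "
    "WHERE animals NOT IN (%(animals_1)s, %(animals_2)s, %(animals_3)s)"
)

from_qmark = inline_params(qmark_sql, params, positiontup=list(params))
from_named = inline_params(named_sql, params)
print(from_qmark)
print(from_named)
```

Both calls produce the same readable query string. A literal `?` inside a SQL string constant would also be replaced by the positional branch, which is one reason this kind of output suits display rather than execution.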
* develop: (63 commits) [FEATURE] Support to include ID/PK in validation result for each row - SQL (#6448) [BUGFIX] Support slack channel name with webhook also (#6481) Query the database for datetime column splitter defaults (#6482) [MAINTENANCE] Move "Domain" to "great_expectations/core" to avoid circular imports; also add MetricConfiguration tests; and other clean up. (#6484) [MAINTENANCE] Reformat core expectation docstrings (#6423) [MAINTENANCE] Staging for build gallery (#6480) [MAINTENANCE] Move zep method from datasource to data asset. (#6477) [MAINTENANCE] Minor cleanup for better code readability (#6478) [MAINTENANCE] Misc updates to PR template (#6479) [CONTRIB] Add uniqueness expectation (#6473) [RELEASE] 0.15.36 (#6476) Add pretty representations for zep pydantic models (#6472) [BUGFIX] Contrib Expectation tracebacks (#6471) [BUGFIX] Add additional error checking to `ExpectationAnonymizer` (#6467) Add docstring for context.sources.add_postgres (#6459) [MAINTENANCE] fixing type hints in metrics utils module (#6469) [MAINTENANCE] Moving tutorials to great-expectations repo (#6464) [BUGFIX] Patch issue with call to `ExpectationAnonymizer` to ensure `DataContext` init events are captured (#6458) [BUGFIX] Support Table and Column Names Case Non-Sensitivity Relationship Between Snowflake, Oracle, DB2, etc. DBMSs (Upper Case) and SQLAlchemy (Lower Case) Representations (#6450) Add sorters to zep postgres datasource. (#6456) ...
Changes proposed in this pull request:
SQL implementation of the `unexpected_index_column_names` value, which allows users to specify a primary key (PK) column for identifying rows that failed an Expectation (usually returned as part of the `unexpected_index_list`). The PR also enables `unexpected_index_query` to be returned to the user, which can be run against the database to retrieve all unexpected rows.
Changes were made to the `map_metric_provider` to take in the parameter from `result_format` and output `unexpected_index_list` as key-value pairs of the primary key column.
Note: This is only the SQL implementation; Pandas has already been merged, and Spark is to follow.
#3195
What has changed?
- `unexpected_index_list` and `unexpected_index_query` were added as validation dependencies for `SqlAlchemyExecutionEngine`.
- `_sqlalchemy_map_condition_index` was added as a MapMetric.
  - `unexpected_index_column_names`: Note that, unlike Pandas, the current SQL implementation does not have a default index that is returned if `unexpected_index_column_names` is not specified. In other words, to see `unexpected_index_list`, the user must specify `unexpected_index_column_names`.
- `_sqlalchemy_map_condition_query` was added as a MapMetric.
  - `sql_post_compile_to_string()`: Used by the `_sqlalchemy_map_condition_query()` method to compile a SQL select statement with post-compile parameters into a string. The logic is taken directly from the SQLAlchemy documentation.
  - `get_sqlalchemy_source_table_and_schema_selectable()`: Used by the `_sqlalchemy_map_condition_query()` metric function to return the table associated with the current `Batch`, rather than the `temp_table` that is created as part of running the query.

Can you give me an example? (heavily adapted from the Pandas example)
Given a table named `animal_names`, we could run the `expect_column_values_to_be_in_set` Expectation on the `animals` column with `["cat", "fish", "dog"]` as the `value_set` (i.e., domestic animals).
After running the ExpectationConfiguration, we would expect the unexpected values to be `["giraffe", "lion", "zebra"]`, the values that are not in the `value_set`, which correspond to rows 4, 5, and 6.
This PR enables a configuration that sets `unexpected_index_column_names` to `pk_1`.
After running this new ExpectationConfiguration, we expect the `unexpected_index_list` to be `[{"pk_1": 3}, {"pk_1": 4}, {"pk_1": 5}]`, which are the values in the `pk_1` column for the rows containing `["giraffe", "lion", "zebra"]`.
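To make the example concrete, here is a minimal sketch in plain Python, with no Great Expectations import, of the `result_format` setting and of how the expected `unexpected_index_list` follows from it. The table contents (including the zero-based `pk_1` values) and the `COMPLETE` level are assumptions made for illustration, and `find_unexpected_index_list` is a hypothetical helper, not part of the library.

```python
# Hypothetical contents of animal_names; pk_1 values are assumed to run 0-5.
rows = [
    {"pk_1": 0, "animals": "cat"},
    {"pk_1": 1, "animals": "fish"},
    {"pk_1": 2, "animals": "dog"},
    {"pk_1": 3, "animals": "giraffe"},
    {"pk_1": 4, "animals": "lion"},
    {"pk_1": 5, "animals": "zebra"},
]

# Keyword arguments for expect_column_values_to_be_in_set, as described above.
# The "COMPLETE" level is an assumption; the PR text only names the
# unexpected_index_column_names setting.
expectation_kwargs = {
    "column": "animals",
    "value_set": ["cat", "fish", "dog"],
    "result_format": {
        "result_format": "COMPLETE",
        "unexpected_index_column_names": ["pk_1"],
    },
}


def find_unexpected_index_list(rows, kwargs):
    """Hypothetical helper: mimic the shape of unexpected_index_list."""
    column = kwargs["column"]
    allowed = set(kwargs["value_set"])
    pk_columns = kwargs["result_format"]["unexpected_index_column_names"]
    # One dict per failing row, keyed by the requested primary key columns.
    return [
        {pk: row[pk] for pk in pk_columns}
        for row in rows
        if row[column] not in allowed
    ]


unexpected_index_list = find_unexpected_index_list(rows, expectation_kwargs)
print(unexpected_index_list)
```

Under these assumed table contents, the printed list matches the `pk_1` key-value pairs given in the example.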
What if I have a lot of rows in my table?
To retrieve the full list of unexpected values from the table, the result also contains an `unexpected_index_query`, which can be copied into a database client to retrieve all the unexpected rows.
The query (along with `unexpected_index_list`) is included in the result values returned by the `Validator`.
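As a rough illustration of that workflow, the sketch below runs a query of the kind `unexpected_index_query` might contain against an in-memory SQLite database. The query text and the table contents are assumptions; the actual string is produced by the Validator and may differ by backend.

```python
import sqlite3

# Build a throwaway copy of the assumed animal_names table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animal_names (pk_1 INTEGER PRIMARY KEY, animals TEXT)")
conn.executemany(
    "INSERT INTO animal_names (pk_1, animals) VALUES (?, ?)",
    list(enumerate(["cat", "fish", "dog", "giraffe", "lion", "zebra"])),
)

# Assumed shape of an unexpected_index_query; not copied from real output.
unexpected_index_query = (
    "SELECT pk_1, animals FROM animal_names "
    "WHERE animals IS NOT NULL AND animals NOT IN ('cat', 'fish', 'dog') "
    "ORDER BY pk_1"
)

# Running the query returns every unexpected row, not just a truncated sample.
unexpected_rows = conn.execute(unexpected_index_query).fetchall()
conn.close()
print(unexpected_rows)
```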