GE errors when trying to return a result when contains a pandas NA #2029
Comments
@isichei Thank you for reporting this! If you would like to make a PR (the fix must be very small), it would be super welcome! Please let us know.

Yeah, I can try and have a look over the next couple of weeks. When you say the fix must be small — if it turns out to be a bigger fix, would you want me to make several small PRs?

@isichei I meant "must" not as in "we require the fix to be small", but as in "I think it will not take much code to fix this" :) Looking forward to your PR!
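For context, a minimal sketch of the kind of failure the issue title describes (this is an illustrative reproduction, not the library's actual code path): `pd.NA` deliberately refuses boolean coercion and is not JSON-serializable, so a result payload that contains it will raise a `TypeError` when it is truth-tested or serialized.

```python
import json

import pandas as pd

# A nullable-dtype column produces pd.NA for missing values.
df = pd.DataFrame({"a": pd.array([1, None, 3], dtype="Int64")})
na_value = df["a"][1]  # this is pd.NA

# pd.NA raises on truthiness checks, unlike None or np.nan:
try:
    bool(na_value)
except TypeError as exc:
    print(f"bool(pd.NA) -> TypeError: {exc}")

# It is also not JSON-serializable, so returning it inside a result
# dict (e.g. a partial_unexpected_list) fails:
try:
    json.dumps({"partial_unexpected_list": [na_value]})
except TypeError as exc:
    print(f"json.dumps -> TypeError: {exc}")

# One common workaround: map pd.NA to None before serializing.
safe = [None if v is pd.NA else v for v in [na_value, 3]]
print(json.dumps({"partial_unexpected_list": safe}))
```

The `partial_unexpected_list` key is used here only as an example of a result field that might carry raw column values; any such field would hit the same `TypeError`.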
Signed-off-by: James Campbell <james.p.campbell@gmail.com>
* Fix/deprecate test_column_reflection_fallback * requested_tests * Add tests for splitters and tests for samplers. * Added a comment. * Comment. * Managing PartitionQuery properly. * Update great_expectations/execution_environment/data_connector/data_connector.py Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * making sure this runs first * Update stored meta batch_spec and batch_markers Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Refactoring splitting and sampling tests, partially out of the Golden Path. * Re-enable connection for sqlite, snowflake, mssql Signed-off-by: James Campbell <james.p.campbell@gmail.com> * adding spark fixture * Running Isort to fix lint errors. * linting * isort * Linting. * big query temporary table name * linting * WIP Core Concepts Signed-off-by: James Campbell <james.p.campbell@gmail.com> * lint * lint * lint * cleanup * Add docs gitignore * Typo. * Tidy and use SparkDFBatchData * Bring changes from #2029 / #2039 Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Fix merge errors. * Lint. * comment * Moving the batch_kwargs module to its original (Legacy Implementation) location. * Moving the batch_kwargs module to its original (Legacy Implementation) location. * Moving the batch_kwargs module to its original (Legacy Implementation) location. * Restoring BatchKwargs for the Legacy backward compatibility. * Linting * Moving the batch_kwargs module to its original (Legacy Implementation) location. * Moving the batch_kwargs module to its original (Legacy Implementation) location. * Moving the batch_kwargs module to its original (Legacy Implementation) location. * Moving the batch_kwargs module to its original (Legacy Implementation) location. * Moving the batch_kwargs module to its original (Legacy Implementation) location. * Moving the batch_kwargs module to its original (Legacy Implementation) location. 
* Switch SparkDFBatchData into try-except import block * Lint * Update how_to_configure_a_redshift_datasource.rst * small formatting change * Additional decoupling from Legacy datasource implementation. * Typo * Linting. * Anonymizers * Anonymizers * Anonymizers * Linting * add_column_row_condition fully tested * Initial tests for execution engine parent class passing * Execution Engine tests finished * Update how_to_configure_a_pandas_filesystem_datasource.rst * Update doc * Rename creating_modular_expectations.rst to how_to_create_modular_expectations.rst * Turn code into code blocks * Significant updates to formatting Almost ready for review * More formatting additions * Ready for review * WIP * Core Concepts Update Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_redshift_datasource.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_redshift_datasource.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * last bit of clean up. how-to-redshift-datasource * update how-to-write-how-to and clean up * typo * Guide finished * updates from review * Update how_to_configure_a_pandas_filesystem_datasource.rst * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_pandas_filesystem_datasource.rst * first push before PR * Update docs/conf.py * Update how_to_configure_a_pandas_filesystem_datasource.rst * Corrections such as unneeded method and needed Anonymizer. 
* remove legacy diff description * Update docs/guides/how_to_guides/creating_and_editing_expectations/how_to_create_parameterized_expectations_super_fast.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Update docs/guides/how_to_guides/creating_and_editing_expectations/how_to_create_parameterized_expectations_super_fast.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * first push for how-to-sparkdf-filesystem doc * added some more references * first push. Before adding blurb on introspection and query * first push of doc + bugfix for query * Core Concepts Update Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Core Concepts Update Signed-off-by: James Campbell <james.p.campbell@gmail.com> * added output * Formatting issues but information much improved * Update docs/conf.py * Slight formatting improvement * first push how-to-mysql-datasource * Complete for review pt.2 * some formatting changes * Update docs/conf.py * Renaming Datasource to LegacyDatasource, DatasourceConfig to LegacyDatasourceConfig, DatasourceConfigSchema to LegacyDatasourceConfigSchema, and datasourceConfigSchema to legacyDatasourceConfigSchema. 
* s/StreamlinedSqlExecutionEngine/SimpleSqlalchemyDatasource/ * Renaming ExecutionEnvironment to Datasource * Update docs/guides/how_to_guides/creating_and_editing_expectations/how_to_create_modular_expectations.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Update docs/guides/how_to_guides/creating_and_editing_expectations/how_to_create_modular_expectations.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Update docs/guides/how_to_guides/creating_and_editing_expectations/how_to_create_modular_expectations.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Update docs/guides/how_to_guides/creating_and_editing_expectations/how_to_create_modular_expectations.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Update docs/guides/how_to_guides/creating_and_editing_expectations/how_to_create_modular_expectations.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Linting. * renaming execution_environment into new_datasource WIP * renaming execution_environment into new_datasource WIP * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_pandas_filesystem_datasource.rst * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_spark_filesystem_datasource.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_spark_filesystem_datasource.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * removed spark output from doc * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_pandas_s3_datasource.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * updates from Pr review (how to pandas s3) * trailing line * renamed test_asset * ExecutionEnvironment -> Datasource * Linting. * oops * dotting some i's * Merge. * Clean up. * Clean up. * Clean up. * Linting. * updates from a closer look * String literals renaming. 
* Remove unusued execution_environment code from base.py and add a comment to make Schema validation robust for the new and the Legacy Datasource classes alike. * Update how_to_configure_a_snowflake_datasource.rst * Add sample pngs * Add doc for How to Create Renderers for Custom Expectations * Update renderer to return typed RenderedTableContent * Add typehints * Add sample images * Add page ref * Fill out how-to guide * Crop image * lint * Remove batch_definition from get_batch and get_validator * Tidy up error handling * Add much better tests for get_validator; Switch attach_new_expectation_suite to create_expectation_suite_with_name * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_snowflake_datasource.rst * Propagate name changes through new tests * Make black happy * Apply suggestions from code review removed `role: ADMIN` and `warehouse` from config * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_snowflake_datasource.rst * Make isort happy * Add docstrings for MetricProvider and Expectation Signed-off-by: James Campbell <james.p.campbell@gmail.com> * PR Review updates Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Delete .test_durations * Delete Untitled.ipynb * Delete ge_docs_links.csv * Commit cleanup Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Ensure BatchDefinition is serializable Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Fix import and custom expectation issues Signed-off-by: James Campbell <james.p.campbell@gmail.com> * WIP custom expectations docs fixes Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Use legacy logic for default expectation values. Signed-off-by: James Campbell <james.p.campbell@gmail.com> * lint * Update link Signed-off-by: James Campbell <james.p.campbell@gmail.com> * 1. how_to_configure_a_redshift_datasource: a. Added step 5 - save the config - and modified the note in step 4 b. 
Fixed the class name - it was out of date 2. how_to_configure_a_snowflake_datasource: Fixed the class name in the output snippet - it was out of date * Update changelog, version for release Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Delete tmp--pyproject.toml * Spark Self-Managed WIP * Self-managed Spark WIP * Self-managed Spark documentation/UAT WIP. * Self-managed Spark documentation/UAT WIP. * reset * Pin to legacy pip Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Remove TableDataConnector -- it is a broken unimplemented module that was copied and pasted from Legacy design, but not worked on. * Self-managed Spark documentation/UAT WIP. * UAT-based updates to parameterized expectations Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Self-Managed Spark Datasource HOWTO guide. * Self-Managed Spark Datasource HOWTO guide. * Self-Managed Spark Datasource HOWTO guide. * Docs/draft docs for data connectors (#2086) Changes proposed in this pull request: This PR adds how-to guides for new-style DataConnectors and Datasources for 0.13. 
Co-authored-by: Abe Gong <abegong@users.noreply.github.com> Co-authored-by: William Shin <will@superconductive.com> * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_self_managed_spark_datasource.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_self_managed_spark_datasource.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_self_managed_spark_datasource.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_self_managed_spark_datasource.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Update links * Add docs tab for experimental API * Add admonition pointing user to docs for experimental API * Update docs/guides/how_to_guides/creating_and_editing_expectations/how_to_create_parameterized_expectations_super_fast.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> Co-authored-by: Abe <abegong@gmail.com> Co-authored-by: William Shin <will@superconductive.com> Co-authored-by: Alex Sherstinsky <alex@superconductivehealth.com> Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com> Co-authored-by: Abe Gong <abegong@users.noreply.github.com> Co-authored-by: Rob Lim <robert.m.lim@gmail.com> Co-authored-by: gilpasternak35 <gilpasternak35@gmail.com> Co-authored-by: Eugene Mandel <eugene@superconductivehealth.com>
…xpectations#2091) * Re-enable connection for sqlite, snowflake, mssql Signed-off-by: James Campbell <james.p.campbell@gmail.com> * adding spark fixture * Running Isort to fix lint errors. * linting * isort * Linting. * big query temporary table name * linting * WIP Core Concepts Signed-off-by: James Campbell <james.p.campbell@gmail.com> * lint * lint * lint * cleanup * Add docs gitignore * Typo. * Tidy and use SparkDFBatchData * Bring changes from great-expectations#2029 / great-expectations#2039 Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Fix merge errors. * Lint. * comment * Moving the batch_kwargs module to its original (Legacy Implementation) location. * Moving the batch_kwargs module to its original (Legacy Implementation) location. * Moving the batch_kwargs module to its original (Legacy Implementation) location. * Restoring BatchKwargs for the Legacy backward compatibility. * Linting * Moving the batch_kwargs module to its original (Legacy Implementation) location. * Moving the batch_kwargs module to its original (Legacy Implementation) location. * Moving the batch_kwargs module to its original (Legacy Implementation) location. * Moving the batch_kwargs module to its original (Legacy Implementation) location. * Moving the batch_kwargs module to its original (Legacy Implementation) location. * Moving the batch_kwargs module to its original (Legacy Implementation) location. * Switch SparkDFBatchData into try-except import block * Lint * Update how_to_configure_a_redshift_datasource.rst * small formatting change * Additional decoupling from Legacy datasource implementation. * Typo * Linting. 
* Anonymizers * Anonymizers * Anonymizers * Linting * add_column_row_condition fully tested * Initial tests for execution engine parent class passing * Execution Engine tests finished * Update how_to_configure_a_pandas_filesystem_datasource.rst * Update doc * Rename creating_modular_expectations.rst to how_to_create_modular_expectations.rst * Turn code into code blocks * Significant updates to formatting Almost ready for review * More formatting additions * Ready for review * WIP * Core Concepts Update Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_redshift_datasource.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_redshift_datasource.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * last bit of clean up. how-to-redshift-datasource * update how-to-write-how-to and clean up * typo * Guide finished * updates from review * Update how_to_configure_a_pandas_filesystem_datasource.rst * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_pandas_filesystem_datasource.rst * first push before PR * Update docs/conf.py * Update how_to_configure_a_pandas_filesystem_datasource.rst * Corrections such as unneeded method and needed Anonymizer. * remove legacy diff description * Update docs/guides/how_to_guides/creating_and_editing_expectations/how_to_create_parameterized_expectations_super_fast.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Update docs/guides/how_to_guides/creating_and_editing_expectations/how_to_create_parameterized_expectations_super_fast.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * first push for how-to-sparkdf-filesystem doc * added some more references * first push. 
Before adding blurb on introspection and query * first push of doc + bugfix for query * Core Concepts Update Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Core Concepts Update Signed-off-by: James Campbell <james.p.campbell@gmail.com> * added output * Formatting issues but information much improved * Update docs/conf.py * Slight formatting improvement * first push how-to-mysql-datasource * Complete for review pt.2 * some formatting changes * Update docs/conf.py * Renaming Datasource to LegacyDatasource, DatasourceConfig to LegacyDatasourceConfig, DatasourceConfigSchema to LegacyDatasourceConfigSchema, and datasourceConfigSchema to legacyDatasourceConfigSchema. * s/StreamlinedSqlExecutionEngine/SimpleSqlalchemyDatasource/ * Renaming ExecutionEnvironment to Datasource * Update docs/guides/how_to_guides/creating_and_editing_expectations/how_to_create_modular_expectations.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Update docs/guides/how_to_guides/creating_and_editing_expectations/how_to_create_modular_expectations.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Update docs/guides/how_to_guides/creating_and_editing_expectations/how_to_create_modular_expectations.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Update docs/guides/how_to_guides/creating_and_editing_expectations/how_to_create_modular_expectations.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Update docs/guides/how_to_guides/creating_and_editing_expectations/how_to_create_modular_expectations.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Linting. 
* renaming execution_environment into new_datasource WIP * renaming execution_environment into new_datasource WIP * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_pandas_filesystem_datasource.rst * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_spark_filesystem_datasource.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_spark_filesystem_datasource.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * removed spark output from doc * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_pandas_s3_datasource.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * updates from Pr review (how to pandas s3) * trailing line * renamed test_asset * ExecutionEnvironment -> Datasource * Linting. * oops * dotting some i's * Merge. * Clean up. * Clean up. * Clean up. * Linting. * updates from a closer look * String literals renaming. * Remove unusued execution_environment code from base.py and add a comment to make Schema validation robust for the new and the Legacy Datasource classes alike. 
* Update how_to_configure_a_snowflake_datasource.rst * Add sample pngs * Add doc for How to Create Renderers for Custom Expectations * Update renderer to return typed RenderedTableContent * Add typehints * Add sample images * Add page ref * Fill out how-to guide * Crop image * lint * Remove batch_definition from get_batch and get_validator * Tidy up error handling * Add much better tests for get_validator; Switch attach_new_expectation_suite to create_expectation_suite_with_name * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_snowflake_datasource.rst * Propagate name changes through new tests * Make black happy * Apply suggestions from code review removed `role: ADMIN` and `warehouse` from config * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_snowflake_datasource.rst * Make isort happy * Add docstrings for MetricProvider and Expectation Signed-off-by: James Campbell <james.p.campbell@gmail.com> * PR Review updates Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Delete .test_durations * Delete Untitled.ipynb * Delete ge_docs_links.csv * Commit cleanup Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Ensure BatchDefinition is serializable Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Fix import and custom expectation issues Signed-off-by: James Campbell <james.p.campbell@gmail.com> * WIP custom expectations docs fixes Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Use legacy logic for default expectation values. Signed-off-by: James Campbell <james.p.campbell@gmail.com> * lint * Update link Signed-off-by: James Campbell <james.p.campbell@gmail.com> * 1. how_to_configure_a_redshift_datasource: a. Added step 5 - save the config - and modified the note in step 4 b. Fixed the class name - it was out of date 2. 
how_to_configure_a_snowflake_datasource: Fixed the class name in the output snippet - it was out of date * Update changelog, version for release Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Delete tmp--pyproject.toml * Spark Self-Managed WIP * Self-managed Spark WIP * Self-managed Spark documentation/UAT WIP. * Self-managed Spark documentation/UAT WIP. * reset * Pin to legacy pip Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Remove TableDataConnector -- it is a broken unimplemented module that was copied and pasted from Legacy design, but not worked on. * Self-managed Spark documentation/UAT WIP. * UAT-based updates to parameterized expectations Signed-off-by: James Campbell <james.p.campbell@gmail.com> * Self-Managed Spark Datasource HOWTO guide. * Self-Managed Spark Datasource HOWTO guide. * Self-Managed Spark Datasource HOWTO guide. * Docs/draft docs for data connectors (great-expectations#2086) Changes proposed in this pull request: This PR adds how-to guides for new-style DataConnectors and Datasources for 0.13. Co-authored-by: Abe Gong <abegong@users.noreply.github.com> Co-authored-by: William Shin <will@superconductive.com> * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_self_managed_spark_datasource.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_self_managed_spark_datasource.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_self_managed_spark_datasource.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Update docs/guides/how_to_guides/configuring_datasources/how_to_configure_a_self_managed_spark_datasource.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * Enhance self_check of Datasource class. 
* Linting * ExecutionEngine Config [WIP] * ExecutionEngine Config [WIP] * Update links * Add docs tab for experimental API * Add admonition pointing user to docs for experimental API * Update docs/guides/how_to_guides/creating_and_editing_expectations/how_to_create_parameterized_expectations_super_fast.rst Co-authored-by: Abe Gong <abegong@users.noreply.github.com> * ExecutionEngine Config * Tests for more expressive Datasource.self_check() diagnostic. * Linting. * TypeError fix for numeric utility. * Additional test. Co-authored-by: James Campbell <james.p.campbell@gmail.com> Co-authored-by: Rob Lim <robert.m.lim@gmail.com> Co-authored-by: William Shin <will@superconductive.com> Co-authored-by: Abe Gong <abegong@users.noreply.github.com> Co-authored-by: Abe <abegong@gmail.com> Co-authored-by: gilpasternak35 <gilpasternak35@gmail.com> Co-authored-by: Eugene Mandel <eugene@superconductivehealth.com>
Describe the bug
If you ask GE for a complete result from `expect_column_values_to_not_be_null` (i.e. with the result format set to `COMPLETE`), GE tries to return a result containing samples of the values that failed the expectation. This fails when one of those values is the new pandas `NA` scalar, because GE's internals cannot convert it.

To Reproduce
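A minimal sketch of the failure mode (this is not GE's actual serialization code — the DataFrame, column name, and the `json.dumps` stand-in are illustrative assumptions): `pd.NA` is not JSON-serializable and is ambiguous in boolean context, so naively returning it in a result payload raises.

```python
import json

import pandas as pd

# A nullable-integer column containing the new pandas NA scalar.
df = pd.DataFrame({"col": pd.array([1, None, 3], dtype="Int64")})

# Collect the values that would fail expect_column_values_to_not_be_null.
failed_values = df["col"][df["col"].isna()].tolist()  # contains pd.NA

# pd.NA cannot be serialized like a plain Python value:
try:
    json.dumps(failed_values)
except TypeError as exc:
    print("serialization failed:", exc)

# One possible fix: map pd.NA to None before returning the result.
cleaned = [None if value is pd.NA else value for value in failed_values]
print(json.dumps(cleaned))  # → "[null]"
```

The same `None`-mapping idea is what the "Expected behavior" section below suggests as one acceptable resolution.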
Error message:
Expected behavior
GE should return a result rather than fail — either including `pd.NA` itself, or replacing it with a string representation of `pd.NA`, or with `None`, in the returned list of failed values.

Environment (please complete the following information):