BasicSuiteBuilderProfiler includes incorrect expectations when using `included_expectations` #1422

Aylr · 2020-05-12T16:02:49Z

Describe the bug
When specifying included_expectations in the scaffold notebook, other expectations appear in the resulting suite.

To Reproduce

Use the movie ratings data, specifically the ratings table.
Run great_expectations suite scaffold foo.
Uncomment all columns for inclusion in the suite.
Configure and run the profiler as follows:

scaffold_config = {
    "included_columns": included_columns,
    "included_expectations": ["expect_column_values_to_not_be_null"],
}
suite, evr = BasicSuiteBuilderProfiler().profile(batch, profiler_configuration=scaffold_config)

Run the remainder of the notebook.
In Data Docs, note that the suite contains both table-level expectations and possibly other expectations (in this case, one of type expect_column_values_to_be_unique).

Expected behavior
When configuring the BasicSuiteBuilderProfiler with included_expectations, only expectations of that type should remain in the resulting suite.
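The expected behavior can be phrased as a check on the saved suite. The following is a minimal sketch, not part of Great Expectations: `find_unexpected` is a hypothetical helper that takes the suite as a plain dict (the same shape as the JSON file) and returns every expectation whose type is not in the `included_expectations` list. The `"<table>"` placeholder for expectations without a `column` kwarg is also an assumption made for illustration.

```python
from typing import Dict, List, Tuple


def find_unexpected(
    suite: Dict, included_expectations: List[str]
) -> List[Tuple[str, str]]:
    """Return (expectation_type, column) pairs whose type is not allowed.

    Hypothetical helper for this issue; it only inspects the suite dict.
    """
    allowed = set(included_expectations)
    return [
        # Table-level expectations have no "column" kwarg, so label them.
        (exp["expectation_type"], exp.get("kwargs", {}).get("column", "<table>"))
        for exp in suite.get("expectations", [])
        if exp["expectation_type"] not in allowed
    ]


# A trimmed-down suite with one allowed and one leaked expectation.
sample_suite = {
    "expectations": [
        {
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {"column": "userId"},
        },
        {
            "expectation_type": "expect_column_values_to_be_unique",
            "kwargs": {"column": "movieId"},
        },
    ]
}

leaks = find_unexpected(sample_suite, ["expect_column_values_to_not_be_null"])
print(leaks)
```

With the fix in place, the returned list should be empty for a suite built with `included_expectations`.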

Environment (please complete the following information):

OS: macOS
GE Version: 0.10.9+30.g676e89c7

Additional context
The excluded_expectations option may have a similar bug, so verify that as well.

Resulting suite:

{
  "data_asset_type": "Dataset",
  "expectation_suite_name": "ratings-small",
  "expectations": [
    {
      "expectation_type": "expect_column_values_to_not_be_null",
      "kwargs": {
        "column": "userId"
      },
      "meta": {
        "BasicSuiteBuilderProfiler": {
          "confidence": "very low"
        }
      }
    },
    {
      "expectation_type": "expect_column_values_to_be_unique",
      "kwargs": {
        "column": "movieId"
      },
      "meta": {
        "BasicSuiteBuilderProfiler": {
          "confidence": "very low"
        }
      }
    },
    {
      "expectation_type": "expect_column_values_to_not_be_null",
      "kwargs": {
        "column": "movieId"
      },
      "meta": {
        "BasicSuiteBuilderProfiler": {
          "confidence": "very low"
        }
      }
    },
    {
      "expectation_type": "expect_column_values_to_not_be_null",
      "kwargs": {
        "column": "rating"
      },
      "meta": {
        "BasicSuiteBuilderProfiler": {
          "confidence": "very low"
        }
      }
    },
    {
      "expectation_type": "expect_column_values_to_be_unique",
      "kwargs": {
        "column": "timestamp"
      },
      "meta": {
        "BasicSuiteBuilderProfiler": {
          "confidence": "very low"
        }
      }
    },
    {
      "expectation_type": "expect_column_values_to_not_be_null",
      "kwargs": {
        "column": "timestamp"
      },
      "meta": {
        "BasicSuiteBuilderProfiler": {
          "confidence": "very low"
        }
      }
    }
  ],
  "meta": {
    "BasicSuiteBuilderProfiler": {
      "batch_kwargs": {
        "datasource": "movies",
        "schema": "public",
        "table": "ratings_small"
      },
      "created_at": 1589298770.031562,
      "created_by": "BasicSuiteBuilderProfiler"
    },
    "citations": [
      {
        "batch_kwargs": {
          "datasource": "movies",
          "schema": "public",
          "table": "ratings_small"
        },
        "batch_markers": {
          "ge_load_time": "20200512T155247.969134Z"
        },
        "batch_parameters": null,
        "citation_date": "2020-05-12T09:52:50.588720",
        "comment": "BasicSuiteBuilderProfiler added a citation based on the current batch."
      }
    ],
    "columns": {
      "movieId": {
        "description": ""
      },
      "rating": {
        "description": ""
      },
      "timestamp": {
        "description": ""
      },
      "userId": {
        "description": ""
      }
    },
    "great_expectations.__version__": "0.10.9+21.g5c2bcc3c.dirty",
    "notes": {
      "content": [
        "_To add additional notes, edit the <code>meta.notes.content</code> field in the appropriate Expectation json file._"
      ],
      "format": "markdown"
    }
  }
}
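To make the problem in the suite above easier to see, the expectation types can be tallied. This is a small sketch using only the standard library; the `(column, expectation_type)` pairs are copied by hand from the JSON above rather than read from a file.

```python
from collections import Counter

# (column, expectation_type) pairs transcribed from the resulting suite.
pairs = [
    ("userId", "expect_column_values_to_not_be_null"),
    ("movieId", "expect_column_values_to_be_unique"),
    ("movieId", "expect_column_values_to_not_be_null"),
    ("rating", "expect_column_values_to_not_be_null"),
    ("timestamp", "expect_column_values_to_be_unique"),
    ("timestamp", "expect_column_values_to_not_be_null"),
]

# Count how many expectations of each type ended up in the suite.
counts = Counter(expectation_type for _, expectation_type in pairs)

for expectation_type, n in counts.most_common():
    print(f"{n} x {expectation_type}")
```

Only expect_column_values_to_not_be_null was requested, so the two expect_column_values_to_be_unique entries (on movieId and timestamp) are the unwanted ones.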

Full Repro code:

from datetime import datetime
import great_expectations as ge
import great_expectations.jupyter_ux
from great_expectations.profile import BasicSuiteBuilderProfiler
from great_expectations.data_context.types.resource_identifiers import (
    ValidationResultIdentifier,
)

context = ge.data_context.DataContext()

expectation_suite_name = "ratings-small"
suite = context.create_expectation_suite(
    expectation_suite_name, overwrite_existing=True
)

batch_kwargs = {"table": "ratings_small", "schema": "public", "datasource": "movies"}
batch = context.get_batch(batch_kwargs, suite)
batch.head()

included_columns = [
    'userId',
    'movieId',
    'rating',
    'timestamp'
]

# Wipe the suite clean to prevent unwanted expectations on the batch
suite = context.create_expectation_suite(expectation_suite_name, overwrite_existing=True)
batch = context.get_batch(batch_kwargs, suite)

scaffold_config = {
    "included_columns": included_columns,
#     "excluded_columns": [],
    "included_expectations": ["expect_column_values_to_not_be_null"],
#     "excluded_expectations": [],
}

suite, evr = BasicSuiteBuilderProfiler().profile(batch, profiler_configuration=scaffold_config)

context.save_expectation_suite(suite, expectation_suite_name)

# Let's make a simple sortable timestamp. Note this could come from your pipeline runner.
run_id = datetime.utcnow().strftime("%Y%m%dT%H%M%S.%fZ")

results = context.run_validation_operator("action_list_operator", assets_to_validate=[batch], run_id=run_id)
expectation_suite_identifier = list(results["details"].keys())[0]
validation_result_identifier = ValidationResultIdentifier(
    expectation_suite_identifier=expectation_suite_identifier,
    batch_identifier=batch.batch_kwargs.to_id(),
    run_id=run_id
)
context.build_data_docs()
context.open_data_docs(validation_result_identifier)

…1422 (#1445) * fix issue where extra expectations included by BasicSuiteBuilderProfiler * ran linter * Update changelog Co-authored-by: James Campbell <james.p.campbell@gmail.com>

Aylr added the bug Bugs bugs bugs! label May 12, 2020

roselightheart mentioned this issue May 15, 2020

[BUGFIX] fix extra expectations included by BasicSuiteBuilderProfiler #1422 #1445

Merged

jcampbell closed this as completed in #1445 May 19, 2020
