BasicSuiteBuilderProfiler includes incorrect expectations when using included_expectations #1422

Closed
Aylr opened this issue May 12, 2020 · 0 comments · Fixed by #1445
Labels
bug Bugs bugs bugs!

Aylr commented May 12, 2020

Describe the bug
When specifying included_expectations in the scaffold notebook, other expectations appear in the resulting suite.

To Reproduce

  1. Use the movie ratings data, specifically the ratings table.
  2. Run great_expectations suite scaffold foo
  3. Uncomment all columns for inclusion in the suite.
  4. Configure and run the profiler as follows:
scaffold_config = {
    "included_columns": included_columns,
    "included_expectations": ["expect_column_values_to_not_be_null"],
}
suite, evr = BasicSuiteBuilderProfiler().profile(batch, profiler_configuration=scaffold_config)
  5. Run the remainder of the notebook.
  6. In Data Docs, note that the suite contains both table-level expectations and possibly other expectations (in this case, one of type expect_column_values_to_be_unique).

Expected behavior
When the BasicSuiteBuilderProfiler is configured with included_expectations, only expectations of the listed types should appear in the resulting suite.
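
The expected behavior can be written as a small check against the suite's JSON. This is a minimal sketch that works on plain dicts only. The helper name `unexpected_expectation_types` is hypothetical and not part of Great Expectations, and `suite_json` is an abbreviated stand-in for the suite reported in this issue.

```python
from typing import Iterable, List


def unexpected_expectation_types(suite: dict, included: Iterable[str]) -> List[str]:
    """Return the sorted expectation types found in `suite` that are not in `included`."""
    allowed = set(included)
    found = {exp["expectation_type"] for exp in suite.get("expectations", [])}
    return sorted(found - allowed)


# Abbreviated stand-in for the reported suite: only the fields the check needs.
suite_json = {
    "expectations": [
        {"expectation_type": "expect_column_values_to_not_be_null", "kwargs": {"column": "userId"}},
        {"expectation_type": "expect_column_values_to_be_unique", "kwargs": {"column": "movieId"}},
        {"expectation_type": "expect_column_values_to_not_be_null", "kwargs": {"column": "movieId"}},
        {"expectation_type": "expect_column_values_to_not_be_null", "kwargs": {"column": "rating"}},
        {"expectation_type": "expect_column_values_to_be_unique", "kwargs": {"column": "timestamp"}},
        {"expectation_type": "expect_column_values_to_not_be_null", "kwargs": {"column": "timestamp"}},
    ]
}

extra = unexpected_expectation_types(suite_json, ["expect_column_values_to_not_be_null"])
print(extra)  # ['expect_column_values_to_be_unique']
```

An empty list would mean the suite honors included_expectations. For the suite in this report the list is non-empty, which is the bug.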

Environment (please complete the following information):

  • OS: macOS
  • GE Version: 0.10.9+30.g676e89c7

Additional context
The exclusion options may have a similar bug, so verify those as well.
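
One way to verify the exclusion side is the mirror-image check below. It is a sketch under assumptions: it reads the `excluded_expectations` and `excluded_columns` keys from a config dict shaped like the scaffold config, and it assumes any expectation that matches either list counts as a violation. How the profiler itself interprets these keys is not shown in this issue. The helper name `exclusion_violations` and the sample data are hypothetical.

```python
from typing import List, Optional, Tuple


def exclusion_violations(suite: dict, config: dict) -> List[Tuple[str, Optional[str]]]:
    """Return (expectation_type, column) pairs in `suite` that match an exclusion in `config`."""
    excluded_types = set(config.get("excluded_expectations") or [])
    excluded_columns = set(config.get("excluded_columns") or [])
    violations = []
    for exp in suite.get("expectations", []):
        exp_type = exp["expectation_type"]
        # Table-level expectations have no "column" kwarg, so this may be None.
        column = exp.get("kwargs", {}).get("column")
        if exp_type in excluded_types or column in excluded_columns:
            violations.append((exp_type, column))
    return violations


# Hypothetical suite and config, for illustration only.
sample_suite = {
    "expectations": [
        {"expectation_type": "expect_column_values_to_not_be_null", "kwargs": {"column": "userId"}},
        {"expectation_type": "expect_column_values_to_be_unique", "kwargs": {"column": "movieId"}},
        {"expectation_type": "expect_column_values_to_not_be_null", "kwargs": {"column": "rating"}},
    ]
}
sample_config = {
    "excluded_expectations": ["expect_column_values_to_be_unique"],
    "excluded_columns": ["rating"],
}

violations = exclusion_violations(sample_suite, sample_config)
print(violations)
```

An empty result would indicate that both exclusion lists were honored for that suite.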

Resulting suite:

{
  "data_asset_type": "Dataset",
  "expectation_suite_name": "ratings-small",
  "expectations": [
    {
      "expectation_type": "expect_column_values_to_not_be_null",
      "kwargs": {
        "column": "userId"
      },
      "meta": {
        "BasicSuiteBuilderProfiler": {
          "confidence": "very low"
        }
      }
    },
    {
      "expectation_type": "expect_column_values_to_be_unique",
      "kwargs": {
        "column": "movieId"
      },
      "meta": {
        "BasicSuiteBuilderProfiler": {
          "confidence": "very low"
        }
      }
    },
    {
      "expectation_type": "expect_column_values_to_not_be_null",
      "kwargs": {
        "column": "movieId"
      },
      "meta": {
        "BasicSuiteBuilderProfiler": {
          "confidence": "very low"
        }
      }
    },
    {
      "expectation_type": "expect_column_values_to_not_be_null",
      "kwargs": {
        "column": "rating"
      },
      "meta": {
        "BasicSuiteBuilderProfiler": {
          "confidence": "very low"
        }
      }
    },
    {
      "expectation_type": "expect_column_values_to_be_unique",
      "kwargs": {
        "column": "timestamp"
      },
      "meta": {
        "BasicSuiteBuilderProfiler": {
          "confidence": "very low"
        }
      }
    },
    {
      "expectation_type": "expect_column_values_to_not_be_null",
      "kwargs": {
        "column": "timestamp"
      },
      "meta": {
        "BasicSuiteBuilderProfiler": {
          "confidence": "very low"
        }
      }
    }
  ],
  "meta": {
    "BasicSuiteBuilderProfiler": {
      "batch_kwargs": {
        "datasource": "movies",
        "schema": "public",
        "table": "ratings_small"
      },
      "created_at": 1589298770.031562,
      "created_by": "BasicSuiteBuilderProfiler"
    },
    "citations": [
      {
        "batch_kwargs": {
          "datasource": "movies",
          "schema": "public",
          "table": "ratings_small"
        },
        "batch_markers": {
          "ge_load_time": "20200512T155247.969134Z"
        },
        "batch_parameters": null,
        "citation_date": "2020-05-12T09:52:50.588720",
        "comment": "BasicSuiteBuilderProfiler added a citation based on the current batch."
      }
    ],
    "columns": {
      "movieId": {
        "description": ""
      },
      "rating": {
        "description": ""
      },
      "timestamp": {
        "description": ""
      },
      "userId": {
        "description": ""
      }
    },
    "great_expectations.__version__": "0.10.9+21.g5c2bcc3c.dirty",
    "notes": {
      "content": [
        "_To add additional notes, edit the <code>meta.notes.content</code> field in the appropriate Expectation json file._"
      ],
      "format": "markdown"
    }
  }
}

Full Repro code:

from datetime import datetime
import great_expectations as ge
import great_expectations.jupyter_ux
from great_expectations.profile import BasicSuiteBuilderProfiler
from great_expectations.data_context.types.resource_identifiers import (
    ValidationResultIdentifier,
)

context = ge.data_context.DataContext()

expectation_suite_name = "ratings-small"
suite = context.create_expectation_suite(
    expectation_suite_name, overwrite_existing=True
)

batch_kwargs = {"table": "ratings_small", "schema": "public", "datasource": "movies"}
batch = context.get_batch(batch_kwargs, suite)
batch.head()

included_columns = [
    'userId',
    'movieId',
    'rating',
    'timestamp'
]

# Wipe the suite clean to prevent unwanted expectations on the batch
suite = context.create_expectation_suite(expectation_suite_name, overwrite_existing=True)
batch = context.get_batch(batch_kwargs, suite)

scaffold_config = {
    "included_columns": included_columns,
    # "excluded_columns": [],
    "included_expectations": ["expect_column_values_to_not_be_null"],
    # "excluded_expectations": [],
}

suite, evr = BasicSuiteBuilderProfiler().profile(batch, profiler_configuration=scaffold_config)

context.save_expectation_suite(suite, expectation_suite_name)

# Let's make a simple sortable timestamp. Note this could come from your pipeline runner.
run_id = datetime.utcnow().strftime("%Y%m%dT%H%M%S.%fZ")

results = context.run_validation_operator("action_list_operator", assets_to_validate=[batch], run_id=run_id)
expectation_suite_identifier = list(results["details"].keys())[0]
validation_result_identifier = ValidationResultIdentifier(
    expectation_suite_identifier=expectation_suite_identifier,
    batch_identifier=batch.batch_kwargs.to_id(),
    run_id=run_id
)
context.build_data_docs()
context.open_data_docs(validation_result_identifier)
@Aylr Aylr added the bug Bugs bugs bugs! label May 12, 2020
jcampbell added a commit that referenced this issue May 19, 2020
…1422 (#1445)

* fix issue where extra expectations included by BasicSuiteBuilderProfiler

* ran linter

* Update changelog

Co-authored-by: James Campbell <james.p.campbell@gmail.com>
alexsherstinsky pushed a commit to alexsherstinsky/great_expectations that referenced this issue Feb 19, 2021
…reat-expectations#1422 (great-expectations#1445)