
Commit

Merge pull request #494 from great-expectations/feature/data_context_namespace

Feature/data context namespace
jcampbell committed Jun 17, 2019
2 parents 17d8fa9 + c9fbefc commit 7b2493e
Showing 54 changed files with 2,637 additions and 714 deletions.
6 changes: 3 additions & 3 deletions docs/source/evaluation_parameters.rst
@@ -24,7 +24,7 @@ value that should be used during the initial evaluation of the expectation.
}
You can also store parameter values in a special dictionary called evaluation_parameters that is stored in the \
-expectations_config to be available to multiple expectations or while declaring additional expectations.
+expectation_suite to be available to multiple expectations or while declaring additional expectations.

.. code-block:: python
@@ -42,7 +42,7 @@ When validating expectations, you can provide evaluation parameters based on ups…
.. code-block:: python
->> my_df.validate(expectations_config=my_dag_step_config, evaluation_parameters={"upstream_row_count": upstream_row_count})
+>> my_df.validate(expectation_suite=my_dag_step_config, evaluation_parameters={"upstream_row_count": upstream_row_count})
Finally, the command-line tool also allows you to provide a JSON file that contains parameters to use during evaluation:
@@ -52,4 +52,4 @@ Finally, the command-line tool also allows you to provide a JSON file that contains parameters to use during evaluation:
{
"upstream_row_count": 10
}
->> great_expectations validate --evaluation_paramters=my_parameters_file.json dataset_file.csv expectations_config.json
+>> great_expectations validate --evaluation_parameters=my_parameters_file.json dataset_file.csv expectation_suite.json
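Taken together, the renamed keyword looks like this in practice — a minimal sketch based only on the calls shown in this diff (the file paths and the upstream row count are hypothetical, and the top-level "success" key is assumed):

    import json

    import great_expectations as ge

    # Hypothetical inputs: a CSV from an upstream DAG step and a saved suite.
    upstream_row_count = 10
    my_df = ge.read_csv("./data/my_dag_step.csv")
    with open("my_dag_step_suite.json") as f:
        my_dag_step_config = json.load(f)

    # expectations_config= is now expectation_suite=; evaluation_parameters is unchanged.
    results = my_df.validate(
        expectation_suite=my_dag_step_config,
        evaluation_parameters={"upstream_row_count": upstream_row_count},
    )
    print(results["success"])  # assumed result key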
4 changes: 2 additions & 2 deletions docs/source/expectations.rst
@@ -125,13 +125,13 @@ You can also add notes or even structured metadata to expectations to describe t…
Saving Expectations
------------------------------------------------------------------------------
-At the end of your exploration, call `save_expectations` to store all Expectations from your session to your pipeline test files.
+At the end of your exploration, call `save_expectation_suite` to store all Expectations from your session to your pipeline test files.
This is how you always know what to expect from your data.
.. code-block:: bash
->> my_df.save_expectations("my_titanic_expectations.json")
+>> my_df.save_expectation_suite("my_titanic_expectations.json")
For more detail on how to control expectation output, please see :ref:`standard_arguments` and :ref:`result_format`.
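As a quick sketch of the renamed save call in an exploration session (the column name is illustrative; the CSV path comes from the validation docs later in this diff):

    import great_expectations as ge

    my_df = ge.read_csv("./tests/examples/titanic.csv")
    my_df.expect_column_to_exist("Name")  # accumulate Expectations while exploring

    # Formerly my_df.save_expectations(...)
    my_df.save_expectation_suite("my_titanic_expectations.json")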
2 changes: 1 addition & 1 deletion docs/source/profiling.rst
@@ -16,7 +16,7 @@ profiler class that will evaluate a dataset object and add expectations to it.
>> import great_expectations as ge
>> df = ge.dataset.PandasDataset({"col": [1, 2, 3, 4, 5]})
>> df.profile(ge.profile.ColumnsExistProfiler)
->> df.get_expectations()
+>> df.get_expectation_suite()
{'dataset_name': None,
'meta': {'great_expectations.__version__': '0.4.4__develop'},
'expectations': [
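Connecting the profiling hunk above to the renamed save call, a hedged sketch (the output path is hypothetical):

    import great_expectations as ge

    df = ge.dataset.PandasDataset({"col": [1, 2, 3, 4, 5]})
    df.profile(ge.profile.ColumnsExistProfiler)  # adds column-existence expectations

    suite = df.get_expectation_suite()           # formerly df.get_expectations()
    df.save_expectation_suite("col_suite.json")  # formerly df.save_expectations(...)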
4 changes: 2 additions & 2 deletions docs/source/standard_arguments.rst
@@ -7,7 +7,7 @@ Standard arguments for expectations
All Expectations return a json-serializable dictionary when evaluated, and share four standard (optional) arguments:

- :ref:`result_format`: controls what information is returned from the evaluation of the expectation.
-- :ref:`include_config`: If true, then the expectation config itself is returned as part of the result object.
+- :ref:`include_config`: If true, then the expectation suite itself is returned as part of the result object.
- :ref:`catch_exceptions`: If true, execution will not fail if the Expectation encounters an error. Instead, it will \
return success = False and provide an informative error message.
- :ref:`meta`: allows user-supplied meta-data to be stored with an expectation.
@@ -23,7 +23,7 @@ See :ref:`result_format` for more information.
`include_config`
------------------------------------------------------------------------------

-All Expectations accept a boolean `include_config` parameter. If true, then the expectation config itself is returned as part of the result object
+All Expectations accept a boolean `include_config` parameter. If true, then the expectation suite itself is returned as part of the result object

.. code-block:: bash
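A minimal sketch of `include_config` in action, assuming a PandasDataset and an illustrative column:

    import great_expectations as ge

    df = ge.dataset.PandasDataset({"col": [1, 2, 3]})
    result = df.expect_column_values_to_be_between("col", 1, 3, include_config=True)

    # In addition to the success flag, the result now carries the expectation's
    # own configuration (its type and kwargs), so it can be stored or replayed.
    print(result)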
4 changes: 2 additions & 2 deletions docs/source/validation.rst
@@ -10,10 +10,10 @@ Once you've constructed and stored Expectations, you can use them to validate ne…
>> import json
>> import great_expectations as ge
->> my_expectations_config = json.load(file("my_titanic_expectations.json"))
+>> my_expectation_suite = json.load(file("my_titanic_expectations.json"))
>> my_df = ge.read_csv(
"./tests/examples/titanic.csv",
-expectations_config=my_expectations_config
+expectation_suite=my_expectation_suite
)
>> my_df.validate()
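A short follow-on sketch for inspecting the outcome (the "success" and "results" keys are assumed from contemporary validation output, not shown in this hunk):

    import json

    import great_expectations as ge

    with open("my_titanic_expectations.json") as f:
        my_expectation_suite = json.load(f)

    my_df = ge.read_csv(
        "./tests/examples/titanic.csv",
        expectation_suite=my_expectation_suite,
    )
    results = my_df.validate()

    # Assumed shape: one entry per evaluated expectation, plus a top-level flag.
    failed = [r for r in results["results"] if not r["success"]]
    print("{} expectations failed".format(len(failed)))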
14 changes: 7 additions & 7 deletions examples/integrations/airflow/operators/expectation_operator.py
@@ -32,7 +32,7 @@ def __init__(self,
Validate provided dataset using great_expectations.
:param dataset: Name of the dataset being loaded
:type str
-:param expectations_json: file pointing to expectation config or json string
+:param expectations_json: file pointing to expectation suite or json string
:type str
:param fail_on_error: True if airflow job should fail when expectations fail
:type bool
@@ -137,15 +137,15 @@ def _get_dataframe(self):

def _load_json(self):
"""
-Load expectation config based on operator parameters. If provided expectations_json is a file the config will
-be loaded from this file. Otherwise we'll try to load the config as a string.
+Load expectation suite based on operator parameters. If provided expectations_json is a file the suite will
+be loaded from this file. Otherwise we'll try to load the expectation suite as a string.
:return:
"""
if os.path.isfile(self.expectations_json):
self.log.info("Loading expectation config from file {file}".format(file=self.expectations_json))
self.log.info("Loading expectation suite from file {file}".format(file=self.expectations_json))
return json.load(open(self.expectations_json))
else:
self.log.info("Loading expectation config from string")
self.log.info("Loading expectation suite from string")
return json.loads(self.expectations_json)

def _store_results(self, results):
@@ -154,9 +154,9 @@ def _store_results(self, results):

def execute(self, context):
df = self._get_dataframe()
-config = self._load_json()
+suite = self._load_json()
self.log.info("Start dataset validation for set {set}".format(set=self.dataset_name))
-results = df.validate(expectations_config=config)
+results = df.validate(expectation_suite=suite)

self.log.info(pformat(results))

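The file-or-string dispatch in `_load_json` also works as a standalone helper; a minimal sketch under that assumption (the function name is ours, not the operator's):

    import json
    import os

    def load_expectation_suite(expectations_json):
        """Load an expectation suite from a file path or a raw JSON string."""
        if os.path.isfile(expectations_json):
            with open(expectations_json) as f:
                return json.load(f)
        return json.loads(expectations_json)

    suite = load_expectation_suite('{"expectations": []}')  # JSON-string form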
2 changes: 1 addition & 1 deletion examples/notebooks/Crop_Expectations_With_Reshape.ipynb
@@ -166,7 +166,7 @@
},
"outputs": [],
"source": [
"print(json.dumps(df.get_expectations(), indent = 2))"
"print(json.dumps(df.get_expectation_suite(), indent = 2))"
]
}
],
8 changes: 4 additions & 4 deletions examples/notebooks/Distributional_Expectations_Demo.ipynb
@@ -364,7 +364,7 @@
},
"outputs": [],
"source": [
"df.get_expectations()"
"df.get_expectation_suite()"
]
},
{
@@ -459,7 +459,7 @@
},
"outputs": [],
"source": [
"my_expectations = df.get_expectations()"
"my_expectations = df.get_expectation_suite()"
]
},
{
@@ -481,7 +481,7 @@
},
"outputs": [],
"source": [
"results = df_test.validate(expectations_config=my_expectations)\n",
"results = df_test.validate(expectation_suite=my_expectations)\n",
"results"
]
},
@@ -493,7 +493,7 @@
},
"outputs": [],
"source": [
"failures = df_test.validate(expectations_config=my_expectations, only_return_failures=True)\n",
"failures = df_test.validate(expectation_suite=my_expectations, only_return_failures=True)\n",
"failures"
]
},
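Condensed from the notebook cells above, the capture-and-reuse pattern with the renamed keyword (data values are illustrative):

    import great_expectations as ge

    df = ge.dataset.PandasDataset({"x": [1, 2, 3]})
    df.expect_column_values_to_be_between("x", 1, 3)

    df_test = ge.dataset.PandasDataset({"x": [2, 3, 9]})  # new data to check
    my_expectations = df.get_expectation_suite()

    results = df_test.validate(expectation_suite=my_expectations)
    failures = df_test.validate(expectation_suite=my_expectations,
                                only_return_failures=True)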
4 changes: 2 additions & 2 deletions examples/notebooks/explore_titanic_data.ipynb
@@ -136,8 +136,8 @@
},
"outputs": [],
"source": [
"print json.dumps(titanic_df.get_expectations(), indent=2)\n",
"# titanic_df.save_expectations('titanic_expectations.json')"
"print json.dumps(titanic_df.get_expectation_suite(), indent=2)\n",
"# titanic_df.save_expectation_suite('titanic_expectations.json')"
]
},
{
36 changes: 18 additions & 18 deletions great_expectations/cli/cli.py
@@ -45,7 +45,7 @@ def cli():

@cli.command()
@click.argument('dataset')
-@click.argument('expectations_config_file')
+@click.argument('expectation_suite_file')
@click.option('--evaluation_parameters', '-p', default=None,
help='Path to a file containing JSON object used to evaluate parameters in expectations config.')
@click.option('--result_format', '-o', default="SUMMARY",
@@ -59,32 +59,32 @@ def cli():
help='Path to a python module containing a custom dataset class.')
@click.option('--custom_dataset_class', '-c', default=None,
help='Name of the custom dataset class to use during evaluation.')
-def validate(dataset, expectations_config_file, evaluation_parameters, result_format,
+def validate(dataset, expectation_suite_file, evaluation_parameters, result_format,
catch_exceptions, only_return_failures, custom_dataset_module, custom_dataset_class):
"""Validate a CSV file against an expectations configuration.
"""Validate a CSV file against an expectation suite.
-DATASET: Path to a file containing a CSV file to validate using the provided expectations_config_file.
+DATASET: Path to a file containing a CSV file to validate using the provided expectation_suite_file.
-EXPECTATIONS_CONFIG_FILE: Path to a file containing a valid great_expectations expectations config to use to \
+EXPECTATION_SUITE_FILE: Path to a file containing a valid great_expectations expectations suite to use to \
validate the data.
"""

"""
-Read a dataset file and validate it using a config saved in another file. Uses parameters defined in the dispatch
+Read a dataset file and validate it using an expectation suite saved in another file. Uses parameters defined in the dispatch
method.
:param parsed_args: A Namespace object containing parsed arguments from the dispatch method.
:return: The number of unsucessful expectations
"""
-expectations_config_file = expectations_config_file
+expectation_suite_file = expectation_suite_file

-expectations_config = json.load(open(expectations_config_file))
+expectation_suite = json.load(open(expectation_suite_file))

if evaluation_parameters is not None:
evaluation_parameters = json.load(
open(evaluation_parameters, "r"))

-# Use a custom dataasset module and class if provided. Otherwise infer from the config.
+# Use a custom dataasset module and class if provided. Otherwise infer from the expectation suite
if custom_dataset_module:
sys.path.insert(0, os.path.dirname(
custom_dataset_module))
@@ -93,28 +93,28 @@ def validate(dataset, expectations_config_file, evaluation_parameters, result_format,
custom_module = __import__(str(module_name))
dataset_class = getattr(
custom_module, custom_dataset_class)
elif "data_asset_type" in expectations_config:
if (expectations_config["data_asset_type"] == "Dataset" or
expectations_config["data_asset_type"] == "PandasDataset"):
elif "data_asset_type" in expectation_suite:
if (expectation_suite["data_asset_type"] == "Dataset" or
expectation_suite["data_asset_type"] == "PandasDataset"):
dataset_class = PandasDataset
elif expectations_config["data_asset_type"].endswith("Dataset"):
elif expectation_suite["data_asset_type"].endswith("Dataset"):
logger.info("Using PandasDataset to validate dataset of type %s." %
expectations_config["data_asset_type"])
expectation_suite["data_asset_type"])
dataset_class = PandasDataset
elif expectations_config["data_asset_type"] == "FileDataAsset":
elif expectation_suite["data_asset_type"] == "FileDataAsset":
dataset_class = FileDataAsset
else:
logger.critical("Unrecognized data_asset_type %s. You may need to specifcy custom_dataset_module and \
custom_dataset_class." % expectations_config["data_asset_type"])
custom_dataset_class." % expectation_suite["data_asset_type"])
return -1
else:
dataset_class = PandasDataset

if issubclass(dataset_class, Dataset):
-da = read_csv(dataset, expectations_config=expectations_config,
+da = read_csv(dataset, expectation_suite=expectation_suite,
dataset_class=dataset_class)
else:
-da = dataset_class(dataset, config=expectations_config)
+da = dataset_class(dataset, config=expectation_suite)

result = da.validate(
evaluation_parameters=evaluation_parameters,
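For orientation, a rough Python equivalent of the command's default path, using only calls that appear in this diff (file names hypothetical; assumes no custom dataset class and no data_asset_type hint in the suite):

    import json

    import great_expectations as ge
    from great_expectations.dataset import PandasDataset

    with open("expectation_suite.json") as f:
        expectation_suite = json.load(f)

    # Default branch: fall back to PandasDataset, attach the suite, validate.
    da = ge.read_csv("dataset_file.csv",
                     expectation_suite=expectation_suite,
                     dataset_class=PandasDataset)
    result = da.validate(evaluation_parameters=None)
    print(result)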