Merge 4d0e59f into d746536

great-expectations · May 13, 2019 · 99cf079
2 parents d746536 + 4d0e59f
commit 99cf079

Showing 61 changed files with 3,662 additions and 6,435 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -9,8 +9,12 @@ env:
   - PANDAS=0.21.0
   - PANDAS=0.23.4
   - PANDAS=latest
+# jdk:
+#   - openjdk8
 services:
   - postgresql
+# before_install:
+#   - ./run-spark.sh
 install:
   - pip install --only-binary=numpy,scipy numpy scipy
   - if [ "$PANDAS" = "latest" ]; then pip install pandas; else pip install pandas==$PANDAS; fi

diff --git a/README.md b/README.md
@@ -13,7 +13,7 @@ Great Expectations
 What is great_expectations?
 --------------------------------------------------------------------------------
 
-Great Expectations is a framework that helps teams save time and promote analytic integrity with a new twist on automated testing: pipeline tests. Pipeline tests are applied to data (instead of code) and at batch time (instead of compile or deploy time).
+Great Expectations helps teams save time and promote analytic integrity by offering a unique approach to automated testing: pipeline tests. Pipeline tests are applied to data (instead of code) and at batch time (instead of compile or deploy time). Pipeline tests are like unit tests for datasets: they help you guard against upstream data changes and monitor data quality.
 
 Software developers have long known that automated testing is essential for managing complex codebases. Great Expectations brings the same discipline, confidence, and acceleration to data science and engineering teams.
 
@@ -35,7 +35,6 @@ How do I get started?
 
 It's easy! Just use pip install:
 
-
     $ pip install great_expectations
 
 You can also clone the repository, which includes examples of using great_expectations.
@@ -76,7 +75,7 @@ For quick, hands-on introductions to Great Expectations' key features, check out
 What's the best way to get in touch with the Great Expectations team?
 --------------------------------------------------------------------------------
 
-[Issues on GitHub](https://github.com/great-expectations/great_expectations/issues). If you have questions, comments, feature requests, etc., [opening an issue](https://github.com/great-expectations/great_expectations/issues/new) is definitely the best path forward.
+If you have questions, comments, feature requests, etc., [opening an issue](https://github.com/great-expectations/great_expectations/issues/new) is definitely the best path forward. We also have a Slack channel: if you email us at <team@greatexpectations.io> with the subject line "SLACK", we'll get you an invite.
 
 
 Great Expectations doesn't do X. Is it right for my use case?

diff --git a/docs/source/autoinspection.rst b/docs/source/autoinspection.rst
@@ -5,7 +5,7 @@ Autoinspection
 ================================================================================
 
 It can be very convenient to have great expectations automatically review a \
-dataset and suggest expectations that may be appropriate. Currently, there \
+dataset and suggest expectations that may be appropriate. Currently, there's \
 a very basic, but easily extensible, autoinspection capability available.
 
 Dataset objects have an `autoinspect` method which allows you to provide a \

diff --git a/docs/source/data_asset_module.rst b/docs/source/data_asset_module.rst
@@ -33,7 +33,6 @@ great_expectations.data_asset.file_data_asset
         :undoc-members:
         :show-inheritance:
 
-
 great_expectations.data_asset.util
 ----------------------------------
 

diff --git a/docs/source/dataset_module.rst b/docs/source/dataset_module.rst
@@ -3,10 +3,10 @@
 Dataset Module
 ==================================
 
-great_expectations.dataset.base
--------------------------------
+great_expectations.dataset.dataset
+----------------------------------
 
-.. automodule:: great_expectations.dataset.base
+.. automodule:: great_expectations.dataset.dataset
     :members:
     :undoc-members:
     :show-inheritance:
@@ -62,7 +62,7 @@ great_expectations.dataset.util
 
 
 great_expectations.dataset.autoinspect
--------------------------------
+--------------------------------------
 
 .. automodule:: great_expectations.dataset.autoinspect
     :members:

diff --git a/docs/source/expectations.rst b/docs/source/expectations.rst
@@ -139,8 +139,9 @@ Types of Expectations
 ------------------------------------------------------------------------------
 
 Under the hood, great_expectations evaluates similar kinds of expectations using standard logic, including:
+
 * `column_map_expectations`, which apply their condition to each value in a column independently of other values
 * `column_aggregate_expectations`, which apply their condition to an aggregate value or values from the column
 
 In general, if a column is empty, a column_map_expectation will return True (vacuously), whereas a column_aggregate_expectation will return False (since no aggregate value could be computed).
-Adding an expectation about element counts to a set of expectations is usually therefore very important to ensure the overall set of expectations captures the full set of constraints you expect.
+Adding an expectation about element counts to a set of expectations is usually therefore very important to ensure the overall set of expectations captures the full set of constraints you expect.
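
The empty-column behaviour described in this hunk can be illustrated with a minimal plain-Python sketch. The helper names `map_expectation` and `aggregate_expectation` are hypothetical and are not part of great_expectations; they only mirror the documented rule that a map-style check passes vacuously on an empty column while an aggregate-style check fails.

```python
from statistics import mean


def map_expectation(values, condition):
    """Hypothetical helper: apply `condition` to each value independently.

    `all()` over an empty iterable is True, so an empty column passes vacuously.
    """
    return all(condition(v) for v in values)


def aggregate_expectation(values, aggregate, condition):
    """Hypothetical helper: apply `condition` to one aggregate of the column.

    With no values there is no aggregate to compute, so the check fails.
    """
    if not values:
        return False
    return condition(aggregate(values))


empty_column = []
map_result = map_expectation(empty_column, lambda v: v > 0)
agg_result = aggregate_expectation(empty_column, mean, lambda m: m > 0)
print(map_result, agg_result)
```

Because the map-style check cannot tell an empty column from a fully valid one, pairing it with an element-count check is what catches the empty case.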
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -39,6 +39,7 @@ Module Docs
 .. toctree::
    :maxdepth: 2
 
+   data_asset_module
    dataset_module
    data_context_module
 

diff --git a/docs/source/result_format.rst b/docs/source/result_format.rst
@@ -9,7 +9,7 @@ The `result_format` parameter may be either a string or a dictionary which speci
   - For dictionary usage, `result_format` which may include the following keys:
 
     - :ref:`result_format`: Sets the fields to return in result.
-    - partial_unexpected_count: Sets the number of results to include in partial_unexpected_count, if applicable.
+    - partial_unexpected_count: Sets the number of results to include in partial_unexpected_count, if applicable. If set to 0, this will suppress the unexpected counts.
 
 
 `result_format`
@@ -31,7 +31,7 @@ including interactive exploratory work and automatic validation.
 +---------------------------------------+----------------+----------------+----------------+----------------+
 |    details (dictionary)               |Defined on a per-expectation basis                                 |
 +---------------------------------------+----------------+----------------+----------------+----------------+
-| Fields defined for `column_map_expectation` type expectations:                                       |
+| Fields defined for `column_map_expectation` type expectations:                                            |
 +---------------------------------------+----------------+----------------+----------------+----------------+
 |    unexpected_count                   |no              |yes             |yes             |yes             |
 +---------------------------------------+----------------+----------------+----------------+----------------+
@@ -49,7 +49,7 @@ including interactive exploratory work and automatic validation.
 +---------------------------------------+----------------+----------------+----------------+----------------+
 |    unexpected_list                    |no              |no              |no              |yes             |
 +---------------------------------------+----------------+----------------+----------------+----------------+
-| Fields defined for `column_aggregate_expectation` type expectations:                                 |
+| Fields defined for `column_aggregate_expectation` type expectations:                                      |
 +---------------------------------------+----------------+----------------+----------------+----------------+
 |    observed_value                     |no              |yes             |yes             |yes             |
 +---------------------------------------+----------------+----------------+----------------+----------------+
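
As a rough illustration of the `partial_unexpected_count` note in this file, the sketch below shows one way a result summary could skip the per-value counts when the setting is 0. The function `summarize_unexpected`, the default of 20, and the shape of each count entry are assumptions made for this example, not the library's implementation.

```python
from collections import Counter


def summarize_unexpected(unexpected_values, result_format):
    """Hypothetical summary builder driven by a result_format dictionary."""
    # Assumed default for illustration only.
    limit = result_format.get("partial_unexpected_count", 20)
    summary = {
        "unexpected_count": len(unexpected_values),
        "partial_unexpected_list": unexpected_values[:limit],
    }
    # Skip the counting work entirely when the counts would not be returned.
    if limit > 0:
        summary["partial_unexpected_counts"] = [
            {"value": value, "count": count}
            for value, count in Counter(unexpected_values).most_common(limit)
        ]
    return summary


unexpected = ["a", "b", "a", "c"]
with_counts = summarize_unexpected(
    unexpected, {"result_format": "SUMMARY", "partial_unexpected_count": 2}
)
without_counts = summarize_unexpected(
    unexpected, {"result_format": "SUMMARY", "partial_unexpected_count": 0}
)
print(with_counts)
print(without_counts)
```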

diff --git a/docs/source/roadmap_changelog.rst b/docs/source/roadmap_changelog.rst
@@ -12,9 +12,33 @@ Planned Features
 * Real-time/streaming and adaption of distributional expectations
 
 
-v.0.4.5__develop
+v.0.5.1__develop
 ----------------
+
+
+v.0.5.1
+---------------
+* Fix issue where no result_format available for expect_column_values_to_be_null caused error
+* Use vectorized computation in pandas (#443, #445; thanks @RoyalTS)
+
+
+v.0.5.0
+----------------
+* Restructured class hierarchy to have a more generic DataAsset parent that maintains expectation logic separate \
+    from the tabular organization of Dataset expectations
+* Added new FileDataAsset and associated expectations (#416 thanks @anhollis)
 * Added support for date/datetime type columns in some SQLAlchemy expectations (#413)
+* Added support for a multicolumn expectation, expect multicolumn values to be unique (#408)
+* Optimization: You can now disable `partial_unexpected_counts` by setting the \
+    `partial_unexpected_count` value to 0 in the result_format argument, and we do not compute it when it would
+    not be returned. (#431, thanks @eugmandel)
+* Fix: Correct error in unexpected_percent computations for sqlalchemy when unexpected values exceed limit (#424)
+* Fix: Pass meta object to expectation result (#415, thanks @jseeman)
+* Add support for multicolumn expectations, with `expect_multicolumn_values_to_be_unique` as an example (#406)
+* Add dataset class to from_pandas to simplify using custom datasets (#404, thanks @jtilly)
+* Add schema support for sqlalchemy data context (#410, thanks @rahulj51)
+* Minor documentation, warning, and testing improvements (thanks @zdog).
+
 
 v.0.4.5
 ----------------

diff --git a/great_expectations/cli.py b/great_expectations/cli.py
@@ -2,11 +2,14 @@
 import sys
 import os
 import argparse
+import logging
 
 from great_expectations import read_csv
 from great_expectations import __version__
-from great_expectations.dataset import PandasDataset
+from great_expectations.dataset import Dataset, PandasDataset
+from great_expectations.data_asset import FileDataAsset
 
+logger = logging.getLogger(__name__)
 
 def dispatch(args):
     parser = argparse.ArgumentParser(
@@ -69,6 +72,7 @@ def validate(parsed_args):
     else:
         evaluation_parameters = None
 
+    # Use a custom data asset module and class if provided. Otherwise, infer the class from the config.
     if parsed_args["custom_dataset_module"]:
         sys.path.insert(0, os.path.dirname(
             parsed_args["custom_dataset_module"]))
@@ -77,14 +81,27 @@ def validate(parsed_args):
         custom_module = __import__(module_name)
         dataset_class = getattr(
             custom_module, parsed_args["custom_dataset_class"])
-
+    elif "data_asset_type" in expectations_config:
+        if expectations_config["data_asset_type"] == "Dataset" or expectations_config["data_asset_type"] == "PandasDataset":
+            dataset_class = PandasDataset
+        elif expectations_config["data_asset_type"].endswith("Dataset"):
+            logger.info("Using PandasDataset to validate dataset of type %s." % expectations_config["data_asset_type"])
+            dataset_class = PandasDataset
+        elif expectations_config["data_asset_type"] == "FileDataAsset":
+            dataset_class = FileDataAsset
+        else:
+            logger.critical("Unrecognized data_asset_type %s. You may need to specify custom_dataset_module and custom_dataset_class." % expectations_config["data_asset_type"])
+            return -1
     else:
         dataset_class = PandasDataset
 
-    df = read_csv(data_set, expectations_config=expectations_config,
-                  dataset_class=dataset_class)
+    if issubclass(dataset_class, Dataset):
+        da = read_csv(data_set, expectations_config=expectations_config,
+                    dataset_class=dataset_class)
+    else:
+        da = dataset_class(data_set, config=expectations_config)
 
-    result = df.validate(
+    result = da.validate(
         evaluation_parameters=evaluation_parameters,
         result_format=parsed_args["result_format"],
         catch_exceptions=parsed_args["catch_exceptions"],
@@ -103,6 +120,11 @@ def version(parsed_args):
 
 
 def main():
+    handler = logging.StreamHandler()
+    formatter = logging.Formatter('%(asctime)s %(name)-12s %(levelname)-8s %(message)s')
+    handler.setFormatter(formatter)
+    logger.addHandler(handler)
+    logger.setLevel(logging.INFO)
     return_value = dispatch(sys.argv[1:])
     sys.exit(return_value)
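
The class-selection branch added to `validate` above can be read in isolation as the following sketch. The three classes are empty stand-ins for the real ones so the example runs without great_expectations installed, and `resolve_data_asset_class` is a hypothetical name; the CLI inlines this logic and returns -1 where the sketch returns `None`.

```python
import logging

logger = logging.getLogger(__name__)


class Dataset:
    """Stand-in for great_expectations.dataset.Dataset."""


class PandasDataset(Dataset):
    """Stand-in for great_expectations.dataset.PandasDataset."""


class FileDataAsset:
    """Stand-in for great_expectations.data_asset.FileDataAsset."""


def resolve_data_asset_class(expectations_config):
    """Pick a class from the config's data_asset_type, as the diff does."""
    if "data_asset_type" not in expectations_config:
        return PandasDataset  # no hint in the config: keep the old default
    asset_type = expectations_config["data_asset_type"]
    if asset_type in ("Dataset", "PandasDataset"):
        return PandasDataset
    if asset_type.endswith("Dataset"):
        # Other tabular types fall back to pandas for CSV validation.
        logger.info("Using PandasDataset to validate dataset of type %s.", asset_type)
        return PandasDataset
    if asset_type == "FileDataAsset":
        return FileDataAsset
    logger.critical("Unrecognized data_asset_type %s.", asset_type)
    return None


chosen = resolve_data_asset_class({"data_asset_type": "FileDataAsset"})
# Only Dataset subclasses go through read_csv; other assets are built directly.
uses_read_csv = issubclass(chosen, Dataset)
print(chosen.__name__, uses_read_csv)
```

The order of the checks matters: the exact-name matches come before the `endswith("Dataset")` fallback, and `FileDataAsset` is only reached because its name does not end in "Dataset".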