Merge 4d0e59f into d746536
jcampbell committed May 13, 2019
2 parents d746536 + 4d0e59f commit 99cf079
Showing 61 changed files with 3,662 additions and 6,435 deletions.
4 changes: 4 additions & 0 deletions .travis.yml
@@ -9,8 +9,12 @@ env:
- PANDAS=0.21.0
- PANDAS=0.23.4
- PANDAS=latest
# jdk:
# - openjdk8
services:
- postgresql
# before_install:
# - ./run-spark.sh
install:
- pip install --only-binary=numpy,scipy numpy scipy
- if [ "$PANDAS" = "latest" ]; then pip install pandas; else pip install pandas==$PANDAS; fi
5 changes: 2 additions & 3 deletions README.md
@@ -13,7 +13,7 @@ Great Expectations
What is great_expectations?
--------------------------------------------------------------------------------

Great Expectations is a framework that helps teams save time and promote analytic integrity with a new twist on automated testing: pipeline tests. Pipeline tests are applied to data (instead of code) and at batch time (instead of compile or deploy time).
Great Expectations helps teams save time and promote analytic integrity by offering a unique approach to automated testing: pipeline tests. Pipeline tests are applied to data (instead of code) and at batch time (instead of compile or deploy time). Pipeline tests are like unit tests for datasets: they help you guard against upstream data changes and monitor data quality.

Software developers have long known that automated testing is essential for managing complex codebases. Great Expectations brings the same discipline, confidence, and acceleration to data science and engineering teams.

@@ -35,7 +35,6 @@ How do I get started?

It's easy! Just use pip install:


$ pip install great_expectations

You can also clone the repository, which includes examples of using great_expectations.
@@ -76,7 +75,7 @@ For quick, hands-on introductions to Great Expectations' key features, check out
What's the best way to get in touch with the Great Expectations team?
--------------------------------------------------------------------------------

[Issues on GitHub](https://github.com/great-expectations/great_expectations/issues). If you have questions, comments, feature requests, etc., [opening an issue](https://github.com/great-expectations/great_expectations/issues/new) is definitely the best path forward.
If you have questions, comments, feature requests, etc., [opening an issue](https://github.com/great-expectations/great_expectations/issues/new) is definitely the best path forward. We also have a Slack channel: if you email us at <team@greatexpectations.io> with the subject line "SLACK", we'll get you an invite.


Great Expectations doesn't do X. Is it right for my use case?
2 changes: 1 addition & 1 deletion docs/source/autoinspection.rst
@@ -5,7 +5,7 @@ Autoinspection
================================================================================

It can be very convenient to have great expectations automatically review a \
dataset and suggest expectations that may be appropriate. Currently, there \
dataset and suggest expectations that may be appropriate. Currently, there's \
a very basic, but easily extensible, autoinspection capability available.

Dataset objects have an `autoinspect` method which allows you to provide a \
1 change: 0 additions & 1 deletion docs/source/data_asset_module.rst
@@ -33,7 +33,6 @@ great_expectations.data_asset.file_data_asset
:undoc-members:
:show-inheritance:


great_expectations.data_asset.util
----------------------------------

8 changes: 4 additions & 4 deletions docs/source/dataset_module.rst
@@ -3,10 +3,10 @@
Dataset Module
==================================

great_expectations.dataset.base
-------------------------------
great_expectations.dataset.dataset
----------------------------------

.. automodule:: great_expectations.dataset.base
.. automodule:: great_expectations.dataset.dataset
:members:
:undoc-members:
:show-inheritance:
@@ -62,7 +62,7 @@ great_expectations.dataset.util


great_expectations.dataset.autoinspect
-------------------------------
--------------------------------------

.. automodule:: great_expectations.dataset.autoinspect
:members:
3 changes: 2 additions & 1 deletion docs/source/expectations.rst
@@ -139,8 +139,9 @@ Types of Expectations
------------------------------------------------------------------------------
Under the hood, great_expectations evaluates similar kinds of expectations using standard logic, including:
* `column_map_expectations`, which apply their condition to each value in a column independently of other values
* `column_aggregate_expectations`, which apply their condition to an aggregate value or values from the column
In general, if a column is empty, a column_map_expectation will return True (vacuously), whereas a column_aggregate_expectation will return False (since no aggregate value could be computed).
Adding an expectation about element counts to a set of expectations is usually therefore very important to ensure the overall set of expectations captures the full set of constraints you expect.
Adding an expectation about element counts to a set of expectations is usually therefore very important to ensure the overall set of expectations captures the full set of constraints you expect.
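
The empty-column behaviour described in this hunk can be illustrated without the library. The following is a minimal sketch in plain Python, not great_expectations code: the helper names `column_map_check`, `column_aggregate_check`, and `row_count_check` are hypothetical stand-ins for the two expectation families and for an element-count expectation.

```python
from statistics import StatisticsError, mean


def column_map_check(values, condition):
    """Map-style check: apply the condition to each value independently."""
    # all() over an empty iterable is True, so an empty column passes vacuously.
    return all(condition(v) for v in values)


def column_aggregate_check(values, min_value, max_value):
    """Aggregate-style check: apply the condition to one summary value."""
    try:
        observed = mean(values)
    except StatisticsError:
        # No aggregate can be computed from an empty column, so the check fails.
        return False
    return min_value <= observed <= max_value


def row_count_check(values, min_count):
    """Hypothetical element-count check that catches the empty case explicitly."""
    return len(values) >= min_count


empty_column = []
map_result = column_map_check(empty_column, lambda v: v > 0)
aggregate_result = column_aggregate_check(empty_column, 0, 10)
count_result = row_count_check(empty_column, 1)
```

Here the map-style check passes on the empty column while the aggregate-style check fails, which is why a separate count check is worth including in the set.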
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -39,6 +39,7 @@ Module Docs
.. toctree::
:maxdepth: 2

data_asset_module
dataset_module
data_context_module

6 changes: 3 additions & 3 deletions docs/source/result_format.rst
@@ -9,7 +9,7 @@ The `result_format` parameter may be either a string or a dictionary which speci
- For dictionary usage, `result_format` may include the following keys:

- :ref:`result_format`: Sets the fields to return in result.
- partial_unexpected_count: Sets the number of results to include in partial_unexpected_count, if applicable.
- partial_unexpected_count: Sets the number of results to include in partial_unexpected_count, if applicable. If set to 0, this will suppress the unexpected counts.


`result_format`
@@ -31,7 +31,7 @@ including interactive exploratory work and automatic validation.
+---------------------------------------+----------------+----------------+----------------+----------------+
| details (dictionary) |Defined on a per-expectation basis |
+---------------------------------------+----------------+----------------+----------------+----------------+
| Fields defined for `column_map_expectation` type expectations: |
| Fields defined for `column_map_expectation` type expectations: |
+---------------------------------------+----------------+----------------+----------------+----------------+
| unexpected_count |no |yes |yes |yes |
+---------------------------------------+----------------+----------------+----------------+----------------+
@@ -49,7 +49,7 @@ including interactive exploratory work and automatic validation.
+---------------------------------------+----------------+----------------+----------------+----------------+
| unexpected_list |no |no |no |yes |
+---------------------------------------+----------------+----------------+----------------+----------------+
| Fields defined for `column_aggregate_expectation` type expectations: |
| Fields defined for `column_aggregate_expectation` type expectations: |
+---------------------------------------+----------------+----------------+----------------+----------------+
| observed_value |no |yes |yes |yes |
+---------------------------------------+----------------+----------------+----------------+----------------+
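
The `partial_unexpected_count` key documented in this file controls how many unexpected values are summarised, and a value of 0 suppresses the counts. A rough sketch of that behaviour follows, assuming a hypothetical helper `summarize_unexpected` and an assumed default of 20; it is not the library's implementation.

```python
from collections import Counter


def summarize_unexpected(unexpected_values, result_format):
    """Hypothetical helper: build a result summary from unexpected values."""
    # The default of 20 is an assumption made for this sketch.
    limit = result_format.get("partial_unexpected_count", 20)
    summary = {"unexpected_count": len(unexpected_values)}
    if limit > 0:
        partial = unexpected_values[:limit]
        summary["partial_unexpected_list"] = partial
        # Only computed when it will actually be returned.
        summary["partial_unexpected_counts"] = Counter(partial).most_common()
    return summary


unexpected = ["a", "b", "a", "c"]
with_counts = summarize_unexpected(
    unexpected, {"result_format": "SUMMARY", "partial_unexpected_count": 3}
)
without_counts = summarize_unexpected(
    unexpected, {"result_format": "SUMMARY", "partial_unexpected_count": 0}
)
```

Skipping the `Counter` pass when the limit is 0 mirrors the optimisation noted in the v.0.5.0 changelog entry: the counts are not computed when they would not be returned.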
26 changes: 25 additions & 1 deletion docs/source/roadmap_changelog.rst
@@ -12,9 +12,33 @@ Planned Features
* Real-time/streaming and adaption of distributional expectations


v.0.4.5__develop
v.0.5.1__develop
----------------


v.0.5.1
---------------
* Fix issue where no result_format available for expect_column_values_to_be_null caused error
* Use vectorized computation in pandas (#443, #445; thanks @RoyalTS)


v.0.5.0
----------------
* Restructured class hierarchy to have a more generic DataAsset parent that maintains expectation logic separate \
from the tabular organization of Dataset expectations
* Added new FileDataAsset and associated expectations (#416 thanks @anhollis)
* Added support for date/datetime type columns in some SQLAlchemy expectations (#413)
* Added support for a multicolumn expectation, expect multicolumn values to be unique (#408)
* Optimization: You can now disable `partial_unexpected_counts` by setting the \
`partial_unexpected_count` value to 0 in the result_format argument, and we do not compute it when it would
not be returned. (#431, thanks @eugmandel)
* Fix: Correct error in unexpected_percent computations for sqlalchemy when unexpected values exceed limit (#424)
* Fix: Pass meta object to expectation result (#415, thanks @jseeman)
* Add support for multicolumn expectations, with `expect_multicolumn_values_to_be_unique` as an example (#406)
* Add dataset class to from_pandas to simplify using custom datasets (#404, thanks @jtilly)
* Add schema support for sqlalchemy data context (#410, thanks @rahulj51)
* Minor documentation, warning, and testing improvements (thanks @zdog).


v.0.4.5
----------------
32 changes: 27 additions & 5 deletions great_expectations/cli.py
@@ -2,11 +2,14 @@
import sys
import os
import argparse
import logging

from great_expectations import read_csv
from great_expectations import __version__
from great_expectations.dataset import PandasDataset
from great_expectations.dataset import Dataset, PandasDataset
from great_expectations.data_asset import FileDataAsset

logger = logging.getLogger(__name__)

def dispatch(args):
parser = argparse.ArgumentParser(
@@ -69,6 +72,7 @@ def validate(parsed_args):
else:
evaluation_parameters = None

# Use a custom data asset module and class if provided. Otherwise infer from the config.
if parsed_args["custom_dataset_module"]:
sys.path.insert(0, os.path.dirname(
parsed_args["custom_dataset_module"]))
@@ -77,14 +81,27 @@ def validate(parsed_args):
custom_module = __import__(module_name)
dataset_class = getattr(
custom_module, parsed_args["custom_dataset_class"])

elif "data_asset_type" in expectations_config:
if expectations_config["data_asset_type"] == "Dataset" or expectations_config["data_asset_type"] == "PandasDataset":
dataset_class = PandasDataset
elif expectations_config["data_asset_type"].endswith("Dataset"):
logger.info("Using PandasDataset to validate dataset of type %s." % expectations_config["data_asset_type"])
dataset_class = PandasDataset
elif expectations_config["data_asset_type"] == "FileDataAsset":
dataset_class = FileDataAsset
else:
logger.critical("Unrecognized data_asset_type %s. You may need to specify custom_dataset_module and custom_dataset_class." % expectations_config["data_asset_type"])
return -1
else:
dataset_class = PandasDataset

df = read_csv(data_set, expectations_config=expectations_config,
dataset_class=dataset_class)
if issubclass(dataset_class, Dataset):
da = read_csv(data_set, expectations_config=expectations_config,
dataset_class=dataset_class)
else:
da = dataset_class(data_set, config=expectations_config)

result = df.validate(
result = da.validate(
evaluation_parameters=evaluation_parameters,
result_format=parsed_args["result_format"],
catch_exceptions=parsed_args["catch_exceptions"],
@@ -103,6 +120,11 @@ def version(parsed_args):


def main():
handler = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s %(name)-12s %(levelname)-8s %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.setLevel(logging.INFO)
return_value = dispatch(sys.argv[1:])
sys.exit(return_value)
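
The class-selection branch added to `validate` in this file can be read in isolation as a small lookup. The sketch below restates it under assumptions: the three classes are empty stand-ins for the real `Dataset`, `PandasDataset`, and `FileDataAsset`, the function name `select_data_asset_class` is hypothetical, and the logging calls and the custom-module branch are omitted.

```python
class Dataset:
    """Stand-in for great_expectations.dataset.Dataset (assumption)."""


class PandasDataset(Dataset):
    """Stand-in for great_expectations.dataset.PandasDataset (assumption)."""


class FileDataAsset:
    """Stand-in for great_expectations.data_asset.FileDataAsset (assumption)."""


def select_data_asset_class(expectations_config):
    """Return the class to validate with, or None if the type is unrecognized."""
    asset_type = expectations_config.get("data_asset_type")
    if asset_type is None:
        # No type recorded in the config: fall back to PandasDataset.
        return PandasDataset
    if asset_type.endswith("Dataset"):
        # Covers "Dataset", "PandasDataset", and any other *Dataset type.
        return PandasDataset
    if asset_type == "FileDataAsset":
        return FileDataAsset
    # The CLI logs a critical message and returns -1 in this case.
    return None


pandas_choice = select_data_asset_class({"data_asset_type": "SqlAlchemyDataset"})
file_choice = select_data_asset_class({"data_asset_type": "FileDataAsset"})
unknown_choice = select_data_asset_class({"data_asset_type": "Mystery"})
```

The `issubclass(dataset_class, Dataset)` test in the diff then decides whether the data is loaded through `read_csv` or passed straight to the class constructor; with these stand-ins, `PandasDataset` takes the first path and `FileDataAsset` the second.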
