Commit

Merge pull request #417 from great-expectations/LAS-NCSU-FileDataAsset
Las ncsu file data asset
jcampbell committed Apr 4, 2019
2 parents 60fe351 + 945845a commit 103b962
Showing 36 changed files with 2,817 additions and 2,034 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -1,3 +1,6 @@
# Documentation build
docs/source/_build

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
@@ -25,6 +28,7 @@ wheels/
*.egg-info/
.installed.cfg
*.egg
.vscode/

# PyInstaller
# Usually these files are written by a python script from a template
7 changes: 4 additions & 3 deletions docs/source/custom_expectations.rst
@@ -97,11 +97,12 @@ This is more complicated, since you have to handle all the logic of additional p

.. code-block:: bash
-from great_expectations.dataset import Dataset, PandasDataset
+from great_expectations.data_asset import DataAsset
+from great_expectations.dataset import PandasDataset
class CustomPandasDataset(PandasDataset):
-@Dataset.expectation(["column", "mostly"])
+@DataAsset.expectation(["column", "mostly"])
def expect_column_values_to_equal_1(self, column, mostly=None):
not_null = self[column].notnull()
@@ -137,7 +138,7 @@ For rapid prototyping, you can use the following syntax to quickly iterate on th
.. code-block:: bash
->> Dataset.test_expectation_function(my_func)
+>> DataAsset.test_expectation_function(my_func)
>> Dataset.test_column_map_expectation_function(my_map_func, column='my_column')
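
The `mostly` argument taken by `expect_column_values_to_equal_1` in the hunk above is easier to follow in isolation. What follows is a minimal, pandas-free sketch, assuming `mostly` is the minimum fraction of non-null values that must pass. The function name `values_equal_1` and the shape of the returned dictionary are illustrative and are not taken from great_expectations.

```python
def values_equal_1(values, mostly=None):
    """Check that values equal 1, skipping None entries (an assumption)."""
    non_null = [v for v in values if v is not None]
    unexpected = [v for v in non_null if v != 1]

    # With no non-null values, nothing can contradict the expectation.
    if not non_null:
        return {"success": True, "unexpected_list": []}

    fraction_ok = 1 - len(unexpected) / len(non_null)
    # Without `mostly`, every non-null value must pass.
    threshold = 1.0 if mostly is None else mostly
    return {"success": fraction_ok >= threshold, "unexpected_list": unexpected}


strict = values_equal_1([1, 1, None, 2])
lenient = values_equal_1([1, 1, None, 2], mostly=0.6)
```

Here two of the three non-null values equal 1, so the strict call fails while the call with `mostly=0.6` succeeds.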
43 changes: 43 additions & 0 deletions docs/source/data_asset_module.rst
@@ -0,0 +1,43 @@
.. _data_asset_module:

Data Asset Module
==================================

great_expectations.data_asset.base
----------------------------------

.. automodule:: great_expectations.data_asset.base
:members:
:undoc-members:
:show-inheritance:
:exclude-members: DataAsset

.. autoclass:: great_expectations.data_asset.base.DataAsset

great_expectations.data_asset.file_data_asset
---------------------------------------------

.. automodule:: great_expectations.data_asset.file_data_asset
:members:
:undoc-members:
:show-inheritance:
:exclude-members: FileDataAsset

.. autoclass:: great_expectations.data_asset.file_data_asset.MetaFileDataAsset
:members:
:undoc-members:
:show-inheritance:

.. autoclass:: great_expectations.data_asset.file_data_asset.FileDataAsset
:members:
:undoc-members:
:show-inheritance:


great_expectations.data_asset.util
----------------------------------

.. automodule:: great_expectations.dataset.util
:members:
:undoc-members:
:show-inheritance:
99 changes: 59 additions & 40 deletions docs/source/glossary.rst
@@ -4,74 +4,93 @@
Glossary of Expectations
================================================================================

Dataset
-------
Dataset objects model tabular data and include expectations with row and column semantics. Many Dataset expectations
are implemented using column_map_expectation and column_aggregate_expectation decorators.

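The difference between the two decorator styles named above can be sketched in plain Python. This is a rough illustration only: `sketch_column_map` and `sketch_column_aggregate` are hypothetical stand-ins, not the great_expectations decorators, and the dictionary-of-lists `table` is an assumed substitute for a real Dataset.

```python
import functools
import statistics


def sketch_column_map(predicate):
    """Hypothetical decorator: apply a per-value predicate to a whole column."""
    @functools.wraps(predicate)
    def wrapper(table, column, **kwargs):
        unexpected = [v for v in table[column] if not predicate(v, **kwargs)]
        return {"success": not unexpected, "unexpected_list": unexpected}
    return wrapper


def sketch_column_aggregate(statistic):
    """Hypothetical decorator: compare one column-wide statistic to bounds."""
    @functools.wraps(statistic)
    def wrapper(table, column, min_value, max_value):
        observed = statistic(table[column])
        return {"success": min_value <= observed <= max_value,
                "observed_value": observed}
    return wrapper


@sketch_column_map
def values_in_set(value, value_set):
    return value in value_set


@sketch_column_aggregate
def mean_between(values):
    return statistics.mean(values)


table = {"age": [31, 45, 28], "state": ["NC", "VA", "ZZ"]}
map_result = values_in_set(table, "state", value_set={"NC", "VA"})
agg_result = mean_between(table, "age", min_value=20, max_value=40)
```

The map-style check reports the individual offending values (`"ZZ"` here), while the aggregate-style check reports a single observed value for the column.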

Table shape
--------------------------------------------------------------------------------

-* :func:`expect_column_to_exist <great_expectations.dataset.base.Dataset.expect_column_to_exist>`
-* :func:`expect_table_columns_to_match_ordered_list <great_expectations.dataset.base.Dataset.expect_table_columns_to_match_ordered_list>`
-* :func:`expect_table_row_count_to_be_between <great_expectations.dataset.base.Dataset.expect_table_row_count_to_be_between>`
-* :func:`expect_table_row_count_to_equal <great_expectations.dataset.base.Dataset.expect_table_row_count_to_equal>`
+* :func:`expect_column_to_exist <great_expectations.dataset.dataset.Dataset.expect_column_to_exist>`
+* :func:`expect_table_columns_to_match_ordered_list <great_expectations.dataset.dataset.Dataset.expect_table_columns_to_match_ordered_list>`
+* :func:`expect_table_row_count_to_be_between <great_expectations.dataset.dataset.Dataset.expect_table_row_count_to_be_between>`
+* :func:`expect_table_row_count_to_equal <great_expectations.dataset.dataset.Dataset.expect_table_row_count_to_equal>`

Missing values, unique values, and types
--------------------------------------------------------------------------------

-* :func:`expect_column_values_to_be_unique <great_expectations.dataset.base.Dataset.expect_column_values_to_be_unique>`
-* :func:`expect_column_values_to_not_be_null <great_expectations.dataset.base.Dataset.expect_column_values_to_not_be_null>`
-* :func:`expect_column_values_to_be_null <great_expectations.dataset.base.Dataset.expect_column_values_to_be_null>`
-* :func:`expect_column_values_to_be_of_type <great_expectations.dataset.base.Dataset.expect_column_values_to_be_of_type>`
-* :func:`expect_column_values_to_be_in_type_list <great_expectations.dataset.base.Dataset.expect_column_values_to_be_in_type_list>`
+* :func:`expect_column_values_to_be_unique <great_expectations.dataset.dataset.Dataset.expect_column_values_to_be_unique>`
+* :func:`expect_column_values_to_not_be_null <great_expectations.dataset.dataset.Dataset.expect_column_values_to_not_be_null>`
+* :func:`expect_column_values_to_be_null <great_expectations.dataset.dataset.Dataset.expect_column_values_to_be_null>`
+* :func:`expect_column_values_to_be_of_type <great_expectations.dataset.dataset.Dataset.expect_column_values_to_be_of_type>`
+* :func:`expect_column_values_to_be_in_type_list <great_expectations.dataset.dataset.Dataset.expect_column_values_to_be_in_type_list>`

Sets and ranges
--------------------------------------------------------------------------------

-* :func:`expect_column_values_to_be_in_set <great_expectations.dataset.base.Dataset.expect_column_values_to_be_in_set>`
-* :func:`expect_column_values_to_not_be_in_set <great_expectations.dataset.base.Dataset.expect_column_values_to_not_be_in_set>`
-* :func:`expect_column_values_to_be_between <great_expectations.dataset.base.Dataset.expect_column_values_to_be_between>`
-* :func:`expect_column_values_to_be_increasing <great_expectations.dataset.base.Dataset.expect_column_values_to_be_increasing>`
-* :func:`expect_column_values_to_be_decreasing <great_expectations.dataset.base.Dataset.expect_column_values_to_be_decreasing>`
+* :func:`expect_column_values_to_be_in_set <great_expectations.dataset.dataset.Dataset.expect_column_values_to_be_in_set>`
+* :func:`expect_column_values_to_not_be_in_set <great_expectations.dataset.dataset.Dataset.expect_column_values_to_not_be_in_set>`
+* :func:`expect_column_values_to_be_between <great_expectations.dataset.dataset.Dataset.expect_column_values_to_be_between>`
+* :func:`expect_column_values_to_be_increasing <great_expectations.dataset.dataset.Dataset.expect_column_values_to_be_increasing>`
+* :func:`expect_column_values_to_be_decreasing <great_expectations.dataset.dataset.Dataset.expect_column_values_to_be_decreasing>`


String matching
--------------------------------------------------------------------------------

-* :func:`expect_column_value_lengths_to_be_between <great_expectations.dataset.base.Dataset.expect_column_value_lengths_to_be_between>`
-* :func:`expect_column_value_lengths_to_equal <great_expectations.dataset.base.Dataset.expect_column_value_lengths_to_equal>`
-* :func:`expect_column_values_to_match_regex <great_expectations.dataset.base.Dataset.expect_column_values_to_match_regex>`
-* :func:`expect_column_values_to_not_match_regex <great_expectations.dataset.base.Dataset.expect_column_values_to_not_match_regex>`
-* :func:`expect_column_values_to_match_regex_list <great_expectations.dataset.base.Dataset.expect_column_values_to_match_regex_list>`
-* :func:`expect_column_values_to_not_match_regex_list <great_expectations.dataset.base.Dataset.expect_column_values_to_not_match_regex_list>`
+* :func:`expect_column_value_lengths_to_be_between <great_expectations.dataset.dataset.Dataset.expect_column_value_lengths_to_be_between>`
+* :func:`expect_column_value_lengths_to_equal <great_expectations.dataset.dataset.Dataset.expect_column_value_lengths_to_equal>`
+* :func:`expect_column_values_to_match_regex <great_expectations.dataset.dataset.Dataset.expect_column_values_to_match_regex>`
+* :func:`expect_column_values_to_not_match_regex <great_expectations.dataset.dataset.Dataset.expect_column_values_to_not_match_regex>`
+* :func:`expect_column_values_to_match_regex_list <great_expectations.dataset.dataset.Dataset.expect_column_values_to_match_regex_list>`
+* :func:`expect_column_values_to_not_match_regex_list <great_expectations.dataset.dataset.Dataset.expect_column_values_to_not_match_regex_list>`

Datetime and JSON parsing
--------------------------------------------------------------------------------

-* :func:`expect_column_values_to_match_strftime_format <great_expectations.dataset.base.Dataset.expect_column_values_to_match_strftime_format>`
-* :func:`expect_column_values_to_be_dateutil_parseable <great_expectations.dataset.base.Dataset.expect_column_values_to_be_dateutil_parseable>`
-* :func:`expect_column_values_to_be_json_parseable <great_expectations.dataset.base.Dataset.expect_column_values_to_be_json_parseable>`
-* :func:`expect_column_values_to_match_json_schema <great_expectations.dataset.base.Dataset.expect_column_values_to_match_json_schema>`
+* :func:`expect_column_values_to_match_strftime_format <great_expectations.dataset.dataset.Dataset.expect_column_values_to_match_strftime_format>`
+* :func:`expect_column_values_to_be_dateutil_parseable <great_expectations.dataset.dataset.Dataset.expect_column_values_to_be_dateutil_parseable>`
+* :func:`expect_column_values_to_be_json_parseable <great_expectations.dataset.dataset.Dataset.expect_column_values_to_be_json_parseable>`
+* :func:`expect_column_values_to_match_json_schema <great_expectations.dataset.dataset.Dataset.expect_column_values_to_match_json_schema>`

Aggregate functions
--------------------------------------------------------------------------------

-* :func:`expect_column_mean_to_be_between <great_expectations.dataset.base.Dataset.expect_column_mean_to_be_between>`
-* :func:`expect_column_median_to_be_between <great_expectations.dataset.base.Dataset.expect_column_median_to_be_between>`
-* :func:`expect_column_stdev_to_be_between <great_expectations.dataset.base.Dataset.expect_column_stdev_to_be_between>`
-* :func:`expect_column_unique_value_count_to_be_between <great_expectations.dataset.base.Dataset.expect_column_unique_value_count_to_be_between>`
-* :func:`expect_column_proportion_of_unique_values_to_be_between <great_expectations.dataset.base.Dataset.expect_column_proportion_of_unique_values_to_be_between>`
-* :func:`expect_column_most_common_value_to_be_in_set <great_expectations.dataset.base.Dataset.expect_column_most_common_value_to_be_in_set>`
-* :func:`expect_column_max_to_be_between <great_expectations.dataset.base.Dataset.expect_column_max_to_be_between>`
-* :func:`expect_column_min_to_be_between <great_expectations.dataset.base.Dataset.expect_column_min_to_be_between>`
-* :func:`expect_column_sum_to_be_between <great_expectations.dataset.base.Dataset.expect_column_sum_to_be_between>`
+* :func:`expect_column_mean_to_be_between <great_expectations.dataset.dataset.Dataset.expect_column_mean_to_be_between>`
+* :func:`expect_column_median_to_be_between <great_expectations.dataset.dataset.Dataset.expect_column_median_to_be_between>`
+* :func:`expect_column_stdev_to_be_between <great_expectations.dataset.dataset.Dataset.expect_column_stdev_to_be_between>`
+* :func:`expect_column_unique_value_count_to_be_between <great_expectations.dataset.dataset.Dataset.expect_column_unique_value_count_to_be_between>`
+* :func:`expect_column_proportion_of_unique_values_to_be_between <great_expectations.dataset.dataset.Dataset.expect_column_proportion_of_unique_values_to_be_between>`
+* :func:`expect_column_most_common_value_to_be_in_set <great_expectations.dataset.dataset.Dataset.expect_column_most_common_value_to_be_in_set>`
+* :func:`expect_column_max_to_be_between <great_expectations.dataset.dataset.Dataset.expect_column_max_to_be_between>`
+* :func:`expect_column_min_to_be_between <great_expectations.dataset.dataset.Dataset.expect_column_min_to_be_between>`
+* :func:`expect_column_sum_to_be_between <great_expectations.dataset.dataset.Dataset.expect_column_sum_to_be_between>`

Column pairs
--------------------------------------------------------------------------------
-* :func:`expect_column_pair_values_A_to_be_greater_than_B <great_expectations.dataset.base.Dataset.expect_column_pair_values_A_to_be_greater_than_B>`
-* :func:`expect_column_pair_values_to_be_equal <great_expectations.dataset.base.Dataset.expect_column_pair_values_to_be_equal>`
-* :func:`expect_column_pair_values_to_be_in_set <great_expectations.dataset.base.Dataset.expect_column_pair_values_to_be_in_set>`
+* :func:`expect_column_pair_values_A_to_be_greater_than_B <great_expectations.dataset.dataset.Dataset.expect_column_pair_values_A_to_be_greater_than_B>`
+* :func:`expect_column_pair_values_to_be_equal <great_expectations.dataset.dataset.Dataset.expect_column_pair_values_to_be_equal>`
+* :func:`expect_column_pair_values_to_be_in_set <great_expectations.dataset.dataset.Dataset.expect_column_pair_values_to_be_in_set>`

Distributional functions
--------------------------------------------------------------------------------

-* :func:`expect_column_kl_divergence_to_be_less_than <great_expectations.dataset.base.Dataset.expect_column_kl_divergence_to_be_less_than>`
-* :func:`expect_column_bootstrapped_ks_test_p_value_to_be_greater_than <great_expectations.dataset.base.Dataset.expect_column_bootstrapped_ks_test_p_value_to_be_greater_than>`
-* :func:`expect_column_chisquare_test_p_value_to_be_greater_than <great_expectations.dataset.base.Dataset.expect_column_chisquare_test_p_value_to_be_greater_than>`
-* :func:`expect_column_parameterized_distribution_ks_test_p_value_to_be_greater_than <great_expectations.dataset.base.Dataset.expect_column_parameterized_distribution_ks_test_p_value_to_be_greater_than>`
+* :func:`expect_column_kl_divergence_to_be_less_than <great_expectations.dataset.dataset.Dataset.expect_column_kl_divergence_to_be_less_than>`
+* :func:`expect_column_bootstrapped_ks_test_p_value_to_be_greater_than <great_expectations.dataset.dataset.Dataset.expect_column_bootstrapped_ks_test_p_value_to_be_greater_than>`
+* :func:`expect_column_chisquare_test_p_value_to_be_greater_than <great_expectations.dataset.dataset.Dataset.expect_column_chisquare_test_p_value_to_be_greater_than>`
+* :func:`expect_column_parameterized_distribution_ks_test_p_value_to_be_greater_than <great_expectations.dataset.dataset.Dataset.expect_column_parameterized_distribution_ks_test_p_value_to_be_greater_than>`

FileDataAsset
-------------

File data assets reason at the file level, and the line level (for text data).

* :func:`expect_file_line_regex_match_count_to_be_between <great_expectations.data_asset.file_data_asset.FileDataAsset.expect_file_line_regex_match_count_to_be_between>`
* :func:`expect_file_line_regex_match_count_to_equal <great_expectations.data_asset.file_data_asset.FileDataAsset.expect_file_line_regex_match_count_to_equal>`
* :func:`expect_file_hash_to_equal <great_expectations.data_asset.file_data_asset.FileDataAsset.expect_file_hash_to_equal>`
* :func:`expect_file_size_to_be_between <great_expectations.data_asset.file_data_asset.FileDataAsset.expect_file_size_to_be_between>`
* :func:`expect_file_to_exist <great_expectations.data_asset.file_data_asset.FileDataAsset.expect_file_to_exist>`
* :func:`expect_file_to_have_valid_table_header <great_expectations.data_asset.file_data_asset.FileDataAsset.expect_file_to_have_valid_table_header>`
* :func:`expect_file_to_be_valid_json <great_expectations.data_asset.file_data_asset.FileDataAsset.expect_file_to_be_valid_json>`
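
The file-level and line-level distinction in the FileDataAsset entries above can be illustrated with the standard library alone. This is only a sketch of the kind of quantity such expectations might inspect: the helper names `file_sha256` and `lines_with_match_count_between` are hypothetical, SHA-256 is an arbitrary choice of hash, and none of this is the great_expectations implementation.

```python
import hashlib
import os
import re
import tempfile


def file_sha256(path):
    """File-level quantity: hex SHA-256 digest of the file's bytes."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def lines_with_match_count_between(path, regex, min_count, max_count):
    """Line-level quantity: number of lines whose match count is in range."""
    pattern = re.compile(regex)
    passing = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if min_count <= len(pattern.findall(line)) <= max_count:
                passing += 1
    return passing


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    # newline="" keeps "\n" as a single byte on every platform.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write("a,b,c\n1,2,3\n4,5\n")

    size_in_bytes = os.path.getsize(path)
    digest = file_sha256(path)
    # Lines with exactly two commas, i.e. three fields.
    three_field_lines = lines_with_match_count_between(path, ",", 2, 2)
```

Size and digest describe the file as a whole, whereas the regex count is evaluated once per line; in this sample the third line has only one comma and so is not counted.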
3 changes: 1 addition & 2 deletions great_expectations/__init__.py
@@ -1,6 +1,5 @@
from .util import *
from great_expectations import dataset
from great_expectations import data_asset
from great_expectations.data_context import get_data_context
from .file_expectations import *

from .version import __version__
2 changes: 2 additions & 0 deletions great_expectations/data_asset/__init__.py
@@ -0,0 +1,2 @@
from .base import DataAsset
from .file_data_asset import FileDataAsset