Merge pull request #470 from great-expectations/feature/docs
Doc updates to match spark release
jcampbell committed May 24, 2019
2 parents 2ed5fcd + 7af0828 commit 7d2b34d
Showing 7 changed files with 44 additions and 22 deletions.
24 changes: 22 additions & 2 deletions docs/source/data_contexts.rst
@@ -4,7 +4,8 @@
Data Contexts
================================================================================

Data Contexts manage connections to Great Expectations Datasets.
Data Contexts manage connections to Great Expectations Datasets. Note: data contexts
will change significantly in the next release of GE.

To get a data context, simply call `get_data_context()` on the ge object:

@@ -17,9 +18,11 @@ To get a data context, simply call `get_data_context()` on the ge object:
>> sql_dataset = sql_context.get_dataset('table_name')
There are currently two types of data contexts:
There are currently four types of data contexts:
- :ref:`PandasCSVDataContext`: The PandasCSVDataContext ('PandasCSV') exposes a local directory containing files as datasets.
- :ref:`SqlAlchemyDataContext`: The SqlAlchemyDataContext ('SqlAlchemy') exposes tables from a SQL-compliant database as datasets.
- :ref:`SparkCSVDataContext`: The SparkCSVDataContext ('SparkCSV') exposes CSV files accessible from a SparkSQL context.
- :ref:`DatabricksTableContext`: The DatabricksTableContext ('DatabricksTable') exposes tables from a Databricks notebook.
All data contexts expose the following methods:
- list_datasets(): lists datasets available in current context
@@ -39,3 +42,20 @@ The `options` paramater for a PandasCSVDataContext is simply the glob pattern ma
-----------------------
The `options` parameter for a SqlAlchemyDataContext is the sqlalchemy connection string to connect to the database.
.. _SparkCSVDataContext:
`SparkCSVDataContext`
---------------------
The `options` parameter for a SparkCSVDataContext is a directory from which to read a CSV file, and options to pass to the reader.
.. _DatabricksTableContext:
`DatabricksTableContext`
------------------------
The `options` parameter for a DatabricksTableContext is a database from which to expose tables; get_dataset optionally also accepts
a date partition.
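
The data context interface documented in this file (an ``options`` value plus ``list_datasets()`` and ``get_dataset()``) can be illustrated with a small standard-library sketch. The class ``GlobCSVContext`` and the helper ``demo`` are hypothetical names invented for this example; this is not the Great Expectations implementation, only an assumption-laden picture of how a glob pattern might be exposed as named datasets.

```python
import csv
import glob
import os
import tempfile


class GlobCSVContext:
    """Hypothetical stand-in for a CSV-backed data context (not GE's class)."""

    def __init__(self, pattern):
        # `pattern` plays the role of the `options` argument: a glob pattern.
        self.pattern = pattern

    def list_datasets(self):
        # Dataset names are the base names of the matching files, sorted.
        return sorted(os.path.basename(p) for p in glob.glob(self.pattern))

    def get_dataset(self, name):
        # Return the rows of the named file as a list of dicts.
        for path in glob.glob(self.pattern):
            if os.path.basename(path) == name:
                with open(path, newline="") as handle:
                    return list(csv.DictReader(handle))
        raise KeyError(name)


def demo():
    # Build two small CSV files in a temporary directory and query them.
    with tempfile.TemporaryDirectory() as directory:
        for name in ("users.csv", "orders.csv"):
            with open(os.path.join(directory, name), "w", newline="") as handle:
                writer = csv.writer(handle)
                writer.writerow(["id", "value"])
                writer.writerow(["1", name])
        context = GlobCSVContext(os.path.join(directory, "*.csv"))
        return context.list_datasets(), context.get_dataset("users.csv")
```

Calling ``demo()`` lists both file names and returns the single data row of ``users.csv``. A real context would return a Dataset object that supports expectations rather than a plain list of rows.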
6 changes: 3 additions & 3 deletions docs/source/dataset_module.rst
@@ -29,7 +29,7 @@ great_expectations.dataset.pandas_dataset
:members:
:undoc-members:
:show-inheritance:
:exclude-members: expect_column_to_exist, expect_table_columns_to_match_ordered_list, expect_table_row_count_to_be_between, expect_table_row_count_to_equal, expect_column_values_to_be_unique, expect_column_values_to_not_be_null, expect_column_values_to_be_null, expect_column_values_to_be_of_type, expect_column_values_to_be_in_type_list, expect_column_values_to_be_in_set, expect_column_max_to_be_between, expect_column_min_to_be_between, expect_column_sum_to_be_between, expect_column_values_to_not_be_in_set, expect_column_values_to_be_between, expect_column_values_to_be_increasing, expect_column_values_to_be_decreasing, expect_column_value_lengths_to_be_between, expect_column_value_lengths_to_equal, expect_column_values_to_match_regex, expect_column_values_to_not_match_regex, expect_column_values_to_match_regex_list, expect_column_values_to_match_strftime_format, expect_column_values_to_be_dateutil_parseable, expect_column_values_to_be_json_parseable, expect_column_values_to_match_json_schema, expect_column_mean_to_be_between, expect_column_median_to_be_between, expect_column_stdev_to_be_between, expect_column_unique_value_count_to_be_between, expect_column_proportion_of_unique_values_to_be_between, expect_column_most_common_value_to_be, expect_column_most_common_value_to_be_in_set, expect_column_chisquare_test_p_value_to_be_greater_than, expect_column_bootstrapped_ks_test_p_value_to_be_greater_than, expect_column_kl_divergence_to_be_less_than
:exclude-members: expect_column_to_exist, expect_table_columns_to_match_ordered_list, expect_table_row_count_to_be_between, expect_table_row_count_to_equal, expect_column_values_to_be_unique, expect_column_values_to_not_be_null, expect_column_values_to_be_null, expect_column_values_to_be_of_type, expect_column_values_to_be_in_type_list, expect_column_values_to_be_in_set, expect_column_max_to_be_between, expect_column_min_to_be_between, expect_column_sum_to_be_between, expect_column_values_to_not_be_in_set, expect_column_values_to_be_between, expect_column_values_to_be_increasing, expect_column_values_to_be_decreasing, expect_column_value_lengths_to_be_between, expect_column_value_lengths_to_equal, expect_column_values_to_match_regex, expect_column_values_to_not_match_regex, expect_column_values_to_match_regex_list, expect_column_values_to_match_strftime_format, expect_column_values_to_be_dateutil_parseable, expect_column_values_to_be_json_parseable, expect_column_values_to_match_json_schema, expect_column_distinct_values_to_contain_set, expect_column_distinct_values_to_equal_set, expect_column_mean_to_be_between, expect_column_median_to_be_between, expect_column_stdev_to_be_between, expect_column_unique_value_count_to_be_between, expect_column_proportion_of_unique_values_to_be_between, expect_column_most_common_value_to_be, expect_column_most_common_value_to_be_in_set, expect_column_chisquare_test_p_value_to_be_greater_than, expect_column_bootstrapped_ks_test_p_value_to_be_greater_than, expect_column_kl_divergence_to_be_less_than

great_expectations.dataset.sqlalchemy_dataset
---------------------------------------------
@@ -49,7 +49,7 @@ great_expectations.dataset.sqlalchemy_dataset
:members:
:undoc-members:
:show-inheritance:
:exclude-members: expect_column_to_exist, expect_table_columns_to_match_ordered_list, expect_table_row_count_to_be_between, expect_table_row_count_to_equal, expect_column_values_to_be_unique, expect_column_values_to_not_be_null, expect_column_values_to_be_null, expect_column_values_to_be_of_type, expect_column_values_to_be_in_type_list, expect_column_values_to_be_in_set, expect_column_max_to_be_between, expect_column_min_to_be_between, expect_column_sum_to_be_between, expect_column_values_to_not_be_in_set, expect_column_values_to_be_between, expect_column_values_to_be_increasing, expect_column_values_to_be_decreasing, expect_column_value_lengths_to_be_between, expect_column_value_lengths_to_equal, expect_column_values_to_match_regex, expect_column_values_to_not_match_regex, expect_column_values_to_match_regex_list, expect_column_values_to_match_strftime_format, expect_column_values_to_be_dateutil_parseable, expect_column_values_to_be_json_parseable, expect_column_values_to_match_json_schema, expect_column_mean_to_be_between, expect_column_median_to_be_between, expect_column_stdev_to_be_between, expect_column_unique_value_count_to_be_between, expect_column_proportion_of_unique_values_to_be_between, expect_column_most_common_value_to_be, expect_column_most_common_value_to_be_in_set, expect_column_chisquare_test_p_value_to_be_greater_than, expect_column_bootstrapped_ks_test_p_value_to_be_greater_than, expect_column_kl_divergence_to_be_less_than
:exclude-members: expect_column_to_exist, expect_table_columns_to_match_ordered_list, expect_table_row_count_to_be_between, expect_table_row_count_to_equal, expect_column_values_to_be_unique, expect_column_values_to_not_be_null, expect_column_values_to_be_null, expect_column_values_to_be_of_type, expect_column_values_to_be_in_type_list, expect_column_values_to_be_in_set, expect_column_max_to_be_between, expect_column_min_to_be_between, expect_column_sum_to_be_between, expect_column_values_to_not_be_in_set, expect_column_values_to_be_between, expect_column_values_to_be_increasing, expect_column_values_to_be_decreasing, expect_column_value_lengths_to_be_between, expect_column_value_lengths_to_equal, expect_column_values_to_match_regex, expect_column_values_to_not_match_regex, expect_column_values_to_match_regex_list, expect_column_values_to_match_strftime_format, expect_column_values_to_be_dateutil_parseable, expect_column_values_to_be_json_parseable, expect_column_values_to_match_json_schema, expect_column_distinct_values_to_contain_set, expect_column_distinct_values_to_equal_set, expect_column_mean_to_be_between, expect_column_median_to_be_between, expect_column_stdev_to_be_between, expect_column_unique_value_count_to_be_between, expect_column_proportion_of_unique_values_to_be_between, expect_column_most_common_value_to_be, expect_column_most_common_value_to_be_in_set, expect_column_chisquare_test_p_value_to_be_greater_than, expect_column_bootstrapped_ks_test_p_value_to_be_greater_than, expect_column_kl_divergence_to_be_less_than

great_expectations.dataset.sparkdf_dataset
---------------------------------------------
@@ -69,7 +69,7 @@ great_expectations.dataset.sparkdf_dataset
:members:
:undoc-members:
:show-inheritance:
:exclude-members: expect_column_to_exist, expect_table_columns_to_match_ordered_list, expect_table_row_count_to_be_between, expect_table_row_count_to_equal, expect_column_values_to_be_unique, expect_column_values_to_not_be_null, expect_column_values_to_be_null, expect_column_values_to_be_of_type, expect_column_values_to_be_in_type_list, expect_column_values_to_be_in_set, expect_column_max_to_be_between, expect_column_min_to_be_between, expect_column_sum_to_be_between, expect_column_values_to_not_be_in_set, expect_column_values_to_be_between, expect_column_values_to_be_increasing, expect_column_values_to_be_decreasing, expect_column_value_lengths_to_be_between, expect_column_value_lengths_to_equal, expect_column_values_to_match_regex, expect_column_values_to_not_match_regex, expect_column_values_to_match_regex_list, expect_column_values_to_match_strftime_format, expect_column_values_to_be_dateutil_parseable, expect_column_values_to_be_json_parseable, expect_column_values_to_match_json_schema, expect_column_mean_to_be_between, expect_column_median_to_be_between, expect_column_stdev_to_be_between, expect_column_unique_value_count_to_be_between, expect_column_proportion_of_unique_values_to_be_between, expect_column_most_common_value_to_be, expect_column_most_common_value_to_be_in_set, expect_column_chisquare_test_p_value_to_be_greater_than, expect_column_bootstrapped_ks_test_p_value_to_be_greater_than, expect_column_kl_divergence_to_be_less_than
:exclude-members: expect_column_to_exist, expect_table_columns_to_match_ordered_list, expect_table_row_count_to_be_between, expect_table_row_count_to_equal, expect_column_values_to_be_unique, expect_column_values_to_not_be_null, expect_column_values_to_be_null, expect_column_values_to_be_of_type, expect_column_values_to_be_in_type_list, expect_column_values_to_be_in_set, expect_column_max_to_be_between, expect_column_min_to_be_between, expect_column_sum_to_be_between, expect_column_values_to_not_be_in_set, expect_column_values_to_be_between, expect_column_values_to_be_increasing, expect_column_values_to_be_decreasing, expect_column_value_lengths_to_be_between, expect_column_value_lengths_to_equal, expect_column_values_to_match_regex, expect_column_values_to_not_match_regex, expect_column_values_to_match_regex_list, expect_column_values_to_match_strftime_format, expect_column_values_to_be_dateutil_parseable, expect_column_values_to_be_json_parseable, expect_column_values_to_match_json_schema, expect_column_distinct_values_to_contain_set, expect_column_distinct_values_to_equal_set, expect_column_mean_to_be_between, expect_column_median_to_be_between, expect_column_stdev_to_be_between, expect_column_unique_value_count_to_be_between, expect_column_proportion_of_unique_values_to_be_between, expect_column_most_common_value_to_be, expect_column_most_common_value_to_be_in_set, expect_column_chisquare_test_p_value_to_be_greater_than, expect_column_bootstrapped_ks_test_p_value_to_be_greater_than, expect_column_kl_divergence_to_be_less_than

great_expectations.dataset.util
-------------------------------
2 changes: 2 additions & 0 deletions docs/source/glossary.rst
@@ -58,6 +58,8 @@ Datetime and JSON parsing
Aggregate functions
--------------------------------------------------------------------------------

* :func:`expect_column_distinct_values_to_contain_set <great_expectations.dataset.dataset.Dataset.expect_column_distinct_values_to_contain_set>`
* :func:`expect_column_distinct_values_to_equal_set <great_expectations.dataset.dataset.Dataset.expect_column_distinct_values_to_equal_set>`
* :func:`expect_column_mean_to_be_between <great_expectations.dataset.dataset.Dataset.expect_column_mean_to_be_between>`
* :func:`expect_column_median_to_be_between <great_expectations.dataset.dataset.Dataset.expect_column_median_to_be_between>`
* :func:`expect_column_stdev_to_be_between <great_expectations.dataset.dataset.Dataset.expect_column_stdev_to_be_between>`
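
The two expectations added to this glossary differ only in how strictly the column's distinct values must match the given set. The following plain-Python sketch shows that distinction under the assumption that "contain" means superset and "equal" means set equality; the function names are hypothetical helpers, not the library's API, and details such as null handling are left out.

```python
def distinct_values_contain_set(values, value_set):
    # True when every element of value_set appears among the column values.
    return set(value_set) <= set(values)


def distinct_values_equal_set(values, value_set):
    # True when the distinct column values are exactly value_set.
    return set(values) == set(value_set)


column = ["a", "b", "b", "c"]
contains_ab = distinct_values_contain_set(column, {"a", "b"})
equals_ab = distinct_values_equal_set(column, {"a", "b"})
equals_abc = distinct_values_equal_set(column, {"a", "b", "c"})
```

Here ``contains_ab`` is true because both values occur in the column, ``equals_ab`` is false because the column also holds ``"c"``, and ``equals_abc`` is true.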
5 changes: 1 addition & 4 deletions docs/source/index.rst
@@ -1,7 +1,4 @@
.. great_expectations documentation master file, created by
sphinx-quickstart on Thu Jun 8 23:00:19 2017.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
.. great_expectations documentation master file
.. image:: ../../generic_dickens_protagonist.png
:align: right
19 changes: 11 additions & 8 deletions docs/source/intro.rst
@@ -8,15 +8,15 @@ Introduction
What is great\_expectations?
----------------------------

Great Expectations is a framework that helps teams save time and promote
analytic integrity with a new twist on automated testing: pipeline
tests. Pipeline tests are applied to data (instead of code) and at batch
time (instead of compile or deploy time).
Great Expectations helps teams save time and promote analytic integrity by \
offering a unique approach to automated testing: pipeline tests. Pipeline \
tests are applied to data (instead of code) and at batch time (instead of \
compile or deploy time). Pipeline tests are like unit tests for datasets: \
they help you guard against upstream data changes and monitor data quality.

Software developers have long known that automated testing is essential
for managing complex codebases. Great Expectations brings the same
discipline, confidence, and acceleration to data science and engineering
teams.
Software developers have long known that automated testing is essential for \
managing complex codebases. Great Expectations brings the same discipline, \
confidence, and acceleration to data science and engineering teams.

Why would I use Great Expectations?
-----------------------------------
@@ -99,6 +99,9 @@ If you have questions, comments, feature requests, etc., `opening an
issue <https://github.com/great-expectations/great_expectations/issues/new>`__
is definitely the best path forward.

We also have a Slack channel, which you can join here: https://tinyurl.com/great-expectations-slack


Great Expectations doesn't do X. Is it right for my use case?
-------------------------------------------------------------

8 changes: 4 additions & 4 deletions docs/source/roadmap_changelog.rst
@@ -5,17 +5,17 @@ Changelog and Roadmap

Planned Features
----------------
* More expectation coverage in SqlAlchemyDataset
* Improved project initialization and data contexts
* Improved variable typing
* New Datasets (e.g. Spark)
* Support for non-tabular datasources (e.g. JSON, XML, AVRO)
* Real-time/streaming and adaptation of distributional expectations


v.0.5.1__develop
v.0.6.0__develop
----------------
* Add support for SparkDFDataset (thanks @cselig)
* Migrate distributional expectations to new testing framework
* Add support for two new expectations: expect_column_distinct_values_to_contain_set
and expect_column_distinct_values_to_equal_set (thanks @RoyalTS)
BREAKING CHANGES:
* Add caching mechanism for Datasets; \
when enabled, GE assumes that dataset does not change between evaluation of individual expectations
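
The caching entry above is listed as a breaking change because a cached result is not recomputed if the underlying data is modified between expectations. The sketch below illustrates that consequence with a hypothetical ``CachedColumn`` class; it is an assumption about the general pattern, not the caching code in Great Expectations, and the ``caching`` flag is an invented parameter name.

```python
class CachedColumn:
    """Hypothetical illustration of result caching; not GE's implementation."""

    def __init__(self, values, caching=True):
        self.values = values
        self.caching = caching
        self._cache = {}

    def distinct_count(self):
        # Reuse the stored result when caching is enabled and a value exists.
        if self.caching and "distinct_count" in self._cache:
            return self._cache["distinct_count"]
        result = len(set(self.values))
        if self.caching:
            self._cache["distinct_count"] = result
        return result


cached = CachedColumn([1, 2, 2])
first = cached.distinct_count()
cached.values.append(3)            # the data changes after the first evaluation
stale = cached.distinct_count()    # still the cached value

uncached = CachedColumn([1, 2, 2], caching=False)
uncached.distinct_count()
uncached.values.append(3)
fresh = uncached.distinct_count()  # recomputed from the modified data
```

With caching on, ``stale`` equals ``first`` even though a new value was appended; with caching off, ``fresh`` reflects the change. Code that mutates a dataset between expectations would therefore need caching disabled.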
2 changes: 1 addition & 1 deletion great_expectations/version.py
@@ -1 +1 @@
__version__ = "0.5.1__develop"
__version__ = "0.6.0__develop"
