Merge pull request #470 from great-expectations/feature/docs

Doc updates to match spark release
great-expectations · May 24, 2019 · 7d2b34d
2 parents 2ed5fcd + 7af0828
Showing 7 changed files with 44 additions and 22 deletions.
diff --git a/docs/source/data_contexts.rst b/docs/source/data_contexts.rst
@@ -4,7 +4,8 @@
 Data Contexts
 ================================================================================
 
-Data Contexts manage connections to Great Expectations Datasets.
+Data Contexts manage connections to Great Expectations Datasets. Note: data contexts
+will change significantly in the next release of GE.
 
 To get a data context, simply call `get_data_context()` on the ge object:
 
@@ -17,9 +18,11 @@ To get a data context, simply call `get_data_context()` on the ge object:
     >> sql_dataset = sql_context.get_dataset('table_name')
 
 
-There are currently two types of data contexts:
+There are currently four types of data contexts:
   - :ref:`PandasCSVDataContext`: The PandasCSVDataContext ('PandasCSV') exposes a local directory containing files as datasets.
   - :ref:`SqlAlchemyDataContext`: The SqlAlchemyDataContext ('SqlAlchemy') exposes tables from a SQL-compliant database as datasets.
+  - :ref:`SparkCSVDataContext`: The SparkCSVDataContext ('SparkCSV') exposes CSV files accessible from a Spark SQL context.
+  - :ref:`DatabricksTableContext`: The DatabricksTableContext ('DatabricksTable') exposes tables accessible from a Databricks notebook.
 
 All data contexts expose the following methods:
   - list_datasets(): lists datasets available in current context
@@ -39,3 +42,20 @@ The `options` paramater for a PandasCSVDataContext is simply the glob pattern ma
 -----------------------
 
 The `options` parameter for a SqlAlchemyDataContext is the sqlalchemy connection string to connect to the database.
+
+
+.. _SparkCSVDataContext:
+
+`SparkCSVDataContext`
+---------------------
+
+The `options` parameter for a SparkCSVDataContext is the directory from which to read CSV files, along with any options to pass to the reader.
+
+
+.. _DatabricksTableContext:
+
+`DatabricksTableContext`
+------------------------
+
+The `options` parameter for a DatabricksTableContext is the database from which to expose tables; `get_dataset` also optionally accepts
+a date partition.
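The data contexts documented in this file share one pattern: an `options` value fixes where datasets live, `list_datasets()` enumerates them, and `get_dataset()` loads one by name. The sketch below shows that pattern for a plain directory of CSV files using only the standard library. `ToyCSVDirectoryContext` is a hypothetical class, not Great Expectations' API; it returns rows as dicts rather than a GE Dataset, and forwarding keyword arguments to the reader is an assumption loosely modeled on "options to pass to the reader".

```python
import csv
import os
import tempfile


class ToyCSVDirectoryContext:
    """Hypothetical stand-in for a directory-backed data context (not GE's API)."""

    def __init__(self, options):
        # Assumption: `options` is the directory holding the CSV files.
        self.base_directory = options

    def list_datasets(self):
        # Every *.csv file in the directory is treated as one dataset.
        return sorted(
            name for name in os.listdir(self.base_directory)
            if name.endswith(".csv")
        )

    def get_dataset(self, dataset_name, **reader_options):
        # Extra keyword arguments (e.g. delimiter=";") go to csv.DictReader.
        path = os.path.join(self.base_directory, dataset_name)
        with open(path, newline="") as f:
            return list(csv.DictReader(f, **reader_options))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "users.csv"), "w", newline="") as f:
            f.write("id,name\n1,ada\n2,grace\n")
        context = ToyCSVDirectoryContext(tmp)
        print(context.list_datasets())  # ['users.csv']
        print(dict(context.get_dataset("users.csv")[0]))  # {'id': '1', 'name': 'ada'}
```

Keeping dataset discovery (`list_datasets`) separate from loading (`get_dataset`) lets callers browse what a context exposes without reading every file.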
diff --git a/docs/source/dataset_module.rst b/docs/source/dataset_module.rst
@@ -29,7 +29,7 @@ great_expectations.dataset.pandas_dataset
         :members:
         :undoc-members:
         :show-inheritance:
-        :exclude-members: expect_column_to_exist, expect_table_columns_to_match_ordered_list, expect_table_row_count_to_be_between, expect_table_row_count_to_equal, expect_column_values_to_be_unique, expect_column_values_to_not_be_null, expect_column_values_to_be_null, expect_column_values_to_be_of_type, expect_column_values_to_be_in_type_list, expect_column_values_to_be_in_set, expect_column_max_to_be_between, expect_column_min_to_be_between, expect_column_sum_to_be_between, expect_column_values_to_not_be_in_set, expect_column_values_to_be_between, expect_column_values_to_be_increasing, expect_column_values_to_be_decreasing, expect_column_value_lengths_to_be_between, expect_column_value_lengths_to_equal, expect_column_values_to_match_regex, expect_column_values_to_not_match_regex, expect_column_values_to_match_regex_list, expect_column_values_to_match_strftime_format, expect_column_values_to_be_dateutil_parseable, expect_column_values_to_be_json_parseable, expect_column_values_to_match_json_schema, expect_column_mean_to_be_between, expect_column_median_to_be_between, expect_column_stdev_to_be_between, expect_column_unique_value_count_to_be_between, expect_column_proportion_of_unique_values_to_be_between, expect_column_most_common_value_to_be, expect_column_most_common_value_to_be_in_set, expect_column_chisquare_test_p_value_to_be_greater_than, expect_column_bootstrapped_ks_test_p_value_to_be_greater_than, expect_column_kl_divergence_to_be_less_than
+        :exclude-members: expect_column_to_exist, expect_table_columns_to_match_ordered_list, expect_table_row_count_to_be_between, expect_table_row_count_to_equal, expect_column_values_to_be_unique, expect_column_values_to_not_be_null, expect_column_values_to_be_null, expect_column_values_to_be_of_type, expect_column_values_to_be_in_type_list, expect_column_values_to_be_in_set, expect_column_max_to_be_between, expect_column_min_to_be_between, expect_column_sum_to_be_between, expect_column_values_to_not_be_in_set, expect_column_values_to_be_between, expect_column_values_to_be_increasing, expect_column_values_to_be_decreasing, expect_column_value_lengths_to_be_between, expect_column_value_lengths_to_equal, expect_column_values_to_match_regex, expect_column_values_to_not_match_regex, expect_column_values_to_match_regex_list, expect_column_values_to_match_strftime_format, expect_column_values_to_be_dateutil_parseable, expect_column_values_to_be_json_parseable, expect_column_values_to_match_json_schema, expect_column_distinct_values_to_contain_set, expect_column_distinct_values_to_equal_set, expect_column_mean_to_be_between, expect_column_median_to_be_between, expect_column_stdev_to_be_between, expect_column_unique_value_count_to_be_between, expect_column_proportion_of_unique_values_to_be_between, expect_column_most_common_value_to_be, expect_column_most_common_value_to_be_in_set, expect_column_chisquare_test_p_value_to_be_greater_than, expect_column_bootstrapped_ks_test_p_value_to_be_greater_than, expect_column_kl_divergence_to_be_less_than
 
 great_expectations.dataset.sqlalchemy_dataset
 ---------------------------------------------
@@ -49,7 +49,7 @@ great_expectations.dataset.sqlalchemy_dataset
         :members:
         :undoc-members:
         :show-inheritance:
-        :exclude-members: expect_column_to_exist, expect_table_columns_to_match_ordered_list, expect_table_row_count_to_be_between, expect_table_row_count_to_equal, expect_column_values_to_be_unique, expect_column_values_to_not_be_null, expect_column_values_to_be_null, expect_column_values_to_be_of_type, expect_column_values_to_be_in_type_list, expect_column_values_to_be_in_set, expect_column_max_to_be_between, expect_column_min_to_be_between, expect_column_sum_to_be_between, expect_column_values_to_not_be_in_set, expect_column_values_to_be_between, expect_column_values_to_be_increasing, expect_column_values_to_be_decreasing, expect_column_value_lengths_to_be_between, expect_column_value_lengths_to_equal, expect_column_values_to_match_regex, expect_column_values_to_not_match_regex, expect_column_values_to_match_regex_list, expect_column_values_to_match_strftime_format, expect_column_values_to_be_dateutil_parseable, expect_column_values_to_be_json_parseable, expect_column_values_to_match_json_schema, expect_column_mean_to_be_between, expect_column_median_to_be_between, expect_column_stdev_to_be_between, expect_column_unique_value_count_to_be_between, expect_column_proportion_of_unique_values_to_be_between, expect_column_most_common_value_to_be, expect_column_most_common_value_to_be_in_set, expect_column_chisquare_test_p_value_to_be_greater_than, expect_column_bootstrapped_ks_test_p_value_to_be_greater_than, expect_column_kl_divergence_to_be_less_than
+        :exclude-members: expect_column_to_exist, expect_table_columns_to_match_ordered_list, expect_table_row_count_to_be_between, expect_table_row_count_to_equal, expect_column_values_to_be_unique, expect_column_values_to_not_be_null, expect_column_values_to_be_null, expect_column_values_to_be_of_type, expect_column_values_to_be_in_type_list, expect_column_values_to_be_in_set, expect_column_max_to_be_between, expect_column_min_to_be_between, expect_column_sum_to_be_between, expect_column_values_to_not_be_in_set, expect_column_values_to_be_between, expect_column_values_to_be_increasing, expect_column_values_to_be_decreasing, expect_column_value_lengths_to_be_between, expect_column_value_lengths_to_equal, expect_column_values_to_match_regex, expect_column_values_to_not_match_regex, expect_column_values_to_match_regex_list, expect_column_values_to_match_strftime_format, expect_column_values_to_be_dateutil_parseable, expect_column_values_to_be_json_parseable, expect_column_values_to_match_json_schema, expect_column_distinct_values_to_contain_set, expect_column_distinct_values_to_equal_set, expect_column_mean_to_be_between, expect_column_median_to_be_between, expect_column_stdev_to_be_between, expect_column_unique_value_count_to_be_between, expect_column_proportion_of_unique_values_to_be_between, expect_column_most_common_value_to_be, expect_column_most_common_value_to_be_in_set, expect_column_chisquare_test_p_value_to_be_greater_than, expect_column_bootstrapped_ks_test_p_value_to_be_greater_than, expect_column_kl_divergence_to_be_less_than
 
 great_expectations.dataset.sparkdf_dataset
 ---------------------------------------------
@@ -69,7 +69,7 @@ great_expectations.dataset.sparkdf_dataset
         :members:
         :undoc-members:
         :show-inheritance:
-        :exclude-members: expect_column_to_exist, expect_table_columns_to_match_ordered_list, expect_table_row_count_to_be_between, expect_table_row_count_to_equal, expect_column_values_to_be_unique, expect_column_values_to_not_be_null, expect_column_values_to_be_null, expect_column_values_to_be_of_type, expect_column_values_to_be_in_type_list, expect_column_values_to_be_in_set, expect_column_max_to_be_between, expect_column_min_to_be_between, expect_column_sum_to_be_between, expect_column_values_to_not_be_in_set, expect_column_values_to_be_between, expect_column_values_to_be_increasing, expect_column_values_to_be_decreasing, expect_column_value_lengths_to_be_between, expect_column_value_lengths_to_equal, expect_column_values_to_match_regex, expect_column_values_to_not_match_regex, expect_column_values_to_match_regex_list, expect_column_values_to_match_strftime_format, expect_column_values_to_be_dateutil_parseable, expect_column_values_to_be_json_parseable, expect_column_values_to_match_json_schema, expect_column_mean_to_be_between, expect_column_median_to_be_between, expect_column_stdev_to_be_between, expect_column_unique_value_count_to_be_between, expect_column_proportion_of_unique_values_to_be_between, expect_column_most_common_value_to_be, expect_column_most_common_value_to_be_in_set, expect_column_chisquare_test_p_value_to_be_greater_than, expect_column_bootstrapped_ks_test_p_value_to_be_greater_than, expect_column_kl_divergence_to_be_less_than
+        :exclude-members: expect_column_to_exist, expect_table_columns_to_match_ordered_list, expect_table_row_count_to_be_between, expect_table_row_count_to_equal, expect_column_values_to_be_unique, expect_column_values_to_not_be_null, expect_column_values_to_be_null, expect_column_values_to_be_of_type, expect_column_values_to_be_in_type_list, expect_column_values_to_be_in_set, expect_column_max_to_be_between, expect_column_min_to_be_between, expect_column_sum_to_be_between, expect_column_values_to_not_be_in_set, expect_column_values_to_be_between, expect_column_values_to_be_increasing, expect_column_values_to_be_decreasing, expect_column_value_lengths_to_be_between, expect_column_value_lengths_to_equal, expect_column_values_to_match_regex, expect_column_values_to_not_match_regex, expect_column_values_to_match_regex_list, expect_column_values_to_match_strftime_format, expect_column_values_to_be_dateutil_parseable, expect_column_values_to_be_json_parseable, expect_column_values_to_match_json_schema, expect_column_distinct_values_to_contain_set, expect_column_distinct_values_to_equal_set, expect_column_mean_to_be_between, expect_column_median_to_be_between, expect_column_stdev_to_be_between, expect_column_unique_value_count_to_be_between, expect_column_proportion_of_unique_values_to_be_between, expect_column_most_common_value_to_be, expect_column_most_common_value_to_be_in_set, expect_column_chisquare_test_p_value_to_be_greater_than, expect_column_bootstrapped_ks_test_p_value_to_be_greater_than, expect_column_kl_divergence_to_be_less_than
 
 great_expectations.dataset.util
 -------------------------------

diff --git a/docs/source/glossary.rst b/docs/source/glossary.rst
@@ -58,6 +58,8 @@ Datetime and JSON parsing
 Aggregate functions
 --------------------------------------------------------------------------------
 
+* :func:`expect_column_distinct_values_to_contain_set <great_expectations.dataset.dataset.Dataset.expect_column_distinct_values_to_contain_set>`
+* :func:`expect_column_distinct_values_to_equal_set <great_expectations.dataset.dataset.Dataset.expect_column_distinct_values_to_equal_set>`
 * :func:`expect_column_mean_to_be_between <great_expectations.dataset.dataset.Dataset.expect_column_mean_to_be_between>`
 * :func:`expect_column_median_to_be_between <great_expectations.dataset.dataset.Dataset.expect_column_median_to_be_between>`
 * :func:`expect_column_stdev_to_be_between <great_expectations.dataset.dataset.Dataset.expect_column_stdev_to_be_between>`
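The two aggregate expectations added to the glossary compare the set of distinct values in a column with an expected value set: `expect_column_distinct_values_to_contain_set` succeeds when every expected value occurs in the column, while `expect_column_distinct_values_to_equal_set` succeeds only when the distinct values match the expected set exactly. The plain-Python sketch below illustrates that comparison. The helper names are hypothetical, ignoring nulls (`None`) is an assumption, and the boolean return value is a simplification of GE's actual result format.

```python
def distinct_values(column):
    # Assumption: null entries (None) are excluded before comparing sets.
    return {value for value in column if value is not None}


def distinct_values_contain_set(column, value_set):
    """Sketch: True if every value in value_set occurs in the column."""
    return set(value_set).issubset(distinct_values(column))


def distinct_values_equal_set(column, value_set):
    """Sketch: True if the column's distinct values are exactly value_set."""
    return distinct_values(column) == set(value_set)


column = ["a", "b", "b", None, "c"]
print(distinct_values_contain_set(column, ["a", "b"]))    # True
print(distinct_values_equal_set(column, ["a", "b"]))      # False: "c" is extra
print(distinct_values_equal_set(column, ["a", "b", "c"])) # True
```

The "contain" variant tolerates extra values in the data, so it suits checks like "all known categories are still present", whereas the "equal" variant also flags unexpected new values.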

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -1,7 +1,4 @@
-.. great_expectations documentation master file, created by
-   sphinx-quickstart on Thu Jun  8 23:00:19 2017.
-   You can adapt this file completely to your liking, but it should at least
-   contain the root `toctree` directive.
+.. great_expectations documentation master file
 
 .. image:: ../../generic_dickens_protagonist.png
    :align: right

diff --git a/docs/source/intro.rst b/docs/source/intro.rst
@@ -8,15 +8,15 @@ Introduction
 What is great\_expectations?
 ----------------------------
 
-Great Expectations is a framework that helps teams save time and promote
-analytic integrity with a new twist on automated testing: pipeline
-tests. Pipeline tests are applied to data (instead of code) and at batch
-time (instead of compile or deploy time).
+Great Expectations helps teams save time and promote analytic integrity by \
+offering a unique approach to automated testing: pipeline tests. Pipeline \
+tests are applied to data (instead of code) and at batch time (instead of \
+compile or deploy time). Pipeline tests are like unit tests for datasets: \
+they help you guard against upstream data changes and monitor data quality.
 
-Software developers have long known that automated testing is essential
-for managing complex codebases. Great Expectations brings the same
-discipline, confidence, and acceleration to data science and engineering
-teams.
+Software developers have long known that automated testing is essential for \
+managing complex codebases. Great Expectations brings the same discipline, \
+confidence, and acceleration to data science and engineering teams.
 
 Why would I use Great Expectations?
 -----------------------------------
@@ -99,6 +99,9 @@ If you have questions, comments, feature requests, etc., `opening an
 issue <https://github.com/great-expectations/great_expectations/issues/new>`__
 is definitely the best path forward.
 
+We also have a Slack channel, which you can join here: https://tinyurl.com/great-expectations-slack
+
+
 Great Expectations doesn't do X. Is it right for my use case?
 -------------------------------------------------------------
 

diff --git a/docs/source/roadmap_changelog.rst b/docs/source/roadmap_changelog.rst
@@ -5,17 +5,17 @@ Changelog and Roadmap
 
 Planned Features
 ----------------
-* More expectation coverage in SqlAlchemyDataset
+* Improved project initialization and data contexts
 * Improved variable typing
-* New Datasets (e.g. Spark)
 * Support for non-tabular datasources (e.g. JSON, XML, AVRO)
-* Real-time/streaming and adaption of distributional expectations
 
 
-v.0.5.1__develop
+v.0.6.0__develop
 ----------------
 * Add support for SparkDFDataset (thanks @cselig)
 * Migrate distributional expectations to new testing framework
+* Add support for two new expectations: expect_column_distinct_values_to_contain_set 
+  and expect_column_distinct_values_to_equal_set (thanks @RoyalTS)
 BREAKING CHANGES:
 * Add caching mechanism for Datasets; \
   when enabled, GE assumes that dataset does not change between evaluation of individual expectations
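The caching entry above rests on one assumption: the dataset does not change between evaluations of individual expectations, so expensive intermediate results can be reused. The toy sketch below, which is not GE's implementation, memoizes column aggregates behind an opt-in `caching` flag and shows why the assumption matters: once the underlying data is mutated, a cached value goes stale. `ToyDataset`, `column_mean`, `column_max`, and `compute_count` are hypothetical names used only for illustration.

```python
class ToyDataset:
    """Hypothetical dataset illustrating opt-in caching of column aggregates."""

    def __init__(self, columns, caching=False):
        self.columns = columns      # dict: column name -> list of values
        self.caching = caching
        self._cache = {}
        self.compute_count = 0      # number of aggregates actually computed

    def _aggregate(self, name, column, func):
        key = (name, column)
        if self.caching and key in self._cache:
            # Reuse is safe only if the data has not changed since caching.
            return self._cache[key]
        self.compute_count += 1
        result = func(self.columns[column])
        if self.caching:
            self._cache[key] = result
        return result

    def column_mean(self, column):
        return self._aggregate("mean", column, lambda v: sum(v) / len(v))

    def column_max(self, column):
        return self._aggregate("max", column, max)


ds = ToyDataset({"x": [1, 2, 3]}, caching=True)
ds.column_mean("x")
ds.column_mean("x")
print(ds.compute_count)      # 1: the second call reused the cached mean
ds.columns["x"].append(100)
print(ds.column_mean("x"))   # 2.0: stale, because caching assumes unchanged data
```

Leaving caching off by default trades speed for correctness when a dataset may be modified while expectations are being evaluated.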

diff --git a/great_expectations/version.py b/great_expectations/version.py
@@ -1 +1 @@
-__version__ = "0.5.1__develop"
+__version__ = "0.6.0__develop"