Skip to content

Latest commit

 

History

History
536 lines (369 loc) · 16.4 KB

README.md

File metadata and controls

536 lines (369 loc) · 16.4 KB

dbt-expectations

Extension package for dbt inspired by the Great Expectations package for Python. The intent is to allow dbt users to deploy GE-like tests in their data warehouse directly from dbt, vs having to add another integration with their data warehouse.

Install

Include in packages.yml

packages:
  - git: "https://github.com/calogica/dbt-expectations.git"
    revision: <for latest release, see https://github.com/calogica/dbt-expectations/releases>

For latest release, see https://github.com/calogica/dbt-expectations/releases

Dependencies

This package includes a reference to dbt-date which in turn references dbt-utils so there's no need to also import dbt-utils in your local project.

Variables

The following variables need to be defined in your dbt_project.yml file:

vars:
  'dbt_date:time_zone': 'America/Los_Angeles'

You may specify any valid timezone string in place of America/Los_Angeles. For example, use America/New_York for East Coast Time.

Integration Tests

This project contains integration tests for all test macros in a separate integration_tests dbt project contained in this repo.

To run the tests:

  1. You will need a profile called integration_tests in ~/.dbt/profiles.yml pointing to a writable database.
  2. Then, from within the integration_tests folder, run dbt seed to load data_test.csv to the test schema of your database.
  3. Then run dbt test to run the tests specified in integration_tests/models/schema_tests/schema.yml

Macros

Table shape

Expect the specified column to exist.

Usage:

tests:
- dbt_expectations.expect_column_to_exist

Expect the number of columns in a model to be between two values.

tests:
  - dbt_expectations.expect_table_column_count_to_be_between:
      minimum: 1
      maximum: 4

Expect the number of columns in a model to match another model.

tests:
  - dbt_expectations.expect_table_column_count_to_equal_other_table:
      compare_model: ref("other_model")

Expect the number of columns in a model to be equal to expected_number_of_columns.

tests:
  - dbt_expectations.expect_table_column_count_to_equal:
      expected_number_of_columns: 7

Expect the columns to exactly match a specified list.

tests:
  - dbt_expectations.expect_table_columns_to_match_ordered_list:
      ordered_column_list: ["col_a", "col_b"]

Expect the columns in a model to match a given list.

tests:
  - dbt_expectations.expect_table_columns_to_match_set:
      column_list: ["col_a", "col_b"]

Expect the number of rows in a model to be between two values.

tests:
  - dbt_expectations.expect_table_row_count_to_be_between:
      minimum: 1
      maximum: 4

Expect the number of rows in a model match another model.

tests:
  - dbt_expectations.expect_table_row_count_to_equal_other_table:
      compare_model: ref("other_model")

Expect the number of rows in a model to be equal to expected_number_of_rows.

tests:
  - dbt_expectations.expect_table_row_count_to_equal_other_table:
      expected_number_of_rows: 4

Missing values, unique values, and types

Expect each column value to be unique.

tests:
  - dbt_expectations.expect_column_values_to_be_unique

Expect column values to not be null.

tests:
  - dbt_expectations.expect_column_values_to_not_be_null

Expect column values to be null.

tests:
  - dbt_expectations.expect_column_values_to_be_null

Expect a column to contain values of a specified data type.

tests:
  - dbt_expectations.expect_column_values_to_be_of_type:
      column_type: date

Expect a column to contain values from a specified type list.

tests:
  - dbt_expectations.expect_column_values_to_be_in_type_list:
      column_type_list: [date, datetime]

Sets and ranges

Expect each column value to be in a given set.

tests:
  - dbt_expectations.expect_column_values_to_be_in_set:
      values: ['a','b','c']

Expect each column value to be between two values.

tests:
  - dbt_expectations.expect_column_values_to_be_between:
      minimum: 0
      maximum: 10

Expect each column value not to be in a given set.

tests:
  - dbt_expectations.expect_column_values_to_not_be_in_set:
      values: ['e','f','g']

Expect column values to be increasing.

If strictly=True, then this expectation is only satisfied if each consecutive value is strictly increasing–equal values are treated as failures.

tests:
  - dbt_expectations.expect_column_values_to_be_increasing:
      sort_column: date_day

Expect column values to be decreasing.

If strictly=True, then this expectation is only satisfied if each consecutive value is strictly increasing–equal values are treated as failures.

tests:
  - dbt_expectations.expect_column_values_to_be_decreasing:
      sort_column: col_numeric_a
      strictly: false

String matching

Expect column entries to be strings with length between a minimum value and a maximum value (inclusive).

tests:
  - dbt_expectations.expect_column_value_lengths_to_be_between:
      minimum_length: 1
      maximum_length: 4

Expect column entries to be strings with length equal to the provided value.

tests:
  - dbt_expectations.expect_column_value_lengths_to_equal:
      length: 10

expect_column_values_to_match_regex

expect_column_values_to_not_match_regex

expect_column_values_to_match_regex_list

expect_column_values_to_not_match_regex_list

expect_column_values_to_match_like_pattern

expect_column_values_to_not_match_like_pattern

expect_column_values_to_match_like_pattern_list

expect_column_values_to_not_match_like_pattern_list

Aggregate functions

Expect the set of distinct column values to be contained by a given set.

tests:
  - dbt_expectations.expect_column_distinct_values_to_be_in_set:
      values: ['a','b','c','d']

Expect the set of distinct column values to contain a given set.

In contrast to expect_column_values_to_be_in_set this ensures not that all column values are members of the given set but that values from the set must be present in the column.

tests:
  - dbt_expectations.expect_column_distinct_values_to_contain_set:
      values: ['a','b']

Expect the set of distinct column values to equal a given set.

In contrast to expect_column_distinct_values_to_contain_set this ensures not only that a certain set of values are present in the column but that these and only these values are present.

tests:
  - dbt_expectations.expect_column_distinct_values_to_equal_set:
      values: ['a','b','c']

Expect the column mean to be between a minimum value and a maximum value (inclusive).

tests:
  - dbt_expectations.expect_column_mean_to_be_between:
      minimum: 0
      maximum: 2

Expect the column median to be between a minimum value and a maximum value (inclusive).

tests:
  - dbt_expectations.expect_column_median_to_be_between:
      minimum: 0
      maximum: 2

Expect specific provided column quantiles to be between provided minimum and maximum values.

tests:
  - dbt_expectations.expect_column_quantile_values_to_be_between:
      quantile: .95
      minimum: 0
      maximum: 2

Expect the column standard deviation to be between a minimum value and a maximum value. Uses sample standard deviation (normalized by N-1).

tests:
  - dbt_expectations.expect_column_stdev_to_be_between:
      minimum: 0
      maximum: 2

Expect the number of unique values to be between a minimum value and a maximum value.

tests:
  - dbt_expectations.expect_column_unique_value_count_to_be_between:
      minimum: 3
      maximum: 3

Expect the proportion of unique values to be between a minimum value and a maximum value.

For example, in a column containing [1, 2, 2, 3, 3, 3, 4, 4, 4, 4], there are 4 unique values and 10 total values for a proportion of 0.4.

tests:
  - dbt_expectations.expect_column_proportion_of_unique_values_to_be_between:
      minimum: 0
      maximum: .4

Expect the most common value to be within the designated value set

tests:
  - dbt_expectations.expect_column_most_common_value_to_be_in_set:
      values: [0.5]
      top_n: 1

Expect the column max to be between a min and max value

tests:
  - dbt_expectations.expect_column_max_to_be_between:
      minimum: 1
      maximum: 1

Expect the column min to be between a min and max value

tests:
  - dbt_expectations.expect_column_min_to_be_between:
      minimum: 0
      maximum: 1

Expect the column to sum to be between a min and max value

tests:
  - dbt_expectations.expect_column_sum_to_be_between:
      minimum: 1
      maximum: 2

Multi-column

Expect values in column A to be greater than column B.

tests:
  - dbt_expectations.expect_column_pair_values_A_to_be_greater_than_B:
      column_A: col_numeric_a
      column_B: col_numeric_a
      or_equal: True

Expect the values in column A to be the same as column B.

tests:
  - dbt_expectations.expect_column_pair_values_to_be_equal:
      column_A: col_numeric_a
      column_B: col_numeric_a

Expect paired values from columns A and B to belong to a set of valid pairs.

Note: value pairs are expressed as lists within lists

tests:
  - dbt_expectations.expect_column_pair_values_to_be_in_set:
      column_A: col_numeric_a
      column_B: col_numeric_b
      value_pairs_set: [[0, 1], [1, 0], [0.5, 0.5], [0.5, 0.5]]

Expect the values for each record to be unique across the columns listed. Note that records can be duplicated.

tests:
  - dbt_expectations.expect_select_column_values_to_be_unique_within_record:
      column_list: ["col_string_a", "col_string_b"]
      ignore_row_if: "any_value_is_missing"

Expects that sum of all rows for a set of columns is equal to a specific value

tests:
  - dbt_expectations.expect_multicolumn_sum_to_equal:
      column_list: ["col_numeric_a", "col_numeric_b"]
      sum_total: 4

Expect that the columns are unique together, e.g. a multi-column primary key.

tests:
  - dbt_expectations.expect_compound_columns_to_be_unique:
      column_list: ["date_col", "col_string_b"]
      ignore_row_if: "any_value_is_missing"

Distributional functions

Expects changes in metric values to be within Z sigma away from a moving average, taking the (optionally logged) differences of an aggregated metric value and comparing it to its value N days ago.

tests:
  - dbt_expectations.expect_column_values_to_be_within_n_moving_stdevs:
      group_by: date_day
      lookback_days: 1
      trend_days: 7
      test_days: 14
      sigma_threshold: 3
      take_logs: true

Expects (optionally grouped & summed) metric values to be within Z sigma away from the column average

tests:
  - dbt_expectations.expect_column_values_to_be_within_n_stdevs:
      group_by: date_day
      sigma_threshold: 3