Extension package for dbt inspired by the Great Expectations package for Python. The intent is to allow dbt users to deploy GE-like tests in their data warehouse directly from dbt, vs having to add another integration with their data warehouse.
Include in packages.yml
packages:
- git: "https://github.com/calogica/dbt-expectations.git"
revision: <for latest release, see https://github.com/calogica/dbt-expectations/releases>
For latest release, see https://github.com/calogica/dbt-expectations/releases
This package includes a reference to dbt-date which in turn references dbt-utils so there's no need to also import dbt-utils in your local project.
The following variables need to be defined in your dbt_project.yml
file:
vars:
'dbt_date:time_zone': 'America/Los_Angeles'
You may specify any valid timezone string in place of America/Los_Angeles
.
For example, use America/New_York
for East Coast Time.
This project contains integration tests for all test macros in a separate integration_tests
dbt project contained in this repo.
To run the tests:
- You will need a profile called
integration_tests
in~/.dbt/profiles.yml
pointing to a writable database. - Then, from within the
integration_tests
folder, rundbt seed
to loaddata_test.csv
to thetest
schema of your database. - Then run
dbt test
to run the tests specified inintegration_tests/models/schema_tests/schema.yml
Expect the specified column to exist.
Usage:
tests:
- dbt_expectations.expect_column_to_exist
Expect the number of columns in a model to be between two values.
tests:
- dbt_expectations.expect_table_column_count_to_be_between:
minimum: 1
maximum: 4
Expect the number of columns in a model to match another model.
tests:
- dbt_expectations.expect_table_column_count_to_equal_other_table:
compare_model: ref("other_model")
Expect the number of columns in a model to be equal to expected_number_of_columns
.
tests:
- dbt_expectations.expect_table_column_count_to_equal:
expected_number_of_columns: 7
Expect the columns to exactly match a specified list.
tests:
- dbt_expectations.expect_table_columns_to_match_ordered_list:
ordered_column_list: ["col_a", "col_b"]
Expect the columns in a model to match a given list.
tests:
- dbt_expectations.expect_table_columns_to_match_set:
column_list: ["col_a", "col_b"]
Expect the number of rows in a model to be between two values.
tests:
- dbt_expectations.expect_table_row_count_to_be_between:
minimum: 1
maximum: 4
Expect the number of rows in a model match another model.
tests:
- dbt_expectations.expect_table_row_count_to_equal_other_table:
compare_model: ref("other_model")
Expect the number of rows in a model to be equal to expected_number_of_rows
.
tests:
- dbt_expectations.expect_table_row_count_to_equal_other_table:
expected_number_of_rows: 4
Expect each column value to be unique.
tests:
- dbt_expectations.expect_column_values_to_be_unique
Expect column values to not be null.
tests:
- dbt_expectations.expect_column_values_to_not_be_null
Expect column values to be null.
tests:
- dbt_expectations.expect_column_values_to_be_null
Expect a column to contain values of a specified data type.
tests:
- dbt_expectations.expect_column_values_to_be_of_type:
column_type: date
Expect a column to contain values from a specified type list.
tests:
- dbt_expectations.expect_column_values_to_be_in_type_list:
column_type_list: [date, datetime]
Expect each column value to be in a given set.
tests:
- dbt_expectations.expect_column_values_to_be_in_set:
values: ['a','b','c']
Expect each column value to be between two values.
tests:
- dbt_expectations.expect_column_values_to_be_between:
minimum: 0
maximum: 10
Expect each column value not to be in a given set.
tests:
- dbt_expectations.expect_column_values_to_not_be_in_set:
values: ['e','f','g']
Expect column values to be increasing.
If strictly=True, then this expectation is only satisfied if each consecutive value is strictly increasing–equal values are treated as failures.
tests:
- dbt_expectations.expect_column_values_to_be_increasing:
sort_column: date_day
Expect column values to be decreasing.
If strictly=True, then this expectation is only satisfied if each consecutive value is strictly increasing–equal values are treated as failures.
tests:
- dbt_expectations.expect_column_values_to_be_decreasing:
sort_column: col_numeric_a
strictly: false
Expect column entries to be strings with length between a minimum value and a maximum value (inclusive).
tests:
- dbt_expectations.expect_column_value_lengths_to_be_between:
minimum_length: 1
maximum_length: 4
Expect column entries to be strings with length equal to the provided value.
tests:
- dbt_expectations.expect_column_value_lengths_to_equal:
length: 10
Expect the set of distinct column values to be contained by a given set.
tests:
- dbt_expectations.expect_column_distinct_values_to_be_in_set:
values: ['a','b','c','d']
Expect the set of distinct column values to contain a given set.
In contrast to expect_column_values_to_be_in_set
this ensures not that all column values are members of the given set but that values from the set must be present in the column.
tests:
- dbt_expectations.expect_column_distinct_values_to_contain_set:
values: ['a','b']
Expect the set of distinct column values to equal a given set.
In contrast to expect_column_distinct_values_to_contain_set
this ensures not only that a certain set of values are present in the column but that these and only these values are present.
tests:
- dbt_expectations.expect_column_distinct_values_to_equal_set:
values: ['a','b','c']
Expect the column mean to be between a minimum value and a maximum value (inclusive).
tests:
- dbt_expectations.expect_column_mean_to_be_between:
minimum: 0
maximum: 2
Expect the column median to be between a minimum value and a maximum value (inclusive).
tests:
- dbt_expectations.expect_column_median_to_be_between:
minimum: 0
maximum: 2
Expect specific provided column quantiles to be between provided minimum and maximum values.
tests:
- dbt_expectations.expect_column_quantile_values_to_be_between:
quantile: .95
minimum: 0
maximum: 2
Expect the column standard deviation to be between a minimum value and a maximum value. Uses sample standard deviation (normalized by N-1).
tests:
- dbt_expectations.expect_column_stdev_to_be_between:
minimum: 0
maximum: 2
Expect the number of unique values to be between a minimum value and a maximum value.
tests:
- dbt_expectations.expect_column_unique_value_count_to_be_between:
minimum: 3
maximum: 3
Expect the proportion of unique values to be between a minimum value and a maximum value.
For example, in a column containing [1, 2, 2, 3, 3, 3, 4, 4, 4, 4], there are 4 unique values and 10 total values for a proportion of 0.4.
tests:
- dbt_expectations.expect_column_proportion_of_unique_values_to_be_between:
minimum: 0
maximum: .4
Expect the most common value to be within the designated value set
tests:
- dbt_expectations.expect_column_most_common_value_to_be_in_set:
values: [0.5]
top_n: 1
Expect the column max to be between a min and max value
tests:
- dbt_expectations.expect_column_max_to_be_between:
minimum: 1
maximum: 1
Expect the column min to be between a min and max value
tests:
- dbt_expectations.expect_column_min_to_be_between:
minimum: 0
maximum: 1
Expect the column to sum to be between a min and max value
tests:
- dbt_expectations.expect_column_sum_to_be_between:
minimum: 1
maximum: 2
Expect values in column A to be greater than column B.
tests:
- dbt_expectations.expect_column_pair_values_A_to_be_greater_than_B:
column_A: col_numeric_a
column_B: col_numeric_a
or_equal: True
Expect the values in column A to be the same as column B.
tests:
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: col_numeric_a
column_B: col_numeric_a
Expect paired values from columns A and B to belong to a set of valid pairs.
Note: value pairs are expressed as lists within lists
tests:
- dbt_expectations.expect_column_pair_values_to_be_in_set:
column_A: col_numeric_a
column_B: col_numeric_b
value_pairs_set: [[0, 1], [1, 0], [0.5, 0.5], [0.5, 0.5]]
Expect the values for each record to be unique across the columns listed. Note that records can be duplicated.
tests:
- dbt_expectations.expect_select_column_values_to_be_unique_within_record:
column_list: ["col_string_a", "col_string_b"]
ignore_row_if: "any_value_is_missing"
Expects that sum of all rows for a set of columns is equal to a specific value
tests:
- dbt_expectations.expect_multicolumn_sum_to_equal:
column_list: ["col_numeric_a", "col_numeric_b"]
sum_total: 4
Expect that the columns are unique together, e.g. a multi-column primary key.
tests:
- dbt_expectations.expect_compound_columns_to_be_unique:
column_list: ["date_col", "col_string_b"]
ignore_row_if: "any_value_is_missing"
Expects changes in metric values to be within Z sigma away from a moving average, taking the (optionally logged) differences of an aggregated metric value and comparing it to its value N days ago.
tests:
- dbt_expectations.expect_column_values_to_be_within_n_moving_stdevs:
group_by: date_day
lookback_days: 1
trend_days: 7
test_days: 14
sigma_threshold: 3
take_logs: true
Expects (optionally grouped & summed) metric values to be within Z sigma away from the column average
tests:
- dbt_expectations.expect_column_values_to_be_within_n_stdevs:
group_by: date_day
sigma_threshold: 3