Sync fork with original repository (#15)
* support count_true and count_false for boolean columns in BigQuery

* support count_true and count_false for boolean columns in BigQuery

* fix: change generate profile args macro name for Athena

* Add unique_combination_of_columns to common_tests_configs_mapping

* missing comma

* fix drop_failure_percent_threshold failing a non-anomalous test

* Support all anomaly vars on all configuration levels

* ELE-2470 temp tables are not being deleted

* Add empty line at the end of a file

* fix typo

* Removed default detection/training_period

* Make sure we delete temp tables last

* removed unused import

* Not collecting metrics by default.

* release 0.14.0

* fix bug when no temp tables exist

* Update macros/edr/tests/test_utils/clean_elementary_test_tables.sql

Co-authored-by: IDoneShaveIt <48473443+IDoneShaveIt@users.noreply.github.com>

* 1. add tags to all elementary monitors 2. only run tests if elementary is enabled

* rename tag

* Create clean_dbt_columns_temp_tables macro

* Add empty line at the end of a file

* clean logs

* add arg chunk_size for all insert_rows() (elementary-data#669)

* release 0.14.1

* override primary_test_model_id (elementary-data#671)

* added ignore_small_changes to freshness and event_freshness

* improvement: bigquery specific for query_table_metrics (elementary-data#674)

* improvement: bigquery specific for query_table_metrics

Using the information schema to get the row count is much more performant than doing a full table scan

* use TABLE_STORAGE and add database & schema

* add empty case

* add set

* Add index on created_at test_result_rows and remove backfill post hook

* Ele 2606 package version with caching and extra logs (elementary-data#673)

* artifacts: use cache also for model post-hook

* add performance logs to artifacts logic

* duration monitoring - bugfix - handle the case where the duration stack is not initialized

* Change the aggregate of failed_row_count_calc to count(*)

* Readme updates (elementary-data#684)

* changes to readme

* changes to readme

* changes

* changes

* image url

* image url

* changes

* formatting

* formatting

* changes

* link

* pre commit

* improve flattening performance for dbt_columns (elementary-data#681)

* improve flattening performance for dbt_columns

* removed unused const

* black

* Add get_requires_permissions and validate_required_permissions macros

* Improved messages

* Fixed default__get_required_permissions + add target.database to get_relevant_databases

---------

Co-authored-by: suelai <suela.isaj@gmail.com>
Co-authored-by: Roman Korsun <romakorsun2000@gmail.com>
Co-authored-by: Yasuhisa Yoshida <syou6162@gmail.com>
Co-authored-by: Ofek Weiss <55920061+ofek1weiss@users.noreply.github.com>
Co-authored-by: Ofek Weiss <ofek1weiss@gmail.com>
Co-authored-by: IDoneShaveIt <idanshavit31@gmail.com>
Co-authored-by: IDoneShaveIt <48473443+IDoneShaveIt@users.noreply.github.com>
Co-authored-by: Elon Gliksberg <elongliks@gmail.com>
Co-authored-by: GitHub Actions <noreply@github.com>
Co-authored-by: Ella Katz <ella@elementary-data.com>
Co-authored-by: J.C <zhang@sansan.com>
Co-authored-by: Noy Arie <noyarie1992@gmail.com>
Co-authored-by: Chris Dong <86695140+dongchris@users.noreply.github.com>
Co-authored-by: noakurman <kurman.noa@gmail.com>
Co-authored-by: Itamar Hartstein <haritamar@gmail.com>
Co-authored-by: Maayan Salom <maayansalom@gmail.com>
17 people committed Apr 9, 2024
1 parent fad76af commit fb111d4
Showing 57 changed files with 763 additions and 139 deletions.
71 changes: 41 additions & 30 deletions README.md
@@ -2,27 +2,39 @@
<img alt="Logo" src="https://raw.githubusercontent.com/elementary-data/elementary/master/static/github_banner.png" width="1000"/>
</p>

<h2 align="center">
dbt native data observability for analytics & data engineers
</h2>
<h4 align="center">
Monitor your data quality, operation and performance directly from your dbt project.
</h4>
# [dbt native data observability](https://docs.elementary-data.com/introduction)

<p align="center">
<a href="https://join.slack.com/t/elementary-community/shared_invite/zt-uehfrq2f-zXeVTtXrjYRbdE_V6xq4Rg"><img src="https://img.shields.io/badge/join-Slack-ff69b4"/></a>
<a href="https://docs.elementary-data.com/quickstart"><img src="https://img.shields.io/badge/docs-quickstart-orange"/></a>
<img alt="License" src="https://img.shields.io/badge/license-Apache--2.0-ff69b4"/>
<img alt="Downloads" src="https://static.pepy.tech/personalized-badge/elementary-lineage?period=total&units=international_system&left_color=grey&right_color=orange&left_text=Downloads"/>
</p>

## What is Elementary?

This dbt package is part of Elementary, the dbt-native data observability solution for data and analytics engineers.
Set up in minutes, gain immediate visibility, detect data issues, send actionable alerts, and understand impact and root cause.
Available as self-hosted or Cloud service with premium features.

#### Table of Contents

- [Quick start - dbt package](#quick-start---dbt-package)
- [Get more out of Elementary](#get-more-out-of-elementary-dbt-package)
- [Run results and dbt artifacts](#run-results-and-dbt-artifacts)
- [Data anomaly detection as dbt tests](#data-anomaly-detection-as-dbt-tests)
- [How Elementary works](#how-elementary-works)
- [Community & Support](#community--support)
- [Contributions](#contributions)

## Quick start
## Quick start - dbt package

1. Add to your `packages.yml`:

```yml packages.yml
packages:
- package: elementary-data/elementary
version: 0.13.2
version: 0.14.1
## Docs: https://docs.elementary-data.com
```

@@ -40,11 +52,22 @@ models:

4. Run `dbt run --select elementary`

Check out the [full documentation](https://docs.elementary-data.com/) for generating the UI, alerts and adding anomaly detection tests.
Check out the [full documentation](https://docs.elementary-data.com/).

## Get more out of Elementary dbt package

Elementary has three offerings: this dbt package, Elementary Community (OSS), and Elementary Cloud (cloud service).

- **dbt package**
- For basic data monitoring and dbt artifacts collection, Elementary offers a dbt package. The package adds logging, artifacts uploading, and Elementary tests (anomaly detection and schema) to your project.
- **Elementary Community**
- An open-source CLI tool you can deploy and orchestrate to send alerts and self-host the Elementary report. Best for data and analytics engineers who require basic observability capabilities, or for evaluating features without vendor approval. Our community can provide great support on [Slack](https://www.elementary-data.com/community) if needed.
- **Elementary Cloud**
- Ideal for teams monitoring mission-critical data pipelines that require guaranteed uptime and reliability, short time-to-value, advanced features, collaboration, and professional support. The solution is secure by design and requires no access to your data from the cloud. To learn more, [book a demo](https://cal.com/maayansa/elementary-intro-github-package) or [start a trial](https://www.elementary-data.com/signup).

## Run Results and dbt artifacts

The package automatically uploads the dbt artifacts and run results to tables in your data warehouse:
The package automatically uploads dbt artifacts and run results to tables in your data warehouse:

Run results tables:

@@ -65,7 +88,7 @@ Metadata tables:

Here you can find [additional details about the tables](https://docs.elementary-data.com/guides/modules-overview/dbt-package).
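As an illustration of consuming these tables downstream (the table and column names below are assumptions for the sketch, not the documented Elementary schema — see the docs link above), a query for failed tests might look like:

```python
import sqlite3

# Hypothetical mirror of a run-results table; the real columns in the
# Elementary schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE elementary_test_results (test_name TEXT, status TEXT, detected_at TEXT)"
)
conn.executemany(
    "INSERT INTO elementary_test_results VALUES (?, ?, ?)",
    [
        ("volume_anomalies", "pass", "2024-04-08"),
        ("volume_anomalies", "fail", "2024-04-09"),
    ],
)
# Pull only the failing runs, newest data point included.
failures = conn.execute(
    "SELECT test_name, detected_at FROM elementary_test_results WHERE status = 'fail'"
).fetchall()
print(failures)  # [('volume_anomalies', '2024-04-09')]
```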

## Data anomalies detection as dbt tests
## Data anomaly detection as dbt tests

Elementary dbt tests collect metrics and metadata over time, such as freshness, volume, schema changes, distribution, cardinality, etc.
Executed like any other dbt test, the Elementary tests alert on anomalies and outliers.
@@ -85,29 +108,17 @@ models:
- elementary.all_columns_anomalies
```
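The anomaly tests above score the latest metric value against a training window. As a rough illustration only (the package's actual detection logic lives in its SQL macros and is more involved), the core idea resembles a z-score check, with `anomaly_sensitivity` playing the role of the threshold:

```python
from statistics import mean, stdev

def is_anomalous(training_values, latest_value, sensitivity=3.0):
    """Flag latest_value if it deviates more than `sensitivity`
    standard deviations from the training window."""
    mu = mean(training_values)
    sigma = stdev(training_values)
    if sigma == 0:
        # Constant history: any change at all is an anomaly.
        return latest_value != mu
    z_score = (latest_value - mu) / sigma
    return abs(z_score) > sensitivity

# A stable daily row count, then a sudden drop:
history = [1000, 1020, 980, 1010, 990, 1005]
print(is_anomalous(history, 995))  # ordinary value -> False
print(is_anomalous(history, 200))  # sharp drop -> True
```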

## Data observability report

<kbd align="center">
<a href="https://storage.googleapis.com/elementary_static/elementary_demo.html"><img align="center" style="max-width:300px;" src="https://raw.githubusercontent.com/elementary-data/elementary/master/static/report_ui.gif"> </a>
</kbd>

## Slack alerts

<img alt="UI" src="https://raw.githubusercontent.com/elementary-data/elementary/master/static/Slack_alert_elementary.png" width="600">
Read about the available [Elementary tests and configuration](https://docs.elementary-data.com/data-tests/introduction).

## How it works?
## How Elementary works

Elementary dbt package creates tables of metadata and test results in your data warehouse, as part of your dbt runs. The [CLI tool](https://github.com/elementary-data/elementary) reads the data from these tables, and is used to generate the UI and alerts.
Elementary dbt package creates tables of metadata and test results in your data warehouse, as part of your dbt runs.

<img align="center" style="max-width:300px;" src="https://raw.githubusercontent.com/elementary-data/elementary/master/static/how_elementary_works.png">
The cloud service or the CLI tool reads the data from these tables, sends alerts, and presents the results in the UI.

## Data warehouse support

- [x] **Snowflake** ![](https://raw.githubusercontent.com/elementary-data/elementary/master/static/snowflake-16.png)
- [x] **BigQuery** ![](https://raw.githubusercontent.com/elementary-data/elementary/master/static/bigquery-16.svg)
- [x] **Redshift** ![](https://raw.githubusercontent.com/elementary-data/elementary/master/static/redshift-16.png)
- [x] **Databricks SQL** ![](https://raw.githubusercontent.com/elementary-data/elementary/master/static/databricks-16.png)
- [x] **Postgres** ![](https://raw.githubusercontent.com/elementary-data/elementary/master/static/postgres-16.png)
<kbd align="center">
<a href="https://storage.googleapis.com/elementary_static/elementary_demo.html"><img align="center" style="max-width:300px;" src="https://raw.githubusercontent.com/elementary-data/elementary/master/static/report_ui.gif"> </a>
</kbd>

## Community & Support

2 changes: 1 addition & 1 deletion dbt_project.yml
@@ -1,5 +1,5 @@
name: "elementary"
version: "0.13.2"
version: "0.14.1"

require-dbt-version: [">=1.0.0", "<2.0.0"]

22 changes: 22 additions & 0 deletions integration_tests/dbt_project/macros/get_anomaly_config.sql
@@ -0,0 +1,22 @@
{% macro get_anomaly_config(model_config, config) %}
{% set mock_model = {
"alias": "mock_model",
"config": {
"elementary": model_config
}
} %}
{# trick elementary into thinking this is the running model #}
{% do context.update({
"model": {
"depends_on": {
"nodes": ["id"]
}
},
"graph": {
"nodes": {
"id": mock_model
}
}
}) %}
{% do return(elementary.get_anomalies_test_configuration(api.Relation.create("db", "schema", "mock_model"), **config)[0]) %}
{% endmacro %}
11 changes: 9 additions & 2 deletions integration_tests/tests/conftest.py
@@ -13,6 +13,7 @@

def pytest_addoption(parser):
parser.addoption("--target", action="store", default="postgres")
parser.addoption("--skip-init", action="store_true", default=False)


@pytest.fixture(scope="session")
@@ -31,8 +32,9 @@ def project_dir_copy():


@pytest.fixture(scope="session", autouse=True)
def init_tests_env(target, project_dir_copy: str):
env.init(target, project_dir_copy)
def init_tests_env(target, skip_init, project_dir_copy: str):
if not skip_init:
env.init(target, project_dir_copy)


@pytest.fixture(autouse=True)
@@ -75,6 +77,11 @@ def target(request) -> str:
return request.config.getoption("--target")


@pytest.fixture(scope="session")
def skip_init(request) -> bool:
return request.config.getoption("--skip-init")


@pytest.fixture
def test_id(request) -> str:
if request.cls:
126 changes: 126 additions & 0 deletions integration_tests/tests/test_anomaly_test_configuration.py
@@ -0,0 +1,126 @@
import json
from dataclasses import dataclass
from typing import Generic, Literal, TypeVar

from dbt_project import DbtProject
from parametrization import Parametrization

T = TypeVar("T")


@dataclass
class ParamValues(Generic[T]):
vars: T
model: T
test: T


PARAM_VALUES = {
"timestamp_column": ParamValues(
"vars.updated_at", "model.updated_at", "test.updated_at"
),
"where_expression": ParamValues(
"where = 'var'", "where = 'model'", "where = 'test'"
),
"anomaly_sensitivity": ParamValues(1, 2, 3),
"anomaly_direction": ParamValues("spike", "drop", "both"),
"min_training_set_size": ParamValues(10, 20, 30),
"time_bucket": ParamValues(
{"count": 1, "period": "day"},
{"count": 1, "period": "hour"},
{"count": 1, "period": "day"},
),
"backfill_days": ParamValues(30, 60, 90),
"seasonality": ParamValues("day_of_week", "hour_of_day", "day_of_week"),
"event_timestamp_column": ParamValues(
"vars.updated_at", "model.updated_at", "test.updated_at"
),
"ignore_small_changes": ParamValues(
{"spike_failure_percent_threshold": 10, "drop_failure_percent_threshold": 10},
{"spike_failure_percent_threshold": 20, "drop_failure_percent_threshold": 20},
{"spike_failure_percent_threshold": 30, "drop_failure_percent_threshold": 30},
),
"fail_on_zero": ParamValues(True, False, True),
"detection_delay": ParamValues(
{"count": 1, "period": "day"},
{"count": 2, "period": "day"},
{"count": 3, "period": "day"},
),
"anomaly_exclude_metrics": ParamValues(
"where = 'var'", "where = 'model'", "where = 'test'"
),
"detection_period": ParamValues(
{"count": 1, "period": "day"},
{"count": 2, "period": "day"},
{"count": 3, "period": "day"},
),
"training_period": ParamValues(
{"count": 30, "period": "day"},
{"count": 60, "period": "day"},
{"count": 90, "period": "day"},
),
}


def _get_expected_adapted_config(values_type: Literal["vars", "model", "test"]):
def get_value(key: str):
return PARAM_VALUES[key].__dict__[values_type]

days_back_multiplier = (
7 if get_value("seasonality") in ["day_of_week", "hour_of_week"] else 1
)
return {
"timestamp_column": get_value("timestamp_column"),
"where_expression": get_value("where_expression"),
"anomaly_sensitivity": get_value("anomaly_sensitivity"),
"anomaly_direction": get_value("anomaly_direction"),
"time_bucket": get_value("time_bucket"),
"days_back": get_value("training_period")["count"] * days_back_multiplier,
"backfill_days": get_value("detection_period")["count"],
"seasonality": get_value("seasonality"),
"event_timestamp_column": get_value("event_timestamp_column"),
"ignore_small_changes": get_value("ignore_small_changes"),
"fail_on_zero": get_value("fail_on_zero"),
"detection_delay": get_value("detection_delay"),
"anomaly_exclude_metrics": get_value("anomaly_exclude_metrics"),
"freshness_column": None, # Deprecated
"dimensions": None, # should only be set at the test level
}


@Parametrization.autodetect_parameters()
@Parametrization.case(
name="vars",
vars_config={key: value.vars for key, value in PARAM_VALUES.items()},
model_config={},
test_config={},
expected_config=_get_expected_adapted_config("vars"),
)
@Parametrization.case(
name="model",
vars_config={key: value.vars for key, value in PARAM_VALUES.items()},
model_config={key: value.model for key, value in PARAM_VALUES.items()},
test_config={},
expected_config=_get_expected_adapted_config("model"),
)
@Parametrization.case(
name="test",
vars_config={key: value.vars for key, value in PARAM_VALUES.items()},
model_config={key: value.model for key, value in PARAM_VALUES.items()},
test_config={key: value.test for key, value in PARAM_VALUES.items()},
expected_config=_get_expected_adapted_config("test"),
)
def test_anomaly_test_configuration(
dbt_project: DbtProject,
vars_config: dict,
model_config: dict,
test_config: dict,
expected_config: dict,
):
dbt_project.dbt_runner.vars.update(vars_config)
result = dbt_project.dbt_runner.run_operation(
"elementary_tests.get_anomaly_config",
macro_args={"model_config": model_config, "config": test_config},
)
adapted_config = json.loads(result[0])
assert adapted_config == expected_config
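The three cases above assert a precedence order: test-level config overrides model-level config, which overrides vars. A minimal sketch of that merge logic (an illustration only, not the `get_anomalies_test_configuration` implementation):

```python
def resolve_config(vars_config, model_config, test_config):
    """Merge configuration levels; later levels win: vars < model < test."""
    resolved = {}
    for level in (vars_config, model_config, test_config):
        # Only explicitly set (non-None) values override lower levels.
        resolved.update({k: v for k, v in level.items() if v is not None})
    return resolved

print(resolve_config(
    {"anomaly_sensitivity": 1, "backfill_days": 30},  # vars
    {"anomaly_sensitivity": 2},                        # model
    {"anomaly_sensitivity": 3},                        # test
))  # {'anomaly_sensitivity': 3, 'backfill_days': 30}
```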
File renamed without changes.
64 changes: 64 additions & 0 deletions integration_tests/tests/test_dbt_artifacts/test_columns.py
@@ -0,0 +1,64 @@
import json
from typing import List, Optional

from dbt_project import DbtProject
from parametrization import Parametrization

TABLE_NODE = {
"columns": {
"with_description": {
"name": "with_description",
"description": "This column has a description",
},
"without_description": {
"name": "without_description",
},
"with_empty_description": {
"name": "with_empty_description",
"description": "",
},
"with_null_description": {
"name": "with_null_description",
"description": None,
},
}
}


@Parametrization.autodetect_parameters()
@Parametrization.case(
name="default",
only_with_description=None,
expected_columns=["with_description"],
)
@Parametrization.case(
name="only_with_description",
only_with_description=True,
expected_columns=["with_description"],
)
@Parametrization.case(
name="all",
only_with_description=False,
expected_columns=[
"with_description",
"without_description",
"with_empty_description",
"with_null_description",
],
)
def test_flatten_table_columns(
dbt_project: DbtProject,
only_with_description: Optional[bool],
expected_columns: List[str],
) -> None:
if only_with_description is not None:
dbt_project.dbt_runner.vars[
"upload_only_columns_with_descriptions"
] = only_with_description
flattened_columns = json.loads(
dbt_project.dbt_runner.run_operation(
"elementary.flatten_table_columns", macro_args={"table_node": TABLE_NODE}
)[0]
)
flattened_column_names = [column["name"] for column in flattened_columns]
assert flattened_column_names == expected_columns
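The behavior asserted above — by default only columns with a non-empty description are uploaded — can be mirrored in plain Python (a hypothetical sketch; the real `flatten_table_columns` is a Jinja macro in the package):

```python
TABLE_NODE = {
    "columns": {
        "with_description": {"name": "with_description", "description": "Has one"},
        "without_description": {"name": "without_description"},
        "with_empty_description": {"name": "with_empty_description", "description": ""},
    }
}

def flatten_table_columns(table_node, only_with_description=True):
    """Flatten a table node's columns, keeping only columns whose
    description is present and non-empty when only_with_description
    is set (the assumed default)."""
    flattened = []
    for column in table_node["columns"].values():
        if only_with_description and not column.get("description"):
            continue  # skips missing, empty, and None descriptions alike
        flattened.append(
            {"name": column["name"], "description": column.get("description")}
        )
    return flattened

print([c["name"] for c in flatten_table_columns(TABLE_NODE)])
# ['with_description']
```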
