Sync fork with original repository (#15)
* support count_true and count_false for boolean columns in BigQuery

* support count_true and count_false for boolean columns in BigQuery

* fix: change generate profile args macro name for Athena

* Add unique_combination_of_columns to common_tests_configs_mapping

* missing comma

* fix drop_failure_percent_threshold failing a non-anomalous test

* Support all anomaly vars on all configuration levels

* ELE-2470 temp tables are not being deleted

* Add empty line at the end of a file

* fix typo

* Removed default detection/training_period

* Make sure we delete temp tables last

* removed unused import

* Not collecting metrics by default.

* release 0.14.0

* fix bug when no temp tables exist

* Update macros/edr/tests/test_utils/clean_elementary_test_tables.sql

Co-authored-by: IDoneShaveIt <48473443+IDoneShaveIt@users.noreply.github.com>

* 1. add tags to all elementary monitors 2. only run tests if elementary is enabled

* rename tag

* Create clean_dbt_columns_temp_tables macro

* Add empty line at the end of a file

* clean logs

* add arg chunk_size for all insert_rows() (elementary-data#669)

* release 0.14.1

* override primary_test_model_id (elementary-data#671)

* added ignore_small_changes to freshness and event_freshness

* improvement: bigquery specific for query_table_metrics (elementary-data#674)

* improvement: bigquery specific for query_table_metrics

Using the information schema to get the row count is much more performant than doing a full table scan

* use TABLE_STORAGE and add database & schema

* add empty case

* add set

* Add index on created_at test_result_rows and remove backfill post hook

* Ele 2606 package version with caching and extra logs (elementary-data#673)

* artifacts: use cache also for model post-hook

* add performance logs to artifacts logic

* duration monitoring - bugfix - handle the case where the duration stack is not initialized

* Change the aggregate of failed_row_count_calc to count(*)

* Readme updates (elementary-data#684)

* changes to readme

* changes to readme

* changes

* changes

* image url

* image url

* changes

* formatting

* formatting

* changes

* link

* pre commit

* improve flattening performance for dbt_columns (elementary-data#681)

* improve flattening performance for dbt_columns

* removed unused const

* black

* Add get_requires_permissions and validate_required_permissions macros

* Improved messages

* Fixed default__get_required_permissions + add target.database to get_relevant_databases

---------

Co-authored-by: suelai <suela.isaj@gmail.com>
Co-authored-by: Roman Korsun <romakorsun2000@gmail.com>
Co-authored-by: Yasuhisa Yoshida <syou6162@gmail.com>
Co-authored-by: Ofek Weiss <55920061+ofek1weiss@users.noreply.github.com>
Co-authored-by: Ofek Weiss <ofek1weiss@gmail.com>
Co-authored-by: IDoneShaveIt <idanshavit31@gmail.com>
Co-authored-by: IDoneShaveIt <48473443+IDoneShaveIt@users.noreply.github.com>
Co-authored-by: Elon Gliksberg <elongliks@gmail.com>
Co-authored-by: GitHub Actions <noreply@github.com>
Co-authored-by: Ella Katz <ella@elementary-data.com>
Co-authored-by: J.C <zhang@sansan.com>
Co-authored-by: Noy Arie <noyarie1992@gmail.com>
Co-authored-by: Chris Dong <86695140+dongchris@users.noreply.github.com>
Co-authored-by: noakurman <kurman.noa@gmail.com>
Co-authored-by: Itamar Hartstein <haritamar@gmail.com>
Co-authored-by: Maayan Salom <maayansalom@gmail.com>
17 people committed Apr 9, 2024
1 parent fad76af commit fb111d4
Showing 57 changed files with 763 additions and 139 deletions.
71 changes: 41 additions & 30 deletions README.md
@@ -2,27 +2,39 @@
<img alt="Logo" src="https://raw.githubusercontent.com/elementary-data/elementary/master/static/github_banner.png" width="1000"/>
</p>

<h2 align="center">
dbt native data observability for analytics & data engineers
</h2>
<h4 align="center">
Monitor your data quality, operation and performance directly from your dbt project.
</h4>
# [dbt native data observability](https://docs.elementary-data.com/introduction)

<p align="center">
<a href="https://join.slack.com/t/elementary-community/shared_invite/zt-uehfrq2f-zXeVTtXrjYRbdE_V6xq4Rg"><img src="https://img.shields.io/badge/join-Slack-ff69b4"/></a>
<a href="https://docs.elementary-data.com/quickstart"><img src="https://img.shields.io/badge/docs-quickstart-orange"/></a>
<img alt="License" src="https://img.shields.io/badge/license-Apache--2.0-ff69b4"/>
<img alt="Downloads" src="https://static.pepy.tech/personalized-badge/elementary-lineage?period=total&units=international_system&left_color=grey&right_color=orange&left_text=Downloads"/>
</p>

## What is Elementary?

This dbt package is part of Elementary, the dbt-native data observability solution for data and analytics engineers.
Set up in minutes, gain immediate visibility, detect data issues, send actionable alerts, and understand impact and root cause.
Available as self-hosted or Cloud service with premium features.

#### Table of Contents

- [Quick start - dbt package](#quick-start---dbt-package)
- [Get more out of Elementary](#get-more-out-of-elementary-dbt-package)
- [Run results and dbt artifacts](#run-results-and-dbt-artifacts)
- [Data anomaly detection as dbt tests](#data-anomaly-detection-as-dbt-tests)
- [How Elementary works](#how-elementary-works)
- [Community & Support](#community--support)
- [Contributions](#contributions)

## Quick start
## Quick start - dbt package

1. Add to your `packages.yml`:

```yml packages.yml
packages:
- package: elementary-data/elementary
version: 0.13.2
version: 0.14.1
## Docs: https://docs.elementary-data.com
```

@@ -40,11 +52,22 @@ models:

4. Run `dbt run --select elementary`

Check out the [full documentation](https://docs.elementary-data.com/) for generating the UI, alerts and adding anomaly detection tests.
Check out the [full documentation](https://docs.elementary-data.com/).

## Get more out of Elementary dbt package

Elementary has three offerings: this dbt package, Elementary Community (OSS), and Elementary Cloud (cloud service).

- **dbt package**
- For basic data monitoring and dbt artifacts collection, Elementary offers a dbt package. The package adds logging, artifacts uploading, and Elementary tests (anomaly detection and schema) to your project.
- **Elementary Community**
- An open-source CLI tool you can deploy and orchestrate to send alerts and self-host the Elementary report. Best for data and analytics engineers who require basic observability capabilities, or for evaluating features without vendor approval. Our community can provide great support on [Slack](https://www.elementary-data.com/community) if needed.
- **Elementary Cloud**
- Ideal for teams monitoring mission-critical data pipelines that require guaranteed uptime and reliability, short time-to-value, advanced features, collaboration, and professional support. The solution is secure by design and requires no access to your data from the cloud. To learn more, [book a demo](https://cal.com/maayansa/elementary-intro-github-package) or [start a trial](https://www.elementary-data.com/signup).

## Run Results and dbt artifacts

The package automatically uploads the dbt artifacts and run results to tables in your data warehouse:
The package automatically uploads dbt artifacts and run results to tables in your data warehouse:

Run results tables:

@@ -65,7 +88,7 @@ Metadata tables:

Here you can find [additional details about the tables](https://docs.elementary-data.com/guides/modules-overview/dbt-package).
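As an illustration of consuming these tables downstream (the table and column names below are assumptions for the sketch, not the documented Elementary schema — see the docs link above), a query for failed tests might look like:

```python
import sqlite3

# Hypothetical mirror of a run-results table; the real columns in the
# Elementary schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE elementary_test_results (test_name TEXT, status TEXT, detected_at TEXT)"
)
conn.executemany(
    "INSERT INTO elementary_test_results VALUES (?, ?, ?)",
    [
        ("volume_anomalies", "pass", "2024-04-08"),
        ("volume_anomalies", "fail", "2024-04-09"),
    ],
)
# Pull only the failing runs, newest data point included.
failures = conn.execute(
    "SELECT test_name, detected_at FROM elementary_test_results WHERE status = 'fail'"
).fetchall()
print(failures)  # [('volume_anomalies', '2024-04-09')]
```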

## Data anomalies detection as dbt tests
## Data anomaly detection as dbt tests

Elementary dbt tests collect metrics and metadata over time, such as freshness, volume, schema changes, distribution, cardinality, etc.
Executed like any other dbt test, the Elementary tests alert on anomalies and outliers.
@@ -85,29 +108,17 @@ models:
- elementary.all_columns_anomalies
```
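The anomaly tests above score the latest metric value against a training window. As a rough illustration only (the package's actual detection logic lives in its SQL macros and is more involved), the core idea resembles a z-score check, with `anomaly_sensitivity` playing the role of the threshold:

```python
from statistics import mean, stdev

def is_anomalous(training_values, latest_value, sensitivity=3.0):
    """Flag latest_value if it deviates more than `sensitivity`
    standard deviations from the training window."""
    mu = mean(training_values)
    sigma = stdev(training_values)
    if sigma == 0:
        # Constant history: any change at all is an anomaly.
        return latest_value != mu
    z_score = (latest_value - mu) / sigma
    return abs(z_score) > sensitivity

# A stable daily row count, then a sudden drop:
history = [1000, 1020, 980, 1010, 990, 1005]
print(is_anomalous(history, 995))  # ordinary value -> False
print(is_anomalous(history, 200))  # sharp drop -> True
```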

## Data observability report

<kbd align="center">
<a href="https://storage.googleapis.com/elementary_static/elementary_demo.html"><img align="center" style="max-width:300px;" src="https://raw.githubusercontent.com/elementary-data/elementary/master/static/report_ui.gif"> </a>
</kbd>

## Slack alerts

<img alt="UI" src="https://raw.githubusercontent.com/elementary-data/elementary/master/static/Slack_alert_elementary.png" width="600">
Read about the available [Elementary tests and configuration](https://docs.elementary-data.com/data-tests/introduction).

## How it works?
## How Elementary works

Elementary dbt package creates tables of metadata and test results in your data warehouse, as part of your dbt runs. The [CLI tool](https://github.com/elementary-data/elementary) reads the data from these tables, and is used to generate the UI and alerts.
Elementary dbt package creates tables of metadata and test results in your data warehouse, as part of your dbt runs.

<img align="center" style="max-width:300px;" src="https://raw.githubusercontent.com/elementary-data/elementary/master/static/how_elementary_works.png">
The cloud service or the CLI tool reads the data from these tables, sends alerts, and presents the results in the UI.

## Data warehouse support

- [x] **Snowflake** ![](https://raw.githubusercontent.com/elementary-data/elementary/master/static/snowflake-16.png)
- [x] **BigQuery** ![](https://raw.githubusercontent.com/elementary-data/elementary/master/static/bigquery-16.svg)
- [x] **Redshift** ![](https://raw.githubusercontent.com/elementary-data/elementary/master/static/redshift-16.png)
- [x] **Databricks SQL** ![](https://raw.githubusercontent.com/elementary-data/elementary/master/static/databricks-16.png)
- [x] **Postgres** ![](https://raw.githubusercontent.com/elementary-data/elementary/master/static/postgres-16.png)
<kbd align="center">
<a href="https://storage.googleapis.com/elementary_static/elementary_demo.html"><img align="center" style="max-width:300px;" src="https://raw.githubusercontent.com/elementary-data/elementary/master/static/report_ui.gif"> </a>
</kbd>

## Community & Support

2 changes: 1 addition & 1 deletion dbt_project.yml
@@ -1,5 +1,5 @@
name: "elementary"
version: "0.13.2"
version: "0.14.1"

require-dbt-version: [">=1.0.0", "<2.0.0"]

22 changes: 22 additions & 0 deletions integration_tests/dbt_project/macros/get_anomaly_config.sql
@@ -0,0 +1,22 @@
{% macro get_anomaly_config(model_config, config) %}
{% set mock_model = {
"alias": "mock_model",
"config": {
"elementary": model_config
}
} %}
{# trick elementary into thinking this is the running model #}
{% do context.update({
"model": {
"depends_on": {
"nodes": ["id"]
}
},
"graph": {
"nodes": {
"id": mock_model
}
}
}) %}
{% do return(elementary.get_anomalies_test_configuration(api.Relation.create("db", "schema", "mock_model"), **config)[0]) %}
{% endmacro %}
11 changes: 9 additions & 2 deletions integration_tests/tests/conftest.py
@@ -13,6 +13,7 @@

def pytest_addoption(parser):
parser.addoption("--target", action="store", default="postgres")
parser.addoption("--skip-init", action="store_true", default=False)


@pytest.fixture(scope="session")
@@ -31,8 +32,9 @@ def project_dir_copy():


@pytest.fixture(scope="session", autouse=True)
def init_tests_env(target, project_dir_copy: str):
env.init(target, project_dir_copy)
def init_tests_env(target, skip_init, project_dir_copy: str):
if not skip_init:
env.init(target, project_dir_copy)


@pytest.fixture(autouse=True)
@@ -75,6 +77,11 @@ def target(request) -> str:
return request.config.getoption("--target")


@pytest.fixture(scope="session")
def skip_init(request) -> bool:
return request.config.getoption("--skip-init")


@pytest.fixture
def test_id(request) -> str:
if request.cls:
126 changes: 126 additions & 0 deletions integration_tests/tests/test_anomaly_test_configuration.py
@@ -0,0 +1,126 @@
import json
from dataclasses import dataclass
from typing import Generic, Literal, TypeVar

from dbt_project import DbtProject
from parametrization import Parametrization

T = TypeVar("T")


@dataclass
class ParamValues(Generic[T]):
vars: T
model: T
test: T


PARAM_VALUES = {
"timestamp_column": ParamValues(
"vars.updated_at", "model.updated_at", "test.updated_at"
),
"where_expression": ParamValues(
"where = 'var'", "where = 'model'", "where = 'test'"
),
"anomaly_sensitivity": ParamValues(1, 2, 3),
"anomaly_direction": ParamValues("spike", "drop", "both"),
"min_training_set_size": ParamValues(10, 20, 30),
"time_bucket": ParamValues(
{"count": 1, "period": "day"},
{"count": 1, "period": "hour"},
{"count": 1, "period": "day"},
),
"backfill_days": ParamValues(30, 60, 90),
"seasonality": ParamValues("day_of_week", "hour_of_day", "day_of_week"),
"event_timestamp_column": ParamValues(
"vars.updated_at", "model.updated_at", "test.updated_at"
),
"ignore_small_changes": ParamValues(
{"spike_failure_percent_threshold": 10, "drop_failure_percent_threshold": 10},
{"spike_failure_percent_threshold": 20, "drop_failure_percent_threshold": 20},
{"spike_failure_percent_threshold": 30, "drop_failure_percent_threshold": 30},
),
"fail_on_zero": ParamValues(True, False, True),
"detection_delay": ParamValues(
{"count": 1, "period": "day"},
{"count": 2, "period": "day"},
{"count": 3, "period": "day"},
),
"anomaly_exclude_metrics": ParamValues(
"where = 'var'", "where = 'model'", "where = 'test'"
),
"detection_period": ParamValues(
{"count": 1, "period": "day"},
{"count": 2, "period": "day"},
{"count": 3, "period": "day"},
),
"training_period": ParamValues(
{"count": 30, "period": "day"},
{"count": 60, "period": "day"},
{"count": 90, "period": "day"},
),
}


def _get_expected_adapted_config(values_type: Literal["vars", "model", "test"]):
def get_value(key: str):
return PARAM_VALUES[key].__dict__[values_type]

days_back_multiplier = (
7 if get_value("seasonality") in ["day_of_week", "hour_of_week"] else 1
)
return {
"timestamp_column": get_value("timestamp_column"),
"where_expression": get_value("where_expression"),
"anomaly_sensitivity": get_value("anomaly_sensitivity"),
"anomaly_direction": get_value("anomaly_direction"),
"time_bucket": get_value("time_bucket"),
"days_back": get_value("training_period")["count"] * days_back_multiplier,
"backfill_days": get_value("detection_period")["count"],
"seasonality": get_value("seasonality"),
"event_timestamp_column": get_value("event_timestamp_column"),
"ignore_small_changes": get_value("ignore_small_changes"),
"fail_on_zero": get_value("fail_on_zero"),
"detection_delay": get_value("detection_delay"),
"anomaly_exclude_metrics": get_value("anomaly_exclude_metrics"),
"freshness_column": None, # Deprecated
"dimensions": None, # should only be set at the test level
}


@Parametrization.autodetect_parameters()
@Parametrization.case(
name="vars",
vars_config={key: value.vars for key, value in PARAM_VALUES.items()},
model_config={},
test_config={},
expected_config=_get_expected_adapted_config("vars"),
)
@Parametrization.case(
name="model",
vars_config={key: value.vars for key, value in PARAM_VALUES.items()},
model_config={key: value.model for key, value in PARAM_VALUES.items()},
test_config={},
expected_config=_get_expected_adapted_config("model"),
)
@Parametrization.case(
name="test",
vars_config={key: value.vars for key, value in PARAM_VALUES.items()},
model_config={key: value.model for key, value in PARAM_VALUES.items()},
test_config={key: value.test for key, value in PARAM_VALUES.items()},
expected_config=_get_expected_adapted_config("test"),
)
def test_anomaly_test_configuration(
dbt_project: DbtProject,
vars_config: dict,
model_config: dict,
test_config: dict,
expected_config: dict,
):
dbt_project.dbt_runner.vars.update(vars_config)
result = dbt_project.dbt_runner.run_operation(
"elementary_tests.get_anomaly_config",
macro_args={"model_config": model_config, "config": test_config},
)
adapted_config = json.loads(result[0])
assert adapted_config == expected_config
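The three cases above assert a precedence order: test-level config overrides model-level config, which overrides vars. A minimal sketch of that merge logic (an illustration only, not the `get_anomalies_test_configuration` implementation):

```python
def resolve_config(vars_config, model_config, test_config):
    """Merge configuration levels; later levels win: vars < model < test."""
    resolved = {}
    for level in (vars_config, model_config, test_config):
        # Only explicitly set (non-None) values override lower levels.
        resolved.update({k: v for k, v in level.items() if v is not None})
    return resolved

print(resolve_config(
    {"anomaly_sensitivity": 1, "backfill_days": 30},  # vars
    {"anomaly_sensitivity": 2},                        # model
    {"anomaly_sensitivity": 3},                        # test
))  # {'anomaly_sensitivity': 3, 'backfill_days': 30}
```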
File renamed without changes.
64 changes: 64 additions & 0 deletions integration_tests/tests/test_dbt_artifacts/test_columns.py
@@ -0,0 +1,64 @@
import json
from typing import List, Optional

from dbt_project import DbtProject
from parametrization import Parametrization

TABLE_NODE = {
"columns": {
"with_description": {
"name": "with_description",
"description": "This column has a description",
},
"without_description": {
"name": "without_description",
},
"with_empty_description": {
"name": "with_empty_description",
"description": "",
},
"with_null_description": {
"name": "with_null_description",
"description": None,
},
}
}


@Parametrization.autodetect_parameters()
@Parametrization.case(
name="default",
only_with_description=None,
expected_columns=["with_description"],
)
@Parametrization.case(
name="only_with_description",
only_with_description=True,
expected_columns=["with_description"],
)
@Parametrization.case(
name="all",
only_with_description=False,
expected_columns=[
"with_description",
"without_description",
"with_empty_description",
"with_null_description",
],
)
def test_flatten_table_columns(
dbt_project: DbtProject,
only_with_description: Optional[bool],
expected_columns: List[str],
) -> None:
if only_with_description is not None:
dbt_project.dbt_runner.vars[
"upload_only_columns_with_descriptions"
] = only_with_description
flattened_columns = json.loads(
dbt_project.dbt_runner.run_operation(
"elementary.flatten_table_columns", macro_args={"table_node": TABLE_NODE}
)[0]
)
flattened_column_names = [column["name"] for column in flattened_columns]
assert flattened_column_names == expected_columns
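The behavior asserted above — by default only columns with a non-empty description are uploaded — can be mirrored in plain Python (a hypothetical sketch; the real `flatten_table_columns` is a Jinja macro in the package):

```python
TABLE_NODE = {
    "columns": {
        "with_description": {"name": "with_description", "description": "Has one"},
        "without_description": {"name": "without_description"},
        "with_empty_description": {"name": "with_empty_description", "description": ""},
    }
}

def flatten_table_columns(table_node, only_with_description=True):
    """Flatten a table node's columns, keeping only columns whose
    description is present and non-empty when only_with_description
    is set (the assumed default)."""
    flattened = []
    for column in table_node["columns"].values():
        if only_with_description and not column.get("description"):
            continue  # skips missing, empty, and None descriptions alike
        flattened.append(
            {"name": column["name"], "description": column.get("description")}
        )
    return flattened

print([c["name"] for c in flatten_table_columns(TABLE_NODE)])
# ['with_description']
```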
