Pass timestamp_column as a test param #72

Closed
oravi opened this issue May 7, 2022 · 1 comment
Labels
Enhancement New feature or request Good First Issue 🥇 Good for newcomers

Comments

@oravi
Contributor

oravi commented May 7, 2022

Task Overview

  • Currently, timestamp_column is the only setting that must be configured globally in the model config section (usually in properties.yml, under elementary in the config tag).
  • Passing timestamp_column as a test param will make it possible to run multiple tests with different timestamp columns. For example, one test could use an updated_at column, which represents the time the row was updated, while another uses event_time, which represents the time the event was sent.
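
As an illustration of the intended usage, the fragment below sketches two tests on one model, each with its own timestamp column. The model name events and the test name elementary.table_anomalies (inferred from the file name test_table_anomalies.sql) are assumptions, and any other arguments the test may require are left out. Only timestamp_column, updated_at and event_time come from this issue.

```yaml
version: 2

models:
  - name: events  # hypothetical model name
    tests:
      # Assumed test name; other arguments the test may need are omitted.
      - elementary.table_anomalies:
          timestamp_column: updated_at  # time the row was updated
      - elementary.table_anomalies:
          timestamp_column: event_time  # time the event was sent
```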

Design

  • The test macros are implemented in three main files: test_table_anomalies.sql, test_column_anomalies.sql and test_all_columns_anomalies.sql. (Note that these files currently contain some duplicated code, which we will probably fix in the future.)

  • Each of these test macros should receive a new parameter called timestamp_column, defined last in the parameter list, with a default value of none.

  • Each test currently contains two lines of code that extract the timestamp_column from the global model config:
    {%- set table_config = elementary.get_table_config_from_graph(model) %}
    {%- set timestamp_column = elementary.insensitive_get_dict_value(table_config, 'timestamp_column') %}

  • The macro 'get_table_config_from_graph' returns the timestamp_column and its normalized data type (called 'timestamp_column_data_type')

  • The following code in the macro 'get_table_config_from_graph', which finds the timestamp column's data type, should be extracted to a macro called find_normalized_data_type_for_column:
    {% set columns_from_relation = adapter.get_columns_in_relation(model_relation) %}
    {% if columns_from_relation and columns_from_relation is iterable %}
        {% for column_obj in columns_from_relation %}
            {% if column_obj.column | lower == timestamp_column | lower %}
                {% set timestamp_column_data_type = elementary.normalize_data_type(column_obj.dtype) %}

  • Then, in the test itself, if the new timestamp_column param is not none, use the extracted macro to find the column's normalized data type, and pass this timestamp_column and timestamp_column_data_type to the relevant functions (get_is_column_timestamp, column_monitoring_query, table_monitoring_query).

  • If timestamp_column is none, use the global timestamp column, as it is implemented today.
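
The design above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation that was merged. The macro name find_normalized_data_type_for_column comes from this issue, but its (model_relation, column_name) signature is assumed. resolve_timestamp_column is a hypothetical helper introduced only to show the fallback in one place. The sketch also assumes both macros live in the elementary package and that the caller already has a model_relation.

```jinja
{# Sketch only: the signature is an assumption, the name comes from the issue. #}
{% macro find_normalized_data_type_for_column(model_relation, column_name) %}
    {% set columns_from_relation = adapter.get_columns_in_relation(model_relation) %}
    {% if columns_from_relation and columns_from_relation is iterable %}
        {% for column_obj in columns_from_relation %}
            {% if column_obj.column | lower == column_name | lower %}
                {# return() exits the macro at once, so no value has to escape the loop scope. #}
                {{ return(elementary.normalize_data_type(column_obj.dtype)) }}
            {% endif %}
        {% endfor %}
    {% endif %}
    {# Column not found on the relation. #}
    {{ return(none) }}
{% endmacro %}

{# Hypothetical helper, not part of the package: picks the test-level column or the global one. #}
{% macro resolve_timestamp_column(model, model_relation, timestamp_column=none) %}
    {% if timestamp_column %}
        {# A test-level param was passed: look up its data type on the relation. #}
        {% set resolved_column = timestamp_column %}
        {% set resolved_data_type = elementary.find_normalized_data_type_for_column(model_relation, timestamp_column) %}
    {% else %}
        {# No param: fall back to the global model config, as the tests do today. #}
        {% set table_config = elementary.get_table_config_from_graph(model) %}
        {% set resolved_column = elementary.insensitive_get_dict_value(table_config, 'timestamp_column') %}
        {% set resolved_data_type = elementary.insensitive_get_dict_value(table_config, 'timestamp_column_data_type') %}
    {% endif %}
    {{ return({'timestamp_column': resolved_column, 'timestamp_column_data_type': resolved_data_type}) }}
{% endmacro %}
```

Returning from inside the loop is a deliberate choice here: in Jinja, a variable assigned with set inside a for loop is not visible after the loop ends, so an early return is the simplest way to hand the data type back to the caller.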

@Maayan-s
Contributor

Maayan-s commented Jun 8, 2022

Added by @elongl in version 0.3.16 of the package.
Thanks @elongl 🌟

@Maayan-s Maayan-s closed this as completed Jun 8, 2022