Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add compare_relation_columns macro #5

Merged
merged 1 commit into from
Oct 2, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
40 changes: 39 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -202,8 +202,46 @@ Comparing column "status"
| 🤷: missing from b | 26 | 0.06 |
| 🙅: ‍values do not... | 4,070 | 9.73 |

## compare_relation_columns ([source](macros/compare_relation_columns.sql))
This macro will return a query, that, when executed, compares the ordinal_position
and data_types of columns in two [Relations](https://docs.getdbt.com/docs/api-variable#section-relation).

| column_name | a_ordinal_position | b_ordinal_position | a_data_type | b_data_type |
|-------------|--------------------|--------------------|-------------------|-------------------|
| order_id | 1 | 1 | integer | integer |
| customer_id | 2 | 2 | integer | integer |
| order_date | 3 | 3 | timestamp | date |
| status | 4 | 5 | character varying | character varying |
| amount | 5 | 4 | bigint | bigint |


This is especially useful in two situations:
1. Comparing a new version of a relation with an old one, to make sure that the
structure is the same
2. Helping figure out why a `union` of two relations won't work (often because
the data types are different)

For example, in the above result set, we can see that `status` and `amount` have
switched order. Further, `order_date` is a timestamp in our "a" relation, whereas
it is a date in our "b" relation.

```sql
{#- in dbt Develop -#}

{% set old_etl_relation=adapter.get_relation(
database=target.database,
schema="old_etl_schema",
identifier="fct_orders"
) -%}

{% set dbt_relation=ref('fct_orders') %}

{{ audit_helper.compare_relation_columns(
a_relation=old_etl_relation,
b_relation=dbt_relation
) }}

```

# To-do:
* Macro to check if two models have the same structure
* Macro to check if two schemas contain the same relations
15 changes: 15 additions & 0 deletions integration_tests/analysis/compare_relation_columns_smoke_test.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
{% set a_relation=ref('data_compare_relations__a_relation') %}

{% set compare_relation_columns_sql = audit_helper.compare_relation_columns(
a_relation,
a_relation
) %}

{{ compare_relation_columns_sql }}

{% if execute %}

{% set results = run_query(compare_relation_columns_sql) %}
{% do results.print_table() %}

{% endif %}
2 changes: 1 addition & 1 deletion integration_tests/dbt_project.yml
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@

name: 'adit_helper_integration_tests'
name: 'audit_helper_integration_tests'
version: '1.0'

profile: 'audit_helper_integration_tests'
Expand Down
160 changes: 160 additions & 0 deletions macros/compare_relation_columns.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,160 @@
{% macro compare_relation_columns(a_relation, b_relation) %}
with a_cols as (
{{ audit_helper.get_columns_in_relation_sql(a_relation) }}
),

b_cols as (
{{ audit_helper.get_columns_in_relation_sql(b_relation) }}
)

select
column_name,
a_cols.ordinal_position as a_ordinal_position,
b_cols.ordinal_position as b_ordinal_position,
a_cols.data_type as a_data_type,
b_cols.data_type as b_data_type
from a_cols
full outer join b_cols using (column_name)
order by a_ordinal_position, b_ordinal_position

{% endmacro %}


{% macro get_columns_in_relation_sql(relation) %}

{{ adapter_macro('audit_helper.get_columns_in_relation_sql', relation) }}

{% endmacro %}

{% macro redshift__get_columns_in_relation_sql(relation) %}
{#-
See https://github.com/fishtown-analytics/dbt/blob/23484b18b71010f701b5312f920f04529ceaa6b2/plugins/redshift/dbt/include/redshift/macros/adapters.sql#L71
Edited to include ordinal_position
-#}
with bound_views as (
select
ordinal_position,
table_schema,
column_name,
data_type,
character_maximum_length,
numeric_precision,
numeric_scale

from information_schema."columns"
where table_name = '{{ relation.identifier }}'
),

unbound_views as (
select
ordinal_position,
view_schema,
col_name,
case
when col_type ilike 'character varying%' then
'character varying'
when col_type ilike 'numeric%' then 'numeric'
else col_type
end as col_type,
case
when col_type like 'character%'
then nullif(REGEXP_SUBSTR(col_type, '[0-9]+'), '')::int
else null
end as character_maximum_length,
case
when col_type like 'numeric%'
then nullif(
SPLIT_PART(REGEXP_SUBSTR(col_type, '[0-9,]+'), ',', 1),
'')::int
else null
end as numeric_precision,
case
when col_type like 'numeric%'
then nullif(
SPLIT_PART(REGEXP_SUBSTR(col_type, '[0-9,]+'), ',', 2),
'')::int
else null
end as numeric_scale

from pg_get_late_binding_view_cols()
cols(view_schema name, view_name name, col_name name,
col_type varchar, ordinal_position int)
where view_name = '{{ relation.identifier }}'
),

unioned as (
select * from bound_views
union all
select * from unbound_views
)

select
*

from unioned
{% if relation.schema %}
where table_schema = '{{ relation.schema }}'
{% endif %}
order by ordinal_position

{% endmacro %}

{% macro snowflake__get_columns_in_relation_sql(relation) %}
{#-
From: https://github.com/fishtown-analytics/dbt/blob/dev/louisa-may-alcott/plugins/snowflake/dbt/include/snowflake/macros/adapters.sql#L48
Edited to include ordinal_position
-#}
select
ordinal_position,
column_name,
data_type,
character_maximum_length,
numeric_precision,
numeric_scale

from
{{ relation.information_schema('columns') }}

where table_name ilike '{{ relation.identifier }}'
{% if relation.schema %}
and table_schema ilike '{{ relation.schema }}'
{% endif %}
{% if relation.database %}
and table_catalog ilike '{{ relation.database }}'
{% endif %}
order by ordinal_position
{% endmacro %}

{% macro postgres__get_columns_in_relation_sql(relation) %}
{#-
From: https://github.com/fishtown-analytics/dbt/blob/23484b18b71010f701b5312f920f04529ceaa6b2/plugins/postgres/dbt/include/postgres/macros/adapters.sql#L32
Edited to include ordinal_position
-#}
select
ordinal_position,
column_name,
data_type,
character_maximum_length,
numeric_precision,
numeric_scale

from {{ relation.information_schema('columns') }}
where table_name = '{{ relation.identifier }}'
{% if relation.schema %}
and table_schema = '{{ relation.schema }}'
{% endif %}
order by ordinal_position
{% endmacro %}


{% macro bigquery__get_columns_in_relation_sql(relation) %}

select
ordinal_position,
column_name,
data_type

from `{{ relation.database }}`.`{{ relation.schema }}`.INFORMATION_SCHEMA.COLUMNS
where table_name = '{{ relation.identifier }}'

{% endmacro %}