Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] get_relation reporting dbt found an approximate match #5

Closed
1 task done
sfc-gh-dflippo opened this issue May 6, 2022 · 16 comments · Fixed by #23
Closed
1 task done

[Bug] get_relation reporting dbt found an approximate match #5

sfc-gh-dflippo opened this issue May 6, 2022 · 16 comments · Fixed by #23
Assignees
Labels
bug Something isn't working

Comments

@sfc-gh-dflippo
Copy link

sfc-gh-dflippo commented May 6, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Hello, I am the author of the dbt_constraints package and I am attempting to add support for dbt-oracle.

I am receiving the following error when my code runs adapter.get_relation():

Encountered an error:
Compilation Error in operation dbt_constraints-on-run-end-0 (./dbt_project.yml)
  When searching for a relation, dbt found an approximate match. Instead of guessing 
  which relation to use, dbt will move on. Please delete DBT_USER.dim_customers, or rename it to be less ambiguous.
  Searched for: dbt_user.dim_customers
  Found: DBT_USER.dim_customers

I have a schema named DBT_USER and I have specified the the schema in upper case in my profile.

I believe the root cause is how dbt-oracle changes the case of objects in your catalog.sql:

          select
              lower(tables.table_catalog) as "table_database",
              lower(tables.table_schema) as "table_schema",
              lower(tables.table_name) as "table_name",
              lower(tables.table_type) as "table_type",
              all_tab_comments.comments as "table_comment",
              lower(columns.column_name) as "column_name",
              ordinal_position as "column_index",
              lower(case
                when data_type like '%CHAR%'
                then data_type || '(' || cast(char_length as varchar(10)) || ')'
                else data_type
              end) as "column_type",
              all_col_comments.comments as "column_comment",
              tables.table_schema as "table_owner"
          from tables

I work for Snowflake and our database uses exactly the same conventions as Oracle for how identifiers are treated with and without quotes. Identifiers become upper case if not quoted and case sensitive if quoted. Therefore, I think it can be helpful to compare some of your code against the dbt-Labs code for dbt-Snowflake. In the dbt-Labs implementation of dbt-Snowflake they do not adjust the case of our databases, schemas, tables, or columns in the catalog.sql

There could also be an issue with the oracle__get_columns_in_relation macro:

      select
          lower(column_name) as "name",
          lower(data_type) as "type",
          char_length as "character_maximum_length",
          numeric_precision as "numeric_precision",
          numeric_scale as "numeric_scale"
      from columns

In the snowflake__get_columns_in_relation macro dbt-labs uses DESCRIBE TABLE to retrieve columns and they do not change the case of column names or data types (upper case just like Oracle).

I reviewed all the SQL lookups dbt-Labs uses for Snowflake and I consistently see they use UPPER() = UPPER() in their WHERE clauses but in all other cases their metadata lookups preserve the case for databases, schema, tables, and columns. I recommend dbt-Oracle should do the same.

Expected Behavior

No response

Steps To Reproduce

No response

Relevant log output using --debug flag enabled

No response

Environment

- OS: MacOS
- Python: 3.8.13
- dbt: 1.0.6

What Oracle database version are you using dbt with?

Oracle XE 21c

Additional Context

No response

@sfc-gh-dflippo sfc-gh-dflippo added the bug Something isn't working label May 6, 2022
@aosingh aosingh self-assigned this May 7, 2022
@aosingh
Copy link
Member

aosingh commented May 11, 2022

Hi @danflippo
Thank you so much for such detailed explanation. This is very helpful.
I will check this issue and keep you posted.

@aosingh
Copy link
Member

aosingh commented May 25, 2022

@danflippo

I wanted to test dbt_constraints package to understand the issue better. I installed it using dbt deps command

packages.yml is as shown below

packages:
  - git: https://github.com/fishtown-analytics/dbt-utils.git
  - package: Snowflake-Labs/dbt_constraints
    version: 0.3.0

After running dbt test command, I get the below error. Do I need to modify my tests ?

 File "/Users/abhisoms/dbt-oracle/.dbt-oracle-env/lib/python3.8/site-packages/dbt/clients/jinja_static.py", line 140, in statically_parse_adapter_dispatch
    macro = db_wrapper.dispatch(
  File "/Users/abhisoms/dbt-oracle/.dbt-oracle-env/lib/python3.8/site-packages/dbt/context/providers.py", line 187, in dispatch
    raise CompilationException(msg)
dbt.exceptions.CompilationException: Compilation Error
  In dispatch: No macro named 'create_primary_key' found
      Searched for: 'dbt_adbs_test_project.oracle__create_primary_key', 'dbt_adbs_test_project.default__create_primary_key', 'dbt_constraints.oracle__create_primary_key', 'dbt_constraints.default__create_primary_key'

@sfc-gh-dflippo
Copy link
Author

For each database I support I have to add a new set of adapter-specific macros. I was testing with an offline branch but I have now published this branch to Github so you can access it. Since it isn't part of a release, you will need to pull in the module directly using the following git syntax:

  - git: "https://github.com/Snowflake-Labs/dbt_constraints.git"
    revision: dbt-oracle-support

@sfc-gh-dflippo
Copy link
Author

I did some more tracing and I found an additional change. In the Snowflake adapter's impl.py we have the following function:

    def _make_match_kwargs(self, database, schema, identifier):
        quoting = self.config.quoting
        if identifier is not None and quoting["identifier"] is False:
            identifier = identifier.upper()

        if schema is not None and quoting["schema"] is False:
            schema = schema.upper()

        if database is not None and quoting["database"] is False:
            database = database.upper()

        return filter_null_values(
            {"identifier": identifier, "schema": schema, "database": database}
        )

This overrides the default implementation in dbt-core:

    def _make_match_kwargs(self, database: str, schema: str, identifier: str) -> Dict[str, str]:
        quoting = self.config.quoting
        if identifier is not None and quoting["identifier"] is False:
            identifier = identifier.lower()

        if schema is not None and quoting["schema"] is False:
            schema = schema.lower()

        if database is not None and quoting["database"] is False:
            database = database.lower()

        return filter_null_values(
            {
                "database": database,
                "identifier": identifier,
                "schema": schema,
            }
        )

As I mentioned earlier, the databases share the same convention so a non-quoted identifier becomes uppercase and I think this is one of the necessary changes to accommodate this. The dbt-core get_relation() function calls _make_match() and that function calls _make_match_kwargs(). I still believe it is also good to remove your lower() functions in your catalog.sql. I wouldn't be surprised if you had needed those lower() functions in your SQL because you didn't have the logic to override the match function. I think once your override the function you will be able to change your catalog.sql to:

          select
              tables.table_catalog as "table_database",
              tables.table_schema as "table_schema",
              tables.table_name as "table_name",
              tables.table_type as "table_type",
              all_tab_comments.comments as "table_comment",
              columns.column_name as "column_name",
              ordinal_position as "column_index",
              case
                when data_type like '%CHAR%'
                then data_type || '(' || cast(char_length as varchar(10)) || ')'
                else data_type
              end as "column_type",
              all_col_comments.comments as "column_comment",
              tables.table_schema as "table_owner"
          from tables

I would issue a pull-request but I can only imagine the legal red tape I would have to go through to get the Oracle Contribution Agreement signed.

Also, these are the steps to recreate my test using dbt-constraint's built-in integration tests:

  • Create a profile called dbt_constraints that points to your Oracle DB. It should be able to use any empty user/schema.
  • navigate to the folder your_package/dbt_packages/dbt_constraints/integration_tests
  • dbt deps
  • dbt build

The way dbt_constraints works, there are normal tests called primary_key, unique_key, and foreign_key that work pretty must as you would expect. These tests do not have any issues in my integration tests. Then after all the models and tests are complete, dbt_constraints has a on-run-end hook that runs the create_constraints macro in create_constraints.sql. This macro loops through all the executed tests and looks for tests that it can turn into an equivalent physical constraint. It is in the beginning of this macro that the macro fails with the approximate match error. I believe the error occurs on line 200:

            {%- set table_relation = adapter.get_relation(
                database=table_models[0].database,
                schema=table_models[0].schema,
                identifier=table_models[0].name) -%}

At this point, the logic has found a test and through its depends_on.nodes property it has a reference to the table model. This is my manifest.json:
manifest.json.zip

If you search for "model.dbt_constraints_integration_tests.dim_customers", you will find it on line 465. You can find the three attributes I am using to look up the relation on line 490 ("database": "XEPDB1"), 491 ("schema": "DBT_USER") and 501 ("name": "dim_customers").

@aosingh
Copy link
Member

aosingh commented May 31, 2022

Thank you @danflippo for this detailed analysis. I am going through your recommendations and analysis and will get back to you.

@aosingh
Copy link
Member

aosingh commented Jun 2, 2022

Hi @danflippo

I am testing the dbt_constraints package from the following git branch

  - git: https://github.com/Snowflake-Labs/dbt_constraints.git
    revision: dbt-oracle-support

For connection, both user and schema are set to upper case

DBT_ORACLE_USER=DBT_TEST
DBT_ORACLE_SCHEMA=DBT_TEST

The test cases pass after which dbt_constraints logic is invoked at which point I encounter the following error

dbt.exceptions.CompilationException: Compilation Error in operation dbt_constraints-on-run-end-0 (./dbt_project.yml)
  'dbt.contracts.graph.compiled.CompiledSingularTestNode object' has no attribute 'test_metadata'
  
  > in macro create_constraints (macros/create_constraints.sql)
  > called by macro create_constraints_by_type (macros/create_constraints.sql)
  > called by operation dbt_constraints-on-run-end-0 (./dbt_project.yml)

My schema.yml looks is as shown below

models:
  - name: people
    columns:
      - name: id
        tests:
          - dbt_constraints.primary_key
          - not_null
          - unique
      - name: gender
        tests:
          - accepted_values:
              values: ['Male', 'Female']
  - name: eu_direct_sales_channels_promo_costs
    columns:
      - name: country_id
        tests:
          - relationships:
              to: ref('countries')
              field: country_id
          - dbt_constraints.foreign_key:
              pk_table_name: ref('countries')
              pk_column_name: country_id

Please let me know if I am missing something here.

@aosingh
Copy link
Member

aosingh commented Jun 2, 2022

On a side note, I tested dbt functionalities with both lowercase dbt_test and uppercase DBT_TEST schema in profile.yml. It works the same in both cases. Even if you change the casing for subsequent runs dbt does not fail.

@aosingh
Copy link
Member

aosingh commented Jun 2, 2022

This is the output log of dbt build command

18:31:55  14 of 19 START test relationships_eu_direct_sales_channels_promo_costs_country_id__country_id__ref_countries_ [RUN]
18:31:57  13 of 19 OK created view model dbt_test.people.................................. [OK in 7.25s]
18:31:57  15 of 19 START test accepted_values_people_gender__Male__Female................. [RUN]
18:31:57  16 of 19 START test dbt_constraints_primary_key_people_id....................... [RUN]
18:32:01  10 of 19 OK snapshotted dbt_test.promotion_costs_snapshot....................... [OK in 15.17s]
18:32:01  17 of 19 START test not_null_people_id.......................................... [RUN]
18:32:03  14 of 19 PASS relationships_eu_direct_sales_channels_promo_costs_country_id__country_id__ref_countries_ [PASS in 8.43s]
18:32:03  18 of 19 START test test_count_employees........................................ [RUN]
18:32:05  15 of 19 PASS accepted_values_people_gender__Male__Female....................... [PASS in 7.96s]
18:32:05  19 of 19 START test unique_people_id............................................ [RUN]
18:32:05  16 of 19 PASS dbt_constraints_primary_key_people_id............................. [PASS in 7.97s]
18:32:07  17 of 19 PASS not_null_people_id................................................ [PASS in 6.13s]
18:32:09  18 of 19 PASS test_count_employees.............................................. [PASS in 5.57s]
18:32:10  19 of 19 PASS unique_people_id.................................................. [PASS in 4.81s]
18:32:12  
18:32:12  Running 2 on-run-end hooks
18:32:12  1 of 2 START hook: dbt_adbs_test_project.on-run-end.0........................... [RUN]
18:32:12  1 of 2 OK hook: dbt_adbs_test_project.on-run-end.0.............................. [OK in 0.23s]
18:32:12  Running dbt Constraints
18:32:12  Database error while running on-run-end
18:32:14  Encountered an error:
Compilation Error in operation dbt_constraints-on-run-end-0 (./dbt_project.yml)
  'dbt.contracts.graph.compiled.CompiledSingularTestNode object' has no attribute 'test_metadata'
  
  > in macro create_constraints (macros/create_constraints.sql)
  > called by macro create_constraints_by_type (macros/create_constraints.sql)
  > called by operation dbt_constraints-on-run-end-0 (./dbt_project.yml)

@sfc-gh-dflippo
Copy link
Author

This was reported by another user as well. The issue wasn't related at all to dbt-Oracle. I needed to add one line to verify that a test is not singular test. I have released a new version of dbt Constraints with this fix and I have back-ported the latest changes (including a new test I have for singular tests) into my dbt-oracle-support branch.

You should be able to pull in my changes now with dbt deps.

@aosingh aosingh added the work-in-progress Planned in the upcoming release label Jun 8, 2022
@aosingh
Copy link
Member

aosingh commented Jun 10, 2022

@danflippo

Thank you for updating dbt_constraints. I tested the updates.

With dbt test command, it creates all constraints as defined in schema.yml

23:53:07  Running 2 on-run-end hooks
23:53:07  1 of 2 START hook: dbt_adbs_test_project.on-run-end.0........................... [RUN]
23:53:07  1 of 2 OK hook: dbt_adbs_test_project.on-run-end.0.............................. [OK in 0.17s]
23:53:07  Running dbt Constraints
23:53:08  Creating primary key: PEOPLE_ID_PK
23:53:08  Creating primary key: COUNTRIES_COUNTRY_ID_PK
23:53:13  Creating foreign key: EU_DIRECT_SALES_CHANNELS_PROMO_COSTS_COUNTRY_ID_FK referencing COUNTRIES ['country_id']
23:53:14  Finished dbt Constraints
23:53:14  2 of 2 START hook: dbt_constraints.on-run-end.0................................. [RUN]
23:53:14  2 of 2 OK hook: dbt_constraints.on-run-end.0.................................... [OK in 0.00s]
23:53:14  
23:53:14  
23:53:14  Finished running 10 tests, 3 hooks in 21.18s.
23:53:14  
23:53:14  Completed successfully
23:53:14  
23:53:14  Done. PASS=10 WARN=0 ERROR=0 SKIP=0 TOTAL=10

The following table shows all constraints created in the database

OWNER CONSTRAINT_NAME TABLE_NAME COLUMN_NAME POSITION
DBT_TEST COUNTRIES_COUNTRY_ID_PK COUNTRIES COUNTRY_ID 1
DBT_TEST EU_DIRECT_SALES_CHANNELS_PROMO_COSTS_COUNTRY_ID_FK EU_DIRECT_SALES_CHANNELS_PROMO_COSTS COUNTRY_ID 1
DBT_TEST PEOPLE_ID_PK PEOPLE ID 1

However, with dbt build command it does not create constraints and prints the following. Could you help me understand why this difference ?

23:52:29  Running 2 on-run-end hooks
23:52:29  1 of 2 START hook: dbt_adbs_test_project.on-run-end.0........................... [RUN]
23:52:30  1 of 2 OK hook: dbt_adbs_test_project.on-run-end.0.............................. [OK in 0.19s]
23:52:30  Running dbt Constraints
23:52:30  Skipping primary/unique key because a physical column name was not found on the table: countries ['country_id']
23:52:30  Skipping primary/unique key because a physical column name was not found on the table: people ['id']
23:52:30  Skipping primary/unique key because a physical column name was not found on the table: countries ['country_id']
23:52:30  Skipping primary/unique key because a physical column name was not found on the table: people ['id']
23:52:30  Skipping foreign key because a physical column was not found on the pk table: countries ['country_id']
23:52:31  Skipping foreign key because a physical column was not found on the pk table: countries ['country_id']
23:52:31  Finished dbt Constraints
23:52:31  2 of 2 START hook: dbt_constraints.on-run-end.0................................. [RUN]
23:52:31  2 of 2 OK hook: dbt_constraints.on-run-end.0.................................... [OK in 0.00s]
23:52:31  

aosingh added a commit that referenced this issue Jun 10, 2022
- Adapter method _make_match_kwargs() is updated to perform UPPER case comparison
- Adapter method quote_seed_column() is updated to not quote seed cols by default
- adapter.sql is updated to not lower case metadata
- catalog.sql is updated to not lower case metadata
- Incremental materialization needed a fix to perform a UPPER case comparison on the unique_key in ON clause
- Added a test scenarios in dbt_adbs_test_project to test integration with dbt_constraints
- Fixes Bug - #5
@aosingh
Copy link
Member

aosingh commented Jun 10, 2022

You can also install dbt-oracle with latest changes from the git branch https://github.com/oracle/dbt-oracle/tree/fix/integration_with_dbt_constraints

We plan to release this in the upcoming dbt-oracle v1.0.3

You can also review the PR #22.

@aosingh aosingh mentioned this issue Jun 10, 2022
@aosingh
Copy link
Member

aosingh commented Jun 14, 2022

Hi @danflippo

As we are planning to release dbt-oracle's next version late today or early tomorrow, I wanted to check if you had a chance to test the updated dbt-oracle and found any issue integrating with dbt_constraints ?

Thanks

@aosingh
Copy link
Member

aosingh commented Jun 15, 2022

Although this issue was closed automatically after merging the PR, please let me know if you have any questions.

@sfc-gh-dflippo
Copy link
Author

I'm sorry I didn't get back to you last week. I was on vacation out of the country. The difference above seems to be that it is not finding an exact match for the column in your constraint. I use the following macro to lookup the columns on a relation and compare it to an array of fields for the constraint:

{# This macro tests that all the column names passed to the macro can be found on the table, ignoring case #}
{%- macro table_columns_all_exist(table_relation, column_list) -%}
    {%- set tab_Columns = adapter.get_columns_in_relation(table_relation) -%}

    {%- set tab_column_list = [] -%}
    {%- for column in tab_Columns -%}
        {{ tab_column_list.append(column.name|upper) }}
    {%- endfor -%}

    {%- for column in column_list|map('upper') if column not in tab_column_list -%}
        {{ return(false) }}
    {%- endfor -%}
    {{ return(true) }}

{%- endmacro -%}

I just published a new version of dbt Constraints and I've back-ported the changes into the oracle support branch. I saw your changes and they all look good to me. I'm going to do some debugging to see if I can identify what was causing your error.

@sfc-gh-dflippo
Copy link
Author

In the end I found that using the adapter.get_relation() macro was problematic and I could solve any remaining issues by switching to dbt's api.Relation.create()macro. All of my tests are passing now and I have published a new release of dbt Constraints to the dbt Hub that includes support for Oracle. Thanks for the help! You can mark this issue as complete.

@aosingh
Copy link
Member

aosingh commented Jun 27, 2022

Thank you @danflippo for confirming!

@aosingh aosingh removed the work-in-progress Planned in the upcoming release label Aug 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants