Feature enhancement to support quoting for column names #36

aosingh · 2022-07-20T01:43:54Z

This addresses the dbt-oracle issue [Bug] Incremental upsert strategy fails with ORA-00911 & ORA-01747 #8
Oracle Adapter class exposes a new method called should_identifier_be_quoted(identifier) which returns true if the identifier is either in list of Oracle keywords or does not start with alphabet. This can be used in the macros in the following manner
- ```
     {% if adapter.should_identifier_be_quoted(col) == true %}
         {% set quoted_column_name = quote ~ col ~ quote) %}
     {% else %}
```
Macro {{ column.name }} will return quoted name if column name is either in list of Oracle keywords or does not start with alphabet
List of Oracle keywords from the official documentation is defined in dbt/adapters/oracle/keyword_catalog.py
Handled column quoting in incremental materialization macros

- This addresses github issue #8 - Oracle Adapter class exposes a new method called `should_identifier_be_quoted(identifier)` which returns true or false if the identifier is either in list of Oracle keywords or does not start with alphabet - Macro {{ column.name }} will return quoted name if column name is either in list of Oracle keywords or does not start with alphabet - List of Oracle keywords is defined in dbt/adapters/oracle/keyword_catalog.py from https://docs.oracle.com/en/database/oracle/oracle-database/21/zzpre/Oracle-reserved-words-keywords-namespaces.html#GUID-25FE5FB4-5B17-4AFA-9B59-77B6036EF579 - Handled column quoting in incremental materialization (MERGE, INSERT, UPDATE)

aosingh · 2022-07-20T03:32:53Z

FYI @ThoSap

thbaby · 2022-07-20T09:02:50Z

There are more scenarios when should_identifier_be_quoted should return true. It is difficult to provide a comprehensive list. In general, if any object's name was quoted when it was created, then its name should be quoted later when accessed.

aosingh · 2022-07-20T19:17:15Z

There are more scenarios when should_identifier_be_quoted should return true. It is difficult to provide a comprehensive list. In general, if any object's name was quoted when it was created, then its name should be quoted later when accessed.

Thanks @thbaby, I agree.

Currently, if a object's name was quoted when it was created then dbt let's you enable quoting using the quote configuration. Below is an example

version: 2

models:
  - name: employee
    columns:
      - name: employee_first_name
        quote: true

I am checking if we can access this configuration in should_identifier_be_quoted and then return True accordingly.

If the above does not work, I was thinking of removing should_identifier_be_quoted and exposing the following methods in adapter.

adapter.is_keyword(identifier)
adapter.starts_with_alphabet(identifier)

This will make the purpose of the methods clear and also address the issue in the bug

If a column uses a reserved keyword or starts with an invalid character dbt-oracle by default should treat the columns to be quoted and uppercase.

Let me know if you have any feedback or questions

thbaby · 2022-07-21T10:54:38Z

If at all possible, it'd be good to handle all cases in one shot rather than incrementally handling case by case.

- adapter.should_identifier_be_quoted(identifier) handles 3 cases when an identifier should be quoted. - adapter.check_and_quote_identifier(identifier) is exposed to quote an identifier - A new macro is get_quoted_column_csv(model, column_names) is added to quote a list of column names - The implementation of ``{{column.name}}`` is reverted in exchange for a single quoting api - Incremental materialization macros are fixed to use the adapter.check_and_quote_identifier() - 4 new unit test cases to test quoting for different scenarios

aosingh · 2022-07-23T01:16:04Z

If at all possible, it'd be good to handle all cases in one shot rather than incrementally handling case by case.

Hi @thbaby

I have made changes to adapter.should_identifier_be_quoted(identifier) which returns True in the following 3 scenarios

Identifier is an Oracle keyword
Identifier is not valid according to the following rules. (This is as per DBMS_ASSERT.SIMPLE_SQL_NAME)
- The name must begin with an alphabetic character. It may contain alphanumeric characters as well as the characters _, $, and # in the second and subsequent character positions.
User has enabled quoting for the column in the model configuration

I have added more test cases to test quoting for different dbt flows and they pass as expected

Please review and let me know if you have any questions or feedback.

Thanks

thbaby · 2022-07-23T06:24:15Z

Consider these cases:

create table foo (...); -- creates a table named FOO
create table "foo" (...); -- creates a table named foo
create table Foo (...); -- creates a table named FOO
create table "Foo" (...); -- creates a table named Foo

Are all these cases handled by the new changes you have made?

aosingh · 2022-07-24T01:27:01Z

Thanks @thbaby, In this PR, I have tested dbt flows with quoted column names. Below I have explained 2 of the test cases and the SQL code generated by dbt-oracle

Test case 1 - Incremental merge for columns with special characters, spaces and keywords

Sample seed data

_user_id	user name	birth_ date in yyyy-mm-dd	income	last_login_date	desc
1	Easton	1981-05-20	40000	2022-04-25T08:57:59	login

SQL generated

Create table seed

 CREATE TABLE dbt_test.seed 
      ("_user_id" number,
      "user name" varchar2(16),
      "birth_ date in yyyy-mm-dd" timestamp,
       income number,
       last_login_date timestamp,
      "desc" varchar2(16))

Insert csv records

INSERT ALL
    INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") values(:p1,:p2,:p3,:p4,:p5,:p6)
    INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") values(:p1,:p2,:p3,:p4,:p5,:p6)
    INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") values(:p1,:p2,:p3,:p4,:p5,:p6)

Create the model SQL

CREATE  TABLE dbt_test.my_incr_model
  AS  
SELECT * FROM dbt_test.seed

Insert 2 new rows
The following code in not generated by dbt. I insert 2 new rows in the seed table.

INSERT ALL
    INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") VALUES 
        (2,'Lillian Sr.', TO_DATE('1982-02-03', 'YYYY-MM-DD'), 200000, TO_DATE('2022-05-01 06:01:31', 'YYYY-MM-DD HH:MI:SS'), 'Login')
    INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") VALUES 
        (5,'John Doe',TO_DATE('1992-10-01', 'YYYY-MM-DD'), 300000, TO_DATE('2022-06-01 06:01:31', 'YYYY-MM-DD HH:MI:SS'), 'Login')
SELECT * FROM dual

Create global tmp table

This creates a temp table with above inserted 2 new rows

CREATE GLOBAL TEMPORARY table o$pt_my_incr_model172219
ON COMMIT PRESERVE ROWS
    AS
   SELECT * FROM dbt_test.seed
   WHERE last_login_date > (SELECT max(last_login_date) FROM dbt_test.my_incr_model)

Merge based on the criteria specified for uniqueness

merge into dbt_test.my_incr_model target
          using o$pt_my_incr_model172219 temp
          on (
                    temp."_user_id" = target."_user_id"
            )
        when matched then
          update set
          target."user name" = temp."user name", 
          target."birth_ date in yyyy-mm-dd" = temp."birth_ date in yyyy-mm-dd", 
          target.INCOME = temp.INCOME, 
          target.LAST_LOGIN_DATE = temp.LAST_LOGIN_DATE, 
          target."desc" = temp."desc"
          when not matched then
          insert("_user_id", "user name", "birth_ date in yyyy-mm-dd", INCOME, LAST_LOGIN_DATE, "desc")
          values(
            temp."_user_id", 
            temp."user name", 
            temp."birth_ date in yyyy-mm-dd", 
            temp.INCOME, 
            temp.LAST_LOGIN_DATE, 
            temp."desc"
            )

For test case source, refer TestIncrementalMergeQuoteWithKeywordsandSpecialChars

Test Case 2 - Incremental merge with schema sync for columns with special characters, spaces and keywords

Sample seed data

_user_id	user name	birth_ date in yyyy-mm-dd	income	last_login_date	desc
1	Easton	1981-05-20	40000	2022-04-25T08:57:59	login

SQL generated

Create table seed

 CREATE TABLE dbt_test.seed 
      ("_user_id" number,
      "user name" varchar2(16),
      "birth_ date in yyyy-mm-dd" timestamp,
       income number,
       last_login_date timestamp,
      "desc" varchar2(16))

Insert csv records

INSERT ALL
    INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") values(:p1,:p2,:p3,:p4,:p5,:p6)
    INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") values(:p1,:p2,:p3,:p4,:p5,:p6)
    INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") values(:p1,:p2,:p3,:p4,:p5,:p6)

Create the model SQL

CREATE  TABLE dbt_test.my_incr_model
  AS  
SELECT * FROM dbt_test.seed

Trigger schema sync
The following code is not generated by dbt. I add and drop a column in seed table. This should trigger a schema sync and dbt-oracle should make the corresponding changes in the model table

ALTER TABLE dbt_test.seed ADD ("birth date in yyyy-mm-dd" DATE )
ALTER TABLE dbt_test.seed DROP ("birth_ date in yyyy-mm-dd") CASCADE CONSTRAINTS

Insert 2 new rows
The following code in not generated by dbt. I insert 2 new rows in the seed table.

INSERT ALL
    INTO dbt_test.seed ("_user_id", "user name", "birth date in yyyy-mm-dd", income, last_login_date, "desc") VALUES 
        (2,'Lillian Sr.', TO_DATE('1982-02-03', 'YYYY-MM-DD'), 200000, TO_DATE('2022-05-01 06:01:31', 'YYYY-MM-DD HH:MI:SS'), 'Login')
    INTO dbt_test.seed ("_user_id", "user name", "birth date in yyyy-mm-dd", income, last_login_date, "desc") VALUES 
        (5,'John Doe',TO_DATE('1992-10-01', 'YYYY-MM-DD'), 300000, TO_DATE('2022-06-01 06:01:31', 'YYYY-MM-DD HH:MI:SS'), 'Login')
SELECT * FROM dual

Create global tmp table

This creates a temp table with above inserted 2 new rows

create global temporary table o$pt_my_incr_model180414
  on commit preserve rows
  as
    
SELECT * FROM dbt_test.seed

    WHERE last_login_date > (SELECT max(last_login_date) FROM dbt_test.my_incr_model)

Sync schema changes

dbt-oracle will detect the schema changes and sync it accordingly

 ALTER table DBT_TEST.MY_INCR_MODEL
              ADD (
              
                "birth date in yyyy-mm-dd" DATE
              
              )

ALTER table DBT_TEST.MY_INCR_MODEL
              DROP (
                
                  "birth_ date in yyyy-mm-dd"
                
                ) CASCADE CONSTRAINTS

Merge based on the criteria specified for uniqueness

merge into dbt_test.my_incr_model target
          using o$pt_my_incr_model180414 temp
          on (
                    temp."_user_id" = target."_user_id"
            )
        when matched then
          update set
          target."user name" = temp."user name", 
          target.INCOME = temp.INCOME, 
          target.LAST_LOGIN_DATE = temp.LAST_LOGIN_DATE, 
          target."desc" = temp."desc", 
          target."birth date in yyyy-mm-dd" = temp."birth date in yyyy-mm-dd"
          when not matched then
          insert("_user_id", "user name", INCOME, LAST_LOGIN_DATE, "desc", "birth date in yyyy-mm-dd")
          values(
            temp."_user_id", 
            temp."user name", 
            temp.INCOME, 
            temp.LAST_LOGIN_DATE, 
            temp."desc", 
            temp."birth date in yyyy-mm-dd"
            )

For test case source, refer TestSyncSchemaIncrementalMergeQuotedColumns

Relations (tables and views)

Relation names are handled a bit differently. Although, I should test relation names with quoting as well.

For relations, the name of the .sql file is the object name in the database. For a model file, test.sql a table called test is created in the following manner
```
 CREATE TABLE test AS 
 {{ SELECT STATEMENT defined in test.sql }}
```

Quoting for relation names can be enabled at a dbt project level.

quoting:
    database: true | false
    schema: true | false
    identifier: true | false

To test this, I will enable quoting for relation names, define a model sql file, materialize it as a table and verify that the methodadapter.get_relation(relation) returns the correct relation based on the matched name. I will let you know once I have tested scenarios for relation names. Meanwhile, please let me know if you have any questions or feedback.

- Removed hardcoding of quoting configurations from all macros. Config should be picked from dbt project config - Added 4 new test cases for relation names

aosingh · 2022-07-25T04:03:01Z

@thbaby

I have tested the following scenarios for relation names and they pass as-expected. These are also included in our test suite to run after every commit

Test Case Name	Model name	Table created	Status
TestQuotedLowerCaseTableName	"foo"	foo	Pass
TestQuotedTitleCaseTableName	"Foo"	Foo	Pass
TestUnQuotedLowerCaseTableName	foo	FOO	Pass
TestUnQuotedTitleCaseTableName	Foo	FOO	Pass

Please let me know if you have any questions or feedback.

oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Jul 20, 2022

aosingh requested review from ramapemmaraju, spanchum and thbaby July 20, 2022 01:44

aosingh added 2 commits July 22, 2022 17:05

Added copyright notice

c5595af

aosingh added 2 commits July 24, 2022 20:21

Better quoting for relation names

9caa591

- Removed hardcoding of quoting configurations from all macros. Config should be picked from dbt project config - Added 4 new test cases for relation names

Removed debug log statement

c87d912

thbaby approved these changes Jul 25, 2022

View reviewed changes

aosingh merged commit 26f7631 into main Jul 25, 2022

aosingh deleted the feature/quoting branch July 25, 2022 16:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Feature enhancement to support quoting for column names #36

Feature enhancement to support quoting for column names #36

Uh oh!

aosingh commented Jul 20, 2022 •

edited

Loading

Uh oh!

aosingh commented Jul 20, 2022

Uh oh!

thbaby commented Jul 20, 2022

Uh oh!

aosingh commented Jul 20, 2022

Uh oh!

thbaby commented Jul 21, 2022

Uh oh!

aosingh commented Jul 23, 2022

Uh oh!

thbaby commented Jul 23, 2022

Uh oh!

aosingh commented Jul 24, 2022

Sample seed data

SQL generated

Sample seed data

SQL generated

Uh oh!

aosingh commented Jul 25, 2022 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Feature enhancement to support quoting for column names #36

Feature enhancement to support quoting for column names #36

Uh oh!

Conversation

aosingh commented Jul 20, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aosingh commented Jul 20, 2022

Uh oh!

thbaby commented Jul 20, 2022

Uh oh!

aosingh commented Jul 20, 2022

Uh oh!

thbaby commented Jul 21, 2022

Uh oh!

aosingh commented Jul 23, 2022

Uh oh!

thbaby commented Jul 23, 2022

Uh oh!

aosingh commented Jul 24, 2022

Sample seed data

SQL generated

Sample seed data

SQL generated

Relations (tables and views)

Uh oh!

aosingh commented Jul 25, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

aosingh commented Jul 20, 2022 •

edited

Loading

aosingh commented Jul 25, 2022 •

edited

Loading