Skip to content

Conversation

@aosingh
Copy link
Member

@aosingh aosingh commented Jul 20, 2022

  • This addresses the dbt-oracle issue [Bug] Incremental upsert strategy fails with ORA-00911 & ORA-01747 #8
  • Oracle Adapter class exposes a new method called should_identifier_be_quoted(identifier) which returns true if the identifier is either in list of Oracle keywords or does not start with alphabet. This can be used in the macros in the following manner
    •      {% if adapter.should_identifier_be_quoted(col) == true %}
               {% set quoted_column_name = quote ~ col ~ quote) %}
           {% else %}
      
  • Macro {{ column.name }} will return quoted name if column name is either in list of Oracle keywords or does not start with alphabet
  • List of Oracle keywords from the official documentation is defined in dbt/adapters/oracle/keyword_catalog.py
  • Handled column quoting in incremental materialization macros

- This addresses github issue #8
- Oracle Adapter class exposes a new method called `should_identifier_be_quoted(identifier)` which returns true or false if the identifier is either in list of Oracle keywords or does not start with alphabet
- Macro {{ column.name }} will return  quoted name if column name is either in list of Oracle keywords or does not start with alphabet
- List of Oracle keywords is defined in dbt/adapters/oracle/keyword_catalog.py from https://docs.oracle.com/en/database/oracle/oracle-database/21/zzpre/Oracle-reserved-words-keywords-namespaces.html#GUID-25FE5FB4-5B17-4AFA-9B59-77B6036EF579
- Handled column quoting in incremental materialization (MERGE, INSERT, UPDATE)
@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Jul 20, 2022
@aosingh
Copy link
Member Author

aosingh commented Jul 20, 2022

FYI @ThoSap

@thbaby
Copy link

thbaby commented Jul 20, 2022

There are more scenarios when should_identifier_be_quoted should return true. It is difficult to provide a comprehensive list. In general, if any object's name was quoted when it was created, then its name should be quoted later when accessed.

@aosingh
Copy link
Member Author

aosingh commented Jul 20, 2022

There are more scenarios when should_identifier_be_quoted should return true. It is difficult to provide a comprehensive list. In general, if any object's name was quoted when it was created, then its name should be quoted later when accessed.

Thanks @thbaby, I agree.

Currently, if a object's name was quoted when it was created then dbt let's you enable quoting using the quote configuration. Below is an example

version: 2

models:
  - name: employee
    columns:
      - name: employee_first_name
        quote: true

I am checking if we can access this configuration in should_identifier_be_quoted and then return True accordingly.

If the above does not work, I was thinking of removing should_identifier_be_quoted and exposing the following methods in adapter.

  • adapter.is_keyword(identifier)
  • adapter.starts_with_alphabet(identifier)

This will make the purpose of the methods clear and also address the issue in the bug

If a column uses a reserved keyword or starts with an invalid character dbt-oracle by default should treat the columns to be quoted and uppercase.

Let me know if you have any feedback or questions

@thbaby
Copy link

thbaby commented Jul 21, 2022

If at all possible, it'd be good to handle all cases in one shot rather than incrementally handling case by case.

aosingh added 2 commits July 22, 2022 17:05
- adapter.should_identifier_be_quoted(identifier) handles 3 cases when an identifier should be quoted.
- adapter.check_and_quote_identifier(identifier) is exposed to quote an identifier
- A new macro is get_quoted_column_csv(model, column_names) is added to quote a list of column names
- The implementation of ``{{column.name}}`` is reverted in exchange for a single quoting api
- Incremental materialization macros are fixed to use the adapter.check_and_quote_identifier()
- 4 new unit test cases to test quoting for different scenarios
@aosingh
Copy link
Member Author

aosingh commented Jul 23, 2022

If at all possible, it'd be good to handle all cases in one shot rather than incrementally handling case by case.

Hi @thbaby

I have made changes to adapter.should_identifier_be_quoted(identifier) which returns True in the following 3 scenarios

  • Identifier is an Oracle keyword

  • Identifier is not valid according to the following rules. (This is as per DBMS_ASSERT.SIMPLE_SQL_NAME)

    • The name must begin with an alphabetic character. It may contain alphanumeric characters as well as the characters _, $, and # in the second and subsequent character positions.
  • User has enabled quoting for the column in the model configuration

I have added more test cases to test quoting for different dbt flows and they pass as expected

Please review and let me know if you have any questions or feedback.

Thanks

@thbaby
Copy link

thbaby commented Jul 23, 2022

Consider these cases:

create table foo (...); -- creates a table named FOO
create table "foo" (...); -- creates a table named foo
create table Foo (...); -- creates a table named FOO
create table "Foo" (...); -- creates a table named Foo

Are all these cases handled by the new changes you have made?

@aosingh
Copy link
Member Author

aosingh commented Jul 24, 2022

Thanks @thbaby, In this PR, I have tested dbt flows with quoted column names. Below I have explained 2 of the test cases and the SQL code generated by dbt-oracle

Test case 1 - Incremental merge for columns with special characters, spaces and keywords

Sample seed data

_user_id user name birth_ date in yyyy-mm-dd income last_login_date desc
1 Easton 1981-05-20 40000 2022-04-25T08:57:59 login

SQL generated

Create table seed

 CREATE TABLE dbt_test.seed 
      ("_user_id" number,
      "user name" varchar2(16),
      "birth_ date in yyyy-mm-dd" timestamp,
       income number,
       last_login_date timestamp,
      "desc" varchar2(16))

Insert csv records

INSERT ALL
    INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") values(:p1,:p2,:p3,:p4,:p5,:p6)
    INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") values(:p1,:p2,:p3,:p4,:p5,:p6)
    INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") values(:p1,:p2,:p3,:p4,:p5,:p6)

Create the model SQL

CREATE  TABLE dbt_test.my_incr_model
  AS  
SELECT * FROM dbt_test.seed

Insert 2 new rows
The following code in not generated by dbt. I insert 2 new rows in the seed table.

INSERT ALL
    INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") VALUES 
        (2,'Lillian Sr.', TO_DATE('1982-02-03', 'YYYY-MM-DD'), 200000, TO_DATE('2022-05-01 06:01:31', 'YYYY-MM-DD HH:MI:SS'), 'Login')
    INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") VALUES 
        (5,'John Doe',TO_DATE('1992-10-01', 'YYYY-MM-DD'), 300000, TO_DATE('2022-06-01 06:01:31', 'YYYY-MM-DD HH:MI:SS'), 'Login')
SELECT * FROM dual

Create global tmp table

This creates a temp table with above inserted 2 new rows

CREATE GLOBAL TEMPORARY table o$pt_my_incr_model172219
ON COMMIT PRESERVE ROWS
    AS
   SELECT * FROM dbt_test.seed
   WHERE last_login_date > (SELECT max(last_login_date) FROM dbt_test.my_incr_model)

Merge based on the criteria specified for uniqueness

merge into dbt_test.my_incr_model target
          using o$pt_my_incr_model172219 temp
          on (
                    temp."_user_id" = target."_user_id"
            )
        when matched then
          update set
          target."user name" = temp."user name", 
          target."birth_ date in yyyy-mm-dd" = temp."birth_ date in yyyy-mm-dd", 
          target.INCOME = temp.INCOME, 
          target.LAST_LOGIN_DATE = temp.LAST_LOGIN_DATE, 
          target."desc" = temp."desc"
          when not matched then
          insert("_user_id", "user name", "birth_ date in yyyy-mm-dd", INCOME, LAST_LOGIN_DATE, "desc")
          values(
            temp."_user_id", 
            temp."user name", 
            temp."birth_ date in yyyy-mm-dd", 
            temp.INCOME, 
            temp.LAST_LOGIN_DATE, 
            temp."desc"
            )

For test case source, refer TestIncrementalMergeQuoteWithKeywordsandSpecialChars

Test Case 2 - Incremental merge with schema sync for columns with special characters, spaces and keywords

Sample seed data

_user_id user name birth_ date in yyyy-mm-dd income last_login_date desc
1 Easton 1981-05-20 40000 2022-04-25T08:57:59 login

SQL generated

Create table seed

 CREATE TABLE dbt_test.seed 
      ("_user_id" number,
      "user name" varchar2(16),
      "birth_ date in yyyy-mm-dd" timestamp,
       income number,
       last_login_date timestamp,
      "desc" varchar2(16))

Insert csv records

INSERT ALL
    INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") values(:p1,:p2,:p3,:p4,:p5,:p6)
    INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") values(:p1,:p2,:p3,:p4,:p5,:p6)
    INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") values(:p1,:p2,:p3,:p4,:p5,:p6)

Create the model SQL

CREATE  TABLE dbt_test.my_incr_model
  AS  
SELECT * FROM dbt_test.seed

Trigger schema sync
The following code is not generated by dbt. I add and drop a column in seed table. This should trigger a schema sync and dbt-oracle should make the corresponding changes in the model table

ALTER TABLE dbt_test.seed ADD ("birth date in yyyy-mm-dd" DATE )
ALTER TABLE dbt_test.seed DROP ("birth_ date in yyyy-mm-dd") CASCADE CONSTRAINTS

Insert 2 new rows
The following code in not generated by dbt. I insert 2 new rows in the seed table.

INSERT ALL
    INTO dbt_test.seed ("_user_id", "user name", "birth date in yyyy-mm-dd", income, last_login_date, "desc") VALUES 
        (2,'Lillian Sr.', TO_DATE('1982-02-03', 'YYYY-MM-DD'), 200000, TO_DATE('2022-05-01 06:01:31', 'YYYY-MM-DD HH:MI:SS'), 'Login')
    INTO dbt_test.seed ("_user_id", "user name", "birth date in yyyy-mm-dd", income, last_login_date, "desc") VALUES 
        (5,'John Doe',TO_DATE('1992-10-01', 'YYYY-MM-DD'), 300000, TO_DATE('2022-06-01 06:01:31', 'YYYY-MM-DD HH:MI:SS'), 'Login')
SELECT * FROM dual

Create global tmp table

This creates a temp table with above inserted 2 new rows

create global temporary table o$pt_my_incr_model180414
  on commit preserve rows
  as
    
SELECT * FROM dbt_test.seed

    WHERE last_login_date > (SELECT max(last_login_date) FROM dbt_test.my_incr_model)

Sync schema changes

dbt-oracle will detect the schema changes and sync it accordingly

 ALTER table DBT_TEST.MY_INCR_MODEL
              ADD (
              
                "birth date in yyyy-mm-dd" DATE
              
              )
ALTER table DBT_TEST.MY_INCR_MODEL
              DROP (
                
                  "birth_ date in yyyy-mm-dd"
                
                ) CASCADE CONSTRAINTS

Merge based on the criteria specified for uniqueness

merge into dbt_test.my_incr_model target
          using o$pt_my_incr_model180414 temp
          on (
                    temp."_user_id" = target."_user_id"
            )
        when matched then
          update set
          target."user name" = temp."user name", 
          target.INCOME = temp.INCOME, 
          target.LAST_LOGIN_DATE = temp.LAST_LOGIN_DATE, 
          target."desc" = temp."desc", 
          target."birth date in yyyy-mm-dd" = temp."birth date in yyyy-mm-dd"
          when not matched then
          insert("_user_id", "user name", INCOME, LAST_LOGIN_DATE, "desc", "birth date in yyyy-mm-dd")
          values(
            temp."_user_id", 
            temp."user name", 
            temp.INCOME, 
            temp.LAST_LOGIN_DATE, 
            temp."desc", 
            temp."birth date in yyyy-mm-dd"
            )

For test case source, refer TestSyncSchemaIncrementalMergeQuotedColumns

Relations (tables and views)

Relation names are handled a bit differently. Although, I should test relation names with quoting as well.

  • For relations, the name of the .sql file is the object name in the database. For a model file, test.sql a table called test is created in the following manner

     CREATE TABLE test AS 
     {{ SELECT STATEMENT defined in test.sql }}
  • Quoting for relation names can be enabled at a dbt project level.

    quoting:
        database: true | false
        schema: true | false
        identifier: true | false
    

To test this, I will enable quoting for relation names, define a model sql file, materialize it as a table and verify that the methodadapter.get_relation(relation) returns the correct relation based on the matched name. I will let you know once I have tested scenarios for relation names. Meanwhile, please let me know if you have any questions or feedback.

aosingh added 2 commits July 24, 2022 20:21
- Removed hardcoding of quoting configurations from all macros. Config should be picked from dbt project config
- Added 4 new test cases for relation names
@aosingh
Copy link
Member Author

aosingh commented Jul 25, 2022

@thbaby

I have tested the following scenarios for relation names and they pass as-expected. These are also included in our test suite to run after every commit

Test Case Name Model name Table created Status
TestQuotedLowerCaseTableName "foo" foo Pass
TestQuotedTitleCaseTableName "Foo" Foo Pass
TestUnQuotedLowerCaseTableName foo FOO Pass
TestUnQuotedTitleCaseTableName Foo FOO Pass

Please let me know if you have any questions or feedback.

@aosingh aosingh merged commit 26f7631 into main Jul 25, 2022
@aosingh aosingh deleted the feature/quoting branch July 25, 2022 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

OCA Verified All contributors have signed the Oracle Contributor Agreement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants