Feature enhancement to support quoting for column names #36
This addresses GitHub issue #8.
- The Oracle adapter class exposes a new method, `should_identifier_be_quoted(identifier)`, which returns true if the identifier is in the list of Oracle keywords or does not start with an alphabetic character, and false otherwise.
- The macro {{ column.name }} returns the quoted name if the column name is in the list of Oracle keywords or does not start with an alphabetic character.
- The list of Oracle keywords is defined in dbt/adapters/oracle/keyword_catalog.py, taken from https://docs.oracle.com/en/database/oracle/oracle-database/21/zzpre/Oracle-reserved-words-keywords-namespaces.html#GUID-25FE5FB4-5B17-4AFA-9B59-77B6036EF579
- Column quoting is handled in incremental materialization (MERGE, INSERT, UPDATE).
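A minimal Python sketch of the check described above, for illustration only. The keyword set here is a small hand-picked subset (an assumption); the PR keeps the full list in dbt/adapters/oracle/keyword_catalog.py, and the actual adapter method may differ in detail.

```python
# Illustrative subset only (assumption); the full list lives in
# dbt/adapters/oracle/keyword_catalog.py according to the PR description.
ORACLE_KEYWORDS = {"ACCESS", "DATE", "DESC", "GROUP", "NUMBER", "ORDER", "SELECT", "TABLE"}


def should_identifier_be_quoted(identifier: str) -> bool:
    """Return True if the identifier is a keyword or does not start with a letter."""
    # Keywords are compared case-insensitively.
    if identifier.upper() in ORACLE_KEYWORDS:
        return True
    # An empty string or a leading digit/underscore also fails this check.
    return not identifier[:1].isalpha()


if __name__ == "__main__":
    for name in ("income", "desc", "_user_id", "1st_col"):
        print(name, should_identifier_be_quoted(name))
```

With this sketch, `income` is left alone, while `desc`, `_user_id` and `1st_col` are flagged for quoting.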
FYI @ThoSap
There are more scenarios when should_identifier_be_quoted should return true. It is difficult to provide a comprehensive list. In general, if any object's name was quoted when it was created, then its name should be quoted later when accessed.
Thanks @thbaby, I agree. Currently, if an object's name was quoted when it was created, dbt lets you enable quoting using the quote configuration. Below is an example:

version: 2
models:
  - name: employee
    columns:
      - name: employee_first_name
        quote: true

I am checking if we can access this configuration in If the above does not work, I was thinking of removing
This will make the purpose of the methods clear and also address the issue in the bug
Let me know if you have any feedback or questions.
If at all possible, it'd be good to handle all cases in one shot rather than incrementally handling case by case.
- adapter.should_identifier_be_quoted(identifier) handles 3 cases when an identifier should be quoted.
- adapter.check_and_quote_identifier(identifier) is exposed to quote an identifier
- A new macro, get_quoted_column_csv(model, column_names), is added to quote a list of column names
- The implementation of ``{{column.name}}`` is reverted in exchange for a single quoting api
- Incremental materialization macros are fixed to use the adapter.check_and_quote_identifier()
- 4 new unit test cases to test quoting for different scenarios
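The quoting API listed in this commit could look roughly like the Python sketch below. The commit message does not spell out the three cases, so the third one used here (characters that are not allowed in an unquoted Oracle identifier, such as spaces or hyphens) is an assumption, as is the tiny keyword subset. `get_quoted_column_csv` is a plain-Python stand-in for the macro of the same name and omits its `model` argument.

```python
import re

# Illustrative subset only (assumption); see keyword_catalog.py for the real list.
ORACLE_KEYWORDS = {"DATE", "DESC", "GROUP", "ORDER"}

# Unquoted Oracle identifiers start with a letter and may contain
# letters, digits, underscore, dollar sign and pound sign.
_PLAIN_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*$")


def should_identifier_be_quoted(identifier: str) -> bool:
    """True for keywords, non-letter first characters, or special characters."""
    if identifier.upper() in ORACLE_KEYWORDS:
        return True
    return _PLAIN_IDENTIFIER.match(identifier) is None


def check_and_quote_identifier(identifier: str) -> str:
    """Wrap the identifier in double quotes only when it needs them."""
    if should_identifier_be_quoted(identifier):
        return f'"{identifier}"'
    return identifier


def get_quoted_column_csv(column_names) -> str:
    """Join column names into a comma-separated list, quoting where needed."""
    return ", ".join(check_and_quote_identifier(name) for name in column_names)


if __name__ == "__main__":
    columns = ["_user_id", "user name", "birth_ date in yyyy-mm-dd",
               "income", "last_login_date", "desc"]
    print(get_quoted_column_csv(columns))
```

Keeping the decision in one predicate and the rendering in one helper means every macro that emits column names can share the same rule.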
Hi @thbaby I have made changes to
I have added more test cases to test quoting for different dbt flows, and they pass as expected. Please review and let me know if you have any questions or feedback. Thanks
Consider these cases:
create table foo (...); -- creates a table named FOO
Are all these cases handled by the new changes you have made?
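The question rests on how Oracle resolves names: an unquoted identifier is case-insensitive and stored in upper case, while a quoted identifier is stored exactly as written. The Python sketch below models only that rule; `stored_name` and `refers_to_same_object` are hypothetical helper names, not part of dbt-oracle.

```python
def stored_name(identifier_in_sql: str) -> str:
    """Name kept in the data dictionary for an identifier as written in SQL."""
    is_quoted = (
        len(identifier_in_sql) >= 2
        and identifier_in_sql.startswith('"')
        and identifier_in_sql.endswith('"')
    )
    if is_quoted:
        # Quoted: stored exactly as written, case preserved.
        return identifier_in_sql[1:-1]
    # Unquoted: folded to upper case.
    return identifier_in_sql.upper()


def refers_to_same_object(created_as: str, accessed_as: str) -> bool:
    """True if a later reference resolves to the name used at creation time."""
    return stored_name(created_as) == stored_name(accessed_as)


if __name__ == "__main__":
    print(refers_to_same_object("foo", "FOO"))      # both resolve to FOO
    print(refers_to_same_object("foo", '"FOO"'))    # quoted upper case still matches
    print(refers_to_same_object('"foo"', "foo"))    # lower-case quoted name needs quotes
```

This is why a name that was quoted in lower or mixed case at creation must also be quoted on every later access.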
Thanks @thbaby. In this PR, I have tested dbt flows with quoted column names. Below I explain 2 of the test cases and the SQL code generated by dbt-oracle.

Test case 1 - Incremental merge for columns with special characters, spaces and keywords

Sample seed data

SQL generated

Create table seed

CREATE TABLE dbt_test.seed
("_user_id" number,
"user name" varchar2(16),
"birth_ date in yyyy-mm-dd" timestamp,
income number,
last_login_date timestamp,
"desc" varchar2(16))

Insert csv records

INSERT ALL
INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") values(:p1,:p2,:p3,:p4,:p5,:p6)
INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") values(:p1,:p2,:p3,:p4,:p5,:p6)
INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") values(:p1,:p2,:p3,:p4,:p5,:p6)

Create the model SQL

CREATE TABLE dbt_test.my_incr_model
AS
SELECT * FROM dbt_test.seed

Insert 2 new rows

INSERT ALL
INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") VALUES
(2,'Lillian Sr.', TO_DATE('1982-02-03', 'YYYY-MM-DD'), 200000, TO_DATE('2022-05-01 06:01:31', 'YYYY-MM-DD HH:MI:SS'), 'Login')
INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") VALUES
(5,'John Doe',TO_DATE('1992-10-01', 'YYYY-MM-DD'), 300000, TO_DATE('2022-06-01 06:01:31', 'YYYY-MM-DD HH:MI:SS'), 'Login')
SELECT * FROM dual

Create global tmp table

This creates a temp table with the 2 new rows inserted above

CREATE GLOBAL TEMPORARY table o$pt_my_incr_model172219
ON COMMIT PRESERVE ROWS
AS
SELECT * FROM dbt_test.seed
WHERE last_login_date > (SELECT max(last_login_date) FROM dbt_test.my_incr_model)

Merge based on the criteria specified for uniqueness

merge into dbt_test.my_incr_model target
using o$pt_my_incr_model172219 temp
on (
temp."_user_id" = target."_user_id"
)
when matched then
update set
target."user name" = temp."user name",
target."birth_ date in yyyy-mm-dd" = temp."birth_ date in yyyy-mm-dd",
target.INCOME = temp.INCOME,
target.LAST_LOGIN_DATE = temp.LAST_LOGIN_DATE,
target."desc" = temp."desc"
when not matched then
insert("_user_id", "user name", "birth_ date in yyyy-mm-dd", INCOME, LAST_LOGIN_DATE, "desc")
values(
temp."_user_id",
temp."user name",
temp."birth_ date in yyyy-mm-dd",
temp.INCOME,
temp.LAST_LOGIN_DATE,
temp."desc"
)
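As a rough illustration of how a merge statement of this shape can be assembled once every column name has been passed through the quoting helper, here is a Python sketch. `build_merge_sql` is a hypothetical function, not the dbt-oracle macro; it assumes the caller supplies identifiers that are already rendered (quoted where needed) and a single-column unique key.

```python
def build_merge_sql(target: str, source: str, unique_key: str, columns: list) -> str:
    """Build a MERGE statement from already-rendered (quoted if needed) column names."""
    # The unique key is used for matching, so it is left out of the UPDATE list.
    update_columns = [c for c in columns if c != unique_key]
    update_set = ",\n    ".join(f"target.{c} = temp.{c}" for c in update_columns)
    insert_columns = ", ".join(columns)
    insert_values = ",\n    ".join(f"temp.{c}" for c in columns)
    return (
        f"merge into {target} target\n"
        f"using {source} temp\n"
        f"on (\n    temp.{unique_key} = target.{unique_key}\n)\n"
        f"when matched then\n  update set\n    {update_set}\n"
        f"when not matched then\n  insert({insert_columns})\n"
        f"  values(\n    {insert_values}\n  )"
    )


if __name__ == "__main__":
    print(build_merge_sql(
        target="dbt_test.my_incr_model",
        source="o$pt_my_incr_model172219",
        unique_key='"_user_id"',
        columns=['"_user_id"', '"user name"', "INCOME", '"desc"'],
    ))
```

The point of the sketch is that the same rendered name is reused in the ON clause, the UPDATE list, the INSERT column list and the VALUES list, so a column is either quoted everywhere or nowhere.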
For the test case source, refer to TestIncrementalMergeQuoteWithKeywordsandSpecialChars.

Test case 2 - Incremental merge with schema sync for columns with special characters, spaces and keywords

Sample seed data

SQL generated

Create table seed

CREATE TABLE dbt_test.seed
("_user_id" number,
"user name" varchar2(16),
"birth_ date in yyyy-mm-dd" timestamp,
income number,
last_login_date timestamp,
"desc" varchar2(16))

Insert csv records

INSERT ALL
INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") values(:p1,:p2,:p3,:p4,:p5,:p6)
INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") values(:p1,:p2,:p3,:p4,:p5,:p6)
INTO dbt_test.seed ("_user_id", "user name", "birth_ date in yyyy-mm-dd", income, last_login_date, "desc") values(:p1,:p2,:p3,:p4,:p5,:p6)

Create the model SQL

CREATE TABLE dbt_test.my_incr_model
AS
SELECT * FROM dbt_test.seed

Trigger schema sync

ALTER TABLE dbt_test.seed ADD ("birth date in yyyy-mm-dd" DATE )
ALTER TABLE dbt_test.seed DROP ("birth_ date in yyyy-mm-dd") CASCADE CONSTRAINTS

Insert 2 new rows

INSERT ALL
INTO dbt_test.seed ("_user_id", "user name", "birth date in yyyy-mm-dd", income, last_login_date, "desc") VALUES
(2,'Lillian Sr.', TO_DATE('1982-02-03', 'YYYY-MM-DD'), 200000, TO_DATE('2022-05-01 06:01:31', 'YYYY-MM-DD HH:MI:SS'), 'Login')
INTO dbt_test.seed ("_user_id", "user name", "birth date in yyyy-mm-dd", income, last_login_date, "desc") VALUES
(5,'John Doe',TO_DATE('1992-10-01', 'YYYY-MM-DD'), 300000, TO_DATE('2022-06-01 06:01:31', 'YYYY-MM-DD HH:MI:SS'), 'Login')
SELECT * FROM dual

Create global tmp table

This creates a temp table with the 2 new rows inserted above

create global temporary table o$pt_my_incr_model180414
on commit preserve rows
as
SELECT * FROM dbt_test.seed
WHERE last_login_date > (SELECT max(last_login_date) FROM dbt_test.my_incr_model)

Sync schema changes

dbt-oracle will detect the schema changes and sync them accordingly

ALTER table DBT_TEST.MY_INCR_MODEL
ADD (
"birth date in yyyy-mm-dd" DATE
)
ALTER table DBT_TEST.MY_INCR_MODEL
DROP (
"birth_ date in yyyy-mm-dd"
) CASCADE CONSTRAINTS

Merge based on the criteria specified for uniqueness

merge into dbt_test.my_incr_model target
using o$pt_my_incr_model180414 temp
on (
temp."_user_id" = target."_user_id"
)
when matched then
update set
target."user name" = temp."user name",
target.INCOME = temp.INCOME,
target.LAST_LOGIN_DATE = temp.LAST_LOGIN_DATE,
target."desc" = temp."desc",
target."birth date in yyyy-mm-dd" = temp."birth date in yyyy-mm-dd"
when not matched then
insert("_user_id", "user name", INCOME, LAST_LOGIN_DATE, "desc", "birth date in yyyy-mm-dd")
values(
temp."_user_id",
temp."user name",
temp.INCOME,
temp.LAST_LOGIN_DATE,
temp."desc",
temp."birth date in yyyy-mm-dd"
)
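The schema sync step in test case 2 amounts to comparing the column sets of the source and the target and emitting ALTER statements for the difference. The Python sketch below shows that idea under stated assumptions: `build_schema_sync_sql` is a hypothetical helper, column names are passed in already rendered (quoted where needed), and the data types are given by the caller rather than read from the database.

```python
def build_schema_sync_sql(relation: str, source_columns: dict, target_columns: dict) -> list:
    """Return ALTER statements that make the target's columns match the source.

    Both dicts map a rendered column name to its data type.
    """
    added = [c for c in source_columns if c not in target_columns]
    dropped = [c for c in target_columns if c not in source_columns]

    statements = []
    if added:
        cols = ",\n  ".join(f"{c} {source_columns[c]}" for c in added)
        statements.append(f"ALTER table {relation}\nADD (\n  {cols}\n)")
    if dropped:
        cols = ",\n  ".join(dropped)
        statements.append(f"ALTER table {relation}\nDROP (\n  {cols}\n) CASCADE CONSTRAINTS")
    return statements


if __name__ == "__main__":
    source = {'"_user_id"': "NUMBER", '"birth date in yyyy-mm-dd"': "DATE"}
    target = {'"_user_id"': "NUMBER", '"birth_ date in yyyy-mm-dd"': "TIMESTAMP"}
    for statement in build_schema_sync_sql("DBT_TEST.MY_INCR_MODEL", source, target):
        print(statement)
```

Because the comparison is done on rendered names, a column that needs quoting is carried into the ADD and DROP clauses with its quotes intact.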
For the test case source, refer to TestSyncSchemaIncrementalMergeQuotedColumns.

Relations (tables and views)

Relation names are handled a bit differently, although I should test relation names with quoting as well.
To test this, I will enable quoting for relation names, define a model sql file, materialize it as a table and verify that the method
- Removed hardcoding of quoting configurations from all macros. Config should be picked from dbt project config
- Added 4 new test cases for relation names
I have tested the following scenarios for relation names and they pass as expected. These are also included in our test suite to run after every commit.
Please let me know if you have any questions or feedback.