Add experimental `dataTypeCase` and `functionCase` options #673

karlhorky · 2023-11-27T17:54:01Z

Closes #237
Closes #653

dataTypeCase (experimental)

Converts data types to upper- or lowercase.

Note: Casing of function names like VARCHAR(30) are not modified - instead rely on the functionCase option for this.

Options

"preserve" (default) preserves the original case.
"upper" converts to uppercase.
"lower" converts to lowercase.

preserve

CREATE TABLE
  users (
    id InTeGeR PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    first_name VARCHAR(30) NOT NULL,
    bio teXT,
    is_email_verified BooL NOT NULL DEFAULT FALSE,
    created_timestamp timestamPtz NOT NULL DEFAULT NOW()
  )

upper

CREATE TABLE
  users (
    id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    first_name VARCHAR(30) NOT NULL,
    bio TEXT,
    is_email_verified BOOL NOT NULL DEFAULT FALSE,
    created_timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW()
  )

lower

CREATE TABLE
  users (
    id integer PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    first_name VARCHAR(30) NOT NULL,
    bio text,
    is_email_verified bool NOT NULL DEFAULT FALSE,
    created_timestamp timestamptz NOT NULL DEFAULT NOW()
  )

functionCase (experimental)

Converts functions to upper- or lowercase.

Options

"preserve" (default) preserves the original case.
"upper" converts to uppercase.
"lower" converts to lowercase.

preserve

CREATE tabLE
  users (
    id iNtegeR PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    first_name VarChaR(30) NOT NULL,
    bio TEXT,
    is_email_verified BOOL NOT NULL DEFAULT FALSE,
    created_timestamp TIMESTAMPTZ NOT NULL DEFAULT NoW()
  )

upper

CREATE tabLE
  users (
    id iNtegeR PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    first_name VARCHAR(30) NOT NULL,
    bio TEXT,
    is_email_verified BOOL NOT NULL DEFAULT FALSE,
    created_timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW()
  )

lower

CREATE tabLE
  users (
    id iNtegeR PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    first_name varchar(30) NOT NULL,
    bio TEXT,
    is_email_verified BOOL NOT NULL DEFAULT FALSE,
    created_timestamp TIMESTAMPTZ NOT NULL DEFAULT now()
  )

nene · 2023-11-27T19:43:10Z

Thanks for starting work on this.

FYI, I will be away from computer for about a week. So likely won't be reviewing this during that time.

karlhorky · 2023-11-28T19:15:49Z

@nene just so you know, digging into the problem a bit, some notes and some ideas for approaches for this PR:

I see there is no distinction in the keyword tokens between all and dataType / datatype / dataTypes:

sql-formatter/src/languages/postgresql/postgresql.keywords.ts

Lines 4 to 8 in 290f3b7

    
           // https://www.postgresql.org/docs/14/sql-keywords-appendix.html 
        
           all: [ 
        
             'ABORT', 
        
             'ABSOLUTE', 
        
             'ACCESS',

sql-formatter/src/languages/transactsql/transactsql.functions.ts

Lines 78 to 83 in 290f3b7

    
           dataType: [ 
        
             'DATALENGTH', 
        
             'IDENT_CURRENT', 
        
             'IDENT_INCR', 
        
             'IDENT_SEED', 
        
             'IDENTITY',

sql-formatter/src/languages/snowflake/snowflake.keywords.ts

Lines 102 to 106 in 290f3b7

    
           datatypes: [ 
        
             'NUMBER', 
        
             'DECIMAL', 
        
             'NUMERIC', 
        
             'INT',

sql-formatter/src/languages/transactsql/transactsql.functions.ts

Lines 303 to 309 in 290f3b7

    
           // Parameterized types 
        
           // https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-types-transact-sql?view=sql-server-ver15 
        
           dataTypes: [ 
        
             'DECIMAL', 
        
             'NUMERIC', 
        
             'FLOAT', 
        
             'REAL',

This is because of the flatKeywordList() function, which effectively discards the key information:

sql-formatter/src/utils.ts

Lines 18 to 20 in 290f3b7

    
           // Used for flattening keyword lists 
        
           export const flatKeywordList = (obj: Record<string, string[]>): string[] => 
        
             dedupe(Object.values(obj).flat());

I found out that there are also function data types (eg. PostgreSQL parameterized data types)
Because of this, I would like to propose multiple potentially controversial changes (I would like to do 2 separate PRs for 3i and 3ii below before I start on #673, to keep the diffs small and PR reviews simple):
1. Simplify the src/languages/*/*.keywords.ts files to export 1 flat array (with empty lines + comments as separators instead of object keys) and delete the flatKeywordList() function - this will prevent unusual, inconsistent naming conventions and also clarify that the object keys do nothing
2. I would want to then do a separate PR to separate out all of the data type information in src/languages/*/*.keywords.ts in these flat arrays in into a separate (optional) export in each file called export const dataTypes, which would temporarily merged together with keywords to keep compatibility before reservedDataTypes in the next step is implemented
3. I would like to introduce a new token type called reservedDataTypes to TokenizerOptions also in #673, and pass in the dataTypes export from each file to this - I would do this in a way similarly to how you implemented multi-word keywords in #363
4. I would like to introduce a second option at the same time in #673 called functionCase, which will have the same options as dataTypeCase and keywordCase, but will case functions such as varchar() and decimal() (and also non-data type functions such as now())

Thoughts on these implementation approaches?

nene · 2023-11-28T20:02:41Z

Seems like a good approach.

I agree that the current structured objects used for keywords can cause unnecessary confusion. Probably best to da away with these. 👍

karlhorky · 2023-11-29T16:08:19Z

PRs open here:

3i Switch from flatKeywordList() to flat arrays #676
3ii Separate data types from generic keywords #677 (needs 676 merged before review)

@nene Since this may need to wait for your return, I will continue on with #673 by rebasing on #677's branch

…ter into add-datatypeformat-option

karlhorky · 2023-11-30T16:55:52Z

Further progress:

3iii (TokenType.RESERVED_DATA_TYPE) and dataTypeCase option has been implemented here: a555575

This is my first time working on code for a parser, so it's very possible that I got something wrong. I hope it's not too far off!

karlhorky · 2023-11-30T17:22:40Z

Finally, last changes:

3iv (functionCase option) has been implemented here 4ae6c48
to avoid this being a breaking change, I also changed the default of dataTypeCase and functionCase to the options.keywordCase value passed in by the user
- if it's better to avoid this connection and make it a breaking change (default 'preserve') I can revert to this

So I think after merging #677 and #676, this PR ready for a first round of feedback @nene. Hopefully my parser / tokenizer code is not too crazy.

Part of preparations for PR #673 Copied from [my original comment](#673 (comment)): I see there is no distinction in the keyword tokens between `all` and `dataType` / `datatype` / `dataTypes`: https://github.com/sql-formatter-org/sql-formatter/blob/290f3b767492d5e3204866447f9f4e4c098f825f/src/languages/postgresql/postgresql.keywords.ts#L4-L8 https://github.com/sql-formatter-org/sql-formatter/blob/290f3b767492d5e3204866447f9f4e4c098f825f/src/languages/transactsql/transactsql.functions.ts#L78-L83 https://github.com/sql-formatter-org/sql-formatter/blob/290f3b767492d5e3204866447f9f4e4c098f825f/src/languages/snowflake/snowflake.keywords.ts#L102-L106 https://github.com/sql-formatter-org/sql-formatter/blob/290f3b767492d5e3204866447f9f4e4c098f825f/src/languages/transactsql/transactsql.functions.ts#L303-L309 This is because of the `flatKeywordList()` function, which effectively discards the key information: https://github.com/sql-formatter-org/sql-formatter/blob/290f3b767492d5e3204866447f9f4e4c098f825f/src/utils.ts#L18-L20 Because of this, I would like to simplify the `src/languages/*/*.keywords.ts` files to export 1 flat array (with empty lines + comments as separators instead of object keys) and delete the `flatKeywordList()` function - this will prevent unusual, inconsistent naming conventions and also clarify that the object keys do nothing. ## Reviewer notes 1. Best to review this with whitespace changes turned off: https://github.com/sql-formatter-org/sql-formatter/pull/676/files?w=1 2. This PR also enables GitHub Actions workflows CI for pull requests on branches other than `master` and `develop`, so that PRs from community members can be tested on GitHub Actions 3. while removing the object keys, I added comments as such: a. if there was a commented URL above the key, I removed the key and left the URL as the sole comment (eg. in `src/languages/db2i/db2i.functions.ts`) b. if there was no other comment (or if the commented URL was generic, for multiple sections), I left the key name as the comment (eg. in `src/languages/hive/hive.functions.ts`)

nene · 2023-12-01T10:39:22Z

In general this looks fine.

to avoid this being a breaking change, I also changed the default of dataTypeCase and functionCase to the options.keywordCase value passed in by the user

if it's better to avoid this connection and make it a breaking change (default 'preserve') I can revert to this

Yes. I think it would keep the API simpler if there aren't any dependencies between these options. This whole thing will definitely be a major version bump, so we can break backwards compatibility a bit. For anybody upgrading, the change is pretty trivial to do, plus they can make a meaningful choice of whether they actually want function names and data types uppercased. I do very much appreciate your concern for backwards compatibility though.

I would also like to see some tests for these new options. Like something similar that exists in test/options/keywordCase.ts. Also some tests to cover how the keywordCase and dataTypeCase effect the ARRAY keyword/data-type. Probably in test/features/arrayLiterals.ts.

I'd like to explore this implementation a bit more when I get back to the computer. But for now at least nothing obviously wrong jumps out to me.

…ter into add-datatypeformat-option

Part of preparations for PR #673 --- Copied from [my comment](#673 (comment)): I would want to then do a separate PR to separate out all of the data type information in `src/languages/*/*.keywords.ts` in these flat arrays in into a separate (optional) export in each file called `export const dataTypes`, which would temporarily merged together with `keywords` to keep compatibility before `reservedDataTypes` in the next step is implemented

codesandbox-ci · 2023-12-01T16:44:08Z

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

karlhorky · 2023-12-02T14:53:10Z

Yes. I think it would keep the API simpler if there aren't any dependencies between these options.

Ok, I've reverted that change in 4f46845

I would also like to see some tests for these new options.

Like something similar that exists in test/options/keywordCase.ts.

Done in 8d81244

One note about this approach below

Also some tests to cover how the keywordCase and dataTypeCase effect the ARRAY keyword/data-type. Probably in test/features/arrayLiterals.ts.

Done in 66d596e

karlhorky · 2023-12-02T17:28:56Z

I tested the version published by CodeSandbox CI in one of our projects as noted below, and both options dataTypeCase and functionCase seem to be working 🙌

package.json

{
  "dependencies": {
    "sql-formatter": "https://pkg.csb.dev/sql-formatter-org/sql-formatter/commit/a4e40704/sql-formatter"
  }
}

karlhorky · 2023-12-03T12:57:04Z

src/lexer/disambiguateTokens.ts

    if (!nextToken || !isOpenParen(nextToken)) {
-      return { ...token, type: TokenType.RESERVED_KEYWORD };
+      return {
+        ...token,
+        type:
+          // Function names which are also data types
+          [
+            'BIGINT',
+            'BINARY',
+            'BIT',
+            'BLOB',
+            'BOOLEAN',
+            'CHAR',
+            'DATE',
+            'DECIMAL',
+            'DOUBLE',
+            'FLOAT',
+            'INT',
+            'INTEGER',
+            'JSON',
+            'NCHAR',
+            'NUMBER',
+            'NUMERIC',
+            'NVARCHAR',
+            'REAL',
+            'SMALLINT',
+            'TEXT',
+            'TIME',
+            'TIMESTAMP',
+            'TINYINT',
+            'VARCHAR',
+            'XML',
+          ].includes(token.text)
+            ? TokenType.RESERVED_DATA_TYPE
+            : TokenType.RESERVED_KEYWORD,
+      };


@nene I'm not sure about this approach - I would rather get the list of data types from the dialect.

But since disambiguateTokens() is used in the parser, it seems like passing in dialect information would not be desirable.

So instead I went for a list of commonly-overlapping function + data types - these will only be classified as TokenType.RESERVED_DATA_TYPE if they are *also* a function in the dialect.

Yeah, this is definitely problematic.

The root problem seems to be that these data types are also defined as functions, which previously was only done because we formatted those data-types like functions e.g. DECIMAL(5, 2). Now that we have data types defined separately, there's really no reason to include them among the functions any more.

another alternative approach that I thought of would be:

map each dialect to an array

import + deduplicate all function names

import + deduplicate all data type names

find the overlapping entries

merge all of the arrays and deduplicate

this would make the items more accurate

but I think you're suggesting to go in a different direction with the grammar... I don't think I fully understand what you mean yet

trying to understand what you mean here:

The root problem seems to be that these data types are also defined as functions, which previously was only done because we formatted those data-types like functions e.g. DECIMAL(5, 2). Now that we have data types defined separately, there's really no reason to include them among the functions any more.

my assumptions are:

A) sql-formatter would like to keep DECIMAL(5, 2) recognized as a function, to allow for formatting of the function arguments and parentheses
B) the conversion of the DECIMAL part of this to a data type is still necessary

so I'm unsure of what you mean to suggest here

2 hypotheses of what you mean:

X) it sounds like maybe you're saying that we could remove the funcNameToKeyword function altogether from disambiguateTokens. just as a note, if this is the case, I did try going down this path, but it caused problems (although I'm probably not well-versed enough with the code in order to know if these problems are easy to resolve)

Y) another way of interpreting your comment leads me to believe you're suggesting removing DECIMAL (and anything else that is a data type) from *.functions.ts - but I think that also would be problematic...?

Yes, I would indeed suggest this last option:

Y) another way of interpreting your comment leads me to believe you're suggesting removing DECIMAL (and anything else that is a data type) from *.functions.ts - but I think that also would be problematic...?

Simply removing them will of course result in DECIMAL(1, 2) being formatted as DECIMAL (1, 2) as the formatter no more recognizes as formats it as a function. What should be done, is also changing the parser so it recognizes both <func-name> <parenthesis> and <data-type> <parenthesis> as function calls. Or alternatively we could also introduce a separate node like ParameterizedDataType for the latter.

I'm happy to look into this by myself as I'm now finally back at my computer, and I have a hunch that there are some quirks of the parser that will also be getting in the way of this.

Ok thanks for picking this one up!

One thing to keep in mind and decide on is what the behavior of conflicting options would be in that DECIMAL(1, 2) case, eg:

{ functionCase: 'lower', dataTypeCase: 'upper', }

Example query

SELECT DeCimal(1, 2) AS dec, noW() AS n

I'm guessing it would be this:

SELECT DECIMAL(1, 2) AS dec, now() AS n

If this is true, then I think the docs for dataTypeCase (and maybe also functionCase) need to be updated too

Yep. Definitely like so. Treating DECIMAL() as a function is an implementation detail of the formatter. It would very much confuse users when dataTypeCase doesn't affect parameterized data types.

nene · 2023-12-06T11:09:04Z

test/features/createTable.ts

+  if (!cfg.dialectDoesntHaveVarchar) {
+    it('formats short CREATE TABLE with lowercase data types', () => {
+      expect(
+        format('CREATE TABLE tbl (a INT PRIMARY KEY, b VARCHAR);', {
+          dataTypeCase: 'lower',
+        })
+      ).toBe(dedent`
+      CREATE TABLE
+        tbl (a int PRIMARY KEY, b varchar);
+    `);
+    });
+  }


This test looks completely unnecessary. The lowercasing of data types is already tested by the supportsDataTypeCase() test.

It's also problematic to execute this test only for dialects that support VARCHAR - it's not like lowercasing of data types shouldn't be supported when dialect has no VARCHAR support.

nene · 2023-12-06T19:42:14Z

FYI @karlhorky, created a PR with some initial progress: #689

karlhorky · 2023-12-07T12:51:02Z

@nene Thanks for reviewing, taking over in #689 and merging! 🙌

karlhorky mentioned this pull request Nov 27, 2023

Upper case option affect table fields names #156

Closed

karlhorky changed the title ~~Add experimental datatypeFormat option for changing data type case~~ Add experimental datatypeCase option for changing data type case Nov 27, 2023

karlhorky changed the title ~~Add experimental datatypeCase option for changing data type case~~ Add experimental dataTypeCase option for changing data type case Nov 28, 2023

karlhorky changed the title ~~Add experimental dataTypeCase option for changing data type case~~ Add experimental dataTypeCase and functionCase options Nov 28, 2023

karlhorky mentioned this pull request Nov 29, 2023

Switch from flatKeywordList() to flat arrays #676

Merged

karlhorky mentioned this pull request Nov 29, 2023

Separate data types from generic keywords #677

Merged

karlhorky added 6 commits November 29, 2023 18:19

Identify PostgreSQL data types, add docs

bdf3be9

Add docs for functionCase, rename to dataTypeCase

c6fb0b6

Add caveat about dataTypeCase language support

2ba9d0a

Move more data types to separate constant

10b4451

Add TokenType.RESERVED_DATA_TYPE and dataTypeCase

a555575

Merge branch 'separate-data-types' of github.com:karlhorky/sql-format…

129ba96

…ter into add-datatypeformat-option

karlhorky added 2 commits November 30, 2023 18:14

Add functionCase option, make keywordCase default

4ae6c48

Improve docs

f8df2f6

karlhorky marked this pull request as ready for review November 30, 2023 17:22

karlhorky added 3 commits December 1, 2023 14:47

Merge branch 'separate-data-types' of github.com:karlhorky/sql-format…

d57af57

…ter into add-datatypeformat-option

Switch to reservedDataTypes

9d4c929

Revert to independent defaults

4f46845

karlhorky added 2 commits December 1, 2023 17:33

Merge branch 'sql-formatter-org:master' into add-datatypeformat-option

bf17e85

Address ESLint no-nested-ternary rule error

9815a46

karlhorky added 3 commits December 1, 2023 18:09

Improve grammar and wording

fb60928

Add tests for casing ARRAY data types

66d596e

Add tests for dataTypeCase and functionCase

8d81244

Improve docs

a4e4070

karlhorky mentioned this pull request Dec 2, 2023

prettier-plugin-sql: Allow dialect as alternative to language un-ts/prettier#322

Closed

karlhorky mentioned this pull request Dec 3, 2023

sqlDataTypeCase and sqlFunctionCase options? nene/prettier-plugin-sql-cst#13

Open

Add NUMERIC

4ee4965

karlhorky commented Dec 3, 2023

View reviewed changes

karlhorky mentioned this pull request Dec 5, 2023

Allow uppercasing of function names separately from keywords #237

Closed

nene reviewed Dec 6, 2023

View reviewed changes

nene mentioned this pull request Dec 6, 2023

Add dataTypeCase & functionCase #689

Merged

nene merged commit 4ee4965 into sql-formatter-org:master Dec 7, 2023
2 checks passed

karlhorky deleted the add-datatypeformat-option branch December 7, 2023 12:51

karlhorky mentioned this pull request Dec 8, 2023

prettier-plugin-sql: Upgrade to sql-formatter@15 un-ts/prettier#326

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add experimental `dataTypeCase` and `functionCase` options #673

Add experimental `dataTypeCase` and `functionCase` options #673

karlhorky commented Nov 27, 2023 •

edited

Loading

nene commented Nov 27, 2023

karlhorky commented Nov 28, 2023 •

edited

Loading

nene commented Nov 28, 2023

karlhorky commented Nov 29, 2023 •

edited

Loading

karlhorky commented Nov 30, 2023

karlhorky commented Nov 30, 2023 •

edited

Loading

nene commented Dec 1, 2023

codesandbox-ci bot commented Dec 1, 2023 •

edited

Loading

karlhorky commented Dec 2, 2023 •

edited

Loading

karlhorky commented Dec 2, 2023 •

edited

Loading

karlhorky Dec 3, 2023

nene Dec 6, 2023

karlhorky Dec 6, 2023

karlhorky Dec 6, 2023 •

edited

Loading

nene Dec 6, 2023

karlhorky Dec 6, 2023 •

edited

Loading

nene Dec 6, 2023

nene Dec 6, 2023

nene commented Dec 6, 2023

karlhorky commented Dec 7, 2023

Add experimental dataTypeCase and functionCase options #673

Add experimental dataTypeCase and functionCase options #673

Conversation

karlhorky commented Nov 27, 2023 • edited Loading

dataTypeCase (experimental)

Options

preserve

upper

lower

functionCase (experimental)

Options

preserve

upper

lower

nene commented Nov 27, 2023

karlhorky commented Nov 28, 2023 • edited Loading

nene commented Nov 28, 2023

karlhorky commented Nov 29, 2023 • edited Loading

karlhorky commented Nov 30, 2023

karlhorky commented Nov 30, 2023 • edited Loading

nene commented Dec 1, 2023

codesandbox-ci bot commented Dec 1, 2023 • edited Loading

karlhorky commented Dec 2, 2023 • edited Loading

karlhorky commented Dec 2, 2023 • edited Loading

karlhorky Dec 3, 2023

Choose a reason for hiding this comment

nene Dec 6, 2023

Choose a reason for hiding this comment

karlhorky Dec 6, 2023

Choose a reason for hiding this comment

karlhorky Dec 6, 2023 • edited Loading

Choose a reason for hiding this comment

nene Dec 6, 2023

Choose a reason for hiding this comment

karlhorky Dec 6, 2023 • edited Loading

Choose a reason for hiding this comment

nene Dec 6, 2023

Choose a reason for hiding this comment

nene Dec 6, 2023

Choose a reason for hiding this comment

nene commented Dec 6, 2023

karlhorky commented Dec 7, 2023

Add experimental `dataTypeCase` and `functionCase` options #673

Add experimental `dataTypeCase` and `functionCase` options #673

karlhorky commented Nov 27, 2023 •

edited

Loading

karlhorky commented Nov 28, 2023 •

edited

Loading

karlhorky commented Nov 29, 2023 •

edited

Loading

karlhorky commented Nov 30, 2023 •

edited

Loading

codesandbox-ci bot commented Dec 1, 2023 •

edited

Loading

karlhorky commented Dec 2, 2023 •

edited

Loading

karlhorky commented Dec 2, 2023 •

edited

Loading

karlhorky Dec 6, 2023 •

edited

Loading

karlhorky Dec 6, 2023 •

edited

Loading