Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle case/quoting when replacing names #34

Closed
Tracked by #36911
tsmacdonald opened this issue May 15, 2024 · 7 comments
Closed
Tracked by #36911

Handle case/quoting when replacing names #34

tsmacdonald opened this issue May 15, 2024 · 7 comments

Comments

@tsmacdonald
Copy link
Member

Similar to the work done in metabase/metabase#41781

When we call replace-names, we should handle case and quoting correctly. Currently it depends on an exact string match.

@crisptrutski
Copy link
Contributor

Recovering the original formatting I guess is the interesting part, assuming we want that.

We'd only go up to a point here, something like:

SELECT * FROM cats
SELECT * FROM CATS
SELECT * FROM CaTs

s/cats/dogs

SELECT * FROM dogs
SELECT * FROM DOGS
SELECT * FROM DOGS

@crisptrutski
Copy link
Contributor

For a v1 perhaps even the following is fine:

SELECT * FROM cats
SELECT * FROM CATS
SELECT * FROM CaTs

s/cats/dogs

SELECT * FROM dogs
SELECT * FROM dogs
SELECT * FROM dogs

@crisptrutski
Copy link
Contributor

It seems like whether we want to be case sensitive is not only driver specific, it can get pretty nuanced and difficult for us to determine for some of them.

According to ClaudeAI:

MySQL:

In MySQL, table and column names are case-sensitive on Unix-based systems (Linux, macOS) by default.
On Windows, table and column names are case-insensitive by default.
However, if the lower_case_table_names system variable is set to 1, table and column names become case-insensitive on all platforms.

PostgreSQL:

In PostgreSQL, table and column names are case-sensitive by default.
If you want case-insensitive behavior, you need to use double quotes around the identifiers when creating tables or referring to them in queries.

Microsoft SQL Server:

In SQL Server, table and column names are case-insensitive by default.
The database collation setting determines the case sensitivity behavior.
Case-sensitive collations can be specified, but they are not the default.

Oracle:

In Oracle, table and column names are case-insensitive by default.
Oracle stores table and column names in uppercase by default, but you can refer to them in any case in your queries.

SQLite:

In SQLite, table and column names are case-insensitive by default.
However, if you use double quotes around the identifiers, they become case-sensitive.

For Macaw we can kick this can of worms down the road by taking a flag, and for Metabase we can probably get by for a start by using the default for each driver, and not bothering to detect system variables, connection settings, or handling double quotes.

@tsmacdonald
Copy link
Member Author

That's subtly wrong for Postgres. Quotes preserve case, otherwise things are normalized to lowercase:

# create table quux("CaseSensitive" int, CaseInsensitive text);
CREATE TABLE
french_towns=# \d quux 
                    Table "public.quux"
     Column      |  Type   | Collation | Nullable | Default 
-----------------+---------+-----------+----------+---------
 CaseSensitive   | integer |           |          | 
 caseinsensitive | text    |           |          | 

@piranha believes that behavior is similar for other DBs we care about but I haven't personally looked into it exhaustively

@piranha
Copy link
Contributor

piranha commented May 23, 2024

Except for mysql, where it doesn't care about casing at all? But for me the only place I had to handle this is tests, actual code stayed the same. This actually depends on how db was created (which collation it uses).

@crisptrutski
Copy link
Contributor

I've realized that an interesting wrinkle here is that whether the replacement needs to be quoted depends on the replacement name only, not whether the original name was quoted. In order to preserve styling however, it would also be good to preserve quotes when they were present but not necessary on the original.

Whether we should remove them when renaming from a name that requires quoting to one that does not, i.e. whether the user likes to quote regardless, like the query processor, may be indeterminable from a small query without other references.

To keep things simple for now, we'll only quote when necessary, which saves us carrying over the initial state, but also take an optional :always-quote? flag which can override it. This could get determined by a Metabase setting in future. If we decide in future to try and infer this for a given user or query, that could be analyzed before calling rename and just use this interface too.

@crisptrutski
Copy link
Contributor

It turns out that knowing when to quote is even more complicated - in most schema elements are still case sensitive, it's just that non quoted identifiers are treated as implicitly uppercase or lowercase (database and config dependent). So for one database an identifier references a lowercase field needs to be quoted, and for another database it does not. To make matters worse that case sensitivity can even be a property of the table itself that the field is a part of - the collation defined on it in the database.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants