Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide Macaw with driver-based formatting options #43096

Closed
Tracked by #36911
crisptrutski opened this issue May 23, 2024 · 2 comments · Fixed by #43460 or #43580
Closed
Tracked by #36911

Provide Macaw with driver-based formatting options #43096

crisptrutski opened this issue May 23, 2024 · 2 comments · Fixed by #43460 or #43580
Assignees
Labels
Milestone

Comments

@crisptrutski
Copy link
Contributor

In order to correctly handle case and quoting during SQL rewrites, we need to be driver-aware.

As a first version we'll expect Metabase to pass in some relevant config, and keep Macaw itself driver agnostic.

{:case-insensitive? true
 :quotes            "[]" ;; MSSQL

Perhaps we're able to leverage existing driver methods used by the query processor, especially for the quotes. Otherwise, new multimethods with sensible defaults (thinking case sensitive, with double quotes for quotes).

In future we might find the need to have dialect specific grammars, in which case this will seem a bit redundant. We'll probably keep a fairly agnostic superset grammar as fallback for newfangled 3rd party drivers, which will still take these options though.

@crisptrutski crisptrutski added the .Team/BackendComponents also known as BEC label May 23, 2024
@crisptrutski crisptrutski self-assigned this May 23, 2024
@crisptrutski crisptrutski mentioned this issue May 23, 2024
36 tasks
@dosubot dosubot bot added the .Backend label May 23, 2024
@crisptrutski
Copy link
Contributor Author

It turns out that:

  1. Macaw as it stands can't support [] quoting. Hacking around that can happen just as well in Metabase as in Macaw, so I'd prefer to keep the library "clean" for now.
  2. Most common dialect support the same two quotes, so just handle them for everything.
  3. Whether quotes affect case sensitivity differs by dialect, so I added an extra option for that.

So the options we need are now the following:

{
  :case-insensitive? true ;; ignore case when comparing identifiers
  :quotes-preserve-case? true ;; ignore :case-insensitive? for quoted references?
}

@crisptrutski crisptrutski assigned piranha and unassigned piranha May 27, 2024
@crisptrutski
Copy link
Contributor Author

crisptrutski commented May 31, 2024

It turns out that MySQL's behavior doesn't quite fix any of the patterns we expected. There was a bit of a crossed wire when discussing this with Sanya in the related issue as well, where it turns out he was actually talking about case sensitivity when comparing the values of string typed columns.

To summarize MySQL behavior:

  • Databases and tables are case sensitive depending on the the file systems it's using, and the lower_case_table-names system variable. The default value of this variable comes from the OS the DB was created on.
  • The semantics of that system variable are quite complex, but as far as we care if the value is 0 then we know that it's running on a case insensitive FS and case is not being normalized. If it is any other value then these identifiers are not treated as case sensitive.
  • We can check this system variable by querying the connection, and its value can never change, so it's safe to cache.
  • Column names and aliases are never case sensitive, even if you quote them.

For a v1 I think we can just treat MySQL as never being case sensitive, since in practice it's a strongly discouraged practice to use non lowercase names in the database and table names.

For a v2 we could change the Macaw API to allow case sensitivity to be set on a per-identifier-type basis, and to get some of these values from the database. We could keep this per-database configuration in a in-memory cache to avoid database hits every time we analyze a query, and we could save this as a property of the database when we first connect to it.

That's quite a few moving parts for something customers are unlikely ever to run into, so I think it's fair to kick it down the road.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants