linter for non-memory-comparable primary keys #204

morgo · 2023-03-01T16:17:38Z

I would like to propose a new linter, for which I am happy to provide a PR for. The linter warns when the PRIMARY KEY includes a column that is non-memory comparable. i.e.

The following types are acceptable:

INT/BIGINT/VARBINARY/BLOB/DECIMAL

The following are not:

VARCHAR/TEXT/FLOAT/JSON

The use-case is two fold:

You can't safely store a distinct set of key's in an in-memory map (i.e. 'A' != 'a' in the map, but in MySQL these could be treated equal).
Because of collations, you can't safely range on a key. So its not possible to know if key 'X' is above key 'Y'.

This might sound obscure, but I am looking to adopt skeema for this use-case :P We are developing a schema-change tool that has optimizations that depends on memory-comparable keys. When the key is non-memory-comparable, we currently detect it - but it means we need to fall back to a slower schema change method. So being able to lint it at the user level is a win for us.

The text was updated successfully, but these errors were encountered:

evanelias · 2023-03-01T21:40:45Z

Thank you for the feature request! I'd accept a PR for this if it can be made more generic -- I'd envision it as an option that allows you to configure which data types are permissible for use in primary keys. I believe that should still fulfil your use-case (assuming it's only checking column types and not any other aspect of the column?) but still permit companies with similar-but-slightly-different PK limitations to leverage it.

The linter package has some helpers for making these "allow lists" easy to code, see e.g. the similar data type checking use-case in allow-auto-inc / lint-auto-inc via check_auto_inc.go. This makes use of the helpers Rule.RelatedListOption and Options.IsAllowed.

In terms of option naming and functionality, I can see at least two possible approaches:

a) Enhance the existing lint-pk to make data type checking part of it. Add new related option allow-pk which allows you to supply a comma-separated list of acceptable PK data types. However, since lint-pk defaults to being enabled at warning level, the default for allow-pk would need to be very permissive... either explicitly listing every semi-reasonable PK data type, or treating an empty list as allowing anything, or implementing a special "all" value which is fully permissive.

or

b) Make this a separate new linter check, lint-pk-type, which defaults to disabled (ignore). New companion option allow-pk-type accepts a comma-separated list of acceptable PK data types. Since this is a new rule which defaults to being disabled, there's no need to worry about a default value for allow-pk-type / handling a special "all" value, so this implementation is slightly simpler.

That said, definitely open to discussing other approaches!

If submitting a PR, a few things to keep in mind:

tengo.Column.TypeInDB comes directly from information_schema.columns.column_type, which means it includes things like lengths / display widths / unsigned attribute / set or enum values / etc. This will inherently need a bit more handling than the similar logic in lint-auto-inc, since lint-auto-inc only has to deal with a limited set of column types by nature of AUTO_INCREMENT only being present on numeric types anyway.
We should maintain compatibility with MariaDB also, which potentially increases the number of weird column types. Off the top of my head, I believe all their additional column types (uuid, inet6, inet4) don't have sizes or other attributes to worry about in the previous bullet, but just something to think through.
One of the integration tests in the linter package, TestCheckSchema, automatically enables all linter checks and then runs them against a set of table definitions in linter/testdata/validcfg/*.sql. There's a .skeema file in that directory if you need to pre-configure the allow-list for PK types for testing purposes.

morgo · 2023-03-01T21:52:04Z

Thank you so much for the pointers! Okay, give me some time and I'll work on a PR.

evanelias · 2023-04-26T20:18:12Z

Skeema v1.10 has been released and includes this feature (lint-pk-type and allow-pk-type). Thanks again for the pull request!

morgo added the feature request label Mar 1, 2023

morgo mentioned this issue Mar 8, 2023

Add new linter for PK type #205

Merged

evanelias closed this as completed in #205 Mar 14, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

linter for non-memory-comparable primary keys #204

linter for non-memory-comparable primary keys #204

morgo commented Mar 1, 2023

evanelias commented Mar 1, 2023

morgo commented Mar 1, 2023

evanelias commented Apr 26, 2023

linter for non-memory-comparable primary keys #204

linter for non-memory-comparable primary keys #204

Comments

morgo commented Mar 1, 2023

evanelias commented Mar 1, 2023

morgo commented Mar 1, 2023

evanelias commented Apr 26, 2023