Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

linter for non-memory-comparable primary keys #204

Closed
morgo opened this issue Mar 1, 2023 · 3 comments · Fixed by #205
Closed

linter for non-memory-comparable primary keys #204

morgo opened this issue Mar 1, 2023 · 3 comments · Fixed by #205

Comments

@morgo
Copy link
Contributor

morgo commented Mar 1, 2023

I would like to propose a new linter, for which I am happy to provide a PR for. The linter warns when the PRIMARY KEY includes a column that is non-memory comparable. i.e.

The following types are acceptable:

  • INT/BIGINT/VARBINARY/BLOB/DECIMAL

The following are not:

  • VARCHAR/TEXT/FLOAT/JSON

The use-case is two fold:

  • You can't safely store a distinct set of key's in an in-memory map (i.e. 'A' != 'a' in the map, but in MySQL these could be treated equal).
  • Because of collations, you can't safely range on a key. So its not possible to know if key 'X' is above key 'Y'.

This might sound obscure, but I am looking to adopt skeema for this use-case :P We are developing a schema-change tool that has optimizations that depends on memory-comparable keys. When the key is non-memory-comparable, we currently detect it - but it means we need to fall back to a slower schema change method. So being able to lint it at the user level is a win for us.

@evanelias
Copy link
Contributor

Thank you for the feature request! I'd accept a PR for this if it can be made more generic -- I'd envision it as an option that allows you to configure which data types are permissible for use in primary keys. I believe that should still fulfil your use-case (assuming it's only checking column types and not any other aspect of the column?) but still permit companies with similar-but-slightly-different PK limitations to leverage it.

The linter package has some helpers for making these "allow lists" easy to code, see e.g. the similar data type checking use-case in allow-auto-inc / lint-auto-inc via check_auto_inc.go. This makes use of the helpers Rule.RelatedListOption and Options.IsAllowed.

In terms of option naming and functionality, I can see at least two possible approaches:

a) Enhance the existing lint-pk to make data type checking part of it. Add new related option allow-pk which allows you to supply a comma-separated list of acceptable PK data types. However, since lint-pk defaults to being enabled at warning level, the default for allow-pk would need to be very permissive... either explicitly listing every semi-reasonable PK data type, or treating an empty list as allowing anything, or implementing a special "all" value which is fully permissive.

or

b) Make this a separate new linter check, lint-pk-type, which defaults to disabled (ignore). New companion option allow-pk-type accepts a comma-separated list of acceptable PK data types. Since this is a new rule which defaults to being disabled, there's no need to worry about a default value for allow-pk-type / handling a special "all" value, so this implementation is slightly simpler.

That said, definitely open to discussing other approaches!

If submitting a PR, a few things to keep in mind:

  • tengo.Column.TypeInDB comes directly from information_schema.columns.column_type, which means it includes things like lengths / display widths / unsigned attribute / set or enum values / etc. This will inherently need a bit more handling than the similar logic in lint-auto-inc, since lint-auto-inc only has to deal with a limited set of column types by nature of AUTO_INCREMENT only being present on numeric types anyway.

  • We should maintain compatibility with MariaDB also, which potentially increases the number of weird column types. Off the top of my head, I believe all their additional column types (uuid, inet6, inet4) don't have sizes or other attributes to worry about in the previous bullet, but just something to think through.

  • One of the integration tests in the linter package, TestCheckSchema, automatically enables all linter checks and then runs them against a set of table definitions in linter/testdata/validcfg/*.sql. There's a .skeema file in that directory if you need to pre-configure the allow-list for PK types for testing purposes.

@morgo
Copy link
Contributor Author

morgo commented Mar 1, 2023

Thank you so much for the pointers! Okay, give me some time and I'll work on a PR.

@evanelias
Copy link
Contributor

Skeema v1.10 has been released and includes this feature (lint-pk-type and allow-pk-type). Thanks again for the pull request!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants