WIP: workspace=docker: fix utf8mb4 default collation for 5.x arm64 edge case

*** this is a work-in-progress commit, and will be rebased/amended soon to
*** add test coverage.

Background:

When using workspace=docker on an arm64 system (such as Apple Silicon or AWS
Graviton) along with flavor=mysql:5.7 (or any 5.x), Skeema uses a mysql:8.0
image instead, because arm64 builds for 5.x are not available on DockerHub.

Previously, this situation was problematic when CREATE statements referenced
the utf8mb4 character set without an explicit collation clause, because the
default collation for this charset changed between MySQL 5.x and 8.x. Worse
still, the default in 8.x does not even exist in 5.x (except in Aurora 5.7,
where AWS backported it).

The result would be that the workspace would introspect affected objects as
using utf8mb4_0900_ai_ci, while the target database used utf8mb4_general_ci;
so DDL would be emitted which attempts to change columns from one to the
other. This DDL was invalid in mainline 5.x, or valid but unexpected in
Aurora 5.7.
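
The mismatch described above can be sketched in a few lines of Go. This is only an illustration of the problem, not Skeema's introspection or diff logic: the helper names defaultUTF8MB4Collation and collationMismatch are hypothetical, and the sketch considers only mainline MySQL default collations.

```go
package main

import "fmt"

// defaultUTF8MB4Collation returns the server's default collation for the
// utf8mb4 character set, keyed by MySQL major version.
// Hypothetical helper for illustration only; not part of Skeema.
func defaultUTF8MB4Collation(majorVersion int) string {
	if majorVersion >= 8 {
		return "utf8mb4_0900_ai_ci"
	}
	return "utf8mb4_general_ci"
}

// collationMismatch compares the collation that the target database assigns
// to a utf8mb4 column lacking an explicit collation clause against the one
// assigned by the workspace server.
func collationMismatch(workspaceMajor, targetMajor int) (from, to string, mismatch bool) {
	from = defaultUTF8MB4Collation(targetMajor)
	to = defaultUTF8MB4Collation(workspaceMajor)
	return from, to, from != to
}

func main() {
	// The workspace runs the substituted mysql:8.0 image, while the target
	// database is MySQL 5.7.
	if from, to, mismatch := collationMismatch(8, 5); mismatch {
		fmt.Printf("spurious diff: %s -> %s\n", from, to)
	}
}
```

When both sides run the same major version, the two defaults agree and no such diff arises.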

Solution in this commit:

When substituting 8.0 for 5.x on arm64, we now force Skeema's connections to
use SET SESSION default_collation_for_utf8mb4=utf8mb4_general_ci. This causes
the server to apply the old 5.x default collation whenever utf8mb4 is used
without an explicit collation clause, rather than 8.0's default.
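
As a rough sketch of how the override can be attached to connections, the snippet below mirrors the string handling in this commit's diff. It assumes the connection parameters are kept as a URL-query-style string whose entries are applied as session variables on connect; addSessionVar is a hypothetical helper name, not a function in Skeema.

```go
package main

import (
	"fmt"
	"strings"
)

// addSessionVar appends name=value to a URL-query-style connection parameter
// string. Hypothetical helper for illustration; the commit performs the same
// two steps inline on ld.defaultConnParams.
func addSessionVar(params, name, value string) string {
	params += "&" + name + "=" + value
	// If params was empty beforehand, drop the leading separator.
	return strings.TrimPrefix(params, "&")
}

func main() {
	const name = "default_collation_for_utf8mb4"
	const value = "utf8mb4_general_ci"

	fmt.Println(addSessionVar("", name, value))
	// prints: default_collation_for_utf8mb4=utf8mb4_general_ci

	fmt.Println(addSessionVar("timeout=5s", name, value))
	// prints: timeout=5s&default_collation_for_utf8mb4=utf8mb4_general_ci
}
```

Appending first and trimming afterwards avoids a separate branch for the empty-string case.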

The server introduced this session variable only for its own logical
replication purposes (under similar motivations to our own), and warns users
against setting it. Indeed, this variable cannot be set in a server option
file, nor does it PERSIST properly. However, at the session level it works as
expected in 8.0, and we don't ever need to set it in 8.1+, so this solution
appears to be safe despite any server-side deprecation plans in a future
release series.

To be clear, this code path only affects workspace=docker, and only with
flavor=mysql:5.x while using an arm64 CPU. This use case is already somewhat
uncommon, and will become increasingly rare as more users upgrade to MySQL 8
over time.
evanelias committed Mar 29, 2024
1 parent da0e8e1 commit 627cbcf
Showing 1 changed file with 12 additions and 0 deletions.
12 changes: 12 additions & 0 deletions internal/workspace/localdocker.go
@@ -77,6 +77,18 @@ func NewLocalDocker(opts Options) (_ *LocalDocker, retErr error) {
	if err != nil {
		log.Warn(err.Error() + ". Substituting mysql:8.0 instead for workspace purposes, which may cause behavior differences.")
		image = "mysql:8.0"

		// If the original requested flavor was MySQL 5.x, force session-level
		// default_collation_for_utf8mb4=utf8mb4_general_ci so that any usage of
		// utf8mb4 without an explicit collation clause will behave like it did in
		// 5.x. The MySQL Manual warns against setting this, but it works successfully
		// at the session level in all versions of 8.0; and our motivation here is
		// conceptually similar to the logical replication use-case that this variable
		// was introduced to handle.
		if opts.Flavor.IsMySQL(5) {
			ld.defaultConnParams += "&default_collation_for_utf8mb4=utf8mb4_general_ci"
			ld.defaultConnParams = strings.TrimPrefix(ld.defaultConnParams, "&")
		}
	}
	if opts.ContainerName == "" {
		opts.ContainerName = "skeema-" + tengo.ContainerNameForImage(image)