
extremely slow schema updates after MUPdate #10475

@davepacheco


On a customer system, after a MUPdate, we found that the normal schema migration that Nexus performs was proceeding very slowly. In fact, we initially thought Nexus hadn't come up at all, because:

$ omdb nexus update-status
note: Nexus URL not specified.  Will pick one from DNS.
note: using DNS from system config (typically /etc/resolv.conf)
note: (if this is not right, use --dns-server to specify an alternate DNS server)
warning: failed to connect to NexusLockstep at ...:12232: connect "...:12232": Connection refused (os error 146)
warning: failed to connect to NexusLockstep at ...:12232: connect "...:12232": Connection refused (os error 146)
warning: failed to connect to NexusLockstep at ...:12232: connect "...:12232": Connection refused (os error 146)
Error: failed to connect to any instances of NexusLockstep

We looked at it for a while before realizing that Nexus was making forward progress, just very slowly. Nexus completes schema migrations before it opens its listening sockets, so clients get "Connection refused" until the migration finishes. That is why we got the errors above.
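The startup ordering described above can be illustrated with a small, self-contained Python sketch. It uses no Nexus or omdb code: `run_server`, `probe`, and the `migration_done` event that stands in for a long-running schema migration are all hypothetical. The point is only that while the migration is still running, nothing is bound to the port, so the kernel actively refuses client connections. The client therefore sees "Connection refused" rather than a hang or an application-level error.

```python
import socket
import threading

HOST = "127.0.0.1"


def free_port() -> int:
    """Ask the OS for an unused TCP port, then release it for the server."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((HOST, 0))
        return s.getsockname()[1]


def run_server(port: int, migrate, listening: threading.Event) -> None:
    """Hypothetical Nexus-like startup: migrate the schema first, then listen."""
    migrate()  # until this returns, nothing is bound to `port`
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((HOST, port))
        srv.listen()
        listening.set()
        conn, _ = srv.accept()  # serve a single client, then exit
        conn.close()


def probe(port: int) -> str:
    """Report what a client (e.g. a status tool) sees when it connects."""
    try:
        with socket.create_connection((HOST, port), timeout=2):
            return "connected"
    except ConnectionRefusedError:
        return "refused"


port = free_port()
migration_done = threading.Event()  # hypothetical stand-in for a slow schema migration
listening = threading.Event()
threading.Thread(
    target=run_server,
    args=(port, migration_done.wait, listening),
    daemon=True,
).start()

before = probe(port)  # migration still "running": no listener yet
migration_done.set()  # let the migration finish
listening.wait(timeout=5)
after = probe(port)  # the listener is up now
print(before, after)
```

In this sketch, a refused connection does not tell you whether the server process is dead or simply still migrating. Distinguishing those cases requires another signal, such as logs or database activity.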

For reference:

  • I believe this was a MUPdate from R18 to R19.
  • The rack was parked for MUPdate around 2026-04-30T16:00:00Z, or 9am PT.

We were able to tell that schema upgrade steps were happening very slowly, but it wasn't clear why. We looked at several sources of data:

  • Nexus logs
  • output from a DTrace script showing queries being executed by Nexus nodes
  • the jobs table reported by CockroachDB
  • CockroachDB logs

More details coming.
