
Fix Connect chat validation kicks on Velocity and Spigot #26

Merged
robinbraemer merged 12 commits into connect from fix/velocity-connect-chat-signing on May 12, 2026
Conversation

robinbraemer (Member) commented May 12, 2026

Summary

  • Adds Connect-tunnel-only chat signing mitigations for both Velocity and Spigot/Paper.
  • Velocity: drops play-state chat session updates and decoded chat acknowledgements on local Connect tunnel channels, and rewrites impossible pre-epoch signed command packets to unsigned command packets.
  • Spigot/Paper: drops chat session updates and chat acknowledgements on local Connect tunnel channels, rewrites signed commands to unsigned commands, and normalizes chat packet last-seen data so Paper treats tunneled chat as unsigned.

Behavior / tradeoff

This intentionally downgrades chat and commands for Connect-tunneled Java connections to unsigned state. It restores playability for stacked proxy/server topologies where another layer already owns chat session, acknowledgement, last-seen, and timestamp state.

Preserving signed chat through this path would require a larger protocol design where exactly one layer owns and rewrites the full signed-chat state machine end to end.

Test plan

  • Local: Spigot compile plus core and Velocity tests.
  • Local: Velocity shadow jar build and whitespace check.
  • Manual Velocity live test on shaky-debor.play.minekube.net: chat and /help no longer kick; Paper logs chat as [Not Secure]; /browser transfer disconnect remains expected.
  • Manual Spigot/Paper live test on shaky-debor.play.minekube.net: join, repeated chat, /help, /plugins, /about, /?, and /bukkit:? all accepted; chat logs as [Not Secure]; no delayed Chat message validation failure kick after the previous failure window.

forEach(this::getUsernameFromSource) discards the return value, so the
returned collection was always empty. ProfileAudienceParser depends on
this list for tab-completion and player lookup, silently breaking both.

Diagnosis credit to ChannyAnh.
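A minimal sketch of the bug shape and its fix; `UsernameCollector`, `getUsernameFromSource`, and the `String` player type are stand-ins for the project's actual API, not its real signatures:

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class UsernameCollector {
    // Placeholder for the real per-player username lookup.
    static String getUsernameFromSource(String player) {
        return player.toLowerCase();
    }

    // Buggy shape: forEach is a terminal operation that discards each
    // mapped value, so nothing is ever collected.
    static List<String> buggy(List<String> players) {
        players.forEach(UsernameCollector::getUsernameFromSource);
        return Collections.emptyList(); // what the caller effectively saw
    }

    // Fixed shape: map each source to a username and collect the results.
    static List<String> fixed(List<String> players) {
        return players.stream()
                .map(UsernameCollector::getUsernameFromSource)
                .collect(Collectors.toList());
    }
}
```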
ReflectionUtils.getCastedValue(bytes, DATA) can return null in two cases:
the DATA field is missing (e.g. after Okio relocation/shading), or
field.get() throws IllegalAccessException at runtime, which
ReflectionUtils.getValue swallows. Either way, null then flowed into
handler.onReceive(null) → Unpooled.wrappedBuffer(null) → NPE, which
crashes the OkHttp dispatcher thread and drops every tunnel sharing it.

Fall back to the safe copying path (bytes.toByteArray()) when the
zero-copy reflection is unavailable.

Diagnosis credit to ChannyAnh.
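The fallback shape, sketched without Okio: `ZeroCopyFallback` and the simulated reflective accessor are illustrative, with `backing.clone()` standing in for `bytes.toByteArray()`:

```java
public class ZeroCopyFallback {
    // Simulates ReflectionUtils.getCastedValue(bytes, DATA): may return
    // null when the field is missing (relocation/shading) or when access
    // fails at runtime and the exception is swallowed.
    static byte[] reflectData(byte[] backing, boolean reflectionAvailable) {
        return reflectionAvailable ? backing : null;
    }

    // Prefer the zero-copy reflected array, but never let null reach the
    // handler; fall back to a safe copy instead.
    static byte[] payload(byte[] backing, boolean reflectionAvailable) {
        byte[] data = reflectData(backing, reflectionAvailable);
        if (data != null) {
            return data;          // zero-copy fast path
        }
        return backing.clone();   // stands in for bytes.toByteArray()
    }
}
```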
…ct Timer

java.util.Timer spawns one OS thread per instance. WatcherRegister
created a fresh Timer on every WatcherImpl.startResetBackOffTimer() call
and never released the prior one (cancel() ends scheduled tasks but
relies on the timer thread observing the cancel). Under reconnect storms
this leaked daemon threads, adding GC pressure and stop-the-world pauses
visible as server-wide ping spikes.

Replace both Timer fields (retry + reset-backoff) with a single
single-thread ScheduledExecutorService keyed by daemon thread, and
cancel ScheduledFutures instead of cancelling Timers.

Diagnosis credit to ChannyAnh.
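A sketch of the replacement, under assumed names (`BackoffScheduler`, `scheduleResetBackoff`): one shared single-thread daemon scheduler instead of a fresh `java.util.Timer` (and thus a fresh OS thread) per call, with `ScheduledFuture.cancel` replacing `Timer.cancel`:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class BackoffScheduler {
    // One daemon thread for all retry/reset-backoff tasks, however many
    // reconnects happen.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "connect-watcher-scheduler");
                t.setDaemon(true); // never blocks JVM shutdown
                return t;
            });
    private ScheduledFuture<?> resetBackoffTask;

    // Cancel the pending future instead of cancelling (and leaking) a
    // whole Timer; no thread is created per call.
    synchronized ScheduledFuture<?> scheduleResetBackoff(Runnable task, long delayMs) {
        if (resetBackoffTask != null) {
            resetBackoffTask.cancel(false);
        }
        resetBackoffTask = scheduler.schedule(task, delayMs, TimeUnit.MILLISECONDS);
        return resetBackoffTask;
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```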
newSingleThreadExecutor() serialized every async HTTP call (skin lookups,
health checks, endpoint-name lookups). A slow Mojang skin response would
back up the queue and delay anything else. The executor also used
non-daemon threads, blocking JVM shutdown.

Replace with newFixedThreadPool(4) of named daemon threads.

Diagnosis credit to ChannyAnh.
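A sketch of the pool construction; the `connect-http-worker` name prefix follows the test description later in this PR, and `HttpExecutor` is an illustrative class name:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class HttpExecutor {
    static ExecutorService create() {
        AtomicInteger id = new AtomicInteger();
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, "connect-http-worker-" + id.incrementAndGet());
            t.setDaemon(true); // do not block JVM shutdown
            return t;
        };
        // Four workers: one slow Mojang skin response no longer queues
        // health checks and endpoint-name lookups behind it.
        return Executors.newFixedThreadPool(4, factory);
    }
}
```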
CommonPlatformInjector.injectedClients/addons and PacketHandlersImpl's
three maps/sets are read on Netty I/O threads (per-channel) and mutated
from registration calls that can originate on any thread. HashSet/HashMap
under concurrent access produces ConcurrentModificationException, which
Netty propagates to the uncaught-exception handler and closes the
channel — visible as random disconnects under load.

Swap injectedClients to a ConcurrentHashMap-backed set, addons to a
ConcurrentHashMap, and the three registries in PacketHandlersImpl to
ConcurrentHashMap / CopyOnWriteArraySet / CopyOnWriteArrayList.

Diagnosis credit to ChannyAnh.
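The collection swap, sketched with placeholder element types (the real registries hold client/addon/handler objects):

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CopyOnWriteArraySet;

public class InjectorRegistries {
    // Before: HashSet/HashMap — iteration on Netty I/O threads raced with
    // registration from arbitrary threads, throwing
    // ConcurrentModificationException and closing the channel.
    // After: concurrent structures whose iterators tolerate mutation.
    final Set<Object> injectedClients =
            Collections.newSetFromMap(new ConcurrentHashMap<>());
    final Map<String, Object> addons = new ConcurrentHashMap<>();
    final Set<Object> globalHandlers = new CopyOnWriteArraySet<>();
    final List<Object> orderedHandlers = new CopyOnWriteArrayList<>();
}
```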
ChannelIn/OutPacketHandler called !res.equals(msg) to detect whether a
handler had replaced the packet. The PacketHandler contract is "return
the same object or a new one", so reference identity is the correct
semantics; the check becomes res != msg. Using .equals() also risks an
expensive deep field comparison on every inbound/outbound packet if the
underlying packet types ever override equals.

Diagnosis credit to ChannyAnh. (ChannelOutPacketHandler change inferred
from the bug description — the same line existed verbatim.)
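A minimal illustration of why identity is the right test; `Packet` is a hypothetical value type with an overridden `equals`, not one of the project's packet classes:

```java
public class PacketReplaceCheck {
    static final class Packet {
        final int id;
        Packet(int id) { this.id = id; }
        // Value-based equals: two distinct replacement objects can
        // still compare equal.
        @Override public boolean equals(Object o) {
            return o instanceof Packet && ((Packet) o).id == id;
        }
        @Override public int hashCode() { return id; }
    }

    // Contract: a handler returns the same object or a new one, so
    // reference identity (!=) detects replacement; equals() would miss a
    // replacement that happens to compare equal, and costs a field-by-field
    // comparison per packet.
    static boolean wasReplaced(Packet original, Packet returned) {
        return returned != original;
    }
}
```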
writeAndFlush on every incoming packet issued one kernel flush syscall
per packet. With normal game traffic (100-500 packets/s per session)
this is the dominant CPU cost on the receive path and shows up as ping
spikes under burst load.

Schedule each write on the channel's EventLoop with voidPromise() to
avoid ChannelFuture allocation, and coalesce all writes within an
EventLoop tick into a single flush guarded by an AtomicBoolean.

Diagnosis credit to ChannyAnh.
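A Netty-free sketch of the CAS-based coalescing; `FlushCoalescer` and its counters are illustrative, with the `Executor` standing in for the channel's EventLoop and the counter increments standing in for `ch.write(msg, ch.voidPromise())` and `ch.flush()`:

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class FlushCoalescer {
    private final AtomicBoolean flushScheduled = new AtomicBoolean();
    final AtomicInteger writes = new AtomicInteger();
    final AtomicInteger flushes = new AtomicInteger();
    private final Executor eventLoop;

    FlushCoalescer(Executor eventLoop) {
        this.eventLoop = eventLoop;
    }

    // Called once per incoming packet on the EventLoop thread.
    void write(Object packet) {
        writes.incrementAndGet(); // stands in for ch.write(packet, ch.voidPromise())
        // Only the first write in a tick wins the CAS and schedules the
        // flush task; later writes in the same tick piggyback on it, so a
        // burst costs one flush syscall instead of one per packet.
        if (flushScheduled.compareAndSet(false, true)) {
            eventLoop.execute(() -> {
                flushScheduled.set(false);
                flushes.incrementAndGet(); // stands in for ch.flush()
            });
        }
    }
}
```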
CONNECTION_TIMEOUT governs the Netty LocalChannel bootstrap, which is
an in-process connection that should complete in microseconds. The
original 30s held an EventLoop slot for half a minute on every failed
attempt; under congestion this starved healthy sessions on the same
EventLoop and surfaced as ping spikes. 10s is still well above any
plausible legitimate latency.

Diagnosis credit to ChannyAnh.
Bootstraps the test suite for the core module (previously had zero tests)
and adds coverage for each of the 8 fixes in this branch:

- CommandUtilTest — verifies getOnlineUsernames returns one entry per
  online player (was always empty).
- TunnelerTest — drives a real OkHttp WebSocket via MockWebServer; verifies
  the null-bytes fallback by forcing the reflected ByteString.data field
  to null and asserting the safe bytes.toByteArray() path delivers the
  correct payload.
- WatcherRegisterTest — verifies the lazy scheduler lifecycle: never
  created at field init, recreated on start/stop reuse, late-callback
  schedule sites tolerate a null scheduler, and Timer fields are gone.
- HttpUtilsTest — verifies the bounded daemon pool: at least 4 concurrent
  tasks, threads named connect-http-worker, all daemons.
- PacketHandlersImplTest — verifies register/deregister/registerAll
  including the null-key contract for global handlers and the atomic
  computeIfPresent guard against concurrent-register-during-deregister.
- CommonPlatformInjectorTest — concurrent add/remove of injected clients
  with parallel iteration; addon map mutation while iterated.
- ChannelIn/OutPacketHandlerTest — verifies the != identity check
  forwards new packet references that .equals() the original.
- TunnelHandlerTest — uses a Mockito Channel with a real DefaultEventLoop
  to faithfully exercise the CAS-based flush coalescing (EmbeddedChannel
  re-enters the task queue mid-write and can't be used here); verifies
  one flush per burst and FLUSH-before-CLOSE ordering.

Test deps pinned to versions still compatible with the Java 8 source
target: JUnit Jupiter 5.10.5, Mockito 4.11.0, Awaitility 4.2.2.
robinbraemer force-pushed the fix/velocity-connect-chat-signing branch from 701a22c to 7ee3820 on May 12, 2026 15:40
robinbraemer changed the title from "Fix Velocity Connect out-of-order chat kicks" to "Fix Connect chat validation kicks on Velocity and Spigot" on May 12, 2026
robinbraemer changed the base branch from fix/perf-and-stability-channyanh to connect on May 12, 2026 16:08
robinbraemer merged commit 7df3100 into connect on May 12, 2026
1 check passed