
ZOOKEEPER-4415 + ZOOKEEPER-4912: enable TLSv1.3 and remove default cipher overrides#143

Merged
Sanju98 merged 5 commits into linkedin:branch-3.6 from Sanju98:sarvind/zk-4912-backport
May 29, 2026

@Sanju98 Sanju98 commented May 26, 2026

Summary

Combined backport of two upstream Apache ZooKeeper fixes to branch-3.6:

  • ZOOKEEPER-4415 (Apache 66771be4d, released in 3.9.2): server-side TLSv1.3 support. Picks TLSv1.3 as the default SSLContext protocol when the running JDK supports it, and returns the JDK's default enabled protocols (TLSv1.3 + TLSv1.2 on JDK 11+; TLSv1.2-only on older JDKs) when ssl.enabledProtocols is unset.
  • ZOOKEEPER-4912 (Apache master 2aaeff840, not yet in any 3.9.x release): removes the hardcoded TLSv1.2-only default cipher list. SSLContextAndOptions.getCipherSuites() now returns null when ssl.ciphersuites is unset, letting the JDK supply its own default cipher list at SSLEngine construction time.

Both changes were hand-applied to branch-3.6 rather than cherry-picked, because the upstream commits sit on top of the JUnit 5 migration that branch-3.6 does not carry, and cherry-picking pulled in unrelated test-framework refactoring. The substantive TLS logic is unchanged from upstream.

Motivation

With branch-3.6 today, the server installs a fixed TLSv1.2-only cipher set on every SSLEngine even when the JDK supports more recent ciphers and protocols. Two problems block TLSv1.3:

  1. The hardcoded cipher list contains no TLSv1.3 ciphers, so TLSv1.3 handshakes have no overlap on the server side and fail.
  2. The default protocol is pinned to TLSv1.2 regardless of what the JDK supports.

A downstream workaround of shipping ssl.ciphersuites with explicit TLSv1.3 cipher names is brittle against JDK drift: if the running JDK does not recognise a cipher name (e.g. TLS_CHACHA20_POLY1305_SHA256 on OpenJDK 11.0.8; OpenJDK 11 only added it in 11.0.11), SSLEngine.setEnabledCipherSuites() throws IllegalArgumentException: Unsupported CipherSuite at engine setup, and every TLS handshake then fails. We hit exactly this in an internal deployment.
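
The failure mode described above can be reproduced outside ZooKeeper with plain JSSE. The following is a minimal sketch, not code from this PR: the class name `UnsupportedCipherDemo` and the deliberately bogus suite name are hypothetical stand-ins for "a cipher name this JDK does not know".

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

public class UnsupportedCipherDemo {
    /** Returns true if this JDK rejects the given cipher suite name at engine setup. */
    static boolean rejects(String suite) throws Exception {
        SSLEngine engine = SSLContext.getDefault().createSSLEngine();
        try {
            // JSSE validates names eagerly, before any handshake is attempted.
            engine.setEnabledCipherSuites(new String[] {suite});
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical name, standing in for a suite the running JDK does not recognise.
        System.out.println("bogus suite rejected: " + rejects("TLS_HYPOTHETICAL_UNKNOWN_SUITE"));

        // A name taken from the JDK's own supported list is accepted.
        String known = SSLContext.getDefault().getSupportedSSLParameters().getCipherSuites()[0];
        System.out.println(known + " rejected: " + rejects(known));
    }
}
```

Because the check happens at engine setup, a single unknown name in a configured list is enough to break every connection, which is why leaving the list unset is the more robust option.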

After this backport, neither X509Util nor any downstream config has to carry a cipher list at all. The JDK is the single source of truth for both protocol selection and cipher selection, so JDK upgrades automatically pull in new ciphers and protocols without any code change. Per-deployment overrides via ssl.enabledProtocols / ssl.ciphersuites continue to work exactly as before.

Changes

X509Util.java

  • Add TLS_1_1 / TLS_1_2 / TLS_1_3 string constants.
  • Replace the literal initializer DEFAULT_PROTOCOL = "TLSv1.2" with a call to defaultTlsProtocol(), which returns TLSv1.3 when the JDK supports it and TLSv1.2 otherwise.
  • Delete getGCMCiphers / getCBCCiphers / concatArrays / DEFAULT_CIPHERS_JAVA8 / DEFAULT_CIPHERS_JAVA9 / getDefaultCipherSuites / getDefaultCipherSuitesForJavaVersion.
  • Drop the now-irrelevant "Default cipher suites" class doc-comment and the unused Objects import.
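
For readers unfamiliar with the approach, the protocol selection described in this list could look roughly like the sketch below. It is an illustration under assumptions, not the code in this PR: the class name `DefaultProtocolSketch` is hypothetical, and the exact error handling and logging in X509Util may differ.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.net.ssl.SSLContext;

public class DefaultProtocolSketch {
    static final String TLS_1_2 = "TLSv1.2";
    static final String TLS_1_3 = "TLSv1.3";

    /** Picks TLSv1.3 if the running JDK lists it as supported, TLSv1.2 otherwise. */
    static String defaultTlsProtocol() {
        List<String> supported = Collections.emptyList();
        try {
            supported = Arrays.asList(
                SSLContext.getDefault().getSupportedSSLParameters().getProtocols());
        } catch (NoSuchAlgorithmException e) {
            // Assumption: fall back to TLSv1.2 if the default context is unavailable.
        }
        return supported.contains(TLS_1_3) ? TLS_1_3 : TLS_1_2;
    }

    public static void main(String[] args) {
        System.out.println("Default TLS protocol: " + defaultTlsProtocol());
    }
}
```

Because the value is computed from the running JDK, the same JAR reports a different default on JDKs with and without TLSv1.3 support.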

SSLContextAndOptions.java

  • getCipherSuites(): return null when ssl.ciphersuites is unset, instead of falling back to X509Util.getDefaultCipherSuites().
  • Constructor: null-guard cipherSuitesAsList (Collections.unmodifiableList(Arrays.asList(null)) would NPE; both configureSslParameters and createNettyJdkSslContext already accept null = "use JDK defaults").
  • getEnabledProtocols(): return sslContext.getDefaultSSLParameters().getProtocols() (the JDK's curated default list for the chosen protocol) instead of new String[]{sslContext.getProtocol()}.
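
The null-handling chain in this list can be illustrated with plain JDK classes. This is a hedged sketch with hypothetical names (`NullCipherSketch`, `toListOrNull`, `configure`); it mirrors the idea of treating null as "use JDK defaults" and is not the actual SSLContextAndOptions code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;

public class NullCipherSketch {
    /** Shows why a guard is needed: Arrays.asList rejects a null array. */
    static boolean unguardedThrowsNpe() {
        try {
            Collections.unmodifiableList(Arrays.asList((String[]) null));
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    /** Null-guarded conversion: null in, null out, meaning "no override configured". */
    static List<String> toListOrNull(String[] cipherSuites) {
        return cipherSuites == null
            ? null
            : Collections.unmodifiableList(Arrays.asList(cipherSuites));
    }

    /** Applies an override only when one was configured; otherwise keeps JDK defaults. */
    static SSLParameters configure(SSLContext ctx, String[] cipherSuites) {
        SSLParameters params = ctx.getDefaultSSLParameters();
        if (cipherSuites != null) {
            params.setCipherSuites(cipherSuites);
        }
        return params;
    }

    public static void main(String[] args) throws Exception {
        SSLContext ctx = SSLContext.getDefault();
        System.out.println("unguarded NPE: " + unguardedThrowsNpe());
        System.out.println("guarded list: " + toListOrNull(null));
        System.out.println("default suites kept: "
            + Arrays.toString(configure(ctx, null).getCipherSuites()));
        System.out.println("default enabled protocols: "
            + Arrays.toString(ctx.getDefaultSSLParameters().getProtocols()));
    }
}
```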

X509UtilTest.java

  • Augment testCreateSSLContextWithoutCustomProtocol to assert TLSv1.3 is selected on JDKs that support it (and the enabled-protocol list contains both TLSv1.2 and TLSv1.3); otherwise assert TLSv1.2-only. Mirrors the upstream ZOOKEEPER-4415 test additions, in JUnit 4 style.
  • Switch the "TLSv1.1" string literal in testCreateSSLContextWithCustomProtocol to the new X509Util.TLS_1_1 constant.
  • Pin testClientRenegotiationFails to TLSv1.2 — renegotiation is not a TLSv1.3 feature, so this test must explicitly force 1.2 to stay meaningful once 1.3 becomes the default.
  • Delete the six obsolete testGetDefaultCipherSuites* tests that exercised the now-deleted Java-version-aware cipher selector.
  • Add testCreateSSLContextWithoutCipherSuites as a focused regression guard for the null-handling chain — clears ssl.ciphersuites, rebuilds the X509Util, and asserts the SSLContextAndOptions constructor returns without NPE. Pre-fix, Collections.unmodifiableList(Arrays.asList(null)) would throw before the SSLContext was produced.

QuorumSSLTest.java

  • testProtocolVersion: adopt the new X509Util.TLS_1_1 / X509Util.TLS_1_2 constants in place of raw "TLSv1.x" literals, for consistency with the new convention introduced in this PR. No behaviour change.

zookeeperAdmin.md

  • Document the new defaults: TLSv1.3 when JDK supports it; JDK default protocol/cipher list when properties are unset.

Compatibility notes

  • X509Util.DEFAULT_PROTOCOL is no longer a compile-time constant. It is still declared public static final String but its value is computed at class-load time via SSLContext.getDefault().getSupportedSSLParameters(). Callers that were compiled against an earlier branch-3.6 with the literal "TLSv1.2" inlined by javac will keep seeing "TLSv1.2" until recompiled against the new JAR. For this fork, all consumers ship together as part of the same release, so this is a non-issue — flagged for anyone backporting further.
  • X509Util.getDefaultCipherSuites() and getDefaultCipherSuitesForJavaVersion() are removed. Both were package-private (static, not public static). The only intra-repo caller was SSLContextAndOptions.getCipherSuites(), which is updated in the same commit. A grep -rn 'getDefaultCipherSuites' zookeeper-server/src/main confirms no other callers. Anyone calling these via reflection from outside this repo will break; we are not aware of such callers in the LinkedIn ZK ecosystem.
  • Per-deployment overrides are preserved. ssl.enabledProtocols and ssl.ciphersuites continue to honour user-supplied values exactly as before. Setting ssl.enabledProtocols=TLSv1.2 on a specific ensemble pins that ensemble to TLSv1.2 even on a JDK that supports TLSv1.3.
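
At the JSSE level, pinning via an enabled-protocols override amounts to narrowing the engine's enabled set. The sketch below shows that mechanism only; `PinProtocolSketch` and `enabledAfterPin` are hypothetical names, and how ZooKeeper maps ssl.enabledProtocols onto the engine is not reproduced here.

```java
import java.util.Arrays;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

public class PinProtocolSketch {
    /** Restricts a fresh engine to the given protocols and returns what ends up enabled. */
    static String[] enabledAfterPin(String... protocols) throws Exception {
        SSLEngine engine = SSLContext.getDefault().createSSLEngine();
        SSLParameters params = engine.getSSLParameters();
        params.setProtocols(protocols);
        engine.setSSLParameters(params);
        return engine.getEnabledProtocols();
    }

    public static void main(String[] args) throws Exception {
        SSLEngine unpinned = SSLContext.getDefault().createSSLEngine();
        System.out.println("JDK defaults: " + Arrays.toString(unpinned.getEnabledProtocols()));
        System.out.println("pinned:       " + Arrays.toString(enabledAfterPin("TLSv1.2")));
    }
}
```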

Testing done

Verified locally with Maven 3.6.3 / OpenJDK 17.0.5:

| Test class | Result |
| --- | --- |
| X509UtilTest | 312/312 pass (304 from the existing suite + 8 parametrized invocations of the new null-cipher regression test, all on JDK 17, which exercises the TLSv1.3 path) |
| QuorumSSLTest | 12/13 pass; only testOCSP fails, confirmed pre-existing on unmodified branch-3.6 HEAD (same failure, same elapsed time), unrelated to TLS protocol/cipher logic |
| X509SpiffeAuthIntegrationTest | 6 failures, all NoSuchMethodError; pre-existing test-classpath issue, unrelated |
| Full zookeeper-server module compile | BUILD SUCCESS, 384 source files |

The ClientX509Util.java parts of upstream ZOOKEEPER-4912 are not included because branch-3.6's ClientX509Util is a 39-line stub that does not own any of the Netty SslContextBuilder calls that the upstream commit modifies — there is nothing in branch-3.6 to update there.

Diff stat

zookeeper-docs/.../zookeeperAdmin.md                  |  6 +-
zookeeper-server/.../common/SSLContextAndOptions.java | 12 +++-
zookeeper-server/.../common/X509Util.java             | 69 ++++++++--------------
zookeeper-server/.../common/X509UtilTest.java         | 75 ++++++++++----------------
zookeeper-server/.../quorum/QuorumSSLTest.java        |  5 +-
5 files changed, 75 insertions(+), 92 deletions(-)

…pher overrides

Combined backport of two upstream commits to branch-3.6:

  - ZOOKEEPER-4415 (Apache 66771be, in 3.9.2+): adds TLSv1.3 support
    to the server by picking TLSv1.3 as the default SSLContext protocol
    when the running JDK supports it, and returning the JDK's default
    enabled protocols (TLSv1.3 + TLSv1.2 on JDK 11+; TLSv1.2-only on
    older JDKs) when `ssl.enabledProtocols` is unset.

  - ZOOKEEPER-4912 (Apache master 2aaeff8, not yet in any 3.9.x
    release): removes the hardcoded TLSv1.2-only default cipher list
    from X509Util. SSLContextAndOptions.getCipherSuites now returns
    null when `ssl.ciphersuites` is unset, which lets the JDK supply
    its own default cipher list at SSLEngine construction time.

Motivation
==========
With branch-3.6 today, the server installs a fixed TLSv1.2-only cipher
set on every SSLEngine even when the JDK supports more recent ciphers
and protocols. Any attempt to enable TLSv1.3 from configuration runs
into two problems:

  1. The hardcoded cipher list contains no TLSv1.3 ciphers, so TLSv1.3
     handshakes have no overlap on the server side and fail.

  2. The default protocol is pinned to TLSv1.2 regardless of what the
     JDK supports.

The MP-side workaround of shipping `ssl.ciphersuites` with explicit
TLSv1.3 cipher names is brittle against JDK drift: if the running JDK
does not recognise a cipher name (e.g. TLS_CHACHA20_POLY1305_SHA256 on
OpenJDK 11.0.8, which only added it in 11.0.11), SSLEngine throws
`IllegalArgumentException: Unsupported CipherSuite` at engine setup
and every handshake fails.

With this backport, neither X509Util nor any downstream config has to
carry a cipher list at all. The JDK is the single source of truth for
both protocol selection and cipher selection, so JDK upgrades
automatically pull in new ciphers and protocols without an MP change.

Per-deployment overrides via `ssl.enabledProtocols` /
`ssl.ciphersuites` continue to work exactly as before.

Changes
=======
zookeeper-server/.../X509Util.java
  - Add TLS_1_1 / TLS_1_2 / TLS_1_3 string constants.
  - Replace `DEFAULT_PROTOCOL = "TLSv1.2"` with `defaultTlsProtocol()`,
    which returns TLSv1.3 when the JDK supports it, TLSv1.2 otherwise.
  - Delete getGCMCiphers / getCBCCiphers / concatArrays /
    DEFAULT_CIPHERS_JAVA8 / DEFAULT_CIPHERS_JAVA9 /
    getDefaultCipherSuites / getDefaultCipherSuitesForJavaVersion.
  - Drop the now-irrelevant "Default cipher suites" class doc-comment
    and the unused `Objects` import.

zookeeper-server/.../SSLContextAndOptions.java
  - getCipherSuites: return null when `ssl.ciphersuites` is unset
    instead of falling back to X509Util.getDefaultCipherSuites().
  - Constructor: null-guard cipherSuitesAsList (Arrays.asList of
    null would NPE; configureSslParameters and createNettyJdkSslContext
    both already accept null = "use JDK defaults").
  - getEnabledProtocols: return sslContext.getDefaultSSLParameters()
    .getProtocols() (the JDK's curated default list for the chosen
    protocol) instead of new String[]{sslContext.getProtocol()}.

zookeeper-server/.../X509UtilTest.java
  - Augment testCreateSSLContextWithoutCustomProtocol to assert
    TLSv1.3 is selected on JDKs that support it (and the enabled
    protocol list contains both TLSv1.2 and TLSv1.3); otherwise
    assert TLSv1.2-only.
  - Switch "TLSv1.1" string literal to X509Util.TLS_1_1 constant.
  - Pin testClientRenegotiationFails to TLSv1.2 (renegotiation is not
    a TLSv1.3 feature, so this test must explicitly force 1.2 to
    remain meaningful once 1.3 becomes the default).
  - Delete the six obsolete testGetDefaultCipherSuites* tests that
    exercised the now-deleted Java-version-aware cipher selector.

zookeeper-docs/.../zookeeperAdmin.md
  - Document the new defaults: TLSv1.3 when JDK supports it; JDK
    default protocol/cipher list when properties are unset.

Local verification
==================
  X509UtilTest: 304/304 pass, including the new TLSv1.3 assertions.
  QuorumSSLTest: 12/13 pass; testOCSP failure is pre-existing on
    branch-3.6 (verified against unmodified HEAD) and unrelated to
    TLS protocol/cipher logic.
  Full zookeeper-server module compiles (384 source files, BUILD
    SUCCESS).
@Sanju98 Sanju98 self-assigned this May 26, 2026
Sanju98 added 2 commits May 26, 2026 18:30
testProtocolVersion still used the raw "TLSv1.2" / "TLSv1.1" strings even
after the ZOOKEEPER-4415 backport introduced the TLS_1_1 / TLS_1_2
constants on X509Util. Switch the two System.setProperty calls to use
the constants for consistency with the new convention.

No behaviour change. Verified by running testProtocolVersion locally
(1/1 pass).

Add a focused regression guard for the null-handling chain introduced by
the ZOOKEEPER-4912 backport. testCreateSSLContextWithoutCustomProtocol
exercises the unset-ssl.ciphersuites path implicitly, but does not pin
the assertion to that condition — a future regression that reintroduces
a hardcoded default cipher list would still let it pass.

testCreateSSLContextWithoutCipherSuites explicitly clears the property,
rebuilds the X509Util, and asserts SSLContextAndOptions is constructed
without NPE. Pre-fix, the constructor would fail on
Collections.unmodifiableList(Arrays.asList(null)) before returning.
Collaborator

@laxman-ch laxman-ch left a comment

  • Please test it locally and share a report covering the combinations of JDK version, client version, and client-side protocol version.
  • Do we need to backport any more related changes (bug fixes, etc) after this initial set of commits from 3.9 branch?

Author

Sanju98 commented May 27, 2026

1. Local test report with combinations — published as a Google Doc: TLSv1.3 Enablement — Test Plan. The Local testing and Local performance testing sections at the end cover:

  • Server JDK: JDK 11.0.13-msft (matches product-spec.json)
  • Client JDK matrix: JDK 8u282-msft, 11.0.8-msft, 17.0.5-msft, 21.0.6-msft — for both default (TLS 1.3) and forced TLS 1.2
  • Performance comparison at ~10k ops/sec
  • Negotiated protocol + cipher captured for every row

Bottom line: every JDK × protocol combination succeeds end-to-end. No Unsupported CipherSuite errors on any JDK (including 11.0.8, which is what caused incident-12090). No performance regression at production-representative load.

Author

Sanju98 commented May 27, 2026

@laxman-ch — triage done.

Scanned all commits on upstream/branch-3.9 not in branch-3.6 touching X509Util / SSL* / ClientX509 / QuorumX509 / NettyServerCnxnFactory paths (~40 commits).

Nothing critical to backport. No bug fix on branch-3.9 blocks this rollout. The only commit that directly addresses the incident-12090 failure mode is ZK-4912 (remove hardcoded cipher overrides), which lives on upstream/master only (not in any 3.9.x release) and is already in this PR alongside ZK-4415.

What's left on branch-3.9 falls into:

  • 4 good-to-have TLS bug fixes — ZK-4986 (disable reverse DNS lookup in TLS), ZK-4955 (ssl.crl/ssl.ocsp JVM-wide property leak), ZK-4954 (FIPS-style hostname verification when no custom trust manager), ZK-4940 (OCSP with JRE TLS provider error)
  • ~8 operational improvements — hot cert reload (ZK-3806), password-from-file (ZK-4396), TLS/Netty metrics (ZK-3846 / ZK-3847 / ZK-3978), configurable early TLS connection drop (ZK-4453), configurable client hostname verification (ZK-4790), full exception logging on JAAS failures (ZK-4906)
  • 3 FIPS-specific — skip (LinkedIn ZK does not run in FIPS mode)
  • Netty-TcNative OpenSSL provider (ZK-4622) — major feature, deserves its own evaluation, not a fix
  • ~10 cosmetic / unrelated commits — skip

@Sanju98 Sanju98 requested a review from laxman-ch May 28, 2026 12:12
laxman-ch
laxman-ch previously approved these changes May 28, 2026
Collaborator

@adityaagg09 adityaagg09 left a comment

Reviewed the backport — overall LGTM. Clean, well-documented change and the substantive TLS logic is correct. The one genuinely risky part — the null-cipher chain from ZOOKEEPER-4912 — checks out: getCipherSuites() returning null is correctly handled by both consumers (configureSslParameters guards with if (cipherSuites != null), and createNettyJdkSslContext hands cipherSuitesAsList to Netty's JdkSslContext, which reads null as "use JDK defaults"). The constructor null-guard is therefore necessary and right.

Left a batch of inline comments — all minor / non-blocking. Summary:

  • getEnabledProtocols now widens the enabled set to JDK defaults when only ssl.protocol is set (doc note worth adding).
  • testCreateSSLContextWithoutCipherSuites hardcodes ClientX509Util, so the parametrized QuorumX509Util run never exercises a distinct path.
  • A couple of small nits in defaultTlsProtocol().

None of these block the merge.

Comment thread zookeeper-server/src/test/java/org/apache/zookeeper/common/X509UtilTest.java Outdated
Comment thread zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md
…note

Address two non-blocking review comments from @adityaagg09 on PR linkedin#143.

* X509UtilTest.testCreateSSLContextWithoutCipherSuites: loop the regression
  guard over both ClientX509Util (zookeeper.ssl.* prefix) and
  QuorumX509Util (zookeeper.ssl.quorum.* prefix) so a future change to
  either subclass's config prefix or defaults path is caught. The
  null-handling lives in the shared SSLContextAndOptions, but exercising
  both subclasses removes the prior duplication where all eight parametrized
  invocations ran the identical ClientX509Util path. Verified locally:
  X509UtilTest 312/312 pass on JDK 17.

* SSLContextAndOptions.getEnabledProtocols and zookeeperAdmin.md: document
  that setting only ssl.protocol selects the SSLContext protocol but does
  not by itself restrict the enabled-protocol set (the JDK's default enabled
  list for the chosen context applies). Operators who want strict
  single-protocol pinning must set ssl.enabledProtocols explicitly. The
  behavior matches upstream ZOOKEEPER-4415; this just makes the implication
  explicit in the operator-facing docs and in the source comment so future
  readers understand the widening.

No functional change.
Author

Sanju98 commented May 29, 2026

Thanks @adityaagg09 for the careful review. Pushed cf39e49b5 addressing the two actionable items:

  1. X509UtilTest.testCreateSSLContextWithoutCipherSuites — now loops over both ClientX509Util and QuorumX509Util, calling x509TestContext.setSystemProperties(util, …) for each prefix. The regression guard now exercises both subclasses on every parametrized invocation; the previously-duplicated 4-scenario coverage becomes genuine 8-scenario coverage. Verified locally: X509UtilTest 312/312 pass on JDK 17.
  2. SSLContextAndOptions.getEnabledProtocols source comment + zookeeperAdmin.md — added a sentence each noting that setting only ssl.protocol selects the SSLContext protocol but does not by itself restrict the enabled-protocol set; operators who want strict single-protocol pinning must set ssl.enabledProtocols (or ssl.quorum.enabledProtocols) explicitly.
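
To see why choosing a context protocol does not by itself pin the enabled set, one can ask the JDK which protocols a context enables by default. This is an illustrative sketch with hypothetical names (`ContextDefaultsSketch`, `defaultEnabledProtocols`); the output depends on the JDK version and its jdk.tls.disabledAlgorithms setting, so none is shown.

```java
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;

public class ContextDefaultsSketch {
    /** Lists the protocols the JDK enables by default for a context of the given protocol. */
    static List<String> defaultEnabledProtocols(String contextProtocol) throws Exception {
        SSLContext ctx = SSLContext.getInstance(contextProtocol);
        // A context obtained via getInstance must be initialised before use;
        // nulls select the JDK's default key managers, trust managers and RNG.
        ctx.init(null, null, null);
        return Arrays.asList(ctx.getDefaultSSLParameters().getProtocols());
    }

    public static void main(String[] args) throws Exception {
        // The list may contain more than the single protocol named here.
        System.out.println("TLSv1.2 context enables: " + defaultEnabledProtocols("TLSv1.2"));
    }
}
```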

Skipped the other nits as you suggested (new ArrayList<>() → Collections.emptyList(), SSLContext.getDefault() vs getInstance("TLS"), JDK-conditional assertion coverage); all match upstream and you flagged them as "fine to leave."

Collaborator

@adityaagg09 adityaagg09 left a comment

LGTM

Collaborator

@laxman-ch laxman-ch left a comment

LGTM

@Sanju98 Sanju98 merged commit fbd911f into linkedin:branch-3.6 May 29, 2026
10 of 12 checks passed

3 participants