tls: replace NPN with ALPN for client connections #10822

Merged
edsiper merged 1 commit into fluent:master from zzbamboo:tls-client-npn-to-alpn
Sep 11, 2025

@zzbamboo
Contributor

@zzbamboo zzbamboo commented Sep 2, 2025

Replace SSL_CTX_set_next_proto_select_cb with SSL_CTX_set_alpn_protos
for client-side protocol negotiation. This modernizes the TLS
implementation by using the standardized ALPN (RFC 7301) instead
of the deprecated NPN protocol.


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • Bug Fixes
    • Improved client TLS/ALPN negotiation by changing how the client advertises protocols, increasing compatibility with more servers.
    • Reduced connection failures and handshake inconsistencies caused by previous ALPN handling.
    • More consistent client-side TLS setup for improved stability without requiring server changes.

@coderabbitai

coderabbitai Bot commented Sep 2, 2025

Walkthrough

Client-side ALPN handling in src/tls/openssl.c is refactored: the client no longer registers a next-proto selection callback and instead configures ALPN directly via SSL_CTX_set_alpn_protos. Server-side ALPN selection remains unchanged; non-zero return from SSL_CTX_set_alpn_protos causes an error return.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| TLS client ALPN setup: src/tls/openssl.c | Removed the client ALPN select callback and its SSL_CTX_set_next_proto_select_cb registration. Client now uses SSL_CTX_set_alpn_protos(ctx->ctx, (const unsigned char *)&ctx->alpn[1], (unsigned int)ctx->alpn[0]) to provision ALPN; on non-zero return the function returns -1. ALPN buffer layout unchanged (ctx->alpn[0]=len, ctx->alpn[1..]=protocols). Server-side ALPN flow unchanged. |
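
The buffer layout named in the summary (byte 0 holds the total length, the remaining bytes hold length-prefixed protocol names) can be illustrated with a small check. This is a minimal sketch, not code from the Fluent Bit source: the helper name `alpn_buffer_count` is hypothetical, and it assumes only the layout stated above plus the rule that each name in an ALPN protocol list is non-empty.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of Fluent Bit or OpenSSL).
 * Assumes buf[0] holds the total length of the protocol list and
 * buf[1..buf[0]] is a sequence of non-empty, length-prefixed names,
 * which is the wire format passed to SSL_CTX_set_alpn_protos().
 * Returns the number of protocols found, or -1 if the layout is malformed.
 */
static int alpn_buffer_count(const unsigned char *buf, size_t buf_size)
{
    size_t total;
    size_t pos = 1;
    int count = 0;

    if (buf == NULL || buf_size < 1) {
        return -1;
    }

    total = buf[0];
    if (total == 0 || total + 1 > buf_size) {
        return -1;
    }

    while (pos <= total) {
        size_t name_len = buf[pos];

        /* A name must be non-empty and must not run past the list end */
        if (name_len == 0 || pos + name_len > total) {
            return -1;
        }
        pos += 1 + name_len;
        count++;
    }

    return count;
}
```

For a buffer holding `h2` and `http/1.1`, the leading byte would be 12 (1 + 2 + 1 + 8) and the helper would report two protocols; a name whose prefix claims more bytes than remain is rejected.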

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant App as Client App
  participant Ctx as TLS Context
  participant SSL as OpenSSL (SSL_CTX)
  participant Srv as Server (Handshake)

  rect rgba(220,240,255,0.4)
  note over App,Ctx: Configuration Phase (Client)
  App->>Ctx: tls_context_alpn_set(alpn)
  alt Prior implementation
    Ctx->>SSL: SSL_CTX_set_next_proto_select_cb(client_alpn_select)
    note right of SSL: registers callback for handshake-time selection
  else New implementation
    Ctx->>SSL: SSL_CTX_set_alpn_protos(alpn_bytes)
    SSL-->>Ctx: return (0=ok, !=0=error)
  end
  end

  rect rgba(235,255,235,0.4)
  note over Ctx,Srv: Handshake Phase
  App->>Srv: ClientHello (ALPN ext)
  alt Prior implementation
    SSL->>SSL: invoke client_alpn_select during handshake
  else New implementation
    SSL->>SSL: consult configured ALPN list directly
  end
  Srv-->>App: ServerHello (selected ALPN)
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I nibbled callbacks, one by one,
Packed protocols in a tidy run.
No handshake juggling, neat and done,
A byte-length list beneath the sun.
Thump—client hops; server hums—fun! 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0dc3e49 and 1d8ae53.

📒 Files selected for processing (1)
  • src/tls/openssl.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/tls/openssl.c

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (7)
src/tls/openssl.c (7)

264-276: Guard ALPN calls for OpenSSL builds without ALPN

If older OpenSSL (<1.0.2) or OPENSSL_NO_ALPN is in use, this will not compile. Add a compile-time guard or confirm minimum supported OpenSSL version and document it.

         else {
-            if (SSL_CTX_set_alpn_protos(ctx->ctx, 
-                (const unsigned char *) &ctx->alpn[1], 
-                (unsigned int)ctx->alpn[0]) != 0) {
-                result = -1;
-            }
+#if defined(OPENSSL_NO_ALPN) || (OPENSSL_VERSION_NUMBER < 0x1000200fL)
+            flb_error("[tls] ALPN not supported by this OpenSSL build/version");
+            result = -1;
+#else
+            if (ctx->alpn != NULL) {
+                unsigned int alpn_len = (unsigned char) ctx->alpn[0];
+                if (SSL_CTX_set_alpn_protos(ctx->ctx,
+                                            (const unsigned char *) &ctx->alpn[1],
+                                            alpn_len) != 0) {
+                    result = -1;
+                }
+            }
+#endif
         }

239-246: Validate token length (1..255) and avoid strcpy; use memcpy

Prevents malformed ALPN vectors and truncation. Also avoids copying a trailing NUL into the buffer.

-            wire_format_alpn[wire_format_alpn_index] = \
-                (char) strlen(alpn_token);
-
-            strcpy(&wire_format_alpn[wire_format_alpn_index + 1],
-                   alpn_token);
-
-            wire_format_alpn_index += strlen(alpn_token) + 1;
+            size_t token_len = strlen(alpn_token);
+            if (token_len == 0 || token_len > 255) {
+                flb_free(wire_format_alpn);
+                free(alpn_working_copy);
+                return -1;
+            }
+            wire_format_alpn[wire_format_alpn_index] = (char) token_len;
+            memcpy(&wire_format_alpn[wire_format_alpn_index + 1],
+                   alpn_token, token_len);
+            wire_format_alpn_index += token_len + 1;
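
As a standalone illustration of the suggestion above, the following sketch builds the length-prefixed vector from a comma-separated list, using `memcpy` with explicit lengths and rejecting tokens outside 1..255 bytes. It is an assumption-laden example rather than the patch itself: the function name `alpn_list_to_wire` is hypothetical, it splits with `strchr` instead of the tokenizer used in `src/tls/openssl.c`, it allocates with plain `malloc`, and it does not trim whitespace or add the leading total-length byte that Fluent Bit stores in `ctx->alpn[0]`.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: converts "h2,http/1.1" into the length-prefixed
 * vector "\x02h2\x08http/1.1". On success returns a malloc'd buffer and
 * stores its length in *out_len. Returns NULL on empty input, an empty
 * token, a token longer than 255 bytes, or allocation failure.
 */
static unsigned char *alpn_list_to_wire(const char *list, size_t *out_len)
{
    size_t pos = 0;
    const char *start;
    unsigned char *wire;

    if (list == NULL || out_len == NULL || list[0] == '\0') {
        return NULL;
    }

    /* Every comma becomes a length byte, plus one byte for the first token */
    wire = malloc(strlen(list) + 1);
    if (wire == NULL) {
        return NULL;
    }

    start = list;
    for (;;) {
        const char *end = strchr(start, ',');
        size_t token_len = (end != NULL) ? (size_t) (end - start)
                                         : strlen(start);

        /* The length prefix is a single octet and names must be non-empty */
        if (token_len == 0 || token_len > 255) {
            free(wire);
            return NULL;
        }

        wire[pos++] = (unsigned char) token_len;
        memcpy(&wire[pos], start, token_len);   /* no trailing NUL is copied */
        pos += token_len;

        if (end == NULL) {
            break;
        }
        start = end + 1;
    }

    *out_len = pos;
    return wire;
}
```

The caller owns the returned buffer and releases it with `free`. Storing the prefix as `unsigned char` keeps lengths above 127 intact regardless of whether plain `char` is signed on the target platform.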

252-255: Free previous ctx->alpn to avoid leaks on reconfiguration

If tls_context_alpn_set is called more than once, the previous buffer leaks.

-            wire_format_alpn[0] = (char) wire_format_alpn_index - 1;
-            ctx->alpn = wire_format_alpn;
+            wire_format_alpn[0] = (char) wire_format_alpn_index - 1;
+            if (ctx->alpn != NULL) {
+                flb_free(ctx->alpn);
+            }
+            ctx->alpn = wire_format_alpn;

183-188: Avoid length sign-extension in server ALPN selection too

Mirror the unsigned-char cast where length is passed.

-                                       (unsigned int) ctx->alpn[0],
+                                       (unsigned int) (unsigned char) ctx->alpn[0],
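
The effect of the extra cast can be shown in isolation. This is a small sketch with hypothetical function names; it uses `signed char` explicitly because the signedness of plain `char` is implementation-defined, and a stored length byte of 200 reads back as -56 where `char` is signed.

```c
/* Direct cast: a negative value wraps to a very large unsigned int. */
static unsigned int len_direct(signed char raw)
{
    return (unsigned int) raw;
}

/* Cast through unsigned char first: the byte value 0..255 is preserved. */
static unsigned int len_via_uchar(signed char raw)
{
    return (unsigned int) (unsigned char) raw;
}
```

With `raw` equal to -56, `len_via_uchar` yields 200 while `len_direct` yields a value far above 255, which is the kind of length that would be handed to the ALPN call without the intermediate cast. For values in 0..127 the two functions agree.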

1093-1103: Avoid double-setting ALPN (context-level + per-session)

After setting via SSL_CTX_set_alpn_protos, this per-session SSL_set_alpn_protos is redundant. Pick one (prefer context-level for defaults) to reduce code paths and potential drift.

-            if (ctx->alpn != NULL) {
-                ret = SSL_set_alpn_protos(session->ssl,
-                                          (const unsigned char *) &ctx->alpn[1],
-                                          (unsigned int) ctx->alpn[0]);
-
-                if (ret != 0) {
-                    flb_error("[tls] error: alpn setup failed : %d", ret);
-                    pthread_mutex_unlock(&ctx->mutex);
-                    return -1;
-                }
-            }

649-652: Fix log context: prints crt_file when reporting key_file error

Tiny but confusing during debugging.

-            flb_error("[tls] key_file '%s' %lu: %s",
-                      crt_file, ERR_get_error(), err_buf);
+            flb_error("[tls] key_file '%s' %lu: %s",
+                      key_file, ERR_get_error(), err_buf);

1067-1069: Typo in error message

“vailidation” → “validation”.

-        flb_error("[tls] error: hostname parameter vailidation is failed : %s",
+        flb_error("[tls] error: hostname parameter validation failed: %s",
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 35885ce and 5a3d70f.

📒 Files selected for processing (1)
  • src/tls/openssl.c (1 hunks)

Comment thread src/tls/openssl.c Outdated

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/tls/openssl.c (1)

271-279: Don’t hard-fail when ALPN is unset; guard NULL and fix length cast

Requiring ALPN (returning -1 if ctx->alpn is NULL) breaks clients that don’t configure ALPN. Also, casting a possibly-signed char directly to unsigned int risks sign-extension. Guard for NULL and coerce length via unsigned char; optionally log on failure.

-            if (ctx->alpn == NULL) {
-                return -1;
-            }
-            if (SSL_CTX_set_alpn_protos(
-                ctx->ctx, 
-                (const unsigned char *) &ctx->alpn[1], 
-                (unsigned int) ctx->alpn[0]) != 0) {
-                return -1;
-            }
+            if (ctx->alpn != NULL) {
+                unsigned int alpn_len = (unsigned char) ctx->alpn[0];
+                if (SSL_CTX_set_alpn_protos(ctx->ctx,
+                                            (const unsigned char *) &ctx->alpn[1],
+                                            alpn_len) != 0) {
+                    flb_error("[tls] error: failed to set ALPN on SSL context");
+                    return -1;
+                }
+            }
🧹 Nitpick comments (2)
src/tls/openssl.c (2)

1097-1107: Avoid duplicate ALPN configuration (ctx and per-session)

You already set ALPN on SSL_CTX; setting again on each SSL with SSL_set_alpn_protos is redundant. Prefer one source of truth to reduce complexity and ambiguity. If per-session override isn’t needed, remove this block.

-            if (ctx->alpn != NULL) {
-                ret = SSL_set_alpn_protos(session->ssl,
-                                          (const unsigned char *) &ctx->alpn[1],
-                                          (unsigned int) ctx->alpn[0]);
-
-                if (ret != 0) {
-                    flb_error("[tls] error: alpn setup failed : %d", ret);
-                    pthread_mutex_unlock(&ctx->mutex);
-                    return -1;
-                }
-            }
+            /* ALPN configured at SSL_CTX level; per-session override not required */

239-246: Build ALPN vector without transient NULs and enforce per-protocol length ≤ 255

Use memcpy with explicit lengths and validate each token’s length fits a single-octet prefix. This avoids relying on strcpy side effects and makes the wire format explicit.

-            wire_format_alpn[wire_format_alpn_index] = \
-                (char) strlen(alpn_token);
-
-            strcpy(&wire_format_alpn[wire_format_alpn_index + 1],
-                   alpn_token);
-
-            wire_format_alpn_index += strlen(alpn_token) + 1;
+            size_t tlen = strlen(alpn_token);
+            if (tlen == 0 || tlen > 255) {
+                flb_free(wire_format_alpn);
+                free(alpn_working_copy);
+                return -1;
+            }
+            wire_format_alpn[wire_format_alpn_index++] = (unsigned char) tlen;
+            memcpy(&wire_format_alpn[wire_format_alpn_index], alpn_token, tlen);
+            wire_format_alpn_index += tlen;
@@
-            wire_format_alpn[0] = (char) wire_format_alpn_index - 1;
+            wire_format_alpn[0] = (unsigned char) (wire_format_alpn_index - 1);

Also applies to: 252-255

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 5a3d70f and 0dc3e49.

📒 Files selected for processing (1)
  • src/tls/openssl.c (1 hunks)

Contributor

@cosmo0920 cosmo0920 left a comment

Yes. This looks like an appropriate patch for advertising ALPN in the ClientHello during TLS negotiation.

@cosmo0920 cosmo0920 added this to the Fluent Bit v4.1 milestone Sep 5, 2025
Signed-off-by: Biao Zhu <zhumouren0623@qq.com>
@zzbamboo zzbamboo force-pushed the tls-client-npn-to-alpn branch from 0dc3e49 to 1d8ae53 Compare September 7, 2025 03:26
@zzbamboo zzbamboo requested a review from cosmo0920 September 7, 2025 03:28
@zzbamboo
Contributor Author

zzbamboo commented Sep 7, 2025

#10822
Squashed the two commits into one for a cleaner history.
