Skip to content
This repository was archived by the owner on Jan 23, 2026. It is now read-only.

Parallel SSL cert retrieval#717

Merged
NickCao merged 1 commit intomainfrom
parallel-insecure-tls
Oct 20, 2025
Merged

Parallel SSL cert retrieval#717
NickCao merged 1 commit intomainfrom
parallel-insecure-tls

Conversation

@mangelajo
Copy link
Copy Markdown
Member

@mangelajo mangelajo commented Oct 20, 2025

When using insecure TLS, if the first address of a hostname DNS fails, our code does not try with the other IPs as gRPC does later, so we cannot connect.

This patch attempts to retrieve the insecure cert in parallel over all the available IP addresses.

Summary by CodeRabbit

  • Improvements

    • Faster, more resilient server connections via parallel multi‑IP TLS attempts with early-success cancellation.
    • Better handling of insecure TLS flows with per-address error reporting and aggregated failure reporting as a single connection error.
    • Added diagnostic logging for connection progress, warnings, and failures to aid troubleshooting.
  • Documentation

    • Clarified TLS credential behavior and added a descriptive docstring for the TLS helper.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Oct 20, 2025

Walkthrough

Adds module logging and two async helpers to perform parallel, per‑IP TLS connections to extract server certificates for insecure gRPC channel credential creation; refactors ssl_channel_credentials to delegate insecure flows to the new multi‑IP concurrent extraction logic with error aggregation and task cancellation.

Changes

Cohort / File(s) Change Summary
gRPC insecure TLS concurrency
packages/jumpstarter/jumpstarter/common/grpc.py
Add module logger; add async def _try_connect_and_extract_cert(ip_address: str, port: int, ssl_context: ssl.SSLContext, hostname: str, timeout: float) -> bytes to open TLS to a single IP and return PEM cert bytes; add async def _ssl_channel_credentials_insecure(target: str, timeout: float) -> grpc.ChannelCredentials to resolve all IPs, run per‑IP certificate extraction tasks in parallel, return first successful credentials, cancel remaining tasks, log per‑IP errors, and raise an aggregated ConnectionError if all attempts fail; update async def ssl_channel_credentials(target: str, tls_config, timeout=5) to route insecure TLS cases through the new helper and add a docstring; preserve CA‑based and default credential flows.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant ssl_channel_credentials
    participant DNS as "DNS Resolver"
    participant InsecureFlow as "_ssl_channel_credentials_insecure"
    participant Tasks as "Per-IP Tasks"
    participant Connect as "_try_connect_and_extract_cert"

    Caller->>ssl_channel_credentials: request credentials (insecure)
    ssl_channel_credentials->>InsecureFlow: delegate insecure handling
    InsecureFlow->>DNS: resolve hostname -> IP list

    alt DNS fails
        DNS-->>InsecureFlow: DNS error
        InsecureFlow-->>ssl_channel_credentials: propagate error
    else DNS succeeds
        DNS-->>InsecureFlow: IP list
        InsecureFlow->>Tasks: spawn tasks for each IP (parallel)

        par parallel attempts
            Tasks->>Connect: connect to IP1 (TLS)
            Tasks->>Connect: connect to IP2 (TLS)
            Tasks->>Connect: connect to IPn (TLS)
        end

        alt first TLS success
            Connect-->>Tasks: PEM cert chain (success)
            Tasks->>Tasks: cancel remaining tasks
            Tasks-->>InsecureFlow: root cert bytes
            InsecureFlow-->>ssl_channel_credentials: return grpc.ssl_channel_credentials
        else all fail
            Tasks-->>InsecureFlow: aggregated per-IP errors
            InsecureFlow-->>ssl_channel_credentials: raise ConnectionError
        end
    end

    ssl_channel_credentials-->>Caller: credentials or error
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I hopped through DNS, checked burrows far and near,
fetched chains from many hosts until one hummed clear.
I waved the rest goodbye, canceled with a cheer,
wrapped certs in PEM and leapt — secure paths now appear. 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "Parallel SSL cert retrieval" directly and accurately reflects the primary change in the pull request. The changeset introduces new async functions that resolve all DNS-resolved IP addresses for a target and perform parallel TLS connection attempts to extract certificates, replacing the prior single-IP flow. This aligns perfectly with the PR's stated objective of handling insecure certificate retrieval in parallel across multiple addresses. The title is concise, specific, and clearly communicates the main enhancement without being vague or overly broad.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch parallel-insecure-tls

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@netlify
Copy link
Copy Markdown

netlify Bot commented Oct 20, 2025

Deploy Preview for jumpstarter-docs ready!

Name Link
🔨 Latest commit 1d42653
🔍 Latest deploy log https://app.netlify.com/projects/jumpstarter-docs/deploys/68f67a771cab62000807d476
😎 Deploy Preview https://deploy-preview-717--jumpstarter-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
packages/jumpstarter/jumpstarter/common/grpc.py (1)

36-36: Consider the fragility of accessing private SSL attributes.

The code accesses ._sslobj, which is a private attribute of Python's SSL implementation. While this is sometimes necessary for low-level certificate extraction, it could break in future Python versions. Consider adding a comment documenting this dependency and the Python version compatibility requirements.

Example documentation:

     # Extract certificates
+    # Note: Accessing _sslobj is implementation-dependent and tested with Python 3.x
+    # See: https://github.com/python/cpython/blob/main/Lib/ssl.py
     cert_chain = writer.get_extra_info("ssl_object")._sslobj.get_unverified_chain()
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2863bcd and d96305a.

📒 Files selected for processing (1)
  • packages/jumpstarter/jumpstarter/common/grpc.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
packages/jumpstarter/jumpstarter/common/grpc.py (1)
packages/jumpstarter/jumpstarter/common/exceptions.py (2)
  • ConfigurationError (48-51)
  • ConnectionError (42-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: pytest-matrix (macos-15, 3.12)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: e2e
🔇 Additional comments (3)
packages/jumpstarter/jumpstarter/common/grpc.py (3)

3-3: LGTM: Logging setup is appropriate.

Standard logging configuration for adding observability to the certificate extraction process.

Also applies to: 16-17


103-109: Verify the SSL error fast-fail strategy aligns with real-world scenarios.

When an SSL error occurs on one IP, the code cancels all remaining connection attempts and immediately raises the error. The assumption is that SSL protocol errors (TLS version mismatch, cipher incompatibility, etc.) will affect all IPs of the same hostname.

However, edge cases exist where different IPs might have different SSL configurations (e.g., misconfigured load balancer, A/B testing, gradual rollout). Consider whether continuing to try other IPs on SSL errors could improve resilience.

The current fast-fail behavior is reasonable for the common case, but you may want to verify it against your deployment scenarios. If different IPs could have different SSL configurations, you might want to treat SSL errors like connection errors (log and continue).


130-138: LGTM: Clean delegation to the new parallel certificate extraction.

The refactoring properly delegates insecure mode to the new parallel implementation while preserving existing behavior for CA-based and default TLS configurations.

Comment thread packages/jumpstarter/jumpstarter/common/grpc.py Outdated
Comment thread packages/jumpstarter/jumpstarter/common/grpc.py Outdated
Comment thread packages/jumpstarter/jumpstarter/common/grpc.py Outdated
try:
root_certificates = await task
# Success! Cancel remaining tasks and return
for remaining_task in pending:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we wrap everything in a taskgroup to avoid having to cancel the remaining tasks manually?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes the return on first success a bit harder, but let me try.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't manage with a task group to cleanly return it on first success , but claude has a nice cleanup to use finally to handle pending tasks.

@mangelajo mangelajo force-pushed the parallel-insecure-tls branch 2 times, most recently from 1d5649b to 43153d8 Compare October 20, 2025 17:06
@mangelajo mangelajo requested a review from NickCao October 20, 2025 17:08
@mangelajo mangelajo force-pushed the parallel-insecure-tls branch from 43153d8 to fa01f93 Compare October 20, 2025 17:31
@mangelajo mangelajo requested a review from kirkbrauer October 20, 2025 17:31
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
packages/jumpstarter/jumpstarter/common/grpc.py (1)

37-45: Make cert-chain extraction robust across Python versions (bytes vs str, DER vs PEM).

Current concatenation assumes cert.public_bytes() returns str; on other runtimes it returns bytes or get_unverified_chain() yields DER bytes, causing crashes. Handle all cases and always return PEM bytes.

Apply this diff:

-        cert_chain = writer.get_extra_info("ssl_object")._sslobj.get_unverified_chain()
-        root_certificates = ""
-        for cert in cert_chain:
-            root_certificates += cert.public_bytes()
+        cert_chain = writer.get_extra_info("ssl_object")._sslobj.get_unverified_chain()
+        root_certs_bytes = bytearray()
+        for cert in cert_chain:
+            # CPython <3.13: items may be _ssl.Certificate with .public_bytes() (str or bytes)
+            # CPython >=3.13: items may be DER bytes
+            if hasattr(cert, "public_bytes"):
+                pem = cert.public_bytes()
+                if isinstance(pem, str):
+                    pem = pem.encode()
+                root_certs_bytes.extend(pem)
+            else:
+                # DER -> PEM string, then to bytes
+                if isinstance(cert, str):
+                    cert = cert.encode()
+                pem_str = ssl.DER_cert_to_PEM_cert(cert)
+                root_certs_bytes.extend(pem_str.encode())
         logger.debug(f"Successfully extracted {len(cert_chain)} certificate(s) from {ip_address}:{port}")
-
-        return root_certificates.encode()
+        return bytes(root_certs_bytes)
     finally:
         writer.close()
Does Python 3.13’s ssl.SSLSocket.get_unverified_chain() return a list of DER-encoded bytes rather than certificate objects?

Note: optionally await writer.wait_closed() to drain TLS close; you previously opted not to.

🧹 Nitpick comments (2)
packages/jumpstarter/jumpstarter/common/grpc.py (2)

72-76: De-duplicate resolved IPs before spawning tasks.

Avoid redundant connections when getaddrinfo returns duplicates (A/AAAA, multiple records).

Apply this diff:

-            # Log resolved IPs
-            resolved_ips = [sockaddr[0] for _, _, _, _, sockaddr in addr_info]
+            # De-duplicate by IP
+            unique_by_ip = {}
+            for item in addr_info:
+                ip = item[4][0]
+                unique_by_ip.setdefault(ip, item)
+            addr_info = list(unique_by_ip.values())
+            # Log resolved IPs
+            resolved_ips = [sockaddr[0] for _, _, _, _, sockaddr in addr_info]
             logger.debug(
-                f"Resolved {parsed.hostname} to {len(resolved_ips)} IP(s): {', '.join(resolved_ips)}"
+                f"Resolved {parsed.hostname} to {len(resolved_ips)} unique IP(s): {', '.join(resolved_ips)}"
             )

121-124: Cancel and drain remaining tasks to avoid “Task exception was never retrieved” warnings.

After cancellation, await the tasks with return_exceptions=True.

Apply this diff:

             finally:
                 # Cancel any remaining tasks
-                for task in tasks:
-                    if not task.done():
-                        task.cancel()
+                pending = [t for t in tasks if not t.done()]
+                for t in pending:
+                    t.cancel()
+                if pending:
+                    await asyncio.gather(*pending, return_exceptions=True)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d96305a and fa01f93.

📒 Files selected for processing (1)
  • packages/jumpstarter/jumpstarter/common/grpc.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
packages/jumpstarter/jumpstarter/common/grpc.py (1)
packages/jumpstarter/jumpstarter/common/exceptions.py (2)
  • ConfigurationError (48-51)
  • ConnectionError (42-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Redirect rules - jumpstarter-docs
  • GitHub Check: Header rules - jumpstarter-docs
  • GitHub Check: Pages changed - jumpstarter-docs
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: pytest-matrix (macos-15, 3.12)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: e2e
🔇 Additional comments (2)
packages/jumpstarter/jumpstarter/common/grpc.py (2)

16-17: LGTM on module logger.

Good addition for diagnostics; no concerns.


125-129: The review comment is incorrect; no changes are needed.

anyio.fail_after() raises the plain built-in TimeoutError, not anyio.exceptions.TimeoutError. The current code at line 127 already correctly catches the builtin TimeoutError exception. Additionally, anyio is already imported at line 12 (specifically from anyio import fail_after), so no additional import is necessary.

The existing exception handler is functioning as intended.

Likely an incorrect or invalid review comment.

Comment thread packages/jumpstarter/jumpstarter/common/grpc.py
When using insecure TLS, if the first address of a hostname DNS fails,
our code does not try with the other IPs as gRPC does later, so we cannot connect.

This patch attempts to retrieve the insecure cert in parallel over all the
available IP addresses.
@mangelajo mangelajo force-pushed the parallel-insecure-tls branch from fa01f93 to 1d42653 Compare October 20, 2025 18:07
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/jumpstarter/jumpstarter/common/grpc.py (1)

11-13: Catch AnyIO timeouts, not just built-in TimeoutError.

fail_after() raises anyio.exceptions.TimeoutError. The current except TimeoutError won’t catch it, so callers won’t get a mapped ConnectionError.

Apply:

 import grpc
-from anyio import fail_after
+from anyio import fail_after
+import anyio
@@
-    except TimeoutError as e:
+    except (asyncio.TimeoutError, anyio.exceptions.TimeoutError) as e:
         raise ConnectionError(f"Timeout connecting to {parsed.hostname}:{port}") from e

Also applies to: 126-127

🧹 Nitpick comments (4)
packages/jumpstarter/jumpstarter/common/grpc.py (4)

120-124: Await cancelled tasks to avoid “Task was destroyed but it is pending!” warnings.

After task.cancel(), await them with gather(..., return_exceptions=True) to let cancellations settle.

             finally:
                 # Cancel any remaining tasks
                 for task in tasks:
                     if not task.done():
                         task.cancel()
+                # Ensure cancelled tasks are awaited
+                await asyncio.gather(*(t for t in tasks if not t.done()), return_exceptions=True)

73-76: Deduplicate resolved IPs to avoid redundant connection attempts.

getaddrinfo may return duplicate addresses (IPv4/IPv6 or multiple rows per IP). Dedup to reduce load.

-            resolved_ips = [sockaddr[0] for _, _, _, _, sockaddr in addr_info]
+            resolved_ips = []
+            for _, _, _, _, sockaddr in addr_info:
+                ip = sockaddr[0]
+                if ip not in resolved_ips:
+                    resolved_ips.append(ip)
@@
-            for _family, _type, _proto, _canonname, sockaddr in addr_info:
-                ip_address = sockaddr[0]
-                task = asyncio.create_task(try_with_ip(ip_address))
+            for ip_address in resolved_ips:
+                task = asyncio.create_task(try_with_ip(ip_address))
                 tasks.append(task)

Also applies to: 91-95


43-45: Optional: also await writer.wait_closed() for graceful close.

Non-blocking close() is OK, but awaiting closure improves hygiene under high churn.

     finally:
-        writer.close()
+        writer.close()
+        try:
+            await writer.wait_closed()
+        except Exception:
+            pass  # best-effort

131-134: Broaden env var truthiness for insecure mode.

Accept common true values to ease ops usage.

-    if tls_config.insecure or os.getenv("JUMPSTARTER_GRPC_INSECURE") == "1":
+    insecure_env = os.getenv("JUMPSTARTER_GRPC_INSECURE", "").strip().lower()
+    if tls_config.insecure or insecure_env in {"1", "true", "yes", "on"}:
         return await _ssl_channel_credentials_insecure(target, timeout)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fa01f93 and 1d42653.

📒 Files selected for processing (1)
  • packages/jumpstarter/jumpstarter/common/grpc.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
packages/jumpstarter/jumpstarter/common/grpc.py (1)
packages/jumpstarter/jumpstarter/common/exceptions.py (2)
  • ConfigurationError (48-51)
  • ConnectionError (42-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Redirect rules - jumpstarter-docs
  • GitHub Check: Header rules - jumpstarter-docs
  • GitHub Check: Pages changed - jumpstarter-docs
  • GitHub Check: e2e
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: pytest-matrix (macos-15, 3.12)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
🔇 Additional comments (2)
packages/jumpstarter/jumpstarter/common/grpc.py (2)

16-17: Logger setup looks good.


35-43: Now let me check the actual code context in grpc.py to see what exception handling exists:

Based on my investigation, I have found critical information about the review comment. In Python 3.13, you can now get the raw TLS certificate chains from TLS connections via ssl.SSLSocket.get_verified_chain() and ssl.SSLSocket.get_unverified_chain() methods.

More importantly, looking at real-world implementations (gevent), for Python 3.13+ implementations, the methods still return Certificate objects that have a public_bytes() method, indicating that the APIs still return Certificate objects with methods to extract DER bytes, not raw bytes.

However, the current code is still problematic: it concatenates string results (root_certificates = ""), then calls .encode() at the end. This mixed str/bytes approach is fragile regardless of Python version. Additionally, anyio.fail_after() creates a context manager which raises a TimeoutError if it does not finish in time, and catching TimeoutError is the correct pattern.

Rewritten Review Comment:

Fix certificate extraction to handle both Python versions and ensure consistent PEM output.

The current code mixes str and bytes handling fragilly (concatenates strings, then encodes). While Python 3.13 added ssl.SSLSocket.get_unverified_chain() methods, real-world usage shows these still return Certificate objects. To ensure robust cross-version compatibility and correct output format, convert all certificates to bytes in one pass:

-        # Extract certificates
-        cert_chain = writer.get_extra_info("ssl_object")._sslobj.get_unverified_chain()
-        root_certificates = ""
-        for cert in cert_chain:
-            root_certificates += cert.public_bytes()
-        logger.debug(f"Successfully extracted {len(cert_chain)} certificate(s) from {ip_address}:{port}")
-
-        return root_certificates.encode()
+        # Extract certificates (support both Python ≤3.12 and 3.13+)
+        import ssl as ssl_module
+        ssl_obj = writer.get_extra_info("ssl_object")._sslobj
+        cert_chain = ssl_obj.get_unverified_chain()
+        root_certificates = b""
+        for cert in cert_chain:
+            # Handle both _ssl.Certificate objects and potential future bytes formats
+            if isinstance(cert, bytes):
+                root_certificates += ssl_module.DER_cert_to_PEM_cert(cert).encode()
+            else:
+                root_certificates += cert.public_bytes(ssl_module._ssl.ENCODING_PEM)
+        logger.debug(f"Successfully extracted {len(cert_chain)} certificate(s) from {ip_address}:{port}")
+
+        return root_certificates

@NickCao NickCao merged commit becf908 into main Oct 20, 2025
18 checks passed
@NickCao NickCao deleted the parallel-insecure-tls branch October 20, 2025 18:29
@jumpstarter-backport-bot
Copy link
Copy Markdown

Successfully created backport PR for release-0.7:

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants