Skip to content

unify TLS logic and caching in one place#8713

Merged
ajtmccarty merged 4 commits intodevelopfrom
ajtm-03262026-tls-registry
Mar 30, 2026
Merged

unify TLS logic and caching in one place#8713
ajtmccarty merged 4 commits intodevelopfrom
ajtm-03262026-tls-registry

Conversation

@ajtmccarty
Copy link
Copy Markdown
Contributor

@ajtmccarty ajtmccarty commented Mar 26, 2026

Why

TLS context creation logic was duplicated in this enterprise PR to add a new TCPTransport class for log forwarding. Consolidating the TLS context creation code in one place that can be reused is better than duplicating it. This also adds a TLS context caching component TlsContextRegistry so that any TLS context can be computed one time per worker and then cached

What changed

  • infrahub/tls/context_builder.py — new TlsContextBuilder class with a single build(insecure, ca_bundle, force_verify) static method containing the canonical CA bundle resolution logic (replaces both HTTPSettings.get_tls_context and build_tls_context)
  • infrahub/tls/registry.py — new TlsContextRegistry class that wraps TlsContextBuilder, caches built contexts by (insecure, ca_bundle, force_verify) key, and exposes a validate() method for startup-time pre-warming
  • workers/dependencies.pybuild_tls_registry() / get_tls_registry() added following the existing build_* / get_* singleton pattern; build_http_service and build_workflow now inject TlsContextRegistry into their respective components
  • HttpxAdapter — now accepts TlsContextRegistry as a constructor argument; @cached_property tls_context / tls_context_verified removed in favour of direct registry calls in verify_tls()
  • WorkflowWorkerExecution — class-level _http_adapter = HttpxAdapter() workaround removed (the comment explicitly anticipated this); now accepts TlsContextRegistry directly
  • HTTPSettings.get_tls_context — removed; the model validator calls TlsContextBuilder.build() directly for startup validation
  • build_tls_context in enterprise transport/base.py — removed; tcp.py now calls TlsContextBuilder.build() directly
  • server.py, workers/infrahub_async.py, cli/__init__.py, cli/git_agent.py — updated to pass a constructed HttpxAdapter into InfrahubServices.new()

No public API changes, no schema changes, no config/env changes.

Suggested review order

  1. infrahub/tls/context_builder.py and infrahub/tls/registry.py — the new primitives
  2. workers/dependencies.py — DI wiring
  3. services/adapters/http/httpx.py and services/adapters/workflow/worker.py — the two main consumers
  4. config.py — removal of get_tls_context
  5. enterprise/transport/base.py and tcp.py — removal of build_tls_context
  6. Tests

How to review

Focus on tls/registry.py — specifically that the cache key (insecure, ca_bundle, force_verify) is correct and that validate() surfaces SSL errors at startup rather than at first use.

How to test

# Unit tests for TlsContextBuilder
uv run invoke backend.test-unit -- -k tls

Impact & rollout

  • Backward compatibility: No breaking changes. HTTPSettings.get_tls_context() was internal; no public API removed.
  • Performance: SSL contexts are now shared across all components rather than built per-instance or per-reconnect.

Checklist

  • Tests added/updated
  • Changelog entry added (uv run towncrier create ...)
  • External docs updated (if user-facing or ops-facing change)
  • Internal .md docs updated (internal knowledge and AI code tools knowledge)
  • I have reviewed AI generated content

Summary by CodeRabbit

  • Refactor

    • Centralized TLS certificate/context handling into a registry and builder for more reliable and consistent TLS behavior across HTTP and workflow components.
    • Updated service startup to wire the new TLS/HTTP dependencies throughout the app, improving TLS validation and reuse.
  • Tests

    • Added unit tests for TLS context construction and updated integration tests to exercise the new TLS wiring.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2b0e4723-f4cd-4583-b547-6fb90011f1ca

📥 Commits

Reviewing files that changed from the base of the PR and between 92447bd and dbd6daa.

📒 Files selected for processing (1)
  • backend/infrahub/config.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/infrahub/config.py

Walkthrough

TLS context creation was extracted from HTTPSettings into two new components: TlsContextBuilder and TlsContextRegistry. HTTPSettings.get_tls_context was removed. HttpxAdapter and WorkflowWorkerExecution were changed to accept a TlsContextRegistry and obtain SSL contexts from it. Service wiring was updated so InfrahubServices is constructed with an HTTP dependency and adapters are instantiated with a TLS registry; CLI, server, and worker initialization paths were updated to pass these dependencies. New registry builder and dependency provider functions were added and tests adjusted accordingly.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'unify TLS logic and caching in one place' accurately captures the main objective of the PR, which is consolidating TLS context creation logic and adding caching via TlsContextRegistry.
Description check ✅ Passed The PR description comprehensively covers the required sections: Why (duplication problem, consolidation goal), What changed (detailed changes by component), How to review (focus areas), How to test (specific commands), and Impact & rollout (backward compatibility and performance notes).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added the group/backend Issue related to the backend (API Server, Git Agent) label Mar 26, 2026
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented Mar 26, 2026

Merging this PR will not alter performance

✅ 12 untouched benchmarks


Comparing ajtm-03262026-tls-registry (95bd587) with develop (106e7eb)

Open in CodSpeed


workflow = config.OVERRIDE.workflow or (
WorkflowWorkerExecution()
WorkflowWorkerExecution(tls_registry=build_tls_registry())
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WorkflowWorkerExecution had a confusing kludge where it instantiated HttpxAdapter as a class variable so that it could use HttpxAdapeter.verify_tls() in a context manager for running a deployment.
passing in the new TlxContextRegistry lets us remove that weird workaround

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good, having said that, we should probably remove this cli command git_agent.py... but should be done in another PR as it's not related to this change.


_settings: config.HTTPSettings | None = None
def __init__(self, tls_registry: TlsContextRegistry) -> None:
self._tls_registry = tls_registry
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new dependency that needs to be injected. allows removing the @cached_propertys below and moves them to the TlsContextRegistry which is now responsible for caching TLS context instances

def build_tls_registry() -> TlsContextRegistry:
if "tls_registry" not in _singletons:
_singletons["tls_registry"] = TlsContextRegistry()
return _singletons["tls_registry"]
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new singleton so everywhere can use the same TlsContextRegistry in a given worker


return self

def get_tls_context(self, force_verify: bool = False) -> ssl.SSLContext:
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to TlsContextBuilder.build

with patch.dict(os.environ, {"INFRAHUB_STORAGE_MAX_FILE_SIZE": "75"}):
assert StorageSettings().max_file_size == 75
assert isinstance(SETTINGS.storage.max_file_size, int)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to their own file b/c they are no longer part of config

@ajtmccarty ajtmccarty marked this pull request as ready for review March 26, 2026 21:38
@ajtmccarty ajtmccarty requested a review from a team as a code owner March 26, 2026 21:38
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/infrahub/config.py (1)

537-544: ⚠️ Potential issue | 🟠 Major

Force-verify the configured CA bundle during startup validation.

TlsContextBuilder.build() short-circuits when tls_insecure=True, so this validator never parses tls_ca_bundle. A malformed bundle or path then survives boot and only fails later on the first forced-verification request in backend/infrahub/services/adapters/http/httpx.py:55-60; the same short-circuit also means backend/infrahub/tls/registry.py:23-28 cannot prewarm that path today. If early SSL failures are part of the contract, validate the forced-verification variant here as well.

🐛 Possible fix
-            TlsContextBuilder.build(insecure=self.tls_insecure, ca_bundle=self.tls_ca_bundle)
+            TlsContextBuilder.build(
+                insecure=self.tls_insecure,
+                ca_bundle=self.tls_ca_bundle,
+                force_verify=bool(self.tls_ca_bundle),
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/infrahub/config.py` around lines 537 - 544, The validator
set_tls_context currently calls TlsContextBuilder.build with
insecure=self.tls_insecure which short-circuits when tls_insecure=True and skips
parsing tls_ca_bundle; change the validation to explicitly attempt a
forced-verification build (call TlsContextBuilder.build with insecure=False)
when a tls_ca_bundle is configured (or always attempt a forced-verify build in
addition to the original call) so malformed bundle paths are caught at startup,
and keep the existing except ssl.SSLError -> raise ValueError behavior
referencing tls_ca_bundle; use the existing set_tls_context and
TlsContextBuilder.build symbols to locate where to add the extra forced-verify
build attempt.
🧹 Nitpick comments (2)
backend/infrahub/tls/registry.py (1)

14-21: Canonicalize file-path CA bundles before using them as cache keys.

key = (insecure, ca_bundle, force_verify) is keyed on the caller's raw string, so the same CA file can be cached multiple times via relative, absolute, or symlinked spellings. That weakens the per-worker reuse this registry is supposed to provide and makes prewarming depend on how the path was written. Normalize existing file paths before constructing the key.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/infrahub/tls/registry.py` around lines 14 - 21, The cache key
currently uses the raw ca_bundle string, so normalize the CA file path before
building the key in get; if ca_bundle is not None, resolve it to a canonical
absolute path (e.g., via pathlib.Path(ca_bundle).resolve() or os.path.realpath)
and convert to a stable string, otherwise keep None, then use that normalized
value in the key tuple (insecure, normalized_ca_bundle, force_verify) and
proceed to read/create the SSL context via TlsContextBuilder.build and store it
in self._cache to avoid duplicate entries caused by relative paths or symlinks.
backend/tests/unit/tls/test_context_builder.py (1)

9-79: Add direct tests for the new registry contract.

These cases exercise TlsContextBuilder, but they never instantiate TlsContextRegistry. A regression in backend/infrahub/tls/registry.py—for example losing force_verify from the cache key or not validating the forced-verification path—would still pass. Please add a small unit suite around TlsContextRegistry.get() and validate() so the new caching layer is covered directly.

As per coding guidelines, "Write tests for new functionality in Python backend code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/unit/tls/test_context_builder.py` around lines 9 - 79, Add
direct tests for the TlsContextRegistry by creating a TlsContextRegistry
instance and exercising its get() and validate() methods (in addition to
existing TlsContextBuilder tests) to cover the caching layer and
forced-verification behavior; specifically, call TlsContextRegistry.get() with
different parameter combinations (insecure True/False, force_verify True/False,
ca_bundle as path and as PEM string) and assert that (1) entries keyed
differently by force_verify produce distinct cached contexts, (2) validate()
enforces the forced-verification path (i.e., returns a context with verify_mode
== ssl.CERT_REQUIRED and check_hostname True when force_verify=True and a valid
ca_bundle is provided), and (3) long PEM strings that raise OSError when treated
as paths are handled as PEM content by validate(); reference
TlsContextRegistry.get, TlsContextRegistry.validate, and TlsContextBuilder.build
when adding these assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@backend/infrahub/config.py`:
- Around line 537-544: The validator set_tls_context currently calls
TlsContextBuilder.build with insecure=self.tls_insecure which short-circuits
when tls_insecure=True and skips parsing tls_ca_bundle; change the validation to
explicitly attempt a forced-verification build (call TlsContextBuilder.build
with insecure=False) when a tls_ca_bundle is configured (or always attempt a
forced-verify build in addition to the original call) so malformed bundle paths
are caught at startup, and keep the existing except ssl.SSLError -> raise
ValueError behavior referencing tls_ca_bundle; use the existing set_tls_context
and TlsContextBuilder.build symbols to locate where to add the extra
forced-verify build attempt.

---

Nitpick comments:
In `@backend/infrahub/tls/registry.py`:
- Around line 14-21: The cache key currently uses the raw ca_bundle string, so
normalize the CA file path before building the key in get; if ca_bundle is not
None, resolve it to a canonical absolute path (e.g., via
pathlib.Path(ca_bundle).resolve() or os.path.realpath) and convert to a stable
string, otherwise keep None, then use that normalized value in the key tuple
(insecure, normalized_ca_bundle, force_verify) and proceed to read/create the
SSL context via TlsContextBuilder.build and store it in self._cache to avoid
duplicate entries caused by relative paths or symlinks.

In `@backend/tests/unit/tls/test_context_builder.py`:
- Around line 9-79: Add direct tests for the TlsContextRegistry by creating a
TlsContextRegistry instance and exercising its get() and validate() methods (in
addition to existing TlsContextBuilder tests) to cover the caching layer and
forced-verification behavior; specifically, call TlsContextRegistry.get() with
different parameter combinations (insecure True/False, force_verify True/False,
ca_bundle as path and as PEM string) and assert that (1) entries keyed
differently by force_verify produce distinct cached contexts, (2) validate()
enforces the forced-verification path (i.e., returns a context with verify_mode
== ssl.CERT_REQUIRED and check_hostname True when force_verify=True and a valid
ca_bundle is provided), and (3) long PEM strings that raise OSError when treated
as paths are handled as PEM content by validate(); reference
TlsContextRegistry.get, TlsContextRegistry.validate, and TlsContextBuilder.build
when adding these assertions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e130ebbc-05f8-4c17-a339-a9698a18a4a8

📥 Commits

Reviewing files that changed from the base of the PR and between 270cfb1 and 92447bd.

📒 Files selected for processing (19)
  • backend/infrahub/cli/__init__.py
  • backend/infrahub/cli/git_agent.py
  • backend/infrahub/cli/tasks.py
  • backend/infrahub/config.py
  • backend/infrahub/server.py
  • backend/infrahub/services/__init__.py
  • backend/infrahub/services/adapters/http/httpx.py
  • backend/infrahub/services/adapters/workflow/worker.py
  • backend/infrahub/tls/__init__.py
  • backend/infrahub/tls/context_builder.py
  • backend/infrahub/tls/registry.py
  • backend/infrahub/workers/dependencies.py
  • backend/infrahub/workers/infrahub_async.py
  • backend/tests/integration/services/adapters/http/test_httpx.py
  • backend/tests/integration/services/adapters/workflow/test_workflow_execution.py
  • backend/tests/unit/config/test_config.py
  • backend/tests/unit/tls/__init__.py
  • backend/tests/unit/tls/test_context_builder.py
  • pyproject.toml
💤 Files with no reviewable changes (1)
  • backend/tests/unit/config/test_config.py

Copy link
Copy Markdown
Contributor

@ogenstad ogenstad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great improvement!

One additional thing to highlight could potentially be this part:

def build_client() -> InfrahubClient:
    client_config = Config(address=config.SETTINGS.main.internal_address, retry_on_failure=True)
    client_config.set_ssl_context(context=get_http().verify_tls())
    client = InfrahubClient(config=client_config)
    # Populate client schema cache using our internal schema cache
    if registry.schema:
        for branch in registry.schema.get_branches():
            client.schema.set_cache(schema=registry.schema.get_sdk_schema_branch(name=branch), branch=branch)

    return client

Reaching into get_http().verify_tls() here still feels somewhat hacky. On the other hand even if we were to change this we'd still be using the settings from the HTTP adapter. Have a look at this and see if you want to change anything, otherwise we can leave it as is for now. I think the next step after this is merged would be to introduce a common TLS setting across Infrahub which will be the default configuration option that could be overridden by a specific adapter if needed.


workflow = config.OVERRIDE.workflow or (
WorkflowWorkerExecution()
WorkflowWorkerExecution(tls_registry=build_tls_registry())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good, having said that, we should probably remove this cli command git_agent.py... but should be done in another PR as it's not related to this change.

Copy link
Copy Markdown
Contributor

@fatih-acar fatih-acar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this long awaited refactor, LGTM!

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 19 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/infrahub/tls/registry.py">

<violation number="1" location="backend/infrahub/tls/registry.py:23">
P2: `validate()` omits `force_verify`, so when `insecure=True` the builder short-circuits to an unverified context and never touches the CA bundle. A broken CA bundle won't be caught at startup — the error will surface only at first request time with `force_verify=True`, defeating the purpose of this method.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

)
return self._cache[key]

def validate(self, insecure: bool = False, ca_bundle: str | None = None) -> None:
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: validate() omits force_verify, so when insecure=True the builder short-circuits to an unverified context and never touches the CA bundle. A broken CA bundle won't be caught at startup — the error will surface only at first request time with force_verify=True, defeating the purpose of this method.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/infrahub/tls/registry.py, line 23:

<comment>`validate()` omits `force_verify`, so when `insecure=True` the builder short-circuits to an unverified context and never touches the CA bundle. A broken CA bundle won't be caught at startup — the error will surface only at first request time with `force_verify=True`, defeating the purpose of this method.</comment>

<file context>
@@ -0,0 +1,28 @@
+            )
+        return self._cache[key]
+
+    def validate(self, insecure: bool = False, ca_bundle: str | None = None) -> None:
+        """Build and cache the SSL context for the given config, raising ValueError on failure."""
+        try:
</file context>
Fix with Cubic

@ajtmccarty ajtmccarty merged commit 5574812 into develop Mar 30, 2026
50 checks passed
@ajtmccarty ajtmccarty deleted the ajtm-03262026-tls-registry branch March 30, 2026 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

group/backend Issue related to the backend (API Server, Git Agent)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants