fix(outlook): verify Exchange TLS with an in-memory CA to remove the cert-file race #4094
Concurrent Exchange Server syncs shared a single on-disk outlook_cert.cer file, causing intermittent TLS failures when one sync truncated the file while another read it. Load the PEM into an ssl.SSLContext and inject it via a custom HTTP adapter instead of writing to disk. Co-authored-by: Cursor <cursoragent@cursor.com>
- Restore InMemoryCAAdapter.ssl_context (not just HTTP_ADAPTER_CLS) in the reset fixture to prevent cross-test global-state leakage.
- Add unit tests asserting the adapter forwards its ssl_context into init_poolmanager/proxy_manager_for, and omits it when unset.

Co-authored-by: Cursor <cursoragent@cursor.com>
Declaring the class attribute as bare None made pyright infer its type as None, so assigning an ssl.SSLContext failed typecheck. Annotate it as ssl.SSLContext | None to fix the lint error. Co-authored-by: Cursor <cursoragent@cursor.com>
- Introduced a new exception, SSLCertificateError, to handle cases where SSL is enabled but no CA certificate is provided or the configured certificate is invalid.
- Updated the get_user_accounts method to raise SSLCertificateError with clear messages for misconfigurations, ensuring that connection-wide SSL issues fail loudly rather than silently.
- Enhanced unit tests to verify that the appropriate exceptions are raised under these conditions.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ndling

- Updated the SSL context creation to explicitly raise a ValueError when no SSL certificate is provided, ensuring that misconfigurations are caught early.
- Enhanced unit tests to cover scenarios where invalid or markerless certificates are provided, ensuring that appropriate exceptions are raised.
- Improved documentation comments to clarify the SSL context building process and its implications for certificate validation.

Co-authored-by: Cursor <cursoragent@cursor.com>
…on comments

- Updated the `close` method to unbind the LDAP connection only if it was created, preventing potential errors during cleanup.
- Enhanced comments in the `get_user_accounts` method to clarify the implications of SSL configuration and certificate validation, ensuring better understanding for future maintenance.
The LDAP connection cleanup is a pre-existing concern unrelated to the in-memory SSL certificate change, so it is removed from this PR to keep the scope focused on the TLS/cert rework. Co-authored-by: Cursor <cursoragent@cursor.com>
…ndling

- Updated comments in `client.py` and `datasource.py` to clarify the implications of SSL configuration and certificate validation.
- Enhanced error handling to ensure that missing or unusable CA certificates raise appropriate exceptions, preventing silent failures.
- Improved unit tests to verify that SSL misconfigurations are handled correctly, ensuring robust error reporting.

Co-authored-by: Cursor <cursoragent@cursor.com>
    BaseProtocol.HTTP_ADAPTER_CLS = (
        RootCAAdapter if self.ssl_enabled else NoVerifyHTTPAdapter
    )
    # exchangelib applies HTTP_ADAPTER_CLS (and our CA context) process-wide;
Would that be a problem if you run multiple Outlook syncs for different tenants in one process?
Yes, unfortunately. Moving the certificate from the file system into memory prevents a connector from conflicting with itself, but two connectors configured with different certificates would still conflict with each other. That was also true with the file, I must say.
I don't believe there is an easy fix for that.
Cool, if it's not a regression we can go ahead with it.
It's also worth adding this to the known issues in our docs. Can you do that?
Sure, I'll make sure to update the known issues.
…e the cert-file race (#4094) (#4114)

Backports the following commits to 9.3:
- fix(outlook): verify Exchange TLS with an in-memory CA to remove the cert-file race (#4094)

Co-authored-by: Jan-Kazlouski-elastic <jan.kazlouski@elastic.co>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

…e the cert-file race (#4094) (#4115)

Backports the following commits to 9.4:
- fix(outlook): verify Exchange TLS with an in-memory CA to remove the cert-file race (#4094)

Co-authored-by: Jan-Kazlouski-elastic <jan.kazlouski@elastic.co>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
…ve the cert-file race (#4094) (#4116)

Manual backport of #4094 to `8.19`.

The [auto-backport bot failed for 8.19](#4094) ("Commit could not be cherrypicked due to conflicts") because the connector layout differs between branches: `main` splits Outlook into a package (`app/connectors_service/connectors/sources/outlook/{client,constants,datasource}.py`) while `8.19` keeps it as a single module (`connectors/sources/outlook.py`). The change has therefore been hand-ported; it is **functionally identical** to the merged PR.

## Summary

- Load the configured CA directly into an `ssl.SSLContext` via `ssl.create_default_context(cadata=...)` and inject it into `urllib3` through a new `InMemoryCAAdapter` (`init_poolmanager`/`proxy_manager_for`).
- Remove the on-disk certificate machinery (`ManageCertificate`, `RootCAAdapter`, `SSLFailed`, `CERT_FILE` / `outlook_cert.cer`); `ExchangeUsers.close()` no longer cleans up a file.
- SSL enabled with a missing/unusable CA now raises `SSLCertificateError` instead of silently downgrading to an unverified/system-CA connection.
- `validate_config` confirms the cert actually loads (via `create_default_context`) so misconfig fails up front rather than mid-sync.

This structurally eliminates the shared-file race behind the intermittent `NO_CERTIFICATE_OR_CRL_FOUND` SSL error. TLS verification behaviour is otherwise unchanged (`CERT_REQUIRED` + hostname check).

## Test plan

- [x] `tests/sources/test_outlook.py`: **48 passed** (includes the new in-memory adapter, SSL-context, error-path, and no-cert-file-written tests ported from the original PR).
- [x] `ruff check` + `ruff format --check` clean on changed files.

## Release Note

**Outlook Server connector**: fixed an intermittent `NO_CERTIFICATE_OR_CRL_FOUND` SSL error that could abort syncs when multiple syncs ran close together. The configured CA certificate is now verified entirely in memory instead of through a shared temporary file, removing the race condition. TLS verification behaviour is otherwise unchanged.

Backport of #4094.

Made with [Cursor](https://cursor.com)

Co-authored-by: Cursor <cursoragent@cursor.com>
Backport of the connectors known-issues entry for the intermittent NO_CERTIFICATE_OR_CRL_FOUND SSL error on the Outlook connector, caused by a shared on-disk certificate file race and fixed in elastic/connectors#4094 (shipped in 8.19.18).
Add a known-issues entry for the intermittent NO_CERTIFICATE_OR_CRL_FOUND SSL error on the Outlook connector, caused by a shared on-disk certificate file race and fixed in elastic/connectors#4094 (8.19.18, 9.3.7, 9.4.3, 9.5.0).
Relates to https://github.com/elastic/sdh-search/issues/1898
Follow-up to #4085. That PR fixed the case where SSL was enabled with an empty certificate. This PR addresses the remaining, intermittent failure the customer reported:

`SSLError: [X509] no certificate or crl found (_ssl.c:4279) (NO_CERTIFICATE_OR_CRL_FOUND)`

which appears "once in a while" with no configuration change between runs: a classic race condition.
Root cause: shared certificate file race
The Outlook Server connector verified the Exchange server's TLS certificate by writing the configured CA to a fixed file on disk (`outlook_cert.cer` in the process working directory) and pointing `requests`/`urllib3` at that path:

- `ManageCertificate.store_certificate()` opened the file in `"w"` mode, which truncates it to zero bytes before the new contents are written.
- `RootCAAdapter.cert_verify()` handed that single shared path to the TLS layer.
- `ExchangeUsers.close()` deleted the file at the end of a sync.

Because the filename is process-global and shared by every sync, concurrent or overlapping Outlook syncs race on it. The failure window is small but real:

- … `"w"` mode, truncating it to empty.
- … `NO_CERTIFICATE_OR_CRL_FOUND`.

A second variant: Sync A finishes and deletes the file while Sync B still expects it to exist for a new connection. Either way the error is timing-dependent, non-reproducible, and unrelated to the (unchanged) configuration, exactly matching the customer's description.
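The truncation window can be replayed deterministically with the standard library (a real race needs two concurrent syncs and unlucky timing). In this sketch `read_during_rewrite` is a hypothetical helper and the temporary file only mirrors the `outlook_cert.cer` name. The exact exception text depends on the Python and OpenSSL build, so the code catches `OSError`, of which `ssl.SSLError` is a subclass.

```python
import os
import ssl
import tempfile


def read_during_rewrite(cert_path: str):
    """Replay Sync A building a TLS context while Sync B rewrites the shared file."""
    # Sync B: open(..., "w") truncates the file to zero bytes immediately,
    # before any new PEM data has been written.
    writer = open(cert_path, "w")
    try:
        size_seen = os.path.getsize(cert_path)
        try:
            # Sync A: load the CA from the (now empty) shared path.
            ssl.create_default_context(cafile=cert_path)
            error = None
        except OSError as exc:  # ssl.SSLError is a subclass of OSError
            error = exc
    finally:
        writer.close()
    return size_seen, error


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "outlook_cert.cer")
    with open(path, "w") as f:
        f.write("contents from an earlier write\n")
    size_seen, error = read_during_rewrite(path)

print(size_seen)  # 0
print(type(error).__name__)
```

Because `"w"` mode truncates at open time rather than at write time, any reader that arrives between the open and the write sees an empty CA file, which OpenSSL typically reports as "no certificate or crl found".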
Fix: keep the CA in memory, never touch disk
The certificate is now loaded straight from the configured PEM string into an `ssl.SSLContext` via `ssl.create_default_context(cadata=self.ssl_ca)`, and a small `requests` adapter (`InMemoryCAAdapter`) injects that context into `urllib3` through `init_poolmanager`/`proxy_manager_for`. No file is created, read, or deleted, so the shared-file race is structurally impossible.
Behavioural notes:

- `create_default_context()` returns a context with `verify_mode = CERT_REQUIRED` and `check_hostname = True`, so a bad, expired, or mismatched server certificate still fails the handshake.
- … still falls back to `NoVerifyHTTPAdapter` and logs a clear warning, matching the behaviour introduced in fix(outlook): skip mailbox-less accounts and stop SSL misconfig from aborting sync #4085.
- `ManageCertificate`, `RootCAAdapter`, the `SSLFailed` exception and the `CERT_FILE` constant are removed; `close()` no longer needs to clean up a file.

Testing
Because there is no containerizable Microsoft Exchange/EWS + Active Directory image, this connector cannot use the repo's Docker-based `ftest` harness. Instead, the change was validated locally in four layers, from static checks up to real TLS handshakes and concurrency, using freshly generated self-signed certificates and the production code (no Exchange server required). All checks passed.
Layer 1: Static checks + automated tests
- … `ManageCertificate`, `RootCAAdapter`, `SSLFailed`, `CERT_FILE`, or `outlook_cert.cer` symbols anywhere in the repo.
- `tests/sources/test_outlook.py` (48) + `tests/test_utils.py` (110) = 158 passed.

Layer 2: Real TLS handshake through the production adapter
Drove `InMemoryCAAdapter` against a local HTTPS server using a self-signed certificate over real sockets (no mocks):

- … `CERT_REQUIRED` + `check_hostname`.
- … `SSLError`.
- … `outlook_cert.cer` written to disk.

Layer 3: Connector code path with the real `ssl` module

Exercised `ExchangeUsers.get_user_accounts()` end-to-end with the real `ssl.create_default_context` (only AD/LDAP lookups and `exchangelib.Account` mocked):

- … `InMemoryCAAdapter`, and the context it builds verifies a live TLS server.
- … (`get_pem_format`) → `SSLCertificateError`, with no silent fallback to system CAs.
- … `SSLCertificateError`.
- … `NoVerifyHTTPAdapter` selected.

Layer 4: Concurrency & filesystem independence (the root-cause scenario)
- … `outlook_cert.cer` ever appears during the run: the file race behind `NO_CERTIFICATE_OR_CRL_FOUND` is structurally eliminated.

Raw output of the local verification run (Layers 2–4)
Checklists
Pre-Review Checklist
- … `config.yml.example`)
- … `v7.13.2`, `v7.14.0`, `v8.0.0`)

Changes Requiring Extra Attention
- … `ssl.SSLContext`. Reviewers should confirm verification is still enforced (`CERT_REQUIRED` + hostname check) and that the no-certificate fallback to unverified connections remains the desired posture.

Related Pull Requests
Release Note
Outlook Server connector: fixed an intermittent `NO_CERTIFICATE_OR_CRL_FOUND` SSL error that could abort syncs when multiple syncs ran close together. The configured CA certificate is now verified entirely in memory instead of through a shared temporary file, removing the race condition. TLS verification behaviour is otherwise unchanged.