fix(outlook): verify Exchange TLS with an in-memory CA to remove the cert-file race #4094

Merged
Jan-Kazlouski-elastic merged 11 commits into main from jan-kazlouski/outlook-in-memory-ssl-cert
Jun 29, 2026

@Jan-Kazlouski-elastic Jan-Kazlouski-elastic commented Jun 23, 2026

Contributor

Relates to https://github.com/elastic/sdh-search/issues/1898

Follow-up to #4085. That PR fixed the case where SSL was enabled with an empty
certificate. This PR addresses the remaining, intermittent failure the
customer reported: SSLError: [X509] no certificate or crl found (_ssl.c:4279) (NO_CERTIFICATE_OR_CRL_FOUND) that appears "once in a while"
with no configuration change between runs — a classic race condition.

Root cause — shared certificate file race

The Outlook Server connector verified the Exchange server's TLS certificate by
writing the configured CA to a fixed file on disk (outlook_cert.cer in the
process working directory) and pointing requests/urllib3 at that path:

  • ManageCertificate.store_certificate() opened the file in "w" mode, which
    truncates it to zero bytes before the new contents are written.
  • RootCAAdapter.cert_verify() handed that single shared path to the TLS layer.
  • ExchangeUsers.close() deleted the file at the end of a sync.

Because the filename is process-global and shared by every sync, concurrent or
overlapping Outlook syncs race on it. The failure window is small but real:

  1. Sync A is mid-handshake and the TLS layer reads the CA file.
  2. Sync B starts and opens the same file in "w" mode, truncating it to empty.
  3. Sync A's read now sees an empty file → NO_CERTIFICATE_OR_CRL_FOUND.

A second variant: Sync A finishes and deletes the file while Sync B still
expects it to exist for a new connection. Either way the error is
timing-dependent, not reproducible on demand, and unrelated to the (unchanged)
configuration, which matches the customer's description exactly.
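
The truncation window in steps 1 to 3 can be reproduced without any threads, since the empty state exists from the moment the file is opened in "w" mode. The following is a minimal sketch using only the standard library; the file name and both helper names are hypothetical and are not taken from the connector code.

```python
import os
import ssl
import tempfile
from typing import Optional


def size_while_open_for_write(path: str) -> int:
    """Open `path` in "w" mode and return its size before any write happens."""
    with open(path, "w"):
        # "w" truncates on open, so a concurrent reader sees zero bytes here.
        return os.path.getsize(path)


def ca_load_error(path: str) -> Optional[str]:
    """Try to load `path` as a CA bundle; return the error text, or None on success."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    try:
        context.load_verify_locations(cafile=path)
    except ssl.SSLError as exc:
        return str(exc)
    return None


with tempfile.TemporaryDirectory() as workdir:
    cert_path = os.path.join(workdir, "demo_cert.cer")  # hypothetical file name
    with open(cert_path, "w") as handle:
        handle.write("placeholder for the PEM that an earlier sync wrote\n")

    # Simulates Sync B reopening the file: the old contents are already gone.
    print(size_while_open_for_write(cert_path))
    # Simulates Sync A's TLS layer reading the now-empty file.
    print(ca_load_error(cert_path))
```

The second call returns an error string rather than None because an empty file contains no certificate; the exact wording depends on the Python and OpenSSL build.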

Fix — keep the CA in memory, never touch disk

The certificate is now loaded straight from the configured PEM string into an
ssl.SSLContext via ssl.create_default_context(cadata=self.ssl_ca), and a
small requests adapter (InMemoryCAAdapter) injects that context into
urllib3 through init_poolmanager/proxy_manager_for. No file is created,
read, or deleted, so the shared-file race is structurally impossible.
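
For readers unfamiliar with the pattern, here is a minimal sketch of an adapter that forwards a prebuilt context into urllib3. It assumes the third-party requests package is installed; the class name InMemoryCAAdapterSketch is hypothetical and this is not the PR's actual implementation, only an illustration of the init_poolmanager/proxy_manager_for hook points named above.

```python
import ssl
from typing import Optional

from requests.adapters import HTTPAdapter


class InMemoryCAAdapterSketch(HTTPAdapter):
    """Illustrative adapter: hands a prebuilt SSLContext to urllib3."""

    # Class-level so that code which only receives the class (not an instance)
    # can still be configured; None means "add nothing".
    ssl_context: Optional[ssl.SSLContext] = None

    def init_poolmanager(self, *args, **kwargs):
        if self.ssl_context is not None:
            kwargs["ssl_context"] = self.ssl_context
        return super().init_poolmanager(*args, **kwargs)

    def proxy_manager_for(self, *args, **kwargs):
        if self.ssl_context is not None:
            kwargs["ssl_context"] = self.ssl_context
        return super().proxy_manager_for(*args, **kwargs)


# Set the context before instantiating: HTTPAdapter.__init__ builds the pool manager.
InMemoryCAAdapterSketch.ssl_context = ssl.create_default_context()
adapter = InMemoryCAAdapterSketch()
print("ssl_context" in adapter.poolmanager.connection_pool_kw)
```

Because the context lives on the class, every adapter instance created afterwards shares it. That keeps the sketch compatible with libraries that instantiate the adapter class themselves, at the cost of the setting being process-wide.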

Behavioural notes:

  • TLS verification is genuinely enforced. create_default_context()
    returns a context with verify_mode = CERT_REQUIRED and
    check_hostname = True, so a bad/expired/mismatched server certificate still
    fails the handshake.
  • No-certificate fallback is preserved. SSL enabled with no certificate
    still falls back to NoVerifyHTTPAdapter and logs a clear warning, matching
    the behaviour introduced in #4085 (fix(outlook): skip mailbox-less accounts and stop SSL misconfig from aborting sync).
  • ManageCertificate, RootCAAdapter, the SSLFailed exception and the
    CERT_FILE constant are removed; close() no longer needs to clean up a file.
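
The verification settings mentioned in the first note can be inspected directly with the standard library. The sketch below also guards against one pitfall of ssl.create_default_context(): an empty cadata string is falsy, so the function would load the system default CAs instead of raising. The helper names and the choice to raise ValueError are assumptions made for illustration.

```python
import ssl


def build_ca_context(ca_pem: str) -> ssl.SSLContext:
    """Return a verifying client context that trusts only the given PEM CA."""
    if not ca_pem.strip():
        # An empty cadata is falsy, so create_default_context() would fall
        # back to the system CAs; reject it explicitly (assumed policy).
        raise ValueError("no CA certificate configured")
    # Raises ssl.SSLError if the text contains no loadable certificate.
    return ssl.create_default_context(cadata=ca_pem)


def describe(context: ssl.SSLContext) -> dict:
    """Summarise the two settings that decide whether verification is enforced."""
    return {
        "verify_mode": context.verify_mode.name,
        "check_hostname": context.check_hostname,
    }


# No test certificate is bundled here, so inspect a default context instead.
print(describe(ssl.create_default_context()))

for bad_value in ("", "not a certificate"):
    try:
        build_ca_context(bad_value)
    except (ValueError, ssl.SSLError) as exc:
        print(type(exc).__name__)
```

With a real PEM string, build_ca_context() returns a context that reports the same two settings as the default one while trusting only the supplied CA.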

Testing

Because there is no containerizable Microsoft Exchange/EWS + Active Directory
image, this connector cannot use the repo's Docker-based ftest harness. Instead,
the change was validated locally in four layers, from static checks up to real
TLS handshakes and concurrency, using freshly generated self-signed
certificates and the production code (no Exchange server required). All checks
passed.

Layer 1 — Static checks + automated tests

  • No remaining references to the removed ManageCertificate, RootCAAdapter, SSLFailed, CERT_FILE, or outlook_cert.cer symbols anywhere in the repo.
  • Unit/integration suites green: tests/sources/test_outlook.py (48) + tests/test_utils.py (110) = 158 passed.

Layer 2 — Real TLS handshake through the production adapter

Drove InMemoryCAAdapter against a local HTTPS server using a self-signed certificate over real sockets (no mocks):

  • Correct in-memory CA → connection verified (HTTP 200).
  • Context enforces CERT_REQUIRED + check_hostname.
  • Wrong in-memory CA → handshake rejected with SSLError.
  • No outlook_cert.cer written to disk.

Layer 3 — Connector code path with the real ssl module

Exercised ExchangeUsers.get_user_accounts() end-to-end with the real ssl.create_default_context (only AD/LDAP lookups and exchangelib.Account mocked):

  • Connector selects InMemoryCAAdapter, and the context it builds verifies a live TLS server.
  • Marker-less junk cert (reduced to empty by get_pem_format) → SSLCertificateError, with no silent fallback to system CAs.
  • Non-empty but unloadable PEM → SSLCertificateError.
  • SSL disabled → NoVerifyHTTPAdapter selected.
  • No cert file written.

Layer 4 — Concurrency & filesystem independence (the root-cause scenario)

  • SSL setup succeeds in a read-only working directory and writes nothing — proving the disk dependency is gone.
  • 4 concurrent SSL syncs (different CAs) complete without error, and no shared outlook_cert.cer ever appears during the run — the file race behind NO_CERTIFICATE_OR_CRL_FOUND is structurally eliminated.
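
A check of the kind described in Layer 4 could be sketched as follows. This is not the script that produced the output in this PR; the helper name file_seen_during and the use of default contexts in place of real syncs are assumptions. It runs tasks in parallel while a watcher thread samples for the file, then does a final existence check.

```python
import os
import ssl
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def file_seen_during(tasks: Sequence[Callable[[], object]], path: str) -> bool:
    """Run tasks concurrently; report whether `path` existed at any sampled moment."""
    seen = threading.Event()
    done = threading.Event()

    def watch() -> None:
        while not done.is_set():
            if os.path.exists(path):
                seen.set()
            done.wait(0.001)  # sample roughly every millisecond

    watcher = threading.Thread(target=watch)
    watcher.start()
    try:
        with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
            for future in [pool.submit(task) for task in tasks]:
                future.result()  # re-raise any task failure
    finally:
        done.set()
        watcher.join()
    # The final check catches a file that was created and left behind.
    return seen.is_set() or os.path.exists(path)


with tempfile.TemporaryDirectory() as workdir:
    cert_path = os.path.join(workdir, "outlook_cert.cer")
    # Stand-in for four syncs: each builds its own context purely in memory.
    tasks = [ssl.create_default_context for _ in range(4)]
    print(file_seen_during(tasks, cert_path))
```

Sampling can miss a file that exists only briefly, so a negative result from this kind of watcher is supporting evidence rather than proof; the stronger guarantee comes from the code no longer containing any write to that path.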

Raw output of the local verification run (Layers 2–4)

=== Layer 2: production InMemoryCAAdapter over real TLS ===
  [PASS] correct in-memory CA -> TLS verified (HTTP 200)
  [PASS] context enforces verification (CERT_REQUIRED + check_hostname)
  [PASS] wrong in-memory CA -> handshake rejected (SSLError)
  [PASS] no outlook_cert.cer written to disk

=== Layer 3: connector code path with REAL ssl module ===
  [PASS] connector selected InMemoryCAAdapter
  [PASS] connector-built context verifies live server
  [PASS] markerless cert -> SSLCertificateError (no system-CA fallback)
  [PASS] unloadable PEM -> SSLCertificateError
  [PASS] SSL disabled -> NoVerifyHTTPAdapter selected
  [PASS] no outlook_cert.cer written to disk

=== Layer 4: concurrency + read-only working directory ===
  [PASS] SSL setup succeeds in read-only working directory
  [PASS] no cert file created in read-only dir
  [PASS] 4 concurrent SSL syncs complete without error
  [PASS] no shared cert file ever appears during concurrent syncs

ALL CHECKS PASSED

A full content sync against a live Microsoft Exchange Server remains a manual staging step (with a self-signed CA in a non-production tenant). The layers above cover SSL adapter selection, real TLS verification semantics, validation/rejection of bad certificates, and the concurrency/file-race fix without it.

Checklists

Pre-Review Checklist

  • this PR does NOT contain credentials of any kind, such as API keys or username/passwords (double check config.yml.example)
  • this PR has a meaningful title
  • this PR links to all relevant github issues that it fixes or partially addresses
  • if there is no GH issue, please create it. Each PR should have a link to an issue
  • this PR has a thorough description
  • Covered the changes with automated tests
  • Tested the changes locally
  • Added a label for each target release version (example: v7.13.2, v7.14.0, v8.0.0)
  • For bugfixes: backport safely to all minor branches still receiving patch releases
  • Considered corresponding documentation changes
  • Contributed any configuration settings changes to the configuration reference
  • if you added or changed Rich Configurable Fields for a Native Connector, you made a corresponding PR in Kibana

Changes Requiring Extra Attention

  • Security-related changes (encryption, TLS, SSRF, etc) — the CA is now held in process memory instead of written to a file on disk, and TLS verification is performed via an explicit ssl.SSLContext. Reviewers should confirm verification is still enforced (CERT_REQUIRED + hostname check) and that the no-certificate fallback to unverified connections remains the desired posture.

Related Pull Requests

Release Note

Outlook Server connector: fixed an intermittent NO_CERTIFICATE_OR_CRL_FOUND SSL error that could abort syncs when multiple syncs ran close together. The configured CA certificate is now verified entirely in memory instead of through a shared temporary file, removing the race condition. TLS verification behaviour is otherwise unchanged.

Jan-Kazlouski-elastic and others added 3 commits June 23, 2026 16:12
Concurrent Exchange Server syncs shared a single on-disk outlook_cert.cer
file, causing intermittent TLS failures when one sync truncated the file
while another read it. Load the PEM into an ssl.SSLContext and inject it
via a custom HTTP adapter instead of writing to disk.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Restore InMemoryCAAdapter.ssl_context (not just HTTP_ADAPTER_CLS) in the
  reset fixture to prevent cross-test global-state leakage.
- Add unit tests asserting the adapter forwards its ssl_context into
  init_poolmanager/proxy_manager_for, and omits it when unset.

Co-authored-by: Cursor <cursoragent@cursor.com>
Declaring the class attribute as bare None made pyright infer its type
as None, so assigning an ssl.SSLContext failed typecheck. Annotate it as
ssl.SSLContext | None to fix the lint error.

Co-authored-by: Cursor <cursoragent@cursor.com>
Jan-Kazlouski-elastic and others added 6 commits June 23, 2026 20:59
- Introduced a new exception, SSLCertificateError, to handle cases where SSL is enabled but no CA certificate is provided or the configured certificate is invalid.
- Updated the get_user_accounts method to raise SSLCertificateError with clear messages for misconfigurations, ensuring that connection-wide SSL issues fail loudly rather than silently.
- Enhanced unit tests to verify that the appropriate exceptions are raised under these conditions.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ndling

- Updated the SSL context creation to explicitly raise a ValueError when no SSL certificate is provided, ensuring that misconfigurations are caught early.
- Enhanced unit tests to cover scenarios where invalid or markerless certificates are provided, ensuring that appropriate exceptions are raised.
- Improved documentation comments to clarify the SSL context building process and its implications for certificate validation.

Co-authored-by: Cursor <cursoragent@cursor.com>
…on comments

- Updated the `close` method to unbind the LDAP connection only if it was created, preventing potential errors during cleanup.
- Enhanced comments in the `get_user_accounts` method to clarify the implications of SSL configuration and certificate validation, ensuring better understanding for future maintenance.

The LDAP connection cleanup is a pre-existing concern unrelated to the
in-memory SSL certificate change, so it is removed from this PR to keep
the scope focused on the TLS/cert rework.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ndling

- Updated comments in `client.py` and `datasource.py` to clarify the implications of SSL configuration and certificate validation.
- Enhanced error handling to ensure that missing or unusable CA certificates raise appropriate exceptions, preventing silent failures.
- Improved unit tests to verify that SSL misconfigurations are handled correctly, ensuring robust error reporting.

Co-authored-by: Cursor <cursoragent@cursor.com>
@Jan-Kazlouski-elastic Jan-Kazlouski-elastic marked this pull request as ready for review June 26, 2026 10:38
@Jan-Kazlouski-elastic Jan-Kazlouski-elastic requested a review from a team as a code owner June 26, 2026 10:38
BaseProtocol.HTTP_ADAPTER_CLS = (
    RootCAAdapter if self.ssl_enabled else NoVerifyHTTPAdapter
)
# exchangelib applies HTTP_ADAPTER_CLS (and our CA context) process-wide;

Member

Would that be a problem if you run multiple Outlook syncs for different tenants in one process?

Contributor Author

Yes, unfortunately. Moving the certificate from the file system to memory prevents a connector from conflicting with itself, but two connectors configured with different certificates would still conflict with each other. The same was true with the file, I must say.

Contributor Author

I don't believe there is an easy fix for that.

Member

Cool, if it's not a regression we can go ahead with it.

It's also worth mentioning it in known issues in our docs, can you do that?

Contributor Author

Sure, I'll make sure to update the known issues.
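
The limitation discussed in this thread comes from the adapter being configured through a class attribute, which is shared by everything in the process. A minimal sketch of that effect, using stand-in classes rather than exchangelib itself (all names below are hypothetical):

```python
class FakeProtocol:
    """Stand-in for a library class whose adapter setting is a class attribute."""

    HTTP_ADAPTER_CLS = None


class FakeConnector:
    """Stand-in for a connector that configures the shared setting before a sync."""

    def __init__(self, tenant: str, adapter_name: str) -> None:
        self.tenant = tenant
        self.adapter_name = adapter_name

    def prepare(self) -> None:
        # Process-wide assignment: the last connector to run this wins.
        FakeProtocol.HTTP_ADAPTER_CLS = self.adapter_name

    def adapter_in_use(self) -> str:
        return FakeProtocol.HTTP_ADAPTER_CLS


tenant_a = FakeConnector("tenant-a", "adapter-with-ca-a")
tenant_b = FakeConnector("tenant-b", "adapter-with-ca-b")

tenant_a.prepare()
tenant_b.prepare()

# Tenant A now sees tenant B's adapter, because the attribute is shared.
print(tenant_a.adapter_in_use())
```

This is the same last-writer-wins behaviour the shared file had, which is why the thread treats it as a known issue rather than a regression.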

@Jan-Kazlouski-elastic Jan-Kazlouski-elastic enabled auto-merge (squash) June 29, 2026 12:29
@Jan-Kazlouski-elastic Jan-Kazlouski-elastic merged commit 5027d0c into main Jun 29, 2026
2 checks passed
@Jan-Kazlouski-elastic Jan-Kazlouski-elastic deleted the jan-kazlouski/outlook-in-memory-ssl-cert branch June 29, 2026 14:13
@github-actions

💔 Failed to create backport PR(s)

Branch  Result
9.3     #4114
8.19    Commit could not be cherrypicked due to conflicts
9.4     #4115

Successful backport PRs will be merged automatically after passing CI.

To backport manually run:
backport --pr 4094 --autoMerge --autoMergeMethod squash

Jan-Kazlouski-elastic added a commit that referenced this pull request Jun 29, 2026
…e the cert-file race (#4094) (#4114)

Backports the following commits to 9.3:
- fix(outlook): verify Exchange TLS with an in-memory CA to remove the
cert-file race (#4094)

Co-authored-by: Jan-Kazlouski-elastic <jan.kazlouski@elastic.co>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Jan-Kazlouski-elastic added a commit that referenced this pull request Jun 29, 2026
…e the cert-file race (#4094) (#4115)

Backports the following commits to 9.4:
- fix(outlook): verify Exchange TLS with an in-memory CA to remove the
cert-file race (#4094)

Co-authored-by: Jan-Kazlouski-elastic <jan.kazlouski@elastic.co>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Jan-Kazlouski-elastic added a commit that referenced this pull request Jun 30, 2026
…ve the cert-file race (#4094) (#4116)

Manual backport of #4094 to `8.19`.

The [auto-backport bot failed for
8.19](#4094) ("Commit could
not be cherrypicked due to conflicts") because the connector layout
differs between branches: `main` splits Outlook into a package
(`app/connectors_service/connectors/sources/outlook/{client,constants,datasource}.py`)
while `8.19` keeps it as a single module
(`connectors/sources/outlook.py`). The change has therefore been
hand-ported; it is **functionally identical** to the merged PR.

## Summary
- Load the configured CA directly into an `ssl.SSLContext` via
`ssl.create_default_context(cadata=...)` and inject it into `urllib3`
through a new `InMemoryCAAdapter`
(`init_poolmanager`/`proxy_manager_for`).
- Remove the on-disk certificate machinery (`ManageCertificate`,
`RootCAAdapter`, `SSLFailed`, `CERT_FILE` / `outlook_cert.cer`);
`ExchangeUsers.close()` no longer cleans up a file.
- SSL enabled with a missing/unusable CA now raises
`SSLCertificateError` instead of silently downgrading to an
unverified/system-CA connection.
- `validate_config` confirms the cert actually loads (via
`create_default_context`) so misconfig fails up front rather than
mid-sync.

This structurally eliminates the shared-file race behind the
intermittent `NO_CERTIFICATE_OR_CRL_FOUND` SSL error. TLS verification
behaviour is otherwise unchanged (`CERT_REQUIRED` + hostname check).

## Test plan
- [x] `tests/sources/test_outlook.py` — **48 passed** (includes the new
in-memory adapter, SSL-context, error-path, and no-cert-file-written
tests ported from the original PR).
- [x] `ruff check` + `ruff format --check` clean on changed files.

## Release Note
**Outlook Server connector**: fixed an intermittent
`NO_CERTIFICATE_OR_CRL_FOUND` SSL error that could abort syncs when
multiple syncs ran close together. The configured CA certificate is now
verified entirely in memory instead of through a shared temporary file,
removing the race condition. TLS verification behaviour is otherwise
unchanged.

Backport of #4094.

Made with [Cursor](https://cursor.com)

Co-authored-by: Cursor <cursoragent@cursor.com>
Jan-Kazlouski-elastic added a commit to elastic/elasticsearch that referenced this pull request Jul 1, 2026
Backport of the connectors known-issues entry for the intermittent
NO_CERTIFICATE_OR_CRL_FOUND SSL error on the Outlook connector,
caused by a shared on-disk certificate file race and fixed in
elastic/connectors#4094 (shipped in 8.19.18).
Jan-Kazlouski-elastic added a commit to elastic/elasticsearch that referenced this pull request Jul 1, 2026
Add a known-issues entry for the intermittent
NO_CERTIFICATE_OR_CRL_FOUND SSL error on the Outlook connector,
caused by a shared on-disk certificate file race and fixed in
elastic/connectors#4094 (8.19.18, 9.3.7, 9.4.3, 9.5.0).