
[SAP] Fix race condition in NetApp REST client session handling #335

Open
hemna wants to merge 2 commits into stable/2025.1-m3 from fix/rest-client-session-race-2025.1

Conversation

@hemna hemna commented May 12, 2026

Summary

  • Cherry-pick of ec2db68 from stable/2023.1-m3
  • Fixes race condition where concurrent greenthreads overwrite each other's session headers (X-Dot-SVM-Name tunneling) by making _build_session() return a local session object instead of storing on self._session
  • Includes adaptation for certificate auth support present in 2025.1

Create the clone on the template DS, then move it to the target DS via svMotion after the clone completes.

Change-Id: Icc4dda70f98498723c622913dfc383fb27b25da6
The RestNaServer.send_http_request() method was rebuilding self._session
on every call via _build_session() without any locking or thread-local
isolation. In an eventlet environment with concurrent greenthreads, this
caused a race condition where one greenthread's session headers (including
the critical X-Dot-SVM-Name vserver tunneling header) could be silently
overwritten by another greenthread's _build_session() call before the
HTTP request was actually sent.
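
The interleaving described above can be reproduced deterministically with a toy model. This is only a sketch, not the driver code: `SharedSessionClient` and `run_interleaved` are hypothetical names, the session is modelled as a plain dict, and a generator `yield` stands in for whatever point eventlet switches greenthreads between building the session and sending the request (the exact switch point is an assumption).

```python
class SharedSessionClient:
    """Toy stand-in for the pre-fix client; not the real RestNaServer."""

    def __init__(self):
        self._session = None  # shared mutable state, visible to all callers

    def _build_session(self, vserver):
        # Replaces the shared session, discarding any other caller's headers.
        self._session = {"X-Dot-SVM-Name": vserver}

    def send_http_request(self, vserver):
        self._build_session(vserver)
        yield  # stand-in for a greenthread switch before the request is sent
        # Another caller may have rebuilt self._session by now.
        return self._session["X-Dot-SVM-Name"]


def run_interleaved(calls):
    """Advance every call to its switch point, then let each one finish."""
    for call in calls:
        next(call)
    results = []
    for call in calls:
        try:
            next(call)
        except StopIteration as done:
            results.append(done.value)
    return results


client = SharedSessionClient()
sent_headers = run_interleaved([
    client.send_http_request("vserver_a"),
    client.send_http_request("vserver_b"),
])
print(sent_headers)  # ['vserver_b', 'vserver_b']
```

The first call ends up sending the second call's tunneling header, which matches the symptom of a request being answered for the wrong vserver.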

This manifested in production as the get_operational_lif_addresses() REST
call intermittently returning LIFs for the wrong vserver (or an empty
set), because the tunneling header was lost due to the race. The driver
then logged 'Address not found for NFS share' for all configured shares,
reported zero pools to the scheduler, and the backend became invisible
to the volume controller's pool cache. The loss was permanent, since the
cache has no TTL or retry logic.

The fix eliminates the shared mutable state by making _build_session()
return a new local Session object instead of storing it on self._session.
Each concurrent REST call now gets its own isolated session with the
correct headers, preventing any cross-greenthread contamination.
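
A minimal sketch of the shape of the fix, under the same simplifications: the session is a plain dict, a generator `yield` stands in for a greenthread switch, and `LocalSessionClient` and `run_interleaved` are hypothetical names rather than the actual driver code. The only point illustrated is that `_build_session()` returns a local object instead of assigning to `self._session`.

```python
class LocalSessionClient:
    """Toy stand-in for the post-fix client; not the real RestNaServer."""

    def _build_session(self, vserver):
        # Returns a fresh object; nothing is stored on self.
        return {"X-Dot-SVM-Name": vserver}

    def send_http_request(self, vserver):
        session = self._build_session(vserver)  # local to this call
        yield  # stand-in for a greenthread switch before the request is sent
        # Other callers cannot reach this call's local session.
        return session["X-Dot-SVM-Name"]


def run_interleaved(calls):
    """Advance every call to its switch point, then let each one finish."""
    for call in calls:
        next(call)
    results = []
    for call in calls:
        try:
            next(call)
        except StopIteration as done:
            results.append(done.value)
    return results


fixed_client = LocalSessionClient()
fixed_headers = run_interleaved([
    fixed_client.send_http_request("vserver_a"),
    fixed_client.send_http_request("vserver_b"),
])
print(fixed_headers)  # ['vserver_a', 'vserver_b']
```

Even with the calls interleaved at the switch point, each one sends its own header, because no state is shared between them.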

This is the proper fix for the class of issues previously worked around
in commit deebedf ('use zapi in _get_flexvol_to_pool_map'), which
forced get_flexvol calls through the ZAPI fallback path to avoid this
same REST client race condition.

Change-Id: Ida5dbde04e4976b41d88fcd82c0573df7721cb0a
@hemna hemna force-pushed the fix/rest-client-session-race-2025.1 branch from 9105951 to b541985 Compare May 12, 2026 15:18