Fix native linux device [main] #1709

Merged
badrishc merged 4 commits into main from badrishc/update-native-device on Apr 17, 2026

Conversation

badrishc (Collaborator) commented Apr 16, 2026

The t64 ABI transition renamed libaio.so.1 to libaio.so.1t64, breaking libnative_device.so which has a hard DT_NEEDED of libaio.so.1. Previously we worked around this with system-wide symlinks in every Dockerfile and CI workflow.

Fix this properly in the loader itself:

  • CMakeLists.txt now sets RPATH=$ORIGIN (via INSTALL_RPATH + BUILD_WITH_INSTALL_RPATH + --disable-new-dtags) so libnative_device.so searches its own directory for dependencies. This lets a managed-side compat symlink next to the native library satisfy the linker without any LD_LIBRARY_PATH contortions from the caller.

  • libaio_compat.h (new) pins the libaio entry points to the specific symbol versions that make libaio's userspace fast paths kick in:
    io_setup    @LIBAIO_0.4
    io_destroy  @LIBAIO_0.4
    io_getevents@LIBAIO_0.4   (userspace ring fast path)
    io_submit   @LIBAIO_0.1
    Older libaio-dev marked LIBAIO_0.4 as the default version, so a plain
    link picked these up automatically. On t64 (libaio1t64-dev) the default
    is gone and libaio.h has no .symver redirects for x86_64, so a fresh
    link produces unversioned references. At runtime these resolve to the
    slower LIBAIO_0.1 io_getevents, which always makes a syscall and blocks;
    this caused the NativeStorageDevice probe/TryComplete paths to hang.

  • NativeStorageDevice.ImportResolver now resolves NativeLibraryPath to an absolute path (fixing a latent bug where the relative path bypassed .NET's runtimes/ probing) and, on Linux, catches DllNotFoundException referencing libaio.so.1, locates libaio.so.1t64 in standard multiarch paths, and drops a compat symlink next to libnative_device.so. The symlink creation tolerates the race where multiple processes start simultaneously and another process has already created a usable symlink. If repair still fails, the loader throws a descriptive DllNotFoundException explaining the t64 transition and offering three remediation options.

  • VectorManager.Initialize() and ResumePostRecovery() now early-return when IsEnabled is false. Vector Set preview is off by default; there is no reason these paths should touch storage when the feature is disabled.
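
The RPATH bullet above relies on the linker emitting DT_RPATH (which is what --disable-new-dtags selects in GNU ld) rather than DT_RUNPATH. As a rough way to confirm that on a rebuilt library, the following Python sketch parses the text printed by `readelf -d`. The helper names are hypothetical and not part of this PR, and the sample line layout in the usage note is an assumption about typical binutils output.

```python
import re

# Matches dynamic-section lines such as:
#   0x000000000000000f (RPATH)   Library rpath: [$ORIGIN]
# The exact column layout is an assumption about readelf's usual output.
_DYNAMIC_LINE = re.compile(
    r"\((?P<tag>NEEDED|RPATH|RUNPATH)\)\s+[^\[]*\[(?P<value>[^\]]*)\]"
)


def dynamic_entries(readelf_text):
    """Collect NEEDED / RPATH / RUNPATH values from `readelf -d` output."""
    entries = {"NEEDED": [], "RPATH": [], "RUNPATH": []}
    for line in readelf_text.splitlines():
        match = _DYNAMIC_LINE.search(line)
        if match:
            entries[match.group("tag")].append(match.group("value"))
    return entries


def uses_origin_rpath(readelf_text):
    """True when $ORIGIN is recorded as DT_RPATH and no DT_RUNPATH is present."""
    entries = dynamic_entries(readelf_text)
    rpath_dirs = [d for value in entries["RPATH"] for d in value.split(":")]
    return "$ORIGIN" in rpath_dirs and not entries["RUNPATH"]
```

A possible use is to capture `readelf -d libnative_device.so` with `subprocess.run(..., capture_output=True, text=True)` and pass its `stdout` to `uses_origin_rpath`.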

With the loader + build fixes in place, remove the now-redundant workarounds:

  • Dockerfile and Dockerfile.ubuntu: drop the ln -sf libaio.so.1 line. (Dockerfile.alpine and Dockerfile.azurelinux ship libaio.so.1 natively. Dockerfile.chiseled uses a restricted runtime and was not touched.)

  • .github/workflows/ci.yml and nightly.yml: drop the ubuntu-latest libaio pre-step; the managed ImportResolver now handles repair automatically and the test suite actually exercises the repair path.

  • validate_docker_images.py: accept either libaio.so.1 or libaio.so.1t64, since the former is only materialized lazily (on first native device init) for glibc images now.
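
The relaxed image check described in the last bullet can be sketched as follows. This is only an illustration of the "accept either SONAME" idea; the function and constant names are hypothetical, and the actual code in validate_docker_images.py is not shown on this page.

```python
# Either name satisfies the check: libaio.so.1 may only appear lazily on
# glibc images, while libaio.so.1t64 is what t64 distros install.
ACCEPTED_LIBAIO_NAMES = ("libaio.so.1", "libaio.so.1t64")


def has_libaio(file_paths):
    """Return True if any path's basename is an accepted libaio SONAME."""
    basenames = {path.rsplit("/", 1)[-1] for path in file_paths}
    return any(name in basenames for name in ACCEPTED_LIBAIO_NAMES)
```

The input is assumed to be a list of file paths collected from the image, for example from a directory listing of its library directories.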

The bundled libnative_device.so has been rebuilt against the above sources with '-O3 -g -DNDEBUG' (project Release defaults). Verified via objdump -T that io_* references are correctly versioned.
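
The objdump -T verification mentioned above could be automated along these lines. This is a hedged Python sketch, not tooling from this PR: the helper names are hypothetical, the expected versions are copied from the list in this description, and the regular expression assumes the usual `*UND* <value> (VERSION) symbol` layout of objdump's dynamic symbol table.

```python
import re

# Expected bindings, taken from the libaio_compat.h description above.
EXPECTED = {
    "io_setup": "LIBAIO_0.4",
    "io_destroy": "LIBAIO_0.4",
    "io_getevents": "LIBAIO_0.4",
    "io_submit": "LIBAIO_0.1",
}

# Undefined-symbol lines; the version field is optional (absent = unversioned).
_UND_LINE = re.compile(
    r"\*UND\*\s+[0-9a-fA-F]+\s+(?:\(?(?P<ver>[\w.]+)\)?\s+)?(?P<sym>\w+)\s*$"
)


def libaio_bindings(objdump_text):
    """Map each expected io_* symbol found in the output to its version (or None)."""
    found = {}
    for line in objdump_text.splitlines():
        match = _UND_LINE.search(line)
        if match and match.group("sym") in EXPECTED:
            found[match.group("sym")] = match.group("ver")
    return found


def mismatches(objdump_text):
    """Return symbols whose version differs from EXPECTED.

    A value of None means the reference is unversioned or missing.
    """
    found = libaio_bindings(objdump_text)
    return {
        sym: found.get(sym)
        for sym, version in EXPECTED.items()
        if found.get(sym) != version
    }
```

An empty result from `mismatches` on the output of `objdump -T libnative_device.so` would indicate that all four references carry the pinned versions.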

Copilot AI review requested due to automatic review settings April 16, 2026 21:23
Contributor

Copilot AI left a comment

Pull request overview

This PR addresses Linux native device breakage caused by the Debian/Ubuntu t64 transition, which renamed libaio.so.1 to libaio.so.1t64. It makes the native-device loader self-heal at runtime and updates build/test infrastructure to remove now-redundant symlink workarounds.

Changes:

  • Update NativeStorageDevice’s DllImport resolver to load the native library via an absolute path and (on Linux) auto-repair missing libaio.so.1 by creating a local compat symlink to libaio.so.1t64, with improved diagnostics.
  • Add a C++ libaio symbol-version pinning header and set RPATH=$ORIGIN on libnative_device.so to allow colocated dependency resolution.
  • Remove Docker/CI symlink workaround steps and relax Docker image validation to accept either libaio.so.1 or libaio.so.1t64.

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| test/docker-tests/validate_docker_images.py | Accept either libaio SONAME in image validation. |
| libs/storage/Tsavorite/cs/src/core/Device/NativeStorageDevice.cs | Absolute-path native load + libaio t64 auto-repair + diagnostics. |
| libs/storage/Tsavorite/cc/src/device/libaio_compat.h | New header to force specific versioned libaio symbols at link time. |
| libs/storage/Tsavorite/cc/src/device/file_linux.h | Include the new libaio compatibility header. |
| libs/storage/Tsavorite/cc/src/CMakeLists.txt | Add $ORIGIN RPATH and disable new dtags (DT_RPATH behavior). |
| libs/server/Resp/Vector/VectorManager.cs | No-op initialization/recovery when vector feature disabled. |
| Dockerfile | Remove build-time global libaio symlink workaround. |
| Dockerfile.ubuntu | Remove build-time global libaio symlink workaround. |
| .github/workflows/ci.yml | Remove Ubuntu libaio workaround step. |
| .github/workflows/nightly.yml | Remove Ubuntu libaio workaround step. |

Comment thread libs/storage/Tsavorite/cs/src/core/Device/NativeStorageDevice.cs Outdated
Comment thread libs/storage/Tsavorite/cs/src/core/Device/NativeStorageDevice.cs Outdated
Comment thread test/docker-tests/validate_docker_images.py
Comment thread libs/storage/Tsavorite/cc/src/CMakeLists.txt
badrishc force-pushed the badrishc/update-native-device branch from bffadf8 to 180bf91 on April 16, 2026 21:36
…nsition)

The t64 ABI transition renamed libaio.so.1 to libaio.so.1t64, breaking
libnative_device.so which has a hard DT_NEEDED of libaio.so.1. Fix the
problem in three places so both Docker and non-Docker users on t64 hosts
get a working native device without manual intervention.

1) libaio_compat.h (new) pins the libaio entry points to specific symbol
   versions at link time:
     io_setup    @LIBAIO_0.4
     io_destroy  @LIBAIO_0.4
     io_getevents@LIBAIO_0.4   (userspace ring fast path)
     io_submit   @LIBAIO_0.1
   Older libaio-dev marked LIBAIO_0.4 as the default version so a plain
   link picked these up automatically. On t64 (libaio1t64-dev) the default
   is gone and libaio.h has no .symver redirects for x86_64, so a fresh
   link produces UNVERSIONED references that at runtime resolve to the
   slower LIBAIO_0.1 io_getevents - which always syscalls and blocks -
   causing NativeStorageDevice probe/TryComplete paths to hang. With
   libaio_compat.h included first, any future rebuild on any distro
   reproduces the correct versioned bindings.

2) CMakeLists.txt sets RPATH=$ORIGIN (via INSTALL_RPATH +
   BUILD_WITH_INSTALL_RPATH + --disable-new-dtags) so libnative_device.so
   searches its own directory for dependencies. This enables the managed
   loader's fallback (below).

3) NativeStorageDevice.ImportResolver resolves NativeLibraryPath to an
   absolute path (fixing a latent bug where the relative path bypassed
   .NET's runtimes/ probing) and, on Linux, catches DllNotFoundException
   referencing libaio.so.1, locates libaio.so.1t64 in standard multiarch
   paths, and drops a compat symlink next to libnative_device.so. The
   symlink creation tolerates the race where multiple processes start
   simultaneously and another process has already created a usable
   symlink. If repair still fails, the loader throws a descriptive
   DllNotFoundException explaining the t64 transition and offering three
   remediation options. This path is primarily for non-Docker users
   (developers running dotnet GarnetServer on their own Debian 13 /
   Ubuntu 24.04 machines).
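
The repair flow in item 3 (find libaio.so.1t64, create a compat symlink
next to the native library, tolerate a concurrent creator) can be
sketched in Python as below. This is an illustration only: the real
logic lives in the C# ImportResolver, the function names are
hypothetical, and the directory list is an assumption since the commit
only says "standard multiarch paths".

```python
import os
from pathlib import Path

# Assumed search locations; not taken from the PR.
CANDIDATE_DIRS = (
    "/usr/lib/x86_64-linux-gnu",
    "/usr/lib/aarch64-linux-gnu",
    "/lib/x86_64-linux-gnu",
    "/lib/aarch64-linux-gnu",
)


def find_t64_libaio(search_dirs=CANDIDATE_DIRS):
    """Return the first existing libaio.so.1t64 path, or None."""
    for directory in search_dirs:
        candidate = Path(directory) / "libaio.so.1t64"
        if candidate.exists():
            return candidate
    return None


def ensure_compat_symlink(native_dir, target):
    """Create native_dir/libaio.so.1 -> target, tolerating a concurrent creator.

    Returns True when a usable libaio.so.1 exists afterwards.
    """
    link = Path(native_dir) / "libaio.so.1"
    try:
        os.symlink(target, link)
    except FileExistsError:
        # Another process won the race, or an earlier run left a link behind.
        pass
    except OSError:
        # For example a read-only filesystem or missing permissions.
        return False
    # Path.exists() follows symlinks, so a dangling link counts as unusable.
    return link.exists()
```

Treating "already exists" as success only after re-checking that the
link resolves is what makes the race benign: whichever process creates
the link first, every process ends up validating the same final state.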

Also:

- VectorManager.Initialize() and ResumePostRecovery() now early-return
  when IsEnabled is false. Vector Set preview is off by default; there
  is no reason these paths should touch storage when the feature is
  disabled.

- Dockerfile and Dockerfile.ubuntu still install libaio1t64 and
  pre-create the libaio.so.1 -> libaio.so.1t64 symlink at build time
  for maximum robustness (works on read-only filesystems and under
  restrictive seccomp profiles that block symlink(2)). The managed
  loader fallback is belt-and-braces for non-Docker users.
  (Dockerfile.alpine and Dockerfile.azurelinux ship libaio.so.1
  natively. Dockerfile.chiseled uses a restricted runtime image and
  was not changed - it already stages libaio.so.1 from a build stage.)

- .github/workflows/ci.yml and nightly.yml drop the ubuntu-latest
  libaio pre-step; the managed ImportResolver now handles repair
  automatically on any host.

- validate_docker_images.py accepts either libaio.so.1 or
  libaio.so.1t64 when checking library presence.

The bundled libnative_device.so has been rebuilt against the above
sources with '-O3 -g -DNDEBUG' (project Release defaults). Verified via
objdump -T that io_* references are correctly versioned.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
badrishc force-pushed the badrishc/update-native-device branch from 180bf91 to 8af8d4d on April 16, 2026 21:41
badrishc and others added 2 commits April 16, 2026 14:55
- CMakeLists.txt: fix FNATIVE_DEVICE_HEADERS typo so file_linux.h and
  libaio_compat.h are actually associated with the native_device target
  (cosmetic, does not affect compiled binary).
- NativeStorageDevice: wrap Directory.GetCurrentDirectory() in a
  TryGetCurrentDirectory helper so a deleted/inaccessible CWD cannot
  block native library resolution when the library exists in the
  assembly or AppContext directory.
- NativeStorageDevice.BuildLibaioDiagnostic: expand architecture mapping
  (x64, Arm64, Arm) with a null fallback that emits a distro-agnostic
  fix instruction, and correct the remediation advice to suggest a
  valid DeviceType value ('RandomAccess') instead of the non-existent
  'Managed'.
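
The two NativeStorageDevice changes above (a current-directory lookup
that cannot throw, and an architecture mapping with a distro-agnostic
fallback) follow a pattern that can be sketched in Python. The names,
the probe order, the multiarch directory strings, and the message
wording are all assumptions for illustration; the commit names the
architectures (x64, Arm64, Arm) but not the exact strings used.

```python
import os

# Assumed mapping from architecture name to Debian multiarch directory.
MULTIARCH_DIRS = {
    "x64": "x86_64-linux-gnu",
    "arm64": "aarch64-linux-gnu",
    "arm": "arm-linux-gnueabihf",
}


def try_get_current_directory():
    """Return the current directory, or None if it was deleted or is inaccessible."""
    try:
        return os.getcwd()
    except OSError:
        return None


def candidate_directories(assembly_dir, app_base_dir):
    """Directories to probe, skipping missing entries and duplicates (order assumed)."""
    ordered = (assembly_dir, app_base_dir, try_get_current_directory())
    return list(dict.fromkeys(d for d in ordered if d))


def libaio_fix_hint(arch):
    """Build a remediation hint; unknown architectures get a generic instruction."""
    triplet = MULTIARCH_DIRS.get(arch.lower())
    if triplet is None:
        return (
            "Create a symlink named libaio.so.1 pointing at your "
            "distribution's libaio.so.1t64."
        )
    lib_dir = f"/usr/lib/{triplet}"
    return f"ln -s {lib_dir}/libaio.so.1t64 {lib_dir}/libaio.so.1"
```

Returning None instead of raising keeps a broken working directory from
masking a library that is present in one of the other probe locations.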

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
badrishc merged commit 3c19ed1 into main on Apr 17, 2026
34 checks passed
badrishc deleted the badrishc/update-native-device branch on April 17, 2026 04:45

3 participants