new(tikv/tikv): distributed transactional key-value store (CNCF graduated) #13089
Open
tannevaled wants to merge 13 commits into
…ated)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The /releases suffix + strip: /^v/ combination resolved to no
versions ("not-found: version: github.com/tikv/tikv"). Match the
thanos / keda pattern instead — bare github: org/repo plus a
v-prefixed URL template.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
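The version-detection fix comes down to a round trip: list the release tags, drop the leading `v` to get a bare version, and re-add the `v` only when rendering the download URL. A minimal Python sketch of that round trip, assuming tags shaped like `v8.5.1`; the URL template below is a hypothetical illustration, not the recipe's actual template.

```python
import re

# Hypothetical URL template for illustration; the recipe's real template is not shown here.
URL_TEMPLATE = "https://github.com/tikv/tikv/archive/refs/tags/v{version}.tar.gz"

# Release tags are assumed to look like "v8.5.1"; anything else is skipped.
TAG_PATTERN = re.compile(r"^v(\d+\.\d+\.\d+)$")


def versions_from_tags(tags):
    """Strip the leading 'v' from release tags, keeping only plain X.Y.Z tags."""
    versions = []
    for tag in tags:
        match = TAG_PATTERN.match(tag)
        if match:
            versions.append(match.group(1))
    return versions


def download_url(version):
    """Re-add the 'v' prefix when rendering the source tarball URL."""
    return URL_TEMPLATE.format(version=version)
```

The `$` anchor drops pre-release tags such as `v8.5.0-rc.1`; that filtering is a choice made for the sketch.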
`ssl_transport_security.cc` fails to link with `undefined reference to SSL_get_peer_certificate` because the build's pkg-config finds the runner's OpenSSL 3.0 (Ubuntu 22.04), which dropped that symbol from its export table; TiKV's bundled gRPC was compiled against the legacy symbol name.
Add `openssl.org ^1.1` as a build dep and export `OPENSSL_DIR` / `OPENSSL_LIB_DIR` / `OPENSSL_INCLUDE_DIR` so openssl-sys (and gRPC's CMake-driven build) pick up pkgx's 1.1.1w bottle, which still exports `SSL_get_peer_certificate`. `GRPC_SSL_PROVIDER=package` skips the in-tree BoringSSL build, which has the same symbol rename.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The OPENSSL_DIR / OPENSSL_LIB_DIR / OPENSSL_INCLUDE_DIR pin wasn't
enough — openssl-sys still found the runner's OpenSSL 3.0 via
pkg-config fallback, and gRPC's CMake did the same independently.
Add three more env vars:
OPENSSL_NO_PKG_CONFIG=1
Tells openssl-sys (Rust) to skip pkg-config entirely and trust
only the OPENSSL_* env vars.
PKG_CONFIG_PATH={{deps.openssl.org.prefix}}/lib/pkgconfig:$PKG_CONFIG_PATH
Prepends pkgx's 1.1.1w pkgconfig dir so any pkg-config call we
can't disable still picks up 1.1.1w first.
OPENSSL_ROOT_DIR={{deps.openssl.org.prefix}}
The CMake-style hint for gRPC's find_package(OpenSSL).
Same symptom (`undefined reference to SSL_get_peer_certificate`)
should now resolve because every libssl link path leads back to
1.1.1w which still exports the legacy symbol.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
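How the `OPENSSL_*` variables interact can be modelled as a small decision function. This is a simplified Python sketch of openssl-sys's lookup order, not the crate's actual code: it ignores several real details (for example vcpkg, fallback directory probing, and partial overrides without `OPENSSL_DIR`). It also only covers openssl-sys; CMake's `find_package(OpenSSL)` in the bundled sub-builds does its own lookup, which is why `OPENSSL_ROOT_DIR` is set separately.

```python
import posixpath


def resolve_openssl(env):
    """Simplified model of how openssl-sys chooses an OpenSSL install.

    `env` maps environment variable names to values. Returns a tuple
    (source, lib_dir, include_dir).
    """
    lib_dir = env.get("OPENSSL_LIB_DIR")
    include_dir = env.get("OPENSSL_INCLUDE_DIR")
    prefix = env.get("OPENSSL_DIR")
    if lib_dir and include_dir:
        # Both explicit directories given: nothing else is consulted.
        return ("env", lib_dir, include_dir)
    if prefix:
        # OPENSSL_DIR fills in whichever of lib/ and include/ is missing.
        return (
            "env",
            lib_dir or posixpath.join(prefix, "lib"),
            include_dir or posixpath.join(prefix, "include"),
        )
    if env.get("OPENSSL_NO_PKG_CONFIG"):
        # pkg-config is skipped; the real crate falls back to other probes.
        return ("fallback-probe", None, None)
    # Default: ask pkg-config, which honours PKG_CONFIG_PATH ordering.
    return ("pkg-config", None, None)
```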
Iter 3. The 1.1.1w pin attempts (3 prior iterations) couldn't stop the bundled CMake sub-builds (grpcio-sys, librocksdb-sys, libtitan-sys) from finding the runner's libssl. Each runs its own `find_package(OpenSSL)`, which is opaque to the openssl-sys env-var protocol.
Cross-check with Arch's PKGBUILD: their tikv recipe ships NO OpenSSL overrides; they just accept the system 3.x. The trick is that Arch Linux compiles OpenSSL 3.x with deprecated symbols still exported (no `no-deprecated` configure flag), so `SSL_get_peer_certificate` and `EVP_CIPHER_nid` resolve at link time even though they're macro-deprecated. pkgx's openssl recipe also doesn't pass `no-deprecated` (verified in projects/openssl.org/package.yml), so unpinning from `^1.1` and accepting the latest 3.4+ gives us the same ABI surface Arch has.
Drop `OPENSSL_NO_PKG_CONFIG` / `OPENSSL_STATIC` / `GRPC_SSL_PROVIDER`: they were workarounds for the symbol-mismatch problem that no longer applies.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previous iter expected `openssl.org: '*'` to resolve to OpenSSL 3.x since pantry ships 3.x bottles up through 4.0.0. But CI still links against /opt/openssl.org/v1.1.1w/lib: most of the pantry pins to `^1.1` for compat (see the openssl recipe's comment about curl/wget), so the resolver biases toward 1.1.
With 1.1.1w libs on the link line, the bundled rocksdb/grpc C++ objects fail to find `SSL_get_peer_certificate` and `EVP_CIPHER_nid`: the cargo build scripts inside grpcio-sys / librocksdb-sys / libtitan-sys emit `cargo:rustc-link-lib` directives targeting the 3.x SONAMEs, but 1.1.1w's libssl.so.1.1 / libcrypto.so.1.1 don't match.
Pinning to `^3` forces the 3.x bottle, which carries deprecated symbols (pantry's openssl recipe doesn't set `no-deprecated`).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The `^3` attempt failed at solve time: a transitive dep in the graph needs openssl `^1.1`, so the resolver bails with `cannot intersect: ^1.1 && ^3`. Back to `openssl.org: '*'` (resolves to 1.1.1w).
Hypothesis: the real reason the 1.1.1w links fail is that cargo's final link step isn't getting `-lssl -lcrypto`. The bundled CMake sub-builds (rocksdb, grpc, titan) archive their C++ object files into .rlibs but don't emit `cargo:rustc-link-lib=ssl,crypto`. Belt-and-braces: force the flags onto every rustc invocation via RUSTFLAGS.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
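The solver failure is plain range arithmetic. Under semver caret rules, `^1.1` means `>=1.1.0 <2.0.0` and `^3` means `>=3.0.0 <4.0.0`, so no version satisfies both. A minimal Python sketch of that check, assuming majors of 1 or higher (caret ranges on `0.x` versions follow different rules); it is not libpkgx's resolver code.

```python
def caret_range(spec):
    """Expand a caret constraint like '^1.1' or '^3' into a half-open [low, high) range.

    Assumes major >= 1, where '^X.Y' means >= X.Y.0 and < (X+1).0.0.
    """
    parts = [int(p) for p in spec.lstrip("^").split(".")]
    parts += [0] * (3 - len(parts))  # pad '^3' to (3, 0, 0)
    low = tuple(parts)
    high = (parts[0] + 1, 0, 0)
    return low, high


def intersect(a, b):
    """Intersect two half-open version ranges; return None if they are disjoint."""
    low = max(a[0], b[0])
    high = min(a[1], b[1])
    return (low, high) if low < high else None
```

Tuples compare element by element in Python, which is exactly major/minor/patch ordering, so no custom comparator is needed.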
Previous iter's ARM64 build failed at:
titan/cmake/rocksdb_flags.cmake:137:
FORCE_SSE42=ON but unable to compile with SSE4.2 enabled
SSE4.2 is x86-only. The bundled libtitan-sys CMakeLists.txt
hardcodes FORCE_SSE42=ON, which detonates on aarch64. Setting
both ROCKSDB_SYS_PORTABLE (the rust-rocksdb wrapper env) and
PORTABLE (the cmake-level switch the titan tree reads) tells
the build to skip arch-specific extensions and use the portable
codepath. This matches what TiKV upstream's own ARM CI does.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previous iter's PORTABLE=1 / ROCKSDB_SYS_PORTABLE=1 env vars don't
move the needle: the pinned `rust-rocksdb` revision used by TiKV
hardcodes `-DFORCE_SSE42=ON` in its build.rs regardless of target,
so aarch64 builds detonate at the bundled titan CMakeLists.txt with
"FORCE_SSE42=ON but unable to compile with SSE4.2 enabled".
The flag lives in rust-rocksdb's source — not its CMake input — so
no env override fixes it; we'd need to either patch rust-rocksdb in
the Cargo.lock or wait for TiKV to bump to a version whose build.rs
gates on `target.contains("aarch64")`. Arch sidesteps the same way
(`arch=('x86_64')` in PKGBUILD).
Drop aarch64 from the platform list until TiKV upstream upgrades.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
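The missing upstream piece described above is a target check before `FORCE_SSE42` reaches CMake. Sketched in Python purely for illustration (the real change would be Rust in rust-rocksdb's `build.rs`); the function name and the exact set of defines are hypothetical.

```python
def rocksdb_cmake_defines(target):
    """Hypothetical target-aware CMake defines for a bundled RocksDB/Titan build.

    SSE4.2 is an x86 extension, so it is only forced on x86_64 targets; every
    other architecture gets the portable code path instead.
    """
    defines = {}
    if target.startswith("x86_64"):
        defines["FORCE_SSE42"] = "ON"
    else:
        defines["FORCE_SSE42"] = "OFF"
        defines["PORTABLE"] = "ON"
    # Sorted so the generated command line is deterministic.
    return ["-D{}={}".format(name, value) for name, value in sorted(defines.items())]
```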
Even after the earlier RUSTFLAGS added `-lssl -lcrypto` to the link
line, ld still reports undefined references to `EVP_CIPHER_nid` and
`SSL_get_peer_certificate`. Verified that:
- pkgx's openssl 1.1.1w libssl.so.1.1 / libcrypto.so.1.1 DO export
both symbols (`nm -D --defined-only`)
- The link cmd has `-L/opt/openssl.org/v1.1.1w/lib` and -lssl -lcrypto
positioned AFTER librocksdb_sys.rlib (correct order)
- But the cmd also has `-Wl,--as-needed` *before* the libs
With --as-needed, ld's heuristic for whether to keep a shared lib in
DT_NEEDED is fragile when the symbol references live inside a static
archive (.rlib) whose .o files are pulled in transitively. The libs
get dropped and the symbols re-surface as unresolved.
Workaround: wrap `-lssl -lcrypto` with `-Wl,--no-as-needed ...
-Wl,--as-needed` so those two libs are unconditionally added to
DT_NEEDED. The surrounding link policy is restored after.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
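The wrapping trick can be generated rather than hand-written. A minimal Python sketch, assuming the flags are passed through rustc's real `-C link-arg=` codegen option (written `-Clink-arg=`); the exact spelling the recipe used for its RUSTFLAGS is not shown here.

```python
def wrap_no_as_needed(libs):
    """Linker args that force `libs` into DT_NEEDED, then restore --as-needed."""
    args = ["-Wl,--no-as-needed"]
    args += ["-l" + lib for lib in libs]
    args.append("-Wl,--as-needed")  # later libraries keep the default policy
    return args


def rustflags_for(libs):
    """Render the wrapped args as a RUSTFLAGS string, one -Clink-arg per arg."""
    return " ".join("-Clink-arg=" + arg for arg in wrap_no_as_needed(libs))
```

Each argument gets its own `-Clink-arg=` so that rustc forwards them to the linker driver in order.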
TiKV's bundled C++ (RocksDB / Titan / gRPC) does not build correctly with gcc 13+ (tikv#16593). Arch's AUR PKGBUILD pins `gcc12<12.4.0` and exports CC=gcc-12 / CXX=g++-12; we mirror that with `gnu.org/gcc: ^12.3 <12.4` (resolves to 12.3.0) plus explicit CC/CXX.
This targets what we believe is the actual root cause of the "undefined reference to EVP_CIPHER_nid / SSL_get_peer_certificate" final-link failures we hit across 5 prior iterations: gcc 16's visibility/LTO/as-needed semantics dropped those symbols from the final link despite them being present in pkgx's openssl 1.1.1w (verified via nm). Pinning gcc 12.3 should make the link succeed.
Also adds CMAKE_POLICY_VERSION_MINIMUM=3.5 per tikv#18867 (cmake 4.x rejects the bundled sub-builds' minimum-required policy without it).
Strips the `-Wl,--no-as-needed -lssl -lcrypto -Wl,--as-needed` RUSTFLAGS hack from 8c575d4; with the real fix in place it's noise. Historical debugging notes kept inline for future maintainers.
Refs: https://aur.archlinux.org/cgit/aur.git/plain/PKGBUILD?h=tikv
Refs: tikv/tikv#16593
Refs: tikv/tikv#18867
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
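The "verified via nm" step is easy to make repeatable. A minimal Python sketch that parses `nm -D --defined-only` output and reports which required symbols a library does not define; it assumes GNU nm's `<address> <type> <name>` layout, and strips symbol-version suffixes such as `@@OPENSSL_1_1_0`, which newer binutils print.

```python
def defined_dynamic_symbols(nm_output):
    """Parse `nm -D --defined-only` output into a set of bare symbol names."""
    symbols = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank lines or "libfoo.so:" headers when several files are listed
        # Drop any version suffix: 'SSL_get_peer_certificate@@OPENSSL_1_1_0' -> bare name.
        symbols.add(fields[-1].split("@", 1)[0])
    return symbols


def missing_symbols(nm_output, wanted):
    """Return the wanted symbols that the library does not define, sorted."""
    return sorted(set(wanted) - defined_dynamic_symbols(nm_output))
```

In practice the input would come from running `nm -D --defined-only libssl.so.1.1 libcrypto.so.1.1`, for example via `subprocess.run(..., capture_output=True, text=True).stdout`.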
The previous commit's `^12.3 <12.4` constraint is rejected by libpkgx as undefined
("invalid constraint for gnu.org/gcc: undefined" — observed at run
26691688633, failed in 36s before the build started). Pantry's
constraint parser doesn't accept the compound caret + upper-bound
shape with a space separator. Switch to exact pin matching arch's
gcc12<12.4.0 lower bound (12.3.0).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
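Since the range constraint could not be expressed and an exact pin is used instead, a cheap guard is to confirm that the compiler actually on PATH predates gcc 13 before the long build starts. A minimal Python sketch, assuming the input is the output of `gcc -dumpfullversion` (e.g. `12.3.0`); the cutoff mirrors the gcc 13+ breakage tracked in tikv#16593, and the helper names are hypothetical.

```python
def gcc_major(version_output):
    """Extract the major version from `gcc -dumpfullversion` output such as '12.3.0'."""
    return int(version_output.strip().split(".")[0])


def gcc_ok_for_tikv(version_output, first_bad_major=13):
    """True if the compiler is older than the first major known to break the bundled C++."""
    return gcc_major(version_output) < first_bad_major
```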
Previous iter (8c76a66, gcc 12.3 only) progressed the build from 35 min to 64 min (it got past the cargo compilation phase) but still failed at the final link with the same `undefined reference to EVP_CIPHER_nid / SSL_get_peer_certificate`.
The bundled C++ inside librocksdb-sys (encryption.cc) and grpcio-sys (ssl_transport_security.cc) needs ssl and crypto symbols at link time, but cargo's final link doesn't include them. OPENSSL_STATIC=1 tells openssl-sys to emit `-l static=ssl -l static=crypto`, which makes ld link libssl.a/libcrypto.a directly (aiming to sidestep ordering issues with the bundled .o objects). RUSTFLAGS retained as belt-and-braces in case pkgx's openssl bottle doesn't ship static archives.
This is the last variant left to try before parking; if it still fails, the upstream link is genuinely incompatible with pantry's toolchain layout and needs upstream-side work.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
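The static-versus-shared switch shows up in the Cargo build-script directives. A simplified Python sketch of the lines an openssl-sys-style build script prints, assuming Cargo's real `cargo:rustc-link-search=[KIND=]PATH` and `cargo:rustc-link-lib=[KIND=]NAME` syntax; it is a model, not the crate's actual code.

```python
def openssl_link_directives(lib_dir, static):
    """Cargo build-script lines for linking OpenSSL from `lib_dir`.

    With static=True the libraries are requested with the `static=` kind, so
    rustc looks for libssl.a/libcrypto.a instead of the shared objects.
    """
    kind = "static=" if static else ""
    lines = ["cargo:rustc-link-search=native=" + lib_dir]
    for lib in ("ssl", "crypto"):  # ssl before crypto: libssl depends on libcrypto
        lines.append("cargo:rustc-link-lib=" + kind + lib)
    return lines
```

If the bottle's `lib` directory has no `.a` archives, the `static=` variant fails at link time, which is the case the retained RUSTFLAGS are meant to cover.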
Summary
- … `tikv-server` (the storage node) and `tikv-ctl` (admin CLI).
- … `linux/x86-64`, `linux/aarch64`). TiKV does not officially support darwin for production deployments.
Build notes
- … `rust-toolchain.toml`, so we depend on `rust-lang.org/rustup` and let rustup pick up the pinned channel from the source tree (same pattern used by `deno.land` and `awslabs/llrt`).
- … `make release` rather than `cargo build` directly because the Makefile wires up the right feature flags (`jemalloc`, `portable`, `openssl-vendored`, etc.) used for distributable builds.
- … `TIKV_FRAME_POINTER=0` to skip the `-Z build-std` path; the frame-pointer-enabled path rebuilds the Rust standard library, which significantly extends build time and memory use for marginal benefit in a packaged bottle.
- … `librocksdb-sys`'s `build.rs`, which needs `cmake`, `clang` (for `bindgen` and to compile the C++ snapshot), `protoc`, and on Linux the GNU C++ runtime via `gnu.org/gcc`.
Feasibility / cost assessment
Heads-up for whoever bottles this:
- `cmake`, `clang` (with libclang headers), and `protoc` are non-negotiable; missing any of them produces opaque `bindgen` / build-script failures deep in `librocksdb-sys`.
- … `libstdc++` and `libgcc_s` from GCC, hence the conditional `gnu.org/gcc` dep.
- … `dist_release` path is Linux-only (calls `objcopy`, `check-bins.py`), the bundled RocksDB tunes `ROCKSDB_SYS_PORTABLE` for darwin in ways that aren't well-tested for arm64 production, and TiKV itself documents "production: Linux only".
- … `tikv/pd` (placement driver), which lives in a different repo and would be a separate PR.
Test plan
- `linux/x86-64`
- `linux/aarch64`
- `tikv-server --version` reports `{{version}}`
- `tikv-ctl --version` reports `{{version}}`
🤖 Generated with Claude Code