Add Python (pybind11) bindings + hello_search.py example #9

Merged
evilmucedin merged 1 commit into evilmucedin:main from DenisRaskovalov:feat/python-bindings
May 28, 2026

@DenisRaskovalov

Contributor

Summary

Optional CPython extension module that exposes the SearchPlusPlus core
to Python via pybind11, plus a Python example that mirrors
examples/hello_search/main.cpp line-for-line.

  • Binding (python/src/searchplusplus.cpp) covers the v0.1 surface — Schema, Document, IndexWriter, IndexReader, Searcher, SearchResult, Hit. Python-side methods are snake_case. Status / Expected<T> failures raise ValueError / KeyError / IndexError / RuntimeError based on the underlying StatusCode, so callers get normal raise semantics instead of having to check .ok().

  • Example (examples/python/hello_search.py) reproduces the C++ example's output exactly:

    inverted index           total=1
        b   score=1.9253
    bm25                     total=1
        c   score=1.3863
    title:tokenizer          total=1
        e   score=1.3863
    
  • Off by default (SPP_BUILD_PYTHON=OFF). The regular build stays self-contained; no Python dev-header dependency unless you opt in.

  • pybind11 added under a vcpkg python feature so vcpkg install only pulls it in when the bindings are actually requested. Justified in DESIGN.md against principle 2: header-only, ~200 KB, single-purpose, alternative is hand-rolling every method against the raw CPython C API.
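
To make the status-to-exception translation described above concrete, here is a minimal pure-Python sketch of the idea. The real translation happens in C++ inside the binding; the `StatusCode` member names and the pairing of each code with an exception type below are assumptions made for illustration, since the PR only lists the four exception types.

```python
from enum import Enum, auto


class StatusCode(Enum):
    """Hypothetical status codes; the real enum lives in the C++ core."""
    OK = auto()
    INVALID_ARGUMENT = auto()
    NOT_FOUND = auto()
    OUT_OF_RANGE = auto()
    INTERNAL = auto()


# Assumed pairing of codes to exception types. Anything not listed
# falls back to RuntimeError.
_EXCEPTION_FOR = {
    StatusCode.INVALID_ARGUMENT: ValueError,
    StatusCode.NOT_FOUND: KeyError,
    StatusCode.OUT_OF_RANGE: IndexError,
}


def raise_for_status(code, message=""):
    """Return None for OK; otherwise raise the exception mapped to `code`."""
    if code is StatusCode.OK:
        return None
    raise _EXCEPTION_FOR.get(code, RuntimeError)(message)
```

With a mapping like this, callers write an ordinary `try` / `except KeyError` block instead of inspecting a status object after every call.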

Build

# macOS:  brew install pybind11
# Ubuntu: apt-get install python3-dev pybind11-dev
cmake --preset release -DSPP_BUILD_PYTHON=ON
cmake --build --preset release --target searchplusplus -j
PYTHONPATH=build/release/python python3 examples/python/hello_search.py

The README documents the one gotcha: don't build the bindings under the default (ASan) preset — ASan can't initialize when the .so is dlopen'd into a non-instrumented Python interpreter.
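
Before running the example, it can help to confirm that Python will actually find the built module. The sketch below does what `PYTHONPATH=build/release/python` does, but from inside Python, and reports where the import system would load `searchplusplus` from without importing it. The helper name `locate_module` is hypothetical and not part of this PR.

```python
import importlib
import importlib.util
import sys
from pathlib import Path


def locate_module(build_dir="build/release/python", name="searchplusplus"):
    """Return the file the import system would load `name` from, or None.

    Prepends `build_dir` to sys.path (as PYTHONPATH would) and looks the
    module up without importing it.
    """
    entry = str(Path(build_dir).resolve())
    if entry not in sys.path:
        sys.path.insert(0, entry)
    importlib.invalidate_caches()  # pick up files created after startup
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# Usage, run from the repository root after building:
#   print(locate_module() or "not found; build with -DSPP_BUILD_PYTHON=ON")
```

A `None` result usually means the extension was not built or ended up under a different preset's build directory.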

Test plan

  • Local: extension module builds clean under the release preset (macOS, Apple Clang 21.0.0)
  • Local: hello_search.py runs end-to-end and produces output byte-identical to the C++ example
  • CI lanes stay green (bindings are off in CI; the only files exercised by CI here are the CMake option plumbing and the docs)
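
The byte-identical check in the test plan could be scripted along these lines. This is a sketch only: the helper names are hypothetical, and the path to the compiled C++ example in the usage comment is an assumption, since the PR does not state where that binary lands.

```python
import os
import subprocess
import sys


def stdout_bytes(cmd, extra_env=None):
    """Run `cmd` and return its raw stdout; raises if the exit code is non-zero."""
    env = dict(os.environ)
    env.update(extra_env or {})
    return subprocess.run(cmd, check=True, capture_output=True, env=env).stdout


def outputs_match(cmd_a, cmd_b, env_a=None, env_b=None):
    """True when both commands print exactly the same bytes to stdout."""
    return stdout_bytes(cmd_a, env_a) == stdout_bytes(cmd_b, env_b)


# Usage (the C++ binary path is hypothetical):
#   outputs_match(
#       ["build/release/examples/hello_search/hello_search"],
#       [sys.executable, "examples/python/hello_search.py"],
#       env_b={"PYTHONPATH": "build/release/python"},
#   )
```

Comparing raw bytes rather than decoded text keeps the check strict about trailing whitespace and line endings.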

🤖 Generated with Claude Code

A new optional `searchplusplus` CPython extension module that exposes
Schema, Document, IndexWriter, IndexReader, and Searcher to Python.
Methods are snake_case; Status / Expected<T> failures raise normal
Python exceptions (ValueError / KeyError / IndexError / RuntimeError
depending on the StatusCode).

- python/CMakeLists.txt + python/src/searchplusplus.cpp: the binding
  itself. Off by default — gated by SPP_BUILD_PYTHON=OFF so the
  regular C++ build doesn't grow a Python dev-header requirement.
- examples/python/hello_search.py: a line-for-line mirror of
  examples/hello_search/main.cpp. Same corpus, same queries, same
  printed output (verified locally on macOS):

    inverted index           total=1
        b   score=1.9253
    bm25                     total=1
        c   score=1.3863
    title:tokenizer          total=1
        e   score=1.3863

- vcpkg.json: pybind11 lives under a `python` feature so vcpkg only
  pulls it in when bindings are actually requested.
- DESIGN.md: pybind11 added to the build-time / dev-only dependency
  list with the same justification depth as the existing entries
  (header-only, ~200 KB, single-purpose, alternative is hand-rolling
  every method against the CPython C API — clears the principle-2
  dependency bar).
- python/README.md + examples/python/README.md: build instructions
  and a one-time gotcha (the Python module must be built without
  AddressSanitizer — the default preset wires ASan into spp_core,
  and ASan can't be loaded into a non-instrumented Python via
  dlopen).

The binding covers only the v0.1 surface (schema, writer, searcher);
v0.2 LTR ranker registration and per-token-weight features are not
wired up here — straightforward extension when there's a caller.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@evilmucedin evilmucedin merged commit acc4a0e into evilmucedin:main May 28, 2026
6 of 7 checks passed
DenisRaskovalov added a commit to DenisRaskovalov/SearchPlusPlus that referenced this pull request May 28, 2026
evilmucedin#9 merged before the clang-format follow-up commit on that PR landed,
so main still carries the un-formatted python/src/searchplusplus.cpp.
That makes the clang-format lane red on every subsequent PR, including
this README-only one, even though no PR after evilmucedin#9 has touched the file.

Apply the same in-tree .clang-format edits the follow-up commit on evilmucedin#9
made (no behavior change): IncludeBlocks: Regroup moves stdlib above
third-party, and a few one-line lambdas / class_ instantiations fit on
a single line under the 100-column limit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2 participants