perf: add B-Tree index for equality search #611

Closed
matheusvir wants to merge 3 commits into msiemens:master from matheusvir:optimization/btree-document-index

@matheusvir

What was done

This PR introduces an optional in-memory B-Tree index to improve document lookup performance in TinyDB.

By default, TinyDB performs searches using a full linear scan over in-memory documents, resulting in O(n) lookup complexity.
With this change, indexed fields use a B-Tree structure, reducing key lookup complexity to O(log n).

The index is created on demand using create_index(field_name) and does not modify TinyDB's default behavior unless explicitly enabled.

Implementation details

  • Added a new module index.py implementing a B-Tree index.
  • Modified Table.__init__() to maintain an internal index registry.
  • Updated:
    • search() to use the index when available.
    • insert() to update indexes on insertion.
    • update() to maintain index consistency.
    • remove() to remove entries from indexes.
  • No external dependencies were introduced.
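The index maintenance described in the list above can be sketched as follows. This is a minimal, self-contained illustration and not the code from this PR: `SketchTable` and `search_eq` are hypothetical names, and a plain dict of sets stands in for the B-Tree so that the example stays short. Only `create_index(field_name)` mirrors a name used in the description.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set


class SketchTable:
    """Hypothetical stand-in for a table with an optional index registry."""

    def __init__(self) -> None:
        self._docs: Dict[int, Dict[str, Any]] = {}
        self._next_id = 1
        # Index registry: field name -> {field value -> set of doc ids}.
        # Assumption: a dict replaces the B-Tree purely for brevity.
        self._indexes: Dict[str, Dict[Any, Set[int]]] = {}

    def create_index(self, field: str) -> None:
        # Build the index on demand from the documents already stored.
        index: Dict[Any, Set[int]] = defaultdict(set)
        for doc_id, doc in self._docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self._indexes[field] = index

    def _index_add(self, doc_id: int, doc: Dict[str, Any]) -> None:
        for field, index in self._indexes.items():
            if field in doc:
                index[doc[field]].add(doc_id)

    def _index_discard(self, doc_id: int, doc: Dict[str, Any]) -> None:
        for field, index in self._indexes.items():
            if field in doc:
                index[doc[field]].discard(doc_id)

    def insert(self, doc: Dict[str, Any]) -> int:
        doc_id = self._next_id
        self._next_id += 1
        self._docs[doc_id] = dict(doc)
        self._index_add(doc_id, self._docs[doc_id])
        return doc_id

    def update(self, doc_id: int, fields: Dict[str, Any]) -> None:
        doc = self._docs[doc_id]
        # Drop the old index entries, apply the change, then re-index.
        self._index_discard(doc_id, doc)
        doc.update(fields)
        self._index_add(doc_id, doc)

    def remove(self, doc_id: int) -> None:
        self._index_discard(doc_id, self._docs.pop(doc_id))

    def search_eq(self, field: str, value: Any) -> List[Dict[str, Any]]:
        if field in self._indexes:
            # Indexed path: look up the matching ids directly.
            ids = self._indexes[field].get(value, set())
            return [self._docs[i] for i in sorted(ids)]
        # Default path: full linear scan.
        return [d for d in self._docs.values() if field in d and d[field] == value]


table = SketchTable()
ada = table.insert({"name": "ada", "age": 36})
bob = table.insert({"name": "bob", "age": 36})
table.create_index("age")
table.update(ada, {"age": 37})
table.remove(bob)
print(table.search_eq("age", 37))
```

The sketch keeps the unindexed scan as the fallback, which matches the stated goal of leaving default behavior unchanged unless an index is explicitly created.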

Complexity impact

  • Default search: O(n)
  • Indexed search: O(log n) for key lookup + O(k) to retrieve matching documents

Where:

  • n is the number of indexed documents
  • k is the number of matching results
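To make the two cost terms concrete, here is a small sketch that uses a sorted key array with binary search instead of a B-Tree. That is a deliberate simplification: the lookup cost has the same O(log n) + O(k) shape, but the function names and data layout below are assumptions for illustration and are not taken from the PR.

```python
import bisect
from typing import Any, Dict, List


def linear_lookup(docs: List[Dict[str, Any]], field: str, value: Any) -> List[int]:
    # O(n): every document is inspected.
    return [i for i, doc in enumerate(docs) if field in doc and doc[field] == value]


def indexed_lookup(sorted_keys: List[Any], doc_ids: List[int], value: Any) -> List[int]:
    # O(log n): two binary searches locate the run of equal keys.
    lo = bisect.bisect_left(sorted_keys, value)
    hi = bisect.bisect_right(sorted_keys, value)
    # O(k): copy out the k matching document ids.
    return doc_ids[lo:hi]


docs = [{"age": 20 + (i % 5)} for i in range(10)]

# Build the index once: keys and ids are parallel lists sorted by (key, id).
pairs = sorted((doc["age"], i) for i, doc in enumerate(docs))
sorted_keys = [key for key, _ in pairs]
doc_ids = [doc_id for _, doc_id in pairs]

print(linear_lookup(docs, "age", 22))
print(indexed_lookup(sorted_keys, doc_ids, 22))
```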

Testing

The implementation does not add new test files. The existing suite (204 passed, 1 skipped) exercises all affected code paths — search(), insert(), update(), and remove() — through the parametrized fixtures in tests/test_operations.py and tests/test_tables.py. All tests pass with the index enabled and with the default behavior (no index) unchanged.


Performance

All benchmarks were executed inside Docker containers to isolate the runtime environment and reduce host-specific variance, for example from differing library versions.

Methodology

  • 50 total runs per dataset scale; first 10 (warmup) and last 10 (cooldown) discarded; 30 effective runs measured.
  • Workload: 100 equality searches per run, split evenly between existing and non-existing values, to avoid overfitting to either case.
  • Storage: JSONStorage, reflecting real TinyDB usage.
  • Dataset scales: 1,000 / 10,000 / 50,000 documents.
  • Timing: time.perf_counter_ns() with GC disabled during measurement.
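The run-trimming and timing steps listed above might look roughly like the sketch below. It is not the benchmark script from the research repository: `trim_runs` and `measure` are hypothetical helpers, the workload is a placeholder, and using the sample standard deviation is an assumption.

```python
import gc
import statistics
import time
from typing import Callable, List, Tuple


def trim_runs(timings: List[float], warmup: int = 10, cooldown: int = 10) -> List[float]:
    # Discard the first `warmup` and the last `cooldown` runs.
    return timings[warmup:len(timings) - cooldown]


def measure(workload: Callable[[], object], total_runs: int = 50) -> Tuple[float, float, int]:
    timings_ms: List[float] = []
    for _ in range(total_runs):
        gc_was_enabled = gc.isenabled()
        gc.disable()  # keep collector pauses out of the timed region
        try:
            start = time.perf_counter_ns()
            workload()
            elapsed_ns = time.perf_counter_ns() - start
        finally:
            if gc_was_enabled:
                gc.enable()
        timings_ms.append(elapsed_ns / 1e6)
    kept = trim_runs(timings_ms)
    # Assumption: sample standard deviation over the effective runs.
    return statistics.mean(kept), statistics.stdev(kept), len(kept)


# Placeholder workload; the real benchmark runs 100 equality searches per run.
mean_ms, stdev_ms, effective_runs = measure(lambda: sum(range(1000)))
print(f"{mean_ms:.4f} ms ± {stdev_ms:.4f} over {effective_runs} runs")
```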

Results

| Scale | Baseline mean (ms) | Optimized mean (ms) | Improvement |
|---|---|---|---|
| 1k docs | 126.36 ± 24.99 | 38.41 ± 1.62 | 69.60% |
| 10k docs | 1,453.87 ± 181.34 | 437.95 ± 8.24 | 69.88% |
| 50k docs | 5,746.27 ± 331.01 | 2,077.95 ± 120.19 | 63.84% |
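The Improvement column is consistent with a relative reduction in mean time, (baseline − optimized) / baseline. The short check below recomputes it from the reported means and also prints the equivalent speedup factor; `improvement_pct` is a hypothetical helper name.

```python
def improvement_pct(baseline_ms: float, optimized_ms: float) -> float:
    # Relative reduction in mean run time, as a percentage.
    return (baseline_ms - optimized_ms) / baseline_ms * 100


# Mean times in ms, copied from the results above.
results = {
    "1k docs": (126.36, 38.41),
    "10k docs": (1453.87, 437.95),
    "50k docs": (5746.27, 2077.95),
}

for scale, (baseline, optimized) in results.items():
    pct = improvement_pct(baseline, optimized)
    speedup = baseline / optimized
    print(f"{scale}: {pct:.2f}% improvement, {speedup:.2f}x speedup")
```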

[Figure: B-Tree benchmark comparison]

Analysis

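The gain is consistent across all scales, ranging from ~64% to ~70%, which corresponds to a speedup of roughly 2.8x to 3.3x. In this benchmark both paths grow roughly linearly with document count (the optimized mean rises from 38.41 ms at 1k docs to 2,077.95 ms at 50k docs), so the measured effect of the index is a near-constant reduction in cost per run rather than a visibly different growth rate. The lower standard deviation of the optimized runs also suggests more predictable latency, most clearly at the 1k and 10k scales.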

The write-time overhead from index maintenance, which this benchmark does not measure, is an expected and acceptable trade-off for read-heavy workloads.

Reproducing the benchmark

The full benchmark infrastructure is available in the research repository at matheusvir/eda-oss-performance.

Relevant files:

To run:

# From the root of eda-oss-performance
bash experiments/tinydb/run_btree.sh

This builds the Docker image from setup/tinydb/Dockerfile, mounts the repository, and runs the benchmark script inside the container. Results are written to results/tinydb/result_tinydb_btree.json.


Feedback on the index implementation, API design, and edge cases is welcome.


Relates to #480 and #544.

ManoelNetto26 and others added 3 commits March 11, 2026 22:27
Co-authored-by: Matheus Virgolino <matheus.virgolino.abilio.da.silva@ccc.ufcg.edu.br>
Co-authored-by: Manoel Netto <manoel.da.nobrega.eustaqueo.netto@ccc.ufcg.edu.br>
Co-authored-by: Pedro <pedroalmeida1896@gmail.com>
Co-authored-by: Lucaslg7 <lucasmoizinholg7@gmail.com>
Co-authored-by: RailtonDantas <railtondantas.code@gmail.com>
Co-authored-by: João Pereira <joao.pereira.de.oliveira@ccc.ufcg.edu.br>
…tion

@matheusvir matheusvir force-pushed the optimization/btree-document-index branch from a188ae2 to 0d60d3b Compare March 12, 2026 01:38
@matheusvir matheusvir changed the title perf(tinydb): add B-Tree index for equality search perf: add B-Tree index for equality search Mar 12, 2026
@msiemens
Owner

Thanks for the PR. The benchmarks and write-up are thoughtful, and I appreciate the work here.

However, I think this is out of scope for TinyDB core. Even as an optional feature, adding indexing introduces a fair bit of complexity to the table implementation and raises the maintenance burden around inserts, updates, removes, and query behavior. The performance gains are clear, but this feels more like a step toward a more feature-heavy database engine than something that belongs in the core. My rule of thumb is that at a size where TinyDB's performance becomes an issue, one will gain more performance by using e.g. SQLite (possibly with its JSON extensions) than even the most optimized Python code could offer.

That being said, I’d be more comfortable seeing this explored as an extension instead. If you publish this as an extension, I'd be glad to link to it from the docs extension list.

@msiemens msiemens closed this Mar 12, 2026