perf: add B-Tree index for equality search by matheusvir · Pull Request #611 · msiemens/tinydb

matheusvir · 2026-03-12T01:31:24Z

What was done

This PR introduces an optional in-memory B-Tree index to improve document lookup performance in TinyDB.

By default, TinyDB performs searches using a full linear scan over in-memory documents, resulting in O(n) lookup complexity.
With this change, indexed fields use a B-Tree structure, reducing key lookup complexity to O(log n).

The index is created on demand using create_index(field_name) and does not modify TinyDB's default behavior unless explicitly enabled.
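As a rough illustration of the difference between the two lookup paths, here is a minimal sketch that is not the code from this PR. It uses a sorted key list with `bisect` as a stand-in for the B-Tree, and the helper names (`linear_search`, `build_sorted_index`, `indexed_search`) are hypothetical.

```python
from bisect import bisect_left, bisect_right

docs = [
    {"name": "alice", "age": 30},
    {"name": "bob", "age": 25},
    {"name": "carol", "age": 30},
]


def linear_search(documents, field, value):
    """Default path: test every document, O(n)."""
    return [d for d in documents if d.get(field) == value]


def build_sorted_index(documents, field):
    """Hypothetical stand-in for create_index(field): sorted keys plus positions."""
    pairs = sorted((d[field], i) for i, d in enumerate(documents) if field in d)
    keys = [key for key, _ in pairs]
    positions = [pos for _, pos in pairs]
    return keys, positions


def indexed_search(documents, index, value):
    """Indexed path: two binary searches, O(log n), then O(k) to collect matches."""
    keys, positions = index
    lo = bisect_left(keys, value)
    hi = bisect_right(keys, value)
    return [documents[p] for p in positions[lo:hi]]


age_index = build_sorted_index(docs, "age")
hits = indexed_search(docs, age_index, 30)
misses = indexed_search(docs, age_index, 99)
```

Both paths return the same documents; only the amount of work needed to find them differs. A sorted list makes inserts O(n), which is one reason a tree structure is used in practice.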

Implementation details

Added a new module index.py implementing a B-Tree index.
Modified Table.__init__() to maintain an internal index registry.
Updated:
- search() to use the index when available.
- insert() to update indexes on insertion.
- update() to maintain index consistency.
- remove() to remove entries from indexes.
No external dependencies were introduced.
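
The bookkeeping these changes imply can be sketched with a toy table. This is an assumption-laden illustration, not TinyDB's `Table` class and not the PR's `index.py`: `ToyTable` and `search_eq` are hypothetical names, and a plain dict of sets stands in for the B-Tree so the maintenance logic stays visible.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical miniature table with an index registry."""

    def __init__(self):
        self._docs = {}       # doc_id -> document
        self._indexes = {}    # field name -> {value -> set of doc_ids}
        self._next_id = 1

    def create_index(self, field):
        # Build the index from documents that already exist.
        index = defaultdict(set)
        for doc_id, doc in self._docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self._indexes[field] = index

    def _index_add(self, doc_id, doc):
        for field, index in self._indexes.items():
            if field in doc:
                index[doc[field]].add(doc_id)  # values must be hashable here

    def _index_discard(self, doc_id, doc):
        for field, index in self._indexes.items():
            if field in doc:
                index[doc[field]].discard(doc_id)

    def insert(self, doc):
        doc_id = self._next_id
        self._next_id += 1
        self._docs[doc_id] = dict(doc)
        self._index_add(doc_id, self._docs[doc_id])
        return doc_id

    def update(self, doc_id, fields):
        doc = self._docs[doc_id]
        self._index_discard(doc_id, doc)  # drop stale entries before mutating
        doc.update(fields)
        self._index_add(doc_id, doc)

    def remove(self, doc_id):
        self._index_discard(doc_id, self._docs.pop(doc_id))

    def search_eq(self, field, value):
        if field in self._indexes:  # indexed path
            ids = self._indexes[field].get(value, set())
            return [self._docs[i] for i in sorted(ids)]
        # Fallback: full scan, as without an index.
        return [d for d in self._docs.values() if d.get(field) == value]


table = ToyTable()
table.create_index("city")
a = table.insert({"name": "ana", "city": "Recife"})
b = table.insert({"name": "bia", "city": "Natal"})
table.update(b, {"city": "Recife"})
table.remove(a)
```

The point of the sketch is that every write path has to touch every registered index, which is the maintenance cost an index adds to a table.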

Complexity impact

Default search: O(n)
Indexed search: O(log n) for key lookup + O(k) to retrieve matching documents

Where:

n is the number of indexed documents
k is the number of matching results
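
To put numbers on these bounds for the dataset sizes used later, the sketch below compares the worst-case work of a full scan with that of a binary search over sorted keys. Binary search is used as a proxy here: a B-Tree's exact node and comparison counts depend on its branching factor, which this description does not state. The function name is hypothetical.

```python
import math


def max_binary_search_probes(n):
    """Worst-case comparisons for a binary search over n sorted keys."""
    return math.floor(math.log2(n)) + 1


for n in (1_000, 10_000, 50_000):
    probes = max_binary_search_probes(n)
    print(f"n={n:>6}: full scan inspects up to {n} documents, "
          f"binary search needs at most {probes} comparisons")
```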

Testing

The implementation does not add new test files. The existing suite (204 passed, 1 skipped) exercises all affected code paths — search(), insert(), update(), and remove() — through the parametrized fixtures in tests/test_operations.py and tests/test_tables.py. All tests pass with the index enabled and with the default behavior (no index) unchanged.

Performance

All benchmarks were executed inside Docker containers to keep the runtime environment and library versions reproducible and to reduce host-specific variance between runs.

Methodology

50 total runs per dataset scale; first 10 (warmup) and last 10 (cooldown) discarded; 30 effective runs measured.
Workload: 100 equality searches per run, split evenly between existing and non-existing values, to avoid overfitting to either case.
Storage: JSONStorage, reflecting real TinyDB usage.
Dataset scales: 1,000 / 10,000 / 50,000 documents.
Timing: time.perf_counter_ns() with GC disabled during measurement.
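
The measurement loop described above might look roughly like the following. This is a sketch under stated assumptions, not the actual `benchmark_btree.py`: the function names are hypothetical, the workload is a placeholder rather than real TinyDB searches, GC is assumed to be disabled for the whole loop, and the sample standard deviation is assumed for the reported spread.

```python
import gc
import statistics
import time


def measure(workload, total_runs=50, warmup=10, cooldown=10):
    """Time `workload` total_runs times and keep only the middle runs (in ms)."""
    samples_ns = []
    gc_was_enabled = gc.isenabled()
    gc.disable()  # keep collector pauses out of the timings
    try:
        for _ in range(total_runs):
            start = time.perf_counter_ns()
            workload()
            samples_ns.append(time.perf_counter_ns() - start)
    finally:
        if gc_was_enabled:
            gc.enable()
    kept = samples_ns[warmup:total_runs - cooldown]  # drop warmup and cooldown
    return [ns / 1e6 for ns in kept]


def example_workload():
    # Placeholder for "100 equality searches": half hits, half misses.
    data = list(range(1000))
    for i in range(100):
        target = i if i % 2 == 0 else -1
        _ = target in data


samples_ms = measure(example_workload)
print(f"{statistics.mean(samples_ms):.2f} ± {statistics.stdev(samples_ms):.2f} ms "
      f"over {len(samples_ms)} runs")
```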

Results

Scale	Baseline mean (ms)	Optimized mean (ms)	Improvement
1k docs	126.36 ± 24.99	38.41 ± 1.62	69.60%
10k docs	1,453.87 ± 181.34	437.95 ± 8.24	69.88%
50k docs	5,746.27 ± 331.01	2,077.95 ± 120.19	63.84%
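
The Improvement column is consistent with the relative reduction in mean time, (baseline - optimized) / baseline. That formula is inferred from the numbers rather than stated in the benchmark script; the short check below recomputes it from the means above and also prints the equivalent speedup factor.

```python
results = {
    "1k docs": (126.36, 38.41),
    "10k docs": (1453.87, 437.95),
    "50k docs": (5746.27, 2077.95),
}


def improvement_pct(baseline_ms, optimized_ms):
    """Relative reduction in mean time, as a percentage."""
    return (baseline_ms - optimized_ms) / baseline_ms * 100


for scale, (baseline, optimized) in results.items():
    print(f"{scale}: {improvement_pct(baseline, optimized):.2f}% improvement, "
          f"{baseline / optimized:.2f}x speedup")
```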

Analysis

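The gain is consistent across all scales, ranging from ~64% to ~70% (roughly a 2.8x to 3.3x speedup). In these measurements both paths grow roughly in proportion to document count: from 1k to 10k documents the baseline mean rises about 11.5x and the indexed mean about 11.4x, so the benefit shows up as a near-constant factor rather than as a different growth rate. The standard deviation is also lower in the optimized runs at every scale, which suggests more predictable latency.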

The write-time overhead from index maintenance is an expected and acceptable trade-off for workloads that are read-heavy.

Reproducing the benchmark

The full benchmark infrastructure is available in the research repository at matheusvir/eda-oss-performance.

Relevant files:

Dockerfile: setup/tinydb/Dockerfile
Benchmark script: experiments/tinydb/benchmark_btree.py
Runner script: experiments/tinydb/run_btree.sh

To run:

# From the root of eda-oss-performance
bash experiments/tinydb/run_btree.sh

This builds the Docker image from setup/tinydb/Dockerfile, mounts the repository, and runs the benchmark script inside the container. Results are written to results/tinydb/result_tinydb_btree.json.

Feedback on the index implementation, API design, and edge cases is welcome.

Relates to #480 and #544.

Co-authored-by: Matheus Virgolino <matheus.virgolino.abilio.da.silva@ccc.ufcg.edu.br>
Co-authored-by: Manoel Netto <manoel.da.nobrega.eustaqueo.netto@ccc.ufcg.edu.br>
Co-authored-by: Pedro <pedroalmeida1896@gmail.com>
Co-authored-by: Lucaslg7 <lucasmoizinholg7@gmail.com>
Co-authored-by: RailtonDantas <railtondantas.code@gmail.com>
Co-authored-by: João Pereira <joao.pereira.de.oliveira@ccc.ufcg.edu.br>


msiemens · 2026-03-12T10:49:58Z

Thanks for the PR. The benchmarks and write-up are thoughtful, and I appreciate the work here.

However, I think this is out of scope for TinyDB core. Even as an optional feature, adding indexing introduces a fair bit of complexity to the table implementation and raises the maintenance burden around inserts, updates, removes, and query behavior. The performance gains are clear, but this feels more like a step toward a more feature-heavy database engine than something that belongs in the core. My rule of thumb is that at a size where TinyDB's performance becomes an issue, one will gain more performance by using e.g. SQLite (possibly with their JSON extensions) than even the most optimized Python code could offer.

That being said, I’d be more comfortable seeing this explored as an extension instead. If you publish this as an extension, I'd be glad to link to it from the docs extension list.

ManoelNetto26 and others added 3 commits March 11, 2026 22:27

matheusvir force-pushed the optimization/btree-document-index branch from a188ae2 to 0d60d3b Compare March 12, 2026 01:38

matheusvir changed the title ~~perf(tinydb): add B-Tree index for equality search~~ perf: add B-Tree index for equality search Mar 12, 2026

msiemens closed this Mar 12, 2026

matheusvir wants to merge 3 commits into msiemens:master from matheusvir:optimization/btree-document-index
