Conversation
⚠️ Warning: Rate limit exceeded

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 50 minutes and 57 seconds.

⌛ How to resolve this issue? After the wait time has elapsed, a review can be triggered again. We recommend spacing out your commits to avoid hitting the rate limit.

🚦 How do rate limits work? CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source, and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID:
📒 Files selected for processing (13)
Walkthrough

Introduces a direct decoding path with strict MaxMind type checking and pointer-following in the decoder; implements explicit serde deserializer methods for scalars, sequences, and maps; refactors Reader lookup into IPv4/IPv6-specific traversals with record-size-specialized search-tree readers; and caches node metrics.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Reader
    participant LookupResult
    participant Decoder
    participant Visitor
    Client->>Reader: lookup(ip)
    Reader->>Reader: find_address_in_tree_v4/v6 (RecordSize*)
    Reader-->>LookupResult: return LookupResult(offset/pointer)
    Client->>LookupResult: decode::<T>()
    LookupResult->>Decoder: decoder(offset)
    Decoder->>Decoder: decode_direct (follow pointers, check type_num)
    Decoder->>Visitor: visit_seq / visit_map / visit_* (drive serde Visitor)
    Visitor-->>Client: deserialized value or error
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches: 🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/reader.rs`:
- Around line 677-773: The SearchTreeRecord trait resolves record-size branching
for per-bit reads; to unify and avoid duplication, change Reader::read_node and
find_ipv4_start to use the same generic API (SearchTreeRecord::read_node)
instead of matching on record_size at runtime: create generic variants (e.g.,
Reader::read_node<R: SearchTreeRecord> and find_ipv4_start<R: SearchTreeRecord>)
or have the existing Reader::read_node delegate to a R::read_node impl chosen
where the reader is constructed, and update the within traversal to call that
shared helper so all bit-extraction logic lives in the SearchTreeRecord
implementations (preserving normalize_lookup_result usage); see the first sketch below.
- Around line 44-45: Add a brief doc comment above the Reader struct fields
node_count and node_byte_size stating that these cached fields are the canonical
source of truth for hot-path lookups and should be used instead of reading
metadata.node_count or metadata.record_size because Metadata's public mutable
fields may be changed after Reader construction; reference the Reader type and
the Metadata type in the comment so future maintainers know why methods use
self.node_count/self.node_byte_size rather than self.metadata.*; see the second sketch below.
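A minimal sketch of that unification. The SearchTreeRecord name and the idea of per-record-size impls come from this review; the 24-bit impl and the traversal helper below are illustrative assumptions, not the crate's actual code:

```rust
/// Hypothetical trait shape; the real definition in the PR may differ.
trait SearchTreeRecord {
    /// Read the `bit`-selected record of `node` from the raw tree bytes.
    fn read_node(buf: &[u8], node: usize, bit: u8) -> usize;
}

struct RecordSize24;

impl SearchTreeRecord for RecordSize24 {
    fn read_node(buf: &[u8], node: usize, bit: u8) -> usize {
        // 24-bit records: 6 bytes per node, two big-endian 3-byte records.
        let base = node * 6 + usize::from(bit) * 3;
        (usize::from(buf[base]) << 16)
            | (usize::from(buf[base + 1]) << 8)
            | usize::from(buf[base + 2])
    }
}

/// Generic traversal: the record-size branch happens once, where `R` is
/// chosen, instead of per bit inside the loop.
fn find_address_in_tree<R: SearchTreeRecord>(
    buf: &[u8],
    node_count: usize,
    bits: impl Iterator<Item = u8>,
) -> usize {
    let mut node = 0usize;
    for bit in bits {
        if node >= node_count {
            break; // left the tree: data pointer or "not found" record
        }
        node = R::read_node(buf, node, bit);
    }
    node
}
```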
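And a sketch of the suggested field documentation, with stub types standing in for the crate's Reader and Metadata (field names come from the review; the types are assumed):

```rust
/// Stub standing in for the crate's Metadata, whose fields are public
/// and therefore mutable by downstream code.
pub struct Metadata {
    pub node_count: usize,
    pub record_size: u16,
}

pub struct Reader<S> {
    buf: S,
    pub metadata: Metadata,
    /// Canonical values for hot-path lookups, cached and validated at
    /// construction. `Metadata`'s public fields can be mutated after this
    /// `Reader` is built, so lookup methods must read
    /// `self.node_count` / `self.node_byte_size`, never `self.metadata.*`.
    node_count: usize,
    node_byte_size: usize,
}
```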
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ec271dbd-1a55-42c8-ae24-ac6ed438574d
📒 Files selected for processing (2)
src/decoder.rs, src/reader.rs
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/reader.rs (1)
Lines 494-505: ⚠️ Potential issue | 🟡 Minor

`resolve_data_pointer` underflows on a malformed pointer below `node_count + 16`.

`pointer - self.node_count - DATA_SECTION_SEPARATOR_SIZE` is unchecked subtraction. A corrupt search tree could yield a pointer in `node_count..node_count+16` (or, in pathological cases, even below `node_count`); in debug builds this panics, in release it wraps to a huge `usize`, which the subsequent `>=` bounds check then catches and converts into a sane `InvalidDatabase` error. Pre-existing behavior, not introduced here, but since this is now centralized through `lookup_result` (and only reached when `pointer != 0`, i.e. `pointer > node_count`), it's worth a `checked_sub` to make the failure mode identical in debug and release:

🛡️ Suggested fix

```diff
 pub(crate) fn resolve_data_pointer(&self, pointer: usize) -> Result<usize, MaxMindDbError> {
-    let resolved = pointer - self.node_count - DATA_SECTION_SEPARATOR_SIZE;
-
-    // Check bounds using pointer_base which marks the start of the data section
-    if resolved >= (self.buf.as_ref().len() - self.pointer_base) {
+    let resolved = pointer
+        .checked_sub(self.node_count)
+        .and_then(|v| v.checked_sub(DATA_SECTION_SEPARATOR_SIZE))
+        .filter(|&r| r < self.buf.as_ref().len().saturating_sub(self.pointer_base));
+    let Some(resolved) = resolved else {
         return Err(MaxMindDbError::invalid_database(
             "the MaxMind DB file's data pointer resolves to an invalid location",
         ));
-    }
-
+    };
     Ok(resolved)
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/reader.rs` around lines 494-505: the subtraction in `resolve_data_pointer` (`pointer - self.node_count - DATA_SECTION_SEPARATOR_SIZE`) can underflow for malformed pointers; change it to use `checked_sub` (or chained `checked_sub`s) to detect underflow and return the same `MaxMindDbError::invalid_database` error when the pointer is too small, then perform the bounds check against `self.buf.as_ref().len() - self.pointer_base` as before; reference `resolve_data_pointer`, `pointer`, `self.node_count`, `DATA_SECTION_SEPARATOR_SIZE`, `pointer_base`, and `buf` to locate and update the logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README.md`:
- Around line 79-92: The README "Iterating networks" snippet uses top-level ?
operators and a rustdoc-hidden line that will render incorrectly; wrap the
snippet body in a proper main function signature like fn main() -> Result<(),
Box<dyn std::error::Error>> { … } and move the calls to Reader::open_readfile,
IpNetwork::parse, WithinOptions::default().skip_empty_values(), and the
reader.within(...) loop inside that function, then remove the rustdoc hidden
line (# Ok::<(), Box<dyn std::error::Error>>(())). This ensures reader.within,
lookup?, and network()? compile when copied and the documentation no longer
shows doctest artifacts.
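A sketch of the wrapped snippet, using the names this comment mentions (Reader::open_readfile, WithinOptions::default().skip_empty_values(), reader.within, network()); the database path, the network literal, and the exact within signature are assumptions, not the README's actual code:

```rust
use maxminddb::{geoip2, Reader, WithinOptions};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let reader = Reader::open_readfile("GeoLite2-City.mmdb")?;
    let network: ipnetwork::IpNetwork = "0.0.0.0/0".parse()?;
    let options = WithinOptions::default().skip_empty_values();

    // Iterate every network in the database; no hidden doctest lines,
    // so the snippet compiles as-is when copied out of the README.
    for item in reader.within::<geoip2::City>(network, options)? {
        let item = item?;
        println!("{}", item.network()?);
    }
    Ok(())
}
```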
In `@src/decoder.rs`:
- Around line 825-855: deserialize_tuple and deserialize_tuple_struct currently
forward to deserialize_seq so they don't enforce the expected tuple length; you
should validate the on-disk array length returned by size_and_type() against the
provided _len before building the ArrayAccess. Fix options: 1) enforce exact
match (return an error if size != len) or 2) allow size >= len by constructing
ArrayAccess with count = len and, after visitor returns, consume/skip the
remaining (size - len) elements from the deserializer; implement the chosen
behavior inside deserialize_tuple and deserialize_tuple_struct (use
size_and_type(), TYPE_ARRAY, and ArrayAccess to locate the code to change).
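A self-contained sketch of option 1 (exact arity). size_and_type, TYPE_ARRAY, and ArrayAccess are this comment's names; the stub types and error strings below are assumptions:

```rust
const TYPE_ARRAY: u8 = 11; // MaxMind DB type number for arrays

/// Stub decoder: pretend the control byte has already been parsed.
struct Decoder {
    size: usize,
    type_num: u8,
}

impl Decoder {
    fn size_and_type(&self) -> (usize, u8) {
        (self.size, self.type_num)
    }

    fn deserialize_tuple(&self, len: usize) -> Result<(), String> {
        let (size, type_num) = self.size_and_type();
        if type_num != TYPE_ARRAY {
            return Err("expected an array for a tuple".into());
        }
        // Reject arity mismatches instead of silently truncating or
        // over-reading (option 2 would instead build ArrayAccess with
        // count = len and skip the `size - len` trailing elements).
        if size != len {
            return Err(format!(
                "cannot decode a tuple of length {len} from an array of length {size}"
            ));
        }
        // ...build ArrayAccess over `size` elements and drive the visitor...
        Ok(())
    }
}
```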
In `@src/reader_test.rs`:
- Around line 10-11: The tests test_networks and test_networks_within_scenarios
still build database filenames using inline loops over record sizes and ip
versions; replace those loops to reuse the centralized matrix constant by either
(A) mapping TEST_DATABASE_CONFIGS into the u32 pairs those tests expect (e.g.,
iterate TEST_DATABASE_CONFIGS and cast each (usize,usize) to (u32,u32) when
forming filenames) or (B) add a sibling constant TEST_RECORD_SIZES: &[(u32,u32)]
and use it in both tests; update the loops in test_networks and
test_networks_within_scenarios to iterate that constant instead of hardcoded
&[24_u32,28,32] / &[4_u32,6_u32] so the matrix is defined in one place.
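A sketch of option (B); the constant name comes from this comment, while the filename pattern is illustrative:

```rust
/// Single source of truth for the (record_size, ip_version) matrix.
const TEST_RECORD_SIZES: &[(u32, u32)] = &[
    (24, 4), (24, 6),
    (28, 4), (28, 6),
    (32, 4), (32, 6),
];

#[test]
fn test_networks() {
    for &(record_size, ip_version) in TEST_RECORD_SIZES {
        let filename =
            format!("MaxMind-DB-test-ipv{ip_version}-{record_size}.mmdb");
        // ...open the database at `filename` and assert on its networks...
        let _ = filename;
    }
}
```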
In `@src/reader.rs`:
- Around line 400-424: These methods currently read metadata.record_size at
lookup time, which can panic if a downstream user mutates Metadata::record_size
after Reader construction; fix by caching the validated record_size into the
Reader struct at construction (in Reader::from_source) and use that cached field
instead of metadata.record_size inside find_address_in_tree_v4 and
find_address_in_tree_v6 (and anywhere else that currently reads
metadata.record_size, e.g. read_node dispatch), so the monomorphized match
always uses the immutable cached value validated at creation; a sketch follows below.
- Around line 380-398: Add a one-line comment above the lookup_result function
explaining that pointer == 0 is a sentinel meaning "not found" because
normalize_lookup_result collapses both node == node_count (empty record) and
node < node_count (corrupt/still-internal after full traversal) to 0; reference
lookup_result and normalize_lookup_result in the comment so future readers
understand why a raw 0 is treated as not-found rather than a real pointer; a sketch of that comment follows below.
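A sketch of the record_size caching from the comment above, with stub types in place of the crate's; the 24/28/32 validation mirrors the record sizes these tests exercise:

```rust
struct Metadata {
    record_size: u16,
}

struct Reader {
    metadata: Metadata,
    /// Validated once at construction; lookups read this field rather than
    /// `metadata.record_size`, which could be mutated afterwards.
    record_size: u16,
}

impl Reader {
    fn from_source(metadata: Metadata) -> Result<Self, String> {
        let record_size = match metadata.record_size {
            rs @ (24 | 28 | 32) => rs,
            rs => return Err(format!("unsupported record size: {rs}")),
        };
        Ok(Reader { metadata, record_size })
    }
}
```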
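And a sketch of the sentinel comment itself (wording illustrative):

```rust
/// A `pointer` of 0 is a "not found" sentinel: `normalize_lookup_result`
/// collapses both `node == node_count` (empty record) and
/// `node < node_count` (still internal after exhausting every address bit,
/// i.e. a corrupt tree) to 0, so `lookup_result` must treat a raw 0 as
/// no-data rather than as a real data-section pointer.
fn lookup_result() { /* ... */ }
```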
In `@src/result.rs`:
- Around line 412-418: signed_index_to_path_element currently overflows for n ==
isize::MIN when computing -n - 1; change the negative branch in
signed_index_to_path_element to perform a saturating or checked computation
instead of direct negation: detect isize::MIN (or use checked_neg/checked_sub)
and map that case to the maximum sensible usize (saturate) before constructing
PathElement::IndexFromEnd, otherwise compute (-n - 1) safely and cast to usize;
update references to PathElement::Index and PathElement::IndexFromEnd
accordingly.
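A sketch of the safe negative branch. It uses isize::unsigned_abs rather than the checked_neg route the comment mentions, since |n| - 1 is exact for every negative n, including isize::MIN; the variant names come from this comment:

```rust
enum PathElement {
    Index(usize),
    IndexFromEnd(usize),
}

fn signed_index_to_path_element(n: isize) -> PathElement {
    if n >= 0 {
        PathElement::Index(n as usize)
    } else {
        // `-n - 1` overflows when n == isize::MIN; `unsigned_abs` computes
        // |n| in usize space, so the subtraction below never wraps.
        PathElement::IndexFromEnd(n.unsigned_abs() - 1)
    }
}
```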
---
Outside diff comments:
In `@src/reader.rs`:
- Around line 494-505: The subtraction in resolve_data_pointer (pointer -
self.node_count - DATA_SECTION_SEPARATOR_SIZE) can underflow for malformed
pointers; change it to use checked_sub (or chained checked_subs) to detect
underflow and return the same MaxMindDbError::invalid_database error when the
pointer is too small, then perform the bounds check against
self.buf.as_ref().len() - self.pointer_base as before; reference
resolve_data_pointer, pointer, self.node_count, DATA_SECTION_SEPARATOR_SIZE,
pointer_base and buf to locate and update the logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1829e6a2-5014-4e73-9b75-5841e4d54162
📒 Files selected for processing (11)
README.md, benches/common.rs, benches/lookup.rs, benches/serde_usage.rs, examples/lookup.rs, src/decoder.rs, src/lib.rs, src/reader.rs, src/reader_test.rs, src/result.rs, src/within.rs
7f92c6e to 6d8fa42 (Compare)
Add a shared helper for collecting network strings from Within iterators so the iterator-option tests can focus on their expected outputs instead of repeated traversal boilerplate.
Document the lazy LookupResult workflow, highlight decode_path and network iteration, and add the focused benchmark commands used when evaluating performance changes.
Add public-API tests for two documented LookupResult behaviors: shared offsets for identical records and clean None results when decode_path is used on a lookup without data.
Switch the public decode_path examples over to the path! macro so the rendered docs match the ergonomic API that the crate already exposes and tests. Also wrap the longer calls more naturally in the rendered examples.
Show the matched network and a small decode_path lookup before the full City decode so the runnable example demonstrates the newer lazy lookup workflow.
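A hedged sketch of that lazy workflow as these commits describe it. Every signature here (lookup returning a LookupResult, decode and decode_path yielding Option inside a Result, the path! macro's argument shape, the database path and IP) is inferred from this PR's discussion, not checked against the released API:

```rust
use maxminddb::{geoip2, path, Reader};
use std::net::IpAddr;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let reader = Reader::open_readfile("GeoLite2-City.mmdb")?;
    let ip: IpAddr = "89.160.20.128".parse()?;

    // The lookup itself is lazy: the LookupResult records where the data
    // lives in the file, and nothing is decoded yet.
    let result = reader.lookup(ip)?;

    // Decode a single field by path without materializing the full record;
    // a lookup that found no data yields a clean None.
    let city: Option<String> = result.decode_path(&path!["city", "names", "en"])?;
    println!("city: {city:?}");

    // Decode the whole record only when it is actually needed.
    let record: Option<geoip2::City> = result.decode()?;
    println!("{record:#?}");
    Ok(())
}
```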
ca19569 to 2b66bb2 (Compare)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CHANGELOG.md`:
- Around line 13-15: Clarify user impact of the tuple/tuple-struct
length-mismatch change by updating the CHANGELOG entry to explicitly state that
deserializing a database array into a tuple or tuple struct will now return a
decoding error when lengths differ (instead of silently succeeding), and advise
users to audit their Deserialize implementations and tuple destinations for
arity mismatches during upgrades so they can fix or handle newly surfaced
runtime errors.
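For example, an entry along these lines (wording illustrative, not the PR's actual text):

```
- BREAKING: deserializing a database array into a tuple or tuple struct now
  returns a decoding error when the lengths differ, where it previously
  succeeded silently. Audit `Deserialize` implementations and tuple
  destinations for arity mismatches when upgrading.
```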
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ed13be34-1407-4e6e-a13c-40db4e10e12e
📒 Files selected for processing (12)
CHANGELOG.md, README.md, benches/common.rs, benches/lookup.rs, benches/serde_usage.rs, examples/lookup.rs, src/decoder.rs, src/lib.rs, src/reader.rs, src/reader_test.rs, src/result.rs, src/within.rs
Summary by CodeRabbit