Skip to content

Mmap index files#69

Merged
arminsabouri merged 6 commits intomasterfrom
mmap-index-files
Apr 29, 2026
Merged

Mmap index files#69
arminsabouri merged 6 commits intomasterfrom
mmap-index-files

Conversation

@arminsabouri
Copy link
Copy Markdown
Collaborator

MMap'ing the index files is the main contribution. Also included a refactor to clean up how the indecies were being passed around and built

@arminsabouri arminsabouri force-pushed the mmap-index-files branch 3 times, most recently from abf88c9 to fbae67b Compare April 24, 2026 17:34
@arminsabouri
Copy link
Copy Markdown
Collaborator Author

cc @bc1cindy @Mshehu5 @0xZaddyy

Comment thread contrib/bench.sh Outdated
extra_args+=("--bench" "$FILTER")
fi

cargo bench --workspace "${extra_args[@]}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i had a bash array issue when no filter was provided

Suggested change
cargo bench --workspace "${extra_args[@]}"
if [[ ${#extra_args[@]} -eq 0 ]]; then
cargo bench --workspace
else
cargo bench --workspace "${extra_args[@]}"
fi

let len = len_bytes / (N as u64);
let mmap = if len_bytes > 0 {
// Safety: the file is complete and will not be written to via this handle.
Some(unsafe { Mmap::map(&file)? })
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsafe 🔥

}

fn open(path: impl AsRef<Path>, len_error: &'static str) -> io::Result<Self> {
let file = OpenOptions::new().read(true).write(true).open(path)?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to drop the write(true)?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so? Do you see a reason to? We typically write to these files during indexing.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok, got it

I was concerned about the safety comment, but #69 (comment) is responding it

since mmap is not useful during writing I see no reason to drop it

thank you!

Comment thread src/crates/primitives/src/indecies.rs Outdated
Comment thread src/crates/primitives/benches/parser.rs Outdated
Comment thread contrib/bench.sh Outdated
extra_args+=("--bench" "$FILTER")
fi

cargo bench --workspace "${extra_args[@]}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this bench only timing parse_blocks?

the mmap fast-path kicks in after remap(), so a bench that runs after build_indices would exercise it

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the bench could let build() run in full

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the mmap fast-path kicks in after remap()

Correct. mmap is not useful during writing. buffered writes (something like #71) would be. This is mostly an optimization (perhaps a pre-optimization) for reads during analysis

mmap each of the index files as to avoid repeated disk reads
Comment thread contrib/bench.sh Outdated
cargo bench --workspace
else
cargo bench --workspace "${extra_args[@]}"
fi No newline at end of file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a formatting problem here?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No newline at end of file

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats the formatting issue?

Copy link
Copy Markdown
Contributor

@bc1cindy bc1cindy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK

Copy link
Copy Markdown
Contributor

@0xZaddyy 0xZaddyy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a little concern which shouldn't prevent this from getting merged, ACK

let offset = index * (N as u64);
let offset = (index * N as u64) as usize;
if let Some(mmap) = &self.mmap {
let mut buf = [0u8; N];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was thinking of a bound check here

@arminsabouri arminsabouri merged commit 886c021 into master Apr 29, 2026
3 checks passed
@arminsabouri arminsabouri deleted the mmap-index-files branch April 29, 2026 14:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants