Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Share BinEntry::Moved across all moved bins in a table #56

Merged
merged 2 commits into from
Feb 3, 2020

Conversation

domenicquirl
Copy link
Collaborator

@domenicquirl domenicquirl commented Feb 2, 2020

This originates from #50 and tries to achieve performance improvements by instantiating only one BinEntry::Moved per table, as for all moved bins inside a table the Moved points to the same next table. In particular, this tries to reduce the amount of allocations of Moved by Owned::new, only loading more shared pointers to the Moved instead.

This is submitted as a draft for the following reasons:

  • the history for this contains the changes in Use clear() in benchmarks #55, which is still pending approval (Edit: changed that)
  • the previous implementation of Table::drop contains a section which iterates through all bins, drops Moved entries and checks that none of the bins are non-empty. With these changes, the Moved is only dropped once (and turning all shared pointers to it into_owned would be incorrect), so this loop is not technically necessary any longer. I left the check against Moved in for now, but would like to discuss the possibility of removing it and relying on the other parts of the map implementation to uphold the invariant of all bins being empty at this point.
  • I wanted to check against the results of sanitizer CI tests.
  • in general, I want to give room for discussion on this.

I will add performance results to #50 as soon as I am done submitting this.

@codecov
Copy link

codecov bot commented Feb 2, 2020

Codecov Report

Merging #56 into master will decrease coverage by 0.63%.
The diff coverage is 88.65%.

Impacted Files Coverage Δ
src/iter/traverser.rs 95.9% <100%> (+0.06%) ⬆️
src/node.rs 98.73% <100%> (+7.52%) ⬆️
src/map.rs 87.2% <78.94%> (-1.03%) ⬇️
src/raw/mod.rs 84.69% <82.08%> (-8.33%) ⬇️

src/raw/mod.rs Outdated Show resolved Hide resolved
src/raw/mod.rs Outdated Show resolved Hide resolved
@domenicquirl domenicquirl force-pushed the share-moved branch 2 times, most recently from 6b6da65 to 9793d31 Compare February 2, 2020 16:46
@jonhoo
Copy link
Owner

jonhoo commented Feb 2, 2020

Here's a question: do we even need to store the address inside of Moved anymore? Could Table just have a next_table: Atomic<Table> instead? The fact that the Java version takes care to store the pointer directly inside the Moved makes me think that maybe different Moved entries can have different next_table pointers? Imagine that while walking a given table, two resizes happen, could some of the Moved of the first table point directly to the newest table? Or do they have to point to intermediates (like this PR implicitly assumes) for stuff like iteration to work correctly?

@domenicquirl
Copy link
Collaborator Author

Ah, the sanitizers fail due to a test case which manually constructs Moved and puts it into bins, of course they don't get freed then.

@domenicquirl
Copy link
Collaborator Author

Assuming the Moved can be shared, we would only need Moved as a marker and could then store a pointer to the next table directly (whether as Atomic or *const). However part of the idea of this change was to only instantiate Moved once per table, and if you store the address outside of Moved, this would create multiple instances of the marker again. I don't think we can get away without such a marker for moved bins.

With regard to different next_tables, I don't see how this would happen. The table stored in Moved is always self.next_table, which is set only at the start of a resize and reset to null when the resize finishes. Then the Moved are inserted into the current self.table (the old one, during the resize), and self.table and self.next_table does not change during such a resize. Then self.table becomes self.next_table, so for a second resize, Moved will be inserted into the new table.

@jonhoo
Copy link
Owner

jonhoo commented Feb 2, 2020

Oh, you could still also share the Moved, it just wouldn't contain any fields. If we can get rid of that *const Table and make Moved a true marker variant, that'd make me very happy.

Right, I agree with you that that makes sense, but the Java version very deliberately chooses to store the pointer in every Moved (at least as far as I remember), and that gives me pause.

@domenicquirl
Copy link
Collaborator Author

I think I found out why they did it. The reason I think is that find is implemented on BinEntry to handle the different kinds of nodes. Hence the node requires information about where to look in the Moved case to be contained within it. The alternative of course is to let Table do find.

@jonhoo
Copy link
Owner

jonhoo commented Feb 2, 2020

Ah, I see, interesting.. Or to pass &Table to find.

@domenicquirl
Copy link
Collaborator Author

domenicquirl commented Feb 2, 2020

Okay so I did that (spending half the time rewriting test cases that got broken by all the moving around 😅 ). Performance is comparable shared Moved with pointer so far on my machine. I stuck with Table::find(bin), as BinEntry::find(table) just seems wrong to me. If information from Table is required, then Table should be the responsible type I feel, the other way is kind of "backwards".

Anyways, since the tests causing the asan fail were among the ones I touched anyways, here's hoping I didn't miss anything and the sanitizers stop complaining :P

@domenicquirl
Copy link
Collaborator Author

Nice, fixed the tests. Please review again. Also, if we stick with some form of this change, the question about the checks in Table::drop is still open.

@jonhoo
Copy link
Owner

jonhoo commented Feb 2, 2020

Oh, yeah, I didn't expect that removing the pointer would change performance really, it's just one fewer "weird" thing in the code :)

src/node.rs Outdated Show resolved Hide resolved
src/map.rs Outdated Show resolved Hide resolved
src/map.rs Show resolved Hide resolved
src/raw/mod.rs Outdated Show resolved Hide resolved
src/raw/mod.rs Outdated Show resolved Hide resolved
src/raw/mod.rs Outdated Show resolved Hide resolved
@jonhoo
Copy link
Owner

jonhoo commented Feb 2, 2020

Hmm, okay, stepping out from the sub-comments a bit, I think the loop should be replaced with just a swap of .bins with an empty vector. Then,if cfg!(debug_assertions), walk the taken vector and check that all the entries are either null or Moved. Then, drop() the taken vector before we ultimately drop the shared Moved.

EDIT: I think this is what you originally proposed, just want to make sure I understand it right. We might not need to drop bins explicitly before dropping the shared Moved, but I like doing so as it ensures that no-one later modifies the code and accidentally tries to access one of the Moved references inside bins after we drop the shard Moved they are pointing to.

assert tables match when a shared Moved is used
@domenicquirl
Copy link
Collaborator Author

It was, except I didn't know about the possibility of using cfg! to still do the check in testing. That's really nice! This should be ready then 😊

@domenicquirl domenicquirl marked this pull request as ready for review February 3, 2020 12:26
src/raw/mod.rs Show resolved Hide resolved
src/raw/mod.rs Outdated Show resolved Hide resolved
src/raw/mod.rs Outdated Show resolved Hide resolved
@jonhoo
Copy link
Owner

jonhoo commented Feb 3, 2020

This is looking really good now ­— just a few more comments 😅

make BinEntry::Moved a true marker variant
next pointer is now lifted up to table level and shared across all bins

help_transfer computes next_table from the current table
@jonhoo jonhoo merged commit f85c76f into jonhoo:master Feb 3, 2020
@jonhoo
Copy link
Owner

jonhoo commented Feb 3, 2020

Beautiful, thank you! 🎉

@domenicquirl domenicquirl deleted the share-moved branch February 3, 2020 18:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants