Je/dictionary reform #5780
nice reduction in complexity!
What does the change in performance look like?
    }
}
if (links.size())
    cluster->remove_backlinks(cluster->get_real_key(i), col_key, links, state);
What happens if a Mixed (or TypedLink above) links to the same table? I just noticed that case is not covered by get_owning_table()->links_to_self above.
Good question. I will try to find an answer
@ironage isn't this piece of code a good candidate for conversion to the new class for links introduced here (#5796 (review))?
Yeah, good eye, it is a similar pattern. However, it uses lower-level constructs at the cluster level rather than our container accessors. I think that is for performance reasons, so I didn't convert it.
Mixed Dictionary::find_value(Mixed value) const noexcept
{
    size_t ndx = update() ? m_values->find_first(value) : realm::npos;
    return (ndx == realm::npos) ? Mixed{} : do_get_key(ndx);
Don't we allow null keys? I think this method should return a util::Optional<Mixed> instead.
The assumption so far has been that we would not allow null keys. (And today we only officially support string keys.)
ok thanks, I forgot we don't allow null keys
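To illustrate the trade-off discussed in this thread: returning a default-constructed value as the "not found" marker, as the snippet above does with Mixed{}, is only unambiguous while null can never be a stored key. The following is a minimal sketch under that reading. `TinyDict`, `Key`, and both function names are hypothetical, and `std::optional<std::string>` merely stands in for a nullable Mixed key; none of this is realm-core API.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a nullable key; std::nullopt plays the role of a null Mixed.
using Key = std::optional<std::string>;

struct TinyDict {
    std::vector<std::pair<Key, int>> entries;

    // Sentinel style: a null key doubles as "not found".
    // This is unambiguous only as long as null keys are never stored.
    Key find_value_sentinel(int value) const
    {
        for (const auto& [k, v] : entries) {
            if (v == value)
                return k;
        }
        return std::nullopt;
    }

    // Optional style: the outer optional carries found / not found,
    // so a stored null key can still be told apart from a miss.
    std::optional<Key> find_value_optional(int value) const
    {
        for (const auto& [k, v] : entries) {
            if (v == value)
                return std::optional<Key>(std::in_place, k);
        }
        return std::nullopt;
    }
};
```

If null keys stay disallowed, as stated in the replies, the sentinel form loses no information and the simpler return type is sufficient.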
ClusterNode::State state = m_clusters->try_get(k);
if (!state) {
auto [ndx, actual_key] = find_impl(key);
if (actual_key != key) {
This should handle the case where the dictionary is empty and the key to insert is null: find_impl then returns a null Mixed.
Suggested change:
- if (actual_key != key) {
+ if (ndx == m_keys.size() || actual_key != key) {
As stated above, we don't support null as key value. Current implementation only supports string and int as key types - and only strings officially.
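The boundary case behind the suggested check can be shown with a small sketch. It assumes, hypothetically, that `find_impl` behaves like a lower-bound search over sorted keys and returns the insertion index together with the key found at that index; the actual realm-core implementation may differ. When the index equals the number of keys, there is no key at that position, so the returned key is only a placeholder. Plain strings are used as keys here, with the empty string playing the part of the problematic placeholder value.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical lower-bound lookup over sorted keys. Returns the index where
// `key` is stored or would be inserted, plus the key currently at that index.
// If the index equals keys.size(), the returned key is a default-constructed placeholder.
std::pair<std::size_t, std::string> find_impl(const std::vector<std::string>& keys, const std::string& key)
{
    auto it = std::lower_bound(keys.begin(), keys.end(), key);
    auto ndx = static_cast<std::size_t>(it - keys.begin());
    return {ndx, it == keys.end() ? std::string{} : *it};
}

// Inserts `key` if it is absent, keeping `keys` sorted. Returns true if inserted.
bool insert_key(std::vector<std::string>& keys, const std::string& key)
{
    auto [ndx, actual_key] = find_impl(keys, key);
    // Checking ndx first covers the case where the placeholder happens to
    // compare equal to the key being inserted (here: the empty string).
    if (ndx == keys.size() || actual_key != key) {
        keys.insert(keys.begin() + static_cast<std::ptrdiff_t>(ndx), key);
        return true;
    }
    return false;
}
```

Without the `ndx == keys.size()` test, inserting the empty string into an empty container would be skipped in this sketch, because the placeholder compares equal to the key. With null keys ruled out, as the reply states, the corresponding case cannot arise in the Dictionary.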
@ironage I made this small test regarding performance (numbers are nanoseconds per element):
As you can see, when it comes to smaller dictionaries the new implementation is a bit faster for insertions and a bit slower for lookup. When the size grows, the old implementation wins in both cases. For very small dictionaries, the new implementation seems to be the fastest for both insertion and lookup.
Thanks for checking the performance, those numbers look good to me!
src/realm/bplustree.hpp
Outdated
@@ -203,6 +203,10 @@ class BPlusTreeBase {
        m_root->bp_set_parent(parent, ndx_in_parent);
    }

    virtual Mixed get_any(size_t) = 0;
This should/could be const. virtual Mixed get_any(size_t) const = 0
Sure
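A short sketch of why the `const` qualifier matters for this accessor: without it, the function cannot be called through a `const` reference to the base class. `TreeBase`, `IntTree`, `first_element`, and the use of `std::variant` as a stand-in for Mixed are all hypothetical and are not the realm-core types.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for a Mixed-like "any value" type.
using Any = std::variant<std::monostate, std::int64_t, std::string>;

class TreeBase {
public:
    virtual ~TreeBase() = default;
    // const-qualified: reading an element does not modify the tree.
    virtual Any get_any(std::size_t ndx) const = 0;
};

class IntTree : public TreeBase {
public:
    explicit IntTree(std::vector<std::int64_t> values)
        : m_values(std::move(values))
    {
    }

    Any get_any(std::size_t ndx) const override
    {
        return m_values.at(ndx);
    }

private:
    std::vector<std::int64_t> m_values;
};

// Read-only helper: this compiles only because get_any() is const.
Any first_element(const TreeBase& tree)
{
    return tree.get_any(0);
}
```

Dropping `const` from the declaration in `TreeBase` would turn the call in `first_element` into a compile error, which is the kind of restriction the review comment is trying to avoid.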
@@ -892,17 +957,26 @@ void ClusterTree::insert_fast(ObjKey k, const FieldValues& init_values, ClusterN
    m_size++;
}

-ClusterNode::State ClusterTree::insert(ObjKey k, const FieldValues& init_values)
+Obj ClusterTree::insert(ObjKey k, const FieldValues& init_values)
This is a little counterintuitive. Why do we return the Obj just inserted? Or do we construct one to signal that we have inserted? Do we need a reference to the object right after the insertion?
This is used when we create objects. So the goal is actually to return an object just created.
@@ -914,13 +988,6 @@ bool ClusterTree::is_valid(ObjKey k) const noexcept
    return m_root->try_get(k, state);
}

-ClusterNode::State ClusterTree::get(ObjKey k) const
I don't understand this either. How do we fetch an Obj? Only via insertion?
I think you got the answer yourself further down.
Obj insert(ObjKey k, const FieldValues& values);

// Lookup and return object
Obj get(ObjKey k) const
Ah, OK! This is inline in the header file now…
@@ -172,19 +198,29 @@ class ClusterTree {
    friend class ClusterNodeInner;

    Allocator& m_alloc;
    Table* m_owner;
We use this ptr everywhere, and we initialize it in the constructor of the cluster tree. Asserting that the owner ptr passed to the ctor is not null would probably help catch errors while using the ClusterTree.
That would be ideal. Unfortunately we still use a ClusterTree in a situation where the owner is a nullptr. It is in the Dictionary::migrate() function.
OK, but does this mean that we could potentially crash? Because we use it unprotected in some places.
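One possible way to address the concern in this thread, sketched under assumptions: since the owner may legitimately be null in at least one situation, the constructor cannot assert on it, but every use can go through a checked accessor instead of dereferencing the raw pointer. `Owner`, `Tree`, `has_owner()`, and `owner()` are hypothetical names for illustration and do not reflect how ClusterTree is actually written.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the owning table.
struct Owner {
    std::string name;
};

class Tree {
public:
    // The owner may be null, e.g. for a temporary tree used during a migration.
    explicit Tree(Owner* owner) noexcept
        : m_owner(owner)
    {
    }

    bool has_owner() const noexcept
    {
        return m_owner != nullptr;
    }

    // Code paths that need the owner call this accessor, so a missing owner
    // produces a defined error rather than a null dereference.
    const Owner& owner() const
    {
        if (!m_owner)
            throw std::logic_error("Tree has no owner");
        return *m_owner;
    }

private:
    Owner* m_owner;
};
```

Whether an exception, an assertion, or an early return is the right reaction depends on the call site; the sketch only shows the idea of funnelling all access through one guarded point.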
@@ -128,17 +129,46 @@ class ClusterTree {
        m_root->remove_column(col);
    }

    // Create and return object
    Obj insert(ObjKey k, const FieldValues& values);
Why don't we return a cluster tree iterator?
Because what we need is an Obj.
Something has changed which causes the client reset tests around dictionaries to fail with an exception where before there was none. I think this is something in the test harness that needs to change.
93f8481 to 74cd25c
@ironage It turns out that the old implementation was somehow wrong. It should have crashed, as the test would advance the iterator past the end. And then some test depends on a value being inserted in a specific position.
@jedelbo good catch, thanks for fixing that!
Approved once the test failures and warnings reported by CI are fixed :)
3b5005d to 4b83999
What, How & Why?
Fixes #5764
☑️ ToDos