Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Je/dictionary reform #5780

Merged
merged 13 commits into from
Sep 19, 2022
Merged

Je/dictionary reform #5780

merged 13 commits into from
Sep 19, 2022

Conversation

jedelbo
Copy link
Contributor

@jedelbo jedelbo commented Aug 26, 2022

What, How & Why?

Fixes #5764

☑️ ToDos

  • 📝 Changelog update
  • 🚦 Tests (or not relevant)
  • C-API, if public C++ API changed.

@cla-bot cla-bot bot added the cla: yes label Aug 26, 2022
CHANGELOG.md Outdated Show resolved Hide resolved
@bmunkholm bmunkholm marked this pull request as draft September 7, 2022 11:50
@jedelbo jedelbo marked this pull request as ready for review September 7, 2022 11:55
Copy link
Contributor

@ironage ironage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice reduction in complexity!
What does the change in performance look like?

}
}
if (links.size())
cluster->remove_backlinks(cluster->get_real_key(i), col_key, links, state);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if a Mixed (or TypedLink above) links to the same table? I just noticed that case is not covered by get_owning_table()->links_to_self above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I will try to find an answer

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ironage isn't this piece of code a good candidate to be converted using the new class for links introduced here(#5796 (review))?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good eye, it is a similar pattern. However, it uses lower level constructs at the cluster level rather than our container accessors. I think that is for performance reasons, so I didn't convert it.

Mixed Dictionary::find_value(Mixed value) const noexcept
{
size_t ndx = update() ? m_values->find_first(value) : realm::npos;
return (ndx == realm::npos) ? Mixed{} : do_get_key(ndx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we allow null keys? I think this method should return a util::Optional<Mixed> instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assumption has so far been, that we would not allow null keys. (And today we only support string keys officially)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok thanks, I forgot we don't allow null keys

ClusterNode::State state = m_clusters->try_get(k);
if (!state) {
auto [ndx, actual_key] = find_impl(key);
if (actual_key != key) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handle the case where the dictionary is empty and the key to insert is null, then find_impl returns a null Mixed.

Suggested change
if (actual_key != key) {
if (ndx == m_keys.size() || actual_key != key) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stated above, we don't support null as key value. Current implementation only supports string and int as key types - and only strings officially.

@jedelbo
Copy link
Contributor Author

jedelbo commented Sep 12, 2022

@ironage I made this small test regarding performance (numbers are nanoseconds per element):

         |       old       |       new       |
         | insert | lookup | insert | lookup |
----------------------------------------------
10       |   2642 |    460 |   2100 |    317 |
100      |    736 |    216 |    541 |    242 |
1000     |    800 |    260 |    534 |    309 |
10000    |    852 |    298 |   1090 |    631 |

As you can see, when it comes to smaller dictionaries the new implementation is a bit faster for insertions and a bit slower for lookup. When the size grows, the old implementation wins in both cases. For very small dictionaries, the new implementation seems to be the fastest for both insertion and lookup.

@ironage
Copy link
Contributor

ironage commented Sep 12, 2022

Thanks for checking the performance, those numbers look good to me!

@@ -203,6 +203,10 @@ class BPlusTreeBase {
m_root->bp_set_parent(parent, ndx_in_parent);
}

virtual Mixed get_any(size_t) = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should/could be const. virtual Mixed get_any(size_t) const = 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@@ -892,17 +957,26 @@ void ClusterTree::insert_fast(ObjKey k, const FieldValues& init_values, ClusterN
m_size++;
}

ClusterNode::State ClusterTree::insert(ObjKey k, const FieldValues& init_values)
Obj ClusterTree::insert(ObjKey k, const FieldValues& init_values)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little counterintuitive, why do we return the Obj just inserted, or we construct one to signal that we have inserted? Do we need some reference to the object right after the insertion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used when we create objects. So the goal is actually to return an object just created.

@@ -914,13 +988,6 @@ bool ClusterTree::is_valid(ObjKey k) const noexcept
return m_root->try_get(k, state);
}

ClusterNode::State ClusterTree::get(ObjKey k) const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this either, how do we fetch an Obj? Only via insertion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you got the answer yourself further down,

}
}
if (links.size())
cluster->remove_backlinks(cluster->get_real_key(i), col_key, links, state);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ironage isn't this piece of code a good candidate to be converted using the new class for links introduced here(#5796 (review))?

Obj insert(ObjKey k, const FieldValues& values);

// Lookup and return object
Obj get(ObjKey k) const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, OK! This is inline in the header file now…

@@ -172,19 +198,29 @@ class ClusterTree {
friend class ClusterNodeInner;

Allocator& m_alloc;
Table* m_owner;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use this ptr everywhere, and we initialize it in the constructor of the cluster tree. Probably, asserting that the owner ptr passed in the ctor is not NULL could prevent to catch possible errors while using the ClusterTree

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be ideal. Unfortunately we still use a ClusterTree in a situation where the owner is a nullptr. It is in the Dictionary::migrate() function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but does this mean that we could potentially crash? Because we use it unprotected in some places..

@@ -128,17 +129,46 @@ class ClusterTree {
m_root->remove_column(col);
}

// Create and return object
Obj insert(ObjKey k, const FieldValues& values);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we return a cluster tree iterator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because what we need is an Obj.

@ironage
Copy link
Contributor

ironage commented Sep 14, 2022

something has changed which causes the client reset tests around dictionaries to fail with an exception where before there was none, I think this is something in the test harness than needs to change

@jedelbo
Copy link
Contributor Author

jedelbo commented Sep 15, 2022

@ironage It turns out that the old implementation was somehow wrong. It should have crashed as the test would advance the iterator past the end. And then some test is depending on a value being inserted in a specific position.

@ironage
Copy link
Contributor

ironage commented Sep 15, 2022

@jedelbo good catch, thanks for fixing that!

Copy link
Contributor

@ironage ironage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved once the test failures and warnings reported by CI are fixed :)

@jedelbo jedelbo merged commit 276f03f into next-major Sep 19, 2022
@jedelbo jedelbo deleted the je/dictionary-reform branch September 19, 2022 08:43
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants