Je/dictionary reform #5780

jedelbo · 2022-08-26T07:17:13Z

What, How & Why?

Fixes #5764

☑️ ToDos

📝 Changelog update
🚦 Tests (or not relevant)
C-API, if public C++ API changed.

CHANGELOG.md

ironage

nice reduction in complexity!
What does the change in performance look like?

ironage · 2022-09-07T16:45:49Z

src/realm/cluster_tree.cpp

+                                }
+                            }
+                            if (links.size())
+                                cluster->remove_backlinks(cluster->get_real_key(i), col_key, links, state);


What happens if a Mixed (or the TypedLink above) links to the same table? I just noticed that case is not covered by the get_owning_table()->links_to_self check above.

Good question. I will try to find an answer

@ironage isn't this piece of code a good candidate for conversion to the new class for links introduced here (#5796 (review))?

Yeah good eye, it is a similar pattern. However, it uses lower level constructs at the cluster level rather than our container accessors. I think that is for performance reasons, so I didn't convert it.
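The code under discussion collects the links held by one object into a local vector and calls remove_backlinks only when something was collected. The following minimal C++ sketch shows that "collect, then remove once" pattern. LinkTarget and remove_backlinks_batched are hypothetical stand-ins, not realm-core types. A null entry models a Mixed that holds no link. The sketch does not attempt to answer the self-link question raised above.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical stand-in for a link value: target table id + object key.
struct LinkTarget {
    std::uint32_t table;
    std::int64_t key;
};

// Gathers the non-null links of one object, then removes all of their
// backlinks with a single call (mirrors the "if (links.size())" guard).
// Returns the number of links handed to the remover.
inline std::size_t remove_backlinks_batched(
    const std::vector<std::optional<LinkTarget>>& values,
    const std::function<void(const std::vector<LinkTarget>&)>& remove_backlinks)
{
    std::vector<LinkTarget> links;
    links.reserve(values.size());
    for (const auto& v : values) {
        if (v) // skip null entries
            links.push_back(*v);
    }
    if (!links.empty()) // only touch target tables when there is work to do
        remove_backlinks(links);
    return links.size();
}
```

Batching like this means the target tables are visited once per object rather than once per link. As ironage notes, this is likely why the cluster-level code avoids the container accessors.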

ironage · 2022-09-07T17:21:52Z

src/realm/dictionary.cpp

+Mixed Dictionary::find_value(Mixed value) const noexcept
+{
+    size_t ndx = update() ? m_values->find_first(value) : realm::npos;
+    return (ndx == realm::npos) ? Mixed{} : do_get_key(ndx);


Don't we allow null keys? I think this method should return a util::Optional<Mixed> instead.

The assumption so far has been that we do not allow null keys (and today we officially support only string keys).

ok thanks, I forgot we don't allow null keys
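The following sketch contrasts the two conventions discussed in this thread. The first returns a null key as a "not found" sentinel, which is unambiguous only because null keys can never be stored. The second makes "not found" explicit with an optional. It is a minimal sketch on parallel key/value vectors. DictSketch and both method names are hypothetical, std::string stands in for Mixed (with the empty string playing the null key), and std::optional stands in for util::Optional.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical dictionary stored as two parallel arrays. Keys are never
// empty here, so "" can act as the null-key sentinel.
struct DictSketch {
    std::vector<std::string> keys;
    std::vector<std::int64_t> values;

    // Sentinel style: returns "" (the "null key") when value is absent.
    std::string find_value_sentinel(std::int64_t value) const
    {
        auto it = std::find(values.begin(), values.end(), value);
        if (it == values.end())
            return {};
        return keys[static_cast<std::size_t>(it - values.begin())];
    }

    // Optional style: absence is explicit, independent of which keys are legal.
    std::optional<std::string> find_value_opt(std::int64_t value) const
    {
        auto it = std::find(values.begin(), values.end(), value);
        if (it == values.end())
            return std::nullopt;
        return keys[static_cast<std::size_t>(it - values.begin())];
    }
};
```

The sentinel version stays correct only as long as the "no null keys" rule holds. If null keys were ever allowed, it would silently become ambiguous, while the optional version would not.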

ironage · 2022-09-07T17:38:41Z

src/realm/dictionary.cpp

-    ClusterNode::State state = m_clusters->try_get(k);
-    if (!state) {
+    auto [ndx, actual_key] = find_impl(key);
+    if (actual_key != key) {


Handle the case where the dictionary is empty and the key to insert is null: in that case find_impl returns a null Mixed.

Suggested change

if (actual_key != key) {

if (ndx == m_keys.size() || actual_key != key) {

As stated above, we don't support null as a key value. The current implementation supports only string and int as key types, and only strings officially.
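The following sketch shows why the extra ndx check is redundant when keys are never null. It assumes find_impl returns the position where the key is, or would be inserted, together with the key currently at that position. The sketch keeps keys sorted in a std::vector instead of a BPlusTree. The sorted layout, SortedDict and its methods are illustrative assumptions, not the PR's code, and the empty string again plays the role of a null Mixed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sorted-array dictionary (assumption: ascending key order).
class SortedDict {
public:
    // Position where key is or would be inserted, plus the key stored there
    // ("" when ndx == size(), i.e. past the end or the dictionary is empty).
    std::pair<std::size_t, std::string> find_impl(const std::string& key) const
    {
        auto it = std::lower_bound(m_keys.begin(), m_keys.end(), key);
        std::size_t ndx = static_cast<std::size_t>(it - m_keys.begin());
        return {ndx, it == m_keys.end() ? std::string{} : *it};
    }

    // Inserts or overwrites; returns true if a new entry was created.
    bool insert(const std::string& key, std::int64_t value)
    {
        assert(!key.empty() && "null keys are not supported");
        auto [ndx, actual_key] = find_impl(key);
        // key is never null, so a null actual_key (past the end) already
        // compares unequal: no separate ndx == size() test is required.
        if (actual_key != key) {
            m_keys.insert(m_keys.begin() + static_cast<std::ptrdiff_t>(ndx), key);
            m_values.insert(m_values.begin() + static_cast<std::ptrdiff_t>(ndx), value);
            return true;
        }
        m_values[ndx] = value;
        return false;
    }

    std::size_t size() const { return m_keys.size(); }
    const std::string& key_at(std::size_t ndx) const { return m_keys[ndx]; }
    std::int64_t value_at(std::size_t ndx) const { return m_values[ndx]; }

private:
    std::vector<std::string> m_keys;
    std::vector<std::int64_t> m_values;
};
```

Inserting into an empty SortedDict yields find_impl(...) == {0, ""}. Because the new key is non-empty, the comparison succeeds and the entry is inserted at position 0.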

jedelbo · 2022-09-12T12:08:01Z

@ironage I made this small test regarding performance. The first column is the number of elements in the dictionary; all other numbers are nanoseconds per element:

         |       old       |       new       |
  size   | insert | lookup | insert | lookup |
----------------------------------------------
10       |   2642 |    460 |   2100 |    317 |
100      |    736 |    216 |    541 |    242 |
1000     |    800 |    260 |    534 |    309 |
10000    |    852 |    298 |   1090 |    631 |

As you can see, for medium-sized dictionaries (100 and 1000 elements) the new implementation is faster for insertion and somewhat slower for lookup. At 10000 elements the old implementation wins in both cases. For very small dictionaries (10 elements), the new implementation is the fastest for both insertion and lookup.
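Numbers like these are usually obtained by timing n operations and dividing by n. The following sketch shows one way to compute nanoseconds per element with std::chrono. It is an assumption about the methodology, not the harness used above. It benchmarks std::map as a stand-in for a dictionary, and measure_per_element is a hypothetical name.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

struct PerElement {
    double insert_ns;
    double lookup_ns;
};

// Times n insertions, then n lookups, and returns average ns per element.
// Note: key construction (std::to_string) is included in both timings.
inline PerElement measure_per_element(std::size_t n)
{
    if (n == 0)
        return {0.0, 0.0};
    using clock = std::chrono::steady_clock;
    std::map<std::string, std::int64_t> dict;

    auto t0 = clock::now();
    for (std::size_t i = 0; i < n; ++i)
        dict.emplace("key" + std::to_string(i), static_cast<std::int64_t>(i));
    auto t1 = clock::now();

    std::int64_t checksum = 0; // consume results so lookups are not optimized away
    for (std::size_t i = 0; i < n; ++i)
        checksum += dict.find("key" + std::to_string(i))->second;
    auto t2 = clock::now();

    volatile std::int64_t sink = checksum;
    (void)sink;

    auto ns = [](clock::duration d) {
        return static_cast<double>(std::chrono::duration_cast<std::chrono::nanoseconds>(d).count());
    };
    return {ns(t1 - t0) / static_cast<double>(n), ns(t2 - t1) / static_cast<double>(n)};
}
```

Per-element costs for very small sizes (such as 10) are dominated by fixed overhead. This is one plausible reason the first row of the table above is much higher than the others.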

ironage · 2022-09-12T15:20:27Z

Thanks for checking the performance, those numbers look good to me!

nicola-cab · 2022-09-12T17:19:41Z

src/realm/bplustree.hpp

@@ -203,6 +203,10 @@ class BPlusTreeBase {
            m_root->bp_set_parent(parent, ndx_in_parent);
    }

+    virtual Mixed get_any(size_t) = 0;


This could (and should) be const: virtual Mixed get_any(size_t) const = 0;
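Const-qualifying the pure virtual is what allows read-only code to call it through a const reference or pointer. The following minimal sketch illustrates this. TreeBase, IntTree and first_as_string are hypothetical, and std::string stands in for Mixed. Every override must carry the same const qualifier; the override keyword turns a mismatch into a compile error.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

class TreeBase {
public:
    virtual ~TreeBase() = default;
    // const: reading an element does not modify the tree.
    virtual std::string get_any(std::size_t ndx) const = 0;
};

class IntTree : public TreeBase {
public:
    explicit IntTree(std::vector<std::int64_t> v) : m_values(std::move(v)) {}
    std::string get_any(std::size_t ndx) const override
    {
        return std::to_string(m_values[ndx]);
    }

private:
    std::vector<std::int64_t> m_values;
};

// Compiles only because get_any is const; with a non-const declaration,
// calling it through a const TreeBase& would be rejected.
inline std::string first_as_string(const TreeBase& tree)
{
    return tree.get_any(0);
}
```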

nicola-cab · 2022-09-12T17:24:02Z

src/realm/cluster_tree.cpp

@@ -892,17 +957,26 @@ void ClusterTree::insert_fast(ObjKey k, const FieldValues& init_values, ClusterN
    m_size++;
 }

-ClusterNode::State ClusterTree::insert(ObjKey k, const FieldValues& init_values)
+Obj ClusterTree::insert(ObjKey k, const FieldValues& init_values)


This is a little counterintuitive: why do we return the Obj we just inserted? Do we construct one just to signal that the insertion happened, or do we need a reference to the object right after the insertion?

This is used when we create objects, so the goal is indeed to return the object that was just created.
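Returning an accessor from insert saves callers a second lookup when they create an object and then immediately set its fields. The following minimal sketch shows that design. ObjSketch is a lightweight handle holding a back-pointer to its tree plus the object's key. All names are hypothetical, and a std::map stands in for the cluster tree.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

class TreeSketch;

// Lightweight accessor: a back-pointer plus the object's key.
class ObjSketch {
public:
    ObjSketch(TreeSketch* tree, std::int64_t key) : m_tree(tree), m_key(key) {}
    std::int64_t key() const { return m_key; }
    void set_name(const std::string& name);
    const std::string& get_name() const;

private:
    TreeSketch* m_tree;
    std::int64_t m_key;
};

class TreeSketch {
public:
    // Create and return object: the caller can use it right away.
    ObjSketch insert(std::int64_t key, const std::string& name)
    {
        if (!m_rows.emplace(key, name).second)
            throw std::invalid_argument("key already exists");
        return ObjSketch(this, key);
    }

    // Lookup and return object.
    ObjSketch get(std::int64_t key)
    {
        if (m_rows.count(key) == 0)
            throw std::out_of_range("no such key");
        return ObjSketch(this, key);
    }

private:
    friend class ObjSketch;
    std::map<std::int64_t, std::string> m_rows;
};

inline void ObjSketch::set_name(const std::string& name) { m_tree->m_rows.at(m_key) = name; }
inline const std::string& ObjSketch::get_name() const { return m_tree->m_rows.at(m_key); }
```

A tree iterator would also locate the object. However, callers creating objects want something they can read and write fields through, which is the reason given in the thread for returning an Obj.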

nicola-cab · 2022-09-12T17:25:13Z

src/realm/cluster_tree.cpp

@@ -914,13 +988,6 @@ bool ClusterTree::is_valid(ObjKey k) const noexcept
    return m_root->try_get(k, state);
 }

-ClusterNode::State ClusterTree::get(ObjKey k) const


I don't understand this either: how do we fetch an Obj? Only via insertion?

I think you found the answer yourself further down.

nicola-cab · 2022-09-12T17:27:16Z

src/realm/cluster_tree.cpp

+                                }
+                            }
+                            if (links.size())
+                                cluster->remove_backlinks(cluster->get_real_key(i), col_key, links, state);


@ironage isn't this piece of code a good candidate for conversion to the new class for links introduced here (#5796 (review))?

nicola-cab · 2022-09-12T17:28:28Z

src/realm/cluster_tree.hpp

+    Obj insert(ObjKey k, const FieldValues& values);
+
+    // Lookup and return object
+    Obj get(ObjKey k) const


Ah, OK! This is inline in the header file now…

nicola-cab · 2022-09-12T17:32:53Z

src/realm/cluster_tree.hpp

@@ -172,19 +198,29 @@ class ClusterTree {
    friend class ClusterNodeInner;

    Allocator& m_alloc;
+    Table* m_owner;


We use this pointer everywhere, and we initialize it in the constructor of the cluster tree. Asserting in the constructor that the owner pointer passed in is not null could help catch possible errors when using the ClusterTree.

That would be ideal. Unfortunately we still use a ClusterTree in a situation where the owner is a nullptr. It is in the Dictionary::migrate() function.

OK, but does this mean that we could potentially crash? We use it unprotected in some places.
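One way to reconcile an owner assertion with the ownerless migration case is to make that case an explicit opt-in at construction time. Owner-dependent methods then fail loudly instead of dereferencing null. The following is a minimal sketch under that assumption. TableSketch, OwnedTree and the ForMigration tag are hypothetical and not part of the PR. The assert compiles out when NDEBUG is defined, so the runtime check in owner_name() is what guards release builds.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

struct TableSketch {
    std::string name;
};

class OwnedTree {
public:
    // Normal construction: an owner is mandatory.
    explicit OwnedTree(TableSketch* owner) : m_owner(owner)
    {
        assert(owner != nullptr && "tree requires an owner");
    }

    // Explicit opt-in for the one ownerless use (e.g. a migration step).
    struct ForMigration {};
    explicit OwnedTree(ForMigration) : m_owner(nullptr) {}

    bool has_owner() const { return m_owner != nullptr; }

    // Owner-dependent operation: throws instead of dereferencing null.
    const std::string& owner_name() const
    {
        if (!m_owner)
            throw std::logic_error("tree has no owner");
        return m_owner->name;
    }

private:
    TableSketch* m_owner;
};
```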

nicola-cab · 2022-09-12T17:33:29Z

src/realm/cluster_tree.hpp

@@ -128,17 +129,46 @@ class ClusterTree {
        m_root->remove_column(col);
    }

+    // Create and return object
+    Obj insert(ObjKey k, const FieldValues& values);


Why don't we return a cluster tree iterator?

Because what we need is an Obj.

ironage · 2022-09-14T17:44:59Z

Something has changed that causes the client reset tests around dictionaries to fail with an exception where before there was none. I think this is something in the test harness that needs to change.

jedelbo · 2022-09-15T11:02:03Z

@ironage It turns out that the old implementation was wrong: the test should have crashed, because it advances the iterator past the end. On top of that, one test depends on a value being inserted at a specific position.
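In C++, advancing an iterator past end() is undefined behaviour. A test that does so can appear to pass with one container implementation and fail with another. The following sketch shows a bounded advance helper that never moves past end(). safe_advance is a hypothetical helper, and std::map stands in for the dictionary.

```cpp
#include <cstddef>
#include <iterator>
#include <map>
#include <string>

// Advances it by at most n steps without ever moving past end.
// Returns the number of steps actually taken.
template <class It>
std::size_t safe_advance(It& it, It end, std::size_t n)
{
    std::size_t steps = 0;
    while (steps < n && it != end) {
        ++it;
        ++steps;
    }
    return steps;
}
```

A test can then assert on the returned step count instead of relying on whatever the iterator happens to point at after overrunning the container.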

ironage · 2022-09-15T18:33:52Z

@jedelbo good catch, thanks for fixing that!

ironage

Approved once the test failures and warnings reported by CI are fixed :)

jedelbo added 3 commits August 25, 2022 15:59

Dictionary implementation based on BPlusTrees instead of clusters

88bd62e

Add migration function for Dictionary

48ec445

Merge TableClusterTree into ClusterTree

5f00ebf

cla-bot bot added the cla: yes label Aug 26, 2022

bmunkholm reviewed Aug 26, 2022


CHANGELOG.md (outdated, resolved)

Release note updated

9279861

bmunkholm marked this pull request as draft September 7, 2022 11:50

jedelbo requested review from nicola-cab, ironage and finnschiermer September 7, 2022 11:53

jedelbo marked this pull request as ready for review September 7, 2022 11:55

ironage reviewed Sep 7, 2022


Update after review

ac572e7

nicola-cab reviewed Sep 12, 2022


jedelbo added 2 commits September 13, 2022 15:36

Update after review

914b6c3

Format changed Json files

6bd96b4

Fix tests

74cd25c

jedelbo force-pushed the je/dictionary-reform branch from 93f8481 to 74cd25c September 15, 2022 10:57

ironage approved these changes Sep 15, 2022


tgoyne mentioned this pull request Sep 15, 2022

Replace the typed aggregate functions with simpler untyped ones #5864

Merged

jedelbo added 2 commits September 16, 2022 09:31

Fix compilation

542c9ff

Fix test

4b83999

jedelbo force-pushed the je/dictionary-reform branch from 3b5005d to 4b83999 September 16, 2022 07:59

jedelbo added 2 commits September 16, 2022 11:06

Merge branch 'next-major' into je/dictionary-reform

5680602

Update Package.swift

128dbb1

Fix Dictionary::init_from_parent

82839f3

jedelbo merged commit 276f03f into next-major Sep 19, 2022

jedelbo deleted the je/dictionary-reform branch September 19, 2022 08:43

tgoyne mentioned this pull request Sep 19, 2022

Greatly improve performance of sorting dictionaries #5168

Merged

github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024
