Refactor StringIndex interface #6787

Merged
merged 4 commits into from
Aug 25, 2023


ironage
Contributor

@ironage ironage commented Jul 13, 2023

A first step towards eventually having different index types.
The basic interface is cleaned up and extracted into a pure virtual SearchIndex class.
StringIndex had several templated methods, but they all just converted their argument to a Mixed value anyway, so I was able to tidy them up and simplify the API as well.
Going through a virtual method did cause some slowdown (~10% in our benchmarks) when adding an index to an existing column, but I was able to optimize the bulk load to use the cluster directly, which resulted in a small (~5%) improvement in performance for this case.
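A minimal sketch of the interface shape described above, under stated assumptions: `Value` is a simplified `std::variant` standing in for realm's `Mixed`, `MapIndex` is a hypothetical toy index backed by `std::multimap` rather than the real on-disk `StringIndex`, and the method names (`insert`, `erase`, `find_first`, `count`, `insert_bulk`) are illustrative, not the actual SearchIndex API. It shows how one type-erased value parameter can replace a family of templated overloads, and how a batch entry point lets a bulk load pay for one virtual call instead of one per row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins, not the real realm types.
using ObjKey = std::int64_t;
using Value = std::variant<std::monostate, std::int64_t, std::string>; // monostate == null

// Pure virtual interface: every operation takes the type-erased Value,
// so no per-type template overloads are needed.
class SearchIndex {
public:
    virtual ~SearchIndex() = default;
    virtual void insert(ObjKey key, const Value& value) = 0;
    virtual void erase(ObjKey key, const Value& value) = 0;
    virtual std::optional<ObjKey> find_first(const Value& value) const = 0;
    virtual std::size_t count(const Value& value) const = 0;

    // One virtual call for a whole batch. The default falls back to
    // per-row insert; a concrete index may override it with a faster path.
    virtual void insert_bulk(const std::vector<std::pair<ObjKey, Value>>& rows)
    {
        for (const auto& [key, value] : rows)
            insert(key, value);
    }
};

// Toy implementation; equal values keep their insertion order in the multimap.
class MapIndex final : public SearchIndex {
public:
    void insert(ObjKey key, const Value& value) override { m_map.emplace(value, key); }

    void erase(ObjKey key, const Value& value) override
    {
        auto [first, last] = m_map.equal_range(value);
        for (auto it = first; it != last; ++it) {
            if (it->second == key) {
                m_map.erase(it);
                return;
            }
        }
    }

    std::optional<ObjKey> find_first(const Value& value) const override
    {
        auto it = m_map.lower_bound(value); // first entry with this value, if any
        if (it == m_map.end() || it->first != value)
            return std::nullopt;
        return it->second;
    }

    std::size_t count(const Value& value) const override { return m_map.count(value); }

private:
    std::multimap<Value, ObjKey> m_map;
};
```

Calling through a `SearchIndex&` adds an indirect call per operation, which is the kind of overhead the description attributes the slowdown to; overriding `insert_bulk` in a concrete index is one plausible way to amortize it.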

@ironage ironage self-assigned this Jul 13, 2023
@cla-bot cla-bot bot added the cla: yes label Jul 13, 2023
@ironage
Contributor Author

ironage commented Jul 14, 2023

Here are some benchmark results. The main takeaway is a small (~10%) improvement when creating an index on an existing dataset, and no significant change otherwise. Negative percentages mean the new code took less time.

Req runs:   10  CreateIndex (Full   , EncryptionOff):         min  42.59ms (-8.90%)            max  43.06ms (-9.11%)            med  42.71ms (-9.21%)            avg  42.76ms (-9.11%)            stddev   174us (-29.70%)
Req runs:    9  CreateIndex (Full   , EncryptionOn):          min  42.99ms (-13.32%)           max  43.41ms (-20.80%)           med  43.05ms (-13.79%)           avg  43.08ms (-15.67%)           stddev   131us (-93.86%)

Req runs:   10  CreateIndex (MemOnly, EncryptionOff):         min  42.54ms (-8.74%)            max  42.77ms (-10.18%)           med  42.58ms (-8.89%)            avg  42.62ms (-9.02%)            stddev    82us (-73.09%)

Req runs: 1000  FindAllStringFewDupes (MemOnly, EncryptionOff):     min      0us (+0.00%)            max      1us (-10.00%)           med      0us (+0.00%)            avg      0us (+2.22%)            stddev     0us (+1.37%)

Req runs: 1000  FindAllStringManyDupes (MemOnly, EncryptionOff):     min     51us (+3.19%)            max     78us (+8.31%)            med     53us (+6.36%)            avg     53us (+4.30%)            stddev     2us (-21.69%)

Req runs:  228  FindAllFulltextStringManyDupes (MemOnly, EncryptionOff):     min   2.15ms (-0.70%)            max   2.24ms (-7.06%)            med   2.19ms (-0.76%)            avg   2.19ms (-0.65%)            stddev    29us (-25.23%)

Req runs: 1000  FindFirstStringFewDupes (MemOnly, EncryptionOff):            min     82us (+1.50%)            max    100us (+0.50%)            med     82us (+1.45%)            avg     82us (+0.14%)            stddev     1us (-52.13%)

Req runs: 1000  FindFirstStringManyDupes (MemOnly, EncryptionOff):           min     15us (-1.39%)            max     55us (-51.60%)           med     16us (-0.53%)            avg     16us (-1.41%)            stddev     1us (-59.44%)

Req runs:   72  CountStringManyDupesNonIndexed (MemOnly, EncryptionOff):     min   6.87ms (+0.06%)            max   7.37ms (+3.87%)            med   6.89ms (+0.06%)            avg   6.91ms (+0.27%)            stddev    95us (+101.12%)

Req runs: 1000  CountStringManyDupesIndexed (MemOnly, EncryptionOff):        min     15us (+0.58%)            max     56us (+119.90%)          med     15us (+3.12%)            avg     15us (+3.69%)            stddev     2us (+357.67%)

Req runs:  180  QueryInsensitiveStringIndexed (MemOnly, EncryptionOff):      min   1.08ms (+0.99%)            max  49.56ms (-0.50%)            med   2.21ms (+0.47%)            avg   2.96ms (+0.17%)            stddev  6.08ms (-0.26%)

Req runs:  369  QueryChainedOrStringsIndexed (MemOnly, EncryptionOff):       min   1.31ms (+0.96%)            max   1.49ms (-0.74%)            med   1.33ms (+1.01%)            avg   1.34ms (+0.61%)            stddev    38us (-12.92%)

Req runs:   43  QueryNotChainedOrStringsIndexed (MemOnly, EncryptionOff):     min  11.36ms (-0.15%)            max  11.68ms (+0.08%)            med  11.39ms (-0.26%)            avg  11.41ms (-0.18%)            stddev    63us (+11.80%)

Req runs:   28  QueryChainedOrIntsIndexed (MemOnly, EncryptionOff):           min  17.46ms (-2.04%)            max  19.25ms (+5.64%)            med  17.53ms (-2.29%)            avg  17.77ms (-0.98%)            stddev   571us (+580.96%)

Req runs:  849  QueryIntEqualityIndexed (MemOnly, EncryptionOff):             min    584us (+0.22%)            max    666us (+3.62%)            med    587us (+0.29%)            avg    589us (+0.19%)            stddev     7us (+2.76%)

Req runs:   59  QueryEqual<mixed><NonNullable><Indexed> (MemOnly, EncryptionOff):     min   8.37ms (+0.16%)            max   9.09ms (+5.08%)            med   8.39ms (+0.18%)            avg   8.46ms (+0.80%)            stddev   179us (+229.66%)

Req runs: 1000  QueryEqual<uuid><NonNullable><Indexed> (MemOnly, EncryptionOff):      min     48us (+0.17%)            max     64us (-16.66%)           med     49us (+0.26%)            avg     49us (-1.27%)            stddev     1us (-48.57%)

Req runs:   26  QueryEqual<objectId><NonNullable><Indexed> (MemOnly, EncryptionOff):     min  18.60ms (+0.04%)            max     19ms (-0.10%)            med  18.63ms (+0.02%)            avg  18.66ms (-0.05%)            stddev    85us (-17.84%)

Req runs: 1000  QueryEqual<timestamp><NonNullable><Indexed> (MemOnly, EncryptionOff):     min     53us (+3.43%)            max     89us (+25.72%)         * med     55us (+7.27%)          * avg     55us (+6.60%)            stddev     2us (+47.85%)

Req runs:    5  QueryEqual<bool><NonNullable><Indexed> (MemOnly, EncryptionOff):          min 153.67ms (+0.11%)            max 154.47ms (+0.25%)          * med 154.23ms (+0.43%)            avg 154.20ms (+0.29%)            stddev   315us (+6.67%)

Req runs:   59  QueryInsensitiveEqual<mixed><NonNullable><Indexed> (MemOnly, EncryptionOff):     min   8.37ms (+0.11%)            max   8.61ms (+0.80%)            med   8.39ms (+0.15%)            avg   8.39ms (+0.08%)            stddev    32us (-5.93%)

Req runs: 1000  QueryInsensitiveEqual<string><NonNullable><Indexed> (MemOnly, EncryptionOff):     min     65us (+0.71%)            max     85us (+2.36%)            med     66us (-0.06%)            avg     67us (-0.13%)            stddev     1us (-13.80%)

Contributor

@finnschiermer finnschiermer left a comment

LGTM

Contributor

@jedelbo jedelbo left a comment

I feel this is a bit of a half-baked cake, but as you say, this is just the first step. However, to verify that SearchIndex has a usable interface, we should see it in use. I guess that eventually Table::m_index_accessors will be a vector of SearchIndex, so why can't we make this change now? This will also expose that you have probably taken some shortcuts with the bulk insert :-)
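The reviewer's suggestion could look roughly like the sketch below: a per-column vector of owning `SearchIndex` pointers, with `nullptr` marking unindexed columns, so the table only ever talks to the abstract interface. Everything here is a hypothetical simplification (`Table`, `add_search_index`, `get_search_index`, `set_string`, `ToyStringIndex`, and string-only values); the real `Table::m_index_accessors` member type and the surrounding Table API are not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, trimmed-down interface; values are plain strings here.
class SearchIndex {
public:
    virtual ~SearchIndex() = default;
    virtual void insert(std::int64_t key, const std::string& value) = 0;
    virtual std::size_t count(const std::string& value) const = 0;
};

// Toy index; any other SearchIndex subclass could be plugged in the same way.
class ToyStringIndex final : public SearchIndex {
public:
    void insert(std::int64_t key, const std::string& value) override { m_entries.emplace(value, key); }
    std::size_t count(const std::string& value) const override { return m_entries.count(value); }

private:
    std::multimap<std::string, std::int64_t> m_entries;
};

class Table {
public:
    explicit Table(std::size_t num_columns)
        : m_index_accessors(num_columns) // every slot starts as nullptr (no index)
    {
    }

    void add_search_index(std::size_t col, std::unique_ptr<SearchIndex> index)
    {
        m_index_accessors.at(col) = std::move(index);
    }

    SearchIndex* get_search_index(std::size_t col) const { return m_index_accessors.at(col).get(); }

    // Storing the value itself is omitted; only index maintenance is shown.
    void set_string(std::size_t col, std::int64_t key, const std::string& value)
    {
        if (SearchIndex* index = get_search_index(col))
            index->insert(key, value); // virtual dispatch to the concrete index type
    }

private:
    std::vector<std::unique_ptr<SearchIndex>> m_index_accessors;
};
```

Because `Table` only sees `SearchIndex`, any shortcut that reaches into a concrete index type (as the reviewer suspects for the bulk insert) would fail to compile against this shape, which is the point of making the change early.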

@@ -1544,8 +1540,12 @@ void StringIndex::set<StringData>(ObjKey key, StringData new_value)
tokenizer->reset({old_string.data(), old_string.size()});
old_words = tokenizer->get_all_tokens();
}
StringData str_value;
if (new_value.is_type(type_String)) {
Contributor

Same as above


if (this->m_target_column.is_fulltext()) {
auto words = Tokenizer::get_instance()->reset(std::string_view(value)).get_all_tokens();
StringData str_value;
if (value.is_type(type_String)) {
Contributor

I think this should be an assertion. We can only have fulltext index on a string column.

Contributor Author

True, but this logic also handles inserting null strings into the index. Is this okay with that case in mind?

Contributor

Fair enough - but why not put all the code inside the "true" block?

Contributor Author

Ah, I see your point now. I'll update it.
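A sketch of the restructuring agreed in this thread, under simplified assumptions: `Value` stands in for `Mixed`, `tokenize` is a whitespace splitter standing in for realm's `Tokenizer`, and `index_keys` is a hypothetical helper returning the keys one value contributes to the index. All fulltext handling sits inside the `if (is_fulltext)` branch, and the assertion encodes that a fulltext index only exists on string columns. Treating a null string as contributing no words is an assumption of this sketch, not something stated in the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for Mixed: null, integer, or string.
using Value = std::variant<std::monostate, std::int64_t, std::string>;

// Whitespace tokenizer standing in for realm's Tokenizer.
std::vector<std::string> tokenize(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> words;
    for (std::string word; in >> word;)
        words.push_back(word);
    return words;
}

// Returns the keys that one value contributes to the index.
std::vector<Value> index_keys(bool is_fulltext, const Value& value)
{
    if (is_fulltext) {
        // Everything fulltext-specific stays in this branch.
        // A fulltext index only exists on string columns, so the value
        // must be a string or null.
        assert(!std::holds_alternative<std::int64_t>(value));
        std::vector<Value> words;
        if (const auto* str = std::get_if<std::string>(&value)) {
            for (std::string& word : tokenize(*str))
                words.emplace_back(std::move(word));
        }
        // Assumption for this sketch: a null string contributes no words.
        return words;
    }
    // A regular index stores the whole value, null included, as one key.
    return {value};
}
```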

@@ -479,6 +479,9 @@ T Obj::get(ColKey col_key) const
return _get<T>(col_key.get_index());
}

template UUID Obj::_get(ColKey::Idx col_ndx) const;
template util::Optional<UUID> Obj::_get(ColKey::Idx col_ndx) const;
Contributor

What difference does this make?

Contributor Author

At some point I had a linker error saying that these symbols were undefined. I'm not able to reproduce this anymore but I don't think it harms anything to explicitly tell the compiler to define them here.
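For context on this exchange: when a member template is defined in a `.cpp` file, other translation units see only its declaration, so their calls link only if the defining translation unit emits the needed specializations. It does so implicitly when it happens to use them, or unconditionally through an explicit instantiation definition like the two `template ... Obj::_get(...)` lines above. A self-contained sketch with a hypothetical `Record` class (in a real project the declaration would live in a header and the definition in a `.cpp`):

```cpp
#include <cassert>
#include <string>

// "Header" part: only the declaration of the member template is visible
// to other translation units.
class Record {
public:
    template <class T>
    T get() const;
};

// ".cpp" part: the definition lives here, so other translation units
// cannot instantiate it themselves.
template <class T>
T Record::get() const
{
    return T{}; // placeholder body: a value-initialized T
}

// Explicit instantiation definitions: force these specializations to be
// emitted in this translation unit so callers elsewhere can link to them.
template int Record::get<int>() const;
template std::string Record::get<std::string>() const;
```

If other code in the same `.cpp` file already uses those specializations, the explicit lines are redundant but harmless, which would be consistent with the linker error no longer reproducing.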

src/realm/table.cpp Outdated
src/realm/index_string.hpp Outdated
@ironage
Contributor Author

ironage commented Aug 23, 2023

Updated benchmarks with the SearchIndex as a virtual parent:

Req runs:   11  CreateIndex (Full   , EncryptionOff):         min     43ms (-7.64%)            max  48.41ms (-1.15%)            med  43.07ms (-7.83%)            avg  43.79ms (-7.10%)            stddev  1.62ms (+75.41%)
Req runs:   10  CreateIndex (Full   , EncryptionOn):          min  43.06ms (-12.61%)           max  43.27ms (-15.88%)           med  43.13ms (-12.71%)           avg  43.14ms (-13.29%)           stddev    67us (-90.56%)

Req runs:   11  CreateIndex (MemOnly, EncryptionOff):         min  42.75ms (-8.07%)            max  43.26ms (-7.95%)            med  42.80ms (-8.07%)            avg  42.85ms (-8.08%)            stddev   152us (-2.63%)

Req runs:  138  GetLongString (MemOnly, EncryptionOff):       min   3.60ms (+0.07%)            max   3.68ms (-4.90%)          * med   3.67ms (+1.77%)            avg   3.66ms (+1.16%)            stddev    27us (+3.04%)

Req runs:   22  SetLongString (MemOnly, EncryptionOff):       min  22.02ms (-0.28%)            max  22.44ms (+0.99%)            med  22.04ms (-0.41%)            avg  22.07ms (-0.28%)            stddev    88us (+148.61%)

Req runs: 1000  FindAllStringFewDupes (MemOnly, EncryptionOff):     min      0us (+12.50%)           max      1us (-19.05%)           med      0us (+10.00%)           avg      0us (+2.38%)            stddev     0us (-25.32%)

Req runs: 1000  FindAllStringManyDupes (MemOnly, EncryptionOff):     min     50us (+0.17%)            max     63us (-10.87%)           med     51us (+0.17%)            avg     51us (+0.77%)            stddev     1us (+17.15%)

Req runs:  229  FindAllFulltextStringManyDupes (MemOnly, EncryptionOff):     min   2.18ms (+0.96%)            max   2.34ms (+1.59%)            med   2.18ms (+0.99%)            avg   2.18ms (+0.95%)            stddev    17us (+22.32%)

Req runs: 1000  FindFirstStringFewDupes (MemOnly, EncryptionOff):            min     77us (-0.22%)            max     83us (-11.83%)           med     77us (-0.32%)            avg     78us (-0.27%)            stddev     1us (-24.56%)

Req runs: 1000  FindFirstStringManyDupes (MemOnly, EncryptionOff):           min     15us (+0.28%)            max     16us (-15.77%)           med     15us (-0.28%)            avg     15us (-0.21%)            stddev     0us (-33.48%)

Req runs:   72  CountStringManyDupesNonIndexed (MemOnly, EncryptionOff):     min   6.86ms (+0.02%)            max   7.08ms (+0.48%)            med   6.88ms (+0.03%)            avg   6.90ms (+0.11%)            stddev    42us (+34.65%)

Req runs: 1000  CountStringManyDupesIndexed (MemOnly, EncryptionOff):        min     14us (-0.00%)            max     16us (-23.15%)           med     15us (+0.29%)            avg     15us (+0.06%)            stddev     0us (-54.92%)

Req runs:  670  Query (MemOnly, EncryptionOff):                            * min    743us (+0.55%)            max    753us (-13.93%)           med    744us (+0.31%)            avg    745us (+0.33%)            stddev     2us (-80.28%)

Req runs:   83  QueryNot (MemOnly, EncryptionOff):                           min   5.94ms (+1.13%)            max   6.11ms (-1.92%)            med   5.94ms (+0.94%)            avg   5.96ms (+1.04%)            stddev    36us (-21.36%)

Req runs: 1000  QueryLongString (MemOnly, EncryptionOff):                    min    121us (+1.51%)            max    128us (-11.09%)           med    122us (+1.24%)            avg    122us (+1.10%)            stddev     1us (-34.90%)

Req runs:    5  QueryInsensitiveString (MemOnly, EncryptionOff):             min    1.04s (+0.78%)            max    1.06s (+2.13%)            med    1.04s (+1.01%)            avg    1.04s (+1.21%)            stddev  7.21ms (+221.57%)

Req runs:  179  QueryInsensitiveStringIndexed (MemOnly, EncryptionOff):      min   1.07ms (-0.45%)            max  50.77ms (+1.57%)            med   2.21ms (-0.51%)            avg   2.99ms (+0.67%)            stddev  6.22ms (+1.81%)

Req runs:  150  QueryChainedOrStrings (MemOnly, EncryptionOff):            * min   3.32ms (+1.79%)            max   3.51ms (-2.91%)            med   3.33ms (-1.19%)            avg   3.34ms (-1.25%)            stddev    28us (-55.59%)

Req runs:  375  QueryChainedOrStringsIndexed (MemOnly, EncryptionOff):     * min   1.31ms (+2.01%)            max   1.36ms (-15.03%)         * med   1.33ms (+1.54%)            avg   1.33ms (+1.22%)            stddev     9us (-49.26%)

Req runs:   43  QueryNotChainedOrStrings (MemOnly, EncryptionOff):           min  11.28ms (+0.86%)            max  11.68ms (-0.30%)            med  11.34ms (+0.89%)            avg  11.38ms (+0.78%)            stddev   100us (-31.97%)

Req runs:   44  QueryNotChainedOrStringsIndexed (MemOnly, EncryptionOff):     min  11.29ms (+0.96%)            max  11.70ms (-0.81%)            med  11.33ms (+0.20%)            avg  11.41ms (+0.76%)            stddev   145us (+22.27%)

Req runs:  558  QueryChainedOrInts (MemOnly, EncryptionOff):                  min    888us (+0.04%)            max    962us (-1.19%)            med    893us (-0.22%)            avg    895us (-0.23%)            stddev     8us (-10.97%)

Req runs:   28  QueryChainedOrIntsIndexed (MemOnly, EncryptionOff):           min  17.45ms (-2.80%)            max  18.51ms (+1.95%)            med  17.50ms (-2.85%)            avg  17.55ms (-2.61%)            stddev   202us (+275.85%)

Req runs:    5  QueryIntEquality (MemOnly, EncryptionOff):                    min  96.42ms (-0.66%)            max  96.75ms (-0.60%)            med  96.56ms (-0.70%)            avg  96.58ms (-0.64%)            stddev   143us (+15.84%)

Req runs:  829  QueryIntEqualityIndexed (MemOnly, EncryptionOff):           * min    597us (+1.82%)            max    661us (+0.62%)          * med    600us (+1.70%)            avg    601us (+1.21%)            stddev     4us (-60.23%)

Req runs: 1000  QueryIntsVsDoubleColumns (MemOnly, EncryptionOff):            min      0us (-20.00%)           max      1us (-96.80%)           med      0us (+0.00%)            avg      0us (-2.92%)            stddev     0us (-91.92%)

Req runs: 1000  QueryStringOverLinks (MemOnly, EncryptionOff):                min    205us (+0.43%)            max    257us (+1.56%)          * med    217us (+6.28%)          * avg    218us (+5.79%)            stddev     3us (-43.55%)

Req runs:   22  SubqueryStrings (MemOnly, EncryptionOff):                   * min  22.53ms (+1.59%)            max  22.73ms (+1.39%)          * med  22.59ms (+1.45%)          * avg  22.60ms (+1.43%)            stddev    60us (-7.03%)

Req runs:   58  QueryEqual<mixed><NonNullable><Indexed> (MemOnly, EncryptionOff):     min   8.45ms (-0.06%)            max   8.62ms (-0.24%)            med   8.47ms (-0.13%)            avg   8.48ms (-0.30%)            stddev    27us (-39.96%)

Req runs:    5  QueryEqual<mixed><NonNullable><NonIndexed> (MemOnly, EncryptionOff):   * min  97.41ms (+0.50%)            max  97.91ms (+0.70%)            med  97.51ms (+0.40%)          * avg  97.58ms (+0.50%)            stddev   203us (+65.65%)

Req runs: 1000  QueryEqual<uuid><NonNullable><Indexed> (MemOnly, EncryptionOff):         min     49us (+1.22%)            max     57us (-13.22%)           med     49us (+1.11%)            avg     49us (+0.94%)            stddev     1us (-33.17%)

Req runs:   40  QueryEqual<uuid><NonNullable><NonIndexed> (MemOnly, EncryptionOff):      min  12.21ms (-0.33%)            max  12.49ms (+0.45%)            med  12.24ms (-0.77%)            avg  12.25ms (-0.71%)            stddev    43us (-13.51%)

Req runs:   26  QueryEqual<objectId><NonNullable><Indexed> (MemOnly, EncryptionOff):     min  18.80ms (-0.51%)            max  19.06ms (-1.26%)            med  18.87ms (-0.62%)            avg  18.88ms (-0.65%)            stddev    60us (-25.86%)

Req runs:   13  QueryEqual<objectId><NonNullable><NonIndexed> (MemOnly, EncryptionOff):     min  37.20ms (+0.07%)            max  37.53ms (-0.14%)            med  37.26ms (-0.43%)            avg  37.31ms (-0.20%)            stddev   110us (-16.68%)

Req runs: 1000  QueryEqual<timestamp><NonNullable><Indexed> (MemOnly, EncryptionOff):       min     52us (+1.14%)            max     61us (-22.51%)           med     53us (+1.04%)            avg     53us (+0.69%)            stddev     1us (-58.27%)

Req runs:   81  QueryEqual<timestamp><NonNullable><NonIndexed> (MemOnly, EncryptionOff):     min   6.12ms (-0.02%)            max   6.36ms (+2.04%)            med   6.14ms (-0.60%)            avg   6.14ms (-0.34%)            stddev    33us (+0.14%)

Req runs:    5  QueryEqual<bool><NonNullable><Indexed> (MemOnly, EncryptionOff):             min 155.13ms (-0.27%)            max 158.04ms (+0.79%)            med 156.70ms (+0.11%)            avg 156.78ms (+0.36%)            stddev  1.17ms (+94.01%)

Req runs:    5  QueryEqual<bool><NonNullable><NonIndexed> (MemOnly, EncryptionOff):        * min 191.66ms (+0.73%)            max 193.15ms (+0.96%)          * med 192.10ms (+0.67%)          * avg 192.20ms (+0.74%)            stddev   588us (+51.63%)

Req runs:    5  QueryInsensitiveEqual<mixed><NonNullable><NonIndexed> (MemOnly, EncryptionOff):     min 169.38ms (+0.20%)            max 170.45ms (+0.62%)            med 169.69ms (+0.22%)            avg 169.81ms (+0.33%)            stddev   415us (+172.75%)

Req runs:   58  QueryInsensitiveEqual<mixed><NonNullable><Indexed> (MemOnly, EncryptionOff):        min   8.46ms (-0.10%)            max   8.83ms (-0.94%)            med   8.49ms (-0.83%)            avg   8.52ms (-0.62%)            stddev    86us (+17.29%)

Req runs:    7  QueryInsensitiveEqual<string><NonNullable><NonIndexed> (MemOnly, EncryptionOff):     min  65.48ms (-0.14%)            max  65.84ms (-0.66%)            med  65.55ms (-0.58%)            avg  65.58ms (-0.49%)            stddev   121us (-54.02%)

Req runs: 1000  QueryInsensitiveEqual<string><NonNullable><Indexed> (MemOnly, EncryptionOff):        min     65us (+0.58%)            max     92us (-87.43%)           med     66us (-5.69%)            avg     67us (-8.18%)            stddev     2us (-92.66%)

@ironage ironage requested a review from jedelbo August 24, 2023 22:13
};


struct SearchIndex {
Contributor

Why is this suddenly a struct?

Contributor Author

I'll change it to a class.
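In C++ the choice between `struct` and `class` for an interface like `SearchIndex` only affects default access: members and base classes are public by default in a `struct` and private in a `class`. Using `class` with an explicit `public:` for polymorphic interfaces is a common convention rather than a language requirement. A small illustration with hypothetical names:

```cpp
#include <cassert>
#include <type_traits>

// Members of a struct are public unless stated otherwise.
struct InterfaceAsStruct {
    virtual ~InterfaceAsStruct() = default;
    virtual int size() const = 0;
};

// The same interface as a class needs an explicit `public:`.
class InterfaceAsClass {
public:
    virtual ~InterfaceAsClass() = default;
    virtual int size() const = 0;
};

// With `class`, inheritance is also private by default, so `public` is spelled out.
class Impl final : public InterfaceAsClass {
public:
    int size() const override { return 3; }
};
```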

@ironage
Contributor Author

ironage commented Aug 25, 2023

Test failures are unrelated.

@ironage ironage merged commit 4d24835 into next-major Aug 25, 2023
23 of 26 checks passed
@ironage ironage deleted the js/string-index-prep branch August 25, 2023 18:58
nicola-cab added a commit that referenced this pull request Mar 7, 2024
* Prepare next-major

* Remove support for upgrading from pre core-6 (v10) (#6090)

* Optimize size of ArrayDecimal128 (#6111)

Optimize storage of Decimal128 properties so that the individual values will take up 0 bits (if all nulls or all zero), 32 bits, 64 bits or 128 bits depending on what is needed.

* update next major to core 13.4.1 (#6310)

* temporary disable failing c api decimal test

* Revert "update next major to core 13.4.1" (#6312)

* Revert "update next major to core 13.4.1 (#6310)"

This reverts commit 59764a2.

* appease format checks

* Align dictionaries to Lists and Sets when they are cleared.  (#6254)

* Fix storage of Decimal128 NaNs

* Allow Collections to be owned by Collections (#6447)

Introduce a new class, CollectionParent, which a collection will refer to as its owner. This class can be
specialized as an Obj if the nesting level is 0 or a CollectionList if the collection is nested.

* Add interface for defining columns of nested collections

* Add CollectionList class

* Change CollectionBase::set_owner() interface

Make it clear that when an Obj is the owner, then the index must be a ColKey

* Implementation `CollectionList::remove()` (#6381) (#6458)

Co-authored-by: Jørgen Edelbo <jorgen.edelbo@mongodb.com>

* Schema support for nesting collection (#6451)

Co-authored-by: Jørgen Edelbo <jorgen.edelbo@mongodb.com>

* Handle links in nested collections (#6470)

* Handle nullifying links in nested collections
* Clear backlinks related to nested collections

* Return collection type in Mixed (#6520)

* Print nested collections to Json (#6534)

* dump to json support info about nested collections for schema

* reuse logic for printing nested collections

* main logic for expanding nested collections to json, needs polishing

* more testing for nested containers

* complete algo for printing nested collections in json format

* add testing json files to project

* generate json files option set to false

* run whole test suite

* test nested collections with links

* format checks

* Move out_mixed_json... functionality to Mixed class

* Remove not needed template parameter from CollectionBaseImpl

* Delegate to_json to collections

* remove commented code

* fix audit conflicts

---------

Co-authored-by: Jørgen Edelbo <jorgen.edelbo@mongodb.com>

* Simplify Obj::get_path()

* Store ref in ArrayMixed (#6565)

* Cleanup naming and consolidate update strategy
* Allow a Mixed containing a ref to be stored in ArrayMixed

* Actually store collection type in Mixed (#6583)

* Allow Dictionary to contain a collection (#6584)

* Make a template specialization for Lst<Mixed>

* Allow Lst<Mixed> to contain collections

* Streamlining interface (#6615)

The main change is that insert... no longer returns the created collection. This
of course has a big influence on the test cases written for the old API.

A virtual interface for setting/getting nested collections is created

* Api nested collections in OS (#6618)

Add interface on both object_store::Collection and the C API to handle collections in Mixed.
---------

Co-authored-by: Jørgen Edelbo <jorgen.edelbo@mongodb.com>

* Set interface nested collections (#6648)

* testing for set<mixed>

* c-api for nested sets

* fix Set constructor

* Get path from collection objects (#6636)

* Move NoOpTransactionLogParser to transact_log.hpp

* Add nested collection path in transaction log

* Optimize get_path()

Avoid having the first element in Path being a std::string

* Small fixes

* Make m_path private in sync::instr::Path

* Remove `set_string_compare_method` (#6668)

Partially based on 5f2dda1 Delete some obsolete cruft

set_string_compare_method() and everything related to it has never actually been
used by any SDK, and is not really the correct solution to the problem anyway.

Co-authored-by: Thomas Goyne <tg@realm.io>

* Replication of operations on nested collections

* Remove support for query over typed links in Dictionary

This feature is not exposed, and should not be done in the way
it was implemented.

* Use BPlusTree to hold backlinks (#6673)

Removes the limit on how many backlinks we can handle

* Add StablePath concept

* Collection in mixed notification support. (#6660)

* Add support in the notification machinery for nested collections.

* Avoid passing string parameters by value in KeyPathMapping interface

* Use uniform Path representation in query parser

We need to be able to handle a path that is just a sequence of strings
and integers. The strings can then either be a property name or a key
in a dictionary. Before we have known that the last entry in a path
would be a property name. We can't assume that anymore, so we just have
to follow links as long as that is possible. The rest must then be a path
to the wanted value.

We must also allow the syntax "dict.key" and dict["key"] to be used
interchangeably. A nested dictionary can be used in the same way as
an embedded object is used and so the syntax for querying on a specific
property should be the same.

* Support query on nested collections

This includes supporting using index in query on list of primitives

* Copy replication nested collections (#6714)

* copy replication for nested collections

* Remove support for TypedLinks in LinkTranslator

Removes some complexity/code. It is easy to re-introduce. It can be safely
removed if we disallow creating columns of this type. This can also
safely be done, as this feature is not yet used.

* Support typed links in nested collections

This is about the usual stuff:
 - When a link is inserted, make sure a backlink is created
 - When a link is cleared, make sure the backlink is removed.
 - When an object containing a collection containing links is deleted
   make sure the backlinks are removed.
 - When the linked-to object is deleted the link should be nullified/removed.
 - When the linked-to object is made into a tombstone, the link should be updated.
 - When the linked-to object is recreated, the link should be restored.

* Handle exceptions thrown from Obj::get_collection_ref

* Support assigning a json string to a mixed property

Collect all to_json related functions in one compilation unit. Then
it will only be included in the final binary if used.

* Support having [*] as part of a path (#6741)

Allows you to consider all elements at some level.

* added check for set in mixed in the C API (#6764)

* Fix collection mismatch for `Set` in `Mixed`. (#6802)

* Allow TypedLinks to be part of path to property in queries

* Sorting stage 2 (#6669)

* Remove `set_string_compare_method`

Partially based on 5f2dda1 Delete some obsolete cruft

set_string_compare_method() and everything related to it has never actually been
used by any SDK, and is not really the correct solution to the problem anyway.

* strings are no longer equal to binaries

* parser supports bin(...) to differentiate binaries

* fix sectioned results

* fix formatting

* review updates

* Fix syntax

* review feedback changes

* fix reported UB

* lint

---------

Co-authored-by: Thomas Goyne <tg@realm.io>
Co-authored-by: Jørgen Edelbo <jorgen.edelbo@mongodb.com>

* Fix list type

* Syntactical sugar

* Throw if syncing a collection nested in Mixed

* Support indexing into link collections in Query (#6854)

* Support syncing nested Set

* Add missing support for getting Sets

* Return correct attachment state from nested collections (#6880)

Ensures that proper notifications on no longer existing collections are sent out

* small changes to the c api for collections in mixed (#6881)

* explicit insertion for collections in mixed and return the collection just inserted

* Make exceptions thrown by nested collections more consistent (#6875)

* Make information on the deletion of a collection available in C API (#6896)

* update set_collection for list and have an explicit function for each collection (#6900)

* Small changes

* Publish Obj::set_json in C API

* Check for stale accessors to a collection embedded directly in a Mixed property

Change the index held by the collection object from ColKey to a structure
(ColIndex) containing both the index of the column and a key generated for
that particular collection. The key value is stored alongside the ref and
compared with the key value found in index when trying to obtain the ref
for the collection.

This commit includes review updates

* Fix freezing a nested collection

* Remove support for static nested collections

* Use more bits in ColIndex key

* Improve StringIndex::dump_node_structure

* Check for stale accessors to a collection embedded in a dictionary

Change the index held by the collection object from std::string to a structure
(KeyIndex) containing both the beginning of the dictionary key (mostly for
debugging purposes) and an index key generated for that particular collection.
The key value is stored alongside the ref and compared with the key value found
in index when trying to obtain the ref for the collection.

* Optimize StableIndex

* Refactor StringIndex interface (#6787)

* refactor StringIndex interface

optimize

* StringIndex has a virtual parent SearchIndex

* review feedback and fix a warning

* More consistent exception handling for nested collections

This commit fixes the problem that trying to access position 0 in a newly
created nested list would give an exception saying that the collection
was gone instead of an out-of-bounds exception. This was because we had a
test for attached before validating the index.

The solution selected is to remove "ensure_attached" and let the exceptions
thrown in "get_collection_ref" flow all the way to the client. This is a somewhat
fundamental change in that we must remove the noexcept specification from
"update_if_needed_with_status" and make the "init_from_parent" functions rethrow
the exceptions caught. The noexcept functions calling "update_if_..." must
add a try..catch block.

* Add ability to get collections from Results (#6948)

Co-authored-by: Nicola Cabiddu <nicola.cabiddu@mongodb.com>

* Fix compilation of RealmTrawler

* Logging mutations on tables (#6953)

To avoid having the same operation logged twice, the logging in instruction_applier
is removed.

* Simplify Logger class a bit

Logger::m_base_logger_ptr seems not to be used in the class itself.
The member is added to the sub-classes that need it.

get/set level_threshold need not be virtual if we remove support
for NullLogger.

* Limiting the output when logging large string and binary values (#6986)

* Introduce logging categories

* Sorting stage 3 (#6670)

* Add tests on BPlusTree upgrade

* change the sort order of strings

* Add test on upgrade of StringIndex and Set

* remove utf8_compare

* Add upgrade functionality

* Avoid string index upgrade

* Update test

* Move Set::do_resort() to .cpp file

* Generate test_upgrade_database_x_23.realm as on ARM

* Revert "Avoid string index upgrade"

This reverts commit 333982a.

* Fix upgrade logic for string index

- Only upgrade if char is signed
- Upgrade Mixed columns too

* memcmp is faster than std::lexi_cmp and doesn't require casting

* optimize: only compare strings once

* Upgrade of fulltext index not needed

* migrate sets of binaries and better migration test

* generate migration Realms on a signed platform

* fix lint

* avoid a string index migration by using linear search

---------

Co-authored-by: Jørgen Edelbo <jorgen.edelbo@mongodb.com>
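The sort-order change above hinges on `char` signedness. A minimal sketch, assuming plain byte strings (the function names are hypothetical): comparing element-wise as `char` gives a platform-dependent order for bytes at or above 0x80, while `memcmp` always compares as `unsigned char`, needs no casts, and scans the common prefix only once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <string_view>

// Element-wise comparison as `char`. Whether char is signed is platform
// dependent (for example signed on x86, unsigned on most ARM Linux targets),
// so bytes >= 0x80, such as UTF-8 lead bytes, sort differently per platform.
bool less_as_char(std::string_view a, std::string_view b)
{
    return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end());
}

// memcmp compares bytes as unsigned char everywhere, giving one portable
// order (for valid UTF-8 this is code point order). The common prefix is
// compared once; if it is equal, the shorter string sorts first.
bool less_as_bytes(std::string_view a, std::string_view b)
{
    const std::size_t n = std::min(a.size(), b.size());
    if (n != 0) {
        if (int c = std::memcmp(a.data(), b.data(), n); c != 0)
            return c < 0;
    }
    return a.size() < b.size();
}
```

Only data written under the signed ordering needs upgrading, which matches the "only upgrade if char is signed" step.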

* Update Package.swift

* Client Reset for collections in mixed / nested collections (#6766)

* Fix error after merge

* Fix issue using REALM_ENABLE_MEMDEBUG=On

* Logging of schema migrations

* Use logging categories (#7052)

* Logging notification activity

* Logging details when opening DB

* Fix warning

* Update bindgen to support logging categories

* Add cases handling Json::value_t::binary

* Log free space and history sizes when opening file

* Remove unused stuff

* Rearrange some code in Set<T>

This is to prepare for merge with master. It is more or less a cherry-pick
of commit bf5ffd3.

* Fix missing NullLogger

* Remove support for nested sets

* Fix warnings

* Remove type_LinkList and col_type_LinkList (#7114)

Should have been removed a long time ago

* Index on list of strings (#7152)

* Prepare beta release

* Add path.hpp to installed headers

* Update release notes

* Allow keypath to be provided as argument (#7210)

* Remove set from realm_value_type for collections in mixed (#7245)

* Add support for collections in indexed mixed fields

* [C-API] Fix the return type of realm_set_collection (#7247)

* Fix the return type of realm_set_collection

* fix some leaking tests

* Simplify JSON functionality

* Don't leak implementation of BsonDocument and BsonArray to the users.

This is done by defining the interface explicitly and in a way that
makes it possible to easily change the underlying implementation.

* Throw when inserting an embedded object into a list of Mixed

* Fix queries on dictionaries in Mixed with @keys

* Optimize BsonDocument::find()

* Only output '_key': xxx when output mode is plain JSON

Fix 9169fa1

* Send notifications about mutations on nested collections

* Restore correct expected json files

* Support querying for @size on Mixed

This makes sense for strings, binaries and nested collections alike.

* Support querying with @type on nested collections (#7288)

* Refactor ConstantNode::visit() (#7295)

This will allow us to use argument substitution in more places, in
particular where a TypeOfValue is expected.

The visit function in ConstantNode is split into two steps. First the
value is extracted into a Mixed, either directly from the query
string or from the arguments. Then the value is adapted to what is
needed based on the 'hint' parameter.
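A minimal sketch of the two-step shape described above, assuming `std::variant` as a stand-in for `Mixed` and a reduced, hypothetical `Hint` enum; the token syntax and conversion rules are illustrative, not the query parser's real ones. Because extraction ignores the hint, a `$0` placeholder is handled the same way as a literal wherever the hint applies.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for realm::Mixed.
using Value = std::variant<std::monostate, std::int64_t, double, std::string>;

// Hypothetical, reduced version of what the surrounding expression expects.
enum class Hint { Any, Double, TypeOfValue };

// Step 1: extract the constant, from the query text or from the arguments.
Value extract(const std::string& token, const std::vector<Value>& args)
{
    if (!token.empty() && token[0] == '$')
        return args.at(std::stoul(token.substr(1)));
    if (token.size() >= 2 && token.front() == '\'' && token.back() == '\'')
        return token.substr(1, token.size() - 2);
    if (token.find('.') != std::string::npos)
        return std::stod(token);
    return static_cast<std::int64_t>(std::stoll(token));
}

// Step 2: adapt the extracted value to what the context needs.
Value adapt(Value v, Hint hint)
{
    if (hint == Hint::Double) {
        if (const auto* i = std::get_if<std::int64_t>(&v))
            return static_cast<double>(*i);
    }
    if (hint == Hint::TypeOfValue && !std::holds_alternative<std::string>(v))
        throw std::invalid_argument("@type must be compared with a type name");
    return v;
}

Value resolve_constant(const std::string& token, const std::vector<Value>& args, Hint hint)
{
    return adapt(extract(token, args), hint);
}
```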

* Fix merge error in dependency list

* Fix using stringops query on nested collections

* Fix using ANY, NONE, ALL in query on Mixed property

* Remove LinkList (#7308)

* fix == NONE {x} queries (#7333)

* fix == NONE {x} queries

* more tests

* Fix app URI tests for baasaas (#7342)

* Mitigate races in accessing `m_initated` and `m_finalized` in various REALM_ASSERTs (#7338)

* Fix a TOCTOU race when copying Realm files

Checking if the destination exists before copying is a race condition as the
file can be created in between the check and the copy. Instead we should
attempt to copy without overwriting the target if it exists.
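A minimal POSIX sketch of "copy without overwriting", assuming plain `open`/`read`/`write` rather than Realm's `File` class (`copy_no_overwrite` and `CopyResult` are hypothetical). `O_CREAT | O_EXCL` makes "create only if absent" a single atomic step, so there is no window between a check and the copy.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <system_error>
#include <fcntl.h>
#include <unistd.h>

enum class CopyResult { Copied, TargetExists };

// Copies src to dst, never overwriting dst. POSIX only; EINTR is not retried
// and a partially written dst is not cleaned up on error.
CopyResult copy_no_overwrite(const std::string& src, const std::string& dst)
{
    int in = ::open(src.c_str(), O_RDONLY);
    if (in < 0)
        throw std::system_error(errno, std::generic_category(), "open " + src);
    // Fails with EEXIST instead of truncating an existing file.
    int out = ::open(dst.c_str(), O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (out < 0) {
        int err = errno;
        ::close(in);
        if (err == EEXIST)
            return CopyResult::TargetExists;
        throw std::system_error(err, std::generic_category(), "create " + dst);
    }
    char buf[4096];
    int err = 0;
    for (;;) {
        ssize_t n = ::read(in, buf, sizeof buf);
        if (n <= 0) {
            if (n < 0)
                err = errno;
            break;
        }
        // write() may be partial, so loop until the whole chunk is written.
        for (ssize_t off = 0; off < n && err == 0;) {
            ssize_t w = ::write(out, buf + off, static_cast<size_t>(n - off));
            if (w < 0)
                err = errno;
            else
                off += w;
        }
        if (err != 0)
            break;
    }
    ::close(in);
    ::close(out);
    if (err != 0)
        throw std::system_error(err, std::generic_category(), "copy to " + dst);
    return CopyResult::Copied;
}
```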

* Use clonefile() when possible in File::copy()

* Delete unused sync file action metadata fields

* Schema migration tests to use admin API rather than querying backing cluster (#7345)

* Add bson library (#7324)

Can be used as a reference implementation in tests.

* Fix Results notification for changes to nested collections

* Use TestDirGuard where applicable

* Simplify session tests by consistently using TestSyncManager::fake_user()

* Separate TestSyncManager and OfflineAppSession

Co-authored-by: James Stone <james.stone@mongodb.com>

* Fix sync replication (#7343)

* Adjust CMake files to be used by vcpkg (#7334)

* Fix SPM compilation errors (#7360)

Include paths for the tests are set up slightly different for the SPM build
from the CMake build.

* Prepare release

* Update release notes

* Prepare release

* Update release note

* Rewrite the "app: app destroyed during token refresh" test (#7363)

* Update to CHANGELOG

* Don't allow Core targets to be installed if submodule (#7379)

Co-authored-by: Kenneth Geisshirt <kenneth.geisshirt@mongodb.com>
Co-authored-by: Jørgen Edelbo <jorgen.edelbo@mongodb.com>

* Prepare release

* Update release notes

* Do not populate list KVO information for non-list collections (#7378)

* Prevent opening files with file format 23 in read-only mode

* Don't update backlinks in Mixed self-assignment (#7384)

Setting a Mixed field to ObjLink equal to the current value removed the
existing backlink and then exited before adding the new one, leaving things in
an invalid state.
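A minimal sketch of the ordering that avoids this, assuming a hypothetical `LinkField` that models a Mixed holding a link plus a map of backlink counts; it is not Realm's `Obj` code. The equality check happens before any backlink is touched, so self-assignment is a true no-op.

```cpp
#include <cassert>
#include <map>

// Hypothetical model: a field that may hold a link (target object id) and a
// table counting incoming links per target.
struct LinkField {
    int target = -1;                          // -1 models a null value
    std::map<int, int>* backlinks = nullptr;  // target id -> incoming links

    void set(int new_target)
    {
        // Return on self-assignment *before* removing the old backlink.
        // Removing it first and then returning early is the bug described.
        if (new_target == target)
            return;
        if (target >= 0 && --(*backlinks)[target] == 0)
            backlinks->erase(target);
        if (new_target >= 0)
            ++(*backlinks)[new_target];
        target = new_target;
    }
};
```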

* Eliminate copies when accessing values from Bson types (#7377)

Returning things by value performs a deep copy, which is very expensive when
those things are also bson containers.

Realign the function names with the conventional names rather than using
unusual ones.
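A minimal sketch of why by-value accessors are costly for nested containers, assuming a hypothetical `Document` whose values count their own copies; the real `BsonDocument` API is not reproduced here. Returning a `const&` hands out the stored value without a deep copy.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical value type that counts deep copies so the cost is visible.
struct Tracked {
    static inline int copies = 0;
    std::vector<int> payload;
    Tracked() = default;
    Tracked(const Tracked& other) : payload(other.payload) { ++copies; }
    Tracked& operator=(const Tracked& other)
    {
        payload = other.payload;
        ++copies;
        return *this;
    }
};

struct Document {
    std::map<std::string, Tracked> fields;

    // By value: every call deep-copies the nested value.
    Tracked get_copy(const std::string& key) const { return fields.at(key); }

    // By reference, using the conventional `at` name: no copy is made.
    const Tracked& at(const std::string& key) const { return fields.at(key); }
};
```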

* Use the correct allocator for queries on dictionaries over links (#7382)

The base table's allocator was being used to read from the target table.

* Bson object should hold binary data in decoded form

If you construct a Bson object from a std::vector<char>, the extjson
streaming format should encode the binary data.
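A minimal sketch of "store decoded, encode on output", assuming a hypothetical `BinaryValue` that keeps raw bytes and is only base64-encoded when written as canonical Extended JSON v2 (`{"$binary": {"base64": ..., "subType": ...}}`), with the generic subtype `00`. The encoder is a plain standard base64 with `=` padding.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in: the value holds the decoded bytes.
struct BinaryValue {
    std::vector<char> bytes;
};

std::string base64_encode(const std::vector<char>& in)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    auto byte = [&](std::size_t i) {
        return static_cast<unsigned>(static_cast<unsigned char>(in[i]));
    };
    std::string out;
    std::size_t i = 0;
    // Each group of 3 bytes becomes 4 characters of 6 bits each.
    for (; i + 2 < in.size(); i += 3) {
        unsigned v = (byte(i) << 16) | (byte(i + 1) << 8) | byte(i + 2);
        out += table[(v >> 18) & 63];
        out += table[(v >> 12) & 63];
        out += table[(v >> 6) & 63];
        out += table[v & 63];
    }
    // 1 or 2 trailing bytes are padded with '='.
    if (const std::size_t rest = in.size() - i; rest > 0) {
        unsigned v = byte(i) << 16;
        if (rest == 2)
            v |= byte(i + 1) << 8;
        out += table[(v >> 18) & 63];
        out += table[(v >> 12) & 63];
        out += rest == 2 ? table[(v >> 6) & 63] : '=';
        out += '=';
    }
    return out;
}

// Encoding happens only here, when streaming to Extended JSON.
std::string to_extjson(const BinaryValue& value)
{
    return R"({"$binary":{"base64":")" + base64_encode(value.bytes) + R"(","subType":"00"}})";
}
```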

* Fix passing a double as argument to query on Decimal128 (#7387)

* Treat missing keys in dictionaries as null in queries (#7391)

* Treat missing keys in dictionaries as null in queries

* Fix test

---------

Co-authored-by: Jørgen Edelbo <jorgen.edelbo@mongodb.com>
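A minimal sketch of the semantics named in the "treat missing keys as null" change, assuming a hypothetical dictionary modeled as a `std::map` of `std::optional` values: the lookup used for query evaluation returns null for an absent key, so `dict['key'] == NULL` matches both a stored null and a missing key.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical model: std::nullopt stands for a stored null value.
using Dict = std::map<std::string, std::optional<int>>;

// Lookup as seen by the query: an absent key behaves like a stored null.
std::optional<int> query_lookup(const Dict& d, const std::string& key)
{
    auto it = d.find(key);
    return it == d.end() ? std::nullopt : it->second;
}

bool matches_equal_null(const Dict& d, const std::string& key)
{
    return !query_lookup(d, key).has_value();
}
```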

* Allow using aggregate operations on Mixed properties in queries (#7398)

This is something which Cocoa and the query engine supports but the core query
parser did not.

* [bindgen] Enable support for collections in the `Mixed` data type (#7392)

* Adapt to breaking change.

* Expose APIs for flat collections in Mixed and add preparation for nested.

* Add and expose helper for getting the data type.

* Expose a data type enum that includes non-primitives.

The JS SDK needs this for checking types of Mixed.

* Update casting of enum constants.

* Replace access of 'm_type' with use of the 'int()' operator overload.

* Expose APIs for setting nested lists in Mixed.

* Expose APIs for setting nested dictionaries in Mixed.

* Expose APIs for getting nested lists in Mixed.

* Expose APIs for getting nested dictionaries in Mixed.

* Expose get_obj() on List and Set.

* Expose method for getting element type in List.

* Remove the need for element type helpers.

The JS SDK has managed to use sentinel values in the generated bindings instead.

* Remove unused header.

* Avoid doing unneeded logger work in Replication

Most of the replication log statements do some work including memory
allocations which are then thrown away if the log level is too high, so always
check the log level first. A few places don't actually benefit from this, but
it's easier to consistently check the log level every time.
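A minimal sketch of the "check the level first" pattern, assuming a hypothetical `Logger` with a `would_log` predicate; Realm's own logger API is richer and is not reproduced here. The expensive message construction runs only when the message would actually be kept.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal logger.
enum class Level { trace, debug, info, warn, error, off };

struct Logger {
    Level threshold = Level::info;
    std::vector<std::string> lines;

    bool would_log(Level level) const { return level >= threshold; }
    void log(Level level, std::string message)
    {
        if (would_log(level))
            lines.push_back(std::move(message));
    }
};

int describe_calls = 0; // counts how often the expensive formatting runs

std::string describe_set(int table, int col)
{
    ++describe_calls;
    return "Set table " + std::to_string(table) + " col " + std::to_string(col);
}

void log_set(Logger& logger, int table, int col)
{
    // Without this check the string is built (and allocated) even when the
    // threshold means it is thrown away immediately.
    if (logger.would_log(Level::trace))
        logger.log(Level::trace, describe_set(table, col));
}
```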

* Prepare release

* RCORE-1990 Add X86 Windows Release builder to evergreen (#7383)

* Use the bfd linker in the armv7 toolchain (#7406)

* Fix several crashes in the object store benchmarks (#7403)

* add new index related benchmarks (#7401)

* Use updated curl on evergreen windows hosts (#7409)

* Comment out test that is not working; handling for nested collections is still missing

* handle collection array for mixed types

* lint

* please windows builder warnings + x86

* proposed fix for 32-bit

* moved call outside assert macro

* Fix for 32 bit archs for encoded Arrays (#7427)

* tentative fix for 32 bit archs
* removed wrong cast to size_t from bf_iterator::set_value()
* fix inverted condition in 'unsigned_to_num_bits'

---------

Co-authored-by: Nicola Cabiddu <nicola.cabiddu@mongodb.com>

* lint

* Revert "lint"

This reverts commit 7ac0073.

* lint

---------

Co-authored-by: Jørgen Edelbo <jorgen.edelbo@mongodb.com>
Co-authored-by: James Stone <james.stone@mongodb.com>
Co-authored-by: Thomas Goyne <tg@realm.io>
Co-authored-by: Nikola Irinchev <irinchev@me.com>
Co-authored-by: Claus Rørbech <claus.rorbech@mongodb.com>
Co-authored-by: Kenneth Geisshirt <kenneth.geisshirt@mongodb.com>
Co-authored-by: Jonathan Reams <jbreams@mongodb.com>
Co-authored-by: Thomas Goyne <thomas.goyne@mongodb.com>
Co-authored-by: Lee Maguire <lee.maguire@mongodb.com>
Co-authored-by: LJ <81748770+elle-j@users.noreply.github.com>
Co-authored-by: Yavor Georgiev <fealebenpae@users.noreply.github.com>
Co-authored-by: Finn Schiermer Andersen <finn.schiermer.andersen@gmail.com>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024