Eliminate copies when accessing values from Bson types #7377

Merged
merged 2 commits into from Feb 27, 2024


tgoyne
Member

@tgoyne tgoyne commented Feb 23, 2024

Returning things by value performs a deep copy, which is very expensive when those things are also bson containers.

Re-align the naming with the conventional names for the functions rather than being unnecessarily different.
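
To illustrate the cost described above, here is a minimal sketch that counts copy constructions when a nested container is walked through a by-value accessor versus a by-reference accessor. `Node`, `child_by_value`, `child_by_ref`, and `g_copies` are hypothetical names made up for this example; they are not the `realm::bson` API.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Counts how many times a Node is copy-constructed.
static std::size_t g_copies = 0;

// Hypothetical stand-in for a nested container; not the realm::bson types.
struct Node {
    std::vector<Node> children;
    std::vector<char> payload;

    Node() = default;
    Node(const Node& other)
        : children(other.children) // copies every descendant as well
        , payload(other.payload)
    {
        ++g_copies;
    }
    Node(Node&&) noexcept = default;
    Node& operator=(const Node&) = default;
    Node& operator=(Node&&) noexcept = default;

    // By value: each call deep-copies the whole subtree under child i.
    Node child_by_value(std::size_t i) const
    {
        return children[i];
    }

    // By reference: no copy is made.
    const Node& child_by_ref(std::size_t i) const
    {
        assert(i < children.size());
        return children[i];
    }
};

// Builds a chain of `depth` single-child nodes above a leaf holding the payload.
Node make_nested(std::size_t depth, std::size_t payload_size)
{
    Node node;
    if (depth == 0) {
        node.payload.assign(payload_size, 'a');
        return node;
    }
    node.children.push_back(make_nested(depth - 1, payload_size));
    return node;
}

std::size_t leaf_size_by_value(const Node& node)
{
    if (node.children.empty())
        return node.payload.size();
    Node child = node.child_by_value(0); // deep copy at every level
    return leaf_size_by_value(child);
}

std::size_t leaf_size_by_ref(const Node& node)
{
    if (node.children.empty())
        return node.payload.size();
    return leaf_size_by_ref(node.child_by_ref(0)); // no copies
}
```

In this sketch, walking `make_nested(4, n)` by value copies the remaining subtree at each level (4 + 3 + 2 + 1 node copies, and the leaf payload four times), while the by-reference walk performs no copies at all.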

@tgoyne tgoyne self-assigned this Feb 23, 2024
@cla-bot cla-bot bot added the cla: yes label Feb 23, 2024

Pull Request Test Coverage Report for Build thomas.goyne_193

Details

  • 82 of 82 (100.0%) changed or added relevant lines in 9 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall first build on tg/bson-copy at 90.922%

Totals Coverage Status
Change from base Build 2062: 90.9%
Covered Lines: 238310
Relevant Lines: 262105

💛 - Coveralls

@tgoyne tgoyne requested a review from jedelbo February 24, 2024 01:44
@jedelbo
Contributor

jedelbo commented Feb 26, 2024

@tgoyne In which actual use cases can you measure a performance degradation? I have not seen any degradation in the way App uses these types.

@tgoyne
Member Author

tgoyne commented Feb 26, 2024

The uses in App don't involve any nested collections, but we also use the bson types in our mongo client wrapper which can receive arbitrarily deeply nested data if that's what the user has.

@tgoyne
Member Author

tgoyne commented Feb 26, 2024

Here is a trivial benchmark which shows a performance regression:

using namespace realm::bson;

// Recursively descend into the first element of each nested array until a
// non-array value is reached, then return its binary payload by reference.
static const std::vector<char>& read(const Bson& bson)
{
    if (bson.type() == Bson::Type::Array) {
        return read(static_cast<const BsonArray&>(bson)[0]);
    }
    return static_cast<const std::vector<char>&>(bson);
}

TEST_CASE("bson") {
    // 1 MiB of binary data wrapped in four levels of nested arrays.
    std::vector<char> data(1024 * 1024, 'a');
    BsonArray nested({BsonArray({BsonArray({BsonArray({Bson(data)})})})});
    BENCHMARK("read nested data") {
        REQUIRE(read(nested) == data);
    };
}

This takes 41 µs with 13.27.0 and 255 µs with 14.0.0.

The API changes are also a significant negative in their own right: they make working with the type much more awkward.

@jedelbo
Contributor

jedelbo commented Feb 27, 2024

@tgoyne got your point. I was not aware that the type had a wider use. I did not think this API was part of our public API.

@tgoyne tgoyne merged commit fd8b61d into master Feb 27, 2024
1 of 2 checks passed
@tgoyne tgoyne deleted the tg/bson-copy branch February 27, 2024 16:12
nicola-cab added a commit that referenced this pull request Mar 7, 2024
* Prepare next-major

* Remove support for upgrading from pre core-6 (v10) (#6090)

* Optimize size of ArrayDecimal128 (#6111)

Optimize storage of Decimal128 properties so that the individual values will take up 0 bits (if all nulls or all zero), 32 bits, 64 bits or 128 bits depending on what is needed.

* update next major to core 13.4.1 (#6310)

* temporary disable failing c api decimal test

* Revert "update next major to core 13.4.1" (#6312)

* Revert "update next major to core 13.4.1 (#6310)"

This reverts commit 59764a2.

* appease format checks

* Align dictionaries to Lists and Sets when they are cleared.  (#6254)

* Fix storage of Decimal128 NaNs

* Allow Collections to be owned by Collections (#6447)

Introduce a new class - CollectionParent - which a collection will refer to as its owner. This class can be
specialized as an Obj if the nesting level is 0 or a CollectionList if the collection is nested.

* Add interface for defining columns of nested collections

* Add CollectionList class

* Change CollectionBase::set_owner() interface

Make it clear that when an Obj is the owner, then the index must be a ColKey

* Implementation `CollectionList::remove()` (#6381) (#6458)

Co-authored-by: Jørgen Edelbo <jorgen.edelbo@mongodb.com>

* Schema support for nesting collection (#6451)

Co-authored-by: Jørgen Edelbo <jorgen.edelbo@mongodb.com>

* Handle links in nested collections (#6470)

* Handle nullifying links in nested collections
* Clear backlinks related to nested collections

* Return collection type in Mixed (#6520)

* Print nested collections to Json (#6534)

* dump to json support info about nested collections for schema

* reuse logic for printing nested collections

* main logic for expanding nested collections to json, requires to be polished

* more testing for nested containers

* complete algo for printing nested collections in json format

* add testing json files to project

* generate json files option set to false

* run whole test suite

* test nested collections with links

* format checks

* Move out_mixed_json... functionality to Mixed class

* Remove not needed template parameter from CollectionBaseImpl

* Delegate to_json to collections

* remove commented code

* fix audit conflicts

---------

Co-authored-by: Jørgen Edelbo <jorgen.edelbo@mongodb.com>

* Simplify Obj::get_path()

* Store ref in ArrayMixed (#6565)

* Cleanup naming and consolidate update strategy
* Allow a Mixed containing a ref to be stored in ArrayMixed

* Actually store collection type in Mixed (#6583)

* Allow Dictionary to contain a collection (#6584)

* Make a template specialization for Lst<Mixed>

* Allow Lst<Mixed> to contain collections

* Streamlining interface (#6615)

The main change is that insert... will not return the created collection. This
of course has a big influence on the test cases written for the old API.

Virtual interface for setting/getting nested collections created

* Api nested collections in OS (#6618)

Add interface on both object_store::Collection and the C API to handle collections in Mixed.
---------

Co-authored-by: Jørgen Edelbo <jorgen.edelbo@mongodb.com>

* Set interface nested collections (#6648)

* testing for set<mixed>

* c-api for nested sets

* fix Set constructor

* Get path from collection objects (#6636)

* Move NoOpTransactionLogParser to transact_log.hpp

* Add nested collection path in transaction log

* Optimize get_path()

Avoid having the first element in Path being a std::string

* Small fixes

* Make m_path private in sync::instr::Path

* Remove `set_string_compare_method` (#6668)

Partially based on 5f2dda1 Delete some obsolete cruft

set_string_compare_method() and everything related to it has never actually been
used by any SDK, and is not really the correct solution to the problem anyway.

Co-authored-by: Thomas Goyne <tg@realm.io>

* Replication of operations on nested collections

* Remove support for query over typed links in Dictionary

This feature is not exposed, and should not be done in the way
it was implemented.

* Use BPlusTree to hold backlinks (#6673)

Removes the limit on how many backlinks we can handle

* Add StablePath concept

* Collection in mixed notification support. (#6660)

* Add support in the notification machinery for nested collections.

* Avoid passing string parameters by value in KeyPathMapping interface

* Use uniform Path representation in query parser

We need to be able to handle a path that is just a sequence of strings
and integers. The strings can then either be a property name or a key
in a dictionary. Before we have known that the last entry in a path
would be a property name. We can't assume that anymore, so we just have
to follow links as long as that is possible. The rest must then be a path
to the wanted value.

We must also allow the syntax "dict.key" and dict["key"] to be used
interchangeably. A nested dictionary can be used in the same way as
an embedded object is used and so the syntax for querying on a specific
property should be the same.

* Support query on nested collections

This includes supporting using index in query on list of primitives

* Copy replication nested collections (#6714)

* copy replication for nested collections

* Remove support for TypedLinks in LinkTranslator

Removes some complexity/code. Is easy to re-introduce. Can be safely
removed if we disallow creating columns of this type. This can also
safely be done, as this feature is not yet used.

* Support typed links in nested collections

This is about the usual stuff:
 - When a link is inserted, make sure a backlink is created
 - When a link is cleared, make sure the backlink is removed.
 - When object containing a collection containing links is deleted
   make sure the backlinks are removed.
 - When the linked-to object is deleted the link should be nullified/removed.
 - When the linked-to object is made into a tombstone, the link should be updated.
 - When the linked-to object is recreated, the link should be restored.

* Handle exceptions thrown from Obj::get_collection_ref

* Support assigning a json string to a mixed property

Collect all to_json related functions in one compilation unit. Then
it will only be included in the final binary if used.

* Support having [*] as part of a path (#6741)

Allows you to consider all elements at some level.

* added check for set in mixed in the C API (#6764)

* Fix collection mismatch for `Set` in `Mixed`. (#6802)

* Allow TypedLinks to be part of path to property in queries

* Sorting stage 2 (#6669)

* Remove `set_string_compare_method`

Partially based on 5f2dda1 Delete some obsolete cruft

set_string_compare_method() and everything related to it has never actually been
used by any SDK, and is not really the correct solution to the problem anyway.

* strings are no longer equal to binaries

* parser supports bin(...) to differentiate binaries

* fix sectioned results

* fix formatting

* review updates

* Fix syntax

* review feedback changes

* fix reported UB

* lint

---------

Co-authored-by: Thomas Goyne <tg@realm.io>
Co-authored-by: Jørgen Edelbo <jorgen.edelbo@mongodb.com>

* Fix list type

* Syntactical sugar

* Throw if syncing a collection nested in Mixed

* Support indexing into link collections in Query (#6854)

* Support syncing nested Set

* Add missing support for getting Sets

* Return correct attachment state from nested collections (#6880)

Ensures that proper notifications on no longer existing collections are sent out

* small changes to the c api for collections in mixed (#6881)

* explicit insertion for collections in mixed and return the collection just inserted

* Make exceptions thrown by nested collections more consistent (#6875)

* Make information on the deletion of a collection available in C API (#6896)

* update set_collection for list and have an explicit function for each collection (#6900)

* Small changes

* Publish Obj::set_json in C API

* Check for stale accessors to a collection embedded directly in a Mixed property

Change the index held by the collection object from ColKey to a structure
(ColIndex) containing both the index of the column and a key generated for
that particular collection. The key value is stored alongside the ref and
compared with the key value found in index when trying to obtain the ref
for the collection.

This commit includes review updates

* Fix freezing a nested collection

* Remove support for static nested collections

* Use more bits in ColIndex key

* Improve StringIndex::dump_node_structure

* Check for stale accessors to a collection embedded in a dictionary

Change the index held by the collection object from std::string to a structure
(KeyIndex) containing both the beginning of the dictionary key (mostly for
debugging purposes) and an index key generated for that particular collection.
The key value is stored alongside the ref and compared with the key value found
in index when trying to obtain the ref for the collection.

* Optimize StableIndex

* Refactor StringIndex interface (#6787)

* refactor StringIndex interface

optimize

* StringIndex has a virtual parent SearchIndex

* review feedback and fix a warning

* More consistent exception handling for nested collections

This commit fixes the problem that trying to access position 0 in a newly
created nested list would give an exception saying that the collection
was gone instead of an out-of-bounds exception. This was because we had a
test for attached before validating the index.

The solution selected is to remove "ensure_attached" and let the exceptions
thrown in "get_collection_ref" flow all the way to the client. This is kind of a
fundamental change in that we must remove the noexcept specification from
"update_if_needed_with_status" and make the "init_from_parent" functions rethrow
the exceptions caught. The noexcept functions calling "update_if_..." must
add a try..catch block.

* Add ability to get collections from Results (#6948)

Co-authored-by: Nicola Cabiddu <nicola.cabiddu@mongodb.com>

* Fix compilation of RealmTrawler

* Logging mutations on tables (#6953)

To avoid having the same operation logged twice, the logging in instruction_applier
is removed.

* Simplify Logger class a bit

Logger::m_base_logger_ptr seems not to be used in the class itself.
The member is added to the sub-classes that need it.

get/set level_threshold need not be virtual if we remove support
for NullLogger.

* Limiting the output when logging large string and binary values (#6986)

* Introduce logging categories

* Sorting stage 3 (#6670)

* Add tests on BPlusTree upgrade

* change the sort order of strings

* Add test on upgrade of StringIndex and Set

* remove utf8_compare

* Add upgrade functionality

* Avoid string index upgrade

* Update test

* Move Set::do_resort() to .cpp file

* Generate test_upgrade_database_x_23.realm as on ARM

* Revert "Avoid string index upgrade"

This reverts commit 333982a.

* Fix upgrade logic for string index

- Only upgrade if char is signed
- Upgrade Mixed columns too

* memcmp is faster than std::lexi_cmp and doesn't require casting

* optimize: only compare strings once

* Upgrade of fulltext index not needed

* migrate sets of binaries and better migration test

* generate migration Realms on a signed platform

* fix lint

* avoid a string index migration by using linear search

---------

Co-authored-by: Jørgen Edelbo <jorgen.edelbo@mongodb.com>

* Update Package.swift

* Client Reset for collections in mixed / nested collections (#6766)

* Fix error after merge

* Fix issue using REALM_ENABLE_MEMDEBUG=On

* Logging of schema migrations

* Use logging categories (#7052)

* Logging notification activity

* Logging details when opening DB

* Fix warning

* Update bindgen to support logging categories

* Add cases handling Json::value_t::binary

* Log free space and history sizes when opening file

* Remove unused stuff

* Rearrange some code in Set<T>

This is to prepare for merge with master. It is more or less a cherry-pick
of commit bf5ffd3.

* Fix missing NullLogger

* Remove support for nested sets

* Fix warnings

* Remove type_LinkList and col_type_LinkList (#7114)

Should have been removed a long time ago

* Index on list of strings (#7152)

* Prepare beta release

* Add path.hpp to installed headers

* Update release notes

* Allow keypath to be provided as argument (#7210)

* Remove set from realm_value_type for collections in mixed (#7245)

* Add support for collections in indexed mixed fields

* [C-API] Fix the return type of realm_set_collection (#7247)

* Fix the return type of realm_set_collection

* fix some leaking tests

* Simplify JSON functionality

* Don't leak implementation of BsonDocument and BsonArray to the users.

This is done by defining the interface explicitly and in a way that
makes it possible to easily change the underlying implementation.

* Throw when inserting an embedded object into a list of Mixed

* Fix queries on dictionaries in Mixed with @keys

* Optimize BsonDocument::find()

* Only output '_key': xxx when output mode is plain JSON

Fix 9169fa1

* Send notifications about mutations on nested collections

* Restore correct expected json files

* Support querying for @size on Mixed

This makes sense for strings, binaries and nested collections alike.

* Support querying with @type on nested collections (#7288)

* Refactor ConstantNode::visit() (#7295)

This will allow us to use argument substitution in more places, in
particular where a TypeOfValue is expected.

The visit function in ConstantNode is split up in 2 steps. First the
value is extracted into a Mixed - that being directly from the query
string or from the arguments. Then the value is adapted to what is
needed based on the 'hint' parameter.

* Fix merge error in dependency list

* Fix using stringops query on nested collections

* Fix using ANY, NONE, ALL in query on Mixed property

* Remove LinkList (#7308)

* fix == NONE {x} queries (#7333)

* fix == NONE {x} queries

* more tests

* Fix app URI tests for baasaas (#7342)

* Mitigate races in accessing `m_initated` and `m_finalized` in various REALM_ASSERTs (#7338)

* Fix a TOCTOU race when copying Realm files

Checking if the destination exists before copying is a race condition as the
file can be created in between the check and the copy. Instead we should
attempt to copy without overwriting the target if it exists.
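
The idea in the entry above can be sketched on POSIX as follows. This is an illustration only, not realm-core's `File::copy()`: the helper name `copy_if_absent` and its error handling are assumptions made for the example. It relies on `open()` with `O_CREAT | O_EXCL`, which creates the destination and fails with `EEXIST` in a single step, so there is no separate existence check to race with.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <sys/types.h>
#include <system_error>
#include <unistd.h>

// Hypothetical helper: copies src to dst only if dst does not exist yet.
// Returns false (leaving dst untouched) when dst already exists.
bool copy_if_absent(const std::string& src, const std::string& dst)
{
    int in = ::open(src.c_str(), O_RDONLY);
    if (in < 0)
        throw std::system_error(errno, std::generic_category(), "open source");

    // Create-if-absent is atomic here; no exists() check beforehand.
    int out = ::open(dst.c_str(), O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (out < 0) {
        int open_err = errno;
        ::close(in);
        if (open_err == EEXIST)
            return false;
        throw std::system_error(open_err, std::generic_category(), "create destination");
    }

    char buf[4096];
    int err = 0;
    while (err == 0) {
        ssize_t n = ::read(in, buf, sizeof buf);
        if (n == 0)
            break; // end of file
        if (n < 0) {
            err = errno;
            break;
        }
        // write() may be partial, so loop until the whole chunk is written.
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = ::write(out, buf + off, static_cast<std::size_t>(n - off));
            if (w < 0) {
                err = errno;
                break;
            }
            off += w;
        }
    }
    ::close(in);
    ::close(out);
    if (err != 0)
        throw std::system_error(err, std::generic_category(), "copy data");
    return true;
}
```

A caller that previously checked for the destination and then copied would instead call the helper directly and treat a `false` return as "the file was already there".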

* Use clonefile() when possible in File::copy()

* Delete unused sync file action metadata fields

* Schema migration tests to use admin API rather than querying backing cluster (#7345)

* Add bson library (#7324)

Can be used as a reference implementation in tests.

* Fix Results notification for changes to nested collections

* Use TestDirGuard where applicable

* Simplify session tests by consistently using TestSyncManager::fake_user()

* Separate TestSyncManager and OfflineAppSession

Co-authored-by: James Stone <james.stone@mongodb.com>

* Fix sync replication (#7343)

* Adjust CMake files to be used by vcpkg (#7334)

* Fix SPM compilation errors (#7360)

Include paths for the tests are set up slightly different for the SPM build
from the CMake build.

* Prepare release

* Update release notes

* Prepare release

* Update release note

* Rewrite the "app: app destroyed during token refresh" test (#7363)

* Update to CHANGELOG

* Don't allow Core targets to be installed if submodule (#7379)

Co-authored-by: Kenneth Geisshirt <kenneth.geisshirt@mongodb.com>
Co-authored-by: Jørgen Edelbo <jorgen.edelbo@mongodb.com>

* Prepare release

* Update release notes

* Do not populate list KVO information for non-list collections (#7378)

* Prevent opening files with file format 23 in read-only mode

* Don't update backlinks in Mixed self-assignment (#7384)

Setting a Mixed field to ObjLink equal to the current value removed the
existing backlink and then exited before adding the new one, leaving things in
an invalid state.

* Eliminate copies when accessing values from Bson types (#7377)

Returning things by value performs a deep copy, which is very expensive when
those things are also bson containers.

Re-align the naming with the conventional names for the functions rather than
being weird and different.

* Use the correct allocator for queries on dictionaries over links (#7382)

The base table's allocator was being used to read from the target table.

* Bson object should hold binary data in decoded form

If you construct a Bson object from a std::vector<char>, the extjson
streaming format should encode the binary data.

* Fix passing a double as argument to query on Decimal128 (#7387)

* Treat missing keys in dictionaries as null in queries (#7391)

* Treat missing keys in dictionaries as null in queries

* Fix test

---------

Co-authored-by: Jørgen Edelbo <jorgen.edelbo@mongodb.com>

* Allow using aggregate operations on Mixed properties in queries (#7398)

This is something which Cocoa and the query engine support but the core query
parser did not.

* [bindgen] Enable support for collections in the `Mixed` data type (#7392)

* Adapt to breaking change.

* Expose APIs for flat collections in Mixed and add preparation for nested.

* Add and expose helper for getting the data type.

* Expose a data type enum that includes non-primitives.

The JS SDK needs this for checking types of Mixed.

* Update casting of enum constants.

* Replace access of 'm_type' with use of the 'int()' operator overload.

* Expose APIs for setting nested lists in Mixed.

* Expose APIs for setting nested dictionaries in Mixed.

* Expose APIs for getting nested lists in Mixed.

* Expose APIs for getting nested dictionaries in Mixed.

* Expose get_obj() on List and Set.

* Expose method for getting element type in List.

* Remove the need for element type helpers.

The JS SDK has managed to use sentinel values in the generated bindings instead.

* Remove unused header.

* Avoid doing unneeded logger work in Replication

Most of the replication log statements do some work including memory
allocations which are then thrown away if the log level is too high, so always
check the log level first. A few places don't actually benefit from this, but
it's easier to consistently check the log level every time.
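
A minimal sketch of the pattern described in the entry above, assuming a logger that can report whether a given level would be emitted. `Logger`, `Level`, `describe`, `log_values`, and `g_format_calls` are hypothetical names used only for this example and do not reproduce realm-core's logging or replication classes.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

enum class Level { debug = 0, info = 1, warn = 2 };

// Hypothetical logger that stores emitted lines in memory.
struct Logger {
    Level threshold = Level::warn;
    std::vector<std::string> lines;

    bool would_log(Level level) const
    {
        return level >= threshold;
    }
    void log(Level level, const std::string& message)
    {
        if (would_log(level))
            lines.push_back(message);
    }
};

// Counts how often the (allocating) formatting work is actually done.
static int g_format_calls = 0;

// Builds a comma-separated description; stands in for expensive formatting.
std::string describe(const std::vector<int>& values)
{
    ++g_format_calls;
    std::ostringstream out;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i != 0)
            out << ',';
        out << values[i];
    }
    return out.str();
}

void log_values(Logger& logger, const std::vector<int>& values)
{
    // Check the level first so the message is never built just to be discarded.
    if (!logger.would_log(Level::debug))
        return;
    logger.log(Level::debug, describe(values));
}
```

With the threshold at `warn`, `log_values` returns before `describe` runs; lowering the threshold to `debug` makes the same call format and store the message.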

* Prepare release

* RCORE-1990 Add X86 Windows Release builder to evergreen (#7383)

* Use the bfd linker in the armv7 toolchain (#7406)

* Fix several crashes in the object store benchmarks (#7403)

* add new index related benchmarks (#7401)

* Use updated curl on evergreen windows hosts (#7409)

* comment test not working + still missing handling for nested collections

* handle collection array for mixed types

* lint

* please windows builder warnings + x86

* proposed fix for 32-bit

* moved call outside assert macro

* Fix for 32 bit archs for encoded Arrays (#7427)

* tentative fix for 32 bit archs
* removed wrong cast to size_t from bf_iterator::set_value()
* fix inverted condition in 'unsigned_to_num_bits'

---------

Co-authored-by: Nicola Cabiddu <nicola.cabiddu@mongodb.com>

* lint

* Revert "lint"

This reverts commit 7ac0073.

* lint

---------

Co-authored-by: Jørgen Edelbo <jorgen.edelbo@mongodb.com>
Co-authored-by: James Stone <james.stone@mongodb.com>
Co-authored-by: Thomas Goyne <tg@realm.io>
Co-authored-by: Nikola Irinchev <irinchev@me.com>
Co-authored-by: Claus Rørbech <claus.rorbech@mongodb.com>
Co-authored-by: Kenneth Geisshirt <kenneth.geisshirt@mongodb.com>
Co-authored-by: Jonathan Reams <jbreams@mongodb.com>
Co-authored-by: Thomas Goyne <thomas.goyne@mongodb.com>
Co-authored-by: Lee Maguire <lee.maguire@mongodb.com>
Co-authored-by: LJ <81748770+elle-j@users.noreply.github.com>
Co-authored-by: Yavor Georgiev <fealebenpae@users.noreply.github.com>
Co-authored-by: Finn Schiermer Andersen <finn.schiermer.andersen@gmail.com>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 28, 2024