eramongodb
Contributor

@eramongodb eramongodb commented Sep 26, 2025

Related to CXX-3320. This PR introduces new helper utilities, for internal use only, to manage bson_t objects and create BSON documents within the implementation: mongocxx::scoped_bson and mongocxx::scoped_bson_view (a la std::string and std::string_view). These two classes replace mongocxx::libbson::scoped_bson_t (note: not to be confused with scoped_bson_value for bson_value_t). The v_noabi implementation is refactored to use the new scoped_bson(_view) utilities to verify their correctness and behavior in advance of their widespread use in mongocxx::v1 implementations.


The scoped_bson_t class is very old, going back all the way to the first commit (Jan 2015). Unfortunately, it has been confusing and difficult to use because its API preserves too many bson_t idiosyncrasies, in particular its handling of the "uninitialized bson_t" state (e.g. .init_from_static()). Its initially simple and primitive API has been expanded over the years to support integration with the bsoncxx and mongocxx APIs, often in tension with bson_t behaviors (e.g. owning vs. non-owning and uninitialized vs. empty vs. moved-from). This PR proposes an alternative API which abandons the "uninitialized bson_t" state entirely and always preserves a bsoncxx-idiomatic, well-defined state.

First, both scoped_bson and scoped_bson_view completely avoid depending on bson_t for ownership or management of the underlying BSON data. No operations depend on the bson_t aside from exposing bson_t const* for mongoc API integration. Instead, the bsoncxx::v1::document::value or bsoncxx::v1::document::view data member always owns and manages the associated BSON data. The bson_t data member is always and only ever initialized with bson_init_static() via this->sync_bson() after the bsoncxx value/view object has been updated first. Any applicable "would-be-uninitialized", "null", or "invalid" states are represented as an "invalid" document (.data() == nullptr) per the v1 API's new well-defined "invalid" state for value and view.

Important

There is no "uninitialized bson_t" state exposed by the scoped_bson(_view) API.
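To illustrate the first point, here is a minimal standalone sketch of the "owner first, derived view second" pattern. It does not use libbson or bsoncxx: raw_view, owning_doc, and sync() are hypothetical stand-ins for bson_t, scoped_bson, and sync_bson(), and the real classes certainly differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for bson_t: a non-owning (pointer, length) pair.
struct raw_view {
    std::uint8_t const* data = nullptr;
    std::size_t len = 0;
};

// Hypothetical stand-in for scoped_bson: the byte buffer is the only owner,
// and the raw view is always re-derived from it, never the other way around.
class owning_doc {
   public:
    owning_doc() = default;  // "Invalid" state: data() == nullptr.

    explicit owning_doc(std::vector<std::uint8_t> bytes) : _bytes(std::move(bytes)) {
        sync();
    }

    // Copy and move are user-provided so that the view is rebuilt from the
    // new owner instead of being copied member-wise from the source object.
    owning_doc(owning_doc const& other) : _bytes(other._bytes) { sync(); }

    owning_doc(owning_doc&& other) noexcept : _bytes(std::move(other._bytes)) {
        sync();
        other._bytes.clear();
        other.sync();  // The moved-from object falls back to the invalid state.
    }

    owning_doc& operator=(owning_doc other) noexcept {
        _bytes = std::move(other._bytes);
        sync();
        return *this;
    }

    std::uint8_t const* data() const { return _view.data; }
    std::size_t size() const { return _view.len; }

    // Analogous to exposing bson_t const* for a C API.
    raw_view const* raw() const {
        assert(_view.len == _bytes.size());  // The view never drifts from the owner.
        return &_view;
    }

   private:
    // Update the owner first, then call this to re-derive the view.
    void sync() {
        _view.data = _bytes.empty() ? nullptr : _bytes.data();
        _view.len = _bytes.size();
    }

    std::vector<std::uint8_t> _bytes;
    raw_view _view;
};
```

In this sketch a default-constructed or moved-from owning_doc reports data() == nullptr, which plays the role of the "invalid" document state described above.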

Second, the owning scoped_bson class does its best to avoid unnecessary allocations. This includes using bson_destroy_with_steal() to acquire ownership of BSON data from bson_t* arguments, (re)using bsoncxx::v1::document::view's special empty state whenever an empty document is detected to avoid allocating empty bson_t objects, and releasing ownership of underlying BSON data as bsoncxx::v1::document::value with rvalue overloads.
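The allocation-avoidance ideas in the second point can be sketched without libbson as follows. This is only an illustration under stated assumptions: adopted_doc, k_empty_doc, and the new[]-allocated input buffer are hypothetical, and the real class steals from a bson_t* via bson_destroy_with_steal() and stores a bsoncxx::v1::document::value rather than a raw unique_ptr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>

// The canonical empty BSON document: an int32 length of 5 (little-endian)
// followed by the 0x00 terminator.
static std::uint8_t const k_empty_doc[5] = {5, 0, 0, 0, 0};

// Hypothetical stand-in for scoped_bson's ownership handling.
class adopted_doc {
    static void noop(std::uint8_t*) noexcept {}
    static void free_array(std::uint8_t* p) noexcept { delete[] p; }

   public:
    using ptr = std::unique_ptr<std::uint8_t, void (*)(std::uint8_t*)>;

    // Adopts a buffer allocated with new[] instead of copying it.
    adopted_doc(std::uint8_t* bytes, std::size_t len) : _data(nullptr, &noop), _len(len) {
        assert(bytes != nullptr && len >= 5);
        if (len == 5) {
            // A 5-byte document is empty: drop the heap buffer and point at
            // the shared static bytes, which are never written to or freed.
            delete[] bytes;
            _data = ptr(const_cast<std::uint8_t*>(k_empty_doc), &noop);
        } else {
            _data = ptr(bytes, &free_array);
        }
    }

    std::uint8_t const* data() const { return _data.get(); }
    std::size_t size() const { return _len; }
    bool is_shared_empty() const { return _data.get() == k_empty_doc; }

    // Rvalue-qualified: hands the buffer to the caller without a copy and
    // leaves this object holding nothing.
    ptr release() && {
        _len = 0;
        return std::move(_data);
    }

   private:
    ptr _data;
    std::size_t _len;
};
```

The deleter travels with the pointer, so a released empty document still uses the no-op deleter and a released heap buffer is still freed with delete[].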

Third, scoped_bson(_view) provide .out_ptr() and .inout_ptr() to interface with mongoc API that requires bson_t* and bson_t const** arguments. This pattern is inspired by P1132R8, although it has been greatly simplified to suit our limited purposes. This ensures the scoped_bson(_view) object always remains in a valid, well-defined state regardless of the result of the mongoc API call.
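A rough standalone sketch of the out-pointer idea follows, loosely in the spirit of P1132 and not the PR's actual implementation. Every name here (raw_buf, doc, out_proxy, fake_c_api) is hypothetical. The proxy lends the C-style function a temporary to write into and commits the result to the owner in its destructor, so the owner ends up either holding the new buffer or in a well-defined empty state.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <memory>

// Hypothetical stand-in for a C struct that a C API fills in for the caller.
struct raw_buf {
    std::uint8_t* data;
    std::size_t len;
};

// Hypothetical stand-in for scoped_bson.
class doc {
    static void free_bytes(std::uint8_t* p) noexcept { std::free(p); }

    std::unique_ptr<std::uint8_t, void (*)(std::uint8_t*)> _data{nullptr, &free_bytes};
    std::size_t _len = 0;

   public:
    class out_proxy {
        doc* _owner;
        raw_buf _tmp{nullptr, 0};

       public:
        explicit out_proxy(doc* owner) : _owner(owner) {}
        out_proxy(out_proxy&& other) noexcept : _owner(other._owner) { other._owner = nullptr; }
        out_proxy(out_proxy const&) = delete;
        out_proxy& operator=(out_proxy const&) = delete;

        // Lets the proxy be passed where the C API expects a raw_buf*.
        operator raw_buf*() { return &_tmp; }

        // Runs at the end of the full expression containing the C call:
        // adopt whatever was written, or reset the owner if nothing was.
        ~out_proxy() {
            if (!_owner) return;
            _owner->_data.reset(_tmp.data);
            _owner->_len = _tmp.data ? _tmp.len : 0;
        }
    };

    out_proxy out_ptr() { return out_proxy{this}; }
    std::uint8_t const* data() const { return _data.get(); }
    std::size_t size() const { return _len; }
};

// Hypothetical C-style function: on success it mallocs an empty BSON
// document into *out; on failure it leaves *out untouched.
inline bool fake_c_api(bool succeed, raw_buf* out) {
    if (!succeed) return false;
    static unsigned char const empty[5] = {5, 0, 0, 0, 0};
    out->data = static_cast<std::uint8_t*>(std::malloc(sizeof empty));
    if (!out->data) return false;
    std::memcpy(out->data, empty, sizeof empty);
    out->len = sizeof empty;
    return true;
}
```

A call then reads as `bool ok = fake_c_api(true, d.out_ptr());`, with `d` updated once the statement finishes. The proxy must not outlive the statement that makes the C call, which is the same usage constraint std::out_ptr has.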

Last, scoped_bson implements an operator+= overload to minimally yet efficiently support building complex BSON documents in v1 implementations without depending on the v_noabi builder API. The "concatenation" operation is ultimately the only builder operation needed to build internal BSON documents: all other BSON builder behavior can be handled by the BCON_* family of macros, whose resulting bson_t* is (typically) immediately used to initialize a scoped_bson object:

scoped_bson doc{BCON_NEW("x", BCON_INT32(1))};

if (cond) {
  doc += scoped_bson{BCON_NEW("y", BCON_INT32(2))};
}

doc.view(); // {"x": 1} or {"x": 1, "y": 2}

This will be the primary method by which the v1 implementation constructs BSON documents for internal use until a new public BSON builder API is implemented with CXX-3275. A relaxed extended JSON constructor is also implemented (e.g. scoped_bson{R"({"x": 1})"}), but it is primarily for test convenience. Implementation code should avoid the performance overhead of string parsing and should be explicit about the types of the BSON elements being constructed.
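For intuition about what the operator+= concatenation does to the underlying bytes, here is a small standalone sketch based only on the BSON wire format (an int32 length prefix, the elements, then a 0x00 terminator). The concat() helper is hypothetical and is not the PR's implementation; like plain concatenation in general, it does not deduplicate keys.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using bytes = std::vector<std::uint8_t>;

// Reads the little-endian int32 length prefix of a BSON document.
inline std::uint32_t doc_length(bytes const& d) {
    return static_cast<std::uint32_t>(d[0]) | (static_cast<std::uint32_t>(d[1]) << 8) |
           (static_cast<std::uint32_t>(d[2]) << 16) | (static_cast<std::uint32_t>(d[3]) << 24);
}

// Returns a document holding the elements of `a` followed by those of `b`.
inline bytes concat(bytes const& a, bytes const& b) {
    assert(a.size() >= 5 && doc_length(a) == a.size());
    assert(b.size() >= 5 && doc_length(b) == b.size());

    bytes out(4);                                       // Placeholder for the length prefix.
    out.insert(out.end(), a.begin() + 4, a.end() - 1);  // Elements of a, minus its terminator.
    out.insert(out.end(), b.begin() + 4, b.end() - 1);  // Elements of b, minus its terminator.
    out.push_back(0);                                   // Single terminator for the result.

    auto const n = static_cast<std::uint32_t>(out.size());
    for (std::size_t i = 0; i < 4; ++i) {
        out[i] = static_cast<std::uint8_t>((n >> (8 * i)) & 0xFF);
    }
    return out;
}
```

For example, {"x": 1} and {"y": 2} with int32 values are 12 bytes each, and their concatenation is 12 + 12 - 5 = 19 bytes, since one length prefix and one terminator are shared.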

Note

Update: this problem is documented by CDRIVER-6113.

There was an attempt to default the scoped_bson_view copy SMFs and the scoped_bson move SMFs. This should be possible in theory given bson_t is only ever a view to the accompanying value or view object's underlying BSON data. However, this led to very difficult-to-diagnose undefined behavior. The resulting spurious segmentation faults include stack traces pointing to std::function<void (unsigned char*)>::function(std::function<void (unsigned char*)>&&) with this=0x0 or the __memcmp_avx2_movbe intrinsic operation (???). These segfaults were unobservable when sanitizers were enabled (?!). Neither -D_FORTIFY_SOURCE=3 nor _GLIBCXX_ASSERTIONS was able to identify any problems. My takeaway is that bson_t is not trivially copyable even when only ever zero-initialized or initialized with bson_init_static().

@eramongodb eramongodb self-assigned this Sep 26, 2025
@eramongodb eramongodb requested a review from a team as a code owner September 26, 2025 21:19
Collaborator

@kevinAlbs kevinAlbs left a comment

I very much like the new design of scoped_bson and scoped_bson_view. One question about possibly throwing.

@eramongodb eramongodb requested a review from kevinAlbs October 3, 2025 15:20
@eramongodb eramongodb requested a review from connorsmacd October 3, 2025 16:27
Contributor

@connorsmacd connorsmacd left a comment

LGTM!

@eramongodb eramongodb merged commit 3500cbb into mongodb:master Oct 3, 2025
15 of 16 checks passed
@eramongodb eramongodb deleted the cxx-scoped_bson branch October 3, 2025 16:58
3 participants