Need a means of sharing configuration data between objects #6644

mrambacher · 2020-04-03T20:11:58Z

The Options interfaces allow objects to be serialized to strings and stored in initialization files. Each object is serialized and stored individually.

However, some objects may be shared between these configured objects. For example, there may be one block cache shared across multiple table factories in the system. There is currently no mechanism for representing these shared configurations in the options file.

I would propose the following:
-> The ObjectRegistry be expanded to hold "configurations". Configurations would have an "ID", "Type" (Environment, Cache, etc), an "options string" representing the serialized form of this object, and a weak_ptr.
-> When the Customizable class goes to create a new object (CreateFromString), it would first look for it in the ObjectRegistry. If the ID is found there, the one in the registry is used. If not, a new object is created as today. Note that CreateFromString for each type that is registered would need to follow this pattern.
-> When a Customizable object is converted to a string, it sees if it is in the ObjectRegistry. If so, only the ID is written (and not the entire configuration).
-> Unique IDs can be generated for objects added to the Registry by combining the Name() of the object with its address (e.g. LRUCache@0xdeadbeef).
-> The configurations will be serialized and written to the options file. This will allow them to be re-established on a restart.
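The proposal above can be sketched roughly as follows. This is a minimal illustration and not RocksDB code: `SharedObject`, `Configuration`, `ConfigRegistry`, and `MakeId` are hypothetical names invented for the sketch, and the real Customizable and ObjectRegistry classes have different interfaces.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for a Customizable object such as a Cache.
struct SharedObject {
  explicit SharedObject(std::string name) : name_(std::move(name)) {}
  const std::string& Name() const { return name_; }
  std::string name_;
};

// One registry entry as proposed: ID, type, serialized options, weak_ptr.
struct Configuration {
  std::string id;
  std::string type;
  std::string options;
  std::weak_ptr<SharedObject> object;
};

// Combines Name() with the object's address, e.g. "LRUCache@<address>".
// The textual form of the address is implementation-defined.
std::string MakeId(const SharedObject& obj) {
  std::ostringstream out;
  out << obj.Name() << "@" << static_cast<const void*>(&obj);
  return out.str();
}

class ConfigRegistry {
 public:
  // Registers a live object and returns the generated ID.
  std::string Register(const std::string& type, const std::string& options,
                       const std::shared_ptr<SharedObject>& obj) {
    assert(obj != nullptr);
    std::string id = MakeId(*obj);
    configs_[id] = Configuration{id, type, options, obj};
    return id;
  }

  // Looks in the registry first; creates a new object only if the ID is
  // unknown or the registered object has already been destroyed.
  std::shared_ptr<SharedObject> CreateFromString(const std::string& id) {
    auto it = configs_.find(id);
    if (it != configs_.end()) {
      if (auto existing = it->second.object.lock()) return existing;
    }
    return std::make_shared<SharedObject>(id);
  }

  // Writes only the ID for a registered object, else the full options.
  std::string ToString(const SharedObject& obj,
                       const std::string& full_options) const {
    auto it = configs_.find(MakeId(obj));
    if (it != configs_.end() && !it->second.object.expired()) {
      return it->first;
    }
    return full_options;
  }

 private:
  std::map<std::string, Configuration> configs_;
};
```

In this sketch registration is an explicit call, which is one possible answer to the question of when objects get registered; an automatic scheme would work equally well with the same lookup logic.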

Classes that would benefit from this sort of registry include Cache, MemoryAllocator, Logger, and Env (there may be others but this is the obvious first pass list).

One open question is when objects would be registered and unregistered. Is it done automatically, or is there another step (an API call) that a developer would be required to take? How would a configuration in the registry be updated?

pdillinger · 2020-11-17T21:50:43Z

I'm concerned that the factory interface

// Returns a new T when called with a string. Populates the std::unique_ptr
// argument if granting ownership to caller.
template <typename T>
using FactoryFunc =
    std::function<T*(const std::string&, std::unique_ptr<T>*, std::string*)>;

requires any shared objects to live forever. Or more precisely: exactly as long as the factory. If the factory keeps ownership, it doesn't know that it's ever safe to deallocate (before its own deallocation); alternately, the factory cannot share the object if it passes on ownership.

This suggests a shared_ptr+weak_ptr solution. Though it might seem heavy-handed, memory leaks can be nightmares for RocksDB users.
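A minimal sketch of the shared_ptr+weak_ptr idea, assuming a hypothetical `SharingFactory` and `Resource` type (neither is part of RocksDB). The factory hands out shared ownership but keeps only a weak reference, so the object is freed as soon as the last user releases it, and the factory can still share it while anyone holds it.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical shared object, standing in for something like a Cache.
struct Resource {
  std::string name;
};

class SharingFactory {
 public:
  // Returns the live object for `name`, creating it if none is alive.
  std::shared_ptr<Resource> GetOrCreate(const std::string& name) {
    std::weak_ptr<Resource>& slot = cache_[name];
    std::shared_ptr<Resource> live = slot.lock();
    if (!live) {
      live = std::make_shared<Resource>(Resource{name});
      slot = live;  // The factory keeps no ownership, only a weak reference.
    }
    return live;
  }

  // True while at least one caller still owns the object.
  bool IsAlive(const std::string& name) const {
    auto it = cache_.find(name);
    return it != cache_.end() && !it->second.expired();
  }

 private:
  std::map<std::string, std::weak_ptr<Resource>> cache_;
};
```

The cost is one control block per shared object plus a `lock()` on each lookup, which is small next to the leak risk described above.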

A possible alternative would be requiring a compile-time name to be used to (uniquely) identify a shared object. This effectively limits their number.

mrambacher · 2021-08-25T17:51:31Z

This issue is being addressed in #8658 by the addition of ManagedObjects. Here is the basic premise:

-- The ObjectRegistry maintains a list of "ManagedObjects" by "Name" and "Type" in a map<type+name, weak_ptr>.
-- GetManagedObject returns the shared_ptr to the object [type+name] in the map if one exists.
-- SetManagedObject associates the [type+name] with the object in the map if the relationship does not already exist.
-- NewManagedObject returns the object in the map (if found) or creates the object and stores it in the map (if it does not already exist).

ManagedObjects are stored as weak/shared pointers and must be an extension of Customizable. By using weak pointers, if an object associated with a name is deleted, the map entry still exists but the object does not. This distinction means that if an object is destroyed, it will no longer be Managed and the name can be re-used to point to a new object.
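The three operations and the name re-use behavior can be sketched like this. The sketch assumes simplified signatures: `Managed` and `ManagedRegistry` are hypothetical stand-ins, and the actual methods added in #8658 may take different arguments and return different types.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for Customizable; not the RocksDB class.
class Managed {
 public:
  Managed(std::string type, std::string id)
      : type_(std::move(type)), id_(std::move(id)) {}
  virtual ~Managed() = default;
  const std::string& Type() const { return type_; }
  const std::string& Id() const { return id_; }

 private:
  std::string type_;
  std::string id_;
};

class ManagedRegistry {
 public:
  // Returns the live object for [type+name], or nullptr if none exists.
  std::shared_ptr<Managed> GetManagedObject(const std::string& type,
                                            const std::string& id) const {
    auto it = objects_.find(std::make_pair(type, id));
    if (it == objects_.end()) return nullptr;
    return it->second.lock();
  }

  // Associates [type+name] with `obj`. Fails if a different live object
  // already holds that name; an expired entry can be re-used.
  bool SetManagedObject(const std::shared_ptr<Managed>& obj) {
    std::weak_ptr<Managed>& slot =
        objects_[std::make_pair(obj->Type(), obj->Id())];
    std::shared_ptr<Managed> live = slot.lock();
    if (live && live != obj) return false;
    slot = obj;
    return true;
  }

  // Returns the existing object if found, else creates and stores one.
  std::shared_ptr<Managed> NewManagedObject(const std::string& type,
                                            const std::string& id) {
    if (auto live = GetManagedObject(type, id)) return live;
    auto created = std::make_shared<Managed>(type, id);
    objects_[std::make_pair(type, id)] = created;
    return created;
  }

 private:
  std::map<std::pair<std::string, std::string>, std::weak_ptr<Managed>>
      objects_;
};
```

Once every shared_ptr to an object is released, `GetManagedObject` returns nullptr for its name even though the map entry remains, and a later `SetManagedObject` can bind that name to a new object.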

Here is an example of how ManagedObjects can be used to store and retrieve a Cache from the Options file.

By convention, the ID of a ManagedObject should be something like name@address#pid. For a Cache, this name will be something like "LRUCache@0xdeadbeef#pid", where "deadbeef" is the address of the cache in memory and "pid" is the process ID of the process.
When Cache::CreateFromString is called to create a Cache, it will be passed an ID of the Cache to create. For the purposes of this discussion, the ID will be either "LRUCache" or "LRUCache@0xdeadbeef#pid".
-- If Cache::CreateFromString is passed the simple name ("LRUCache"), a new cache will be created and added as a ManagedObject to the ObjectRegistry. Because this is a new Cache, it will be guaranteed to be unique (the addr/pid will not refer to any existing object) and will be added (SetManagedObject) to the registry.
-- If Cache::CreateFromString is passed the full name ("LRUCache@0xdeadbeef#pid"), this name will first be looked up in the ObjectRegistry. If it exists, the existing object will be returned. If not, a new Cache will be created, returned, and registered under both the old and new names (two ManagedObject names referring to the same Cache object). Note that the Cache itself only knows its current name/ID; the ObjectRegistry is the only association between the old name and the new object.
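The two cases above might look like the following. This is a sketch under stated assumptions, not the real Cache::CreateFromString: `Cache`, `CacheRegistry`, and `MakeId` are hypothetical, the process ID is passed in as a string rather than queried from the OS, and any ID containing '@' is treated as a full name.

```cpp
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for a cache; it only knows its current ID.
struct Cache {
  std::string id;
};

class CacheRegistry {
 public:
  // `pid` is supplied by the caller (assumption made for portability).
  explicit CacheRegistry(std::string pid) : pid_(std::move(pid)) {}

  std::shared_ptr<Cache> CreateFromString(const std::string& id) {
    const bool is_full_name = id.find('@') != std::string::npos;
    if (is_full_name) {
      // Full name: reuse the managed object if it is still alive.
      auto it = managed_.find(id);
      if (it != managed_.end()) {
        if (auto live = it->second.lock()) return live;
      }
    }
    // Simple name, or a full name with no live object: create a new cache
    // and give it an ID built from its own address and this process's pid.
    auto cache = std::make_shared<Cache>();
    const std::string base = id.substr(0, id.find('@'));
    cache->id = MakeId(base, cache.get());
    managed_[cache->id] = cache;
    // Also register under the old name so later lookups share this cache.
    if (is_full_name) managed_[id] = cache;
    return cache;
  }

 private:
  std::string MakeId(const std::string& base, const void* addr) const {
    std::ostringstream out;
    out << base << "@" << addr << "#" << pid_;
    return out.str();
  }

  std::string pid_;
  std::map<std::string, std::weak_ptr<Cache>> managed_;
};
```

Calling `CreateFromString` twice with the same full name from a saved options file yields the same `Cache` instance, while each call with the simple name "LRUCache" yields a distinct one.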

When reading in Options from a string/properties file, if there are two Caches with the same "long" ID, these two objects will share the same cache object.

Note that this model does not require that ManagedObjects in the registry be serialized/saved independently. It is the responsibility of each "Type" (Cache, etc) to determine that it is a "ManagedObject" and interact with the ObjectRegistry as appropriate. In fact, the ManagedObject list is not required during serialization. Furthermore, the ManagedObject does not itself know of its registration in an ObjectRegistry.

Note that if two independent ObjectRegistry objects are used, then the same name in both registries may refer to different objects. For example, if a TableFactory is serialized with ObjectRegistryA and re-created with the same ObjectRegistryA, then the caches in the two table factories will be the same instance. However, if the serialization happens with ObjectRegistryA and the re-creation with ObjectRegistryB, then the caches in the two table factories may be different (to guarantee they are the same, the Cache should be registered with the "Default" ObjectRegistry).

mrambacher linked a pull request Aug 13, 2021 that will close this issue

Add support to the ObjectRegistry for ManagedObjects #8658

Closed

mrambacher self-assigned this Aug 25, 2021
