[WIP] allocation-less serialization. #20

tewinget · 2020-09-18T06:46:34Z

We need fast (and thus alloc-free) (de-)serialization. This is a step in that direction.

majestrate · 2020-09-18T10:21:11Z

lokimq/bt_serialize.h

+    }
+
+    template <typename T>
+    fixed_buffer_producer& append_integer(T number, std::string_view key)


the parameters are backwards

I did this because there is a version which doesn't take a key, but I'm happy to swap the order.

majestrate · 2020-09-18T11:45:52Z

lokimq/bt_serialize.h

+
+    void check_space(size_t len)
+    {
+        if (end <= begin or (size_t)(end - begin) < len)


use std::distance

noted, will update.
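
For illustration, the std::distance suggestion could look roughly like the sketch below. The buffer_span struct and the use of std::length_error are assumptions made for this example; the diff does not show what check_space does when space runs out.

```cpp
#include <cstddef>
#include <iterator>
#include <stdexcept>

// Hypothetical stand-in for the producer's state: `begin` is the write
// cursor and `end` is one past the last writable byte.
struct buffer_span {
    char* begin;
    char* end;
};

// Throws unless at least `len` bytes remain between the cursor and the end.
// (The exception type is an assumption, not taken from the PR.)
inline void check_space(const buffer_span& buf, std::size_t len) {
    // std::distance yields a signed difference; zero or negative means the
    // buffer is already full, mirroring the `end <= begin` test in the diff.
    auto remaining = std::distance(buf.begin, buf.end);
    if (remaining <= 0 || static_cast<std::size_t>(remaining) < len)
        throw std::length_error{"not enough space in buffer"};
}
```

The single signed comparison replaces both the pointer ordering test and the C-style cast in the original condition.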

majestrate · 2020-09-18T11:46:20Z

lokimq/bt_serialize.h

+    }
+
+    // MUST check for space before calling this.
+    size_t append_size(size_t size)


use std::to_chars

noted, will update.
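
A minimal sketch of the std::to_chars suggestion, assuming the helper only needs to write the decimal digits. The name write_decimal, its signature, and the exception type are hypothetical and not taken from the PR.

```cpp
#include <charconv>
#include <cstddef>
#include <stdexcept>
#include <system_error>

// Writes `value` as decimal digits into [begin, end) without allocating and
// returns the number of characters written. Hypothetical helper.
inline std::size_t write_decimal(char* begin, char* end, std::size_t value) {
    // std::to_chars does its own bounds check and reports
    // std::errc::value_too_large if the digits do not fit.
    auto [ptr, ec] = std::to_chars(begin, end, value);
    if (ec != std::errc{})
        throw std::length_error{"not enough space for integer"};
    return static_cast<std::size_t>(ptr - begin);
}
```

Because std::to_chars checks the range itself, a helper like this would not depend on the caller having checked for space first.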

jagerman · 2020-09-18T14:09:57Z

lokimq/bt_serialize.h

+/// Currently this is not idiot-proof.  That is to say, the compiler won't catch everything and
+/// you technically can create invalid BEncoded strings.  Basically you can add list elements
+/// to a dict and dict elements to a list, and obviously that can end badly.  This may be fixed
+/// at a later time.


These /// comments are documentation; they really should describe the type in general before they describe idiots.

Yeah, fair cop. WIP indeed...

jagerman · 2020-09-18T14:14:03Z

lokimq/bt_serialize.h

+    // will be used for that as well.  As this is designed for strings, will optimize for small
+    // integers and won't work for values > 99999.
+    size_t num_digits(size_t value)
+    {


Put the { on the same line, here and elsewhere, to match the existing style. (I chose that style because I've never been a fan of wasting a vertical line by putting the brace on the next line.)

habit; changing.

jagerman · 2020-09-18T14:18:12Z

lokimq/bt_serialize.h

+/// you technically can create invalid BEncoded strings.  Basically you can add list elements
+/// to a dict and dict elements to a list, and obviously that can end badly.  This may be fixed
+/// at a later time.
+struct fixed_buffer_producer


Should be a class, not a struct. (Technically equivalent, but in intention a class signals that its primary purpose is to do something, while a struct signals that its primary purpose is to hold something.)

jagerman · 2020-09-18T14:24:54Z

lokimq/bt_serialize.h

+        if (value > 10) digits++;
+        if (value > 100) digits++;
+        if (value > 1000) digits++;
+        if (value > 10000) digits++;


These should all be >= instead of > (10 has 2 digits, 100 has 3, etc.). It would also probably be faster to use else ifs. I'm also not a fan of the "won't work for values > 99999" bit. This should be nice and fast for the common case, but won't break for a big value:

    size_t digits;
    if (value < 10) digits = 1;
    else if (value < 100) digits = 2;
    else if (value < 1000) digits = 3;
    else if (value < 10000) digits = 4;
    else {
        // At least 5 digits: strip the low four and count what remains.
        digits = 5;
        value /= 10000;
        while (value >= 10) {
            digits++;
            value /= 10;
        }
    }

why not just

while(value >= 10) { digits++; value /= 10; }

makes sense.
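
Written out in full, the loop-only version agreed on above might look like the following sketch. The constexpr qualifier and the static_assert checks are additions for illustration.

```cpp
#include <cstddef>

// Counts the decimal digits of `value`; zero counts as one digit.
constexpr std::size_t num_digits(std::size_t value) {
    std::size_t digits = 1;
    while (value >= 10) {
        ++digits;
        value /= 10;
    }
    return digits;
}

// Boundary cases that the original `>` comparisons got wrong.
static_assert(num_digits(0) == 1);
static_assert(num_digits(9) == 1);
static_assert(num_digits(10) == 2);
static_assert(num_digits(99999) == 5);
static_assert(num_digits(100000) == 6);
```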

jagerman · 2020-09-18T14:28:55Z

lokimq/bt_serialize.h

+            *op++ = tmp;
+        }
+
+        *(begin++) = 'e';


Aside from the 'i' and the 'e' this looks identical to append_size()

jagerman · 2020-09-18T14:29:38Z

lokimq/bt_serialize.h

+    }
+
+    // MUST check for space before calling this.
+    size_t append_size(size_t size)


It should also be private as there is no externally valid use case for appending a random integer string into a bt value.

jagerman · 2020-09-18T14:37:42Z

lokimq/bt_serialize.h

@@ -909,5 +922,192 @@ class bt_dict_consumer : private bt_list_consumer {
    bt_dict_consumer consume_dict_consumer() { return consume_dict_data(); }
 };

+/// Currently this is not idiot-proof.  That is to say, the compiler won't catch everything and
+/// you technically can create invalid BEncoded strings.  Basically you can add list elements
+/// to a dict and dict elements to a list, and obviously that can end badly.  This may be fixed


I find this interface a bit clunky because of this: the caller has to worry about not calling the wrong method to screw things up, which means the caller has to know the technical details of bt_dict. It would be cleaner to split them up into bt_buffer_list_producer/bt_buffer_dict_producer to mirror bt_dict_consumer/bt_list_consumer (perhaps with a common private base class for the implementation functions), where things like append_integer(T) are only in the list version and append_integer(string_view, T) are only in the dict version.

Otherwise you basically have two distinct functionalities in one single class: there is one set of methods that you may only call if using it as a list, and another set that you may only call if using it as a dict, and this feels wrong.
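
A rough sketch of the proposed split, under stated assumptions: the class names producer_base, list_producer and dict_producer, the finish() method, and the exception type are all hypothetical, and only string values are handled. Key ordering and nesting are deliberately left out.

```cpp
#include <algorithm>
#include <charconv>
#include <cstddef>
#include <stdexcept>
#include <string_view>
#include <system_error>

// Shared implementation, inherited privately so callers only see the
// list-specific or dict-specific interface. All names are illustrative.
class producer_base {
    char* const start_;
    char* pos_;
    char* const end_;

protected:
    producer_base(char* begin, char* end, char open)
        : start_{begin}, pos_{begin}, end_{end} {
        put(open);
    }

    void put(char c) {
        if (pos_ == end_) throw std::length_error{"buffer full"};
        *pos_++ = c;
    }

    // Writes a bencoded string: "<length>:<bytes>".
    void put_string(std::string_view s) {
        char len[20];  // enough for any 64-bit size in decimal
        auto [len_end, ec] = std::to_chars(len, len + sizeof(len), s.size());
        if (ec != std::errc{}) throw std::length_error{"length too long"};
        std::size_t needed = static_cast<std::size_t>(len_end - len) + 1 + s.size();
        if (static_cast<std::size_t>(end_ - pos_) < needed)
            throw std::length_error{"buffer full"};
        pos_ = std::copy(len, len_end, pos_);
        *pos_++ = ':';
        pos_ = std::copy(s.begin(), s.end(), pos_);
    }

public:
    // Writes the closing 'e' and returns a view of the encoded bytes.
    std::string_view finish() {
        put('e');
        return {start_, static_cast<std::size_t>(pos_ - start_)};
    }
};

// List interface: values only.
class list_producer : private producer_base {
public:
    list_producer(char* begin, char* end) : producer_base{begin, end, 'l'} {}
    using producer_base::finish;
    list_producer& append_string(std::string_view value) {
        put_string(value);
        return *this;
    }
};

// Dict interface: every value needs a key, key first.
class dict_producer : private producer_base {
public:
    dict_producer(char* begin, char* end) : producer_base{begin, end, 'd'} {}
    using producer_base::finish;
    dict_producer& append_string(std::string_view key, std::string_view value) {
        put_string(key);
        put_string(value);
        return *this;
    }
};
```

With this shape, calling the keyless append_string on a dict_producer is a compile error rather than a runtime surprise.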

jagerman · 2020-09-18T14:39:59Z

lokimq/bt_serialize.h

+    char* begin;
+    char* end;
+
+    char* original_begin;


The dict version also needs to have a std::string_view last_key pointing at the already-written previous key value so that it can throw if you try to add out of order.
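
The ordering check described here could be sketched as below. The class name dict_key_order, the accept() method, and the use of std::invalid_argument are assumptions for illustration. Bencoded dict keys must appear in sorted order, and std::string_view comparison orders char data by unsigned byte value.

```cpp
#include <stdexcept>
#include <string_view>

// Remembers the previously written key and rejects any key that does not
// sort strictly after it. Hypothetical helper, not code from the PR.
class dict_key_order {
    std::string_view last_key_;
    bool has_key_ = false;  // distinguishes "no key yet" from an empty key

public:
    // `key` should view bytes that outlive this object, e.g. the copy of the
    // key already written into the output buffer.
    void accept(std::string_view key) {
        if (has_key_ && key <= last_key_)
            throw std::invalid_argument{"dict keys must be added in sorted order"};
        last_key_ = key;
        has_key_ = true;
    }
};
```

Using <= rather than < also rejects duplicate keys.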

jagerman · 2020-09-18T15:04:13Z

lokimq/bt_serialize.h

+        *(begin++) = 'd';
+        end--;
+        return *this;
+    }


I'm not a fan of the way this works because the caller will have a fixed_buffer_producer, but will also have a bunch of state associated with it that they have to keep track of -- is it making a dict or a list? Have I written a key yet? Do I have a sublist/subdict active?

One idea to make this better is to have RAII subclasses for nested types:

    // Existing classes reworked a bit:
    class fixed_buffer_base {
    protected:
        ...
        int nesting_ = 0;
        void end_nesting(char* new_begin) { begin = new_begin; }
        virtual void begin_nesting() { nesting_++; }
        friend class nested_dict;
        friend class nested_list;
    };

    class fixed_dict_buffer : public fixed_buffer_base {
        ...
        nested_dict begin_dict(std::string_view key) {
            // check nesting_ == 0
            // check minimum size available for key + empty dict
            // check key order
            // write key
            // update last_key
            // (all of the above should just be some method since they'll be identical checks for every kv pair)
            return {*this};
        }
    };

    class fixed_list_buffer : public fixed_buffer_base {
        ...
    };

    // New nested classes:
    class nested_dict : fixed_dict_buffer {
        fixed_buffer_base& b;
        nested_dict(fixed_buffer_base& b) : b{b} {
            // Copy buffer pointers from b
            b.begin_nesting();
            b.begin_dict();
        }
        ~nested_dict() {
            b.end_dict();
            b.end_nesting(begin);
        }
        void begin_nesting() override { nesting_++; b.begin_nesting(); }
    };

    class nested_list : fixed_list_buffer {
        fixed_buffer_base& b;
        nested_list(fixed_buffer_base& b) : b{b} {
            b.nesting_++;
            b.begin_list();
        }
        ~nested_list() {
            b.end_list();
            b.end_nesting(begin);
        }
        void begin_nesting() override { nesting_++; b.begin_nesting(); }
    };

Now the caller does something like:

    fixed_dict_buffer foo{start, end};
    foo.append_string("foo", "bar");
    {
        auto x = foo.begin_dict("key1");
        // foo.append_string("xyz", "abc"); -- would throw because there is an open nested structure
        x.append_string("key2", "xyz");
        {
            auto y = x.begin_list("key3");
            y.append_integer(42);
            {
                auto z = y.begin_dict();
                // .. etc.
            }
        }
    }
    foo.append_string("xyz", "abc");

A potential bug in that example code: the nested_dict and nested_list destructors may throw.

True; it should probably fiddle with the end pointer (i.e. shorten it by one) so that there is always space to write the 'e' during destruction.
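
One way to picture the "shorten end by one" idea is the sketch below. The list_scope class, its raw put() method, and the exception type are hypothetical; put() stands in for the real append methods. The constructor reserves the final byte, so the destructor can always write the closing 'e' without checking for space.

```cpp
#include <stdexcept>

// RAII guard for a nested list that operates directly on the parent's
// cursor and end pointers. Illustrative only.
class list_scope {
    char*& pos_;  // parent's write cursor
    char*& end_;  // parent's writable end

public:
    list_scope(char*& pos, char*& end) : pos_{pos}, end_{end} {
        // Need room for 'l' now and 'e' later; failing here is safe because
        // a throwing constructor means the destructor never runs.
        if (end_ - pos_ < 2) throw std::length_error{"no room for empty list"};
        *pos_++ = 'l';
        --end_;  // reserve one byte for the closing 'e'
    }

    ~list_scope() {
        ++end_;         // hand the reserved byte back
        *pos_++ = 'e';  // always fits, so the destructor cannot throw
    }

    list_scope(const list_scope&) = delete;
    list_scope& operator=(const list_scope&) = delete;

    // Raw byte write, bounded by the shortened end.
    void put(char c) {
        if (pos_ == end_) throw std::length_error{"buffer full"};
        *pos_++ = c;
    }
};
```

All space checks then happen in the constructor and in put(), where throwing is safe.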

jagerman · 2021-11-28T14:35:43Z

This PR is superseded by the bt_dict_producer/bt_list_producer added to https://github.com/oxen-io/oxen-encoding (which is meant to replace the encoding code here).

initial commit for fixed-size buffer serialization

572a0a2

tewinget force-pushed the fixed_buffer_serialization branch from a0da073 to 572a0a2 Compare September 18, 2020 08:54

majestrate reviewed Sep 18, 2020

jagerman changed the base branch from master to dev September 18, 2020 14:07

jagerman reviewed Sep 18, 2020

jagerman force-pushed the dev branch from b2b2e09 to a53e1f1 Compare October 25, 2021 16:09

jagerman closed this Nov 28, 2021
