New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] allocation-less serialization. #20
Conversation
a0da073
to
572a0a2
Compare
} | ||
|
||
template <typename T> | ||
fixed_buffer_producer& append_integer(T number, std::string_view key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the parameters are backwards
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this because there is a version which doesn't take a key, but I'm happy to swap the order.
|
||
void check_space(size_t len) | ||
{ | ||
if (end <= begin or (size_t)(end - begin) < len) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use std::distance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noted, will update.
} | ||
|
||
// MUST check for space before calling this. | ||
size_t append_size(size_t size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use std::to_chars
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noted, will update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should also be private as there is no externally valid use case for appending a random integer string into a bt value.
/// Currently this is not idiot-proof. That is to say, the compiler won't catch everything and | ||
/// you technically can create invalid BEncoded strings. Basically you can add list elements | ||
/// to a dict and dict elements to a list, and obviously that can end badly. This may be fixed | ||
/// at a later time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These /// comments are documentation; they really should describe the type in general before it describes idiots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, fair cop. WIP indeed...
// will be used for that as well. As this is designed for strings, will optimize for small | ||
// integers and won't work for values > 99999. | ||
size_t num_digits(size_t value) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{
on the same line here and elsewhere to match the existing style. (Which I used because I've never been a fan of the wasted vertical line by putting it on the next line).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
habit; changing.
/// you technically can create invalid BEncoded strings. Basically you can add list elements | ||
/// to a dict and dict elements to a list, and obviously that can end badly. This may be fixed | ||
/// at a later time. | ||
struct fixed_buffer_producer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be a class
, not a struct
. (Technically equivalent, but in intention class
signals that it's primary purpose is to do something, while a struct signals the primary purpose is to hold something).
if (value > 10) digits++; | ||
if (value > 100) digits++; | ||
if (value > 1000) digits++; | ||
if (value > 10000) digits++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should all be >= instead of > (10 has 2 digits, 100 has 3, etc.). But also it would probably be faster to use else if
s. I'm also not a fan of the "won't work for values > 99999" bit. This should be nice and fast for the common case, but won't break for a big value:
size_t digits;
if (value < 10) digits = 1;
else if (value < 100) digits = 2;
else if (value < 1000) digits = 3;
else if (value < 10000) digits = 4;
else {
digits = 5;
value /= 100000;
while (value >= 10) { digits++; value /= 10; }
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just
while(value >= 10) { digits++; value /= 10; }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense.
*op++ = tmp; | ||
} | ||
|
||
*(begin++) = 'e'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from the 'i' and the 'e' this looks identical to append_size()
} | ||
|
||
// MUST check for space before calling this. | ||
size_t append_size(size_t size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should also be private as there is no externally valid use case for appending a random integer string into a bt value.
@@ -909,5 +922,192 @@ class bt_dict_consumer : private bt_list_consumer { | |||
bt_dict_consumer consume_dict_consumer() { return consume_dict_data(); } | |||
}; | |||
|
|||
/// Currently this is not idiot-proof. That is to say, the compiler won't catch everything and | |||
/// you technically can create invalid BEncoded strings. Basically you can add list elements | |||
/// to a dict and dict elements to a list, and obviously that can end badly. This may be fixed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this interface a bit clunky because of this: the caller has to worry about not calling the wrong method to screw things up, which means the caller has to know the technical details of bt_dict. It would be cleaner to split them up into bt_buffer_list_producer/bt_buffer_dict_producer to mirror bt_dict_consumer/bt_list_consumer (perhaps with a common private base class for the implementation functions), where things like append_integer(T)
are only in the list version and append_integer(string_view, T)
are only in the dict version.
Otherwise you basically have two distinct functionalities in one single class: there are one set of methods that you may only call if using it as a list, and another set that you may only call if using it as a dict, and this feels wrong.
char* begin; | ||
char* end; | ||
|
||
char* original_begin; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dict version also needs to have a std::string_view last_key
pointing at the already-written previous key value so that it can throw if you try to add out of order.
*(begin++) = 'd'; | ||
end--; | ||
return *this; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of the way this works because the caller will have a fixed_buffer_producer
, but will also have a bunch of state associated with it that they have to keep track of -- is it making a dict or a list? Have I written a key yet? Do I have a sublist/subdict active?
One idea to make this better is to have RAII subclasses for nested types:
// Existing classes reworked a bit:
class fixed_buffer_base {
protected:
...
int nesting_ = 0;
void end_nesting(char* new_begin) { begin = new_begin; }
virtual void begin_nesting() { nesting_++; }
friend class nested_dict;
friend class nested_list;
};
class fixed_dict_buffer : public fixed_buffer_base {
...
nested_dict begin_dict(std::string_view key) {
// check nesting_ == 0
// check minimum size available for key + empty dict
// check key order
// write key
// update last_key
// (all of the above should just be some method since they'll be identical checks for every kv pair)
return {*this};
}
};
class fixed_list_buffer : public fixed_buffer_base { ... };
// New nested classes:
class nested_dict : fixed_dict_buffer {
fixed_buffer_base& b;
nested_dict(fixed_buffer_base& b) : b{b} {
// Copy buffer pointers from b
b.begin_nesting();
b.begin_dict();
}
~nested_dict() {
b.end_dict();
b.end_nesting(begin);
}
void begin_nesting() override { nesting_++; b.begin_nesting(); }
};
class nested_list : fixed_list_buffer {
fixed_buffer_base& b;
nested_dict(fixed_buffer_base& b) : b{b} {
b.nesting_++;
b.begin_list();
}
~nested_list() {
b.end_list();
b.end_nesting(begin);
}
void begin_nesting() override { nesting_++; b.begin_nesting(); }
};
Now the caller does something like:
fixed_dict_buffer foo{start, end};
foo.append_string("foo", "bar");
{
auto x = foo.begin_dict("key1");
// foo.append_string("xyz", "abc"); -- would throw because there is an open nested structure
x.append_string("key2", "xyz");
{
auto y = x.begin_list("key3");
y.append_integer(42);
{
auto z = y.begin_dict();
// .. etc.
}
}
}
foo.append_string("xyz", "abc");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a potential bug in that example code: the nested_dict and nested_list destructors may throw
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True; it should probably fiddle with the end pointer (i.e. shorten it by one) so that there is always space to write the 'e' during destruction.
This PR is superseded by the |
We need fast (and thus alloc-free) (de-)serialization. This is a step in that direction.