Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] allocation-less serialization. #20

Closed
wants to merge 1 commit into from

Conversation

tewinget
Copy link
Collaborator

We need fast (and thus alloc-free) (de-)serialization. This is a step in that direction.

}

template <typename T>
fixed_buffer_producer& append_integer(T number, std::string_view key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the parameters are backwards

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this because there is a version which doesn't take a key, but I'm happy to swap the order.


void check_space(size_t len)
{
if (end <= begin or (size_t)(end - begin) < len)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use std::distance

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noted, will update.

}

// MUST check for space before calling this.
size_t append_size(size_t size)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use std::to_chars

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noted, will update.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should also be private as there is no externally valid use case for appending a random integer string into a bt value.

@jagerman jagerman changed the base branch from master to dev September 18, 2020 14:07
/// Currently this is not idiot-proof. That is to say, the compiler won't catch everything and
/// you technically can create invalid BEncoded strings. Basically you can add list elements
/// to a dict and dict elements to a list, and obviously that can end badly. This may be fixed
/// at a later time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These /// comments are documentation; they really should describe the type in general before it describes idiots.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, fair cop. WIP indeed...

// will be used for that as well. As this is designed for strings, will optimize for small
// integers and won't work for values > 99999.
size_t num_digits(size_t value)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ on the same line here and elsewhere to match the existing style. (Which I used because I've never been a fan of the wasted vertical line by putting it on the next line).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

habit; changing.

/// you technically can create invalid BEncoded strings. Basically you can add list elements
/// to a dict and dict elements to a list, and obviously that can end badly. This may be fixed
/// at a later time.
struct fixed_buffer_producer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be a class, not a struct. (Technically equivalent, but in intention class signals that it's primary purpose is to do something, while a struct signals the primary purpose is to hold something).

if (value > 10) digits++;
if (value > 100) digits++;
if (value > 1000) digits++;
if (value > 10000) digits++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should all be >= instead of > (10 has 2 digits, 100 has 3, etc.). But also it would probably be faster to use else ifs. I'm also not a fan of the "won't work for values > 99999" bit. This should be nice and fast for the common case, but won't break for a big value:

size_t digits;
if (value < 10) digits = 1;
else if (value < 100) digits = 2;
else if (value < 1000) digits = 3;
else if (value < 10000) digits = 4;
else {
    digits = 5;
    value /= 100000;
    while (value >= 10) { digits++; value /= 10; }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just

while(value >= 10) { digits++; value /= 10; }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense.

*op++ = tmp;
}

*(begin++) = 'e';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the 'i' and the 'e' this looks identical to append_size()

}

// MUST check for space before calling this.
size_t append_size(size_t size)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should also be private as there is no externally valid use case for appending a random integer string into a bt value.

@@ -909,5 +922,192 @@ class bt_dict_consumer : private bt_list_consumer {
bt_dict_consumer consume_dict_consumer() { return consume_dict_data(); }
};

/// Currently this is not idiot-proof. That is to say, the compiler won't catch everything and
/// you technically can create invalid BEncoded strings. Basically you can add list elements
/// to a dict and dict elements to a list, and obviously that can end badly. This may be fixed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this interface a bit clunky because of this: the caller has to worry about not calling the wrong method to screw things up, which means the caller has to know the technical details of bt_dict. It would be cleaner to split them up into bt_buffer_list_producer/bt_buffer_dict_producer to mirror bt_dict_consumer/bt_list_consumer (perhaps with a common private base class for the implementation functions), where things like append_integer(T) are only in the list version and append_integer(string_view, T) are only in the dict version.

Otherwise you basically have two distinct functionalities in one single class: there are one set of methods that you may only call if using it as a list, and another set that you may only call if using it as a dict, and this feels wrong.

char* begin;
char* end;

char* original_begin;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dict version also needs to have a std::string_view last_key pointing at the already-written previous key value so that it can throw if you try to add out of order.

*(begin++) = 'd';
end--;
return *this;
}
Copy link
Member

@jagerman jagerman Sep 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of the way this works because the caller will have a fixed_buffer_producer, but will also have a bunch of state associated with it that they have to keep track of -- is it making a dict or a list? Have I written a key yet? Do I have a sublist/subdict active?

One idea to make this better is to have RAII subclasses for nested types:

// Existing classes reworked a bit:
class fixed_buffer_base {
protected:
    ...

    int nesting_ = 0;
    void end_nesting(char* new_begin) { begin = new_begin; }
    virtual void begin_nesting() { nesting_++; }
    friend class nested_dict;
    friend class nested_list;
};
class fixed_dict_buffer : public fixed_buffer_base {
    ...
    nested_dict begin_dict(std::string_view key) {
        // check nesting_ == 0
        // check minimum size available for key + empty dict
        // check key order
        // write key
        // update last_key
        // (all of the above should just be some method since they'll be identical checks for every kv pair)

        return {*this};
    }
};
class fixed_list_buffer : public fixed_buffer_base { ... };

// New nested classes:
class nested_dict : fixed_dict_buffer {
    fixed_buffer_base& b;
    nested_dict(fixed_buffer_base& b) : b{b} {
        // Copy buffer pointers from b
        b.begin_nesting();
        b.begin_dict();
    }
    ~nested_dict() {
        b.end_dict();
        b.end_nesting(begin);
    }
    void begin_nesting() override { nesting_++; b.begin_nesting(); }
};
class nested_list : fixed_list_buffer {
    fixed_buffer_base& b;
    nested_dict(fixed_buffer_base& b) : b{b} {
        b.nesting_++;
        b.begin_list();
    }
    ~nested_list() {
        b.end_list();
        b.end_nesting(begin);
    }
    void begin_nesting() override { nesting_++; b.begin_nesting(); }
};

Now the caller does something like:

fixed_dict_buffer foo{start, end};
foo.append_string("foo", "bar");
{
    auto x = foo.begin_dict("key1");
    // foo.append_string("xyz", "abc");  -- would throw because there is an open nested structure
    x.append_string("key2", "xyz");
    {
        auto y = x.begin_list("key3");
        y.append_integer(42);
        {
            auto z = y.begin_dict();
            // .. etc.
        }
     }
}
foo.append_string("xyz", "abc");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a potential bug in that example code: the nested_dict and nested_list destructors may throw

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True; it should probably fiddle with the end pointer (i.e. shorten it by one) so that there is always space to write the 'e' during destruction.

@jagerman
Copy link
Member

This PR is superseded by the bt_dict_producer/bt_list_producer added to https://github.com/oxen-io/oxen-encoding (which is meant to replace the encoding code here).

@jagerman jagerman closed this Nov 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants