New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
types: de/serialization: work in terms of fragmented buffers #6138
Labels
area/internals
an issue which refers to some internal class or something which has little exposure to users and is
feature/enhancement
Milestone
Comments
slivne
added
the
area/internals
an issue which refers to some internal class or something which has little exposure to users and is
label
Apr 13, 2020
michoecho
added a commit
to michoecho/scylla
that referenced
this issue
Oct 30, 2020
michoecho
added a commit
to michoecho/scylla
that referenced
this issue
Oct 30, 2020
michoecho
added a commit
to michoecho/scylla
that referenced
this issue
Nov 20, 2020
avikivity
added a commit
that referenced
this issue
Dec 3, 2020
…Chojnowski Citing #6138: > In the past few years we have converted most of our codebase to work in terms of fragmented buffers, instead of linearised ones, to help avoid large allocations that put large pressure on the memory allocator. > One prominent component that still works exclusively in terms of linearised buffers is the types hierarchy, more specifically the de/serialization code to/from CQL format. Note that for most types, this is the same as our internal format, notable exceptions are non-frozen collections and user types. > > Most types are expected to contain reasonably small values, but texts, blobs and especially collections can get very large. Since the entire hierarchy shares a common interface we can either transition all or none to work with fragmented buffers. This series gets rid of intermediate linearizations in deserialization. The next steps are removing linearizations from serialization, validation and comparison code. Series summary: - Fix a bug in `fragmented_temporary_buffer::view::remove_prefix`: 9bceaac (Discovered while testing. Since it wasn't discovered earlier, I guess it doesn't occur in any code path in master.) - Add a `FragmentedView` concept to allow uniform handling of various types of fragmented buffers (`bytes_view`, `temporary_fragmented_buffer::view`, `ser::buffer_view` and likely `managed_bytes_view` in the future): bc4db6f - Implement `FragmentedView` for relevant fragmented buffer types: b423581..d352434 - Add helper functions for reading from `FragmentedView`: 82fde0f and 82bd7f4 - Switch `deserialize()` and all its helpers from `bytes_view` to `FragmentedView`: 30e9b42..3d07611 - Remove `with_linearized()` calls which just became unnecessary: 09e464d..47b963b The addition of `FragmentedView` might be controversial, because another concept meant for the same purpose - `FragmentRange` - is already used. Unfortunately, it lacks the functionality we need. The main (only?) thing we want to do with a fragmented buffer is to extract a prefix from it and `FragmentRange` gives us no way to do that, because it's immutable by design. We can work around that by wrapping it into a mutable view which will track the offset into the immutable `FragmentRange`, and that's exactly what `linearizing_input_stream` is. But it's wasteful. `linearizing_input_stream` is a heavy type, unsuitable for passing around as a view - it stores a pair of fragment iterators, a fragment view and a size (11 words) to conform to the iterator-based design of `FragmentRange`, when one fragment iterator (4 words) already contains all needed state, just hidden. I suggest we replace `FragmentRange` with `FragmentedView` (or something similar) altogether. Refs: #6138 Closes #7692 * github.com:scylladb/scylla: types: collection: add an optimization for single-fragment buffers in deserialize types: add an optimization for single-fragment buffers in deserialize cql3: tuples: don't linearize in in_value::from_serialized cql3: expr: expression: replace with_linearize with linearized cql3: constants: remove unneeded uses of with_linearized cql3: update_parameters: don't linearize in prefetch_data_builder::add_cell cql3: lists: remove unneeded use of with_linearized query-result-set: don't linearize in result_set_builder::deserialize types: remove unneeded collection deserialization overloads types: switch collection_type_impl::deserialize from bytes_view to FragmentedView cql3: sets: don't linearize in value::from_serialized cql3: lists: don't linearize in value::from_serialized cql3: maps: don't linearize in value::from_serialized types: remove unused deserialize_aux types: deserialize: don't linearize tuple elements types: deserialize: don't linearize collection elements types: switch deserialize from bytes_view to FragmentedView types: deserialize tuple types from FragmentedView types: deserialize set type from FragmentedView types: deserialize map type from FragmentedView types: deserialize list type from FragmentedView types: add FragmentedView versions of read_collection_size and read_collection_value types: deserialize varint type from FragmentedView types: deserialize floating point types from FragmentedView types: deserialize decimal type from FragmentedView types: deserialize duration type from FragmentedView types: deserialize IP address types from FragmentedView types: deserialize uuid types from FragmentedView types: deserialize timestamp type from FragmentedView types: deserialize simple date type from FragmentedView types: deserialize time type from FragmentedView types: deserialize boolean type from FragmentedView types: deserialize integer types from FragmentedView types: deserialize string types from FragmentedView types: remove unused read_simple_opt types: implement read_simple* versions for FragmentedView utils: fragmented_temporary_buffer: implement FragmentedView for view utils: fragment_range: add single_fragmented_view serializer: implement FragmentedView for buffer_view utils: fragment_range: add linearized and with_linearized for FragmentedView utils: fragment_range: add FragmentedView utils: fragmented_temporary_buffer: fix view::remove_prefix
psarna
added a commit
that referenced
this issue
Dec 4, 2020
…Chojnowski Citing #6138: > In the past few years we have converted most of our codebase to work in terms of fragmented buffers, instead of linearised ones, to help avoid large allocations that put large pressure on the memory allocator. > One prominent component that still works exclusively in terms of linearised buffers is the types hierarchy, more specifically the de/serialization code to/from CQL format. Note that for most types, this is the same as our internal format, notable exceptions are non-frozen collections and user types. > > Most types are expected to contain reasonably small values, but texts, blobs and especially collections can get very large. Since the entire hierarchy shares a common interface we can either transition all or none to work with fragmented buffers. This series gets rid of intermediate linearizations in deserialization. The next steps are removing linearizations from serialization, validation and comparison code. Series summary: - Fix a bug in `fragmented_temporary_buffer::view::remove_prefix`. (Discovered while testing. Since it wasn't discovered earlier, I guess it doesn't occur in any code path in master.) - Add a `FragmentedView` concept to allow uniform handling of various types of fragmented buffers (`bytes_view`, `temporary_fragmented_buffer::view`, `ser::buffer_view` and likely `managed_bytes_view` in the future). - Implement `FragmentedView` for relevant fragmented buffer types. - Add helper functions for reading from `FragmentedView`. - Switch `deserialize()` and all its helpers from `bytes_view` to `FragmentedView`. - Remove `with_linearized()` calls which just became unnecessary. - Add an optimization for single-fragment cases. The addition of `FragmentedView` might be controversial, because another concept meant for the same purpose - `FragmentRange` - is already used. Unfortunately, it lacks the functionality we need. The main (only?) thing we want to do with a fragmented buffer is to extract a prefix from it and `FragmentRange` gives us no way to do that, because it's immutable by design. We can work around that by wrapping it into a mutable view which will track the offset into the immutable `FragmentRange`, and that's exactly what `linearizing_input_stream` is. But it's wasteful. `linearizing_input_stream` is a heavy type, unsuitable for passing around as a view - it stores a pair of fragment iterators, a fragment view and a size (11 words) to conform to the iterator-based design of `FragmentRange`, when one fragment iterator (4 words) already contains all needed state, just hidden. I suggest we replace `FragmentRange` with `FragmentedView` (or something similar) altogether. Refs: #6138 Closes #7692 * github.com:scylladb/scylla: types: collection: add an optimization for single-fragment buffers in deserialize types: add an optimization for single-fragment buffers in deserialize cql3: tuples: don't linearize in in_value::from_serialized cql3: expr: expression: replace with_linearize with linearized cql3: constants: remove unneeded uses of with_linearized cql3: update_parameters: don't linearize in prefetch_data_builder::add_cell cql3: lists: remove unneeded use of with_linearized query-result-set: don't linearize in result_set_builder::deserialize types: remove unneeded collection deserialization overloads types: switch collection_type_impl::deserialize from bytes_view to FragmentedView cql3: sets: don't linearize in value::from_serialized cql3: lists: don't linearize in value::from_serialized cql3: maps: don't linearize in value::from_serialized types: remove unused deserialize_aux types: deserialize: don't linearize tuple elements types: deserialize: don't linearize collection elements types: switch deserialize from bytes_view to FragmentedView types: deserialize tuple types from FragmentedView types: deserialize set type from FragmentedView types: deserialize map type from FragmentedView types: deserialize list type from FragmentedView types: add FragmentedView versions of read_collection_size and read_collection_value types: deserialize varint type from FragmentedView types: deserialize floating point types from FragmentedView types: deserialize decimal type from FragmentedView types: deserialize duration type from FragmentedView types: deserialize IP address types from FragmentedView types: deserialize uuid types from FragmentedView types: deserialize timestamp type from FragmentedView types: deserialize simple date type from FragmentedView types: deserialize time type from FragmentedView types: deserialize boolean type from FragmentedView types: deserialize integer types from FragmentedView types: deserialize string types from FragmentedView types: remove unused read_simple_opt types: implement read_simple* versions for FragmentedView utils: fragmented_temporary_buffer: implement FragmentedView for view utils: fragment_range: add single_fragmented_view serializer: implement FragmentedView for buffer_view utils: fragment_range: add linearized and with_linearized for FragmentedView utils: fragment_range: add FragmentedView utils: fragmented_temporary_buffer: fix view::remove_prefix
This was referenced Dec 7, 2020
avikivity
added a commit
that referenced
this issue
Dec 8, 2020
…jnowski A sequel to #7692. This series gets rid of linearization in `serialize_for_cql`, which serializes collections and user types from `collection_mutation_view` to CQL. We switch from `bytes` to `bytes_ostream` as the intermediate buffer type. The only user of of `serialize_for_cql` immediately copies the result to another `bytes_ostream`. We could avoid some copies and allocations by writing to the final `bytes_ostream` directly, but it's currently hidden behind a template. Before this series, `serialize_for_cql_aux()` delegated the actual writing to `collection_type_impl::pack` and `tuple_type_impl::build_value`, by passing them an intermediate `vector`. After this patch, the writing is done directly in `serialize_for_cql_aux()`. Pros: we avoid the overhead of creating an intermediate vector, without bloating the source code (because creating that intermediate vector requires just as much code as serializing the values right away). Cons: we duplicate the CQL collection format knowledge contained in `collection_type_impl::pack` and `tuple_type_impl::build_value`. Refs: #6138 Closes #7771 * github.com:scylladb/scylla: types: switch serialize_for_cql from bytes to bytes_ostream types: switch serialize_for_cql_aux from bytes to bytes_ostream types: serialize user types to bytes_ostream types: serialize lists to bytes_ostream types: serialize sets to bytes_ostream types: serialize maps to bytes_ostream utils: fragment_range: use range-based for loop instead of boost::for_each types: add write_collection_value() overload for bytes_ostream and value_view
avikivity
added a commit
that referenced
this issue
Dec 10, 2020
A sequel to #7692. This series gets rid of linearization when validating collections and tuple types. (Other types were already validated without linearizing). The necessary helpers for reading from fragmented buffers were introduced in #7692. All this series does is put them to use in `validate()`. Refs: #6138 Closes #7770 * github.com:scylladb/scylla: types: add single-fragment optimization in validate() utils: fragment_range: add with_simplified() cql3: statements: select_statement: remove unnecessary use of with_linearized cql3: maps: remove unnecessary use of with_linearized cql3: lists: remove unnecessary use of with_linearized cql3: tuples: remove unnecessary use of with_linearized cql3: sets: remove unnecessary use of with_linearized cql3: tuples: remove unnecessary use of with_linearized cql3: attributes: remove unnecessary uses of with_linearized types: validate lists without linearizing types: validate tuples without linearizing types: validate sets without linearizing types: validate maps without linearizing types: template abstract_type::validate on FragmentedView types: validate_visitor: transition from FragmentRange to FragmentedView types: validate_visitor: change `empty()` to `size_bytes() == 0`
avikivity
added a commit
that referenced
this issue
Dec 11, 2020
A sequel to #7692. This series gets rid of linearization when validating collections and tuple types. (Other types were already validated without linearizing). The necessary helpers for reading from fragmented buffers were introduced in #7692. All this series does is put them to use in `validate()`. Refs: #6138 Closes #7770 * github.com:scylladb/scylla: types: add single-fragment optimization in validate() utils: fragment_range: add with_simplified() cql3: statements: select_statement: remove unnecessary use of with_linearized cql3: maps: remove unnecessary use of with_linearized cql3: lists: remove unnecessary use of with_linearized cql3: tuples: remove unnecessary use of with_linearized cql3: sets: remove unnecessary use of with_linearized cql3: tuples: remove unnecessary use of with_linearized cql3: attributes: remove unnecessary uses of with_linearized types: validate lists without linearizing types: validate tuples without linearizing types: validate sets without linearizing types: validate maps without linearizing types: template abstract_type::validate on FragmentedView types: validate_visitor: transition from FragmentRange to FragmentedView utils: fragmented_temporary_buffer: add empty() to FragmentedView utils: fragmented_temporary_buffer: don't add to null pointer
avikivity
added a commit
that referenced
this issue
Dec 11, 2020
A sequel to #7692. This series gets rid of linearization when validating collections and tuple types. (Other types were already validated without linearizing). The necessary helpers for reading from fragmented buffers were introduced in #7692. All this series does is put them to use in `validate()`. Refs: #6138 Closes #7770 * github.com:scylladb/scylla: types: add single-fragment optimization in validate() utils: fragment_range: add with_simplified() cql3: statements: select_statement: remove unnecessary use of with_linearized cql3: maps: remove unnecessary use of with_linearized cql3: lists: remove unnecessary use of with_linearized cql3: tuples: remove unnecessary use of with_linearized cql3: sets: remove unnecessary use of with_linearized cql3: tuples: remove unnecessary use of with_linearized cql3: attributes: remove unnecessary uses of with_linearized types: validate lists without linearizing types: validate tuples without linearizing types: validate sets without linearizing types: validate maps without linearizing types: template abstract_type::validate on FragmentedView types: validate_visitor: transition from FragmentRange to FragmentedView utils: fragmented_temporary_buffer: add empty() to FragmentedView utils: fragmented_temporary_buffer: don't add to null pointer
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area/internals
an issue which refers to some internal class or something which has little exposure to users and is
feature/enhancement
In the past few years we have converted most of our codebase to work in terms of fragmented buffers, instead of linearised ones, to help avoid large allocations that put large pressure on the memory allocator.
One prominent component that still works exclusively in terms of linearised buffers is the types hierarchy, more specifically the de/serialization code to/from CQL format. Note that for most types, this is the same as our internal format, notable exceptions are non-frozen collections and user types.
Most types are expected to contain reasonably small values, but texts, blobs and especially collections can get very large. Since the entire hierarchy shares a common interface we can either transition all or none to work with fragmented buffers.
We already have the tools to work with fragmented buffers: bytes_ostream for writing fragmented buffers and utils::linearizing_input_stream for reading them.
Once we converted the types hierarchy we could convert the keys hierarchy as well (keys.hh, compound.hh and compound_compat.hh) and maybe we could kill
with_linearized_managed_bytes()
and friends for good.The text was updated successfully, but these errors were encountered: