
Access refactoring RFC #47

Closed
ImmemorConsultrixContrarie opened this issue Jan 26, 2020 · 4 comments

ImmemorConsultrixContrarie commented Jan 26, 2020

Summary

Rework the current mutable slice split logic to be non-atomic: bitslice.split_at_mut() would return a tuple of (&mut self, AfterSplit). The AfterSplit logic is explained below.

Motivation

I was recently thinking about simple, fast iteration logic based on the std slice iterator with some bit tweaks, but such an iterator should use default (non-atomic) access. I simply don't like carrying atomic overhead everywhere, even if I can live with it. Also, getting rid of atomics would simplify the overall logic of this crate.

Guide-level explanation

At a high level, AfterSplit (or whatever name it ends up with) is a sibling of BitSlice: it implements many of the traits and functions BitSlice implements (for example, it can itself be split mutably, and after.split_at_mut() produces a (self, AfterSplit) tuple). But it is a downgrade: it has no public constructor, and some BitSlice functions are absent.

Reference-level explanation

At a lower level, AfterSplit is a three-word-wide structure: (owned_bits: BitStore, slice: &mut BitSlice). The trick is simple: slice always starts at a fully owned element (if its length is non-zero, of course), because slices can be split only at a machine-word border, and the lone BitStore can hold at most $BITS - 1 bits. The latter is easy to guarantee: split_mut() returns (self, ...), and the "self" part is always greedy, so its last element holds at least one bit, never zero (unless the whole slice has zero length, in which case both halves have zero length). So the BitStore part of AfterSplit always has one spare bit to mark its length, and that's enough (trailing_zeros / leading_zeros, wrapped in a function on the ordering trait).
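
To make the sentinel trick concrete, here is a runnable toy model of that layout. Nothing in it is bitvec API; the name AfterSplitToy, the u64 storage, and the Lsb0-style bit placement are all assumptions for illustration:

/// Toy model of the proposed layout: the head element's dangling bits plus
/// a sentinel bit encoding their count, and the element-aligned remainder.
struct AfterSplitToy<'a> {
    /// Live bits sit in positions 0..len; a lone sentinel 1 sits at
    /// position `len`; every position above the sentinel stays zero.
    owned_bits: u64,
    /// The remainder always begins at an element boundary.
    slice: &'a mut [u64],
}

impl<'a> AfterSplitToy<'a> {
    fn new(bits: u64, len: u32, slice: &'a mut [u64]) -> Self {
        assert!(len < 64, "the lone store holds at most BITS - 1 bits");
        let mask = (1u64 << len) - 1;
        // Keep only the live bits, then plant the sentinel just above them.
        Self { owned_bits: (bits & mask) | (1u64 << len), slice }
    }

    /// The length is recovered from the sentinel alone: it is the highest
    /// set bit, because everything above it is guaranteed zero.
    fn head_len(&self) -> u32 {
        63 - self.owned_bits.leading_zeros()
    }
}

fn main() {
    let mut rest = [0u64; 2];
    let after = AfterSplitToy::new(0b1011, 4, &mut rest);
    assert_eq!(after.head_len(), 4);
    assert_eq!(after.slice.len(), 2);
}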

Drawbacks

A lot of code and docs to write; three words wide instead of two; and not a proper BitSlice, so it may be limited in what it can offer.

Rationale and alternatives

I see three ways to deal with mutable splits: no split_mut() at all, atomics, or some kind of AfterSplit. The first is very simple but severely limits functionality; the second hurts performance, and you must then account for possible double access on most slice operations, even immutable ones; the third is a compromise: mutable splitting is allowed and doesn't interfere with the rest of the library, but it is somewhat unwieldy for the user.

Prior art

The current atomic access logic crawls everywhere because of edge cases you are forced to think about: for example, an iterator over an immutable bit slice must use atomic access, because it could be invoked on the product of split_mut(), which might be mutated at that very moment (a very rare case, but one you must uphold). Right now atomics are everywhere. This is good for emulating std slice logic, but BitSlice is not an ordinary slice; nobody treats a UTF-8 string as ASCII unless it has been checked to be ASCII, and likewise BitSlice doesn't need to do everything the way std slices do. For example, the "string".chars() iterator produces values, not references, and I haven't heard a single complaint about that. The immutable BitSlice iterator could work the same way: just hand out the bool and forget about it.

Unresolved questions

Actually, there's just one question: should this be implemented at all, and if yes, how? Once that's resolved, there's a straightforward path to writing the code.

Future possibilities

After this, there would be an easy way to write a fast value-producing iterator over an immutable bit slice without worrying much about mutable-access problems. Implementing some other things might also become easier; I'm not sure. But it is a move away from copy-pasting ordinary slices' logic, which could be good for performance, since a bit slice is special, and the fast way to do things with it is almost never the fast way with ordinary slices.

And some things that are not related to this particular question, but are related to fast iteration:

  1. The BitStore trait should gain these functions: rotate_left, rotate_right, reverse_bits (the names match the numeric primitives' methods);
  2. The BitOrder trait should gain these functions: rotate_forward, rotate_backward, from_ne_order — the last takes a BitStore with some bits and converts it into the required order (essentially a no-op for Lsb0 and a single bit-reverse for Msb0). These can be implemented for the two default orderings with the functions from item 1; see the sketch after this list.
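
A sketch of what those additions might look like, using stand-in traits over u8 only; the Ext names and the Lsb0Ext/Msb0Ext stand-ins are placeholders for this illustration, not the crate's real BitStore, BitOrder, Lsb0, or Msb0:

/// Stand-in for the proposed BitStore additions.
trait BitStoreExt {
    fn rotate_left(self, n: u32) -> Self;
    fn rotate_right(self, n: u32) -> Self;
    fn reverse_bits(self) -> Self;
}

impl BitStoreExt for u8 {
    // Forward straight to the numeric primitives, as the names suggest.
    fn rotate_left(self, n: u32) -> Self { u8::rotate_left(self, n) }
    fn rotate_right(self, n: u32) -> Self { u8::rotate_right(self, n) }
    fn reverse_bits(self) -> Self { u8::reverse_bits(self) }
}

/// Stand-in for the proposed BitOrder addition.
trait BitOrderExt {
    /// Convert a native-ordered store into this ordering: a no-op for an
    /// Lsb0-style order, a single bit-reverse for an Msb0-style order.
    fn from_ne_order(value: u8) -> u8;
}

struct Lsb0Ext;
struct Msb0Ext;

impl BitOrderExt for Lsb0Ext {
    fn from_ne_order(value: u8) -> u8 { value }
}

impl BitOrderExt for Msb0Ext {
    fn from_ne_order(value: u8) -> u8 { BitStoreExt::reverse_bits(value) }
}

fn main() {
    assert_eq!(Lsb0Ext::from_ne_order(0b0000_0001), 0b0000_0001);
    assert_eq!(Msb0Ext::from_ne_order(0b0000_0001), 0b1000_0000);
}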

I could have written a PR with these things before this RFC, but there's no point in them until the mutable-split question is answered one way or another.

@ImmemorConsultrixContrarie ImmemorConsultrixContrarie changed the title Refactoring RFC Access refactoring RFC Jan 26, 2020

ImmemorConsultrixContrarie commented Jan 26, 2020

Okay, atomics are still required for this, in order to write the data back once a part is done changing its dangling bits, and the actual split should return two different AfterSplits: (One, Other), so that their accesses never overlap and each part knows where to write its dangling bits. Still, this would strip atomics from all other parts of the library. And there is no potential atomic bottleneck, because, just like BitMut, these AfterSplits would write their shared bits only on drop.
While the "atomic" feature is off, all split_mut functions could return (&mut self, &mut BitSlice), since BitSlice is not Send/Sync without it, and single-threaded access should be fine.


myrrlyn commented Jan 27, 2020

Okay, so I've been thinking about this all day, and you're right in spirit, though, as your follow-up comment mentions, less so in implementation. This is my fifth draft of a reply, so I'll try to be concise:

  • Atomic writes are never faster than non-atomic ones; they are at best as fast and at worst slower. Therefore, on unaliased memory, they should be removed.
  • I do not want to diverge from [T] except where absolutely necessary. &mut bool requires a proxy type; slices do not. Changing the split_at_mut API is not something I will do without very strong evidence that atomics are intolerable and there is no better option.
  • But I have one: the domain module fractures a slice into its aliased edges and unaliased center.

We can encode that a particular slice has an unaliased view of memory in its T: BitStore parameter by creating a wrapper type over the store integers that unconditionally uses Cell for memory access. We can add a method to BitSlice that produces such deätomized slices using domain, and change the methods that affect aliasing (constructors are mostly unaliased; splitters are always aliased) to add or remove that wrapper.
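
A minimal sketch of such a wrapper, under the assumption of a stand-in name; NoAlias here is illustrative only, not the type bitvec actually shipped:

use std::cell::Cell;

/// Marks an element as unaliased: access goes through `Cell`, so writes are
/// plain (non-atomic) stores, which is sound because no other handle can
/// observe this memory.
#[repr(transparent)]
struct NoAlias<T>(Cell<T>);

impl<T: Copy> NoAlias<T> {
    fn load(&self) -> T { self.0.get() }
    fn store(&self, value: T) { self.0.set(value) }
}

fn main() {
    let elem = NoAlias(Cell::new(0u8));
    elem.store(elem.load() | 0b0001_0000); // compiles to a plain store
    assert_eq!(elem.load(), 16);
}

A slice typed over such a wrapper would then take the plain-store code path, while slices typed over the aliasing marker keep the atomic path.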

I've been working on exactly this. It's a lot of boilerplate, mostly in trait forwarding, but I think that once it's done, this should work without severe external effects.

Thank you for bringing this to my attention; indexed access is important to keep performant and this is a strong opportunity to do so.


myrrlyn commented Feb 25, 2020

Wow it's only been a month? For some reason I thought this was filed earlier in January.

Okay. So. This has been a nightmare of a feature to actually implement.

Summary:

This program

extern crate bitvec;

use bitvec::prelude::*;

fn main() {
  let mut raw = [0u8; 3];
  let bits = raw.bits_mut::<Local>();
  // Splitting taints both halves with the aliasing marker.
  let (left, _right) = bits.split_at_mut(12);
  // The domain view untaints the fully-owned interior.
  match left.bit_domain_mut() {
    BitDomainMut::Region { body, tail, .. } => {
      body.set(4, true); // unaliased element: plain store
      tail.set(2, true); // aliased element: atomic store
    },
    _ => unsafe { std::hint::unreachable_unchecked() },
  }
}

produces this output

bitvec::main:
    push   rax
    mov    word ptr [rsp + 6], 0
    mov    byte ptr [rsp + 5], 16
    lock   or byte ptr [rsp + 6], 4
    pop    rax
    ret

You can observe that this writes a zero into [6] (initializing the middle byte), then a sixteen into [5] (body.set(4, true)), then a lock or of [6] with 4 (tail.set(2, true)).


The default behavior has been moved back from atomic to ordinary accesses. The methods that cause aliasing of memory elements contaminate their entire slices with an aliasing marker, which causes all accesses through tainted slices to use atomic access, even when unneeded. This is because switching instructions once is faster than performing an alias detection check on every write.

To untaint a slice, use the .bit_domain_mut() method to get access to three slices: two aliased, each restricted to one aliased edge element, and one unaliased, covering all the unaliased elements.

You now have full control over whether an access may be unaliased, or must be aliased.
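
For orientation, the type flow in the program above, under the assumption that the marker surfaces as an associated type T::Alias on BitStore (the naming on the branch may differ):

// let bits:  &mut BitSlice<Local, u8>          — freshly created, untainted
// left, _right: &mut BitSlice<Local, <u8 as BitStore>::Alias>
//                                              — the split taints both halves
// body:      &mut BitSlice<Local, u8>          — the domain untaints the interior
// tail:      &mut BitSlice<Local, <u8 as BitStore>::Alias>
//                                              — the shared edge stays tainted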

This feature is currently only available on the branch feature/47-noalias. When I am satisfied with the code quality, and with its correctness under Miri, I will merge it into master, close this issue, and cut the 0.18 release.

I have added an example file, examples/aliasing.rs. Please let me know if the example of how to operate the new behavior is sufficient, and if the type system rules on display there are acceptable or painful to use. In generic contexts, the alias markers are deeply unpleasant, but in instantiated contexts, they are easy to drag back to the specific integer or atom required.

Thank you again for this issue. I am happy to have a community of users pushing bitvec to continually be better.

I am probably going to stop working on it for a month or so once 0.18 is published.


myrrlyn commented Mar 5, 2020

This behavior is now present in trunk. I will issue an 0.18 that includes it by this Saturday at the latest.

myrrlyn closed this as completed Mar 5, 2020
myrrlyn referenced this issue Mar 11, 2020
This change reduces the computation done by `.next()` and
`.next_back()` by representing the iteration region as a pair of
boundary pointers.

The behavior is analogous to the double-pointer iteration seen in
`core::slice::Iter{,Mut}` and C++ iterators, so I will not explain
it further here. The implementation uses `BitIdx` rather than
`BitTail` for the ending marker, because the ending marker must be
considered a live pointer one past the end of the region, rather
than a dead pointer, in order to support `.next_back()` correctly.

Instruction size of `.next()` is considerably reduced, and the
runtime drops from 17ns to 8ns.

Benchmarks:

old:     2,634ns +/- 898
old_rev: 1,519ns +/- 256
new:     1,090ns +/- 246
new_rev: 1,095ns +/- 186
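
For reference, a minimal sketch of the double-pointer technique the commit describes, applied to an ordinary slice rather than bitvec's BitIdx/BitPtr machinery (the TwoPtrIter name and all details are illustrative, and zero-sized element types are deliberately ignored):

/// Iteration region held as a pair of boundary pointers, as in
/// `core::slice::Iter`: `head` is the first live element, `tail` is a live
/// one-past-the-end pointer so that `next_back` can step backward.
struct TwoPtrIter<'a, T> {
    head: *const T,
    tail: *const T,
    _marker: std::marker::PhantomData<&'a [T]>,
}

impl<'a, T> TwoPtrIter<'a, T> {
    fn new(slice: &'a [T]) -> Self {
        let head = slice.as_ptr();
        let tail = unsafe { head.add(slice.len()) };
        Self { head, tail, _marker: std::marker::PhantomData }
    }
}

impl<'a, T> Iterator for TwoPtrIter<'a, T> {
    type Item = &'a T;
    fn next(&mut self) -> Option<&'a T> {
        if self.head == self.tail {
            return None;
        }
        // One pointer comparison and one increment per step; no index
        // arithmetic has to be recomputed.
        unsafe {
            let item = &*self.head;
            self.head = self.head.add(1);
            Some(item)
        }
    }
}

impl<'a, T> DoubleEndedIterator for TwoPtrIter<'a, T> {
    fn next_back(&mut self) -> Option<&'a T> {
        if self.head == self.tail {
            return None;
        }
        unsafe {
            self.tail = self.tail.sub(1);
            Some(&*self.tail)
        }
    }
}

fn main() {
    let data = [1, 2, 3, 4];
    let mut it = TwoPtrIter::new(&data);
    assert_eq!(it.next(), Some(&1));
    assert_eq!(it.next_back(), Some(&4));
    assert_eq!(it.collect::<Vec<_>>(), vec![&2, &3]);
}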