
Access refactoring RFC #47

Closed
ImmemorConsultrixContrarie opened this issue Jan 26, 2020 · 4 comments

ImmemorConsultrixContrarie commented Jan 26, 2020

Summary

Rework the current mutable slice split logic to be non-atomic: bitslice.split_at_mut() would return a tuple of (&mut self, AfterSplit). The AfterSplit logic is explained below.

Motivation

I was recently thinking about simple, fast iteration logic based on the std slice iterator with some bit tweaks, but such an iterator should use default (non-atomic) access. I simply don't like carrying atomic overhead everywhere, even if I can live with it. Also, getting rid of atomics would simplify the overall logic of this crate.

Guide-level explanation

At a high level, AfterSplit (or whatever name it ends up with) is a sibling of BitSlice: it implements many of the traits and functions BitSlice implements (for example, it can itself be split mutably, and after.split_at_mut() produces a (self, AfterSplit) tuple). But it is a downgrade: it has no public constructor, and some BitSlice functions are absent.

Reference-level explanation

At a lower level, AfterSplit is a three-word-wide structure: (owned_bits: BitStore, slice: &mut BitSlice). The trick is simple: slice always starts at a fully owned element (if its length is non-zero, of course), because slices can be split only at a machine-word border, and the lone BitStore can hold at most $BITS - 1 bits. The latter is easy to guarantee: split_mut() returns (self, ...), and the "self" part is always greedy, so its last element holds at least one bit, never zero (unless the whole slice has zero length, in which case both halves have zero length). So the BitStore part of AfterSplit always has one spare bit to mark its length, and that's enough (trailing_zeros / leading_zeros, wrapped in a function on the ordering trait).
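
To make the sentinel trick concrete, here is a runnable toy model of that layout. Nothing in it is bitvec API; the name AfterSplitToy, the u64 storage, and the Lsb0-style bit placement are all assumptions for illustration:

/// Toy model of the proposed layout: the head element's dangling bits plus
/// a sentinel bit encoding their count, and the element-aligned remainder.
struct AfterSplitToy<'a> {
    /// Live bits sit in positions 0..len; a lone sentinel 1 sits at
    /// position `len`; every position above the sentinel stays zero.
    owned_bits: u64,
    /// The remainder always begins at an element boundary.
    slice: &'a mut [u64],
}

impl<'a> AfterSplitToy<'a> {
    fn new(bits: u64, len: u32, slice: &'a mut [u64]) -> Self {
        assert!(len < 64, "the lone store holds at most BITS - 1 bits");
        let mask = (1u64 << len) - 1;
        // Keep only the live bits, then plant the sentinel just above them.
        Self { owned_bits: (bits & mask) | (1u64 << len), slice }
    }

    /// The length is recovered from the sentinel alone: it is the highest
    /// set bit, because everything above it is guaranteed zero.
    fn head_len(&self) -> u32 {
        63 - self.owned_bits.leading_zeros()
    }
}

fn main() {
    let mut rest = [0u64; 2];
    let after = AfterSplitToy::new(0b1011, 4, &mut rest);
    assert_eq!(after.head_len(), 4);
    assert_eq!(after.slice.len(), 2);
}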

Drawbacks

A lot of code and docs to write; three words wide instead of two; and not a proper BitSlice, so it may be limited in what it can offer.

Rationale and alternatives

I see three ways to deal with mutable splits: no split_mut() at all, atomics, or some kind of AfterSplit. The first is very simple but severely limits functionality; the second hurts performance, and you must then account for possible double access on most slice operations, even immutable ones; the third is a compromise: mutable splitting is allowed and doesn't interfere with the rest of the library, but it is somewhat unwieldy for the user.

Prior art

The current atomic access logic crawls everywhere because of edge cases you are forced to think about: for example, an iterator over an immutable bit slice must use atomic access, because it could be invoked on the product of split_mut(), which might be mutated at that very moment (a very rare case, but one you must uphold). Right now atomics are everywhere. This is good for emulating std slice logic, but BitSlice is not an ordinary slice; nobody treats a UTF-8 string as ASCII unless it has been checked to be ASCII, and likewise BitSlice doesn't need to do everything the way std slices do. For example, the "string".chars() iterator produces values, not references, and I haven't heard a single complaint about that. The immutable BitSlice iterator could work the same way: just hand out the bool and forget about it.

Unresolved questions

Actually, there's just one question: should this be implemented at all, and if yes, how? Once that's resolved, there's a straightforward path to writing the code.

Future possibilities

After this, there would be an easy way to write a fast value-producing iterator over an immutable bit slice without worrying much about mutable-access problems. Implementing some other things might also become easier; I'm not sure. But it is a move away from copy-pasting ordinary slices' logic, which could be good for performance, since a bit slice is special, and the fast way to do things with it is almost never the fast way with ordinary slices.

And some things that are not related to this particular question, but are related to fast iteration:

  1. The BitStore trait should gain these functions: rotate_left, rotate_right, reverse_bits (the names match the numeric primitives' methods);
  2. The BitOrder trait should gain these functions: rotate_forward, rotate_backward, from_ne_order — the last takes a BitStore with some bits and converts it into the required order (essentially a no-op for Lsb0 and a single bit-reverse for Msb0). These can be implemented for the two default orderings with the functions from item 1; see the sketch after this list.
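
A sketch of what those additions might look like, using stand-in traits over u8 only; the Ext names and the Lsb0Ext/Msb0Ext stand-ins are placeholders for this illustration, not the crate's real BitStore, BitOrder, Lsb0, or Msb0:

/// Stand-in for the proposed BitStore additions.
trait BitStoreExt {
    fn rotate_left(self, n: u32) -> Self;
    fn rotate_right(self, n: u32) -> Self;
    fn reverse_bits(self) -> Self;
}

impl BitStoreExt for u8 {
    // Forward straight to the numeric primitives, as the names suggest.
    fn rotate_left(self, n: u32) -> Self { u8::rotate_left(self, n) }
    fn rotate_right(self, n: u32) -> Self { u8::rotate_right(self, n) }
    fn reverse_bits(self) -> Self { u8::reverse_bits(self) }
}

/// Stand-in for the proposed BitOrder addition.
trait BitOrderExt {
    /// Convert a native-ordered store into this ordering: a no-op for an
    /// Lsb0-style order, a single bit-reverse for an Msb0-style order.
    fn from_ne_order(value: u8) -> u8;
}

struct Lsb0Ext;
struct Msb0Ext;

impl BitOrderExt for Lsb0Ext {
    fn from_ne_order(value: u8) -> u8 { value }
}

impl BitOrderExt for Msb0Ext {
    fn from_ne_order(value: u8) -> u8 { BitStoreExt::reverse_bits(value) }
}

fn main() {
    assert_eq!(Lsb0Ext::from_ne_order(0b0000_0001), 0b0000_0001);
    assert_eq!(Msb0Ext::from_ne_order(0b0000_0001), 0b1000_0000);
}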

I could have written a PR with these things before this RFC, but there's no point in them until the mutable-split question is answered one way or another.

@ImmemorConsultrixContrarie ImmemorConsultrixContrarie changed the title Refactoring RFC Access refactoring RFC Jan 26, 2020

ImmemorConsultrixContrarie commented Jan 26, 2020

Okay, atomics are still required for this, in order to write the data back once a part is done changing its dangling bits, and the actual split should return two different AfterSplits: (One, Other), so that their accesses never overlap and each part knows where to write its dangling bits. Still, this would strip atomics from all other parts of the library. And there is no potential atomic bottleneck, because, just like BitMut, these AfterSplits would write their shared bits only on drop.
While the "atomic" feature is off, all split_mut functions could return (&mut self, &mut BitSlice), since BitSlice is not Send/Sync without it, and single-threaded access should be fine.


myrrlyn commented Jan 27, 2020

Okay, so I've been thinking about this all day, and you're right in spirit, though, as your follow-up comment mentions, less so in implementation. This is my fifth draft of a reply, so I'll try to be concise:

  • Atomic writes are never faster than non-atomic ones; they are at best as fast and at worst slower. Therefore, on unaliased memory, they should be removed.
  • I do not want to diverge from [T] except where absolutely necessary. &mut bool requires a proxy type; slices do not. Changing the split_at_mut API is not something I will do without very strong evidence that atomics are intolerable and there is no better option.
  • But I have one: the domain module fractures a slice into its aliased edges and unaliased center.

We can encode that a particular slice has an unaliased view of memory in its T: BitStore parameter by creating a wrapper type over the store integers that unconditionally uses Cell for memory access. We can add a method to BitSlice that produces such deätomized slices using domain, and change the methods that affect aliasing (constructors are mostly unaliased; splitters are always aliased) to add or remove that wrapper.
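
A minimal sketch of such a wrapper, under the assumption of a stand-in name; NoAlias here is illustrative only, not the type bitvec actually shipped:

use std::cell::Cell;

/// Marks an element as unaliased: access goes through `Cell`, so writes are
/// plain (non-atomic) stores, which is sound because no other handle can
/// observe this memory.
#[repr(transparent)]
struct NoAlias<T>(Cell<T>);

impl<T: Copy> NoAlias<T> {
    fn load(&self) -> T { self.0.get() }
    fn store(&self, value: T) { self.0.set(value) }
}

fn main() {
    let elem = NoAlias(Cell::new(0u8));
    elem.store(elem.load() | 0b0001_0000); // compiles to a plain store
    assert_eq!(elem.load(), 16);
}

A slice typed over such a wrapper would then take the plain-store code path, while slices typed over the aliasing marker keep the atomic path.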

I've been working on exactly this. It's a lot of boilerplate, mostly in trait forwarding, but I think that once it's done, this should work without severe external effects.

Thank you for bringing this to my attention; indexed access is important to keep performant and this is a strong opportunity to do so.


myrrlyn commented Feb 25, 2020

Wow it's only been a month? For some reason I thought this was filed earlier in January.

Okay. So. This has been a nightmare of a feature to actually implement.

Summary:

This program

extern crate bitvec;

use bitvec::prelude::*;

fn main() {
  let mut raw = [0u8; 3];
  let bits = raw.bits_mut::<Local>();
  // Splitting taints both halves with the aliasing marker.
  let (left, _right) = bits.split_at_mut(12);
  // The domain view untaints the fully-owned interior.
  match left.bit_domain_mut() {
    BitDomainMut::Region { body, tail, .. } => {
      body.set(4, true); // unaliased element: plain store
      tail.set(2, true); // aliased element: atomic store
    },
    _ => unsafe { std::hint::unreachable_unchecked() },
  }
}

produces this output

bitvec::main:
    push   rax
    mov    word ptr [rsp + 6], 0
    mov    byte ptr [rsp + 5], 16
    lock   or byte ptr [rsp + 6], 4
    pop    rax
    ret

You can observe that this writes a zero into [6] (initializing the middle byte), then a sixteen into [5] (body.set(4, true)), then a lock or of [6] with 4 (tail.set(2, true)).


The default behavior has been moved back from atomic to ordinary accesses. The methods that cause aliasing of memory elements contaminate their entire slices with an aliasing marker, which causes all accesses through tainted slices to use atomic access, even when unneeded. This is because switching instructions once is faster than performing an alias detection check on every write.

To untaint a slice, use the .bit_domain_mut() method to get access to three slices: two aliased, each restricted to one aliased edge element, and one unaliased, covering all the unaliased elements.

You now have full control over whether an access may be unaliased, or must be aliased.
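
For orientation, the type flow in the program above, under the assumption that the marker surfaces as an associated type T::Alias on BitStore (the naming on the branch may differ):

// let bits:  &mut BitSlice<Local, u8>          — freshly created, untainted
// left, _right: &mut BitSlice<Local, <u8 as BitStore>::Alias>
//                                              — the split taints both halves
// body:      &mut BitSlice<Local, u8>          — the domain untaints the interior
// tail:      &mut BitSlice<Local, <u8 as BitStore>::Alias>
//                                              — the shared edge stays tainted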

This feature is currently only available on the branch feature/47-noalias. When I am satisfied with the code quality, and with its correctness under Miri, I will merge it into master, close this issue, and cut the 0.18 release.

I have added an example file, examples/aliasing.rs. Please let me know if the example of how to operate the new behavior is sufficient, and if the type system rules on display there are acceptable or painful to use. In generic contexts, the alias markers are deeply unpleasant, but in instantiated contexts, they are easy to drag back to the specific integer or atom required.

Thank you again for this issue. I am happy to have a community of users pushing bitvec to continually be better.

I am probably going to stop working on it for a month or so once 0.18 is published.


myrrlyn commented Mar 5, 2020

This behavior is now present in trunk. I will issue an 0.18 that includes it by this Saturday at the latest.

myrrlyn closed this as completed Mar 5, 2020
myrrlyn referenced this issue Mar 11, 2020
This change reduces the computation done by `.next()` and
`.next_back()` by representing the iteration region as a pair of
boundary pointers.

The behavior is analogous to the double-pointer iteration seen in
`core::slice::Iter{,Mut}` and C++ iterators, so I will not explain
it further here. The implementation uses `BitIdx` rather than
`BitTail` for the ending marker, because the ending marker must be
considered a live pointer one past the end of the region, rather
than a dead pointer, in order to support `.next_back()` correctly.

Instruction size of `.next()` is considerably reduced, and the
runtime drops from 17ns to 8ns.

Benchmarks:

old:     2,634ns +/- 898
old_rev: 1,519ns +/- 256
new:     1,090ns +/- 246
new_rev: 1,095ns +/- 186
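
For reference, a minimal sketch of the double-pointer technique the commit describes, applied to an ordinary slice rather than bitvec's BitIdx/BitPtr machinery (the TwoPtrIter name and all details are illustrative, and zero-sized element types are deliberately ignored):

/// Iteration region held as a pair of boundary pointers, as in
/// `core::slice::Iter`: `head` is the first live element, `tail` is a live
/// one-past-the-end pointer so that `next_back` can step backward.
struct TwoPtrIter<'a, T> {
    head: *const T,
    tail: *const T,
    _marker: std::marker::PhantomData<&'a [T]>,
}

impl<'a, T> TwoPtrIter<'a, T> {
    fn new(slice: &'a [T]) -> Self {
        let head = slice.as_ptr();
        let tail = unsafe { head.add(slice.len()) };
        Self { head, tail, _marker: std::marker::PhantomData }
    }
}

impl<'a, T> Iterator for TwoPtrIter<'a, T> {
    type Item = &'a T;
    fn next(&mut self) -> Option<&'a T> {
        if self.head == self.tail {
            return None;
        }
        // One pointer comparison and one increment per step; no index
        // arithmetic has to be recomputed.
        unsafe {
            let item = &*self.head;
            self.head = self.head.add(1);
            Some(item)
        }
    }
}

impl<'a, T> DoubleEndedIterator for TwoPtrIter<'a, T> {
    fn next_back(&mut self) -> Option<&'a T> {
        if self.head == self.tail {
            return None;
        }
        unsafe {
            self.tail = self.tail.sub(1);
            Some(&*self.tail)
        }
    }
}

fn main() {
    let data = [1, 2, 3, 4];
    let mut it = TwoPtrIter::new(&data);
    assert_eq!(it.next(), Some(&1));
    assert_eq!(it.next_back(), Some(&4));
    assert_eq!(it.collect::<Vec<_>>(), vec![&2, &3]);
}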