Skip to content

Conversation

apoelstra
Copy link
Contributor

Adds to Bitv the methods:

  • .push() and .pop() which take and return bool, respectively
  • .collect() to get a Bitv from a bool-yielding iterator
  • .truncate() truncates a Bitv to a specific length
  • .len() returns the number of bits stored in the Bitv

Adds to Bitv the methods:
  - .push() and .pop() which take and return bool, respectively
  - .collect() to get a Bitv from a bool-yielding iterator
  - .truncate() truncates a Bitv to a specific length
  - .len() returns the number of bits stored in the Bitv
@alexcrichton
Copy link
Member

I was under the impression that the Bitv type was a fixed-size bit vector while the BitvSet type was a growable and shrinkable bit vector. It does seem a little odd to have two types though, so perhaps they're good candidates for merging? The line between them is definitely getting blurred with a push and pop method on the old fix-sized variant.

@apoelstra
Copy link
Contributor Author

Oh, my bad, the description of BitvSet is "An implementation of a set using a bit vector as an underlying representation for holding numerical elements.", which sounds nothing at all like "resizable bit vector", so I didn't realize that this was its purpose.

I'm happy to write code to merge these (and introduce more Vec-like functionality) if there would be support for that.

@alexcrichton
Copy link
Member

Oh dear, I think both of these types are a little out of date! I think that Bitv's major benefit is that it will use inline storage of just a uint if possible, but I'm not entirely sure how much that wins in the long run.

Perhaps the methods could be added to BitvSet for now? I'm not entirely sure what the best design for these two types are. Perhaps they could get merged, but they're generally used in perf-sensitive situations and it'd be tough to ensure that nothing was lost during the merge. For the time being, given two types, they likely belong on BitvSet rather than Bitv, and we could merge the two types in the future if necessary.

@apoelstra
Copy link
Contributor Author

Ok, it is not much work at all to move these functions into BitvSet, but there is more functionality missing that I would need to port from Bitv. In particular, Bitv::iter() returns an iterator of uints which yields each of the indices at which the BitvSet has a 1 (in keeping with its nominal usecase of "set of integers"). There is no way to nondestructively get a bool iterator from a BitvSet, and adding one requires either (a) retooling the Bitv iterator code to support both structures or (b) duplicating the code almost, but not quite, exactly. Neither option is trivial.

My feeling is that we should merge these two data structures into a single Bitv which supports the small and large representations and can be resized. Then BitvSet can be implemented as struct BitvSet(Bitv) which implements the set operations (and only set operations) on top of the underlying Bitv.

Is this a dramatic enough change to warrant an RFC? Note that Bitv would retain its original interface, plus new additions. While the BitvSet interface would possibly be reduced, we could support .get_ref() and .get_mut_ref() methods to access the underlying Bitv (this is impossible with the current implementation). And besides, BitvSet is used only once in all of rustc, in src/libsyntax/attr.rs, and the only methods used are assignment and .contains(). So it would suffice for rustc to just impl Set on Bitv, define type BitvSet = Bitv and call that a day.

What are your thoughts?

@alexcrichton
Copy link
Member

Sounds like Bitv and BitvSet should be merged! I don't think this will require an RFC due to it being only a lightly used type and this will provide a migration path forward in all cases.

@apoelstra
Copy link
Contributor Author

Thanks Alex.

I'm cutting into the code now to merge the data structures, and there are some... interesting things. For one thing, BitvSet exposes several methods like symmetric_difference, which takes two BitvSets and a function bool -> bool, computes their bitwise symmetric difference, runs the function on each bit, and returns true iff the function accepted on every bit. This function is not used anywhere in rustc, its functionality is IMHO very specified, its name does not match its functionality, and there is no documentation. My feeling is that these functions should be removed.

Also, the existing code is not very performant at all. For example, on every BitvSet set operation the current size (number of ones in the bit vector) is effectively recomputed twice. The size is used only as a return value for .len(), so why not compute it once there? Also the method of computation is to count the bits in every uint with a loop. Why not use num::Int::count_ones which is a passthrough to the appropriate variant of intrinsics::ctlz64?

Also there is some very creative use of iterators, most of which can be greatly simplified and made more efficient. Perhaps this is an artifact of Rust's immaturity when the module was written; Bitv has not been changed at least since it was part of libextra.

And perhaps there is more that remains to be seen. :)

So I'm going to close this, and I'll submit a new PR "revamp Bitv and BitvSet" with my suggested changes spread across several commits. The goal is to minimize interface changes (except to remove symmetric_difference and friends) but there will be a lot of rewritten code and improved documentation.

@apoelstra apoelstra closed this Jun 28, 2014
@huonw
Copy link
Contributor

huonw commented Jun 28, 2014

For one thing, BitvSet exposes several methods like symmetric_difference, which takes two BitvSets and a function bool -> bool, computes their bitwise symmetric difference, runs the function on each bit, and returns true iff the function accepted on every bit. This function is not used anywhere in rustc, its functionality is IMHO very specified, its name does not match its functionality, and there is no documentation. My feeling is that these functions should be removed.

I think this should be converted into an external iterator (e.g. the equlvaient things on TreeSet). I would assume they were just missed in the internal -> external iterator changes (since, as you say, it hasn't changed for a long time).

(Thanks for taking on this task ❤️)

@alexcrichton
Copy link
Member

For one thing, BitvSet exposes several methods like symmetric_difference

Indeed, as @huonw mentioned, these are ancient methods! If they don't make it through in the rewrite, it's probably not the end of the world.

Also, the existing code is not very performant at all. For example, on every BitvSet set operation the current size (number of ones in the bit vector) is effectively recomputed twice.

Oh dear!

Why not use num::Int::count_ones which is a passthrough to the appropriate variant of intrinsics::ctlz64?

We weren't clever enough when this was written :)

So I'm going to close this, and I'll submit a new PR "revamp Bitv and BitvSet" with my suggested changes spread across several commits

Awesome! I'll definitely chime in with @huonw and say thanks for taking this on!

bors added a commit that referenced this pull request Jul 5, 2014
The types `Bitv` and `BitvSet` are badly out of date. This PR:
- cleans up the code (primarily, simplifies `Bitv` and implements `BitvSet` in terms of `Bitv`)
- implements several new traits for `Bitv`
- adds new functionality to `Bitv` and `BitvSet`
- replaces internal iterators with external ones
- updates documentation
- minor bug fixes

This is a significantly souped-up version of PR #15139 and is the result of the discussion there.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants