ACP: Add functions for "compress bits" and "expand bits" to unsigned integers

# Proposal

## Problem statement

There's currently no way to take advantage of compiler optimizations or hardware instructions if a naive implementation is written to extract multiple bits from an integer or deposit those bits into an integer according to a mask.

These operations (known as parallel bit extract/deposit, gather/scatter, or compress/expand) are useful building blocks for algorithms and data structures, and they are backed by hardware instructions on some platforms.

I will refer to these functions as some variant of "bit expand/compress" since I find those names more descriptive.

## Motivating examples or use cases

Bit expand and compress would be good candidates for `std` inclusion because they are non-trivial to implement efficiently and may require the use of special intrinsics from the compiler to optimize portably according to the target platform, like the other bit operations of `trailing_zeros` and `count_ones`.

Expand and compress have use cases for working with variable length encodings (varints, Unicode), bit interleaving/deinterleaving (Morton codes/space-filling curves), perfect hashing/lookup tables, parsing tasks, and succinct data structures. They're also useful for expressing operations that would ordinarily take several masks and shifts, and simplifying that to one function call with a specified mask.

I'm not aware of any pattern that will get optimized into the desired instruction on x86 so the intrinsics must be used.
LLVM doesn't currently have a generic intrinsic but there has been discussion on the Discourse and there's a C++ proposal for C++26 that would include the functionality so it may be implemented in the future.

## Solution sketch

A naive implementation is shown for each function. Nonzero and signed integers would be omitted.

```rust
/// Returns `self` with the bit locations specified by `mask` packed
/// contiguously into the least significant bits of the result.
/// ```
/// let n = 0b1011_1100;
///
/// assert_eq!(compress_bits(n, 0b0010_0100), 0b0000_0011);
/// assert_eq!(compress_bits(n, 0xF0), 0b0000_1011);
/// ```
pub const fn compress_bits(self_: u32, mask: u32) -> u32 {
    let mut result = 0;
    let mut i = 0;
    let mut j = 0;
    while i < u32::BITS {
        let mask_bit = (mask >> i) & 1;
        result |= (mask_bit & (self_ >> i)) << j;
        j += mask_bit;
        i += 1;
    }
    result
}

/// Returns a result with the least significant bits of `self` distributed to
/// the bit locations specified by `mask`.
/// ```
/// let n = 0b1010_1101;
///
/// assert_eq!(expand_bits(n, 0b0101_0101), 0b0101_0001);
/// assert_eq!(expand_bits(n, 0xF0), 0b1101_0000);
/// ```
pub const fn expand_bits(self_: u32, mask: u32) -> u32 {
    let mut result = 0;
    let mut i = 0;
    let mut j = 0;
    while i < u32::BITS {
        let mask_bit = (mask >> i) & 1;
        result |= (mask_bit & (self_ >> j)) << i;
        j += mask_bit;
        i += 1;
    }
    result
}
```

A more optimal way to implement this is with an XOR parallel suffix as discussed in Hacker's Delight, or with native instructions.
The function can optimize to only a few instructions when the mask is known, [as shown here](https://eisenwave.github.io/cpp-proposals/bit-permutations.html#example-24d2a732).

## Alternatives

1. Do nothing and rely on third party crates implementing this in software or wrappers around the platform intrinsics.
2. Wait for LLVM to implement the intrinsic.

As for naming, I think suitable Rust names would be either of the first two.
"Extract" is in my opinion too overloaded with meaning to be a good name for what it does.
- `compress_bits`, `expand_bits`
- `gather_bits`, `scatter_bits`
- `extract_bits`, `deposit_bits`

## Links and related work

[PEXT](https://www.felixcloutier.com/x86/pext), [PDEP](https://www.felixcloutier.com/x86/pdep) - Intel BMI2 (introduced with Haswell BMI2)  
[BEXT](https://dougallj.github.io/asil/doc/bext_z_zz_64.html), [BDEP](https://dougallj.github.io/asil/doc/bdep_z_zz_64.html) - Arm SVE2  
APL's [Replicate](https://aplwiki.com/wiki/Replicate) and [Expand](https://aplwiki.com/wiki/Expand)  
https://orlp.net/blog/extracting-depositing-bits/  
https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set#Parallel_bit_deposit_and_extract  
https://eisenwave.github.io/cpp-proposals/bit-permutations.html#interleaving-bits - C++26 proposal examples  
https://discourse.llvm.org/t/how-would-i-go-about-implementing-new-bit-manipulation-builtins-for-my-proposal-especially-generic-builtins/76523  

## What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team [feature lifecycle]. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

[feature lifecycle]: https://std-dev-guide.rust-lang.org/development/feature-lifecycle.html

## Possible responses

The libs team may respond in various different ways. First, the team will consider the *problem* (this doesn't require any concrete solution or alternatives to have been proposed):

- We think this problem seems worth solving, and the standard library might be the right place to solve it.
- We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

- We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
- We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ACP: Add functions for "compress bits" and "expand bits" to unsigned integers #695

Proposal

Problem statement

Motivating examples or use cases

Solution sketch

Alternatives

Links and related work

What happens now?

Possible responses

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

ACP: Add functions for "compress bits" and "expand bits" to unsigned integers #695

Description

Proposal

Problem statement

Motivating examples or use cases

Solution sketch

Alternatives

Links and related work

What happens now?

Possible responses

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions