RFC: bit fields and bit matching #29
farcaller
changed the title from "Added RFC on bit fields" to "RFC: bit fields and bit matching"
Apr 4, 2014
cmr
reviewed
Apr 4, 2014
```rust
0b00 => ...,
0b01 => ...,
0b10 => ...,
0b11 => ...
```
cmr
reviewed
Apr 4, 2014
# Alternatives

Provide bit extraction macros that would perform the first part of this RFC. This doesn't solve the problem of the second part.
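A minimal sketch of that macro alternative, assuming a `u32` operand and inclusive bounds as in the RFC's `val[4..5]`. The `bits!` name is hypothetical; no existing crate is implied.

```rust
// Hypothetical `bits!` macro: extracts the inclusive bit range `lo..=hi`
// of a `u32`, mirroring the RFC's proposed `val[lo..hi]` syntax.
macro_rules! bits {
    // Single bit: `bits!(val, 0)` corresponds to `val[0]`.
    ($val:expr, $bit:expr) => {
        bits!($val, $bit, $bit)
    };
    // Inclusive range: `bits!(val, 4, 5)` corresponds to `val[4..5]`.
    ($val:expr, $lo:expr, $hi:expr) => {{
        let val: u32 = $val;
        let lo: u32 = $lo;
        let hi: u32 = $hi;
        let width = hi - lo + 1;
        // Shifting a u32 by 32 overflows, so the full-width mask is special-cased.
        let mask = if width >= 32 { u32::MAX } else { (1u32 << width) - 1 };
        (val >> lo) & mask
    }};
}
```

For example, `bits!(val, 4, 5)` evaluates to `(val >> 4) & 3`, and `bits!(val, 0)` to the lowest bit.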
cmr
Apr 4, 2014
Member
Is this possible to do with macros? If we can do it with macros, we don't need to add it to the language. I believe it could be done; I don't think anything really precludes it. I definitely think the match, and possibly the updating, could be done, though extraction might need to use methods, and it might not look as nice.
farcaller
Apr 4, 2014
Author
I think it's possible to do things like val[4..5] with macros, maybe even val[0, 4..5] with recursive macros. I'm not sure how to deal with match, though; as noted on the mailing list, that would require new integer types with arbitrary bit widths.
cmr
Apr 4, 2014
Member
I don't think it'd actually require that. It can just extract it to the smallest uint size that fits and compare against constants, with a fall-through match arm.
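A sketch of that approach for a hypothetical two-bit `mode` field at bits 4..5 of a `u32` register (the field and its values are illustrative only): extract into a `u8` and add a fall-through arm, since the compiler cannot prove that the masked value is at most `0b11`.

```rust
/// Decode a hypothetical 2-bit `mode` field stored at bits 4..=5 of `reg`.
fn mode_name(reg: u32) -> &'static str {
    // `u8` is the smallest unsigned type that holds two bits.
    let mode = ((reg >> 4) & 0b11) as u8;
    match mode {
        0b00 => "off",
        0b01 => "low",
        0b10 => "high",
        0b11 => "auto",
        // Impossible after masking, but required for exhaustiveness.
        _ => unreachable!("a 2-bit field cannot exceed 0b11"),
    }
}
```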
When I say "macros", though, I mostly just mean "any syntax extension", which includes procedural ones (written in pure Rust)
tari
Apr 4, 2014
Contributor
I would personally prefer a macro approach as well. I initially proposed arbitrary-width integers as a workaround due to some confusion about what constituted valid operations in a macro, and haven't been able to come up with a reasonable way to unify the required types with the existing ones in a pleasing way.
While the implementation of this feature would be easier with such arbitrary-width integers, I think it comes with effects on the semantics of too many other language items. In the simplest case, we clutter the 'standard' namespace with a lot of new types that few users will need (u4, i13, ...) and a more generic version would likely need its own unusual semantics (UIntN(4), IntN(13), perhaps).
SimonSapin
reviewed
Apr 4, 2014
```rust
let mut val: u32 = ...;
let bits1 = val[4..5];   // equivalent to bits1 = (val >> 4) & 3
let bits2 = val[0,4..5]; // equivalent to bits2 = ((val >> 4) & 3) | (val & 1)
```
SimonSapin
Apr 4, 2014
Contributor
The previous line is fine, but this might be a bit too magical. It's not obvious that `,` means `|`.
farcaller
Apr 28, 2014
Author
Still, it would be useful to extract non-contiguous bits. Maybe using `|` instead of `,` is a better option?
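One way to define extraction of non-contiguous bits is concatenation rather than an in-place `|`, so the pieces cannot overlap. This sketch assumes the first listed range lands in the least-significant bits and each later range is stacked directly above it; that ordering and the `gather_bits` name are assumptions, not part of the RFC.

```rust
/// Concatenate several inclusive bit ranges of `val` into one value.
/// Assumed convention: the first range occupies the lowest result bits.
fn gather_bits(val: u32, ranges: &[(u32, u32)]) -> u32 {
    let mut out = 0u32;
    let mut shift = 0u32;
    for &(lo, hi) in ranges {
        assert!(lo <= hi && hi < 32, "bit range out of bounds");
        let width = hi - lo + 1;
        assert!(shift + width <= 32, "gathered bits do not fit in a u32");
        let mask = if width == 32 { u32::MAX } else { (1u32 << width) - 1 };
        // Extract this range and place it above the bits gathered so far.
        out |= ((val >> lo) & mask) << shift;
        shift += width;
    }
    out
}
```

Under that convention, `val[0, 4..5]` would be `gather_bits(val, &[(0, 0), (4, 5)])`.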
SimonSapin
reviewed
Apr 4, 2014
```rust
let bits2 = val[0,4..5]; // equivalent to bits2 = ((val >> 4) & 3) | (val & 1)
val[2..7] = 10;          // equivalent to val = (val & (0xffffffff ^ 0xfc)) | (10 << 2)
val[0] = 3;              // doesn't compile, as you can't fit 0b11 into one bit place
```
SimonSapin
Apr 4, 2014
Contributor
What if instead of a literal you have a variable or other expression whose value is not known at compile-time? Do too-big values trigger fail!()?
farcaller
Apr 4, 2014
Author
I don't think it's reasonable to have run-time support for this feature. It could be solved with something like `val[0] = data[0]`, where `data` is of an integer type.
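If run-time values were to be supported after all, a library setter could report an oversized value instead of failing. A sketch with a hypothetical `set_bits` function over `u32`, using the RFC's inclusive ranges; for `val[2..7] = 10` it computes the same result as the RFC's `(val & (0xffffffff ^ 0xfc)) | (10 << 2)`.

```rust
/// Replace the inclusive bit range `lo..=hi` of `val` with `new`.
/// Returns `None` when `new` needs more bits than the range provides.
fn set_bits(val: u32, lo: u32, hi: u32, new: u32) -> Option<u32> {
    assert!(lo <= hi && hi < 32, "bit range out of bounds");
    let width = hi - lo + 1;
    let mask = if width == 32 { u32::MAX } else { (1u32 << width) - 1 };
    if new & !mask != 0 {
        // e.g. 0b11 cannot be stored in a single bit
        return None;
    }
    // Clear the target range, then OR in the shifted new value.
    Some((val & !(mask << lo)) | (new << lo))
}
```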
bill-myers
commented
Apr 4, 2014
I think that it's not necessary to change the language in this way to support this. For bit access, it seems to me that a macro would work. For matching, add a special case to the language so that matching an expression that, after trivial constant propagation/move elimination, is (expr & const), (expr % const), (expr << const), (expr >> const) or a combination of those (and possibly others) is considered exhaustive when all the possible output values are specified. It's also possible to handle matching with integers of any bit-size, but that doesn't quite handle modulus and shifts and so on without adding those too in the typesystem, which seems worse.
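Without such a language change, one library-level workaround is to decode the masked value into an enum once: the single integer `match` still needs a catch-all arm, but every later `match` on the enum is checked for exhaustiveness. The `TwoBits` type is a hypothetical illustration.

```rust
/// Hypothetical enum covering every value of a two-bit field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TwoBits {
    B00,
    B01,
    B10,
    B11,
}

impl TwoBits {
    /// Decode the two lowest bits of `x`.
    fn from_low_bits(x: u32) -> TwoBits {
        match x & 0b11 {
            0 => TwoBits::B00,
            1 => TwoBits::B01,
            2 => TwoBits::B10,
            3 => TwoBits::B11,
            // `x & 0b11` is always 0..=3, but the checker cannot prove it.
            _ => unreachable!(),
        }
    }
}
```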
erickt
commented
Apr 7, 2014
I'm going to vote against this RFC. While I love the idea of bitstring matching, @LeoTestard's awesome rustlex demonstrates it's possible to build a complex pattern matching macro. I don't really see any disadvantages of it being done externally, so I feel this should be done as an external project.
I think something along these lines is necessary for systems programming. I don't really care if it is part of the language or a syntax extension, as long as it is nice. I would rather have this done well in the language than done awkwardly with a syntax extension. Given that this is a fairly core feature for our core audience, I don't see the attraction of having it outside the language if there is any reason not to.
nrc
reviewed
Apr 28, 2014
```rust
let mut val: u32 = ...;
let bits1 = val[4..5]; // equivalent to bits1 = (val >> 4) & 3
```
nrc
Apr 28, 2014
Member
Could you explain this syntax, please? It's not clear to me from the examples.
pczarn
Apr 28, 2014
Consider `123u[32..63]`. Would such an access compile only on platforms other than 32-bit ones?
farcaller
Apr 28, 2014
Author
I think it should be limited to strictly sized types: `123u[32..62]` wouldn't work, and you must use `123u64[32..62]`.
nrc
reviewed
Apr 28, 2014
Provide bit extraction macros that would perform the first part of this RFC. This doesn't solve the problem of the second part.

Erlang has an even better bit matching:
nrc
Apr 28, 2014
Member
Could you show what this would look like in Rust or explain it in words please? I don't understand the Erlang syntax.
farcaller
Apr 28, 2014
Author
Well, this one is actually quite a complex example; I got it from here. I think the Rust version would be something along the lines of:
```rust
let IP_VERSION = 4;
let IP_MIN_HDR_LEN = 5;
let DgramSize = byte_size(Dgram);
match Dgram {
    (
        ref IPVers @ [0..3],
        ref HLen @ [4..7],
        ref SrvcType @ [8..15],
        ref TotLen @ [16..31],
        ref ID @ [32..47],
        ref Flgs @ [48..50],
        ref FragOff @ [51..63],
        ref TTL @ [64..71],
        ref Proto @ [72..79],
        ref HdrChkSum @ [80..95],
        ref SrcIP @ [96..127],
        ref DestIP @ [128..159],
        ref RestDgram @ [160..]
    ) if IPVers == IP_VERSION && HLen >= 5 && HLen * 4 <= DgramSize => {
        // ...
    },
    _ => (),
}
```
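For comparison, roughly the same check can be written in current Rust with slice patterns, a match guard, and shifts over a byte slice. A minimal sketch that decodes only a few fields; the `Ipv4Summary` and `parse_ipv4` names are hypothetical, and a real parser would also validate `TotLen`, the checksum, and so on.

```rust
const IP_VERSION: u8 = 4;
const IP_MIN_HDR_LEN: u8 = 5;

/// The few fields this sketch extracts.
#[derive(Debug, PartialEq)]
struct Ipv4Summary {
    hdr_len: usize, // header length in bytes (HLen * 4)
    tot_len: u16,
    ttl: u8,
    proto: u8,
}

fn parse_ipv4(dgram: &[u8]) -> Option<Ipv4Summary> {
    match dgram {
        // Byte 0 packs IPVers (high nibble) and HLen (low nibble);
        // multi-byte fields are big-endian on the wire.
        &[vers_hlen, _srvc_type, tot_hi, tot_lo, _, _, _, _, ttl, proto, ..]
            if vers_hlen >> 4 == IP_VERSION
                && (vers_hlen & 0x0f) >= IP_MIN_HDR_LEN
                && usize::from(vers_hlen & 0x0f) * 4 <= dgram.len() =>
        {
            Some(Ipv4Summary {
                hdr_len: usize::from(vers_hlen & 0x0f) * 4,
                tot_len: u16::from_be_bytes([tot_hi, tot_lo]),
                ttl,
                proto,
            })
        }
        // Too short, wrong version, or inconsistent header length.
        _ => None,
    }
}
```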
edwardw
May 10, 2014
More formally, a bitstring in Erlang is of the form:
```
<<Segment1, ..., SegmentN>>
```
And a segment is:
```
Segment        = Value | Value:Size | Value/TypeSpecifiers | Value:Size/TypeSpecifiers
TypeSpecifiers = Endianness-Sign-Type-Unit
Endianness     = big | little | native
Sign           = signed | unsigned
Type           = integer | float | binary
Unit           = 1 | 2 | ... | 255
```
Please let us Rustaceans have it :)
pczarn
May 10, 2014
@farcaller: I don't understand how to reference bit-aligned data. Also, how would you create and transmute bitstrings?
I'm convinced that bit matching should use structs.
```rust
struct Dgram {
    ip_vers: Uint<4>,
    hlen: Uint<4>,
    srvc_type: u8,
    total_len: u16,
    id: u16,
    flgs: Uint<3>,
    frag_off: Uint<13>,
    ttl: u8,
    proto: u8,
    hdr_chksum: u16,
    src_ip: u32,
    dest_ip: u32,
}
static IP_VERSION = 4;
static IP_MIN_HDR_LEN = 5;
// in fn(dgram: Dgram, rest: Vec<u8>)
let size = size_of::<Dgram>();
match dgram {
    Dgram {
        ip_vers: IP_VERSION as Uint<4>,
        hlen: hlen,
        srvc_type: srvc_type, total_len: total_len,
        id: id, flgs: flgs, frag_off: frag_off,
        ttl: ttl, proto: proto, hdr_chksum: hdr_chksum,
        src_ip: src_ip, dest_ip: dest_ip,
    } if hlen >= 5 && hlen*4 <= size => {
        let opts_len = 4 * (hlen - IP_MIN_HDR_LEN);
        let (opts, data) = rest.split_at(opts_len);
        // ...
    },
    _ => (),
}
```
farcaller
May 10, 2014
Author
I like how this struct looks, but it's getting close to bitfields of C/C++ that are often frowned upon. I guess the main reason is that byte order is not defined in those, so if we can have structs with explicit alignment and byte order, that would work.
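Explicit byte order is also what a library-level accessor provides. A sketch, assuming the IPv4 flags/fragment-offset word (big-endian on the wire): a hypothetical `FlagsFragOff` newtype stands in for `flgs: Uint<3>, frag_off: Uint<13>` and names the two fields through methods.

```rust
/// Hypothetical newtype over the 16-bit flags/fragment-offset word.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FlagsFragOff(u16);

impl FlagsFragOff {
    /// Decode from wire bytes; the byte order is stated explicitly.
    fn from_wire(bytes: [u8; 2]) -> Self {
        FlagsFragOff(u16::from_be_bytes(bytes))
    }

    /// Top 3 bits: the flags field.
    fn flags(self) -> u8 {
        (self.0 >> 13) as u8
    }

    /// Low 13 bits: the fragment offset.
    fn frag_off(self) -> u16 {
        self.0 & 0x1fff
    }

    /// Encode back to big-endian wire bytes.
    fn to_wire(self) -> [u8; 2] {
        self.0.to_be_bytes()
    }
}
```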
pczarn
May 10, 2014
Struct fields with attributes are certainly possible. I propose the following syntax:
```rust
struct MyData {
    a: u8,
    #[align(4)] b: u8,
    align(16) little { // little endian
        c: int,
        d: uint,
    }
}
```
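Part of this can be expressed with features current Rust already has: `#[repr(C, align(16))]` fixes layout and alignment, and `from_le_bytes` fixes byte order at decode time. A rough sketch, assuming `i32`/`u32` for the 2014-era `int`/`uint` fields; it does not reproduce the proposed per-group syntax.

```rust
/// C-compatible layout, aligned to 16 bytes like the proposed `align(16)` group.
#[repr(C, align(16))]
#[derive(Debug, PartialEq)]
struct LittleBlock {
    c: i32,
    d: u32,
}

impl LittleBlock {
    /// Decode `c` and `d` from 8 little-endian bytes, regardless of host endianness.
    fn from_le(bytes: [u8; 8]) -> Self {
        LittleBlock {
            c: i32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]),
            d: u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]),
        }
    }
}
```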
farcaller
May 10, 2014
Author
That would still require support for arbitrary-sized ints, right? For example, in the case of `Uint<4>`.
pczarn
May 10, 2014
Yes, and those in turn require support for static generic parameters. Another problem is: would all fields have bit alignment by default? What would happen when a `Uint<4>` was followed by a `u8`?
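Const generics, stabilized in Rust 1.51, now provide this kind of static generic parameter. A minimal sketch of a hypothetical `UInt<N>` backed by a `u32`, assuming `1 <= N <= 32`; it checks the value range but does not pack bits.

```rust
/// Hypothetical unsigned integer restricted to `N` bits, stored in a `u32`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct UInt<const N: u32>(u32);

impl<const N: u32> UInt<N> {
    /// Largest value representable in `N` bits.
    const MAX: u32 = if N >= 32 { u32::MAX } else { (1u32 << N) - 1 };

    /// Returns `None` if `v` needs more than `N` bits.
    fn new(v: u32) -> Option<Self> {
        if v <= Self::MAX {
            Some(Self(v))
        } else {
            None
        }
    }

    fn get(self) -> u32 {
        self.0
    }
}
```

Because each `UInt<N>` still occupies a full `u32`, this sketch sidesteps rather than answers the layout question of a `Uint<4>` followed by a `u8`.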
I think it is essential to be able to refer to groups of bits (not just single bits; it's not clear to me whether that is possible here) and to name them. Both are possible in C++. I think there is too much potential for errors without them. There is some good discussion on this reddit thread about what is necessary to have safe and portable bitfields: http://www.reddit.com/r/rust/comments/244yz6/bitfields_in_rust/
pczarn
reviewed
Apr 28, 2014
# Detailed design

The first part of this RFC is the definition of bit access for integer types. For the sake of simplicity, only unsigned integer types (uint, u8, u16, u32, u64) are supported.
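Restricting bit access to unsigned types maps naturally onto a trait implemented only for those types. A sketch with a hypothetical `BitAccess` trait and inclusive ranges as in the RFC, using `usize` in place of the 2014-era `uint`.

```rust
/// Hypothetical trait: bit-range extraction for unsigned integers only.
trait BitAccess: Sized {
    /// Inclusive range `lo..=hi`, like the RFC's `val[lo..hi]`.
    fn bits(self, lo: u32, hi: u32) -> Self;
}

macro_rules! impl_bit_access {
    ($($t:ty),*) => {$(
        impl BitAccess for $t {
            fn bits(self, lo: u32, hi: u32) -> Self {
                let total = <$t>::BITS;
                assert!(lo <= hi && hi < total, "bit range out of bounds");
                let width = hi - lo + 1;
                // Avoid an overflowing shift when the range covers the whole type.
                let mask = if width == total { <$t>::MAX } else { ((1 as $t) << width) - 1 };
                (self >> lo) & mask
            }
        }
    )*};
}

// Signed types deliberately get no implementation.
impl_bit_access!(u8, u16, u32, u64, usize);
```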
dobkeratops
commented
Apr 28, 2014
Over the years I've done a lot of low-level work packing vertex and color formats, building DMA tags and so on. I've always been perfectly happy with C/C++ on this front without bitfields (I always feared them for portability issues, and just used shifts/masks and abstractions of those). It's a long way from the issues that drove me to Rust, and I can think of many other things I'd rather have added to the language today. Ints in generic type params (like C++) would extend what you can do with generic code, e.g. shift/mask values as constants (say, a smart pointer that is a compressed pointer with a shift value for accessing aligned objects within an arena). Better generic type inference (the equivalent of C++ decltype, even auto return types) would help. HKT. If Rust could get something that improves on struct inheritance (generalized delegation of fields and component methods? maybe on tuples? and coercions?), that would be great too. Even array sugar like foo[i,j] would be preferable to dedicated bitfields; if you get improved [] overloading, maybe that kind of array sugar could be used for bitfield access? Some people want slice syntax; maybe that would work.
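The compressed-pointer idea can be sketched with today's const generics. This hypothetical `CompressedOffset<SHIFT>` stores a byte offset into an arena divided by `1 << SHIFT`, so objects aligned to `1 << SHIFT` bytes stay addressable with a `u32`; the names and layout are illustrative assumptions, and `SHIFT` is assumed to be below 64.

```rust
/// Hypothetical compressed arena offset with a compile-time shift.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CompressedOffset<const SHIFT: u32>(u32);

impl<const SHIFT: u32> CompressedOffset<SHIFT> {
    /// Returns `None` if `byte_offset` is misaligned or too large for a `u32`.
    fn new(byte_offset: u64) -> Option<Self> {
        let align_mask = (1u64 << SHIFT) - 1;
        if (byte_offset & align_mask) != 0 {
            return None;
        }
        let scaled = byte_offset >> SHIFT;
        if scaled > u64::from(u32::MAX) {
            return None;
        }
        Some(Self(scaled as u32))
    }

    /// Expand back to a full byte offset.
    fn byte_offset(self) -> u64 {
        u64::from(self.0) << SHIFT
    }
}
```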
esbullington
commented
Apr 28, 2014
I've been playing around using Rust with several binary network protocols. Pattern matching on bits would be incredibly useful. I've experienced no easier parsing of binary wire protocols than when I've used OCaml's bitstring pattern matching syntax extension (based on Erlang's bitstring matching). Pure bliss. If this can be accomplished using macros, great (I still haven't explored Rust's macro system so I'm not sure). But given the interest in Rust from embedded and systems programmers, I would think this would be a great addition to the core language. Bitfields would be great, too, particularly for the embedded programmers.
esbullington
referenced this pull request
Apr 29, 2014
support bit fields for C interop #8680 (Closed)
Does the recent
No. Bit fields/bit matching are unrelated to bitflags.
On Tue, May 6, 2014 at 6:24 PM, Ben Striegel wrote:
SimonSapin
referenced this pull request
May 10, 2014
RFC: Add byte and byte string literals #69 (Merged)
I haven't read this in detail yet, but I think rust-lang/rust#12642 might be related.
Thank you for the RFC. Current policy is to not have bitfields in the language itself and to use syntax extensions. We currently have a
Closing.
brson
closed this
Jun 5, 2014
To elaborate on @brson's comment: If
farcaller commented Apr 4, 2014
No description provided.