Python indexing vs. bit indexing #156
Bitstrings are always interpreted as MSB-first, so I'm assuming you're asking for indexing for LSB-first data. Well, you can use standard Python negative indices, so for LSB-first bit n you ask for … I've never been able to think of a good interface for LSB-first bitstrings. You could have a per-module or per-bitstring flag that says 'LSB-first', but then, as well as indexing differences, there are changes to many of the properties (… A few people have asked, though, so perhaps I should give it a bit more thought. |
Well, you understand the trouble, right? (data[-n-1] works, but the minus …) Take '1100': bit [0] of '1100' will be the leftmost '1', but you're expecting the rightmost '0'. Do you see now? I'm guessing that this way of indexing would break the complete library and … (In reply to Scott Griffiths, Wed, Feb 24, 2016 at 10:35 AM.)
|
Ah OK, you're talking about the LSB 0 bit numbering you linked to, which isn't the same as LSB-first. MSB- or LSB-first concerns the order of the bits in the bitstream, whereas MSB 0 or LSB 0 refers to whether the first or last bit in the bitstream is the '0' bit. So for unsigned integer 12 [this is mostly for my reference]: MSB-first, MSB-0 (the default); MSB-first, LSB-0; LSB-first, LSB-0; LSB-first, MSB-0. OK, no wonder everyone gets confused (myself included). So I think you want MSB-first, LSB-0. So an 'LSB-0' mode wouldn't change any of the bitstring interpretations (… You would want … Hmmm, so if a bitstring (or the module) has the new … So if you say … The change would also affect methods which can specify bit positions: … There must be a better way... |
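For reference, the four combinations for the unsigned integer 12 can be tabulated with a small pure-Python sketch. It does not use bitstring, and the helper name `layout` is hypothetical. It assumes a 4-bit width and labels bits by significance (under LSB-0, bit i has weight 2**i; under MSB-0, weight 2**(width-1-i)), so the stored order and the numbering vary independently.

```python
def layout(value: int, width: int, lsb_first: bool, lsb0: bool):
    """Hypothetical helper: return (stored, numbered) for one convention.

    `stored` is the bit string in stream order; `numbered` lists the bits
    in index order 0, 1, 2, ... under the chosen numbering.
    """
    msb_first = format(value, f"0{width}b")          # 12, width 4 -> '1100'
    stored = msb_first[::-1] if lsb_first else msb_first
    # Bit i under LSB-0 has weight 2**i; under MSB-0 it has weight 2**(width-1-i).
    weights = range(width) if lsb0 else range(width - 1, -1, -1)
    numbered = "".join(str((value >> w) & 1) for w in weights)
    return stored, numbered


for lsb_first in (False, True):
    for lsb0 in (False, True):
        stored, numbered = layout(12, 4, lsb_first, lsb0)
        order = "LSB-first" if lsb_first else "MSB-first"
        zero = "LSB-0" if lsb0 else "MSB-0"
        print(f"{order}, {zero}: stored={stored} bits 0-3={numbered}")
```

The numbering column depends only on the LSB-0/MSB-0 choice, which is why an 'LSB-0' mode need not change how the stored bits are interpreted.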
Right, exactly... Confusing as hell... and we did not even start on … How are the bits defined? MSB-first, LSB-0. Datasheet says: bits 7-6 do … No changes to internal interpretations indeed; everything Python is … (In reply to Scott Griffiths, Wed, Feb 24, 2016 at 11:44 AM.)
|
Cool, although my experience is that everyone disagrees about what the 'natural' and 'obvious' ways to do things are; it depends upon your field of interest. So my other thought was that you might not expect … Which hurts my head a little, but probably less than the alternative. So I think that is now back to replacing each index … If you just wanted a way of using … It could be a flag in the bitstring object, but that would get confusing when working with multiple bitstrings with different flags. Or it could be a module-wide flag (like … OK, that just about makes a sane feature request, I think! |
Right, completely agree with you that using general statements like 'natural' and 'obvious' is completely useless here, just like choosing which 'endianness' is the correct one :) I forgot to talk about the indexing example you've shown; I think it's not accurate.
So 'unwrapping' the index [0:3] would be … so the 'easier to understand' way of doing the slicing would be from left to right, not right to left: … You're right that mixing two styles of indexing in one program would get very confusing, so maybe a module-wide flag could be an option. |
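The 'unwrapping' of an LSB-0 slice can be made concrete: a half-open LSB-0 slice [start:end] covers bits start to end-1 counted from the right, which is the ordinary Python slice [n-end:n-start] of the MSB-first string, read left to right. A minimal sketch; `lsb0_slice` is a hypothetical helper, not a bitstring function.

```python
def lsb0_slice(bits: str, start: int, end: int) -> str:
    """Hypothetical helper: LSB-0 bits start..end-1 (bit 0 = rightmost char).

    The result keeps the stored left-to-right (MSB-first) order.
    """
    if not 0 <= start <= end <= len(bits):
        raise IndexError("LSB-0 slice out of range")
    n = len(bits)
    return bits[n - end:n - start]


data = "10110010"
print(lsb0_slice(data, 0, 3))  # bits 2, 1, 0 -> '010'
print(lsb0_slice(data, 5, 8))  # bits 7, 6, 5 -> '101'
```

Keeping start < end and returning the bits in stored order means the slice's value as an unsigned integer equals the corresponding bit field of the whole value.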
I figured I'd just chime in here. I'm using bitstring to parse some registers on a system-on-module. I (incorrectly) assumed that index 0 would correspond to the LSB, as that is pretty standard across the electronic engineering world. I very rarely come across datasheets that reference bit 0 (or 1) as the MSB. A module-wide flag to change the behaviour would be awesome. |
I had a need for this some time back and emailed Scott about it at the time. Since then I've mostly solved my own problem by monkeypatching bitstring to add a .lbin interpreter/constructor arg, which does the same thing as .bin but reverses the bits in each byte; I can't say that's the best solution, but it worked for me. It probably does strange things with slices. I had a bit of an epiphany this morning about how these things come up:
This seems like a pretty common pattern for packing multiple boolean values into a single byte. If the flags are mostly unrelated, it doesn't matter, but if the order is important somewhere else (e.g. something references flag #n rather than specifically FLAG4), you need LSB-0 indexing or things break. I would expect this to be more common than the alternative in most binaries, though people tend to write MSB-first instead, presumably because that's how we write decimal numbers. One peculiarity of my own case is that I sometimes have to deal with LSB-0 bitfields and sometimes MSB-0. I've yet to come across both in the same program run, but I can't be sure it won't happen. (Another peculiarity is that I'm more concerned with the order that comes out of .bin than with slicing order, but there are ways around that.) Thirding the request for this feature or something like it; my monkeypatch works but it's a bit of a wart. |
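The flag-packing pattern described above can be sketched in plain Python (no bitstring involved; the names FLAG0 to FLAG4, `flag_is_set` and `lbin` are hypothetical). With LSB-0 numbering, flag #n is simply bit n, i.e. `1 << n`. A per-byte bit reversal, in the spirit of the `.lbin` idea, makes string index 8*k + n line up with bit n of byte k.

```python
# Hypothetical flag constants: flag #n lives in bit n (LSB-0 numbering).
FLAG0, FLAG1, FLAG2, FLAG3, FLAG4 = (1 << n for n in range(5))


def flag_is_set(byte: int, n: int) -> bool:
    """True if flag #n (bit n, counted from the least significant end) is set."""
    return (byte >> n) & 1 == 1


def lbin(data: bytes) -> str:
    """Binary string with the bits of each byte reversed (LSB of each byte first).

    Hypothetical stand-alone version of the .lbin idea: string index 8*k + n
    is bit n of byte k.
    """
    return "".join(format(b, "08b")[::-1] for b in data)


status = FLAG1 | FLAG4                 # 0b00010010
print(format(status, "08b"))           # MSB-first display: 00010010
print(lbin(bytes([status])))           # LSB-first per byte: 01001000
print([n for n in range(8) if flag_is_set(status, n)])  # [1, 4]
```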
I'd like to voice my support for this issue. I completely agree with @kubaraczkowski that the slicing format should be LSB=0. To @scott-griffiths' point on Feb. 24, the display should stay as-is (MSB-first), but when slicing, … This is very much a standard (and some would say sacred) way of accessing bits across all of computer science. I'm going to take a stab at modifying the backend code to support this, as I need the functionality this week :) No idea if it will be worthy of submitting a pull request, quality-wise. We'll see; I'm up for the challenge. |
This functionality would be very valuable, but I would hate to see it directed by flags. What do you think of exposing an LSB-0 accessor as a separate attribute next to …? It could work like this:
So you have to be aware that this indexing works inversely compared to regular Python indexing, but that's exactly what we want to have. One could debate whether '… |
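A minimal sketch of the separate-accessor idea, assuming the data is available as an MSB-first '0'/'1' string (for example from a bitstring's .bin). `LSB0View` is a hypothetical name, not part of bitstring; in a real integration it could be exposed as an attribute alongside the normal indexing, leaving ordinary Python indexing untouched.

```python
class LSB0View:
    """Hypothetical read-only LSB-0 view over an MSB-first bit string.

    view[0] is the rightmost (least significant) bit; view[a:b] returns
    bits a..b-1 in their stored left-to-right order.
    """

    def __init__(self, bits: str):
        self._bits = bits

    def __len__(self) -> int:
        return len(self._bits)

    def __getitem__(self, key):
        n = len(self._bits)
        if isinstance(key, slice):
            start, stop, step = key.indices(n)
            if step != 1:
                raise ValueError("only step-1 slices are supported in this sketch")
            stop = max(stop, start)          # empty slice if stop < start
            return self._bits[n - stop:n - start]
        if key < 0:                          # view[-1] is the leftmost (first stored) bit
            key += n
        if not 0 <= key < n:
            raise IndexError("bit index out of range")
        return self._bits[n - 1 - key]


view = LSB0View("11000101")
print(view[0], view[1], view[2])  # 1 0 1
print(view[0:3])                  # 101
print(view[6:8])                  # 11
```

The inverse direction is confined to the view object, so code that mixes both styles has to opt in explicitly rather than depend on a global flag.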
The above functionality actually uses very little code:
|
Hacked my own LSB extractor as a result of device data being packed in this manner.

```python
from bitstring import BitStream


def _subbits_lsb(bit_string, start_pos, num_bits, extract_fmt):
    """
    Take a string of bits (e.g. from BitStream(...).bin) and extract uints reading from the right.
    :param start_pos: the start position, from the RIGHT, where RIGHT-most bit is position 0
    :param num_bits: the number of bits to collect, RIGHTWARDS
    :param extract_fmt: Bitstring read() format
    """
    # Convert the right-based (LSB-0) range into an ordinary Python slice
    f_pos = 0 - num_bits - start_pos
    e_pos = len(bit_string) - start_pos
    sub_bits = bit_string[f_pos:e_pos]
    return BitStream(bin=sub_bits).read(extract_fmt)


# Tests
assert _subbits_lsb('0001', 0, 2, 'uint') == 1
assert _subbits_lsb('110000', 4, 2, 'uint') == 3
assert _subbits_lsb('0011110000', 6, 3, 'uint') == 3
assert _subbits_lsb('0000101000000', 6, 3, 'uint') == 5
```
|
Except you mean "the number of bits to collect, LEFTWARDS". |
Yes, it definitely does collect R/W, sorry. Dang copy-and-paste omission from the lefty version. |
This is quite an old thread, but finally there is some movement. I've just released 3.1.7, which has an experimental Least Significant Bit Zero mode. If anyone has any feedback, please don't hesitate. From the release notes:

Experimental LSB0 mode

This feature allows bitstring to use Least Significant Bit Zero … To switch from the default MSB0, use the module level function …

Getting and setting bits should work in this release, as will some … Slicing is still done with the start bit smaller than the end bit.

Negative indices work as (hopefully) you'd expect, with the first stored … |
I still have a need for this and I want to check it out and give feedback, but I'm booked for the next couple weeks. How long do you expect it to stay experimental before finalizing the interface? |
Oh it will be at least a few weeks, quite possibly a couple of months. Only got round to it now because I had time off work and there's not much else to do! There are still a whole load of untested methods that use the start and end positions, plus BitStream methods that have the concept of a current bit position. So I need to write maybe 30 more tests and fix everything. I just noticed that even append/prepend which I thought were working aren't - I wrote the test the wrong way round. It's sometimes difficult to get used to LSB0. |
Okay, I had a look. Either something is still wrong, or we have different ideas about correct behavior. I created a couple of Bits objects to test with, like this:
Then I tried a few variations on interpretation. It seemed mostly right for the one-byte case, and mostly wrong for the multi-byte case. Here and below, when I refer to bit #n I mean the nth least significant bit of a given byte; when I refer to byte #n, I mean the nth byte of the …
I can see an argument for the … It's the rest that confuses me. I haven't looked at the code, but it looks to me like the lsb0 mode now indexes bits like this, assuming a four-byte bitstring created as …:
Whereas I expected this:
That is, the new mode isn't just changing the indexing of bits within a byte; it's changing the order of bytes within the stream. |
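The two readings of "bit i" of a multi-byte stream can be stated as code. A minimal sketch with hypothetical helper names, using the definitions of bit #n and byte #n given above. The first helper numbers bits across the whole stream from its last bit, so byte #0 ends up holding the highest indices; this is how the reported behaviour reads. The second keeps bytes in stream order and numbers bits LSB-0 inside each byte, which is presumably the expected behaviour.

```python
def bit_whole_stream(data: bytes, i: int) -> int:
    """Bit i when bit 0 is the last (least significant) bit of the whole stream."""
    return (int.from_bytes(data, "big") >> i) & 1


def bit_per_byte(data: bytes, i: int) -> int:
    """Bit i when bytes keep stream order and bits are LSB-0 within each byte."""
    return (data[i // 8] >> (i % 8)) & 1


data = bytes([0x01, 0x00, 0x00, 0x00])   # only bit #0 of byte #0 is set
print([i for i in range(32) if bit_whole_stream(data, i)])  # [24]
print([i for i in range(32) if bit_per_byte(data, i)])      # [0]
```

Both agree for a single byte and diverge as soon as there is more than one, which matches the "mostly right for one byte, mostly wrong for multi-byte" observation.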
I just noticed a concise way to define that last part:
(I think I have that right... [edit: note that I don't know if others in this thread had the same expectation]) Unrelated: indexing and iteration don't match. That is:
Yes, this is complex, partly because some things are known not to work. So we have … I think you may have had the expectation that on the creation of … I won't go through every case as there are still bugs (notably here the …
I wouldn't expect the commutativity you mention. Standard Python indexing is left-to-right, so when you use … Hope that makes sense! I think that is all OK (except for interpretation bugs), but it's good to run through sanity checks. Feel free to argue though! Thanks. |
Pardon the delay again. We still have a disconnect, but I think I've figured out what it is. I think this part shows that we have very different assumptions about what 'lsb0 mode' should do:
...because that seems very wrong to me, and I want to make sure I understand you correctly. Disclaimer: everything below here is me trying to guess what you're thinking; if I'm wrong, correct me:

```python
from bitstring import Bits

value_in = 1
value_out = Bits(bytes([value_in])).uint
# does value_in == value_out?
```

It seems obvious to me that the value I get out should be the same as the value I put in; after all, I haven't changed it. But 128 is what you would get if you reversed the significance of the bits, and I think that might be where our disconnect is. When I refer to lsb0, I'm talking about the order in which bits are indexed, sliced, and iterated: not which bit in a given byte is most or least significant, but which bit is labeled as bit '0'. Based on the example above, it looks like you expect 'lsb0 mode' to reverse the significance of the bits themselves, changing not their labeling but their interpretation. That's not what I wanted, and it's not how I interpret the relevant section of Wikipedia (sorry, I couldn't find a better source). Have I understood you so far? I'm happy to argue about correct behavior, but I want to make sure I'm arguing against the right thing. |
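The value_in/value_out question can be checked without bitstring. This sketch uses plain `int.from_bytes(..., "big")` in place of `Bits(...).uint`; the two should agree for a whole byte under the ordinary interpretation, but that equivalence is an assumption of the sketch. It contrasts relabelling the bits with reversing their significance.

```python
value_in = 1
data = bytes([value_in])                      # b'\x01', stored as 00000001

value_out = int.from_bytes(data, "big")       # ordinary interpretation
print(value_out)                              # 1

# Reversing the *significance* of the bits (not just their labels) gives 128.
reversed_significance = int(format(data[0], "08b")[::-1], 2)
print(reversed_significance)                  # 128

# LSB-0 *labelling* only changes which bit is called bit 0; the value is unchanged.
bit0 = value_out & 1                          # bit 0 under LSB-0 numbering
print(bit0)                                   # 1
```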
Hi, I just read through this thread and I greatly appreciate the LSB0 indexing mode. The last few posts might have introduced some confusion, though... From my point of view (as a computer engineer), the bit indexing and uint conversion (assuming big-endian, meaning most significant byte first) are just fine, as expected! Especially … What @andrew-vant might have expected is … |
Hi, I've just run into this issue while trying to read gzip bitstreams with this library. I'd like to say that I agree (I think) with Andrew that reversing the order of the bytes is not what is expected, although my reasons are a bit different. A string of bytes does not have an "endianness"; that only exists when it's converted to/from an integer. A string of bytes has exactly one expected direction, going from the 0th byte to the nth. So if I write … As it is now, the changes work perfectly if I append one byte at a time, exactly as expected. But if I write more than one byte at once, the byte string is reversed. This has a direct performance impact and a direct dev frustration impact. The way that lsb0 currently works also likely has a performance impact for appending, because internally the strings are being prepended (which is usually slower, but I've not benchmarked it). If that's still not clear (I don't blame you if so), here's a section from RFC 1952 that explains the common programming convention:
Would you be open to a PR that demonstrates the "correct" way? |
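The gzip/deflate convention can be sketched as a tiny bit reader. It assumes the packing rules RFC 1951 (section 3.1.1) gives for deflate: bytes are consumed in stream order, bits are taken from the least significant end of each byte, and non-Huffman values are assembled least significant bit first. `LSBBitReader` is hypothetical and not a bitstring API. Appending more bytes never reorders the earlier ones.

```python
class LSBBitReader:
    """Hypothetical reader for LSB-first packed data (the deflate/gzip convention)."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0  # absolute bit position: byte = pos // 8, bit = pos % 8

    def read(self, nbits: int) -> int:
        """Read an nbits-wide value, least significant bit first."""
        value = 0
        for i in range(nbits):
            byte_index, bit_index = divmod(self._pos, 8)
            if byte_index >= len(self._data):
                raise EOFError("not enough data")
            bit = (self._data[byte_index] >> bit_index) & 1
            value |= bit << i
            self._pos += 1
        return value


reader = LSBBitReader(bytes([0b10110101, 0b00000011]))
print(reader.read(3))   # low 3 bits of byte 0 -> 5
print(reader.read(7))   # remaining 5 bits of byte 0, then 2 bits of byte 1 -> 118
```

Reading 16 bits from two bytes this way gives the same result as `int.from_bytes(..., "little")`, which is one quick sanity check of the convention.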
Version 4.0 now has documented lsb0 support, although I've still put a beta tag on it as there are still likely to be issues when using streaming methods in some cases. I think most use cases will be working though, so I'm closing this issue as done, and any new bugs can be reported as new issues. Thank you all for your patience and comments on this. |
One feature that would be great with bitstring would be to be able to use 'normal bit indexing' (https://en.wikipedia.org/wiki/Bit_numbering).
All is nice for LSB-first data, but that's not the default.
With an MSB-first bitstring, index 0 goes to the MSB, which is ... inconvenient. Sure, it is compliant with Python indexing (like a list), but it's often not what you want when working with binary data.
For instance: "please give me bit 2 out of this byte of data". In "bit indexing" it would be data[2] (counting from 0!), but in bitstring (and Python in general) you're forced to use data[5] (or generally data[len(data) - ix - 1])...
Would it be possible to implement proper indexing for MSB-first data? Or am I missing something that's already in bitstring?
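The mapping in the "bit 2 out of this byte" example can be sketched with plain Python ints and strings rather than bitstring (variable names are illustrative). Bit ix of an n-bit MSB-first string sits at index n - ix - 1, equivalently at the negative index -ix - 1, and matches `(value >> ix) & 1` on the integer.

```python
value = 0b00000100        # a byte with only bit 2 set (LSB-0 numbering)
s = format(value, "08b")  # '00000100', MSB-first like a .bin string

ix = 2
print((value >> ix) & 1)             # integer view of bit 2 -> 1
print(s[len(s) - ix - 1], s[-ix - 1])  # same bit via string indices 5 and -3 -> 1 1
print(s[ix])                         # naive index 2 picks a different bit -> 0
```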