Bug on 32 bit systems? #56

jgm · 2023-12-06T14:26:48Z

Debian packaging ran into a problem with pandoc on i386, described at jgm/pandoc#9233. I think I've traced it to base64. Here is a Dockerfile that reproduces the segfault:

FROM i386/debian:sid
RUN echo 'deb-src http://deb.debian.org/debian/ unstable main' >> /etc/apt/sources.list
RUN apt update
RUN apt install -y cabal-install libghc-base64-dev libghc-bytestring-dev
RUN printf 'import "base64" Data.ByteString.Base64 (encodeBase64)\n\
import qualified Data.ByteString as B\n\
main = B.readFile "/usr/bin/whoami" >>= print . encodeBase64\n'\
> test.hs
RUN runghc -XPackageImports test.hs

iliastsi · 2023-12-06T14:38:09Z

I reproduced this with ghc-9.0 as well (i.e., use debian:stable instead of debian:sid), in case this helps. In addition, running the base64 test suite on 32-bit fails as well (with a segmentation fault).

jgm · 2023-12-06T14:39:38Z

Nothing special here about /usr/bin/whoami - I just chose a binary I was sure would be available. If you use B.take n to trim down the size of the bytestring that is converted, you'll eventually stop getting the segfault. What I found is that:

The behavior is not entirely deterministic: I would sometimes get different results on different runs.
Between the segfault and an errorless conversion (I didn't check it for correctness), I would sometimes get errors about improper UTF-8 encoding from Data.Text.Encoding. This suggests to me that the bytestring version of the base64 string is not all ASCII, as it should be. But I didn't investigate that further.
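
The prefix-trimming experiment described above can be sketched as a small harness. This is only an illustration, not code from pandoc or base64: `isBase64Byte` and `badPrefixLengths` are hypothetical helpers, and the encoder is passed in as a plain `ByteString -> ByteString` function so the sketch depends only on bytestring. It reports the prefix lengths whose output has the wrong padded length or contains bytes outside the Base64 alphabet. It cannot catch a segfault, which kills the process.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- | True when a byte belongs to the standard padded Base64 alphabet.
isBase64Byte :: Word8 -> Bool
isBase64Byte w =
     (w >= 65 && w <= 90)   -- 'A'..'Z'
  || (w >= 97 && w <= 122)  -- 'a'..'z'
  || (w >= 48 && w <= 57)   -- '0'..'9'
  || w == 43 || w == 47     -- '+' and '/'
  || w == 61                -- '=' padding

-- | Prefix lengths (0..limit) for which the given encoder produces output
-- of the wrong padded length or with bytes outside the Base64 alphabet.
-- Hypothetical helper; the encoder under test is supplied by the caller.
badPrefixLengths :: (B.ByteString -> B.ByteString) -> Int -> B.ByteString -> [Int]
badPrefixLengths encode limit input =
  [ n
  | n <- [0 .. min limit (B.length input)]
  , let out = encode (B.take n input)
  , B.length out /= 4 * ((n + 2) `div` 3) || not (B.all isBase64Byte out)
  ]

main :: IO ()
main = do
  let input = B.pack [0 .. 63]
      -- Stand-in encoders for demonstration only (NOT real Base64):
      -- one always emits well-formed output, the other corrupts the
      -- first byte once the input is longer than 20 bytes.
      wellFormed bs = B.replicate (4 * ((B.length bs + 2) `div` 3)) 65
      corrupting bs
        | B.length bs > 20 = B.cons 0xFF (B.drop 1 (wellFormed bs))
        | otherwise        = wellFormed bs
  print (badPrefixLengths wellFormed 32 input)
  print (badPrefixLengths corrupting 32 input)
```

To try it against the real library, one would pass the package's ByteString-returning encoder in place of the stand-ins. The exact function name and type differ between base64 releases, so check the version installed.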

emilypi · 2023-12-07T06:49:52Z

Hi John, I deleted all of my 32 bit code because I don't want to support it anymore. Do you have a dire need for it? Also, what version of base64?

emilypi · 2023-12-07T06:58:44Z

Looking into it, I'd wager the memory boundaries for peek got a little stricter in GHC versions in the 9.x series which makes this line:

base64/src/Data/ByteString/Base64/Internal/W64/Loop.hs

Line 51 in 7080ac0

!t <- peekWord64BE src

a little less sound. I can have a fix out rather quickly, but I'd need test data. Can you provide the offending image so I can add it as a regression test? I can just add that to the test suite and support i386 in CI without adding my word32-specific code back into the repo.

jgm · 2023-12-07T18:07:34Z

The Docker container above gives you everything you need to reproduce the problem. Any medium sized binary will work (I just use the whoami executable in the above test, but you could probably use any image).

As noted above, I've been able to reproduce problems (but not deterministically) with even fairly short bytestring literals.

There is no urgency on my end, because I already switched back to using base64-bytestring in pandoc.

emilypi · 2023-12-07T21:25:16Z

If performance isn't your bottleneck, that's certainly a solution. It uses the worst-case-in-every-case approach. That said, on the branch for #57 I can't get it to trigger, even with multiple 32-bit CI attempts, so maybe the diagnosis behind the fix was spot on? I'll need a regression test though, so if you end up with something that triggers it more deterministically than not, I'm all ears. I could also just attempt to encode/decode something against whoami and the like as a regression test, but I have no reasonable means of testing on 32-bit systems.

jgm · 2023-12-07T22:15:55Z

With whoami, it never failed to segfault for me. I only got nondeterministic results for very short bytestrings (e.g. the first 20 bytes).

Are there benchmarks of base64 vs base64-bytestring? I may go back if the problem is fixed.

emilypi · 2023-12-07T22:38:50Z

Small bytes

Okay, thanks, I'll focus on that and see what I can dredge up. My guess is that when peeking a 64-bit word off, we cross some memory boundary somewhere that gets tripped on 32-bit systems when there are fewer than 2 words left in the array. The fix should target that case, since prior to the fix, we were checking only whether there were at least six bytes left, not necessarily a full word, and GHC was always lax enough to say "if we attempt to read a full word and it's partial, fill with \NUL." It's actually probably saner to have this fix in regardless.
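
To make the bounds argument above concrete, here is a small arithmetic sketch. It is not the library's loop: it assumes, based only on the description in this comment, that each wide step reads 8 bytes and consumes 6, and all names (`laxIterations`, `strictIterations`, `laxOverread`) are hypothetical. It compares the old "at least six bytes left" condition with a condition that requires the full 8-byte read to stay inside the buffer.

```haskell
-- Assumption for illustration: one wide step reads 8 bytes (a Word64)
-- but only consumes 6 of them.
wideRead, wideStep :: Int
wideRead = 8
wideStep = 6

-- | Iterations taken when the loop only checks "at least six bytes left".
laxIterations :: Int -> Int
laxIterations n = n `div` wideStep

-- | Iterations taken when every 8-byte read must lie within the n bytes.
strictIterations :: Int -> Int
strictIterations n
  | n < wideRead = 0
  | otherwise    = (n - wideRead) `div` wideStep + 1

-- | Bytes read past the end of the buffer by the last lax iteration.
laxOverread :: Int -> Int
laxOverread n
  | k == 0    = 0
  | otherwise = max 0 ((k - 1) * wideStep + wideRead - n)
  where
    k = laxIterations n

main :: IO ()
main = mapM_ report [5, 6, 7, 8, 12, 14, 20]
  where
    report n = putStrLn $
      "n=" ++ show n
        ++ " lax=" ++ show (laxIterations n)
        ++ " strict=" ++ show (strictIterations n)
        ++ " overread=" ++ show (laxOverread n)
```

Under these assumptions the lax condition reads one or two bytes past the end whenever the length is at least 6 and leaves a remainder of 0 or 1 modulo 6. The strict condition leaves those trailing bytes to a narrower tail loop instead.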

Are there benchmarks of base64 vs base64-bytestring? I may go back if the problem is fixed.

Benchmarks exist here, and do compare encodeBase64 and decodeBase64 against the base64-bytestring library. I maintain both libraries, so I regress against my worst case in this package a lot of the time :)

Here's a sample of all recent output: https://github.com/emilypi/base64/blob/72cfd854ee3ba394e6dd7cfa0473d0fe542bf8ad/output.html.

The long and short of it is that base64 gets away with an encode that's roughly 80%-100% faster, and decode is roughly 25%-40% faster for the typed loops, with negligible difference in the untyped loops. So, if you plan on encoding, and in particular encoding large numbers of bytes, I'd use this library. If you're primarily decoding, it's your pick. If you're only encoding small bytestrings, however, I'd honestly probably go with just the base64-bytestring package, because that one has fewer dependencies and the difference you'll see is minimal at best.

jgm · 2023-12-07T23:16:28Z

OK, thanks, that's helpful.

By the way, the initial image I tested with is test/lalune.jpg in the pandoc source distribution. Feel free to use that. But I think just about any image would work.

emilypi · 2023-12-08T00:27:30Z

After discussion with @kozross, I'm pretty sure the issue is as described. Merging and releasing.

jgm · 2023-12-08T16:48:46Z

I don't see a new release yet?

emilypi · 2024-01-11T22:35:23Z

Got caught up in the holidays and forgot. It's up now: https://hackage.haskell.org/package/base64-1.0

emilypi · 2024-01-11T22:35:46Z

Actually, keeping this open just to confirm.

dpwiz · 2024-04-03T11:33:17Z

Don't know if that's related, but we've been hit by SIGBUS/unaligned access on 32bit armv7a device 😟
(base64-1.0 on GHC-8.10.7)

emilypi · 2024-04-03T17:57:35Z

@dpwiz could you give more details? My suspicion would immediately be the encoding loop in that situation, probably because we use 64-bit intrinsics in that particular hot loop. This library dropped support for 32-bit arches a few years ago.

dpwiz · 2024-04-04T15:01:46Z

This library dropped support for 32 bit arches a few years ago.

Oops... Nvm then (:

emilypi · 2024-04-08T01:20:46Z

I'd still like to figure out how to solve it 😄

dpwiz · 2024-04-15T08:12:21Z

Well, that was some Android mobile running a 32-bit ARMv7 chip. Apparently those can't tolerate unaligned reads, so the app got killed.
I don't know much of the details, only the SIGBUS error code and the env description. Should be reproducible in a faithful QEMU, I think.
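
For reference, one common way to sidestep unaligned wide loads is to assemble the word from single-byte reads. The sketch below is only an illustration of that technique, not the library's code or a proposed patch: `peekWord64BEBytewise` is a hypothetical name, and whether this is fast enough for the hot loop is untested.

```haskell
import Control.Monad (foldM)
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word64)
import Foreign.Marshal.Array (withArray)
import Foreign.Ptr (Ptr, plusPtr)
import Foreign.Storable (peekByteOff)

-- | Read 8 bytes starting at the pointer and combine them big-endian.
-- Each access is a single-byte load, so the pointer needs no particular
-- alignment. The caller must still guarantee 8 readable bytes.
peekWord64BEBytewise :: Ptr Word8 -> IO Word64
peekWord64BEBytewise p = foldM step 0 [0 .. 7]
  where
    step :: Word64 -> Int -> IO Word64
    step acc i = do
      b <- peekByteOff p i :: IO Word8
      pure ((acc `shiftL` 8) .|. fromIntegral b)

main :: IO ()
main =
  -- Nine bytes 0..8; reading from offset 1 is deliberately misaligned
  -- relative to the start of the array.
  withArray ([0 .. 8] :: [Word8]) $ \base -> do
    w <- peekWord64BEBytewise (base `plusPtr` 1)
    print (w == 0x0102030405060708)
```

The trade-off is eight narrow loads instead of one wide one, so a real fix might only take this path on architectures that fault on unaligned access.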

emilypi · 2024-04-15T14:10:12Z

I'll see if I can work it out - thanks for the lead @dpwiz :)

jgm mentioned this issue Dec 6, 2023

Tests fail for pandoc-3.0.1 on 32-bit architectures jgm/pandoc#9233

Closed

emilypi mentioned this issue Dec 7, 2023

[Bugfix] bug in 32-bit peek #57

Merged

emilypi closed this as completed Jan 11, 2024

emilypi reopened this Jan 11, 2024
