
Portability: add support for big endian and need-aligned-load-store architectures #26

Closed
vincenthz opened this issue Jul 8, 2015 · 17 comments

@vincenthz
Member

tracking issue for supporting more architectures
#24 #25

@nomeata
Contributor

nomeata commented Aug 22, 2015

Just a little factoid to create additional motivation to fix this issue: I just noticed that you have become a dependency of git-annex, a popular tool on little NAS devices, which often run arm or mips.

@egrimley
Contributor

@nomeata
Contributor

nomeata commented Aug 26, 2015

I applied that patch to the Debian directory, and it seems to help on arm64, armel, kfreebsd-i386, ppc64el; compare https://buildd.debian.org/status/logs.php?pkg=haskell-cryptonite&ver=0.6-1&suite=experimental with https://buildd.debian.org/status/logs.php?pkg=haskell-cryptonite&ver=0.6-2&suite=experimental

It still fails for the big-endian architectures: powerpc, s390x, ppc64.

It also uncovered a bug that QuickCheck seems to find only occasionally and that seems to be independent of this patch; I opened issue #29 for it.

@vincenthz
Member Author

@egrimley: the intent of the arch section detecting the endianness is, for endian-constant architectures like x86/amd64, to avoid going through any 'if byteOrder then x else y' branching code. Sadly, in some personal experiments on an unrelated but similar case, GHC was not able to optimise the endianness away. For non-endian-constant architectures like mips/arm, the goal was to do this 'if byteorder then .. else ..' business (not to autodetect/hardcode the endianness of the arch).

Also, one more thing: cryptonite shouldn't depend on any extra packages.
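
A minimal sketch of that dispatch, assuming hypothetical CPP macros that a cabal arch() section could define (these are not cryptonite's actual flags):

{-# LANGUAGE CPP #-}
module Sketch.BE32 (be32) where

import Data.Word (Word32, byteSwap32)

-- | Reinterpret a host-order Word32 as a big-endian value.
be32 :: Word32 -> Word32
#if defined(ARCH_LITTLE_ENDIAN)
be32 = byteSwap32   -- endian-constant LE arch (x86/amd64): no branch emitted
#elif defined(ARCH_BIG_ENDIAN)
be32 = id           -- endian-constant BE arch: identity, no branch emitted
#else
-- variable-endian arch (mips/arm): the 'if byteorder then .. else ..' business,
-- using whatever runtime check is available
be32 w = if hostIsLittleEndian then byteSwap32 w else w
  where hostIsLittleEndian = True  -- placeholder, not a real detection
#endif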

@egrimley
Contributor

If I understand correctly, you're suggesting that the cabal file distinguish three cases (little-endian arch, big-endian arch, variable-endian arch) and then the Haskell code, likewise, have three implementations. That sounds rather complex and difficult to maintain. It might be worth checking if you can, somehow, get GHC to optimise away the endianness test, and then include the appropriate part of the byteorder package (which is tiny) if you don't want the additional dependency.
The funny thing is that when I write C I don't use preprocessor endianness tests at all. I just write stuff like
static inline uint32_t read32le(unsigned char *b)
{
    return b[0] | (uint32_t)b[1] << 8 | (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}
and expect the compiler to do the right thing. GCC 5.2.1 does, in fact, turn that code into a single load instruction on (little-endian) AArch64.
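
A byte-by-byte Haskell analogue, purely as a sketch (whether GHC fuses it into a single 32-bit load, the way GCC does for the C above, is exactly the open question here):

import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word32)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff)

-- endian- and alignment-independent little-endian read, one byte at a time
read32le :: Ptr Word8 -> IO Word32
read32le p = do
    b0 <- peekByteOff p 0 :: IO Word8
    b1 <- peekByteOff p 1 :: IO Word8
    b2 <- peekByteOff p 2 :: IO Word8
    b3 <- peekByteOff p 3 :: IO Word8
    return $  fromIntegral b0
          .|. fromIntegral b1 `shiftL` 8
          .|. fromIntegral b2 `shiftL` 16
          .|. fromIntegral b3 `shiftL` 24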

@nomeata
Contributor

nomeata commented Aug 27, 2015

Also, one more thing: cryptonite shouldn't depend on any extra packages.

It seems that the functionality is also available in your memory package, which you depend on anyway: http://hackage.haskell.org/package/memory-0.8/docs/Data-Memory-Endian.html#v:getSystemEndianness
And looking at that implementation, with getSystemEndianness = LittleEndian set depending on CPP flags, I sincerely expect GHC to optimize a pattern match on that away. This is different from the byteorder package, where byteorder is always defined in terms of some unsafePerformIO.
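
Roughly (paraphrasing, not the exact source of either package, and the CPP macro name is hypothetical), the difference is:

{-# LANGUAGE CPP #-}
import Data.Word (Word8, Word32)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (castPtr)
import Foreign.Storable (peek, poke)
import System.IO.Unsafe (unsafePerformIO)

data Endianness = LittleEndian | BigEndian  -- mirrors Data.Memory.Endian

-- memory-style: a CPP-selected constant, so a case on it scrutinises a
-- known constructor and GHC can drop the dead branch at compile time.
getSystemEndianness :: Endianness
#ifdef ARCH_IS_LITTLE_ENDIAN
getSystemEndianness = LittleEndian
#else
getSystemEndianness = BigEndian
#endif

-- byteorder-style: the value comes out of unsafePerformIO poking memory,
-- so it is opaque to the simplifier and the branch survives.
runtimeEndianness :: Endianness
runtimeEndianness = unsafePerformIO $ alloca $ \p -> do
    poke p (0x01020304 :: Word32)
    b <- peek (castPtr p) :: IO Word8
    return (if b == 4 then LittleEndian else BigEndian)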

@vincenthz
Member Author

Sincere expectations and reality often mismatch in the case of GHC (especially considering the broad range of versions supported); I've seen some pretty horrible stuff happen, so I've learned to be paranoid/careful.

@egrimley
Contributor

From https://github.com/vincenthz/hs-memory/blob/master/memory.cabal I get the impression that memory might have the same problems as cryptonite.

If there is an easy way of getting GHC to optimise away endianness tests then it doesn't seem to be widely known. This page only suggests Template Haskell:

http://stackoverflow.com/questions/29349835/conditionally-compiling-based-on-endianness

So perhaps a good approach would be to use an inefficient but portable run-time endianness test, but push the test up the call tree so that it doesn't hurt performance (much). Instead of having be32Prim test for endianness, you'd get mutableArray32FromAddrBE to test for endianness and select between two implementations, with and without byte-swapping.
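
Something along these lines (just a sketch with made-up names, assuming the Data.Memory.Endian API mentioned above and ignoring the separate alignment question):

import Control.Monad (forM_)
import Data.Memory.Endian (Endianness (..), getSystemEndianness)
import Data.Word (Word32, byteSwap32)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff, pokeElemOff)

-- Copy n big-endian 32-bit words from src into dst in host order; the
-- endianness test runs once per array instead of once per element.
copy32FromBE :: Ptr Word32 -> Ptr Word32 -> Int -> IO ()
copy32FromBE dst src n = case getSystemEndianness of
    BigEndian    -> loop id
    LittleEndian -> loop byteSwap32
  where
    loop f = forM_ [0 .. n - 1] $ \i -> do
        w <- peekElemOff src i
        pokeElemOff dst i (f w)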

@vincenthz
Member Author

@egrimley I've never seen gcc optimise byte-by-byte 8-bit accesses into a single 32-bit access before, but I haven't benchmarked that sort of optimisation in many years ;). Also, by default I would just write the optimised case without really thinking about it.

GHC is much more archaic when it comes to those low-level machine optimisations.

@nomeata
Contributor

nomeata commented Aug 27, 2015

Sincere expectations and reality often mismatch in the case of GHC (especially considering the broad range of versions supported); I've seen some pretty horrible stuff happen, so I've learned to be paranoid/careful.

Ok, I checked: This code

import Data.Memory.Endian (Endianness (..), getSystemEndianness)

foo = case getSystemEndianness of
    LittleEndian -> "le"
    BigEndian -> "be"

main = putStrLn foo

optimizes the check away, leaving Main.main2 = GHC.CString.unpackCString# "le"# in the simplified Core.

@egrimley
Contributor

@vincenthz I hope you realise that the "optimised" C code you're referring to is probably incorrect and that a grumpy C compiler could do something very nasty with it, like assume that the bad code is unreachable and optimise the entire program on that assumption, along the lines of

if (x) *(uint32_t *)p = ...

"That memory access would be illegal, so now we know that x must be zero, so..."

A few years ago you wouldn't have had to worry about that, but modern C compilers really do stuff like that.

@egrimley
Contributor

@nomeata So Data.Memory.Endian does the job, and it seems to do it by #including "MachDeps.h", which is supplied by GHC. That's the quick and easy solution, then. Thanks.

@vincenthz
Member Author

@egrimley: it's incorrect to use {,double,quad}word memory access? Can you point me to relevant information?

@egrimley
Contributor

In general you're not allowed to dereference a pointer unless the pointer value was obtained by taking the address of an object with a compatible type. So I was thinking of a situation in which the compiler can see that p was not obtained thus. You're pretty safe, in practice, if p is an argument passed from code that was generated separately, though if you have char *p and you use both *(int *)p and *(int *)(p + 15) then the compiler can deduce that one of those is unaligned and therefore undefined (regardless of whether the hardware allows unaligned memory access). Undefined behaviour invalidates the entire execution, which is why the compiler can assume that x is zero not just after the if (x) *(uint32_t *)p = ... but beforehand, too. By the way, I'm not an expert language lawyer, so I might be erring on the side of caution here...

@vincenthz
Member Author

Unaligned access is never undefined. The compiler could maybe decide to change the access method (switch to instructions that use bytes), but I don't really see the compiler removing instructions because it thinks they are invalid.

@vincenthz
Member Author

@nomeata I think most issues should be fixed; any chance you can check that the latest version works properly for second-tier archs?

@egrimley
Contributor

egrimley commented Sep 7, 2015 via email
