
Portability: add support for big endian and need-aligned-load-store architectures #26

Closed
vincenthz opened this issue Jul 8, 2015 · 17 comments

@vincenthz
Member

tracking issue for supporting more architectures
#24 #25

@nomeata
Contributor

nomeata commented Aug 22, 2015

Just a little factoid to create additional motivation to fix this issue: I just noticed that you have become a dependency of git-annex, a popular tool on little NAS devices, which often run arm or mips.

@egrimley
Contributor

@nomeata
Contributor

nomeata commented Aug 26, 2015

I applied that patch to the Debian directory, and it seems to help on arm64, armel, kfreebsd-i386, ppc64el; compare https://buildd.debian.org/status/logs.php?pkg=haskell-cryptonite&ver=0.6-1&suite=experimental with https://buildd.debian.org/status/logs.php?pkg=haskell-cryptonite&ver=0.6-2&suite=experimental

It still fails for the big-endian architectures: powerpc, s390x, ppc64.

It also uncovered a bug that QuickCheck seems to find only occasionally and that seems to be independent of this patch; I opened issue #29 for it.

@vincenthz
Member Author

@egrimley: the intent of the arch section detecting the endianness is, for endian-constant architectures like x86/amd64, to avoid going through any 'if byteOrder then x else y' branching code. Sadly, in some personal experiments on an unrelated but similar case, GHC was not able to optimise the endianness away. For non-endian-constant architectures like mips/arm, the goal was to do this 'if byteorder then .. else ..' business (not to autodetect/hardcode the endianness of the arch).

Also, one more thing: cryptonite shouldn't depend on any extra packages.
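
A minimal sketch of that dispatch, assuming hypothetical CPP macros that a cabal arch() section could define (these are not cryptonite's actual flags):

{-# LANGUAGE CPP #-}
module Sketch.BE32 (be32) where

import Data.Word (Word32, byteSwap32)

-- | Reinterpret a host-order Word32 as a big-endian value.
be32 :: Word32 -> Word32
#if defined(ARCH_LITTLE_ENDIAN)
be32 = byteSwap32   -- endian-constant LE arch (x86/amd64): no branch emitted
#elif defined(ARCH_BIG_ENDIAN)
be32 = id           -- endian-constant BE arch: identity, no branch emitted
#else
-- variable-endian arch (mips/arm): the 'if byteorder then .. else ..' business,
-- using whatever runtime check is available
be32 w = if hostIsLittleEndian then byteSwap32 w else w
  where hostIsLittleEndian = True  -- placeholder, not a real detection
#endif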

@egrimley
Contributor

If I understand correctly, you're suggesting that the cabal file distinguish three cases (little-endian arch, big-endian arch, variable-endian arch) and then the Haskell code, likewise, have three implementations. That sounds rather complex and difficult to maintain. It might be worth checking if you can, somehow, get GHC to optimise away the endianness test, and then include the appropriate part of the byteorder package (which is tiny) if you don't want the additional dependency.
The funny thing is that when I write C I don't use preprocessor endianness tests at all. I just write stuff like
static inline uint32_t read32le(unsigned char *b)
{
    return b[0] | (uint32_t)b[1] << 8 | (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}
and expect the compiler to do the right thing. GCC 5.2.1 does, in fact, turn that code into a single load instruction on (little-endian) AArch64.
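
A byte-by-byte Haskell analogue, purely as a sketch (whether GHC fuses it into a single 32-bit load, the way GCC does for the C above, is exactly the open question here):

import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word32)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff)

-- endian- and alignment-independent little-endian read, one byte at a time
read32le :: Ptr Word8 -> IO Word32
read32le p = do
    b0 <- peekByteOff p 0 :: IO Word8
    b1 <- peekByteOff p 1 :: IO Word8
    b2 <- peekByteOff p 2 :: IO Word8
    b3 <- peekByteOff p 3 :: IO Word8
    return $  fromIntegral b0
          .|. fromIntegral b1 `shiftL` 8
          .|. fromIntegral b2 `shiftL` 16
          .|. fromIntegral b3 `shiftL` 24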

@nomeata
Contributor

nomeata commented Aug 27, 2015

Also, one more thing: cryptonite shouldn't depend on any extra packages.

It seems that the functionality is also available in your memory package, which you depend on anyway: http://hackage.haskell.org/package/memory-0.8/docs/Data-Memory-Endian.html#v:getSystemEndianness
And looking at that implementation, with getSystemEndianness = LittleEndian set depending on CPP flags, I sincerely expect GHC to optimize a pattern match on that away. This is different from the byteorder package, where byteorder is always defined in terms of some unsafePerformIO.
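
Roughly (paraphrasing, not the exact source of either package, and the CPP macro name is hypothetical), the difference is:

{-# LANGUAGE CPP #-}
import Data.Word (Word8, Word32)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (castPtr)
import Foreign.Storable (peek, poke)
import System.IO.Unsafe (unsafePerformIO)

data Endianness = LittleEndian | BigEndian  -- mirrors Data.Memory.Endian

-- memory-style: a CPP-selected constant, so a case on it scrutinises a
-- known constructor and GHC can drop the dead branch at compile time.
getSystemEndianness :: Endianness
#ifdef ARCH_IS_LITTLE_ENDIAN
getSystemEndianness = LittleEndian
#else
getSystemEndianness = BigEndian
#endif

-- byteorder-style: the value comes out of unsafePerformIO poking memory,
-- so it is opaque to the simplifier and the branch survives.
runtimeEndianness :: Endianness
runtimeEndianness = unsafePerformIO $ alloca $ \p -> do
    poke p (0x01020304 :: Word32)
    b <- peek (castPtr p) :: IO Word8
    return (if b == 4 then LittleEndian else BigEndian)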

@vincenthz
Member Author

Sincere expectations and reality often mismatch in the case of GHC (especially considering the broad range of versions supported); I've seen some pretty horrible stuff happen, so I've learned to be paranoid/careful.

@egrimley
Contributor

From https://github.com/vincenthz/hs-memory/blob/master/memory.cabal I get the impression that memory might have the same problems as cryptonite.

If there is an easy way of getting GHC to optimise away endianness tests then it doesn't seem to be widely known. This page only suggests Template Haskell:

http://stackoverflow.com/questions/29349835/conditionally-compiling-based-on-endianness

So perhaps a good approach would be to use an inefficient but portable run-time endianness test, but push the test up the call tree so that it doesn't hurt performance (much). Instead of having be32Prim test for endianness, you'd get mutableArray32FromAddrBE to test for endianness and select between two implementations, with and without byte-swapping.
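
Something along these lines (just a sketch with made-up names, assuming the Data.Memory.Endian API mentioned above and ignoring the separate alignment question):

import Control.Monad (forM_)
import Data.Memory.Endian (Endianness (..), getSystemEndianness)
import Data.Word (Word32, byteSwap32)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff, pokeElemOff)

-- Copy n big-endian 32-bit words from src into dst in host order; the
-- endianness test runs once per array instead of once per element.
copy32FromBE :: Ptr Word32 -> Ptr Word32 -> Int -> IO ()
copy32FromBE dst src n = case getSystemEndianness of
    BigEndian    -> loop id
    LittleEndian -> loop byteSwap32
  where
    loop f = forM_ [0 .. n - 1] $ \i -> do
        w <- peekElemOff src i
        pokeElemOff dst i (f w)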

@vincenthz
Member Author

@egrimley I've never seen gcc optimise byte-by-byte 8-bit accesses into a single 32-bit access before, but I haven't benchmarked that sort of optimisation in many years ;). Also, by default I would just write the optimised case without really thinking about it.

GHC is much more archaic when it comes to those low-level machine optimisations.

@nomeata
Contributor

nomeata commented Aug 27, 2015

Sincere expectations and reality often mismatch in the case of GHC (especially considering the broad range of versions supported); I've seen some pretty horrible stuff happen, so I've learned to be paranoid/careful.

Ok, I checked: This code

import Data.Memory.Endian (Endianness (..), getSystemEndianness)

foo = case getSystemEndianness of
    LittleEndian -> "le"
    BigEndian -> "be"

main = putStrLn foo

optimizes the check away, leaving Main.main2 = GHC.CString.unpackCString# "le"# in the simplified Core.

@egrimley
Contributor

@vincenthz I hope you realise that the "optimised" C code you're referring to is probably incorrect and that a grumpy C compiler could do something very nasty with it, like assume that the bad code is unreachable and optimise the entire program on that assumption, along the lines of

if (x) *(uint32_t *)p = ...

"That memory access would be illegal, so now we know that x must be zero, so..."

A few years ago you wouldn't have had to worry about that, but modern C compilers really do stuff like that.

@egrimley
Contributor

@nomeata So Data.Memory.Endian does the job, and it seems to do it by #including "MachDeps.h", which is supplied by GHC. That's the quick and easy solution, then. Thanks.

@vincenthz
Member Author

@egrimley: it's incorrect to use {,double,quad}word memory access? Can you point me to relevant information?

@egrimley
Contributor

In general you're not allowed to dereference a pointer unless the pointer value was obtained by taking the address of an object with a compatible type. So I was thinking of a situation in which the compiler can see that p was not obtained thus. You're pretty safe, in practice, if p is an argument passed from code that was generated separately, though if you have char *p and you use both *(int *)p and *(int *)(p + 15) then the compiler can deduce that one of those is unaligned and therefore undefined (regardless of whether the hardware allows unaligned memory access). Undefined behaviour invalidates the entire execution, which is why the compiler can assume that x is zero not just after the if (x) *(uint32_t *)p = ... but beforehand, too. By the way, I'm not an expert language lawyer, so I might be erring on the side of caution here...

@vincenthz
Member Author

Unaligned access is never undefined. The compiler could maybe decide to change the access method (switch to instructions that use bytes), but I don't really see the compiler removing instructions because it thinks they are invalid.

@vincenthz
Member Author

@nomeata I think most issues should be fixed; any chance you can check that the latest version works properly for second-tier archs?

@egrimley
Contributor

egrimley commented Sep 7, 2015 via email
