Endianess #106

Open
kloetzl opened this issue Jan 7, 2019 · 2 comments
Comments

@kloetzl
Contributor

kloetzl commented Jan 7, 2019

Murmur3 is sensitive to the endianness of the system and can therefore produce different results on different platforms. On Debian this leads to breakage on big-endian systems (log) (bug tracker). Can mash on a big-endian machine produce the same output as on a little-endian one? Maybe the tests should allow a little wiggle room in the numbers?

Best, Fabian

```cpp
// Block read - if your platform needs to do endian-swapping or can only
// handle aligned reads, do the conversion here
FORCE_INLINE uint32_t getblock32 ( const uint32_t * p, int i )
{
  return p[i];
}
FORCE_INLINE uint64_t getblock64 ( const uint64_t * p, int i )
{
  return p[i];
}
//-----------------------------------------------------------------------------
```
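
The comment in the snippet above names the block read as the place for endian conversion. Below is a minimal sketch of such a conversion, assuming the goal is to read every block as little-endian so that a big-endian host sees the same values as a little-endian one. The names `getblock32_le` and `getblock64_le` are hypothetical, and this is not necessarily what the Debian package or any particular portable MurmurHash implementation does.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical replacement for getblock32: assemble the i-th 32-bit block
// from individual bytes, least significant byte first. The result does not
// depend on the host byte order. i may be negative, as in the x86_32 variant
// of MurmurHash3, which indexes backwards from the end of the block range.
inline uint32_t getblock32_le(const void* p, int i)
{
    const unsigned char* b =
        static_cast<const unsigned char*>(p) + 4 * static_cast<std::ptrdiff_t>(i);
    return  static_cast<uint32_t>(b[0])
         | (static_cast<uint32_t>(b[1]) << 8)
         | (static_cast<uint32_t>(b[2]) << 16)
         | (static_cast<uint32_t>(b[3]) << 24);
}

// Hypothetical replacement for getblock64, same idea for 64-bit blocks.
inline uint64_t getblock64_le(const void* p, int i)
{
    const unsigned char* b =
        static_cast<const unsigned char*>(p) + 8 * static_cast<std::ptrdiff_t>(i);
    uint64_t v = 0;
    for (int k = 7; k >= 0; --k)   // start with the most significant byte
        v = (v << 8) | b[k];
    return v;
}
```

Reading byte by byte also sidesteps unaligned loads. The block read may not be the only byte-order-dependent step: the way the final hash words are written to the output buffer and read back by the caller would likely need the same treatment.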

@ondovb
Member

ondovb commented Mar 18, 2019

Passing tests with wiggle room would be fine for locally created sketches, but there would be spurious results against the pre-built sketches that we distribute (like RefSeq). Including endianness in the sketch metadata to enforce compatibility is a possibility, but ideally we would want to generate the same hashes on big- and little-endian systems. I don't know how to make that happen without doing more research; any insight is appreciated!
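
The metadata option mentioned above could look roughly like the following sketch. It assumes a one-byte tag stored with each sketch; the tag values and the names `hostIsLittleEndian`, `hostEndiannessTag`, and `sketchCompatible` are hypothetical and not part of mash's sketch format.

```cpp
#include <cstdint>
#include <cstring>

// Returns true when the host stores the least significant byte first.
inline bool hostIsLittleEndian()
{
    const uint32_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);   // inspect the first byte in memory
    return first == 1;
}

// Hypothetical metadata value: 0 = little-endian, 1 = big-endian.
inline uint8_t hostEndiannessTag()
{
    return hostIsLittleEndian() ? 0 : 1;
}

// Hypothetical load-time check: refuse to compare against a sketch whose
// hashes were computed under a different byte order.
inline bool sketchCompatible(uint8_t sketchTag)
{
    return sketchTag == hostEndiannessTag();
}
```

A check like this only prevents spurious comparisons. It does not make hashes agree across platforms, so pre-built sketches would still be unusable on hosts with the other byte order.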

@kloetzl
Contributor Author

kloetzl commented Mar 18, 2019

This is kinda academic because very few people other than Debian actually use big-endian machines. In order to distribute mash on Debian and its derivatives, we now link against a portable implementation of MurmurHash. You could do the same and incorporate the changes.

W.r.t. the sketches, yeah that's tricky. Quoting the Capnp website here:

But doesn’t that mean the encoding is platform-specific?
NO! The encoding is defined byte-for-byte independent of any platform. […] Integers use little-endian byte order because most CPUs are little-endian, and even big-endian CPUs usually have instructions for reading little-endian data.

So you should be fine there? Unfortunately, I do not have a big-endian machine for testing.
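
The principle in the quote, a byte-for-byte defined encoding, can be illustrated with a small sketch. It assumes a 64-bit hash value that should be written out identically on every host; `storeLE64` is a hypothetical helper and not part of the Cap'n Proto API, which handles this internally.

```cpp
#include <cstdint>

// Hypothetical helper: write a 64-bit value as eight bytes, least
// significant byte first, independent of the host byte order.
inline void storeLE64(uint64_t v, unsigned char out[8])
{
    for (int k = 0; k < 8; ++k)
        out[k] = static_cast<unsigned char>((v >> (8 * k)) & 0xff);
}
```

Because the shifts operate on the value rather than on its memory layout, the same bytes come out on big- and little-endian machines.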
