New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-29805: Support Arm64 #31
Conversation
include/lsst/sphgeom/codec.h
Outdated
#if defined(__x86_64__) | ||
// x86-64 is little endian. | ||
#if defined(__x86_64__) or defined(__aarch64__) | ||
// x86-64 and Apple Arm64 are little endian. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think __aarch64__
is Apple specific, and AArch64 is not guaranteed to be little-endian: https://developer.arm.com/documentation/102376/0100/Alignment-and-endianness. I'm unsure whether big-endian implementations are a real concern, but maybe there's a preprocessor check for little-endianness that could be added to the one for __aarch64__
(or failing that, __APPLE__
)?
Also, it might be good to provide a way to force the fallback path to get used (like NO_SIMD in other parts of the code), so it can be tested without exotic hardware or cross-compilation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @smonkewitz, both points are good. I've now gone down the rabbit hole of trying to find which compilers on which architectures define a macro that would be useful here, and it looks like at least Apple defines __LITTLE_ENDIAN__
on arm64, so that should handle the most relevant case today without being over-inclusive.
f57454b
to
6afce4f
Compare
The alternate code path was also broken, so I fixed it but it is never used. Added a check for a preprocessor macro NO_OPTIMIZED_PATHS to make it easier to test that path.
6afce4f
to
112c28f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I still worked for LSST, I would approve this. Since I don't, I feel like someone who does should also take a look...
@smonkewitz your money is, in fact, still good here :-) I approve this as well, if that is helpful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been vaguely following the GitHub posts and am happy to sign off if @smonkewitz is.
The conversation did make me think of https://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html. That's probably out of scope for this ticket, if a change in philosophy is even warranted - this is not an area of expertise for me, just something relevant I ran across at one point from someone pretty darn sensible.
I generally agree with the blog post, except that performance sometimes matters. I was thinking that it might matter for these functions (e.g. if a bunch of polygons ended up stored in a database column as binary strings). But I'm not sure that's actually happening, and maybe it's not that big a cost. (I mean, I obviously didn't check what assembly gets generated for the fallback, or I would have caught the brokenness - 🤦 ). So I'd agree it's worth considering whether we actually need the fast path at all in a separate PR. |
Thanks for the panoply of reviews; merged to master. |
@ctslater Did you mean to use a user branch for this? (although looking at sphgeom it was a normal Jira ticket branch that merged so now I'm very confused by what this PR is saying) |
I created a ticket branch after creating the PR, and since we had useful conversations here I didn't want to make a new PR just to switch to the ticket branch. Both branches point to the same commit. |
Very clever. That explains why it thinks it merged even though technically this branch never did 😄 |
There was a broken code path for non-x86-64 architectures, so I fixed that and also added arm64 to the fast path.