Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apple Silicon Support #3290

Closed
6 tasks
rnlahaye opened this issue Jan 21, 2022 · 6 comments
Closed
6 tasks

Apple Silicon Support #3290

rnlahaye opened this issue Jan 21, 2022 · 6 comments

Comments

@rnlahaye
Copy link
Contributor

rnlahaye commented Jan 21, 2022

Principia should have some level of support for M1 Macs.

Right now Principia successfully runs under Rosetta2 translation (but cpuid_test fails).
Ideally, Principia would run natively. In order for this to happen, it first has to be compiled natively. This is a matter of setting the right build flags and hiding Intel intrinsics behind #if guards (I have done this locally).
However, building Principia as a native binary is not enough: native code cannot be run from a translated process, so no arm code can be loaded from KSP (which is x86_64 only). The solution for this is to launch a separate Principia process (I have not attempted this yet); the exact approach would depend on benchmark results.

(Please assign this to me)

@eggrobin
Copy link
Member

(Added indexing to your list for ease of reference.)
1 is definitely a good idea—and if someone wants to run them on an AMD chip that’s fine too. For context, this is almost certainly breaking. Not sure about the feature flags, maybe it’s saying that it doesn’t have AVX?

// This mostly checks that we are getting something from CPUID, since it is
// hard to expect things from the feature flags. This could be expanded to an
// AnyOf as needed if the tests are run on non-Intel processors.
EXPECT_THAT(CPUVendorIdentificationString(), Eq("GenuineIntel"));

2 is already the case; we run without the intrinsics in debug to check that the non-intrinsics code works too, so they are behind PRINCIPIA_USE_SSE3_INTRINSICS.

Re. 3 we should probably do a round of dependency catchup at some point.

Re.

Ideally, Principia would run natively
hiding Intel intrinsics
the exact approach would depend on benchmark results

Those (SSE2) intrinsics are used to get vector instructions (and FMA on Windows). My understanding is that Rosetta2 will translate those to their neon equivalents, so you will get a performance penalty from getting rid of them. It is unclear to me that you would get anything resembling an improvement unless you use the corresponding neon intrinsics too.

@rnlahaye
Copy link
Contributor Author

Benchmarks seem to indicate that dropping the intrinsics and running natively has some wins and some losses (comp.txt). I'll add intrinsics to the arm code and see what I get.

Regarding 2, merely including <pmmintrin.h> causes compile errors (such as error: use of undeclared identifier '__builtin_ia32_emms'; did you mean '__builtin_isless'?)

@pleroy
Copy link
Member

pleroy commented Jan 21, 2022

I am going to throw a bucket of cold water on some of the ideas here.

First, some background. As far as I am concerned macOS has been by far the most labor-intensive platform to support. We don't have easy access to a mac, and we are not familiar with that operating system, so just releasing was a constant pain until we put in place Azure pipelines. It doesn't help that, unlike Windows and Linux there is apparently no way for a user to upgrade their C++ library, so we are stuck in C++14 and must provide cheesy compatibility libraries.

Until you started contributing and answering user questions (thanks a lot for that!), we were effectively unable to support user or to address macOS-specific issues (performance, in particular). That was not nice.

Now to the specific ideas mentioned in your original message:

  • Re. 3: I don't think we should go around cherry-picking individual PRs. I also would not want to commit to catching up more than once a year: the catch-up process is excruciatingly boring, error-prone, and time-consuming, and we've never figured out a way to automate it. (A good part of the problem is that Google is pretty bad at supporting Windows.)
  • Re. 4 and 5: Fine, but note that we don't have continuous integration on M1. If it breaks, nobody will notice (other than you, maybe).
  • Re. 6: Frankly, this seems like a Terrible Idea™. First, the performance gain would probably be modest (of all the benchmarks, polynomial evaluation is probably the best predictor of the overall performance, now that discrete trajectories are efficient). Second, there will be a cost to inter-process communication (Unix sockets? shared memory?) which will probably be higher than P/Invoke, where everything happens in a single address space. But more importantly (and third) a lot of support would need to be added in the C#/C++ interface and the generator that spits out all the generated units. As the maintainer of the generator, I would not want to incur the attendant complexity and maintenance cost (it's already complicated enough).

@eggrobin
Copy link
Member

I will add, re. 6, that I am working on a new (and much faster) method for trajectory plotting which, among other things, involves allocating an array on the C# side, passing a pointer to it to the C++ side (in a C# fixed statement), and writing into it directly from C++.
I don’t see how this would work cross-process.

@lamont-granquist
Copy link
Contributor

I've actually got a handful of M1 macs now, but I think I agree that until KSP runs on Apple Silicon natively that Principia shouldn't really think about it. Cross process communication is usually janky and brittle. That may mean that it always runs under Rosetta2 if KSP never gets updated. I'd stick with fixing the cpuid_test.

@rnlahaye
Copy link
Contributor Author

Fair enough. I'll close this once #3291 is in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants