Apple M1 build #4585
Comments
Thank you, @magnumripper! This is only subtly related, but FWIW as far as I'm aware (from user feedback) our builds made on/for Intel Macs with SSE4 do work on M1 via Rosetta 2, however those with AVX2 don't (print the "AVX2 is required" message). |
That's interesting, I'll try things like that as well (and compare benchmarks). I haven't googled anywhere near what I need to yet, but I was also under the impression that even running an ARM build wouldn't quite be "native" Silicon M1 (but "more native" than Intel binaries). Lots of things to learn here. |
Simply re-linking arch.h to arm64le.h didn't help at all - same build error. |
Using arm32le.h, I get DEScrypt speeds of 5334K (many salts) and 5060K (one salt). NT-opencl benchmarks at 958253K. Using arm64le.h, nothing much changed, and I realized that even with arm32le.h we got 64-bit bitslice. Not sure exactly what's going on there. I also noticed OpenCL figures vary a fair bit between runs, even with the same auto-tune results. Umm, I also see now that I didn't get an OpenMP build (native clang compiler). Even given that, the DEScrypt figures are mediocre. |
Seems I was wrong. Disabling SIMD and OpenMP on my Intel Macbook, I only get 3355K "many salts" and 3233K "one salt" |
The native gcc is just arm64 though, as is Homebrew's real gcc (v10). Using the latter, all build problems went away and I got OpenMP, and I ended up with
Manually relinking arch.h to arm64le.h fixes the SIMD confusion, rendering
My 8 core, 16 threads 2.3 GHz Intel i9 Macbook gets 88227K (many salts) and 57212K (one salt). It also gets a whole lot better NT speeds, but it does have a "discrete" AMD Vega GPU so it's not comparable. Using the "Intel UHD Graphics 630" gets me 454800K for NT-opencl, just half the speed of the M1. |
So far, just this was needed:
Install Xcode via App Store, then install "command-line tools":
Clone repo:
Install Homebrew, then install OpenSSL and gcc with it:
Build JtR
Until #4587 is merged, you also need |
Native gcc (clang)
So it defaults to 64-bit and to having support for NEON and/or ASIMD. It doesn't seem to need Homebrew gcc-10
It defaults to 64-bit and doesn't even allow
I'm not yet sure if there's something in our |
From what I can tell, us calling it "ASIMD" instead of "NEON" comes blindly from that arch.h change, but the speed difference might be from the
Our platform-agnostic intrinsics trigger on |
The native gcc also doesn't change its macro output with
|
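For the 32-bit versus 64-bit header mix-up discussed in this thread, here is a minimal sketch of a detection rule that tests for the 64-bit macros first. The helper names are hypothetical and not part of JtR; the only things assumed from the toolchain are the predefined macros `__aarch64__`, `__arm64__` (Apple's spelling), `__arm__` and `__ARM_NEON`. The "ASIMD" and "NEON" labels simply follow the naming used in this thread.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not JtR code): report which arch header a
 * detection rule would choose. The 64-bit test comes first, so an
 * arm64 compiler can never fall through to the 32-bit case.
 */
static const char *pick_arch_header(void)
{
#if defined(__aarch64__) || defined(__arm64__)
	return "arm64le.h";
#elif defined(__arm__)
	return "arm32le.h";
#else
	return "(not an ARM target)";
#endif
}

/* Report the SIMD flavour, using the labels from this thread. */
static const char *pick_simd_name(void)
{
#if defined(__aarch64__) || defined(__arm64__)
	return "ASIMD";	/* AArch64 toolchains normally assume Advanced SIMD */
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
	return "NEON";
#else
	return "none";
#endif
}

/* Print one line summarising both choices for the current compiler. */
static void print_arch_report(void)
{
	printf("arch header: %s, SIMD: %s\n",
	       pick_arch_header(), pick_simd_name());
}
```

Calling `print_arch_report()` from a small `main()` built with each compiler (native clang and Homebrew gcc) should show whether the two agree on the target.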
Regardless of which arch.h is used, here's the first error for native gcc
|
Back to real gcc: I now confirmed using
Things that differ for DES are |
On a side note, Apple seem to reuse their universal binary thing:
We could consider that when distributing John binaries. BTW the john binary created with gcc-10 isn't arm64e:
It's just arm64. Same goes for a native gcc build:
|
From https://en.wikipedia.org/wiki/Mach-O
|
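As a rough illustration of the arm64 versus arm64e distinction, the sketch below classifies a binary from the first 12 bytes of its Mach-O header. It is a sketch under assumptions: the constants mirror the values in Apple's `<mach-o/loader.h>`, `<mach-o/fat.h>` and `<mach/machine.h>` as far as I know them, and are redefined with a `DEMO_` prefix so the code compiles on any platform. The function and enum names are hypothetical. Only thin 64-bit little-endian files and the classic 32-bit fat magic are handled.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MH_MAGIC_64        0xfeedfacfu	/* thin 64-bit Mach-O */
#define DEMO_FAT_MAGIC          0xcafebabeu	/* universal, big-endian on disk */
#define DEMO_CPU_TYPE_X86_64    0x01000007u
#define DEMO_CPU_TYPE_ARM64     0x0100000cu
#define DEMO_CPU_SUBTYPE_MASK   0x00ffffffu	/* strips capability bits */
#define DEMO_CPU_SUBTYPE_ARM64E 2u

enum demo_arch {
	DEMO_ARCH_UNKNOWN, DEMO_ARCH_FAT, DEMO_ARCH_X86_64,
	DEMO_ARCH_ARM64, DEMO_ARCH_ARM64E
};

/* Read a 32-bit little-endian value from a byte buffer. */
static uint32_t rd32le(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Read a 32-bit big-endian value from a byte buffer. */
static uint32_t rd32be(const unsigned char *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Classify a file from its first 12 bytes: magic, cputype, cpusubtype. */
static enum demo_arch demo_macho_arch(const unsigned char *hdr, size_t len)
{
	uint32_t cputype, cpusubtype;

	if (len < 12)
		return DEMO_ARCH_UNKNOWN;
	if (rd32be(hdr) == DEMO_FAT_MAGIC)
		return DEMO_ARCH_FAT;	/* per-slice info follows, not parsed here */
	if (rd32le(hdr) != DEMO_MH_MAGIC_64)
		return DEMO_ARCH_UNKNOWN;

	cputype = rd32le(hdr + 4);
	cpusubtype = rd32le(hdr + 8) & DEMO_CPU_SUBTYPE_MASK;

	if (cputype == DEMO_CPU_TYPE_ARM64)
		return cpusubtype == DEMO_CPU_SUBTYPE_ARM64E ?
		       DEMO_ARCH_ARM64E : DEMO_ARCH_ARM64;
	if (cputype == DEMO_CPU_TYPE_X86_64)
		return DEMO_ARCH_X86_64;
	return DEMO_ARCH_UNKNOWN;
}
```

Note that Java class files share the 0xcafebabe magic, so a real tool would need more checks before calling something a universal binary.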
Have you tried things like
People on the web are talking about this. PS (not clear if you are able to compile using clang). |
Snorting the gcc-10 man page, there are options applicable for this arch such as Using
Using
These were some of the ones that were dropped when using native compiler with
Anyway my first tests (Only comparing single-run benchmarks) of
Anyway, what made me do all the above - the
On some other note (of now many) I believe this CPU can access unaligned data - we might want to tweak some
I got the idea to try this:
Score! So I reckon we can look for |
Thanks, that'll be yet another thing to my to-do list :-) |
Tried the following:

diff --git a/src/arm32le.h b/src/arm32le.h
index 9a8ff9995..a0c60c0d2 100644
--- a/src/arm32le.h
+++ b/src/arm32le.h
@@ -27,7 +27,7 @@
#define ARCH_INT_GT_32 0
#endif
-#define ARCH_ALLOWS_UNALIGNED 0
+#define ARCH_ALLOWS_UNALIGNED __ARM_FEATURE_UNALIGNED
#define ARCH_INDEX(x) ((unsigned int)(unsigned char)(x))
#define CPU_DETECT 0
diff --git a/src/arm64le.h b/src/arm64le.h
index a916cc053..63b1ed932 100644
--- a/src/arm64le.h
+++ b/src/arm64le.h
@@ -28,7 +28,7 @@
#define ARCH_INT_GT_32 0
#endif
-#define ARCH_ALLOWS_UNALIGNED 0
+#define ARCH_ALLOWS_UNALIGNED __ARM_FEATURE_UNALIGNED
#define ARCH_INDEX(x) ((unsigned int)(unsigned char)(x))
#define CPU_DETECT 0

The RAR format now compiles, and works fine.
Just half of Intel speed though, comparing M1 @3.2 GHz (well it doesn't say! Google indicates it might be 3.2 GHz) against Intel Core i9 @2.4 GHz and same number of (real) cores. Come to think of it, a core or two might have been totally hogged by other things during that benchmark though. Anyway, new to-do entry would be to test penalty for unaligned... Oh and BTW I was comparing 128-bit wide ASIMD against 256-bit AVX2... |
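To illustrate what a flag like `ARCH_ALLOWS_UNALIGNED` typically gates, here is a minimal sketch of a 32-bit load that uses a direct access when the compiler advertises `__ARM_FEATURE_UNALIGNED` and falls back to `memcpy` otherwise. `DEMO_ALLOWS_UNALIGNED` and `demo_load32` are hypothetical names, not JtR code. One caveat worth noting: if `__ARM_FEATURE_UNALIGNED` is not defined at all, a macro that expands to it evaluates as 0 inside `#if`, but would not compile if used in an ordinary C expression, which is why the sketch guards it with `#ifdef`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for ARCH_ALLOWS_UNALIGNED, guarded so it is
 * always defined to a number even when the ACLE macro is absent. */
#ifdef __ARM_FEATURE_UNALIGNED
#define DEMO_ALLOWS_UNALIGNED __ARM_FEATURE_UNALIGNED
#else
#define DEMO_ALLOWS_UNALIGNED 0
#endif

/* Load a native-endian 32-bit value from a possibly unaligned address. */
static inline uint32_t demo_load32(const void *p)
{
#if DEMO_ALLOWS_UNALIGNED
	/* Direct access: relies on target and compiler behaviour,
	 * it is not guaranteed by ISO C for misaligned pointers. */
	return *(const uint32_t *)p;
#else
	/* Portable path: compilers usually turn this into a plain load
	 * where the hardware permits it. */
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
#endif
}

/* Small self-check: both paths must agree with a memcpy reference. */
static void demo_load32_selfcheck(void)
{
	unsigned char buf[8] = {0, 1, 2, 3, 4, 5, 6, 7};
	uint32_t expect;

	memcpy(&expect, buf + 1, sizeof(expect));
	assert(demo_load32(buf + 1) == expect);
}
```

Timing the two paths against each other on the M1 would be one way to approach the "penalty for unaligned" to-do item.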
Mixing all sorts of info in this issue, this M1 chip allegedly has four high-efficiency (as in low-power) cores and four high-performance cores. Running just one thread
versus
Clock for clock and width for width it's not too bad, I think. But I might have screwed that calculation up royally. |
To-do: Have |
Trivial fix:

diff --git a/src/pseudo_intrinsics.h b/src/pseudo_intrinsics.h
index 5fabbdbb8..450e2477b 100644
--- a/src/pseudo_intrinsics.h
+++ b/src/pseudo_intrinsics.h
@@ -66,10 +66,10 @@ typedef union {
#define VLOADU_EMULATED 1
#define vor(x, y) (vtype)vorrq_u32((x).v32, (y).v32)
#define vorn(x, y) (vtype)vornq_u32((x).v32, (y).v32)
-#define vroti_epi32(x, i) (i > 0 ? (vtype)vsliq_n_u32(vshrq_n_u32((x).v32, 32 - (i)), (x).v32, i) : \
- (vtype)vsriq_n_u32(vshlq_n_u32((x).v32, 32 + (i)), (x).v32, -(i)))
-#define vroti_epi64(x, i) (i > 0 ? (vtype)vsliq_n_u64(vshrq_n_u64((x).v64, 64 - (i)), (x).v64, i) : \
- (vtype)vsriq_n_u64(vshlq_n_u64((x).v64, 64 + (i)), (x).v64, -(i)))
+#define vroti_epi32(x, i) (i > 0 ? (vtype)vsliq_n_u32(vshrq_n_u32((x).v32, 32 - ((i) & 31)), (x).v32, (i) & 31) : \
+ (vtype)vsriq_n_u32(vshlq_n_u32((x).v32, (32 + (i)) & 31), (x).v32, (-(i)) & 31))
+#define vroti_epi64(x, i) (i > 0 ? (vtype)vsliq_n_u64(vshrq_n_u64((x).v64, 64 - ((i) & 63)), (x).v64, (i) & 63) : \
+ (vtype)vsriq_n_u64(vshlq_n_u64((x).v64, (64 + (i)) & 63), (x).v64, (-(i)) & 63))
#define vroti16_epi32 vroti_epi32
#define vset1_epi32(i) (vtype)vdupq_n_u32(i)
#define vset1_epi64(i) (vtype)vdupq_n_u64(i)

This solves all problems with native gcc. Native gcc (doesn't support OpenMP):
Real gcc (with OpenMP disabled):
|
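The masking idea in the pseudo_intrinsics.h diff above can be shown with a scalar analogue. This is only a sketch in plain C, not the NEON macros themselves: `demo_rot32` and `demo_rot64` are hypothetical names, and the convention assumed here (positive count rotates left, negative rotates right) matches how the `vroti_epi32` macro branches on the sign of `i`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rotate by a signed count: positive rotates left, negative rotates
 * right. Masking with 31 keeps every shift amount in 0..31, so no
 * shift is ever as wide as the type, whichever sign the count has.
 */
static inline uint32_t demo_rot32(uint32_t x, int i)
{
	unsigned int n = (unsigned int)i & 31;	/* -3 becomes 29 */

	return (x << n) | (x >> ((32 - n) & 31));
}

/* Same idea for 64-bit lanes, masking with 63. */
static inline uint64_t demo_rot64(uint64_t x, int i)
{
	unsigned int n = (unsigned int)i & 63;

	return (x << n) | (x >> ((64 - n) & 63));
}

/* Small self-check of the left, right and zero-count cases. */
static void demo_rot_selfcheck(void)
{
	assert(demo_rot32(0x80000001u, 1) == 0x00000003u);
	assert(demo_rot32(0x80000001u, -1) == 0xC0000000u);
	assert(demo_rot32(0x12345678u, 0) == 0x12345678u);
	assert(demo_rot64(1, -1) == 0x8000000000000000ull);
}
```

In the macro version both arms of the `?:` are compiled even though only one is taken, so the unused arm also needs an in-range immediate; the masking serves that purpose there.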
…m64) It would end up picking arm32le.h due to flaws in the detecting macro. See openwall#4585
This allows the RAR format to build, where applicable. See openwall#4585
MacOS native (clang) gcc wouldn't allow rotations larger than width so we simply mask the argument correctly. Closes openwall#4585
Tried it now, it fails because Homebrew's OpenSSL is just arm64:
I could compile OpenSSL too myself of course, but I don't think there will be much (if any) difference. Anyway, basically I just manually set these lines in the generated Makefile:
|
And just for reference, John 1.8.0.9-jumbo-1-bleeding SSE-4.1 benchmark, using Rosetta 2:
Native (current bleeding)
|
I got my hands on a Macbook M1 for a while and thought I'd sort out JtR on it. This issue will contain a bunch of notes and likely end up with a PR or two, if needed.
So, starting with a 100% pristine Macbook. Before anything else, I installed Xcode and command-line tools. After that (maybe before it as well, I dunno) I had git(1) so I could clone the repo. Then,
First try: Just run ./configure and see what happens. As expected, it failed due to lack of OpenSSL.
Second try: Install Homebrew OpenSSL and only that. This time (passing LDFLAGS and CPPFLAGS as recommended by Homebrew), we passed the configure stage but the build failed.
It appears configure detected ARM and 64-bit but used arm32le.h and NEON as opposed to arm64le.h and ASIMD.
https://en.wikipedia.org/wiki/Mac_transition_to_Apple_Silicon