forked from ziglang/zig
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
minimal compiler changes so that you can compile to nvptx #1
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
328a665
to
0407552
Compare
Similar to what was done for EdDSA, allow incremental creation and verification of ECDSA signatures. Doing so for ECDSA is trivial, and can be useful for TLS as well as the future package manager.
These utility functions allow reading from (stage2) packed memory at runtime-known offsets.
Packed memory has a well-defined layout that doesn't require conversion from an integer to read from. Let's use it :-) This change means that for bitcasting to/from a packed value that is N layers deep, we no longer have to create N temporary big-ints and perform N copies. Other miscellaneous improvements: - Adds support for casting to packed enums and vectors - Fixes bitcasting to/from vectors outside of a packed struct - Adds a fast path for bitcasting <= u/i64 - Fixes bug when bitcasting f80 which would clear following fields This also changes the bitcast memory layout of exotic integers on big-endian systems to match what's empirically observed on our targets. Technically, this layout is not guaranteed by LLVM so we should probably ban bitcasts that reveal these padding bits, but for now this is an improvement.
Further enhance explanation of why expression is evaluated at comptime
Perform C-style arithmetic conversions on operands to division operator in macros Closes ziglang#13162
There's probably plenty of room to optimize these further in the future, but for the moment this gives ~3x improvement on Intel x86-64 processors, ~5x on AMD, and ~10x on M1 Macs. These extensions are very new - Most processors prior to 2020 do not support them. AVX-512 is a slightly older alternative that we could use on Intel for a much bigger performance bump, but it's been fused off on Intel's latest hybrid architectures and it relies on computing independent SHA hashes in parallel. In contrast, these SHA intrinsics provide the usual single-threaded, single-stream interface, and should continue working on new processors. AArch64 also has SHA-512 intrinsics that we could take advantage of in the future
This feature detection must be done at comptime so that we avoid generating invalid ASM for the target.
This gets us most of the way back to the performance I had when I was using the LLVM intrinsics: - Intel Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz: 190.67 MB/s (w/o intrinsics) -> 1285.08 MB/s - AMD EPYC 7763 (VM) @ 2.45 GHz: 240.09 MB/s (w/o intrinsics) -> 1360.78 MB/s - Apple M1: 216.96 MB/s (w/o intrinsics) -> 2133.69 MB/s Minor changes to this source can swing performance from 400 MB/s to 1400 MB/s or... 20 MB/s, depending on how it interacts with the optimizer. I have a sneaking suspicion that despite LLVM inheriting GCC's extremely strict inline assembly semantics, its passes are rather skittish around inline assembly (and almost certainly, its instruction cost models can assume nothing)
Comptime code can't execute assembly code, so we need some way to force comptime code to use the generic path. This should be replaced with whatever is implemented for ziglang#868, when that day comes. I am seeing that the result for the hash is incorrect in stage1 and crashes stage2, so presumably this never worked correctly. I will follow up on that soon.
This also fixes a bug where the feature gating was not taking effect at comptime due to ziglang#6768
Introduce `std.mem.readPackedInt` and improve bitcasting of packed memory layouts
crypto.sha2: Use intrinsics for SHA-256 on x86-64 and AArch64
Just checking that they aren't comptime isn't enough for `@Type` constructed tuples. Closes ziglang#13531
… format strings (ziglang#13526) * Export invalidFmtErr To allow consistent use of "invalid format string" compile error response for badly formatted format strings. See ziglang#13489 (comment). * Replace format compile errors with invalidFmtErr - Provides more consistent compile errors. - Gives user info about the type of the badly formated value. * Rename invalidFmtErr as invalidFmtError For consistency. Zig seems to use “Error” more often than “Err”. * std: add invalid format string checks to remaining custom formatters * pass reference-trace to comp when building build file; fix checkobjectstep
Stage2 bug fixes
* x/os/Reactor: implement remove function * x/os/Reactor: update tests
These parameters are only ever needed when `std.builtin` is out of sync with the compiler in which case panicking is the only valid operation anyways. Removing them causes a domino effect of functions no longer needing a `src` and/or a `block` parameter resulting in handling compilation errors where they are actually meaningful becoming simpler.
Partially implements ziglang#13528. Enough to unblock the wasi-bootstrap branch.
C backend: improve ergonomics of zig.h a little bit
In the process of 'remaining blocks', the length of processed message can be from 1 to 79. The value of 'n-1' is ranged from 0 to 3. So, st.hx[i] must be initialized at least from st.hx[0] to st.hx[3]
…g#13518) state: State -> state: *const State Suggested by @nektro Fixes ziglang#13510
This would only fail to compile when building *on* WASI.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This branch is a follow up on ziglang#10064
I'm trying to add Nvptx support to Zig stage 2.
I'll be documenting my progress/questions on the main issue.
sample command:
/home/guw/github/zig/stage2/bin/zig build-obj cuda_kernel.zig -target nvptx64-cuda -O ReleaseSafe -femit-llvm-ir
this will create a kernel.ptx and a kernel.ll (LLVM IR)
Right now the kernel.ptx doesn't actually contain .ptx code,
I need to dig deeper in LLVM docs to find how to ouput the .ptx