
minimal compiler changes so that you can compile to nvptx #1

Closed
wants to merge 2,860 commits into from

gwenzek (Owner) commented Nov 2, 2021

This branch is a follow-up to ziglang#10064.

I'm trying to add NVPTX support to Zig stage 2.
I'll be documenting my progress and questions on the main issue.

Sample command:

/home/guw/github/zig/stage2/bin/zig build-obj cuda_kernel.zig -target nvptx64-cuda -O ReleaseSafe -femit-llvm-ir
This creates a kernel.ptx and a kernel.ll (LLVM IR).

Right now the kernel.ptx doesn't actually contain PTX code;
I need to dig deeper into the LLVM docs to find out how to output PTX.

Vexu and others added 24 commits October 28, 2022 13:31
Similar to what was done for EdDSA, allow incremental creation
and verification of ECDSA signatures.

Doing so for ECDSA is trivial, and can be useful for TLS as well
as the future package manager.
These utility functions allow reading from (stage2) packed memory at
runtime-known offsets.
Packed memory has a well-defined layout that doesn't require
conversion from an integer to read from. Let's use it :-)

This change means that for bitcasting to/from a packed value that
is N layers deep, we no longer have to create N temporary big-ints
and perform N copies.

Other miscellaneous improvements:
  - Adds support for casting to packed enums and vectors
  - Fixes bitcasting to/from vectors outside of a packed struct
  - Adds a fast path for bitcasting <= u/i64
  - Fixes bug when bitcasting f80 which would clear following fields

This also changes the bitcast memory layout of exotic integers on
big-endian systems to match what's empirically observed on our targets.
Technically, this layout is not guaranteed by LLVM so we should probably
ban bitcasts that reveal these padding bits, but for now this is an
improvement.
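To make the idea of reading packed memory at a runtime-known offset concrete, here is a minimal Python sketch. It is not Zig's `std.mem.readPackedInt`, whose exact signature isn't shown here. It reads an unsigned N-bit field starting at a runtime bit offset and touches only the bytes the field spans, instead of first converting the whole buffer into one big integer. It assumes a little-endian packed layout where bit 0 is the least significant bit of the first byte. The name `read_packed_uint` and the example field layout are hypothetical.

```python
def read_packed_uint(data: bytes, bit_offset: int, bit_count: int) -> int:
    """Read an unsigned `bit_count`-bit field starting at `bit_offset`.

    Hypothetical helper: assumes a little-endian packed layout in which
    bit 0 is the least significant bit of data[0].
    """
    if bit_offset < 0 or bit_count < 0 or bit_offset + bit_count > len(data) * 8:
        raise ValueError("field lies outside the buffer")
    first = bit_offset // 8                  # first byte the field touches
    end = (bit_offset + bit_count + 7) // 8  # one past the last byte it touches
    chunk = int.from_bytes(data[first:end], "little")
    # Shift away the bits below the field, then mask off the bits above it.
    return (chunk >> (bit_offset % 8)) & ((1 << bit_count) - 1)


# Hypothetical packed layout: a: u3 at bit 0, b: u7 at bit 3, c: u6 at bit 10.
packed = (5 | (100 << 3) | (33 << 10)).to_bytes(2, "little")
print(read_packed_uint(packed, 0, 3))   # 5
print(read_packed_uint(packed, 3, 7))   # 100
print(read_packed_uint(packed, 10, 6))  # 33
```

Only the little-endian case is modelled. As the commit message notes, the layout of exotic integers on big-endian targets follows different rules.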
Further enhance explanation of why expression is evaluated at comptime
Perform C-style arithmetic conversions on the operands of the division operator
in macros

Closes ziglang#13162
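C's "usual arithmetic conversions" have two practical consequences for division. First, `int / unsigned int` is evaluated in unsigned arithmetic. Second, signed integer division truncates toward zero, unlike Python's `//`, which floors. The Python sketch below emulates only these two rules. It assumes a 32-bit `int` and ignores other integer types, conversion ranks, and the undefined `INT_MIN / -1` case. `c_div` is a hypothetical name, and the sketch is not translate-c's implementation.

```python
from typing import Tuple

UINT_MOD = 1 << 32  # assumption: 32-bit int / unsigned int


def c_div(a: int, a_unsigned: bool, b: int, b_unsigned: bool) -> Tuple[int, bool]:
    """Emulate C's `a / b` for int / unsigned int operands (hypothetical helper).

    Returns (quotient, result_is_unsigned).
    """
    if a_unsigned or b_unsigned:
        # Usual arithmetic conversions: the signed operand is converted to
        # unsigned int (reduced modulo 2**32) before dividing.
        return (a % UINT_MOD) // (b % UINT_MOD), True
    # Both operands signed: C99 and later truncate the quotient toward zero.
    q = abs(a) // abs(b)
    return (-q if (a < 0) != (b < 0) else q), False


print(c_div(-1, False, 2, True))   # (2147483647, True)
print(c_div(-7, False, 2, False))  # (-3, False)
```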
There's probably plenty of room to optimize these further in the
future, but for the moment this gives ~3x improvement on Intel
x86-64 processors, ~5x on AMD, and ~10x on M1 Macs.

These extensions are very new; most processors released before 2020 do
not support them.

AVX-512 is a slightly older alternative that we could use on Intel
for a much bigger performance bump, but it's been fused off on
Intel's latest hybrid architectures and it relies on computing
independent SHA hashes in parallel. In contrast, these SHA intrinsics
provide the usual single-threaded, single-stream interface, and should
continue working on new processors.

AArch64 also has SHA-512 intrinsics that we could take advantage
of in the future.
This feature detection must be done at comptime so that we avoid
generating invalid ASM for the target.
This gets us most of the way back to the performance I had when
I was using the LLVM intrinsics:
  - Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz:
       190.67 MB/s (w/o intrinsics) -> 1285.08 MB/s
  - AMD EPYC 7763 (VM) @ 2.45 GHz:
       240.09 MB/s (w/o intrinsics) -> 1360.78 MB/s
  - Apple M1:
       216.96 MB/s (w/o intrinsics) -> 2133.69 MB/s
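The throughput figures above can be converted to speedup factors with a quick calculation. This Python snippet only divides the reported numbers and involves no new measurements. The CPU labels are shortened for readability.

```python
# Throughput in MB/s as reported above: (without intrinsics, with intrinsics).
reported = {
    "Intel Core i7-1068NG7": (190.67, 1285.08),
    "AMD EPYC 7763 (VM)": (240.09, 1360.78),
    "Apple M1": (216.96, 2133.69),
}

# Speedup = throughput with intrinsics / throughput without them.
speedups = {cpu: fast / slow for cpu, (slow, fast) in reported.items()}

for cpu, factor in speedups.items():
    print(f"{cpu}: {factor:.1f}x")
```

This works out to roughly 6.7x on the Intel part, 5.7x on the AMD VM, and 9.8x on the M1.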

Minor changes to this source can swing performance from 400 MB/s to
1400 MB/s or... 20 MB/s, depending on how it interacts with the
optimizer. I have a sneaking suspicion that despite LLVM inheriting
GCC's extremely strict inline assembly semantics, its passes are
rather skittish around inline assembly (and, almost certainly, its
instruction cost models can assume nothing about it).
Comptime code can't execute assembly code, so we need some way to
force comptime code to use the generic path. This should be replaced
with whatever is implemented for ziglang#868, when that day comes.

I am seeing that the result for the hash is incorrect in stage1 and
crashes stage2, so presumably this never worked correctly. I will follow
up on that soon.
This also fixes a bug where the feature gating was not taking
effect at comptime due to ziglang#6768
Introduce `std.mem.readPackedInt` and improve bitcasting of packed memory layouts
crypto.sha2: Use intrinsics for SHA-256 on x86-64 and AArch64
Vexu and others added 27 commits November 12, 2022 15:41
Just checking that they aren't comptime isn't enough for `@Type` constructed tuples.

Closes ziglang#13531
… format strings (ziglang#13526)

* Export invalidFmtErr

To allow consistent use of "invalid format string" compile error
response for badly formatted format strings.

See ziglang#13489 (comment).

* Replace format compile errors with invalidFmtErr

- Provides more consistent compile errors.
- Gives the user info about the type of the badly formatted value.

* Rename invalidFmtErr as invalidFmtError

For consistency. Zig seems to use “Error” more often than “Err”.

* std: add invalid format string checks to remaining custom formatters

* pass reference-trace to comp when building build file; fix checkobjectstep
* x/os/Reactor: implement remove function

* x/os/Reactor: update tests
These parameters are only ever needed when `std.builtin` is out of sync
with the compiler, in which case panicking is the only valid operation
anyway. Removing them has a domino effect: many functions no longer
need a `src` and/or a `block` parameter, which makes handling
compilation errors where they actually matter simpler.
Partially implements ziglang#13528. Enough to unblock the wasi-bootstrap
branch.
C backend: improve ergonomics of zig.h a little bit
When processing the 'remaining blocks', the length of the processed
message can be anywhere from 1 to 79, so 'n-1' ranges from 0 to 3.
Therefore, st.hx[0] through st.hx[3] must all be initialized.

* point to init part of field decl when that's where the error occurs

* update test to reflect fixed error message

* only lookup source location in case of error
This would only fail to compile when building *on* WASI.