minimal compiler changes so that you can compile to nvptx #1

gwenzek · 2021-11-02T09:49:56Z

This branch is a follow-up to ziglang#10064.

I'm trying to add NVPTX support to Zig stage 2.
I'll be documenting my progress/questions on the main issue.

Sample command:

/home/guw/github/zig/stage2/bin/zig build-obj cuda_kernel.zig -target nvptx64-cuda -O ReleaseSafe -femit-llvm-ir
This will create a kernel.ptx and a kernel.ll (LLVM IR).

Right now the kernel.ptx doesn't actually contain PTX code; I need to dig deeper into the LLVM docs to find out how to output it.

Packed memory has a well-defined layout that doesn't require conversion from an integer to read from. Let's use it :-)

This change means that for bitcasting to/from a packed value that is N layers deep, we no longer have to create N temporary big-ints and perform N copies.

Other miscellaneous improvements:
- Adds support for casting to packed enums and vectors
- Fixes bitcasting to/from vectors outside of a packed struct
- Adds a fast path for bitcasting <= u/i64
- Fixes bug when bitcasting f80 which would clear following fields

This also changes the bitcast memory layout of exotic integers on big-endian systems to match what's empirically observed on our targets. Technically, this layout is not guaranteed by LLVM so we should probably ban bitcasts that reveal these padding bits, but for now this is an improvement.
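To illustrate what reading from packed memory at a runtime-known bit offset involves, here is a minimal Python sketch. It is not the Zig implementation of `std.mem.readPackedInt`. The function name `read_packed_int_le` is hypothetical, and the layout is an assumption: little-endian, with bit 0 being the least significant bit of byte 0. Only the bytes the field actually touches are read, rather than converting the whole buffer to one big integer.

```python
def read_packed_int_le(buf: bytes, bit_offset: int, bit_count: int,
                       signed: bool = False) -> int:
    """Read a bit_count-wide integer that starts at bit_offset in buf.

    Assumed layout (illustrative): little-endian, bit 0 is the least
    significant bit of buf[0].
    """
    if bit_offset < 0 or bit_count < 0:
        raise ValueError("offset and width must be non-negative")
    if bit_count == 0:
        return 0

    # Only the bytes overlapped by the field are touched.
    first_byte = bit_offset // 8
    last_byte = (bit_offset + bit_count - 1) // 8
    if last_byte >= len(buf):
        raise ValueError("read extends past the end of the buffer")

    window = int.from_bytes(buf[first_byte:last_byte + 1], "little")
    value = (window >> (bit_offset % 8)) & ((1 << bit_count) - 1)

    # Sign-extend when the top bit of the field is set.
    if signed and value >> (bit_count - 1):
        value -= 1 << bit_count
    return value


# Two bytes holding several packed fields.
packed = bytes([0b10110100, 0b00000001])
three_bits = read_packed_int_le(packed, bit_offset=2, bit_count=3)
straddling = read_packed_int_le(packed, bit_offset=6, bit_count=4)
```

The second read straddles the byte boundary, which is the case a byte-aligned reader cannot handle without the shift-and-mask step.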

There's probably plenty of room to optimize these further in the future, but for the moment this gives ~3x improvement on Intel x86-64 processors, ~5x on AMD, and ~10x on M1 Macs.

These extensions are very new: most processors prior to 2020 do not support them. AVX-512 is a slightly older alternative that we could use on Intel for a much bigger performance bump, but it's been fused off on Intel's latest hybrid architectures and it relies on computing independent SHA hashes in parallel. In contrast, these SHA intrinsics provide the usual single-threaded, single-stream interface, and should continue working on new processors.

AArch64 also has SHA-512 intrinsics that we could take advantage of in the future.

This gets us most of the way back to the performance I had when I was using the LLVM intrinsics:
- Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz: 190.67 MB/s (w/o intrinsics) -> 1285.08 MB/s
- AMD EPYC 7763 (VM) @ 2.45 GHz: 240.09 MB/s (w/o intrinsics) -> 1360.78 MB/s
- Apple M1: 216.96 MB/s (w/o intrinsics) -> 2133.69 MB/s

Minor changes to this source can swing performance from 400 MB/s to 1400 MB/s or... 20 MB/s, depending on how it interacts with the optimizer. I have a sneaking suspicion that despite LLVM inheriting GCC's extremely strict inline assembly semantics, its passes are rather skittish around inline assembly (and almost certainly, its instruction cost models can assume nothing).
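As a quick arithmetic check, the throughput figures quoted above can be turned into speedup factors. This is only a sketch: the numbers are copied from the benchmark text, while the names `BENCHMARKS_MB_PER_S` and `speedup` are made up for illustration.

```python
# Throughput in MB/s, copied from the figures quoted above:
# (without intrinsics, with intrinsics).
BENCHMARKS_MB_PER_S = {
    "Intel Core i7-1068NG7": (190.67, 1285.08),
    "AMD EPYC 7763 (VM)": (240.09, 1360.78),
    "Apple M1": (216.96, 2133.69),
}


def speedup(before: float, after: float) -> float:
    """Return how many times faster `after` is than `before`."""
    if before <= 0:
        raise ValueError("baseline throughput must be positive")
    return after / before


for name, (before, after) in BENCHMARKS_MB_PER_S.items():
    print(f"{name}: {speedup(before, after):.2f}x")
```

With these inputs the ratios come out to roughly 6.7x, 5.7x, and 9.8x respectively.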

Comptime code can't execute assembly code, so we need some way to force comptime code to use the generic path. This should be replaced with whatever is implemented for ziglang#868, when that day comes. I am seeing that the result for the hash is incorrect in stage1 and crashes stage2, so presumably this never worked correctly. I will follow up on that soon.

… format strings (ziglang#13526)

* Export invalidFmtErr

  To allow consistent use of "invalid format string" compile error response for badly formatted format strings. See ziglang#13489 (comment).

* Replace format compile errors with invalidFmtErr

  - Provides more consistent compile errors.
  - Gives user info about the type of the badly formatted value.

* Rename invalidFmtErr as invalidFmtError

  For consistency. Zig seems to use “Error” more often than “Err”.

* std: add invalid format string checks to remaining custom formatters

* pass reference-trace to comp when building build file; fix checkobjectstep

These parameters are only ever needed when `std.builtin` is out of sync with the compiler, in which case panicking is the only valid operation anyway. Removing them causes a domino effect of functions no longer needing a `src` and/or a `block` parameter, which makes it simpler to handle compilation errors where they are actually meaningful.

gwenzek mentioned this pull request Nov 2, 2021

Pointers for NVPTX support ziglang/zig#10064

Closed

gwenzek force-pushed the nvptx branch from 310f3df to 500e81b Compare November 2, 2021 13:03

gwenzek force-pushed the nvptx branch 2 times, most recently from 328a665 to 0407552 Compare November 10, 2021 14:37

gwenzek force-pushed the nvptx branch from cfa558f to aa6873e Compare December 24, 2021 08:21

gwenzek force-pushed the nvptx branch from aa6873e to 9535a3c Compare March 18, 2022 09:02

Vexu and others added 24 commits October 28, 2022 13:31

Sema: further enhance explanation of why expr is evaluated at comptime

c3b85e4

value: properly hash null_value pointer

6fc7183

Closes ziglang#13325

NativeTargetInfo: remove unused error

d6943f8

std.sign.ecdsa: add support for incremental signatures (ziglang#13332)

f28e4e0

Similar to what was done for EdDSA, allow incremental creation and verification of ECDSA signatures. Doing so for ECDSA is trivial, and can be useful for TLS as well as the future package manager.

std.mem: Add readPackedInt, writePackedInt, etc.

c639c22

These utility functions allow reading from (stage2) packed memory at runtime-known offsets.

Value: Add @intCast in writeToPackedMemory for 32-bit targets

9d0a4b6

std.mem: Skip read/writePackedInt test on WASM32/64

03ed0a5

Merge pull request ziglang#13322 from Vexu/comptime-reason

bd32206

Further enhance explanation of why expression is evaluated at comptime

Enable bitcast test now that ziglang#13214 is resolved.

40b7792

translate-c: Better support for division in macros

c616141

Perform C-style arithmetic conversions on operands to division operator in macros Closes ziglang#13162

std.crypto: SHA-256 Properly gate comptime conditional

ee241c4

This feature detection must be done at comptime so that we avoid generating invalid ASM for the target.

CLI: report error when -fstage1 requested but not available

84e0c14

std.crypto: Use featureSetHas to gate intrinsics

67fa326

This also fixes a bug where the feature gating was not taking effect at comptime due to ziglang#6768

Enhance indexOfIgnoreCase with Boyer-Moore-Horspool algorithm

c66d3f6

Merge pull request ziglang#13221 from topolarity/packed-mem

c36eb4e

Introduce `std.mem.readPackedInt` and improve bitcasting of packed memory layouts

Merge pull request ziglang#13272 from topolarity/sha2-intrinsics

20925b2

crypto.sha2: Use intrinsics for SHA-256 on x86-64 and AArch64

cbe: implement optional slice representation change

48a2783

Sema: add error note for wrong pointer dereference syntax

61f5ea4

Closes ziglang#1897

Sema: fix floatToInt to zero bit ints

1ea1228

Closes ziglang#9415

parser: improve error message for missing var/const before local vari…

9607bd9

…able Closes ziglang#12721

Vexu and others added 27 commits November 12, 2022 15:41

Sema: ensure that !is_comptime and !is_typeof implies `sema.func !=…

a760ce5

… null` Closes ziglang#13481

llvm: check that tuple fields have runtime bits

87cf278

Just checking that they aren't comptime isn't enough for `@Type` constructed tuples. Closes ziglang#13531

Implements std.math.sign for float vectors.

fbc4331

langref: add appendix and explain 'container' terminology

32b97df

Merge pull request ziglang#13497 from Vexu/stage2-fixes

9961621

Stage2 bug fixes

pthread_sigmask

81dadbc

x/os/Reactor: implement remove function (ziglang#13330)

b2ffe11

* x/os/Reactor: implement remove function
* x/os/Reactor: update tests

C backend: improve ergonomics of zig.h a little bit

77e7d97

Partially implements ziglang#13528. Enough to unblock the wasi-bootstrap branch.

zig.h: remove redundant definition of u16/i16

20e8c2d

Merge pull request ziglang#13536 from ziglang/cbe-zig-h

0b0292c

C backend: improve ergonomics of zig.h a little bit

std.crypto.ghash: fix uninitialized polynomial use (ziglang#13527)

b29057b

When processing the 'remaining blocks', the length of the processed message can be from 1 to 79, so the value of 'n-1' ranges from 0 to 3. Therefore st.hx[i] must be initialized at least from st.hx[0] to st.hx[3].

musl.zig: remove unused enum (ziglang#13545)

d823680

crypto.bcrypt: fix massive speed regression when using stage2 (ziglan…

7eed028

…g#13518)

state: State -> state: *const State

Suggested by @nektro

Fixes ziglang#13510

ci: init github actions support

bbd0775

CI: separate aarch64 and x86_64 macos scripts

37402e4

CI: aarch64-macos: set PATH env var for cmake

27e63bb

macos: x86_64: fix wrong path to cmake

e14d135

CI: aarch64-linux: init

5d7efa6

disable failing test on aarch64-macos

a50ad04

CI: disable github workflows until it is working in the ci branch

1498607

std.os.linux: Add setitimer and getitimer syscalls

ceb9fed

Fix error reporting the wrong line for struct field inits (ziglang#13502

c4f7663

)

* point to init part of field decl when that's where the error occurs
* update test to reflect fixed error message
* only lookup source location in case of error

std.build: fix typo

024bac7

This would only fail to compile when building *on* WASI.

zig-cache: support windows drive + fwd-slash paths

a93fa29

closes ziglang#13539

fix Nvptx backend outputting files at the top level of zig-cache

1bea4f3

gwenzek force-pushed the nvptx branch from 9535a3c to 1bea4f3 Compare November 16, 2022 09:52

gwenzek closed this Nov 16, 2022

gwenzek deleted the nvptx branch January 31, 2024 20:24
