API redesign with async support (#18)
* Initial work on supporting some async memory transfers

Experiments with Rust Futures

Implemented derive for RustToCudaAsync

Implemented async kernel launch

Fixed RustToCudaAsync derive

LaunchPackage with non-mut Stream

Moved stream to be an explicit kernel argument

Updated ExchangeWrapperOn[Device|Host]Async::move_to_stream

Upgraded to fixed RustaCuda

Added scratch-space methods for uni-directional CudaExchangeItem

Added unsafe-aliasing API to SplitSliceOverCudaThreads[Const|Dynamic]Stride

Extended the CudaExchangeItem API with scratch and MaybeUninit

Rename SplitSliceOverCudaThreads[Const|Dynamic]Stride::alias_[mut_]unchecked

Implemented #[cuda(crate)] and #[kernel(crate)] attributes

Added simple thread-block shared memory support

Fixed device utils doc tests

Convert cuda thread-block-shared memory address to generic

First steps towards better shared memory, including dynamic

Revert derive changes + R2C-based approach start

Some progress on shared slices

Backup of progress on compile-time PTX checking

Clean up the PTX JIT implementation

Add convenience functions for ThreadBlockShared arrays

Improve and fix CI

Remove broken ThreadBlockShared RustToCuda impl

Refactor kernel trait generation to push more safety constraints to the kernel definition

Fixed SomeCudaAlloc import

Added error handling to the compile-time PTX checking

Add PTX lint parsing, no actual support yet

Added lint checking support to monomorphised kernel impls

Improve kernel checking + added cubin dump lint

Fix kernel macro config parsing

Explicitly fitting Device[Const|Mut]Ref into device registers

Switched one std:: to core::

Remove register-sized CUDA kernel args check, unnecessary since rust-lang/rust#94703

Simplified the kernel parameter layout extraction from PTX

Fix up rebase issues

Install CUDA in all CI steps

Use CStr literals

Simplify and document the safety traits

Fix move_to_cuda bound

Fix clippy for 1.76

Cleaned up the rust-cuda device macros with better print

The implementation still uses String for dynamic formatting, which
currently pulls in loads of formatting and panic machinery.

While a custom String type that pre-allocated the exact format String
length can avoid some of that, the formatting machinery even for e.g.
usize is still large.

If `format_args!` is ever optimised for better inlining, the more
verbose and lower-level implementation could be reconsidered.

Switch to using more vprintf in embedded CUDA kernel

Make print example fully executable

Clean up the print example

ptr_from_ref is stable from 1.76

Exit on CUDA panic instead of abort to allow the host to handle the error

Backup of early progress for switching from kernel traits to functions

More work into kernel functions instead of traits

Eliminate almost all ArgsTrait usages

Some refactoring of the async kernel func type + wrap code

Early sketch of extracting type wrapping from macro into types and traits

Early work towards using trait for kernel type wrap, ptx jit workaround missing

Lift complete CPU kernel wrapper from proc macro into public functions

Add async launch helper

Further cleanup of the new kernel param API

Start cleaning up the public API

Allow passing ThreadBlockShared to kernels again

Remove unsound mutable lending to CUDA for now

Allow passing ThreadBlockSharedSlice to kernel for dynamic shared memory

Begin refactoring the public API with device feature

Refactoring to prepare for better module structure

Extract kernel module just for parameters

Add RustToCuda impls for &T, &mut T, &[T], and &mut [T] where T: RustToCuda

Large restructuring of the module layout for rust-cuda

Split rust-cuda-kernel off from rust-cuda-derive

Update codecov action to handle rust-cuda-kernel

Fix clippy lint

Far too much time spent getting rid of DeviceCopy

More refactoring and auditing kernel param bounds

First exploration towards a stricter async CUDA API

More experiments with async API

Further API experimentation

Further async API experimentation

Further async API design work

Add RustToCudaAsync impls for &T and &[T], but not &mut T or &mut [T]

Add back mostly unchanged exchange wrapper + buffer with RustToCudaAsync impls

Add back mostly unchanged anti-aliasing types with RustToCudaAsync impls

Progress on replacing ...Async with Async<...>

Seal more implementation details

Further small API improvements

Add AsyncProj helper API struct for async projections

Disable async derive in examples for now

Implement RustToCudaAsync derive impls

Further async API improvements to add drop behaviour

First sketch of the safety constraints of a new NoSafeAliasing trait

First steps towards reintroducing LendToCudaMut

Fix no-std Box import for LendRustToCuda derive

Re-add RustToCuda implementation for Final

Remove redundant RustToCudaAsyncProxy

More progress on less 'static bounds on kernel params

Further investigation of less 'static bounds

Remove 'static bounds from LendToCuda ref kernel params

Make CudaExchangeBuffer Sync

Make CudaExchangeBuffer Sync v2

Add AsyncProj proj_ref and proj_mut convenience methods

Add RustToCudaWithPortableBitCloneSemantics adapter

Fix invalid const fn bounds

Add Deref[Mut] to the adapters

Fix pointer type inference error

Try removing __rust_cuda_ffi_safe_assert module

Ensure async launch mutable borrow safety with barriers on use and stream move

Fix uniqueness guarantee for Stream using branded types

Try without ref proj

Try add extract ref

Fix doc link

Clean up kernel signature check

Some cleanup before merging

Fix some clippy lints, add FIXMEs for others

Add docs for rust-cuda-derive

Small refactoring + added docs for rust-cuda-kernel

Bump MSRV to 1.77-nightly

Try trait-based kernel signature check

Try naming host kernel layout const

Try match against byte literal for faster comparison

Try with memcmp intrinsic

Try out experimental const-type-layout with compression

Try check

Try check again

* Fix CUDA install in CI

* Switch from kernel type signature check to random hash

* Fix CI-identified failures

* Use pinned nightly in CI

* Try splitting the kernel func signature type check

* Try with llvm-bitcode-linker

* Upgrade to latest ptx-builder

* Fix codecov by excluding ptx tests (codecov weirdly overrides linker)
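
The commit message notes that a custom String type which pre-allocates the exact formatted length could avoid some of the formatting overhead in the device print macros. The following is a minimal sketch of that idea only, not the rust-cuda implementation: `LenCounter` and `format_exact` are hypothetical names, and the sketch still goes through `core::fmt`, so the formatting machinery the message describes as large is still pulled in.

```rust
use core::fmt::{self, Write};

/// Hypothetical helper: counts the bytes a format would produce.
struct LenCounter(usize);

impl Write for LenCounter {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.0 += s.len();
        Ok(())
    }
}

/// Hypothetical helper: formats `args` into a String whose capacity is
/// reserved up front, so the second pass never has to grow the buffer.
fn format_exact(args: fmt::Arguments<'_>) -> String {
    // First pass: measure the output length without allocating.
    let mut counter = LenCounter(0);
    counter
        .write_fmt(args)
        .expect("a Display impl returned an error while counting");

    // Second pass: write into a buffer with at least that capacity.
    let mut out = String::with_capacity(counter.0);
    out.write_fmt(args)
        .expect("a Display impl returned an error while writing");
    out
}

fn main() {
    let thread = 7_usize;
    let s = format_exact(format_args!("thread {thread} of {}", 32));
    println!("{s} (len {})", s.len());
}
```

The two-pass design trades running every `Display` impl twice for a single allocation, which is only worthwhile where allocation is the dominant cost.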
juntyr committed May 20, 2024
1 parent f395253 commit eba6c37
Showing 138 changed files with 11,459 additions and 5,221 deletions.
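
One of the squashed commits fixes the uniqueness guarantee for `Stream` using branded types. The sketch below illustrates the general branding technique, assuming an invariant lifetime brand that is introduced by a closure. `Stream`, `Async`, and `with_stream` here are illustrative stand-ins and are not taken from rust-cuda's actual definitions.

```rust
use core::marker::PhantomData;

/// Invariant lifetime marker: `'brand` can be neither shortened nor extended.
type Brand<'brand> = PhantomData<fn(&'brand ()) -> &'brand ()>;

/// Illustrative stand-in for a stream handle that carries a unique brand.
struct Stream<'brand> {
    id: u32,
    _brand: Brand<'brand>,
}

/// Illustrative value that is tied to the stream it was created on.
struct Async<'brand, T> {
    value: T,
    _brand: Brand<'brand>,
}

impl<'brand> Stream<'brand> {
    /// Produces a value branded with this stream.
    fn launch<T>(&self, value: T) -> Async<'brand, T> {
        Async { value, _brand: PhantomData }
    }

    /// Only accepts values that carry this exact stream's brand.
    fn synchronize<T>(&self, pending: Async<'brand, T>) -> T {
        pending.value
    }
}

/// Each call hands the closure a fresh brand that cannot escape it.
fn with_stream<R>(id: u32, f: impl for<'brand> FnOnce(Stream<'brand>) -> R) -> R {
    f(Stream { id, _brand: PhantomData })
}

fn main() {
    let (ids, sum) = with_stream(0, |a| {
        with_stream(1, |b| {
            let x = a.launch(20);
            let y = b.launch(22);
            // `b.synchronize(x)` would be rejected at compile time: the brands differ.
            ((a.id, b.id), a.synchronize(x) + b.synchronize(y))
        })
    });
    println!("streams {ids:?} produced {sum}");
}
```

Because the brand is a higher-ranked lifetime, the compiler treats the two streams as distinct types at no runtime cost, which is what makes a mix-up between streams a type error rather than a runtime check.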
150 changes: 41 additions & 109 deletions .github/workflows/ci.yml
@@ -23,6 +23,13 @@ jobs:
rust: [nightly]

steps:
- name: Install CUDA
uses: Jimver/cuda-toolkit@v0.2.14
with:
method: network
use-github-cache: false
use-local-cache: false

- name: Checkout the Repository
uses: actions/checkout@v2

@@ -32,61 +39,27 @@ jobs:
toolchain: ${{ matrix.rust }}
profile: minimal
target: nvptx64-nvidia-cuda
override: true

- name: Install the rust-ptx-linker
run: |
wget https://apt.llvm.org/llvm.sh && chmod +x llvm.sh
sudo ./llvm.sh $(rustc --version -v | grep -oP "LLVM version: \K\d+")
rm llvm.sh
cargo install rust-ptx-linker --git https://github.com/juntyr/rust-ptx-linker --force
- name: Check without features on CPU
run: |
cargo check
- name: Check with alloc feature on CPU
run: |
cargo check \
--features alloc
- name: Check with derive feature on CPU
run: |
cargo check \
--features derive
override: false # FIXME

- name: Check with host feature on CPU
run: |
cargo check \
--features host
- name: Install cargo-hack
uses: taiki-e/install-action@cargo-hack

- name: Check with host,derive,alloc features on CPU
- name: Check feature powerset on the CPU
run: |
cargo check \
--features host,derive,alloc
cargo hack check --feature-powerset --optional-deps \
--skip device \
--keep-going
- name: Check without features on CUDA
- name: Check feature powerset on CUDA
run: |
cargo check \
cargo hack check --feature-powerset --optional-deps \
--skip host \
--keep-going \
--target nvptx64-nvidia-cuda
- name: Check with alloc feature on CUDA
run: |
cargo check \
--target nvptx64-nvidia-cuda \
--features alloc
- name: Check with derive feature on CUDA
run: |
cargo check \
--target nvptx64-nvidia-cuda \
--features derive
- name: Check all workspace targets
run: |
cargo check \
--workspace \
--all-targets
cargo check --workspace --all-targets
test:
name: Test Suite
@@ -113,14 +86,7 @@ jobs:
toolchain: ${{ matrix.rust }}
profile: minimal
target: nvptx64-nvidia-cuda
override: true

- name: Install the rust-ptx-linker
run: |
wget https://apt.llvm.org/llvm.sh && chmod +x llvm.sh
sudo ./llvm.sh $(rustc --version -v | grep -oP "LLVM version: \K\d+")
rm llvm.sh
cargo install rust-ptx-linker --git https://github.com/juntyr/rust-ptx-linker --force
override: false # FIXME

- name: Run the test-suite
run: |
@@ -154,6 +120,13 @@ jobs:
rust: [nightly]

steps:
- name: Install CUDA
uses: Jimver/cuda-toolkit@v0.2.14
with:
method: network
use-github-cache: false
use-local-cache: false

- name: Checkout the Repository
uses: actions/checkout@v2

@@ -164,67 +137,26 @@ jobs:
profile: minimal
components: clippy
target: nvptx64-nvidia-cuda
override: true

- name: Install the rust-ptx-linker
run: |
wget https://apt.llvm.org/llvm.sh && chmod +x llvm.sh
sudo ./llvm.sh $(rustc --version -v | grep -oP "LLVM version: \K\d+")
rm llvm.sh
cargo install rust-ptx-linker --git https://github.com/juntyr/rust-ptx-linker --force
- name: Check the code style without features on CPU
run: |
cargo clippy \
-- -D warnings
- name: Check the code style with alloc feature on CPU
run: |
cargo clippy \
--features alloc \
-- -D warnings
- name: Check the code style with derive feature on CPU
run: |
cargo clippy \
--features derive \
-- -D warnings
override: false # FIXME

- name: Check the code style with host feature on CPU
run: |
cargo clippy \
--features host \
-- -D warnings
- name: Check the code style with host,derive,alloc features on CPU
run: |
cargo clippy \
--features host,derive,alloc \
-- -D warnings
- name: Check the code style without features on CUDA
run: |
cargo clippy \
--target nvptx64-nvidia-cuda \
-- -D warnings
- name: Install cargo-hack
uses: taiki-e/install-action@cargo-hack

- name: Check the code style with alloc feature on CUDA
- name: Check feature powerset on the CPU
run: |
cargo clippy \
--target nvptx64-nvidia-cuda \
--features alloc \
cargo hack clippy --feature-powerset --optional-deps \
--skip device \
--keep-going \
-- -D warnings
- name: Check the code style with derive feature on CUDA
- name: Check feature powerset on CUDA
run: |
cargo clippy \
cargo hack clippy --feature-powerset --optional-deps \
--skip host \
--keep-going \
--target nvptx64-nvidia-cuda \
--features derive \
-- -D warnings
- name: Check the code style for all workspace targets
- name: Check all workspace targets
run: |
cargo clippy \
--workspace \
--all-targets \
-- -D warnings
cargo clippy --workspace --all-targets -- -D warnings
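
The CI changes above replace the hand-written per-feature steps with `cargo hack --feature-powerset`, skipping `device` on the CPU and `host` on CUDA. As a rough illustration of what a powerset check covers, the sketch below enumerates every subset of a feature list after skipping one feature. It does not show how cargo-hack itself is implemented and ignores optional dependencies and feature implications; the feature names are taken from the root Cargo.toml in this commit, and `feature_powerset` is a hypothetical helper.

```rust
/// Hypothetical helper: every subset of `features` that does not contain `skip`.
fn feature_powerset<'a>(features: &[&'a str], skip: &str) -> Vec<Vec<&'a str>> {
    let kept: Vec<&'a str> = features.iter().copied().filter(|f| *f != skip).collect();
    // Each bit of `mask` decides whether the feature at that index is enabled.
    (0..(1_usize << kept.len()))
        .map(|mask| {
            kept.iter()
                .enumerate()
                .filter(|(i, _)| ((mask >> *i) & 1) == 1)
                .map(|(_, f)| *f)
                .collect()
        })
        .collect()
}

fn main() {
    let features = ["derive", "device", "final", "host", "kernel"];
    let sets = feature_powerset(&features, "device");
    println!("{} feature combinations on the CPU:", sets.len());
    for set in &sets {
        println!("  [{}]", set.join(","));
    }
}
```

With four features left after the skip, this yields 16 combinations, compared with the five CPU configurations the old workflow spelled out by hand.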
18 changes: 8 additions & 10 deletions .github/workflows/coverage.yml
@@ -27,19 +27,17 @@ jobs:
profile: minimal
components: llvm-tools-preview
target: nvptx64-nvidia-cuda
override: true

- name: Install the rust-ptx-linker
run: |
wget https://apt.llvm.org/llvm.sh && chmod +x llvm.sh
sudo ./llvm.sh $(rustc --version -v | grep -oP "LLVM version: \K\d+")
rm llvm.sh
cargo install rust-ptx-linker --git https://github.com/juntyr/rust-ptx-linker --force
override: false # FIXME

- name: Generate the coverage data
run: |
cargo clean
cargo test --workspace --all-targets
cargo test \
--workspace \
--all-targets \
--exclude derive \
--exclude print \
--exclude single-source
env:
CARGO_INCREMENTAL: 0
RUSTFLAGS: -Cinstrument-coverage
@@ -56,8 +54,8 @@ jobs:
./grcov . -s . --binary-path ./target/debug/deps \
-t lcov -o coverage.lcov --branch \
--keep-only "src/*" \
--keep-only "rust-cuda-ptx-jit/*" \
--keep-only "rust-cuda-derive/*" \
--keep-only "rust-cuda-kernel/*" \
--ignore-not-existing \
--excl-line GRCOV_EXCL_LINE \
--excl-start GRCOV_EXCL_START \
4 changes: 3 additions & 1 deletion .github/workflows/rustdoc.yml
@@ -22,12 +22,14 @@ jobs:
with:
toolchain: nightly
profile: minimal
override: true
override: false # FIXME

- name: Build the Documentation
run: |
RUSTDOCFLAGS="\
--enable-index-page \
--extern-html-root-url const_type_layout=https://docs.rs/const-type-layout/0.3.1/ \
--extern-html-root-url final=https://docs.rs/final/0.1.1/ \
--extern-html-root-url rustacuda=https://docs.rs/rustacuda/0.1.3/ \
--extern-html-root-url rustacuda_core=https://docs.rs/rustacuda_core/0.1.2/ \
--extern-html-root-url rustacuda_derive=https://docs.rs/rustacuda_derive/0.1.2/ \
3 changes: 3 additions & 0 deletions .gitignore
@@ -8,3 +8,6 @@ Cargo.lock

# These are backup files generated by rustfmt
**/*.rs.bk

# cargo expand dev output files
**/expanded.rs
18 changes: 8 additions & 10 deletions .gitpod.Dockerfile
@@ -8,18 +8,16 @@ RUN echo "debconf debconf/frontend select Noninteractive" | sudo debconf-set-sel
echo "keyboard-configuration keyboard-configuration/layout select 'English (US)'" | sudo debconf-set-selections && \
echo "keyboard-configuration keyboard-configuration/layoutcode select 'us'" | sudo debconf-set-selections && \
echo "resolvconf resolvconf/linkify-resolvconf boolean false" | sudo debconf-set-selections && \
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin && \
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600 && \
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub && \
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /" && \
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb -O cuda_keyring.deb && \
sudo dpkg -i cuda_keyring.deb && \
rm cuda_keyring.deb && \
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin && \
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600 && \
sudo add-apt-repository deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ / && \
sudo apt-get update -q && \
sudo apt-get install cuda -y --no-install-recommends && \
wget https://apt.llvm.org/llvm.sh && chmod +x llvm.sh && \
sudo ./llvm.sh $(rustc --version -v | grep -oP "LLVM version: \K\d+") && \
rm llvm.sh && \
sudo apt-get install cuda-12-3 -y --no-install-recommends && \
sudo apt-get clean autoclean && \
sudo apt-get autoremove -y && \
sudo rm -rf /var/lib/{apt,dpkg,cache,log}/

RUN cargo install rust-ptx-linker --git https://github.com/juntyr/rust-ptx-linker --force && \
cargo install cargo-reaper --git https://github.com/juntyr/grim-reaper --force
RUN cargo install cargo-reaper --git https://github.com/juntyr/grim-reaper --force
8 changes: 7 additions & 1 deletion .vscode/settings.json
@@ -4,5 +4,11 @@
"rust-analyzer.updates.askBeforeDownload": false,
"rust-analyzer.checkOnSave.command": "reap-clippy",
"rust-analyzer.cargo.allFeatures": false,
"rust-analyzer.cargo.features": ["alloc", "derive", "host"],
"rust-analyzer.cargo.features": [
"derive",
"final",
"host",
"kernel"
],
"rust-analyzer.showUnlinkedFileNotification": false,
}
35 changes: 19 additions & 16 deletions Cargo.toml
@@ -1,10 +1,10 @@
[workspace]
members = [
".", "rust-cuda-derive", "rust-cuda-ptx-jit",
"examples/single-source", "examples/derive",
".", "rust-cuda-derive", "rust-cuda-kernel",
"examples/derive", "examples/print", "examples/single-source",
]
default-members = [
".", "rust-cuda-derive", "rust-cuda-ptx-jit"
".", "rust-cuda-derive", "rust-cuda-kernel",
]

[package]
@@ -19,23 +19,26 @@ rust-version = "1.79" # nightly

[features]
default = []
alloc = ["hashbrown"]
host = ["rustacuda", "rust-cuda-ptx-jit/host"]
derive = ["rustacuda_derive", "rust-cuda-derive"]
derive = ["dep:rustacuda_derive", "dep:rust-cuda-derive"]
device = []
final = ["dep:final"]
host = ["dep:rustacuda", "dep:regex", "dep:oneshot", "dep:safer_owning_ref"]
kernel = ["dep:rust-cuda-kernel"]

[dependencies]
rustacuda_core = "0.1.2"
rustacuda_core = { git = "https://github.com/juntyr/RustaCUDA", rev = "c6ea7cc" }

rustacuda = { version = "0.1.3", optional = true }
rustacuda_derive = { version = "0.1.2", optional = true }
rustacuda = { git = "https://github.com/juntyr/RustaCUDA", rev = "c6ea7cc", optional = true }
rustacuda_derive = { git = "https://github.com/juntyr/RustaCUDA", rev = "c6ea7cc", optional = true }

const-type-layout = { version = "0.3.0", features = ["derive"] }
regex = { version = "1.10", optional = true }

final = "0.1.1"
hashbrown = { version = "0.14", default-features = false, features = ["inline-more"], optional = true }
const-type-layout = { version = "0.3.1", features = ["derive"] }

rust-cuda-derive = { path = "rust-cuda-derive", optional = true }
rust-cuda-ptx-jit = { path = "rust-cuda-ptx-jit" }
safer_owning_ref = { version = "0.5", optional = true }
oneshot = { version = "0.1", optional = true, features = ["std", "async"] }

final = { version = "0.1.1", optional = true }

[dev-dependencies]
hashbrown = { version = "0.14", default-features = false, features = ["inline-more"] }
rust-cuda-derive = { path = "rust-cuda-derive", optional = true }
rust-cuda-kernel = { path = "rust-cuda-kernel", optional = true }
5 changes: 4 additions & 1 deletion README.md
@@ -1,8 +1,11 @@
# rust-cuda &emsp; [![CI Status]][workflow] [![Rust Doc]][docs] [![License Status]][fossa] [![Code Coverage]][codecov] [![Gitpod Ready-to-Code]][gitpod]
# rust-cuda &emsp; [![CI Status]][workflow] [![MSRV]][repo] [![Rust Doc]][docs] [![License Status]][fossa] [![Code Coverage]][codecov] [![Gitpod Ready-to-Code]][gitpod]

[CI Status]: https://img.shields.io/github/actions/workflow/status/juntyr/rust-cuda/ci.yml?branch=main
[workflow]: https://github.com/juntyr/rust-cuda/actions/workflows/ci.yml?query=branch%3Amain

[MSRV]: https://img.shields.io/badge/MSRV-1.79.0--nightly-orange
[repo]: https://github.com/juntyr/rust-cuda

[Rust Doc]: https://img.shields.io/badge/docs-main-blue
[docs]: https://juntyr.github.io/rust-cuda/

5 changes: 2 additions & 3 deletions examples/derive/Cargo.toml
@@ -1,12 +1,11 @@
[package]
name = "derive"
version = "0.1.0"
authors = ["Juniper Tyree <juniper.langenstein@helsinki.fi>"]
authors = ["Juniper Tyree <juniper.tyree@helsinki.fi>"]
license = "MIT OR Apache-2.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
const-type-layout = { version = "0.3.0" }
rust-cuda = { path = "../../", features = ["derive", "host"] }
rc = { package = "rust-cuda", path = "../../", features = ["derive", "host"] }
6 changes: 4 additions & 2 deletions examples/derive/src/lib.rs
@@ -1,13 +1,15 @@
#![deny(clippy::pedantic)]
#![feature(const_type_name)]

#[derive(rust_cuda::common::LendRustToCuda)]
#[derive(rc::lend::LendRustToCuda)]
#[cuda(crate = "rc")]
struct Inner<T: Copy> {
#[cuda(embed)]
inner: T,
}

#[derive(rust_cuda::common::LendRustToCuda)]
#[derive(rc::lend::LendRustToCuda)]
#[cuda(crate = "rc")]
struct Outer<T: Copy> {
#[cuda(embed)]
inner: Inner<T>,