API redesign with async support #18

Merged
juntyr merged 9 commits into main from async-new on May 20, 2024

Conversation

@juntyr (Owner) commented May 19, 2024

No description provided.

juntyr and others added 9 commits May 19, 2024 06:05
Experiments with Rust Futures

Implemented derive for RustToCudaAsync

Implemented async kernel launch

Fixed RustToCudaAsync derive

LaunchPackage with non-mut Stream

Moved stream to be an explicit kernel argument

Updated ExchangeWrapperOn[Device|Host]Async::move_to_stream

Upgraded to fixed RustaCuda

Added scratch-space methods for uni-directional CudaExchangeItem

Added unsafe-aliasing API to SplitSliceOverCudaThreads[Const|Dynamic]Stride

Extended the CudaExchangeItem API with scratch and uMaybeUninit

Rename SplitSliceOverCudaThreads[Const|Dynamic]Stride::alias_[mut_]unchecked

Implemented #[cuda(crate)] and #[kernel(crate)] attributes

Added simple thread-block shared memory support

Fixed device utils doc tests

Convert cuda thread-block-shared memory address to generic

First steps towards better shared memory, including dynamic

Revert derive changes + R2C-based approach start

Some progress on shared slices

Backup of progress on compile-time PTX checking

Clean up the PTX JIT implementation

Add convenience functions for ThreadBlockShared arrays

Improve and fix CI

Remove broken ThreadBlockShared RustToCuda impl

Refactor kernel trait generation to push more safety constraints to the kernel definition

Fixed SomeCudaAlloc import

Added error handling to the compile-time PTX checking

Add PTX lint parsing, no actual support yet

Added lint checking support to monomorphised kernel impls

Improve kernel checking + added cubin dump lint

Fix kernel macro config parsing

Explicitly fitting Device[Const|Mut]Ref into device registers

Switched one std:: to core::

Remove register-sized CUDA kernel args check, unnecessary since rust-lang/rust#94703

Simplified the kernel parameter layout extraction from PTX

Fix up rebase issues

Install CUDA in all CI steps

Use CStr literals

Simplify and document the safety traits

Fix move_to_cuda bound

Fix clippy for 1.76

Cleaned up the rust-cuda device macros with better print

The implementation still uses String for dynamic formatting, which
currently pulls in loads of formatting and panic machinery.

While a custom String type that pre-allocated the exact format String
length can avoid some of that, the formatting machinery even for e.g.
usize is still large.

If `format_args!` is ever optimised for better inlining, the more
verbose and lower-level implementation could be reconsidered.

Switch to using more vprintf in embedded CUDA kernel

Make print example fully executable

Clean up the print example

ptr_from_ref is stable from 1.76

Exit on CUDA panic instead of abort to allow the host to handle the error

Backup of early progress for switching from kernel traits to functions

More work into kernel functions instead of traits

Eliminate almost all ArgsTrait usages

Some refactoring of the async kernel func type + wrap code

Early sketch of extracting type wrapping from macro into types and traits

Early work towards using trait for kernel type wrap, ptx jit workaround missing

Lift complete CPU kernel wrapper from proc macro into public functions

Add async launch helper

Further cleanup of the new kernel param API

Start cleaning up the public API

Allow passing ThreadBlockShared to kernels again

Remove unsound mutable lending to CUDA for now

Allow passing ThreadBlockSharedSlice to kernel for dynamic shared memory

Begin refactoring the public API with device feature

Refactoring to prepare for better module structure

Extract kernel module just for parameters

Add RustToCuda impls for &T, &mut T, &[T], and &mut [T] where T: RustToCuda

Large restructuring of the module layout for rust-cuda

Split rust-cuda-kernel off from rust-cuda-derive

Update codecov action to handle rust-cuda-kernel

Fix clippy lint

Far too much time spent getting rid of DeviceCopy

More refactoring and auditing kernel param bounds

First exploration towards a stricter async CUDA API

More experiments with async API

Further API experimentation

Further async API experimentation

Further async API design work

Add RustToCudaAsync impls for &T and &[T], but not &mut T or &mut [T]

Add back mostly unchanged exchange wrapper + buffer with RustToCudaAsync impls

Add back mostly unchanged anti-aliasing types with RustToCudaAsync impls

Progress on replacing ...Async with Async<...>

Seal more implementation details

Further small API improvements

Add AsyncProj helper API struct for async projections

Disable async derive in examples for now

Implement RustToCudaAsync derive impls

Further async API improvements to add drop behaviour

First sketch of the safety constraints of a new NoSafeAliasing trait

First steps towards reintroducing LendToCudaMut

Fix no-std Box import for LendRustToCuda derive

Re-add RustToCuda implementation for Final

Remove redundant RustToCudaAsyncProxy

More progress on less 'static bounds on kernel params

Further investigation of less 'static bounds

Remove 'static bounds from LendToCuda ref kernel params

Make CudaExchangeBuffer Sync

Make CudaExchangeBuffer Sync v2

Add AsyncProj proj_ref and proj_mut convenience methods

Add RustToCudaWithPortableBitCloneSemantics adapter

Fix invalid const fn bounds

Add Deref[Mut] to the adapters

Fix pointer type inference error

Try removing __rust_cuda_ffi_safe_assert module

Ensure async launch mutable borrow safety with barriers on use and stream move

Fix uniqueness guarantee for Stream using branded types

Try without ref proj

Try add extract ref

Fix doc link

clean up kernel signature check

Some cleanup before merging

Fix some clippy lints, add FIXMEs for others

Add docs for rust-cuda-derive

Small refactoring + added docs for rust-cuda-kernel

Bump MSRV to 1.77-nightly

Try trait-based kernel signature check

Try naming host kernel layout const

Try match against byte literal for faster comparison

Try with memcmp intrinsic

Try out experimental const-type-layout with compression

Try check

Try check again
@codecov-commenter

Codecov Report

Attention: Patch coverage is 0%, with 2609 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (f395253) to head (9b7a875).

Files                                                    Patch %   Lines
rust-cuda-kernel/src/kernel/link/mod.rs                  0.00%     707 Missing ⚠️
rust-cuda-kernel/src/kernel/wrapper/mod.rs               0.00%     271 Missing ⚠️
...st-cuda-kernel/src/kernel/specialise/param_type.rs    0.00%     196 Missing ⚠️
src/utils/adapter.rs                                     0.00%     162 Missing ⚠️
rust-cuda-derive/src/rust_to_cuda/impl.rs                0.00%     135 Missing ⚠️
...kernel/wrapper/generate/host_link_macro/get_ptx.rs    0.00%     134 Missing ⚠️
...kernel/src/kernel/wrapper/generate/cuda_wrapper.rs    0.00%     121 Missing ⚠️
rust-cuda-kernel/src/kernel/lints.rs                     0.00%     113 Missing ⚠️
rust-cuda-derive/src/rust_to_cuda/field_copy.rs          0.00%     111 Missing ⚠️
...src/kernel/wrapper/generate/host_link_macro/mod.rs    0.00%     97 Missing ⚠️
... and 20 more

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #18       +/-   ##
==========================================
- Coverage   58.39%   0.00%   -58.40%     
==========================================
  Files          48      33       -15     
  Lines        3653    3290      -363     
==========================================
- Hits         2133       0     -2133     
- Misses       1520    3290     +1770     

@juntyr juntyr merged commit eba6c37 into main May 20, 2024
6 checks passed
@juntyr juntyr deleted the async-new branch May 20, 2024 07:47