DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get -y install tzdata
apt install -y build-essential curl git pkg-config libssl-dev clang kmod
docker cp cuda.run 8bc3b843eecc:/cuda.run
./cuda.run --silent --toolkit --samples
./cuda.run --silent --toolkit --driver
export PATH="/usr/local/cuda-11.7/bin:$PATH"
cargo test # fails due to runtime cuda not available
ldconfig -p | grep libcuda # also does not find it
- clean up the channels with types and errors
- write a better README.md
- what to do with the utils flag? (forward?)
- migrate other examples to make the API really usable
- add a test using a compressed messagepack stream (`serde-rmp`)
- write a blogpost
- rename project on GitHub
- use a single static for the accelsim example with all the context to avoid unsafe
- use the vendor `nvbit_release` only for downstream crates
- the read thread should push into a Rust channel
- clean up some code
- alias types used for the nvbit callback APIs
- improve the buffer implementation
- include `cuda-sys` crate in nvbit-rs
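
For the two channel-related items above (typed channels with errors, and a read thread that pushes into a Rust channel), here is a minimal sketch using only the standard library. `TraceRecord`, `ReadError`, the 12-byte record layout, and `spawn_reader` are all hypothetical names and assumptions for illustration; the real packet format is not described in these notes.

```rust
use std::fmt;
use std::sync::mpsc;
use std::thread;

/// Hypothetical trace record; the real layout is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub struct TraceRecord {
    pub kernel_id: u64,
    pub pc: u32,
}

/// Hypothetical typed error for the read side of the channel.
#[derive(Debug, PartialEq)]
pub enum ReadError {
    /// The raw buffer length is not a multiple of the record size.
    Truncated { len: usize },
}

impl fmt::Display for ReadError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match *self {
            ReadError::Truncated { len } => write!(f, "truncated buffer of {} bytes", len),
        }
    }
}

impl std::error::Error for ReadError {}

/// Assumed wire size: little-endian u64 kernel id followed by a u32 pc.
const RECORD_SIZE: usize = 12;

/// Decode all records from a raw byte buffer.
pub fn decode(raw: &[u8]) -> Result<Vec<TraceRecord>, ReadError> {
    if raw.len() % RECORD_SIZE != 0 {
        return Err(ReadError::Truncated { len: raw.len() });
    }
    let records = raw
        .chunks_exact(RECORD_SIZE)
        .map(|chunk| {
            let mut id = [0u8; 8];
            id.copy_from_slice(&chunk[..8]);
            let mut pc = [0u8; 4];
            pc.copy_from_slice(&chunk[8..]);
            TraceRecord {
                kernel_id: u64::from_le_bytes(id),
                pc: u32::from_le_bytes(pc),
            }
        })
        .collect();
    Ok(records)
}

/// Spawn the read thread. It pushes every decoded record, or the decode
/// error, into a typed channel and then drops the sender.
pub fn spawn_reader(raw: Vec<u8>) -> mpsc::Receiver<Result<TraceRecord, ReadError>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || match decode(&raw) {
        Ok(records) => {
            for record in records {
                // Stop early if the receiving side has gone away.
                if tx.send(Ok(record)).is_err() {
                    break;
                }
            }
        }
        Err(err) => {
            let _ = tx.send(Err(err));
        }
    });
    rx
}

fn main() {
    let mut raw = Vec::new();
    raw.extend_from_slice(&1u64.to_le_bytes());
    raw.extend_from_slice(&0x10u32.to_le_bytes());
    // Iterating the receiver ends once the read thread drops the sender.
    for item in spawn_reader(raw) {
        println!("{:?}", item);
    }
}
```

Sending `Result` values keeps the error type visible in the channel's signature, so the consumer has to handle a bad buffer explicitly instead of seeing a silently closed channel.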
nvcc -D_FORCE_INLINES -dc -c -std=c++11 -I../nvbit_release/core -Xptxas -cloning=no -Xcompiler -w -O3 -Xcompiler -fPIC tracer_tool.cu -o tracer_tool.o
nvcc -D_FORCE_INLINES -I../nvbit_release/core -maxrregcount=24 -Xptxas -astoolspatch --keep-device-functions -c inject_funcs.cu -o inject_funcs.o
nvcc -D_FORCE_INLINES -O3 tracer_tool.o inject_funcs.o -L../nvbit_release/core -lnvbit -lcuda -shared -o tracer_tool.so
Now is a good time to introduce workspaces:
- make the examples individual crates, each with its own Cargo.toml and build.rs
- write the custom tracing kernels per example

This way we might finally include the symbol.
TODO
- we find that Rust and C++ interop is hard
  - e.g. `nvbit_get_related_functions` returns `std::vector<CUfunction>`, for which there is no easy binding; even `&cxx::CxxVector<CUfunction>` does not work because `CUfunction` is an FFI struct (by value).
  - a possible way is to provide a wrapper that copies to a `cxx::Vec<CUfunction>`, I guess (see this example)
  - since we are tracing, and this would need to be performed for each unseen function, this copy overhead is not acceptable
    - TODO: find out how often it is called and maybe still do it and measure
    - UPDATE: `get_related_functions` is only called once, try it in Rust
  - other approach: only receive stuff from the channel, a simple struct...
    - if that works: how can we decide which tracing function to use (since we cannot write new ones in Rust)
  - figure out if we can somehow reuse the same nvbit names by using a namespace?
  - or wrap the calls in Rust, which calls the `ffi::rust_*` funcs.
  - IMPORTANT OBSERVATION:
    - I almost gave up on `cxx`, because it was only giving me `Unsupported type` errors
    - I don't import `cxx::UniquePtr` or `cxx::CxxVector` in the `ffi` module, so I was assuming I need to use `cxx::` to reference the types.
    - the docs do not use the prefix, but they `use` the types in the top-level module ...
    - turns out you need to omit the `cxx::` prefix because this is all macro magic ...
- we must include the CUDA inject funcs and the `nvbit_tool.h` into the binary somehow.
  - maybe statically compile them in the build script
  - then link them with the `nvbit-sys` crate
  - then, `nvbit_at_init` should be called, hopefully
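
The copying wrapper idea from the list above can be sketched on the Rust side without `cxx`, assuming a small C++ shim exposes the vector as a pointer plus a length (for example from `data()` and `size()`). This is only an illustration: the `CUfunction` newtype below is a stand-in for the real CUDA handle, and `copy_related_functions` is a hypothetical helper, not part of nvbit or `cxx`.

```rust
use std::os::raw::c_void;
use std::slice;

/// Stand-in for CUDA's `CUfunction`: an opaque, pointer-sized handle that is
/// passed around by value. The real type would come from the CUDA bindings.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CUfunction(pub *mut c_void);

/// Copy `len` handles starting at `ptr` into an owned `Vec`.
///
/// # Safety
/// `ptr` must either be null or point to `len` initialized `CUfunction`
/// values that stay alive for the duration of this call.
pub unsafe fn copy_related_functions(ptr: *const CUfunction, len: usize) -> Vec<CUfunction> {
    if ptr.is_null() || len == 0 {
        return Vec::new();
    }
    // One bulk copy of `len` pointer-sized handles.
    unsafe { slice::from_raw_parts(ptr, len) }.to_vec()
}

fn main() {
    // Stand-in for the buffer owned by the C++ side.
    let cpp_side = vec![
        CUfunction(0x1000 as *mut c_void),
        CUfunction(0x2000 as *mut c_void),
    ];
    let copied = unsafe { copy_related_functions(cpp_side.as_ptr(), cpp_side.len()) };
    println!("{} related functions copied", copied.len());
}
```

Since the handles are pointer-sized and, per the UPDATE note, the lookup happens only once, the copy should be cheap, but that is still worth measuring as the TODO suggests.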
The current goal is to get a working example of a tracer written in rust. Usage should be:
# install lld
sudo apt-get install -y lld
# create a shared dynamic library that implements the nvbit hooks
cargo build -p accelsim
# run the nvbit example CUDA application with the tracer
LD_PRELOAD=./target/debug/libaccelsim.so nvbit-sys/nvbit_release/test-apps/vectoradd/vectoradd
check the clang versions installed
apt list --installed | grep clang
When running `clang nvbit.h`, it also complains about a missing `cassert`.
`cassert` is a C++ standard library header (the tool is built with `-std=c++11 -I$(NVBIT_PATH)`), so we need: `clang++ -std=c++11 nvbit.h`.
`bindgen` does not work that well with C++ code, check this.
We need some clang stuff so that bindgen can find `#include <cassert>`.
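
One way to get the "clang stuff" is to tell clang explicitly that the header is C++, so that standard headers such as `<cassert>` are searched for. The sketch below only assembles the argument list; in a `build.rs` it would be handed to bindgen's `Builder::clang_args`. The helper name `clang_args_for_cpp_header` and the include path are assumptions for illustration.

```rust
use std::path::Path;

/// Build clang arguments that force C++11 mode and add the nvbit include dir.
/// Hypothetical helper; the include directory is whatever holds `nvbit.h`.
pub fn clang_args_for_cpp_header(include_dir: &Path) -> Vec<String> {
    vec![
        // Treat the input as C++ even though it ends in `.h`.
        "-x".to_string(),
        "c++".to_string(),
        "-std=c++11".to_string(),
        format!("-I{}", include_dir.display()),
    ]
}

fn main() {
    let args = clang_args_for_cpp_header(Path::new("nvbit_release/core"));
    // In build.rs (assumed usage):
    //   bindgen::Builder::default().header("nvbit.h").clang_args(&args)
    println!("{}", args.join(" "));
}
```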
We will also need to include `nvbit.h`, `nvbit_tool.h`, and tracing injected functions, which require `.cu` files to be compiled and linked with the binary.
This example shows how `.cu` can be compiled and linked with the `cc` crate.
Make sure that the C function hooks of nvbit are not mangled in the shared library:
nm -D ./target/debug/examples/libtracer.so
nm -D ./target/debug/build/nvbit-sys-08fdef510bde07a0/out/libinstrumentation.a
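
On the Rust side, an unmangled hook is typically produced with `#[no_mangle]` and the C ABI. The following is a minimal sketch, assuming the crate is built as a `cdylib`; the `INITIALIZED` flag and `is_initialized` helper are hypothetical and exist only to make the effect observable. (On the Rust 2024 edition the attribute is spelled `#[unsafe(no_mangle)]`.)

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical flag that records whether the hook has run.
static INITIALIZED: AtomicBool = AtomicBool::new(false);

/// Exported under the exact C name `nvbit_at_init`, so it should show up
/// unmangled in the `nm -D` output of the shared library.
#[no_mangle]
pub extern "C" fn nvbit_at_init() {
    INITIALIZED.store(true, Ordering::SeqCst);
}

/// Hypothetical helper for checking the flag from Rust code.
pub fn is_initialized() -> bool {
    INITIALIZED.load(Ordering::SeqCst)
}

fn main() {
    // When preloaded, the hook would be called from outside; here we call it directly.
    nvbit_at_init();
    println!("initialized: {}", is_initialized());
}
```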
Problem: we need the `instrument_inst` function to be present in the binary, just like for the example:
nm -D /home/roman/dev/nvbit-sys/tracer_nvbit/tracer_tool/tracer_tool.so | grep instrument
# for a static library:
nm --debug-syms target/debug/build/accelsim-a67c1762e4619dad/out/libinstrumentation.a | grep instrument
Currently, it's not :(
Make sure that we link statically:
ldd ./target/debug/examples/libtracer.so
Check what includes `cxx` generated:
tre target/debug/build/nvbit-sys-*/out/cxxbridge/