rwkv_cuda

simple minimal dependency test

- layernorm / softmax / wkv_forward using oneflow style custom cuda kernel
- gemm / gemv using slightly modified cutlass 3.1
- argsort using thrust
- minimal test: under bin folder, invoked with nodejs and koffi
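
For reference, the thrust-based argsort mentioned above could look roughly like the sketch below. This is not the code from this repo: the `argsort` helper name, its host-side `std::vector` interface, and the ascending order are all assumptions made for illustration. It sorts a copy of the values as keys and carries an index sequence along as the payload.

```cuda
// Minimal argsort sketch with Thrust (illustrative, not this repo's code).
// Build with: nvcc -o argsort_demo argsort_demo.cu
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>

#include <cstdio>
#include <vector>

// Hypothetical helper: returns the indices that would sort `values`
// in ascending order. The input itself is left untouched.
std::vector<int> argsort(const std::vector<float>& values) {
    // Copy the values to the device; they act as the sort keys.
    thrust::device_vector<float> keys(values.begin(), values.end());

    // idx = 0, 1, 2, ..., n-1
    thrust::device_vector<int> idx(values.size());
    thrust::sequence(idx.begin(), idx.end());

    // Sort keys and permute idx the same way. The stable variant keeps
    // the original order of equal keys, so the result is deterministic.
    thrust::stable_sort_by_key(keys.begin(), keys.end(), idx.begin());

    // Copy the permuted indices back to the host.
    std::vector<int> out(idx.size());
    thrust::copy(idx.begin(), idx.end(), out.begin());
    return out;
}

int main() {
    std::vector<float> logits = {0.3f, 0.1f, 0.9f, 0.5f};
    std::vector<int> order = argsort(logits);
    for (int i : order) std::printf("%d ", i);
    std::printf("\n");  // prints: 1 0 3 2
    return 0;
}
```

For descending order (e.g. picking the largest logits first), passing `thrust::greater<float>()` from `<thrust/functional.h>` as the last argument to `thrust::stable_sort_by_key` should be enough.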