Description
🐛 Describe the bug
I'm trying to JIT-compile the following minimal C++/CUDA extension:
File test.py:
import os
from torch.utils.cpp_extension import load
sources = [
    os.path.join(os.path.dirname(__file__), "csrc", "cuda_impl.cpp"),
    os.path.join(os.path.dirname(__file__), "csrc", "cuda_kernel.cu"),
]
load(
    name="my_cuda",
    sources=sources,
    extra_cflags=['-O3'],
    is_python_module=False,
    verbose=False
)
print("OK")
File csrc/cuda_impl.cpp:
#include <torch/script.h>
#include <vector>
void cuda_forward_simple();
void cuda_backward_simple();
std::vector<at::Tensor> forward_simple(const at::Tensor & inputs) {
    return {inputs};
}
std::vector<at::Tensor> backward_simple(const at::Tensor & grad_inputs, const at::Tensor & inputs) {
    return {grad_inputs};
}
static auto registory_fwd_v1 =
    torch::RegisterOperators("my_cuda::forward_simple", &forward_simple);
static auto registory_bwd_v1 =
    torch::RegisterOperators("my_cuda::backward_simple", &backward_simple);
File csrc/cuda_kernel.cu:
#include <torch/types.h>
#include <cuda.h>
#include <cuda_runtime.h>
namespace {
template <typename scalar_t>
__global__ void cuda_forward_kernel_simple()
{
}
template <typename scalar_t>
__global__ void cuda_backward_kernel_simple()
{
}
}
void cuda_forward_simple(const at::Tensor & inputs) {
    AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs.type(), "forward_cuda_simple", ([&] {
        cuda_forward_kernel_simple<scalar_t><<<1, 1>>>();
    }));
}
void cuda_backward_simple(at::Tensor & grad_input, const at::Tensor & inputs) {
    AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs.type(), "backward_cuda_simple", ([&] {
        cuda_backward_kernel_simple<scalar_t><<<1, 1>>>();
    }));
}
But I get the following error:
Traceback (most recent call last):
  File "C:\Users\1\PycharmProjects\test\troch_test.py", line 8, in <module>
    load(
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\cpp_extension.py", line 1535, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\cpp_extension.py", line 1934, in _import_module_from_library
    torch.ops.load_library(filepath)
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\_ops.py", line 643, in load_library
    ctypes.CDLL(path)
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\1\AppData\Local\torch_extensions\torch_extensions\Cache\py39_cu118\my_cuda\my_cuda.pyd' (or one of its dependencies). Try using the full path with constructor syntax.
The C++ extension compiles successfully, but Python is unable to load the resulting my_cuda.pyd file.
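The error message covers two different cases: the file itself is missing, or the file exists but one of the DLLs it links against cannot be found. Here is a minimal sketch for telling them apart. Both helper names are hypothetical and not part of PyTorch, and the idea that a CUDA runtime DLL is the missing dependency is only an assumption. On Windows with Python 3.8+, dependent DLLs are not resolved through PATH, so extra directories have to be registered with os.add_dll_directory.

```python
import os
from pathlib import Path


def classify_load_failure(library_path):
    """Roughly classify a failed load (hypothetical helper, not a PyTorch API)."""
    if not Path(library_path).is_file():
        # The build did not leave a binary where the loader looked.
        return "file missing"
    # The binary exists, so the loader most likely failed on a dependent DLL.
    return "file present, dependency likely missing"


def register_dll_directories(directories):
    """Add existing directories to the Windows DLL search path.

    os.add_dll_directory only exists on Windows (Python 3.8+), so this is a
    no-op on other platforms. Returns the directories actually registered.
    """
    registered = []
    for directory in directories:
        directory = os.path.abspath(directory)
        if os.path.isdir(directory) and hasattr(os, "add_dll_directory"):
            os.add_dll_directory(directory)
            registered.append(directory)
    return registered
```

One way to use it, assuming the CUDA toolkit installer set the CUDA_PATH environment variable: call register_dll_directories([os.path.join(os.environ["CUDA_PATH"], "bin")]) before load(...), then check classify_load_failure on the path from the error message if loading still fails.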
Previously I used the SRU custom torch layer (https://github.com/asappresearch/sru), and it loads fine, so the problem is somewhere on my machine, but I can't figure out where. I know that my installed CUDA toolkit version does not match the CUDA version PyTorch was built with (because I had to install a newer version of PyTorch in a different environment), but that didn't cause any problems with the SRU layer.
Can you help me?
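To make the version mismatch concrete, here is a small sketch that compares the two CUDA version strings from the environment report. The function names are hypothetical. Whether this mismatch is the cause of the load failure is not established here. In a live session the first string can be read from torch.version.cuda.

```python
def cuda_major_minor(version):
    """Parse a version string such as '11.8' or '12.5.40' into (major, minor)."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])


def describe_cuda_mismatch(torch_cuda, toolkit_cuda):
    """Compare the CUDA version PyTorch was built with to the local toolkit."""
    built = cuda_major_minor(torch_cuda)
    local = cuda_major_minor(toolkit_cuda)
    if built == local:
        return "match"
    if built[0] == local[0]:
        return "minor mismatch"
    return "major mismatch"


# Values taken from the environment report in this issue.
print(describe_cuda_mismatch("11.8", "12.5.40"))  # prints "major mismatch"
```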
Versions
Collecting environment information...
PyTorch version: 2.0.1+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 10 Home
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A
Python version: 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: 12.5.40
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 2060
Nvidia driver version: 555.85
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture=9
CurrentClockSpeed=2701
DeviceID=CPU0
Family=205
L2CacheSize=1024
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2701
Name=Intel(R) Core(TM) i5-6400 CPU @ 2.70GHz
ProcessorType=3
Revision=24067
Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.23.5
[pip3] onnx==1.16.1
[pip3] onnxruntime==1.18.0
[pip3] torch==2.0.1+cu118
[pip3] torchaudio==2.0.2
[pip3] torchvision==0.15.2
[conda] Could not collect
cc @peterjc123 @mszhanyi @skyline75489 @nbcsm @vladimir-aubrecht @iremyux @Blackhex @cristianPanaite @malfet @zou3519