FileNotFoundError: Could not find module when trying to load C++ extension #130737

### 🐛 Describe the bug

I'm trying to JIT-compile the following empty C++/CUDA extension:


**File test.py:**

```python
import os
from torch.utils.cpp_extension import load

sources = [
    os.path.join(os.path.dirname(__file__), "csrc", "cuda_impl.cpp"),
    os.path.join(os.path.dirname(__file__), "csrc", "cuda_kernel.cu"),
]
load(
    name="my_cuda",
    sources=sources,
    extra_cflags=['-O3'],
    is_python_module=False,
    verbose=False
)
print("OK")
```

**File csrc/cuda_impl.cpp:**
```cpp
#include <torch/script.h>
#include <vector>

void cuda_forward_simple();


void cuda_backward_simple();

std::vector<at::Tensor> forward_simple(const at::Tensor & inputs) {
    return {inputs};
}

std::vector<at::Tensor> backward_simple(const at::Tensor & grad_inputs, const at::Tensor & inputs) {
    return {grad_inputs};
}

static auto registory_fwd_v1 =
    torch::RegisterOperators("my_cuda::forward_simple", &forward_simple);
static auto registory_bwd_v1 = 
    torch::RegisterOperators("my_cuda::backward_simple", &backward_simple);
```

**File csrc/cuda_kernel.cu:**
```cuda
#include <torch/types.h>

#include <cuda.h>
#include <cuda_runtime.h>

namespace {

template <typename scalar_t>
__global__ void cuda_forward_kernel_simple()
{
}

template <typename scalar_t>
__global__ void cuda_backward_kernel_simple()
{

}

}

void cuda_forward_simple(const at::Tensor & inputs) {
    AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs.type(), "forward_cuda_simple", ([&] {
        cuda_forward_kernel_simple<scalar_t><<<1, 1>>>();
    }));
}

void cuda_backward_simple(at::Tensor & grad_input, const at::Tensor & inputs) {
    AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs.type(), "backward_cuda_simple", ([&] {
        cuda_backward_kernel_simple<scalar_t><<<1, 1>>>();
    }));
}
```

But I get the following error:

```
Traceback (most recent call last):
  File "C:\Users\1\PycharmProjects\test\troch_test.py", line 8, in <module>
    load(
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\cpp_extension.py", line 1535, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\cpp_extension.py", line 1934, in _import_module_from_library
    torch.ops.load_library(filepath)
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\_ops.py", line 643, in load_library
    ctypes.CDLL(path)
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\1\AppData\Local\torch_extensions\torch_extensions\Cache\py39_cu118\my_cuda\my_cuda.pyd' (or one of its dependencies). Try using the full path with constructor syntax.
```

The C++ extension compiles successfully, but Python is unable to load the resulting `my_cuda.pyd` file.
Previously I used the SRU custom torch layer (https://github.com/asappresearch/sru), and it loads fine, so the problem seems to be specific to my machine, but I can't figure out where. I know that my installed CUDA version does not match the CUDA version PyTorch was built with (I had to install a newer version of PyTorch in a different environment), but that didn't cause any problems with the SRU layer.
Can you help me?
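The error message covers two cases: the `.pyd` file itself is missing, or it exists but one of the DLLs it depends on cannot be found. Here is a minimal sketch for telling them apart and for registering extra DLL search directories before calling `load`. The helper names `diagnose_missing_module` and `register_dll_dirs` are hypothetical, and which directories to register (for example the CUDA toolkit's `bin` folder) is an assumption, not a confirmed fix.

```python
import os
import sys


def diagnose_missing_module(path):
    """Classify a failed load of a compiled extension.

    Returns "missing" when the file itself does not exist and "dependency"
    when the file exists, in which case the loader error most likely refers
    to a DLL that the extension depends on.
    """
    if not os.path.isfile(path):
        return "missing"
    return "dependency"


def register_dll_dirs(candidates):
    """Register existing directories on the Windows DLL search path.

    os.add_dll_directory exists only on Windows with Python 3.8+, so on other
    platforms this only reports which candidate directories exist.
    Returns the list of absolute directories that were accepted.
    """
    accepted = []
    for directory in candidates:
        if not directory or not os.path.isdir(directory):
            continue  # skip unset or non-existent entries
        directory = os.path.abspath(directory)
        if sys.platform == "win32" and hasattr(os, "add_dll_directory"):
            os.add_dll_directory(directory)
        accepted.append(directory)
    return accepted
```

For example, one could call `register_dll_dirs([os.path.join(os.environ.get("CUDA_PATH", ""), "bin")])` before `load(...)`, assuming the `CUDA_PATH` environment variable points at the installed toolkit. If `diagnose_missing_module` returns `"dependency"` for the `.pyd` path in the traceback, the build output is present and the search should focus on its dependencies.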

### Versions

Collecting environment information...
PyTorch version: 2.0.1+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Home
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: 12.5.40

CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 2060
Nvidia driver version: 555.85
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=2701
DeviceID=CPU0
Family=205
L2CacheSize=1024
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2701
Name=Intel(R) Core(TM) i5-6400 CPU @ 2.70GHz
ProcessorType=3
Revision=24067

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.23.5
[pip3] onnx==1.16.1
[pip3] onnxruntime==1.18.0
[pip3] torch==2.0.1+cu118
[pip3] torchaudio==2.0.2
[pip3] torchvision==0.15.2
[conda] Could not collect
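
The report above lists PyTorch built against CUDA 11.8 while the installed CUDA runtime is 12.5.40. As a rough sketch, the mismatch can be flagged by comparing major version numbers. The function names are hypothetical, the version strings are copied from the report, and whether this mismatch actually causes the load failure is not established.

```python
def parse_version(text):
    """Turn a version string such as "12.5.40" into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def cuda_major_mismatch(torch_cuda, toolkit_cuda):
    """Return True when the two CUDA versions differ in their major number."""
    return parse_version(torch_cuda)[0] != parse_version(toolkit_cuda)[0]


# Values copied from the environment report.
TORCH_CUDA = "11.8"       # "CUDA used to build PyTorch"
TOOLKIT_CUDA = "12.5.40"  # "CUDA runtime version"

if cuda_major_mismatch(TORCH_CUDA, TOOLKIT_CUDA):
    print(f"CUDA major versions differ: torch={TORCH_CUDA}, toolkit={TOOLKIT_CUDA}")
```

In a live session the first value is available as `torch.version.cuda`. The toolkit version would have to come from the local installation, for example from the output of `nvcc --version`.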


cc @peterjc123 @mszhanyi @skyline75489 @nbcsm @vladimir-aubrecht @iremyux @Blackhex @cristianPanaite @malfet @zou3519
