docker compose up fails with RuntimeError: Unexpected error from cudaGetDeviceCount() #128314

Open
stefanseeger opened this issue Jun 9, 2024 · 0 comments
Labels: module: build, module: cuda, triaged

stefanseeger commented Jun 9, 2024

🐛 Describe the bug

Fooocus no longer appears to work when run inside Docker with the latest NVIDIA drivers.

To reproduce, run docker compose up.

Stack trace:

$ docker compose up
[+] Running 2/2
 ✔ Network fooocus_default  Created    0.1s
 ✔ Container fooocus-app-1  Created    0.2s
Attaching to app-1
app-1  | [System ARGV] ['launch.py', '--listen']
app-1  | Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
app-1  | Fooocus version: 2.4.3
app-1  | Environment: config_path = /content/data/config.txt                                                                                                                                                                                                      
app-1  | Environment: config_example_path = /content/data/config_modification_tutorial.txt
app-1  | Environment: path_checkpoints = /content/data/models/checkpoints/
app-1  | Environment: path_loras = /content/data/models/loras/                                                                                                                                                                                                    
app-1  | Environment: path_embeddings = /content/data/models/embeddings/
app-1  | Environment: path_vae_approx = /content/data/models/vae_approx/                                                                                                                                                                                          
app-1  | Environment: path_upscale_models = /content/data/models/upscale_models/
app-1  | Environment: path_inpaint = /content/data/models/inpaint/                                                                                                                                                                                                
app-1  | Environment: path_controlnet = /content/data/models/controlnet/
app-1  | Environment: path_clip_vision = /content/data/models/clip_vision/                                                                                                                                                                                        
app-1  | Environment: path_fooocus_expansion = /content/data/models/prompt_expansion/fooocus_expansion/
app-1  | Environment: path_outputs = /content/app/outputs/                                                                                                                                                                                                        
app-1  | [Cleanup] Attempting to delete content of temp dir /tmp/fooocus
app-1  | [Cleanup] Cleanup successful                                                                                                                                                                                                                             
app-1  | Traceback (most recent call last):
app-1  |   File "/content/app/launch.py", line 140, in <module>
app-1  |     from webui import *                                                                                                                                                                                                                                  
app-1  |   File "/content/app/webui.py", line 10, in <module>                                                                                                                                                                                                     
app-1  |     import modules.async_worker as worker                                                                                                                                                                                                                
app-1  |   File "/content/app/modules/async_worker.py", line 3, in <module>
app-1  |     from modules.patch import PatchSettings, patch_settings, patch_all                                                                                                                                                                                   
app-1  |   File "/content/app/modules/patch.py", line 5, in <module>                                                                                                                                                                                              
app-1  |     import ldm_patched.modules.model_base
app-1  |   File "/content/app/ldm_patched/modules/model_base.py", line 2, in <module>                                                                                                                                                                             
app-1  |     from ldm_patched.ldm.modules.diffusionmodules.openaimodel import UNetModel, Timestep                                                                                                                                                                 
app-1  |   File "/content/app/ldm_patched/ldm/modules/diffusionmodules/openaimodel.py", line 15, in <module>                                                                                                                                                      
app-1  |     from ..attention import SpatialTransformer, SpatialVideoTransformer, default
app-1  |   File "/content/app/ldm_patched/ldm/modules/attention.py", line 9, in <module>                                                                                                                                                                          
app-1  |     from .sub_quadratic_attention import efficient_dot_product_attention                                                                                                                                                                                 
app-1  |   File "/content/app/ldm_patched/ldm/modules/sub_quadratic_attention.py", line 27, in <module>                                                                                                                                                           
app-1  |     from ldm_patched.modules import model_management
app-1  |   File "/content/app/ldm_patched/modules/model_management.py", line 121, in <module>                                                                                                                                                                     
app-1  |     total_vram = get_total_memory(get_torch_device()) / (1024 * 1024)                                                                                                                                                                                    
app-1  |   File "/content/app/ldm_patched/modules/model_management.py", line 90, in get_torch_device
app-1  |     return torch.device(torch.cuda.current_device())                                                                                                                                                                                                     
app-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 769, in current_device                                                                                                                                                     
app-1  |     _lazy_init()                                                                                                                                                                                                                                         
app-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 298, in _lazy_init
app-1  |     torch._C._cuda_init()                                                                                                                                                                                                                                
app-1  | RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found                                                        
app-1 exited with code 1
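
For triage, the call that fails in the traceback (torch.cuda.current_device(), which triggers PyTorch's lazy CUDA initialization) can be exercised on its own inside the container, without starting Fooocus. The following is a minimal sketch, not code from Fooocus or PyTorch: probe_cuda is a hypothetical helper name, and it assumes only that torch may or may not be importable in the environment where it runs.

```python
def probe_cuda():
    """Return (ok, detail) describing whether CUDA initializes in this process.

    Hypothetical diagnostic helper, not part of Fooocus or PyTorch.
    """
    try:
        import torch
    except (ImportError, OSError) as exc:
        return False, f"torch is not importable: {exc}"

    try:
        # Same call as in the traceback; it runs the lazy CUDA init.
        index = torch.cuda.current_device()
        name = torch.cuda.get_device_name(index)
    except (RuntimeError, AssertionError) as exc:
        # RuntimeError covers driver/runtime failures such as the one above;
        # AssertionError is raised by builds compiled without CUDA support.
        return False, (
            f"CUDA init failed (torch {torch.__version__}, "
            f"built for CUDA {torch.version.cuda}): {exc}"
        )
    return True, f"cuda:{index} {name}"


if __name__ == "__main__":
    ok, detail = probe_cuda()
    print("OK" if ok else "FAILED", "-", detail)
```

If this reports the same "Error 500: named symbol not found" message, the failure is reproducible with PyTorch alone, which would point at the container's driver and CUDA setup rather than at Fooocus itself.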

Versions

PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.10.5 (tags/v3.10.5:f377153, Jun 6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22631-SP0
Is CUDA available: N/A
CUDA runtime version: 9.1.85
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1080
Nvidia driver version: 555.99
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture=9
CurrentClockSpeed=3600
DeviceID=CPU0
Family=107
L2CacheSize=3072
L2CacheSpeed=
Manufacturer=AuthenticAMD
MaxClockSpeed=3600
Name=AMD Ryzen 5 3600 6-Core Processor
ProcessorType=3
Revision=28928

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] pytorch-lightning==1.9.4
[pip3] torchmetrics==1.2.0
[pip3] torchsde==0.2.5
[conda] Could not collect
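
Several fields in the report above are N/A or "Could not collect", including the PyTorch version itself, which may mean the report was gathered in an environment where torch is not installed (for example the Windows host rather than the container). That is an assumption, not something the report states. As a small sketch, the unresolved fields can be listed mechanically; unresolved_fields is a hypothetical helper, and the sample text is copied from the report above.

```python
def unresolved_fields(report: str) -> list[str]:
    """List keys of a collect_env-style report whose value was not collected."""
    placeholders = {"N/A", "Could not collect"}
    unresolved = []
    for line in report.splitlines():
        # Split on the first colon only, so values containing ':' stay intact.
        key, sep, value = line.partition(":")
        if sep and value.strip() in placeholders:
            unresolved.append(key.strip())
    return unresolved


sample = """\
PyTorch version: N/A
CUDA used to build PyTorch: N/A
CUDA runtime version: 9.1.85
Nvidia driver version: 555.99
cuDNN version: Could not collect
"""

if __name__ == "__main__":
    for field in unresolved_fields(sample):
        print("not collected:", field)
```

Running the same check on a report collected inside the container would show whether the PyTorch and CUDA build fields resolve there.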

cc @malfet @seemethere @ptrblck @msaroufim

bdhirsh added the module: build, module: cuda, and triaged labels on Jun 10, 2024