[In Gunicorn & multiprocessing environment] Cannot re-initialize CUDA in forked subprocess #68861
Comments
Not familiar with Gunicorn. You may want to set
I looked through the source code of gunicorn and torch. The reason this error happens is that gunicorn forks the worker processes, and torch marks a forked child process as being in a bad fork state (`_in_bad_fork`).

**Explanation**

I made some modifications to gunicorn to dig into it, and indeed, after the fork the child process's `_in_bad_fork` flag is set.

**Solution**

What we have to do is clear this flag. Change the code in your Flask app from

to
Hi @wayneleif, would you mind expanding on the discussion above? I have no idea what could cause this.
For anyone struggling with gunicorn + pytorch, this might be useful. The key is not to run any CUDA-related code in the master process, including:

- `torch.cuda.device_count()`
- `torch.cuda.is_available()`
- `torch.tensor(1).cuda()`
- `torch.autocast("cuda")`

If you still get the error after checking your code, this issue might be helpful: #17199

Full code:

```python
import torch
from gunicorn.app.base import BaseApplication


class StandaloneApplication(BaseApplication):
    def __init__(self, app, options=None):
        self.options = options or {}
        self.application = app
        super().__init__()

    def load_config(self):
        config = {key: value for key, value in self.options.items()
                  if key in self.cfg.settings and value is not None}
        for key, value in config.items():
            self.cfg.set(key.lower(), value)

    def load(self):
        return self.application


# Executing any of the following lines here, in the master process, would
# raise "Cannot re-initialize CUDA in forked subprocess":
# torch.cuda.device_count()
# torch.cuda.is_available()
# torch.tensor(1).cuda()


def post_worker_init(worker):
    # Runs in each worker after the fork, so initializing CUDA here is safe.
    conv = torch.nn.Conv2d(330, 330, 3, 3, 3, 3)
    conv.cuda()
    print("finish post_worker_init")


def handler_app(environ, start_response):
    response_body = b'Works fine'
    status = '200 OK'
    response_headers = [
        ('Content-Type', 'text/plain'),
    ]
    start_response(status, response_headers)
    return [response_body]


def main():
    host = "0.0.0.0"
    port = 7860
    options = {
        'bind': f"{host}:{port}",
        'workers': 1,
        'worker_class': 'uvicorn.workers.UvicornWorker',
        'post_worker_init': post_worker_init,
        'timeout': 120,
    }
    StandaloneApplication(handler_app, options).run()


if __name__ == "__main__":
    main()
```
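If you prefer gunicorn's config-file style over a programmatic launcher, the same `post_worker_init` server hook can live in a `gunicorn.conf.py` (a sketch; the bind address, worker count, and log message are placeholders of mine):

```python
# gunicorn.conf.py -- hypothetical config-file version of the same pattern
bind = "0.0.0.0:7860"
workers = 2
timeout = 120

def post_worker_init(worker):
    # Called in each worker process *after* the fork,
    # so touching CUDA here does not poison the master.
    import torch  # imported here so the master never initializes CUDA
    worker.log.info("CUDA available: %s", torch.cuda.is_available())
```

Run with `gunicorn -c gunicorn.conf.py yourapp:app`.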
I am getting `RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method`.

I have developed a REST API (Gunicorn, Gevent, Flask, Python) that runs a model loaded with PyTorch and uses multiple workers and threads to support parallel execution.

I receive the above error when I call the model, e.g. `model_name(imgs)`.
Versions:

- Python 3.8.8
- Gunicorn 20.1.0
- Gevent 21.1.2
- Flask 2.0.0
- torch 1.8.1
I have tried to do the following:
Any help will be appreciated.
cc @VitalyFedyunin
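One workaround in this kind of setup is to make sure the model is constructed after gunicorn forks, once per worker process, rather than at import time in the master. A minimal, torch-free sketch of a per-process cache (`get_per_process` and the loader are hypothetical names of mine, not a PyTorch or gunicorn API):

```python
import os

_cache = {}

def get_per_process(key, factory):
    """Return factory() built at most once per (key, pid).

    After a fork, the child's pid differs from the cached one, so the child
    rebuilds its own copy (e.g. a CUDA model) instead of reusing an object
    inherited from the parent process.
    """
    pid = os.getpid()
    entry = _cache.get(key)
    if entry is None or entry[0] != pid:
        _cache[key] = (pid, factory())
    return _cache[key][1]

# Hypothetical usage in a Flask view:
#   model = get_per_process("model", lambda: torch.load("model.pt").cuda())
```

Because the loader runs lazily on the first request inside a worker, CUDA is never touched in the pre-fork master process.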