Uvicorn workers working (concurrent) locally, but failing (sequential) on Kubernetes deployment #6477

FayZ676 · 2023-02-27T22:14:41Z


First Check

I added a very descriptive title here.
I used the GitHub search to find a similar question and didn't find it.
I searched the FastAPI documentation, with the integrated search.
I already searched in Google "How to X in FastAPI" and didn't find any information.
I already read and followed all the tutorial in the docs and didn't find an answer.
I already checked if it is not related to FastAPI but to Pydantic.
I already checked if it is not related to FastAPI but to Swagger UI.
I already checked if it is not related to FastAPI but to ReDoc.

Commit to Help

I commit to help with one of those options 👆

Example Code

# Dockerfile Configuration

FROM <base_image>

COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /app/requirements.txt
COPY . /app/

# CMD ["gunicorn", "--preload", "--timeout", "0", "-kuvicorn.workers.UvicornWorker", "-w", "3", "--threads", "8", "--log-file", "-"]
CMD ["uvicorn", "main:app", "--port", "8000", "--host", "0.0.0.0", "--workers", "4"]
# CMD ["gunicorn", "--preload", "--timeout", "0", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "--threads", "9", "--bind", "0.0.0.0:8000", "main:app"]

Description

I have an AI-based API built with FastAPI. When I run it locally in a Docker container and test concurrency with Apache Benchmark, requests are handled concurrently. However, when I deploy to Google Kubernetes Engine, the API handles requests sequentially.

Two things I noted that might be helpful context in diagnosing:

The AI model we are using can only run inference on a single GPU. This works fine on our local dev machines, but maybe there is a system problem in the cloud when trying to create multiple worker processes over a single GPU?
When I look at the startup logs locally, I can clearly see 4 separate AI tasks start up. However, the cloud logs show 4 tasks begin to start and then only 1 task actually finishing startup. You can view the logs in the Additional Context section below.
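
One way to check both observations from inside the container is to have each worker report what it can see. The following is only a sketch using the standard library: `worker_fingerprint` is a hypothetical helper name, not part of FastAPI or Uvicorn, and it assumes that `CUDA_VISIBLE_DEVICES` (if set at all) is how GPU visibility is restricted in the deployment.

```python
import os


def worker_fingerprint() -> dict:
    """Collect facts that identify this worker process and what it can see."""
    info = {
        "pid": os.getpid(),
        # None means the variable is unset and the CUDA runtime decides visibility.
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES"),
        # Number of CPUs on the machine, not necessarily usable by this process.
        "cpu_count": os.cpu_count(),
    }
    # sched_getaffinity is only available on some platforms (e.g. Linux);
    # it reports the set of CPUs this process is allowed to run on.
    if hasattr(os, "sched_getaffinity"):
        info["usable_cpus"] = len(os.sched_getaffinity(0))
    return info


if __name__ == "__main__":
    print(worker_fingerprint())
```

Printing this at import time in `main.py`, or returning it from a temporary debug route, would show how many distinct `pid` values actually answer requests locally versus on GKE.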

I am not sure why Uvicorn is able to handle concurrency locally but fails to do so when deployed to GKE. I just want to figure out whether this is a problem with my FastAPI/Uvicorn/Gunicorn implementation or not. Any help would be greatly appreciated!

Operating System

Linux

Operating System Details

No response

FastAPI Version

latest

Python Version

latest

Additional Context

Startup Command

uvicorn main:app --port 8000 --host 0.0.0.0 --workers 4
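
The command above asks for 4 workers, but how many can run in parallel depends on the CPU the container is allowed to use. One thing that may be worth ruling out is a CPU limit on the pod. The sketch below assumes the node uses cgroup v2, where the limit is exposed in `/sys/fs/cgroup/cpu.max`; on cgroup v1 nodes that file is absent and the function returns `None`, which cannot be told apart from "no limit". The function names are made up for illustration.

```python
from pathlib import Path
from typing import Optional


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse cgroup v2 cpu.max content: "<quota> <period>" or "max <period>"."""
    quota, period = text.split()
    if quota == "max":
        return None  # no CPU limit configured
    return int(quota) / int(period)


def container_cpu_limit(path: str = "/sys/fs/cgroup/cpu.max") -> Optional[float]:
    """Return the CPU limit in cores, or None if unlimited or unreadable."""
    try:
        return parse_cpu_max(Path(path).read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("CPU limit (cores):", container_cpu_limit())
```

If this reports a limit well below the worker count, the workers would be sharing that CPU budget, which could be one explanation for slow startup and serialized responses. This is a guess to verify, not a diagnosis.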

Local Startup Logs

Docker Run Command: docker run --rm --gpus all -p 8000:8000 <image_name>

==========
== CUDA ==
==========

CUDA Version 11.8.0

Container image Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     Started parent process [1]
2023-02-27 20:46:50 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
2023-02-27 20:46:50 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
2023-02-27 20:46:50 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
2023-02-27 20:46:50 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
2023-02-27 20:46:52 | INFO | tasks.ofa_task | source dictionary: 59457 types
2023-02-27 20:46:52 | INFO | tasks.ofa_task | target dictionary: 59457 types
2023-02-27 20:46:52 | INFO | tasks.ofa_task | source dictionary: 59457 types
2023-02-27 20:46:52 | INFO | tasks.ofa_task | target dictionary: 59457 types
2023-02-27 20:46:52 | INFO | tasks.ofa_task | source dictionary: 59457 types
2023-02-27 20:46:52 | INFO | tasks.ofa_task | target dictionary: 59457 types
2023-02-27 20:46:53 | INFO | tasks.ofa_task | source dictionary: 59457 types
2023-02-27 20:46:53 | INFO | tasks.ofa_task | target dictionary: 59457 types
/usr/local/lib/python3.10/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/usr/local/lib/python3.10/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/usr/local/lib/python3.10/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/usr/local/lib/python3.10/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/usr/local/lib/python3.10/dist-packages/torchvision/transforms/transforms.py:329: UserWarning: Argument 'interpolation' of type int is deprecated since 0.13 and will be removed in 0.15. Please use InterpolationMode enum.
  warnings.warn(
INFO:     Started server process [29]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
/usr/local/lib/python3.10/dist-packages/torchvision/transforms/transforms.py:329: UserWarning: Argument 'interpolation' of type int is deprecated since 0.13 and will be removed in 0.15. Please use InterpolationMode enum.
  warnings.warn(
INFO:     Started server process [28]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
/usr/local/lib/python3.10/dist-packages/torchvision/transforms/transforms.py:329: UserWarning: Argument 'interpolation' of type int is deprecated since 0.13 and will be removed in 0.15. Please use InterpolationMode enum.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/torchvision/transforms/transforms.py:329: UserWarning: Argument 'interpolation' of type int is deprecated since 0.13 and will be removed in 0.15. Please use InterpolationMode enum.
  warnings.warn(
INFO:     Started server process [30]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [31]
INFO:     Waiting for application startup.
INFO:     Application startup complete.

Cloud Startup Logs

2023-02-27 13:47:29.987 PST  ==========
2023-02-27 13:47:29.987 PST  == CUDA ==
2023-02-27 13:47:29.987 PST  ==========
2023-02-27 13:47:30.001 PST  {}
2023-02-27 13:47:30.001 PST  CUDA Version 11.8.0
2023-02-27 13:47:30.004 PST  {}
2023-02-27 13:47:30.004 PST  Container image Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2023-02-27 13:47:30.007 PST  {}
2023-02-27 13:47:30.007 PST  This container image and its contents are governed by the NVIDIA Deep Learning Container License.
2023-02-27 13:47:30.007 PST  By pulling and using the container, you accept the terms and conditions of this license:
2023-02-27 13:47:30.007 PST  https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
2023-02-27 13:47:30.007 PST  {}
2023-02-27 13:47:30.007 PST  A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
2023-02-27 13:47:30.040 PST  {}
2023-02-27 13:47:30.569 PST  INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
2023-02-27 13:47:30.569 PST  INFO: Started parent process [1]
2023-02-27 13:47:35.489 PST  2023-02-27 21:47:35 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
2023-02-27 13:47:35.492 PST  2023-02-27 21:47:35 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
2023-02-27 13:47:35.573 PST  2023-02-27 21:47:35 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
2023-02-27 13:47:35.666 PST  2023-02-27 21:47:35 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
2023-02-27 13:48:13.733 PST  2023-02-27 21:48:13 | INFO | tasks.ofa_task | source dictionary: 59457 types
2023-02-27 13:48:13.733 PST  2023-02-27 21:48:13 | INFO | tasks.ofa_task | target dictionary: 59457 types
2023-02-27 13:48:18.151 PST  /usr/local/lib/python3.10/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
2023-02-27 13:48:18.151 PST  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
2023-02-27 13:48:30.224 PST  /usr/local/lib/python3.10/dist-packages/torchvision/transforms/transforms.py:329: UserWarning: Argument 'interpolation' of type int is deprecated since 0.13 and will be removed in 0.15. Please use InterpolationMode enum.
2023-02-27 13:48:30.224 PST  warnings.warn(
2023-02-27 13:48:30.739 PST  INFO: Started server process [28]
2023-02-27 13:48:30.739 PST  INFO: Waiting for application startup.
2023-02-27 13:48:30.739 PST  INFO: Application startup complete.
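
To compare the two startup logs without reading them by eye, the worker start lines can be counted. A minimal sketch, assuming the log text has been saved or pasted as a string; the function name is hypothetical and the sample lines are copied from the logs above.

```python
import re

# Uvicorn prints one such line per worker; the parent process line uses
# "Started parent process" and is deliberately not matched.
STARTED = re.compile(r"Started server process \[(\d+)\]")


def started_worker_pids(log_text: str) -> list[int]:
    """Return the PIDs of workers that logged a 'Started server process' line."""
    return [int(pid) for pid in STARTED.findall(log_text)]


LOCAL_LINES = """\
INFO:     Started parent process [1]
INFO:     Started server process [29]
INFO:     Started server process [28]
INFO:     Started server process [30]
INFO:     Started server process [31]
"""

CLOUD_LINES = """\
INFO: Started parent process [1]
INFO: Started server process [28]
"""

if __name__ == "__main__":
    print("local:", started_worker_pids(LOCAL_LINES))  # [29, 28, 30, 31]
    print("cloud:", started_worker_pids(CLOUD_LINES))  # [28]
```

On the excerpts posted here, the local run shows four workers and the cloud run shows one. The cloud excerpt may simply end before the other workers finished loading, so a longer capture would be needed to tell whether they ever start.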

Apache Benchmark Results: Local

Concurrency works fine.

Apache Benchmark Results: Cloud

Failed concurrency. Results always come back sequential.
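
Since the benchmark output itself is not included, here is a rough way to put a number on "concurrent" versus "sequential". This is not Apache Benchmark, only a stand-in sketch: it times one call, then `n` simultaneous calls, and reports `n * single / wall`. A value near 1 suggests the calls were served one at a time; a value near `n` suggests they overlapped. The two sleep tasks simulate each case, and `http_get` is a hypothetical helper for pointing it at a real endpoint.

```python
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def effective_parallelism(task: Callable[[], object], n: int) -> float:
    """Estimate how many of n simultaneous calls were served in parallel."""
    start = time.perf_counter()
    task()  # baseline: latency of a single call with no contention
    single = time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(task) for _ in range(n)]
        for future in futures:
            future.result()  # re-raises any exception from the call
    wall = time.perf_counter() - start
    return n * single / wall


def http_get(url: str) -> Callable[[], bytes]:
    """Build a task that performs one GET request against `url`."""
    def call() -> bytes:
        with urllib.request.urlopen(url) as resp:
            return resp.read()
    return call


def parallel_sleep() -> None:
    time.sleep(0.05)  # simulates a server that handles calls concurrently


_lock = threading.Lock()


def serialized_sleep() -> None:
    with _lock:  # simulates a server that handles one call at a time
        time.sleep(0.05)


if __name__ == "__main__":
    print("overlapping:", round(effective_parallelism(parallel_sleep, 4), 1))
    print("serialized: ", round(effective_parallelism(serialized_sleep, 4), 1))
```

Against a deployment, something like `effective_parallelism(http_get("http://<host>:8000/<route>"), 4)` could be run once locally and once against GKE, with the host and route filled in for the actual service. The baseline is a single sample, so the result is noisy and only useful as a coarse comparison.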

Answered by Kludex (Collaborator) · 2024-01-23T08:30:46Z

This is not a FastAPI issue.
