# Tester service

FastAPI uses ASGI and asyncio to provide asynchronous server capabilities.

Here we analyze how to use FastAPI properly and how it performs against a comparable Flask setup.

We want to measure a few metrics:

- Flask/Gunicorn vs FastAPI/Uvicorn on a control test with a CPU-bound task (to ensure similar server-side execution times). Metrics: number of spawned processes and performance on N requests, where N > worker count.  
- Flask vs FastAPI with IO-bound tasks. Metrics: number of spawned processes and performance on N requests, where N > worker count.  
- FastAPI bad use cases vs proper async usage, given a CPU-bound task.  
- Additionally, we look at how FastAPI and Uvicorn spawn processes on the server machine under different usage contexts (e.g. development with reload vs production without reload).  

In [1]:
import requests
import asyncio
import httpx
from time import perf_counter
import nest_asyncio
nest_asyncio.apply()

# Config

## Tests design

We use a Tester service and 3 Target Services (can be expanded to more cases):
- Tester : Jupyter notebook with analysis.
- Target-FastAPI:  FastAPI app served over Uvicorn app server configured as single worker.  
- Target-FlaskSingle:  Flask app served over Gunicorn app server configured for 1xCPU cores.  
- Target-FlaskDual:  Flask app served over Gunicorn app server configured for 2xCPU cores.  

All services run under the same Docker network and are accessible via docker DNS.

In [2]:
# Uses Docker DNS for service addressing
FASTAPI_SERVICE = "fastapianalytics:5000"
FLASK_SINGLE_SERVICE = "flask1xanalytics:5000"
FLASK_DUAL_SERVICE = "flask2xanalytics:5000"

In [3]:
LONG_HTTP_TIMEOUT = 300 # seconds -> raise the HTTP timeout for long CPU-bound runs.
# Concurrent requests to a blocked server (badly configured async) are all sent together, but the server
# processes them one after another, so the timeout must cover the sum of their processing times.
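
The reasoning in the comment above can be turned into a rough timeout budget. This is a minimal sketch, assuming a single blocked worker that processes a burst strictly in sequence; `timeout_for_burst` and its `margin` parameter are hypothetical names, not part of the notebook.

```python
def timeout_for_burst(n_requests: int, seconds_per_request: float, margin: float = 1.5) -> float:
    """Return a client timeout (seconds) that covers the last request of a burst.

    Assumes the server handles the burst one request at a time, so the last
    response arrives after roughly n_requests * seconds_per_request.
    """
    if n_requests < 1:
        raise ValueError("n_requests must be at least 1")
    return n_requests * seconds_per_request * margin


# Illustrative numbers: 10 requests of about 2 s each, with 50% headroom
print(timeout_for_burst(10, 2.0))  # 30.0
```

With these illustrative numbers the budget stays well below the 300 seconds configured in `LONG_HTTP_TIMEOUT`.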

## Test APIs

All target services expose the same API URLs for synchronous requests.  
Only the FastAPI target provides the async test APIs.


### Calibration step
The first test is a calibration against the server machine's resources: we run it to measure how long the server needs for one calculation (this varies with available CPU and memory, external processes, etc.).

### Baseline duration
Based on the calibration value we can then determine the number of iterations required to have a specific duration for server operations.
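
The conversion from a calibration value to an iteration count is a single division. This is a minimal sketch; `iterations_for_duration` is a hypothetical helper name and the per-iteration timings are illustrative values, not measurements.

```python
def iterations_for_duration(target_seconds: float, seconds_per_iteration: float) -> int:
    """Number of whole iterations that fit into target_seconds."""
    if seconds_per_iteration <= 0:
        raise ValueError("seconds_per_iteration must be positive")
    # int() truncates toward zero, so the estimate is rounded down
    return int(target_seconds / seconds_per_iteration)


# Illustrative calibration values (seconds per iteration) for a 2 second target
print(iterations_for_duration(2, 0.034))  # 58
print(iterations_for_duration(2, 0.049))  # 40
```

Because the count is rounded down and the calibration is an average, the actual processing time of a request can differ from the target.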

### Performance Test configuration
Each test then sends a specific number of requests to a target, where each request is configured to take a specific server-side processing time (i.e. each request takes X seconds to process on the server).

The tests run against the target servers one after another, not in parallel. This is because we are using a single physical machine (the services run in a Docker network hosted on one machine), so services tested at the same time would compete for resources.

In [4]:
CALIBRATION_API = "test/calibrate"
SYNC_API_PERFORMANCE = "test/sync-cpubound/{}" # format the string and add specific duration for each test
BAD_ASYNC_API_PERFORMANCE = "test/badasync-cpubound/{}" # ONLY on the FastAPI, Flask is always synchronous
RESMON_API = "test/resmon" # resource monitor snapshot (top output); takes no parameter


Set up a convenience function to construct our urls

In [5]:
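def get_url(service, api, iterations=None):
    """Build the full URL for an API on a service, filling in the iterations placeholder if given."""
    partial = '/'.join(['http:/', service, api])
    # Compare against None so that an explicit 0 is still formatted into the URL
    return partial.format(iterations) if iterations is not None else partial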
# TESTS
print(get_url(FASTAPI_SERVICE, CALIBRATION_API))
print(get_url(FASTAPI_SERVICE, SYNC_API_PERFORMANCE, 2))

http://fastapianalytics:5000/test/calibrate
http://fastapianalytics:5000/test/sync-cpubound/2


# Calibration CPU bound tests

Since response times vary with host load at any given moment, we average over N requests to determine the calibration value.

In [6]:
AVGSAMPLES_FOR_CALIB = 50

Prepare a convenience function to send http requests with or without iterations parameter

In [7]:
import aiohttp
# aiohttp variant, kept for reference; the httpx version in the next cell redefines this name
async def http_send_async(url, client: aiohttp.ClientSession, iterations:int=0):
    async with client.get(url) as resp:
        return await resp.json()

# async with aiohttp.ClientSession() as client:
#     result = await http_send_async(get_url(FASTAPI_SERVICE, CALIBRATION_API), client)
# result

In [8]:
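async def http_send_async(url, client: httpx.AsyncClient, iterations:int=0):
    # As run here, the response is discarded and the function returns None (see the output below).
    # The averaging and baseline cells further down read the parsed JSON body, so they need the
    # three commented lines in place of the bare call.
    await client.get(url, timeout=LONG_HTTP_TIMEOUT)
    # response = await client.get(url, timeout=LONG_HTTP_TIMEOUT)
    # response.raise_for_status()  # Raise an exception for bad status codes
    # return response.json()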

# TEST
async with httpx.AsyncClient() as client:
   result = await http_send_async(get_url(FASTAPI_SERVICE, CALIBRATION_API), client)
print(result)

None


In [11]:
limits = httpx.Limits(max_keepalive_connections=20, max_connections=20)
async def send_n_concurrent_requests(loop: asyncio.AbstractEventLoop, n:int, url: str)-> list:
    """
    Sends N concurrent async http requests on the running event loop and returns the parsed JSON bodies.
    The loop argument is unused here: asyncio.gather schedules the coroutines on the running loop.
    """
    async with httpx.AsyncClient(limits=limits) as client:
        # named "pending" to avoid shadowing the imported requests module
        pending = [client.get(url, timeout=LONG_HTTP_TIMEOUT) for _ in range(n)]
        responses = await asyncio.gather(*pending)

    results = [r.json() for r in responses]
    print(*results, sep='\n')
    return results
# TESTS
start= perf_counter()
results = await send_n_concurrent_requests(asyncio.get_running_loop(), 5, get_url(FASTAPI_SERVICE, SYNC_API_PERFORMANCE, 50))
end= perf_counter()
print(end-start)

{'message': '--------\n Received at: 2025-01-04T18:21:59.555367 -- Duration of 50 iterations', 'result': 1.4475434509986371}
{'message': '--------\n Received at: 2025-01-04T18:21:58.064120 -- Duration of 50 iterations', 'result': 1.488456185998075}
{'message': '--------\n Received at: 2025-01-04T18:22:02.403109 -- Duration of 50 iterations', 'result': 1.5088864329991338}
{'message': '--------\n Received at: 2025-01-04T18:22:01.003626 -- Duration of 50 iterations', 'result': 1.3974660179992497}
{'message': '--------\n Received at: 2025-01-04T18:21:56.403780 -- Duration of 50 iterations', 'result': 1.6578565609997895}
7.521668965000572


In [8]:
async def avg_iteration_times(ntests: int, url:str, iterations:int=0):
    """
    Send ntests requests to the target url, one after another, and average the returned iteration durations.
    The iterations argument is unused; the url already carries any iteration count.
    """
    total = 0
    async with httpx.AsyncClient() as client:
        for _ in range(ntests):
            res = await http_send_async(url, client) # expects the parsed JSON body of the response
            total += res.get('result')

    return total/ntests
# TESTS
fa_iter_d = await avg_iteration_times(AVGSAMPLES_FOR_CALIB, get_url(FASTAPI_SERVICE, CALIBRATION_API))
print(f'Average of {AVGSAMPLES_FOR_CALIB} - duration on single iteration: {fa_iter_d} seconds')

Average of 50 - duration on single iteration: 0.03297409542014065 seconds


### FastAPI target

In [142]:
fa_iter_d = await avg_iteration_times(AVGSAMPLES_FOR_CALIB, get_url(FASTAPI_SERVICE, CALIBRATION_API))
print(f'Average of {AVGSAMPLES_FOR_CALIB} runs - duration on single iteration: {fa_iter_d} seconds')

Average of 50 runs - duration on single iteration: 0.03413210183993215 seconds


### Flask targets

In [143]:
fsk1x_iter_d = await avg_iteration_times(AVGSAMPLES_FOR_CALIB, get_url(FLASK_SINGLE_SERVICE, CALIBRATION_API))
print(f'Average of {AVGSAMPLES_FOR_CALIB} runs - duration on single iteration: {fsk1x_iter_d} seconds')

Average of 50 runs - duration on single iteration: 0.04899092547973851 seconds


In [144]:
fsk2x_iter_d = await avg_iteration_times(AVGSAMPLES_FOR_CALIB, get_url(FLASK_DUAL_SERVICE, CALIBRATION_API))
print(f'Average of {AVGSAMPLES_FOR_CALIB} runs - duration on single iteration: {fsk2x_iter_d} seconds')

Average of 50 runs - duration on single iteration: 0.06447064997995768 seconds


### Iterations for N seconds processing time

These values do not reflect the relative performance of the services. The resources available to each service while it runs the CPU-bound task may differ, as do the random values in the test matrices.  
They only approximate how many iterations each service needs to produce a similarly timed computation.

In [145]:
N = 2 # seconds
faiter = int(N/fa_iter_d)
fsk1xiter = int(N/fsk1x_iter_d)
fsk2xiter = int(N/fsk2x_iter_d)
faiter, fsk1xiter, fsk2xiter

(58, 40, 31)

### Baseline resource monitor snapshot

Get a snapshot of resources used on the target services, when idle but fully started

In [146]:
def get_pcount_from_resmon_str(resmonstr: str)-> dict:
    """
    Return the process count and the process rows parsed from top output
    (row-based parser, assumes the default column layout of the top command).
    Returns -1 if the header row is not found.
    """
    # scroll lines until processes header line: PID USER PR ..
    rowlist = resmonstr.splitlines()
    headersub = "PID USER      PR  NI    VIRT    RES    SHR"
    try:
        startidx = next(i for i, string in enumerate(rowlist) if headersub in string)
    except StopIteration:
        print("Error: missing header row or header row has different format than expected")
        return -1

    return {
        "pcount": len(rowlist[startidx+1:]),  # every row after the header is one process
        "rows": rowlist[startidx:]
    }
# TESTS
get_pcount_from_resmon_str(fa_resmon_base.get("result"))

{'pcount': 4,
 'rows': ['  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND',
  '    1 root      20   0   33708  28276  10140 S   0.0   0.4   0:46.99 uvicorn',
  '    7 root      20   0   15920  11856   5952 S   0.0   0.1   0:00.03 python3.13',
  '    8 root      20   0 1605340 260252  20576 S   0.0   3.3   0:27.51 python3.13',
  '   45 root      20   0    9004   4680   2812 R   0.0   0.1   0:00.00 top']}

In [147]:
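# send_http is not defined in the cells shown here; it is assumed to be a blocking GET helper that returns the parsed JSON body
fa_resmon_base = send_http(get_url(FASTAPI_SERVICE, RESMON_API))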
print("Full output:\n", fa_resmon_base.get("result"))
get_pcount_from_resmon_str(fa_resmon_base.get("result"))

Full output:
 top - 16:04:38 up  4:27,  0 user,  load average: 0.17, 0.15, 0.17
Tasks:   4 total,   1 running,   3 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us, 11.1 sy,  0.0 ni, 77.8 id, 11.1 wa,  0.0 hi,  0.0 si,  0.0 st 
MiB Mem :   7749.8 total,   3023.6 free,   4315.1 used,    675.0 buff/cache     
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   3434.7 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    1 root      20   0   33708  28736  10140 S   0.0   0.4   2:15.16 uvicorn
    7 root      20   0   15920  11856   5952 S   0.0   0.1   0:00.03 python3.13
  910 root      20   0 1605312 259304  19744 S   0.0   3.3   0:29.46 python3.13
  942 root      20   0    9004   4720   2848 R   0.0   0.1   0:00.00 top



{'pcount': 4,
 'rows': ['  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND',
  '    1 root      20   0   33708  28736  10140 S   0.0   0.4   2:15.16 uvicorn',
  '    7 root      20   0   15920  11856   5952 S   0.0   0.1   0:00.03 python3.13',
  '  910 root      20   0 1605312 259304  19744 S   0.0   3.3   0:29.46 python3.13',
  '  942 root      20   0    9004   4720   2848 R   0.0   0.1   0:00.00 top']}

In [148]:
fsk_1x_resmon_base = send_http(get_url(FLASK_SINGLE_SERVICE, RESMON_API))
print("Full output:\n", fsk_1x_resmon_base.get("result"))
get_pcount_from_resmon_str(fsk_1x_resmon_base.get("result"))

Full output:
 top - 16:04:38 up  4:27,  0 user,  load average: 0.17, 0.15, 0.17
Tasks:   5 total,   1 running,   4 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st 
MiB Mem :   7749.8 total,   3022.0 free,   4316.2 used,    675.5 buff/cache     
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   3433.6 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    9 root      20   0 1661584 243028  15884 S   6.7   3.1   0:52.18 gunicorn
    1 root      20   0   29716  23924   9876 S   0.0   0.3   0:01.15 gunicorn
    7 root      20   0 1661656 243080  15860 S   0.0   3.1   0:48.93 gunicorn
    8 root      20   0 1661584 243064  15848 S   0.0   3.1   0:53.00 gunicorn
  109 root      20   0    9004   4820   2940 R   0.0   0.1   0:00.00 top



{'pcount': 5,
 'rows': ['  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND',
  '    9 root      20   0 1661584 243028  15884 S   6.7   3.1   0:52.18 gunicorn',
  '    1 root      20   0   29716  23924   9876 S   0.0   0.3   0:01.15 gunicorn',
  '    7 root      20   0 1661656 243080  15860 S   0.0   3.1   0:48.93 gunicorn',
  '    8 root      20   0 1661584 243064  15848 S   0.0   3.1   0:53.00 gunicorn',
  '  109 root      20   0    9004   4820   2940 R   0.0   0.1   0:00.00 top']}

In [149]:
fsk_2x_resmon_base = send_http(get_url(FLASK_DUAL_SERVICE, RESMON_API))
print("Full output:\n", fsk_2x_resmon_base.get("result"))
get_pcount_from_resmon_str(fsk_2x_resmon_base.get("result"))

Full output:
 top - 16:04:39 up  4:27,  0 user,  load average: 0.17, 0.15, 0.17
Tasks:   7 total,   1 running,   6 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us, 12.5 sy,  0.0 ni, 75.0 id,  0.0 wa,  0.0 hi, 12.5 si,  0.0 st 
MiB Mem :   7749.8 total,   3022.0 free,   4316.2 used,    675.5 buff/cache     
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   3433.7 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    1 root      20   0   29716  23632   9976 S   0.0   0.3   0:01.17 gunicorn
    7 root      20   0 1661652 242476  15636 S   0.0   3.1   1:04.18 gunicorn
   39 root      20   0 1661588 242372  15636 S   0.0   3.1   0:48.14 gunicorn
   40 root      20   0 1661588 242352  15636 S   0.0   3.1   0:48.67 gunicorn
   72 root      20   0 1661588 242336  15636 S   0.0   3.1   0:54.79 gunicorn
  104 root      20   0 1661588 242336  15636 S   0.0   3.1   0:59.45 gunicorn
  175 root      20   0    9004   4772   2896 R   0.0   0.1   0:00.0

{'pcount': 7,
 'rows': ['  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND',
  '    1 root      20   0   29716  23632   9976 S   0.0   0.3   0:01.17 gunicorn',
  '    7 root      20   0 1661652 242476  15636 S   0.0   3.1   1:04.18 gunicorn',
  '   39 root      20   0 1661588 242372  15636 S   0.0   3.1   0:48.14 gunicorn',
  '   40 root      20   0 1661588 242352  15636 S   0.0   3.1   0:48.67 gunicorn',
  '   72 root      20   0 1661588 242336  15636 S   0.0   3.1   0:54.79 gunicorn',
  '  104 root      20   0 1661588 242336  15636 S   0.0   3.1   0:59.45 gunicorn',
  '  175 root      20   0    9004   4772   2896 R   0.0   0.1   0:00.00 top']}
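
The summary line printed by top offers an independent check on `pcount`. This is a minimal sketch, assuming the `Tasks: N total` format seen in the outputs above; `tasks_total_from_top` is a hypothetical helper, not part of the notebook.

```python
import re
from typing import Optional


def tasks_total_from_top(top_output: str) -> Optional[int]:
    """Extract N from the 'Tasks: N total' summary line of top output."""
    match = re.search(r"^Tasks:\s*(\d+)\s+total", top_output, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


# Shortened sample in the same layout as the snapshots above
sample = """top - 16:04:38 up  4:27,  0 user,  load average: 0.17, 0.15, 0.17
Tasks:   4 total,   1 running,   3 sleeping,   0 stopped,   0 zombie

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    1 root      20   0   33708  28736  10140 S   0.0   0.4   2:15.16 uvicorn
"""
print(tasks_total_from_top(sample))  # 4
```

In the three snapshots above the summary line and `pcount` agree (4, 5 and 7). Both counts include the `top` process itself.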

# Base performance CPU bound - no parallelization

This test triggers **single requests** to the synchronous APIs on all targets.  
These API endpoints contain plain calls to CPU-bound tasks, with no extra logic for parallelization.  
Each request is parametrized with the number of iterations required to induce a specific processing time on the target server.  
This establishes a baseline for the later tests, where multiple requests check the target server's ability to handle concurrent requests.  

In [150]:
async with httpx.AsyncClient() as client:
    fa_cpu_base = await http_send_async(get_url(FASTAPI_SERVICE, SYNC_API_PERFORMANCE, faiter), client)
fa_cpu_base

{'message': '--------\n Received at: 2025-01-04T16:05:07.133719 -- Duration of 58 iterations',
 'result': 1.8057356679983059}

In [151]:
async with httpx.AsyncClient() as client:
    fsk1x_cpu_base =await http_send_async(get_url(FLASK_SINGLE_SERVICE, SYNC_API_PERFORMANCE, fsk1xiter), client)
fsk1x_cpu_base

{'message': 'Duration of 40 iterations', 'result': 1.2667629059997125}

In [152]:
async with httpx.AsyncClient() as client:
    fsk2x_cpu_base =await http_send_async(get_url(FLASK_DUAL_SERVICE, SYNC_API_PERFORMANCE, fsk2xiter), client)
fsk2x_cpu_base

{'message': 'Duration of 31 iterations', 'result': 0.9445747890003986}

The above measurements may deviate from the intended processing time, depending on the resources available on the target service container at the time of the request.  
The main point is to make sure that, during the concurrent-request tests, requests are sent faster than they are processed.

# Concurrent requests tests

Here, the challenge is sending concurrent requests, since a normal synchronous HTTP send would block until it receives a response.  
To further complicate things, Jupyter runs its own event loop, so we need to schedule our tasks on it.  
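
The difference between awaiting requests one by one and scheduling them together can be seen without any server. This is a minimal sketch in which `asyncio.sleep` stands in for the network wait; `fake_request`, `run_sequential` and `run_concurrent` are hypothetical names.

```python
import asyncio
from time import perf_counter


async def fake_request(delay: float) -> float:
    """Stand-in for an HTTP call: waits without blocking the event loop."""
    await asyncio.sleep(delay)
    return delay


async def run_sequential(n: int, delay: float) -> float:
    start = perf_counter()
    for _ in range(n):
        await fake_request(delay)  # each call finishes before the next one starts
    return perf_counter() - start


async def run_concurrent(n: int, delay: float) -> float:
    start = perf_counter()
    await asyncio.gather(*(fake_request(delay) for _ in range(n)))  # the waits overlap
    return perf_counter() - start


if __name__ == "__main__":
    print("sequential:", asyncio.run(run_sequential(5, 0.1)))
    print("concurrent:", asyncio.run(run_concurrent(5, 0.1)))
```

Inside a notebook cell, where the event loop is already running, the coroutines can be awaited directly (`await run_concurrent(5, 0.1)`) instead of going through `asyncio.run`.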

In [12]:
loop = asyncio.get_running_loop()
print(loop)
print("Baseline Jupyter scheduled tasks: \n",[(t.get_name(), t.get_coro()) for t in list(asyncio.all_tasks())], sep='\n')

<_UnixSelectorEventLoop running=True closed=False debug=False>
Baseline Jupyter scheduled tasks: 

[('Task-3', <coroutine object Kernel.dispatch_queue at 0x7f56e4103100>)]


In [155]:
async def send_n_concurrent_requests(loop: asyncio.AbstractEventLoop, n:int, url: str)-> list:
    """
    Schedules N concurrent async http requests on the given (running) event loop and returns their results
    """
    async with httpx.AsyncClient() as client:
        tasks = [loop.create_task(http_send_async(url, client)) for _ in range(n)]
        print('After tasks creation: ', len(asyncio.all_tasks()))
        results = await asyncio.gather(*tasks)

    return results
# TESTS
start= perf_counter()
results = await send_n_concurrent_requests(asyncio.get_running_loop(), 10, get_url(FASTAPI_SERVICE, SYNC_API_PERFORMANCE, faiter))
end= perf_counter()
print(end-start, *results, sep='\n')

After tasks creation:  12
17.10492639900076
{'message': '--------\n Received at: 2025-01-04T16:06:10.314989 -- Duration of 58 iterations', 'result': 1.7479861669999082}
{'message': '--------\n Received at: 2025-01-04T16:06:13.830969 -- Duration of 58 iterations', 'result': 1.7160713279990887}
{'message': '--------\n Received at: 2025-01-04T16:06:12.064186 -- Duration of 58 iterations', 'result': 1.7654554700002336}
{'message': '--------\n Received at: 2025-01-04T16:06:24.079642 -- Duration of 58 iterations', 'result': 1.6774582809994172}
{'message': '--------\n Received at: 2025-01-04T16:06:15.548613 -- Duration of 58 iterations', 'result': 1.6637734880005155}
{'message': '--------\n Received at: 2025-01-04T16:06:20.699347 -- Duration of 58 iterations', 'result': 1.6885174840008403}
{'message': '--------\n Received at: 2025-01-04T16:06:17.213647 -- Duration of 58 iterations', 'result': 1.7359221469996555}
{'message': '--------\n Received at: 2025-01-04T16:06:22.388590 -- Duration of 58

## Async requests to blocking APIs

The test works as follows.  
The Flask services handle loads synchronously (blocking the running thread and process), but they also run multiple workers from the start. We therefore expect them to outperform FastAPI on this test.  
The API we target on the FastAPI service triggers the CPU-bound task directly from the endpoint. This should block the thread and process where the event loop runs, which means FastAPI will run all requests in sequence, so the response time should increase linearly with the number of requests.  
Note: this is an improper way to trigger a CPU-bound task on FastAPI, so the behavior is expected.
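
The expectation above can be written as an idealized model. This is a minimal sketch, assuming identical requests, one request per worker at a time, and workers that do not compete for CPU; `ideal_burst_time` is a hypothetical helper and the worker count of 3 is illustrative.

```python
import math


def ideal_burst_time(n_requests: int, seconds_per_request: float, workers: int) -> float:
    """Idealized wall-clock time for a burst of identical blocking requests.

    The burst completes in ceil(n_requests / workers) rounds of
    seconds_per_request each.
    """
    if n_requests < 0 or workers < 1:
        raise ValueError("n_requests must be >= 0 and workers >= 1")
    return math.ceil(n_requests / workers) * seconds_per_request


# One blocked worker: time grows linearly with the burst size
print([ideal_burst_time(n, 2.0, workers=1) for n in range(1, 6)])  # [2.0, 4.0, 6.0, 8.0, 10.0]
# Three workers: time grows in steps
print([ideal_burst_time(n, 2.0, workers=3) for n in range(1, 6)])  # [2.0, 2.0, 2.0, 4.0, 4.0]
```

On a shared machine the workers do compete for CPU, so measured times for multi-worker targets can be expected to rise more smoothly than the step pattern of this model.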

### FastAPI target w blocking task

In [109]:
fa_times = []
for n in range(1,6):
    start= perf_counter()
    await send_n_concurrent_requests(asyncio.get_running_loop(), n, get_url(FASTAPI_SERVICE, SYNC_API_PERFORMANCE, faiter))
    fa_times.append(perf_counter() - start)

fa_times

[1.9264696720001666,
 3.6781316950000473,
 5.699639510999987,
 7.533728120000887,
 9.365989138999794]

### Flask targets with blocking tasks

In [110]:
fsk1x_times = []
for n in range(1,6):
    start= perf_counter()
    # Note: faiter (the FastAPI-calibrated iteration count) is passed here rather than fsk1xiter
    await send_n_concurrent_requests(asyncio.get_running_loop(), n, get_url(FLASK_SINGLE_SERVICE, SYNC_API_PERFORMANCE, faiter))
    fsk1x_times.append(perf_counter() - start)

fsk1x_times

[2.232686190000095,
 2.566646810999373,
 3.1656890220001515,
 4.722645226998793,
 5.347812461999638]

In [112]:
fsk2x_times = []
for n in range(1,6):
    start= perf_counter()
    # Note: faiter (the FastAPI-calibrated iteration count) is passed here rather than fsk2xiter
    await send_n_concurrent_requests(asyncio.get_running_loop(), n, get_url(FLASK_DUAL_SERVICE, SYNC_API_PERFORMANCE, faiter))
    fsk2x_times.append(perf_counter() - start)

fsk2x_times

[1.8925547309991089,
 2.375839641001221,
 2.9147490659997857,
 3.1490542699993966,
 3.6605018909995124]

The difference shows immediately: the Flask targets spread the requests over multiple running workers, whereas the single worker of the FastAPI target handles them one at a time when it is blocked by a CPU-bound task.
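
One way to put a number on this difference is the average extra time per additional concurrent request. This is a minimal sketch; `marginal_seconds_per_request` is a hypothetical helper, and the lists are the three result lists above rounded to three decimals.

```python
def marginal_seconds_per_request(times: list[float]) -> float:
    """Average extra wall-clock time per additional concurrent request.

    times[i] is the measured duration of a burst of i + 1 requests.
    """
    if len(times) < 2:
        raise ValueError("need at least two burst sizes")
    return (times[-1] - times[0]) / (len(times) - 1)


# Burst durations for 1 to 5 requests, copied from the outputs above (rounded)
fa_burst = [1.926, 3.678, 5.700, 7.534, 9.366]
fsk1x_burst = [2.233, 2.567, 3.166, 4.723, 5.348]
fsk2x_burst = [1.893, 2.376, 2.915, 3.149, 3.661]

for name, times in [("FastAPI", fa_burst), ("Flask 1x", fsk1x_burst), ("Flask 2x", fsk2x_burst)]:
    print(f"{name}: {marginal_seconds_per_request(times):.2f} s per extra request")
```

For these runs the figures come out at about 1.86 s for FastAPI, 0.78 s for Flask 1x and 0.44 s for Flask 2x. The FastAPI value is close to the duration of one full request, which is what sequential processing predicts.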

### FastAPI target with "hidden" blocking task

Here we test a situation where a CPU-bound task is nested deeper in the call tree of a FastAPI app.  
This can happen by accident when a developer calls a CPU-bound function from an otherwise proper async function.

In [128]:
fa_times = []
results = dict()
for n in range(1,6):
    start= perf_counter()
    results[n] = await send_n_concurrent_requests(asyncio.get_running_loop(), n, get_url(FASTAPI_SERVICE, BAD_ASYNC_API_PERFORMANCE, faiter))
    fa_times.append(perf_counter() - start)

fa_times, results

([2.0685485810008686,
  4.0416611189994,
  5.939327529000366,
  7.786892374000672,
  9.707984624001256],
 {1: [{'message': 'At 2025-01-04T15:46:30.508034 -- duration for 65 iterationss',
    'result': 2.056700623001234,
    'concurrent_tasks': 4}],
  2: [{'message': 'At 2025-01-04T15:46:32.577743 -- duration for 65 iterationss',
    'result': 2.024836127000526,
    'concurrent_tasks': 4},
   {'message': 'At 2025-01-04T15:46:34.603504 -- duration for 65 iterationss',
    'result': 2.0012396090005495,
    'concurrent_tasks': 4}],
  3: [{'message': 'At 2025-01-04T15:46:40.593607 -- duration for 65 iterationss',
    'result': 1.9526234860004479,
    'concurrent_tasks': 4},
   {'message': 'At 2025-01-04T15:46:36.631524 -- duration for 65 iterationss',
    'result': 1.933330689000286,
    'concurrent_tasks': 4},
   {'message': 'At 2025-01-04T15:46:38.566271 -- duration for 65 iterationss',
    'result': 2.0244049220000306,
    'concurrent_tasks': 4}],
  4: [{'message': 'At 2025-01-04T15:46:4

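We see the same performance even though each request in a burst is now scheduled to run concurrently: the CPU-bound call never yields control back to the event loop, so the requests still execute one after another.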
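
The effect, and one possible remedy, can be reproduced without a web server. This is a minimal sketch, not the target service's code: `time.sleep` stands in for the blocking call, and all function names are hypothetical. Note the assumption: `time.sleep` releases the GIL, so worker threads overlap here; a pure-Python CPU-bound function holds the GIL and would more likely need a process pool (for example `loop.run_in_executor` with a `concurrent.futures.ProcessPoolExecutor`) to run in parallel.

```python
import asyncio
import time
from time import perf_counter


def blocking_work(seconds: float) -> float:
    """Stand-in for a blocking call; time.sleep holds the calling thread."""
    time.sleep(seconds)
    return seconds


async def handler_blocking(seconds: float) -> float:
    # Runs on the event-loop thread: nothing else is served in the meantime
    return blocking_work(seconds)


async def handler_offloaded(seconds: float) -> float:
    # Runs in a worker thread: the event loop stays free for other requests
    return await asyncio.to_thread(blocking_work, seconds)


async def time_burst(handler, n: int, seconds: float) -> float:
    """Run n handlers concurrently and return the elapsed wall-clock time."""
    start = perf_counter()
    await asyncio.gather(*(handler(seconds) for _ in range(n)))
    return perf_counter() - start


if __name__ == "__main__":
    print("blocking :", asyncio.run(time_burst(handler_blocking, 4, 0.1)))
    print("offloaded:", asyncio.run(time_burst(handler_offloaded, 4, 0.1)))
```

`asyncio.to_thread` requires Python 3.9 or later. In a notebook cell the coroutine can be awaited directly, for example `await time_burst(handler_offloaded, 4, 0.1)`.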