# Tester service

FastAPI uses ASGI and asyncio to provide asynchronous server capabilities.

Here we analyze how to use FastAPI properly and how it performs against a comparable Flask setup.

We want to measure a few metrics:

- Flask/Gunicorn vs FastAPI/Uvicorn on a control test with a CPU-bound task (to ensure similar server-side execution times). Metrics: number of spawned processes and performance on N requests, where N > worker count.
- Flask vs FastAPI with IO-bound tasks. Metrics: number of spawned processes and performance on N requests, where N > worker count.
- FastAPI bad use cases vs proper async usage, given a CPU-bound task.
- Additionally, we look at how FastAPI and Uvicorn spawn processes on the server machine under different usage contexts (e.g. development with reload vs production without reload).

In [1]:
import requests

# Config

## Tests design

We use a Tester service and 3 Target Services (can be expanded to more cases):
- Tester : Jupyter notebook with analysis.
- Target-FastAPI: FastAPI app served by the Uvicorn app server, configured with a single worker.
- Target-FlaskSingle: Flask app served by the Gunicorn app server, configured with a worker count of 1 x CPU cores.
- Target-FlaskDual: Flask app served by the Gunicorn app server, configured with a worker count of 2 x CPU cores.

All services run under the same Docker network and are accessible via docker DNS.

In [2]:
# Uses Docker DNS for service addressing
FASTAPI_SERVICE = "fastapianalytics:5000"
FLASK_SINGLE_SERVICE = "flask1xanalytics:5000"
FLASK_DUAL_SERVICE = "flask2xanalytics:5000"

## Test APIs

All target services provide the same API URLs for synchronous requests.  
Only the FastAPI target provides the async test APIs.


### Calibration step
The first test is a calibration against the server machine's resources: we run it to estimate how long the server needs for one calculation (this varies with available CPU and memory, external processes, and so on).

### Baseline duration
Based on the calibration value we can then determine the number of iterations required to have a specific duration for server operations.

### Performance Test configuration
Each test then sends a specific number of requests to a target, where each request is configured to take a specific time to process server-side (i.e. each request takes X seconds of server-side processing).

The targets are tested one after another, not simultaneously. This is because we are using a single physical machine (the services run in a Docker network hosted on one machine), so targets tested at the same time would compete for resources.
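A minimal sketch of how one such batch of requests could be timed from the Tester side. The helper `time_batch` and its parameters are hypothetical (they are not part of this notebook), and a thread pool is assumed as the way to issue the requests concurrently. It accepts any zero-argument callable, so it could wrap a call to an HTTP helper; a sleeping stand-in is used here so the sketch runs without a target service.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_batch(send, n_requests, max_workers=None):
    """Issue `n_requests` calls to `send` concurrently and time the whole batch.

    `send` is any zero-argument callable (hypothetical; e.g. a lambda wrapping
    an HTTP GET). Returns (wall_clock_seconds, list_of_results).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers or n_requests) as pool:
        futures = [pool.submit(send) for _ in range(n_requests)]
        results = [f.result() for f in futures]  # re-raises any request error
    elapsed = time.perf_counter() - start
    return elapsed, results


# Stand-in for a request that takes about 0.1 s to process (illustration only).
def fake_request():
    time.sleep(0.1)
    return {"result": 0.1}


elapsed, results = time_batch(fake_request, n_requests=4)
print(f"{len(results)} requests completed in {elapsed:.2f} s")
```

If the wall-clock time of the batch stays close to the duration of a single request, the target processed the requests in parallel; if it approaches the number of requests times that duration, the requests were effectively serialized.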

In [3]:
CALIBRATION_API = "test/calibrate"
SYNC_API_PERFORMANCE = "test/sync-cpubound/{}" # format the string and add specific duration for each test

Set up a convenience function to construct our urls

In [4]:
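def get_url(service, api, iterations=None):
    """Build the full URL for an API on a target service."""
    url = f"http://{service}/{api}"
    # Fill the '{}' placeholder only when an iteration count is given
    return url.format(iterations) if iterations is not None else url
# Quick check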
print(get_url(FASTAPI_SERVICE, CALIBRATION_API))
print(get_url(FASTAPI_SERVICE, SYNC_API_PERFORMANCE, 2))

http://fastapianalytics:5000/test/calibrate
http://fastapianalytics:5000/test/sync-cpubound/2


# Calibration CPU bound tests

Since response times vary with the host's load at a given moment, we average over N requests to determine the calibration value.
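A plain average hides how much the individual samples scatter. As a minimal sketch, assuming the sampled durations are collected into a list, the spread can be summarized alongside the mean. The helper `summarize_samples` is hypothetical, and the sample values below are made up for illustration (they are not measurements).

```python
import statistics


def summarize_samples(samples):
    """Return mean, median, sample standard deviation, min and max (seconds)."""
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "min": min(samples),
        "max": max(samples),
    }


# Made-up durations with one slow outlier, for illustration only.
example = [0.030, 0.032, 0.029, 0.088, 0.031]
summary = summarize_samples(example)
for key, value in summary.items():
    print(f"{key}: {value:.4f} s")
```

The median is less sensitive to a single slow request than the mean, so a large gap between the two suggests that outliers are skewing the calibration.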

In [5]:
AVGSAMPLES_FOR_CALIB = 50

Prepare a convenience function to send HTTP requests, with or without an iterations parameter.

In [6]:
def send_http(url, iterations: int = 0):
    """GET `url` (optionally suffixed with `/<iterations>`) and return the JSON body."""
    if iterations:  # if there is a specific number of iterations, add it to the url
        url = '/'.join([url, str(iterations)])
    try:
        response = requests.get(url)
        response.raise_for_status()  # Raise an exception for bad status codes
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"Error sending request: {e}")
        raise  # re-raise, keeping the original traceback
# TEST
send_http(get_url(FASTAPI_SERVICE, CALIBRATION_API))

{'message': 'Duration of one iteration', 'result': 0.0886105999998108}

### FastAPI target

In [8]:
fa_iter_d = sum([send_http(get_url(FASTAPI_SERVICE, CALIBRATION_API)).get('result') for _ in range(AVGSAMPLES_FOR_CALIB)])/AVGSAMPLES_FOR_CALIB
print(f'Average of {AVGSAMPLES_FOR_CALIB} - duration on single iteration: {fa_iter_d} seconds')

Average of 50 - duration on single iteration: 0.0348293459200795 seconds


### Flask targets

In [9]:
fsk1x_iter_d = sum([send_http(get_url(FLASK_SINGLE_SERVICE, CALIBRATION_API)).get('result') for _ in range(AVGSAMPLES_FOR_CALIB)])/AVGSAMPLES_FOR_CALIB
print(f'Average of {AVGSAMPLES_FOR_CALIB} - duration on single iteration: {fsk1x_iter_d} seconds')

Average of 50 - duration on single iteration: 0.02992334980008309 seconds


In [10]:
fsk2x_iter_d = sum([send_http(get_url(FLASK_DUAL_SERVICE, CALIBRATION_API)).get('result') for _ in range(AVGSAMPLES_FOR_CALIB)])/AVGSAMPLES_FOR_CALIB
print(f'Average of {AVGSAMPLES_FOR_CALIB} - duration on single iteration: {fsk2x_iter_d} seconds')

Average of 50 - duration on single iteration: 0.030408311340288492 seconds


### Iterations for N seconds processing time

These values do not reflect the actual relative performance of the services. The resources available to each service while it runs the CPU-bound task may differ, as do the random values in the test matrices.  
They only approximate how many iterations each service needs to produce a similarly timed computation.

In [11]:
N = 2 # seconds
faiter = int(N/fa_iter_d)
fsk1xiter = int(N/fsk1x_iter_d)
fsk2xiter = int(N/fsk2x_iter_d)
faiter, fsk1xiter, fsk2xiter

(57, 66, 65)
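Because `int()` truncates, the planned processing time for each target lands slightly below N seconds. A minimal sketch of that arithmetic, with the calibration averages and iteration counts copied from the outputs above (the dictionary names are only illustrative):

```python
# (calibrated seconds per iteration, iteration count) copied from the outputs above.
calibration = {
    "fastapi": (0.0348293459200795, 57),
    "flask1x": (0.02992334980008309, 66),
    "flask2x": (0.030408311340288492, 65),
}

# Planned duration = iterations x calibrated per-iteration time.
planned = {name: per_iter * iters for name, (per_iter, iters) in calibration.items()}
for name, seconds in planned.items():
    print(f"{name}: planned {seconds:.3f} s")
```

All three planned durations fall within a few hundredths of a second of the 2-second goal, which is close enough for a similarly timed computation.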

# Base performance CPU bound - no parallelization

This test will trigger **single requests** to synchronous apis on all targets.  
These api-endpoints contain simple calls to CPU bound tasks, with no extra logic for parallelization.  
Each request is parameterized with the number of iterations required to induce a specific processing time on the target server.  
Here we establish a baseline for future tests, where multiple requests are used to check the target server's ability to handle concurrent requests.  

In [12]:
fa_cpu_base = send_http(get_url(FASTAPI_SERVICE, SYNC_API_PERFORMANCE, faiter))
fa_cpu_base

{'message': 'Duration of 57 iterations', 'result': 1.6083624960010638}

In [13]:
fsk1x_cpu_base = send_http(get_url(FLASK_SINGLE_SERVICE, SYNC_API_PERFORMANCE, fsk1xiter))
fsk1x_cpu_base

{'message': 'Duration of 66 iterations', 'result': 1.8777725399995688}

In [14]:
fsk2x_cpu_base = send_http(get_url(FLASK_DUAL_SERVICE, SYNC_API_PERFORMANCE, fsk2xiter))
fsk2x_cpu_base

{'message': 'Duration of 65 iterations', 'result': 1.929555138998694}

The raw durations differ noticeably between FastAPI (about 1.61 s) and Flask (about 1.88 s and 1.93 s), but the targets ran different iteration counts (57 vs 66 and 65), so the totals are not directly comparable. Per iteration the gap is small: roughly 28.2 ms for FastAPI against 28.5 ms and 29.7 ms for the Flask targets. Still, the FastAPI target, which was the slowest during calibration, comes out marginally ahead over multiple iterations.  
The relative performance between the two Flask targets is consistent with the calibration measurements: the one that was faster during calibration is still faster over multiple iterations.
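Since each target ran a different number of iterations, a per-iteration figure can make the baseline durations easier to compare. A minimal sketch, with the durations and iteration counts copied from the responses above (the dictionary names are only illustrative):

```python
# (measured duration in seconds, iterations) copied from the responses above.
baseline = {
    "fastapi": (1.6083624960010638, 57),
    "flask1x": (1.8777725399995688, 66),
    "flask2x": (1.929555138998694, 65),
}

# Normalize each total by its iteration count.
per_iteration = {name: total / iters for name, (total, iters) in baseline.items()}
for name, seconds in sorted(per_iteration.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds * 1000:.1f} ms per iteration")
```

These are single-request measurements, so small differences between targets may well be within run-to-run noise.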