📊 FastAPI vs Axum Benchmark with Postgres

A simple comparison of Python/FastAPI/SQLAlchemy vs Rust/Axum/sqlx.


This repo contains implementations of the same very simple web server in Python (FastAPI, in both a synchronous and a fully async variant) and in Rust (Axum).

What the servers do

In each case, the server fetches users from the users table with the following query and returns the results:

SELECT * FROM "users" ORDER BY user_id LIMIT 100

The Postgres database is seeded with 2000 users using the script in scripts/init_db.sh. It runs in Docker and is configured to support up to 1000 connections (though each server only uses a connection pool of size 5).

From the SQLAlchemy documentation:

This is why it’s perfectly fine for create_engine() to default to using a QueuePool of size five without regard to whether or not the application really needs five connections queued up - the pool would only grow to that size if the application actually used five connections concurrently, in which case the usage of a small pool is an entirely appropriate default behavior.

In Rust, we set the max_connections to 5 to match.
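
For concreteness, here is a minimal sketch of what a pool of 5 looks like on the Python side. The connection URL is a placeholder, and passing pool_size explicitly only spells out what SQLAlchemy's default already gives you:

```python
from sqlalchemy import create_engine

# Placeholder connection string; the repo reads its own DATABASE_URL.
DATABASE_URL = "postgresql://postgres:password@localhost:5432/app"

# QueuePool with pool_size=5 is SQLAlchemy's default, so this is equivalent
# to create_engine(DATABASE_URL); it mirrors sqlx's
# PgPoolOptions::new().max_connections(5) on the Rust side.
engine = create_engine(DATABASE_URL, pool_size=5)
```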

The FastAPI server

The FastAPI server is modeled almost directly from the FastAPI tutorial on SQL databases. When benchmarking, we run it with uvicorn and a single worker (the default). While this may seem somewhat unfair (throughput and latency improve with more workers), this is FastAPI's recommendation when running in docker on k8s, as many people do.

In particular, when running on Kubernetes you will probably not want to use Gunicorn and instead run a single Uvicorn process per container...

Increasing the number of workers to N improves throughput and latency, but also multiplies memory usage by N, as each worker runs its own process. As is typically done with FastAPI, we use SQLAlchemy and Pydantic for structured responses.
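
To make the shape of that server concrete, here is a minimal sketch in the style of the tutorial. The column names, schemas, and connection URL are illustrative assumptions and may not match this repo exactly:

```python
from fastapi import Depends, FastAPI
from pydantic import BaseModel
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, sessionmaker

DATABASE_URL = "postgresql://postgres:password@localhost:5432/app"  # placeholder
engine = create_engine(DATABASE_URL)  # QueuePool of size 5 by default
SessionLocal = sessionmaker(bind=engine, autoflush=False)
Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    user_id = Column(Integer, primary_key=True)
    name = Column(String)

class UserOut(BaseModel):
    user_id: int
    name: str

    class Config:  # Pydantic v1 style, as in the tutorial (from_attributes in v2)
        orm_mode = True

app = FastAPI()

def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

@app.get("/", response_model=list[UserOut])
def read_users(db: Session = Depends(get_db)):
    # Equivalent to: SELECT * FROM "users" ORDER BY user_id LIMIT 100
    return db.query(User).order_by(User.user_id).limit(100).all()
```

Because the path operation is a plain def, FastAPI runs it in a worker thread and the SQLAlchemy query blocks that thread; that is the main structural difference from the async variants.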

The Full Async FastAPI server

This FastAPI server takes a different approach and does everything asynchronously. It deviates more from the FastAPI tutorial, but is still very simple, and is actually more structurally similar to the Axum server. When benchmarking, we still run it with uvicorn and a single worker (the default).
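
One possible shape for such a server, sketched here with an asyncpg pool used directly (an assumption; the actual implementation may instead use SQLAlchemy's async engine or another driver), with a placeholder connection URL:

```python
from contextlib import asynccontextmanager

import asyncpg
from fastapi import FastAPI, Request

DATABASE_URL = "postgresql://postgres:password@localhost:5432/app"  # placeholder

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Pool of 5 connections to mirror the other servers.
    app.state.pool = await asyncpg.create_pool(DATABASE_URL, min_size=5, max_size=5)
    yield
    await app.state.pool.close()

app = FastAPI(lifespan=lifespan)

@app.get("/")
async def read_users(request: Request):
    async with request.app.state.pool.acquire() as conn:
        rows = await conn.fetch('SELECT * FROM "users" ORDER BY user_id LIMIT 100')
    return [dict(row) for row in rows]
```

Since every step awaits rather than blocks, a single uvicorn worker can overlap many in-flight requests on the 5-connection pool.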

The Axum server

The Axum server is modeled almost directly from the Axum example for sqlx and postgres.

Although Rust does have ORMs (e.g., diesel, SeaORM), the compile-time checking of SQLx means that many applications get by without a full-fledged ORM. This repository could be modified to use diesel as well, since Axum provides similar examples, but that is left as an exercise for the reader.

Modifying the code

In all cases, the code is extremely basic and should be easy to tweak and experiment with.

Example Benchmark Results

On my personal PC with 64 GB of DDR5 RAM and a Ryzen 7 7800X3D (8-core, 16-thread), I saw the following results, with the server and Postgres both running locally.

Here's a table comparing the results:

| Metric | FastAPI | FastAPI (async) | Axum |
| --- | --- | --- | --- |
| Throughput (rps) | 612 | 2267 | 15363 |
| 50% latency (ms) | 15.4 | 2.2 | 0.6 |
| 99% latency (ms) | 29.1 | 2.5 | 0.9 |
| 99.9% latency (ms) | 33.4 | 3.1 | 1.0 |
| Peak Memory Usage (MiB) | 78 | 69 | 11 |
| Peak CPU Usage (%) | 7.0 | 5.9 | 15.9 |

Compared specifically to the synchronous FastAPI baseline, we find the following improvement factors (×).

| Metric | FastAPI | FastAPI (async) | Axum |
| --- | --- | --- | --- |
| Throughput (×) | 1 | 3.70 | 25.1 |
| 50% latency (1/×) | 1 | 7.0 | 25.7 |
| 99% latency (1/×) | 1 | 11.7 | 32.3 |
| 99.9% latency (1/×) | 1 | 10.8 | 33.4 |
| Peak Memory Usage (1/×) | 1 | 1.1 | 7.1 |
| Peak CPU Usage (×) | 1 | 0.8 | 2.3 |
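
These factors are just ratios of the raw numbers in the first table, for example (a small sketch; values copied from above):

```python
# Throughput and CPU usage are reported as value / baseline (×);
# latency and memory are reported as baseline / value (1/×).
fastapi = {"rps": 612, "p50_ms": 15.4, "mem_mib": 78}
axum = {"rps": 15363, "p50_ms": 0.6, "mem_mib": 11}

print(round(axum["rps"] / fastapi["rps"], 1))          # 25.1 -> Throughput (×)
print(round(fastapi["p50_ms"] / axum["p50_ms"], 1))    # 25.7 -> 50% latency (1/×)
print(round(fastapi["mem_mib"] / axum["mem_mib"], 1))  # 7.1  -> Peak Memory Usage (1/×)
```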

FastAPI


oha -n 50000 -c 10 --disable-keepalive http://localhost:8000/
Summary:
  Success rate: 100.00%
  Total:        81.7200 secs
  Slowest:      0.0383 secs
  Fastest:      0.0051 secs
  Average:      0.0163 secs
  Requests/sec: 611.8453

  Total data:   490.14 MiB
  Size/request: 10
  Size/sec:     6.00 MiB

Response time histogram:
  0.005 [1]     |
  0.008 [29]    |
  0.012 [1328]  |■■
  0.015 [20848] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.018 [18842] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.022 [3972]  |■■■■■■
  0.025 [2614]  |■■■■
  0.028 [1685]  |■■
  0.032 [533]   |
  0.035 [124]   |
  0.038 [24]    |

Response time distribution:
  10.00% in 0.0130 secs
  25.00% in 0.0141 secs
  50.00% in 0.0154 secs
  75.00% in 0.0173 secs
  90.00% in 0.0217 secs
  95.00% in 0.0249 secs
  99.00% in 0.0291 secs
  99.90% in 0.0334 secs
  99.99% in 0.0374 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0001 secs, 0.0000 secs, 0.0005 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0004 secs

Status code distribution:
  [200] 50000 responses

FastAPI (async)


oha -n 50000 -c 10 --disable-keepalive http://localhost:8000/
Summary:
  Success rate: 100.00%
  Total:        22.0537 secs
  Slowest:      22.0526 secs
  Fastest:      0.0019 secs
  Average:      0.0044 secs
  Requests/sec: 2267.1906

  Total data:   490.14 MiB
  Size/request: 10
  Size/sec:     22.22 MiB

Response time histogram:
   0.002 [1]     |
   2.207 [49993] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
   4.412 [1]     |
   6.617 [0]     |
   8.822 [0]     |
  11.027 [0]     |
  13.232 [0]     |
  15.437 [0]     |
  17.642 [1]     |
  19.848 [1]     |
  22.053 [3]     |

Response time distribution:
  10.00% in 0.0021 secs
  25.00% in 0.0021 secs
  50.00% in 0.0022 secs
  75.00% in 0.0022 secs
  90.00% in 0.0024 secs
  95.00% in 0.0024 secs
  99.00% in 0.0025 secs
  99.90% in 0.0031 secs
  99.99% in 2.7683 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0000 secs, 0.0000 secs, 0.0005 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0004 secs

Status code distribution:
  [200] 50000 responses

Axum


oha -n 50000 -c 10 --disable-keepalive http://localhost:8000/
Summary:
  Success rate: 100.00%
  Total:        3.2546 secs
  Slowest:      0.0014 secs
  Fastest:      0.0003 secs
  Average:      0.0006 secs
  Requests/sec: 15362.6923

  Total data:   490.14 MiB
  Size/request: 10
  Size/sec:     150.60 MiB

Response time histogram:
  0.000 [1]     |
  0.000 [3]     |
  0.001 [813]   |■
  0.001 [24488] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.001 [19610] |■■■■■■■■■■■■■■■■■■■■■■■■■
  0.001 [4344]  |■■■■■
  0.001 [650]   |
  0.001 [74]    |
  0.001 [8]     |
  0.001 [4]     |
  0.001 [5]     |

Response time distribution:
  10.00% in 0.0006 secs
  25.00% in 0.0006 secs
  50.00% in 0.0006 secs
  75.00% in 0.0007 secs
  90.00% in 0.0007 secs
  95.00% in 0.0008 secs
  99.00% in 0.0009 secs
  99.90% in 0.0010 secs
  99.99% in 0.0013 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0000 secs, 0.0000 secs, 0.0004 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0003 secs

Status code distribution:
  [200] 50000 responses

Flamegraphs

For the curious, flamegraphs collected on my machine are provided in each server's directory. For Rust, the flamegraph was collected by running the benchmark under cargo-flamegraph; for Python, it was collected with py-spy.

What about with more uvicorn workers?

If I run

uvicorn app.main:app --log-level critical --host 0.0.0.0 --port 8000 --workers 16

both memory usage and CPU usage increase (e.g., memory grows to ~1200 MiB).

Then, the results look like this:

oha -n 50000 -c 10 --disable-keepalive http://localhost:8000/
Summary:
  Success rate: 100.00%
  Total:        4.7476 secs
  Slowest:      0.0030 secs
  Fastest:      0.0006 secs
  Average:      0.0009 secs
  Requests/sec: 10531.5539

  Total data:   490.14 MiB
  Size/request: 10
  Size/sec:     103.24 MiB

Response time histogram:
  0.001 [1]     |
  0.001 [11841] |■■■■■■■■■■■■■■
  0.001 [26594] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.001 [7971]  |■■■■■■■■■
  0.002 [2927]  |■■■
  0.002 [481]   |
  0.002 [121]   |
  0.002 [35]    |
  0.002 [12]    |
  0.003 [12]    |
  0.003 [5]     |

Response time distribution:
  10.00% in 0.0008 secs
  25.00% in 0.0008 secs
  50.00% in 0.0009 secs
  75.00% in 0.0010 secs
  90.00% in 0.0012 secs
  95.00% in 0.0013 secs
  99.00% in 0.0016 secs
  99.90% in 0.0021 secs
  99.99% in 0.0027 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0001 secs, 0.0000 secs, 0.0010 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0005 secs

Status code distribution:
  [200] 50000 responses

This is a significant improvement in both throughput and latency, though it is not quite linear with 16× more processes, and it is still slower than Axum.

What about coordinated omission?

WARNING: Unlike the other results, this was run on a machine with only 16 GB of DDR4 RAM and an AMD Ryzen 3700X.

oha, the load generator I'm using, does support compensating for coordinated omission. But if I enable it, FastAPI looks really bad: so bad that I'd highly suspect I'm doing something wrong, though I haven't dug into it yet.

Here's what it looks like with -q 10000 and --latency-correction:

| Metric | FastAPI | Axum |
| --- | --- | --- |
| Throughput (rps) | 317 | 9920 |
| 50% latency (ms) | 75000 | 16.2 |
| 99% latency (ms) | 151000 | 40.4 |

I think you'll agree that this looks crazy, and suggests there is something I should tweak about the setup. If you have ideas, please reach out!

FastAPI

❯ oha -n 50000 -c 10 --disable-keepalive --latency-correction -q 10000 http://localhost:8000/
Summary:
  Success rate: 100.00%
  Total:        157.5955 secs
  Slowest:      152.5937 secs
  Fastest:      0.0147 secs
  Average:      76.0228 secs
  Requests/sec: 317.2680

  Total data:   490.90 MiB
  Size/request: 10.05 KiB
  Size/sec:     3.11 MiB

Response time histogram:
    0.015 [1]    |
   15.273 [4820] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■
   30.531 [4859] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■
   45.788 [5246] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
   61.046 [5362] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
   76.304 [5037] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
   91.562 [4983] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  106.820 [5207] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  122.078 [4564] |■■■■■■■■■■■■■■■■■■■■■■■■■■■
  137.336 [5088] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  152.594 [4833] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■

Response time distribution:
  10% in 15.7830 secs
  25% in 38.8800 secs
  50% in 75.2023 secs
  75% in 113.5457 secs
  90% in 136.7149 secs
  95% in 145.2185 secs
  99% in 151.1093 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0001 secs, 0.0000 secs, 0.0011 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0003 secs

Status code distribution:
  [200] 50000 responses

Axum

❯ oha -n 50000 -c 10 --disable-keepalive --latency-correction -q 10000 http://localhost:8000/
Summary:
  Success rate: 100.00%
  Total:        5.0403 secs
  Slowest:      0.0415 secs
  Fastest:      0.0020 secs
  Average:      0.0199 secs
  Requests/sec: 9920.0133

  Total data:   490.90 MiB
  Size/request: 10.05 KiB
  Size/sec:     97.40 MiB

Response time histogram:
  0.002 [1]     |
  0.006 [2400]  |■■■■■
  0.010 [1570]  |■■■
  0.014 [9299]  |■■■■■■■■■■■■■■■■■■■■
  0.018 [14379] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.022 [4971]  |■■■■■■■■■■■
  0.026 [3931]  |■■■■■■■■
  0.030 [4365]  |■■■■■■■■■
  0.034 [2941]  |■■■■■■
  0.038 [1462]  |■■■
  0.042 [4681]  |■■■■■■■■■■

Response time distribution:
  10% in 0.0105 secs
  25% in 0.0137 secs
  50% in 0.0162 secs
  75% in 0.0265 secs
  90% in 0.0371 secs
  95% in 0.0394 secs
  99% in 0.0404 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0000 secs, 0.0000 secs, 0.0011 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0004 secs

Status code distribution:
  [200] 50000 responses

Running your own

I've provided a Justfile to help run things the way I did. Specifically:

- Set up the database with just initdb (you'll need docker and postgres installed).
- Run a server with just python or just rust.
- Run the benchmark with just oha. The number of workers, C, can be increased depending on how many threads your CPU has; if you use too many, oha will behave oddly.

I ran these in tmux, but multiple shells will also work. Monitor the system utilization of uvicorn or rust-axum however you please; I recommend btm with the filter cpu>0 and (uvicorn or rust-axum or docker or oha) on the Process Widget for a nice view.

Example setup

screenshot

Complaints?

Benchmarks are hard. If you think something is wrong or unfair, please let me know!

License

This repository is distributed under the terms of the Blue Oak license. Any contributions are licensed under the same license and acknowledged via the Developer Certificate of Origin.

See LICENSE for details.