Commit 0a8d000

Another blog pass (#477)
1 parent ac668cb commit 0a8d000

5 files changed: +42, -37 lines


pgml-docs/docs/blog/scaling-postgresml-to-one-million-requests-per-second.md

Lines changed: 38 additions & 37 deletions
@@ -13,24 +13,22 @@ image_alt: PostgresML at 1 million requests per second
 November 7, 2022
 </p>

-The question "Does it Scale?" has become somewhat of a meme in software engineering. There is a good reason for it though, because most businesses plan for success. If your app, online store, or SaaS takes off, you want to be sure that the system powering it can serve all your customers.
+The question "Does it Scale?" has become somewhat of a meme in software engineering. There is a good reason for it though, because most businesses plan for success. If your app, online store, or SaaS becomes popular, you want to be sure that the system powering it can serve all your new customers.

-At PostgresML, we are very concerned with scale. Our engineering background took us through scaling OLTP and OLAP Postgres to 100 TB+, so we're certain that Postgres scales, but could we scale machine learning alongside it?
+At PostgresML, we are very concerned with scale. Our engineering background took us through scaling PostgreSQL to 100 TB+, so we're certain that it scales, but could we scale machine learning alongside it?

-In this post, we'll discuss some challenges facing machine learning inference with PostgresML, and how we solved them to achieve an impressive **1 million XGBoost predictions per second** on commodity hardware.
+In this post, we'll discuss how we horizontally scaled PostgresML to achieve more than **1 million XGBoost predictions per second** on commodity hardware.

 If you missed our previous post and are wondering why someone would combine machine learning and Postgres, take a look at our PostgresML vs. Python [benchmark](/blog/postgresml-is-8x-faster-than-python-http-microservices).

-## An Image Worth Four Thousand Words
+## Architecture Overview

-Our thesis, and the reason why we chose Postgres as our host for machine learning, is that scaling machine learning inference is very similar to scaling read queries in a typical database cluster.
+If you're familiar with how one runs PostgreSQL at scale, you can skip straight to the [results](#results).

-Inference speed varies based on the model complexity (e.g. `n_estimators` for XGBoost) and the size of the dataset (how many features the model uses), which is analogous to query complexity and table size in the database world. Scaling the latter is mostly a solved problem.
+Part of our thesis, and the reason why we chose Postgres as our host for machine learning, is that scaling machine learning inference is very similar to scaling read queries in a typical database cluster.

-### System Architecture
-
-If you're a Postgres enthusiast (or a database engineer), scaling Postgres may not be a secret to you, and you can jump straight to the [results](#results). For everyone else, here is a diagram showing the final state of our system:
+Inference speed varies based on the model complexity (e.g. `n_estimators` for XGBoost) and the size of the dataset (how many features the model uses), which is analogous to query complexity and table size in the database world and, as we'll demonstrate further on, scaling the latter is mostly a solved problem.

 <center>
 ![Scaling PostgresML](/images/illustrations/scaling-postgresml-3.svg) <br />
@@ -55,17 +53,17 @@ Our architecture has four components that may need to scale up or down based on

 We intentionally don't discuss scaling the primary in this post, because sharding, which is the most effective way to do so, is a fascinating subject that deserves its own series of posts. Spoiler alert: we sharded Postgres without any problems.

-#### Clients
+### Clients

 Clients are regular Postgres connections coming from web apps, job queues, or pretty much anywhere that needs data. They can be long-living or ephemeral and they typically grow in number as the application scales.

 Most modern deployments use containers which are added as load on the app increases, and removed as the load decreases. This is called dynamic horizontal scaling, and it's an effective way to adapt to changing traffic patterns experienced by most businesses.

-#### Load Balancer
+### Load Balancer

 The load balancer is a way to spread traffic across horizontally scalable components, by routing new connections to targets in a round robin (or random) fashion. It's typically a very large box (or a fast router), but even those need to be scaled if traffic suddenly increases. Since we're running our system on AWS, this is already taken care of, for a reasonably small fee, by using an Elastic Load Balancer.

-#### PgCat
+### PgCat

 <center>
 <img src="https://raw.githubusercontent.com/levkk/pgcat/main/pgcat3.png" alt="PgCat" height="300" width="auto" /> <br />
@@ -83,10 +81,22 @@ There are many poolers available presently, the most notable being PgBouncer, wh
 In this benchmark, we used its load balancing feature to evenly distribute XGBoost predictions across our Postgres replicas.

-#### Postgres Replicas
+### Postgres Replicas
+
+Scaling Postgres reads is pretty straightforward. If more read queries are coming in, we add a replica to serve the increased load. If the load is decreasing, we remove a replica to save money. The data is replicated from the primary, so all replicas are identical, and all of them can serve any query, or in our case, an XGBoost prediction. Since PgCat can dynamically add and remove replicas from its config without disconnecting clients, we can scale the number of replicas as needed, without downtime.

-Scaling Postgres reads is a solved problem. If more read queries are coming in, add a replica to serve the increased load. If the load is decreasing, remove a replica to save money. The data is replicated from the primary, so all replicas are identical, and all of them can serve any query, or in our case, an XGBoost prediction.
+#### Parallelizing XGBoost

+Scaling XGBoost predictions is a little bit more interesting. XGBoost cannot serve predictions concurrently because of internal data structure locks. This is common to many other machine learning algorithms as well, because making predictions can temporarily modify internal components of the model.
+
+PostgresML bypasses that limitation because of how Postgres itself handles concurrency:
+
+<center>
+<img height="300" width="auto" style="height: 300px" src="/images/illustrations/postgres-multiprocess-2.png" alt="Inside a replica" /> <br />
+_PostgresML concurrency_
+</center>
+
+PostgreSQL uses the fork/multiprocessing architecture to serve multiple clients concurrently: each new client connection becomes an independent OS process. During connection startup, PostgresML loads all models into the process's memory space. This means that each connection has its own copy of the XGBoost model, and PostgresML ends up serving multiple XGBoost predictions at the same time without any lock contention.
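
As a quick aside (an illustration added here, not part of the original benchmark), you can see the process-per-connection model on any Postgres server: every connected client shows up as its own backend process in the `pg_stat_activity` view.

```postgresql
-- Each row is a separate OS process (its own PID) serving one client connection;
-- with PostgresML, each of these backends holds its own in-memory copy of the models.
SELECT pid, backend_start, state
FROM pg_stat_activity
WHERE backend_type = 'client backend';
```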

 ## Results

@@ -115,9 +125,9 @@ The most impressive result is serving close to a million predictions with an ave
 Batching is a proven method to optimize performance. If you need to get several data points, batch the requests into one query, and it will run faster than making individual requests.

-We should precede this result by stating that PostgresML does not yet have a batch prediction API as such. Our `pgml.predict()` function can predict multiple points, but we haven't implemented a query pattern to pass multiple Postgres rows to that function at the same time. Once we do, based on our tests, we should see a quadratic increase in performance.
+We should preface this result by stating that PostgresML does not yet have a batch prediction API as such. Our `pgml.predict()` function can predict multiple points, but we haven't implemented a query pattern to pass multiple rows to that function at the same time. Once we do, based on our tests, we should see a substantial increase in batch prediction performance.

-Regardless of that limitation, we still managed to get better results by batching queries together since Postgres needed to do less query parsing and data fetching, and we saved on network round trip time as well.
+Regardless of that limitation, we still managed to get better results by batching queries together since Postgres needed to do less query parsing and searching, and we saved on network round trip time as well.
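
To make the batching pattern concrete, here is a small sketch of the query shape (the project name, table, and feature columns below are placeholders, not the ones from this benchmark):

```postgresql
-- One round trip returns 20 predictions instead of 20 single-row queries,
-- so query parsing and network latency are paid once per batch.
SELECT pgml.predict(
    'my project',                   -- hypothetical project name
    ARRAY[feature_1, feature_2]     -- hypothetical feature columns
) AS prediction
FROM my_table
LIMIT 20;
```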

 <center>
 <iframe width="600" height="371" seamless frameborder="0" scrolling="no" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vRm4aEylX8xMNmO-HFFxr67gbZDQ8rh_vss1HvX0tWAUD_zxkwYYNhiBObT1LVe8m6ELZ0seOzmH0ZL/pubchart?oid=1506211879&amp;format=interactive"></iframe>
@@ -127,7 +137,7 @@ Regardless of that limitation, we still managed to get better results by batchin
 <iframe width="600" height="371" seamless frameborder="0" scrolling="no" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vRm4aEylX8xMNmO-HFFxr67gbZDQ8rh_vss1HvX0tWAUD_zxkwYYNhiBObT1LVe8m6ELZ0seOzmH0ZL/pubchart?oid=1488435965&amp;format=interactive"></iframe>
 </center>

-If batching did not work at all, we would see a linear increase in latency and a linear decrease in throughput. That did not happen; instead, we got a good increase in throughput and a sublinear increase in latency. A modest success, but a success nonetheless.
+If batching did not work at all, we would see a linear increase in latency and a linear decrease in throughput. That did not happen; instead, we got a 1.5x improvement by batching 5 predictions together, and a 1.2x improvement by batching 20. A modest success, but a success nonetheless.

 ### Graceful Degradation and Queuing

@@ -141,7 +151,7 @@ If batching did not work at all, we would see a linear increase in latency and a

 All systems, at some point in their lifetime, will come under more load than they were designed for; what happens then is an important feature (or bug) of their design. Horizontal scaling is never immediate: it takes a bit of time to spin up additional hardware to handle the load. It can take a second, or a minute, depending on availability, but in both cases, existing resources need to serve traffic the best way they can.

-We were hoping to test PostgresML to its breaking point, but we couldn't quite get there. As load (number of clients) increased beyond provisioned capacity, the only thing we saw was a gradual increase in latency. Throughput remained roughly the same. This gradual latency increase was caused by simple queuing: the replicas couldn't serve requests concurrently, so the requests had to patiently wait in the poolers.
+We were hoping to test PostgresML to its breaking point, but we couldn't quite get there. As the load (number of clients) increased beyond provisioned capacity, the only thing we saw was a gradual increase in latency. Throughput remained roughly the same. This gradual latency increase was caused by simple queuing: the replicas couldn't serve requests concurrently, so the requests had to patiently wait in the poolers.

 <center>
 ![Queuing](/images/illustrations/queueing.svg) <br />
@@ -154,25 +164,25 @@ Queueing overall is not desirable, but it's a feature, not a bug. While autoscal

 As the demand on PostgresML increases, the system gracefully handles the load. If the number of replicas stays the same, latency slowly increases, all the while remaining well below acceptable ranges. Throughput holds as well, as increasing number of clients evenly split available resources.

-If we increase the number of replicas, latency decreases and throughput increases and eventually stabilizies as the number of clients increases in parallel. We get the best result with 5 replicas, but this number is variable and can be changed as needs for latency compete with cost.
+If we increase the number of replicas, latency decreases and throughput increases, as the number of clients increases in parallel. We get the best result with 5 replicas, but this number is variable and can be changed as needs for latency compete with cost.


 ## What's Next

-Horizontal scaling and high availability are fascinating topics in software engineering. After doing this benchmark, we had no more doubts about our chosen architecture. Needing to serve 1 million predictions per second is rare, but having the ability to do that, and more if desired, is an important aspect for any new system.
+Horizontal scaling and high availability are fascinating topics in software engineering. Needing to serve 1 million predictions per second is rare, but having the ability to do that, and more if desired, is an important aspect for any new system.

 The next challenge for us is to scale writes horizontally. In the database world, this means sharding the database into multiple separate machines using a hashing function, and automatically routing both reads and writes to the right shards. There are many possible solutions on the market for this already, e.g. Citus and Foreign Data Wrappers, but none are as horizontally scalable as we like, although we will incorporate them into our architecture until we build the one we really want.
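
As a toy illustration of the routing idea (added here for clarity; this is not PgCat's actual sharding implementation), a hash function maps a routing key to a shard number, and the proxy sends the query to the matching server:

```postgresql
-- Hypothetical example: route a customer's rows to one of four shards.
-- hashtext() is Postgres' built-in text hashing function; abs() folds negative hash values.
SELECT abs(hashtext('customer_12345')) % 4 AS shard_number;
```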

 For that purpose, we're building our own open source [Postgres proxy](https://github.com/levkk/pgcat/) which we discussed earlier in the article. As we progress further in our journey, we'll be adding more features and performance improvements.

-By combining PgCat with PostgresML, we are aiming to build the next generation of machine learning infrastructure that can power anything from two-person startups, like us, to unicorns and massive enterprises, without the data ever leaving our favorite database.
+By combining PgCat with PostgresML, we are aiming to build the next generation of machine learning infrastructure that can power anything from tiny startups to unicorns and massive enterprises, without the data ever leaving our favorite database.


 ## Methodology

 ### ML

-This time, we used an XGBoost model with 100 trees
+This time, we used an XGBoost model with 100 trees:

 ```postgresql
 SELECT * FROM pgml.train(
@@ -186,7 +196,7 @@ SELECT * FROM pgml.train(
 );
 ```

-and fetched our predictions the usual way
+and fetched our predictions the usual way:

 ```postgresql
 SELECT pgml.predict(
@@ -208,7 +218,7 @@ SELECT pgml.predict(
 FROM flights_mat_3 LIMIT :limit;
 ```

-where `:limit` is the batch size of 1, 5, and 20, depending on the benchmark.
+where `:limit` is the batch size of 1, 5, and 20.

 #### Model

@@ -217,7 +227,7 @@ The model is roughly the same as the one we used in our previous [post](/blog/po
 ### Hardware

 #### Client
-The client was a `c5n.4xlarge` box on EC2. We chose the `c5n` class to have the 100 GBit NIC, since we wanted it to saturate our network as much as possible. Thousands of clients were simulated using [`pgbench`](https://www.postgresql.org/docs/current/pgbench.html) with `-c n` where `n` is the number of clients.
+The client was a `c5n.4xlarge` box on EC2. We chose the `c5n` class to have the 100 GBit NIC, since we wanted it to saturate our network as much as possible. Thousands of clients were simulated using [`pgbench`](https://www.postgresql.org/docs/current/pgbench.html).
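
For reference, here is a rough sketch of what such a `pgbench` run can look like. The script mirrors the prediction query from the Methodology section above, but the project name and feature columns are placeholders, and the exact flags used for each run aren't reproduced here:

```postgresql
-- predict.sql: a custom pgbench script (plain SQL plus the :limit pgbench variable).
-- A run might look like:
--   pgbench -h <pgcat-host> -p <pgcat-port> -c 1000 -j 16 -T 300 -D limit=20 -f predict.sql <database>
-- where -c sets the number of simulated clients and -D defines the :limit variable.
SELECT pgml.predict(
    'my project',                   -- hypothetical project name
    ARRAY[feature_1, feature_2]     -- hypothetical feature columns
)
FROM flights_mat_3
LIMIT :limit;
```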

 #### PgCat Pooler
 PgCat, written in asynchronous Rust, was running on `c5.xlarge` machines (4 vCPUs, 8GB RAM) with 4 Tokio workers. We used between 1 and 35 machines, and scaled them in increments of 5-20 at a time.
@@ -232,19 +242,10 @@ Postgres replicas were running on `c5.9xlarge` machines with 36 vCPUs and 72 GB

 Raw latency data is available [here](https://static.postgresml.org/benchmarks/reads-latency.csv) and raw throughput data is available [here](https://static.postgresml.org/benchmarks/reads-throughput.csv).

-## Feedback
-
-Many thanks and ❤️ to all those who are supporting this endeavor. We’d love to hear feedback from the broader ML and Engineering community about applications and other real world scenarios to help prioritize our work. You can show your support by starring us on our [Github](https://github.com/postgresml/postgresml/).
-
-
-## We're Hiring!
-
-[PostgresML](https://github.com/postgresml/postgresml/) and [PgCat](https://github.com/levkk/pgcat/) are free and open source, and to support their development, and many more things we're building, we started a company. We're only a few months old, and we have raised enough funding to say, for the first time ever: we're hiring!
-
-We're looking for software engineers interested in machine learning, solving big problems, databases, and anything in between. Don't hesitate to reach out to <a href="mailto:team@postgresml.org">team@postgresml.org</a> or in [Discord](https://discord.gg/DmyJP3qJ7U).
-
 ## Call to Early Adopters

-If your organization can benefit from simplified and fast machine learning, get in touch! We can help deploy PostgresML internally, and collaborate on new and existing features. Join our [Discord](https://discord.gg/DmyJP3qJ7U) or [email](mailto:team@postgresml.org) us!
+[PostgresML](https://github.com/postgresml/postgresml/) and [PgCat](https://github.com/levkk/pgcat/) are free and open source. If your organization can benefit from simplified and fast machine learning, get in touch! We can help deploy PostgresML internally, and collaborate on new and existing features. Join our [Discord](https://discord.gg/DmyJP3qJ7U) or [email](mailto:team@postgresml.org) us!
+
+Many thanks and ❤️ to all those who are supporting this endeavor. We’d love to hear feedback from the broader ML and Engineering community about applications and other real world scenarios to help prioritize our work. You can show your support by starring us on our [Github](https://github.com/postgresml/postgresml/).