Performance testing point-in-polygon #7
Hi there, I did some stress tests for spatial so that we can get an idea of its performance. I used Gatling and the pip-service scenario. The service and the injector are on different machines.

Spec

Service:

Injector:
Scenario

We use a set of regions, pick a random point in a region, and make a PIP request on the endpoint. In this scenario we have a total of 75,000 users arriving over 60 seconds, and each user makes a single request. The goal is a 95th percentile below 750ms. Gatling will inject 1,250 req/s.
Results

I ran the scenario 3 times.
Conclusion

Without the Linux cache, spatial can't handle this scenario: the number of concurrent users keeps growing until the end (815.217 req/s). The CPU chart shows the bottleneck: iowait. Since we can't control the Linux cache, I'd say that 800 req/s is the first limit for spatial. The next tests will be without multi-core and with fewer users.
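The 95th-percentile figure used as the target here can be computed from raw latency samples with the nearest-rank method; a minimal JavaScript sketch (the function and sample values are illustrative, not taken from Gatling, whose estimator may differ slightly):

```javascript
// Nearest-rank percentile: sort the samples, then take the value at
// index ceil(p * n) - 1.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  return sorted[Math.max(0, Math.ceil(p * sorted.length) - 1)];
}

// Example: response times in milliseconds from a hypothetical run.
const latencies = [120, 90, 300, 710, 95, 640, 180, 220, 805, 150];
console.log(percentile(latencies, 0.95)); // 805 — above the 750ms goal
```

A run meets the goal stated above when `percentile(latencies, 0.95)` comes in below 750.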
Nice benchmarks 👍, I had a quick look at the query generation for PIP and there are definitely some 'quick wins' to reduce latency. Recently I added #65, which probably means we can delete a bunch of the query logic for finding the default names. There are probably other things which can be improved too. If possible, could you please keep your benchmarking scripts around so we can run a comparison once this feature lands?
I did some similar benchmarking in the past and found that reducing the number of users greatly improved the performance. I think in a real-world scenario we're going to have <5 'users' connected (i.e. open HTTP streams). I'd be interested to see what difference it makes to reduce the user count, assuming that
One of the really nice things about using SQLite is that it's so easy (and cheap!) to scale this compared to something which is memory-bound. So I'm more interested in throughput than latency, although we should still make it run as efficiently as possible 😄 If we can run several high-CPU instances (or threads) of this service it'll be capable of PIP-ing many thousands of requests per second and can theoretically scale linearly as more servers are added. One interesting thing to note is that the And! (and this is the interesting bit) this is also true of Docker, so you can run multiple containers/pods on the same physical machine using
Okay so #67 should hopefully improve these numbers!
Okay 👍 I wrote down all the info I need in my comment in case I need to redo the same benchmark 😄 As for the results, the PDF will still be available; I should remove the online version when we release spatial or close this issue.
Yes, for me, what we should target is at least 500 req/s with a 95th percentile at 750ms without the Linux cache, and I think it is possible. 🌈
Any reason you are testing with the Linux cache (mmap mode) disabled?
The Linux cache is not disabled; I flush the cache before the stress test to simulate a cold start.
Guess what?
A little suspense... So, a new benchmark with #67, with and without the Linux cache. Same scenario as before.

Results
With a cache flush, the 95th percentile is at 43,889ms without any timeouts, which is better!
BOOM 💥
Dang, that's some great performance. I guess we need to get serious about integrating it into Pelias :)
Yeah, I'm really happy with that because I put a lot of faith in this architecture and it's nice to know it's bearing fruit.
I just ran k6 for a comparison from another load-testing util on my dev server (16 threads @ 3.6GHz) and it flew through it. (This is actually not a great test since it used the same lat/lon for each request.)

$ k6 run --vus 20 --iterations 100000 test.js

iteration_duration.........: avg=5.59ms min=1.41ms med=4.4ms max=42.88ms p(90)=8.61ms p(95)=10.04ms
iterations.................: 100000 3565.959785/s

$ cat test.js
import http from 'k6/http';
const url = 'http://localhost:3000/query/pip/_view/pelias/174.766843/-41.288788'
export default function() {
  http.get(url);
}
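Since the test above hits the same lat/lon every time, one way to vary the coordinate per request is to sample a random point from a bounding box; a plain-JavaScript sketch (the bounding box and helper names are my own illustration, roughly matching the Wellington coordinate used in test.js):

```javascript
// A lon/lat box roughly around the example coordinate in test.js
// (174.766843, -41.288788); substitute your own region extents.
const bbox = { minLon: 174.6, maxLon: 174.9, minLat: -41.4, maxLat: -41.1 };

// Uniformly random point inside the box.
function randomPoint(box) {
  return {
    lon: box.minLon + Math.random() * (box.maxLon - box.minLon),
    lat: box.minLat + Math.random() * (box.maxLat - box.minLat),
  };
}

// Build the PIP endpoint URL for a point.
function pipUrl(base, point) {
  return `${base}/${point.lon.toFixed(6)}/${point.lat.toFixed(6)}`;
}

// In the k6 script this could replace the fixed `url` constant:
//   export default function () {
//     http.get(pipUrl('http://localhost:3000/query/pip/_view/pelias', randomPoint(bbox)));
//   }
console.log(pipUrl('http://localhost:3000/query/pip/_view/pelias', randomPoint(bbox)));
```

Randomising the point also exercises the index and page cache more realistically than a single hot row would.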
I'm using
Basic benchmarks show that the point-in-polygon API takes between 0 and 1 millisecond to execute.
We don't fully understand what the performance is like:
This ticket is to figure out how to generate benchmarks which return more than simply vanity metrics.
It would be ideal if we can automate this process to measure performance over time, as new features are added.
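One way to go beyond vanity metrics and track performance over time would be to emit each run as a timestamped, machine-readable summary that an automated job can append to a log and diff across commits; a minimal sketch (the field names and JSON-lines idea are my own assumptions, not an agreed Pelias convention):

```javascript
// Summarise one benchmark run as a single JSON record (one per line in a
// results log), using nearest-rank percentiles over the raw latencies.
function summariseRun(latenciesMs, meta) {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const pick = (p) => sorted[Math.max(0, Math.ceil(p * sorted.length) - 1)];
  return {
    timestamp: new Date().toISOString(),
    commit: meta.commit, // e.g. the git SHA under test
    requests: latenciesMs.length,
    p50: pick(0.5),
    p95: pick(0.95),
    p99: pick(0.99),
  };
}

const record = summariseRun([4, 6, 5, 9, 3, 7, 12, 5], { commit: 'abc1234' });
console.log(JSON.stringify(record));
```

Comparing the `p95` field of consecutive records would surface regressions as new features land.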