Optimize State merge #126
Closed
Conversation
State.new/2 expects a second argument, which is the shard name.

* Extract was used wrongly: the same state object was passed for all three arguments. The first argument should be the local state and the other two should be the remote ones.
* State.replica_up/2 needs to be called, otherwise extract does not return a proper remote map.
* Added a couple of assert statements to ensure merging actually works and everything is set up correctly.
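A hedged sketch of the corrected setup described above — the variable names, struct fields, and the exact extract/3 call shape are assumptions rather than the actual benchmark code:

```elixir
# Sketch only: names and shapes are assumptions.
alias Phoenix.Tracker.State

local  = State.new(:replica1, :shard1)   # State.new/2 — shard name is the 2nd argument
remote = State.new(:replica2, :shard1)

# replica_up/2 must be called first, otherwise extract does not
# return a proper remote map.
local = State.replica_up(local, remote.replica)

# First argument is the local state; the other two describe the remote
# (previously the same state was passed in all three positions).
extracted = State.extract(local, remote.replica, remote.context)
# The extracted payload is what gets merged into the other replica's state.
```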
indrekj force-pushed the merge-perf-for-pr branch 3 times, most recently from 1b2a616 to 3fc1468 on July 6, 2019 20:41
* observe_removes

observe_removes was one of the slowest parts of the State merge function. Each time observe_removes was called, we did a full ets table scan on the values table, copied all elements except those from the local replica, and then filtered them one by one. This was especially slow when there were multiple nodes, each with many tracked processes. E.g. given 4 replicas, each with 25K records, we had to scan all 100K records and then copy 75K from ets for additional processing.

We had to go through that every time there was a non-empty delta, even if we didn't need to remove any records.

I created a new ets table that maps tags to value keys. This allows us to find out which records we want to query before even touching the values table. In the best case, when the delta only includes adds, we don't have to do any ets queries and the whole observe_removes is basically skipped. With removes we now only fetch what's needed. Also, because we have the tags table, we no longer have to do full table scans.

* compact

compact was another slow part. Most of the slowness came from the `Enum.sort(MapSet.to_list(cloud))` part. I don't really see why we have to sort this, so I'm now passing the cloud without modifications and changed the do_compact function to expect the cloud to be unordered.

= Benchmarks

== phoenix.pub_sub.bench

`mix phoenix.pub_sub.bench --size 100000 --delta-size 10000`

```
>> Merging delta with 10000 joins and leaves into 200000 element set...
= 132.18ms 0.13s (OLD)
= 49.9ms 0.05s (NEW)
```

Everything else stayed more or less the same.

== my custom benchmark

I wrote a benchmark that spawned 4 VMs and set up a tracker on each of them with a pool size of 1. I then added 10K records to each tracker, divided between 10 topics, and after 5 seconds I updated 2.5K records on each tracker. These are the merge calls:

Before:

```
[node1] >> Connected nodes: [:"node2@10.200.0.207", :"node3@10.200.0.207", :"node4@10.200.0.207"]
Starting pubsub with pool size of 1
[node1] >> Starting tracker with pool size of 1
[node1] >> MERGE 16.44ms
[node1] >> Tracked 10000 clients
[node1] >> MERGE 48.97ms
[node1] >> MERGE 38.71ms
[node1] >> MERGE 54.84ms
[node1] >> MERGE 41.26ms
[node1] >> MERGE 41.48ms
[node1] >> Updated 2.5e3 clients
[node1] >> MERGE 62.81ms
[node1] >> MERGE 68.38ms
[node1] >> MERGE 77.4ms
[node1] >> MERGE 37.67ms
[node1] >> MERGE 21.12ms
[node1] >> EXIT
```

After:

```
[node1] >> Connected nodes: [:"node4@10.200.0.207", :"node2@10.200.0.207", :"node3@10.200.0.207"]
Starting pubsub with pool size of 1
[node1] >> Starting tracker with pool size of 1
[node1] >> MERGE 18.82ms
[node1] >> Tracked 10000 clients 304.18ms
[node1] >> MERGE 19.07ms
[node1] >> MERGE 28.12ms
[node1] >> MERGE 21.37ms
[node1] >> MERGE 31.98ms
[node1] >> MERGE 22.54ms
[node1] >> Updated 2.5e3 clients 98.42ms
[node1] >> MERGE 36.29ms
[node1] >> MERGE 46.59ms
[node1] >> MERGE 43.54ms
[node1] >> MERGE 3.81ms
[node1] >> MERGE 2.02ms
[node1] >> EXIT
```
Using Map.update/4 has a few inefficiencies: both the initial-value argument and the update function are constructed on every call, which is unnecessary, and it also involves an anonymous function call for the update. Related: https://medium.com/learn-elixir/optimizing-work-with-nested-maps-in-elixir-3131da27a13e

I got about a 25% performance increase locally for the given reduce. In the big picture this is a very minor improvement, though.
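For illustration, here is the kind of rewrite this refers to — the reduce below is a made-up example, not the code from this PR:

```elixir
values = [a: 1, b: 2, a: 3]

# Before: Map.update/4 builds the initial value [v] and the update
# closure on every iteration, even when they end up unused.
grouped =
  Enum.reduce(values, %{}, fn {k, v}, acc ->
    Map.update(acc, k, [v], fn list -> [v | list] end)
  end)

# After: pattern match on the map directly; no throwaway arguments
# and no anonymous function call in the hot path.
grouped =
  Enum.reduce(values, %{}, fn {k, v}, acc ->
    case acc do
      %{^k => list} -> %{acc | k => [v | list]}
      _ -> Map.put(acc, k, [v])
    end
  end)
```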
indrekj force-pushed the merge-perf-for-pr branch from 3fc1468 to f11c404 on September 8, 2019 15:37
Closing this atm. I'll remove the compaction part, as I'm still not sure whether it is needed. I'll add the observe_removes part to our own managed branch and do some testing in beta and production.
NOTE: Merge PRs 136 and 127 before this one. This PR has more complex changes and needs a more in-depth review.
I was inspecting why Presence.list() was quite slow on our production servers. It took a couple of milliseconds when traffic was low, but over a few hundred milliseconds at peak. We have about 40K presence connections spread across 5 instances. So far it seems that processing heartbeats is the most expensive part, and it blocks those list calls.

During my investigation I discovered a few places that can be optimized. I've tried to ensure I didn't break anything as well as I could, but the codebase is complex and I cannot be sure that I didn't miss some edge case. All the tests pass, however.

All feedback is welcome, and if you have recommendations on what else to check or optimize, please let me know.
Optimizations
observe_removes
observe_removes was one of the slowest parts of the State merge function. Each time observe_removes was called, we did a full ets table scan on the values table, copied all elements except those from the local replica, and then filtered them one by one. This was especially slow when there were multiple nodes, each with many tracked processes. E.g. given 4 replicas, each with 25K records, we had to scan all 100K records and then copy 75K from ets for additional processing.

We had to go through that every time there was a non-empty delta, even if we didn't need to remove any records.

I created a new ets table that maps tags to value keys. This allows us to find out which records we want to query before even touching the values table. In the best case, when the delta only includes adds, we don't have to do any ets queries and the whole observe_removes is basically skipped. With removes we now only fetch what's needed. Also, because we have the tags table, we no longer have to do full table scans.
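A minimal sketch of the tag-index idea — table names, record shapes, and the delta handling here are illustrative assumptions, not the actual PR code:

```elixir
# Sketch only: table names and record shapes are assumed.
values = :ets.new(:values_table, [:set])  # {key, value, tag}
tags   = :ets.new(:tags_table, [:set])    # {tag, key} side index

# On insert, index the record's tag alongside the value.
tag = {:replica2, 7}
:ets.insert(values, {:some_key, :some_value, tag})
:ets.insert(tags, {tag, :some_key})

# On merge, only the tags the delta actually removes are looked up —
# no full scan and no bulk copy of the values table.
removed_tags = [{:replica2, 7}]

Enum.each(removed_tags, fn tag ->
  case :ets.lookup(tags, tag) do
    [{^tag, key}] ->
      :ets.delete(values, key)
      :ets.delete(tags, tag)

    [] ->
      :ok
  end
end)

# If the delta contains only adds, removed_tags is empty and this whole
# step is skipped without touching ets at all.
```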
compact
compact was another slow part. Most of the slowness came from the Enum.sort(MapSet.to_list(cloud)) part. I don't really see why we have to sort this, so I'm now passing the cloud without modifications and changed the do_compact function to expect the cloud to be unordered.
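To illustrate why the sort is unnecessary, here is a self-contained sketch of causal-context compaction over an unordered cloud — a generic reconstruction, not the actual do_compact from this PR:

```elixir
# Sketch: context is %{replica => max contiguous clock}; cloud is a
# MapSet of out-of-order {replica, clock} dots. Names are assumed.
defmodule CompactSketch do
  def compact(context, cloud) do
    # Repeatedly absorb any dot that directly follows a replica's clock
    # until a fixpoint is reached — no sorting required, since membership
    # and adjacency checks don't depend on iteration order.
    case Enum.find(cloud, fn {replica, clock} ->
           Map.get(context, replica, 0) + 1 == clock
         end) do
      nil ->
        {context, cloud}

      {replica, clock} = dot ->
        compact(Map.put(context, replica, clock), MapSet.delete(cloud, dot))
    end
  end
end

# The order of the cloud doesn't matter:
cloud = MapSet.new([{:r1, 3}, {:r1, 2}, {:r1, 1}])
CompactSketch.compact(%{}, cloud)
# => {%{r1: 3}, MapSet.new([])}
```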
Benchmarks
phoenix.pub_sub.bench
`mix phoenix.pub_sub.bench --size 100000 --delta-size 10000`
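```
>> Merging delta with 10000 joins and leaves into 200000 element set...
= 132.18ms 0.13s (OLD)
= 49.9ms 0.05s (NEW)
```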
Everything else stayed more or less the same.
my custom benchmark
I wrote a benchmark that spawned 4 VMs and set up a tracker on each of them with a pool size of 1. I then added 10K records to each tracker, divided between 10 topics, and after 5 seconds I updated 2.5K records on each tracker. These are the merge calls:
Before:
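```
[node1] >> Connected nodes: [:"node2@10.200.0.207", :"node3@10.200.0.207", :"node4@10.200.0.207"]
Starting pubsub with pool size of 1
[node1] >> Starting tracker with pool size of 1
[node1] >> MERGE 16.44ms
[node1] >> Tracked 10000 clients
[node1] >> MERGE 48.97ms
[node1] >> MERGE 38.71ms
[node1] >> MERGE 54.84ms
[node1] >> MERGE 41.26ms
[node1] >> MERGE 41.48ms
[node1] >> Updated 2.5e3 clients
[node1] >> MERGE 62.81ms
[node1] >> MERGE 68.38ms
[node1] >> MERGE 77.4ms
[node1] >> MERGE 37.67ms
[node1] >> MERGE 21.12ms
[node1] >> EXIT
```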
After:
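```
[node1] >> Connected nodes: [:"node4@10.200.0.207", :"node2@10.200.0.207", :"node3@10.200.0.207"]
Starting pubsub with pool size of 1
[node1] >> Starting tracker with pool size of 1
[node1] >> MERGE 18.82ms
[node1] >> Tracked 10000 clients 304.18ms
[node1] >> MERGE 19.07ms
[node1] >> MERGE 28.12ms
[node1] >> MERGE 21.37ms
[node1] >> MERGE 31.98ms
[node1] >> MERGE 22.54ms
[node1] >> Updated 2.5e3 clients 98.42ms
[node1] >> MERGE 36.29ms
[node1] >> MERGE 46.59ms
[node1] >> MERGE 43.54ms
[node1] >> MERGE 3.81ms
[node1] >> MERGE 2.02ms
[node1] >> EXIT
```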