Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

62w subscriptions cost about 85min #629

Closed
blankbro opened this issue Sep 14, 2021 · 11 comments
Closed

62w subscriptions cost about 85min #629

blankbro opened this issue Sep 14, 2021 · 11 comments
Assignees

Comments

@blankbro
Copy link

blankbro commented Sep 14, 2021

Expected behavior

We hope load all the data in 2 minutes.

Actual behavior

62w subscriptions cost about 85min.

Steps to reproduce

  1. Start moquetto
  2. Start the client and subscribe to 62w topic
  3. Restart moquetto

Minimal yet complete reproducer code (or URL to code) or complete log file

image

image

image

image

Moquette MQTT version

0.15

JVM version (e.g. java -version)

openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)

OS version (e.g. uname -a)

Linux MyServerName 3.10.0-1062.18.1.el7.x86_64 #1 SMP Tue Mar 17 23:49:17 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

@blankbro blankbro changed the title 60w subscriptions cost about 30min 62w subscriptions cost about 85min Sep 14, 2021
@hylkevds
Copy link
Collaborator

85 minutes for 600k subscriptions seems very slow.

Could you give us some statistics on those subscriptions? How many clients, how many topics, how deep are those topics?

@hylkevds
Copy link
Collaborator

Did a quick profile, and I suspect you have a very flat tree.
Currently CNode is very inefficient when adding items to such a flat tree:

boolean anyChildrenMatch(Token token) {
for (INode iNode : children) {
final CNode child = iNode.mainNode();
if (child.equalsToken(token)) {
return true;
}
}
return false;
}

For every check, it loops over the children. If there are many thousands, that takes time. This can easily be fixed by making children a Map<Token, INode>. However, for each add, the CNode is copied, and copying that map means iterating over all items, which takes just as much time.
We'd need a map implementation that can be quickly cloned...

@blankbro
Copy link
Author

blankbro commented Sep 15, 2021

85 minutes for 600k subscriptions seems very slow.

Could you give us some statistics on those subscriptions? How many clients, how many topics, how deep are those topics?


621206 subscriptions
56475 clients
56448 topics

deep count
1 1
3 59
4 56387
5 1

wecom-temp-85cacb5d1a4dc7c63765eb3714753e09

wecom-temp-e93bb8ffe5181641e0bf8d32d9f76d28

wecom-temp-fe36206f4fca10473297f5f2ca8520cd

wecom-temp-77bde4b9f66a7d71101466f798ae78c6

We notice that CTrie is thread-safe. It took 15 minutes to load 620k subscriptions with 8 threads, but CPU usage is 100%. We look forward to your suggestions.

@blankbro
Copy link
Author

Did a quick profile, and I suspect you have a very flat tree.
Currently CNode is very inefficient when adding items to such a flat tree:

boolean anyChildrenMatch(Token token) {
for (INode iNode : children) {
final CNode child = iNode.mainNode();
if (child.equalsToken(token)) {
return true;
}
}
return false;
}

For every check, it loops over the children. If there are many thousands, that takes time. This can easily be fixed by making children a Map<Token, INode>. However, for each add, the CNode is copied, and copying that map means iterating over all items, which takes just as much time.
We'd need a map implementation that can be quickly cloned...


Let's try. We'll feed back here if we make progress.

@hylkevds
Copy link
Collaborator

Let's try. We'll feed back here if we make progress.

If you are experimenting with it, you could also try making children a sorted list based on mainNode.token, and using Collections.binarySearch() instead of the iteration to find the right INode based on a token.

@hylkevds
Copy link
Collaborator

I had a quick go at it in #630
Massively improves performance, though it will still slow down when there are many children.

@blankbro
Copy link
Author

blankbro commented Sep 15, 2021

I had a quick go at it in #630
Massively improves performance, though it will still slow down when there are many children.

Description

data structure for CNode.children
algorithm for CNode.anyChildrenMatch

List<INode> + for each

subscription count thread count handle time
52k 1 1m
221k 1 6m42s
180k 8 1m,it's not thread-safe
376k 8 4m58s,it's not thread-safe

ConcurrentHashMap<Token, INode> + containsKey

subscription count thread count handle time
87k 1 1m
190k 1 10m
208k 8 1m
523k 8 10m
621k 8 15m44s

List<INode> + Collections.binarySearch

subscription count thread count handle time
87k 1 1m
221k 1 5m28s
503k 1 37m46s
540k 1 44m10s
621k 8 44s,it's not thread-safe

List<INode> and Collections.binarySearch with 8 threads cost only 44s, but it's not thread-safe.


We have a suggestion for #630.

image

image

@hylkevds
Copy link
Collaborator

Yeah, good catch.
I'm surprised the Collections.binarySearch isn't faster in your single-Threaded tests... that looks odd.

@hylkevds
Copy link
Collaborator

I've also just noticed there are massive speed differences based on the order in which I offer the subscriptions. The slowest and fastest I found are:

        List<Subscription> results = new ArrayList<>();
        for (int a = 0; a < 10; a++) {
            for (int d = 0; d < 10; d++) {
                for (int c = 0; c < 100; c++) {
                    for (int b = 0; b < 100; b++) {
                        count++;
                        results.add(clientSubOnTopic("Client-" + a, "mainTopic-" + b + "/subTopic-" + c + "/subSubTopic" + d));
                    }
                }
            }
        }

Added 100000 subscriptions in 1579 ms (63/ms)
Added 200000 subscriptions in 4840 ms (41/ms)
Added 300000 subscriptions in 9778 ms (31/ms)
Added 400000 subscriptions in 17231 ms (23/ms)
Added 500000 subscriptions in 45564 ms (11/ms)
Added 600000 subscriptions in 79372 ms (8/ms)
Added 700000 subscriptions in 130032 ms (5/ms)
Added 800000 subscriptions in 189498 ms (4/ms)
Added 900000 subscriptions in 245719 ms (4/ms)
Added 1000000 subscriptions in 300763 ms (3/ms)
Added 1000000 subscriptions in 300763 ms (3325/s)

Versus:

        List<Subscription> results = new ArrayList<>();
        for (int a = 0; a < 10; a++) {
            for (int b = 0; b < 100; b++) {
                for (int c = 0; c < 100; c++) {
                    for (int d = 0; d < 10; d++) {
                        count++;
                        results.add(clientSubOnTopic("Client-" + a, "mainTopic-" + b + "/subTopic-" + c + "/subSubTopic" + d));
                    }
                }
            }
        }

Added 100000 subscriptions in 394 ms (254/ms)
Added 200000 subscriptions in 675 ms (296/ms)
Added 300000 subscriptions in 1015 ms (296/ms)
Added 400000 subscriptions in 1420 ms (282/ms)
Added 500000 subscriptions in 1820 ms (275/ms)
Added 600000 subscriptions in 2412 ms (249/ms)
Added 700000 subscriptions in 2899 ms (241/ms)
Added 800000 subscriptions in 3386 ms (236/ms)
Added 900000 subscriptions in 3906 ms (230/ms)
Added 1000000 subscriptions in 4538 ms (220/ms)
Added 1000000 subscriptions in 4538 ms (220361/s)

Only the order of the subscriptions is different, not the actual subscriptions. And the order is not completely inverted either. In both cases the client-id loop (a) is the outer loop.

In moquette the subscriptions are coming from a Set, so the order is undefined.

blankbro pushed a commit to blankbro/moquette that referenced this issue Sep 24, 2021
- Changed CNode.children List<INode> to Map<Token, INode>
- Changed CNode.add to return boolean
- Optimize execution logic for CNode.anyChildrenMatch, CNode.childOf
- Optimize execution logic for CTire.recursiveMatch
@andsel andsel self-assigned this Oct 15, 2021
@andsel
Copy link
Collaborator

andsel commented Oct 15, 2021

Hi @zexin321 I saw your PR #633 I would like to have some profiling tests that we can share to have a common base to discuss on.
Please could add an integration test suite that contains the test you have done to proof the efficiency of the fix?

@blankbro
Copy link
Author

blankbro commented Aug 15, 2022

Hi @andsel I found the root cause of this problem( When there are a lot of children and subscriptions, CNode.copy will be slow) , you can to run amzexin/moquette fix629_bug-reproduce CTrieTest.testAdd620kSubscribe
See the effect after optimization(620k subscribes cost about 2s), you can to run amzexin/moquette fix629_CNode-Performance CTrieTest.testAdd620kSubscribe

blankbro pushed a commit to blankbro/moquette that referenced this issue Aug 18, 2022
- Changed CNode.children List<INode> to Map<Token, INode>
- Changed CNode.add to return boolean
- Optimize execution logic for CNode.anyChildrenMatch, CNode.childOf
- Optimize execution logic for CTire.recursiveMatch
blankbro pushed a commit to blankbro/moquette that referenced this issue Aug 18, 2022
blankbro pushed a commit to blankbro/moquette that referenced this issue Aug 18, 2022
blankbro pushed a commit to blankbro/moquette that referenced this issue Aug 18, 2022
blankbro pushed a commit to blankbro/moquette that referenced this issue Aug 18, 2022
blankbro pushed a commit to blankbro/moquette that referenced this issue Aug 18, 2022
blankbro pushed a commit to blankbro/moquette that referenced this issue Aug 18, 2022
blankbro pushed a commit to blankbro/moquette that referenced this issue Aug 18, 2022
@andsel andsel closed this as completed in 62cb2b3 Jun 20, 2023
blankbro pushed a commit to blankbro/moquette that referenced this issue Jun 21, 2023
blankbro added a commit to blankbro/moquette that referenced this issue Jun 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants