Someone interested in this result? #55

Closed
qinxian opened this issue Jun 12, 2013 · 9 comments

qinxian commented Jun 12, 2013

Here are some results from OnePublisherToOneProcessorRawBatchThroughputTest:
Run 0, Disruptor=806,451,612 ops/sec
Run 1, Disruptor=821,018,062 ops/sec
Run 2, Disruptor=1,122,964,626 ops/sec
Run 3, Disruptor=1,164,144,353 ops/sec
Run 4, Disruptor=1,133,144,475 ops/sec
Run 5, Disruptor=1,186,239,620 ops/sec
Run 6, Disruptor=1,175,088,131 ops/sec
Run 7, Disruptor=1,143,510,577 ops/sec
Run 8, Disruptor=1,174,398,120 ops/sec
Run 9, Disruptor=1,153,402,537 ops/sec
Run 10, Disruptor=1,154,068,090 ops/sec
Run 11, Disruptor=1,175,088,131 ops/sec
Run 12, Disruptor=1,133,144,475 ops/sec
Run 13, Disruptor=1,049,868,766 ops/sec
Run 14, Disruptor=1,094,690,749 ops/sec
Run 15, Disruptor=1,164,144,353 ops/sec
Run 16, Disruptor=1,186,239,620 ops/sec
Run 17, Disruptor=1,219,512,195 ops/sec
Run 18, Disruptor=1,207,729,468 ops/sec
Run 19, Disruptor=1,196,888,090 ops/sec
From memory, the best result I have seen is about 1,500M ops/sec, just from a bit of experimentation.
// my busy-spin variant
Run 0, Disruptor=1,583,531,274 ops/sec
Run 1, Disruptor=1,454,545,454 ops/sec
Run 2, Disruptor=1,803,426,510 ops/sec
Run 3, Disruptor=1,728,608,470 ops/sec
Run 4, Disruptor=1,777,777,777 ops/sec
Run 5, Disruptor=1,728,608,470 ops/sec
Run 6, Disruptor=1,908,396,946 ops/sec
Run 7, Disruptor=1,754,385,964 ops/sec
Run 8, Disruptor=1,937,984,496 ops/sec
Run 9, Disruptor=1,706,484,641 ops/sec
Run 10, Disruptor=1,801,801,801 ops/sec
Run 11, Disruptor=1,776,198,934 ops/sec
Run 12, Disruptor=1,855,287,569 ops/sec
Run 13, Disruptor=1,828,153,564 ops/sec
Run 14, Disruptor=1,471,670,345 ops/sec
Run 15, Disruptor=1,801,801,801 ops/sec
Run 16, Disruptor=1,752,848,378 ops/sec
Run 17, Disruptor=1,910,219,675 ops/sec
Run 18, Disruptor=1,828,153,564 ops/sec
Run 19, Disruptor=1,855,287,569 ops/sec

Haha, and this is with my new modest-lock applied to both ends:
Run 0, Concurrentor=1,644,736,842 ops/sec
Run 1, Concurrentor=1,640,689,089 ops/sec
Run 2, Concurrentor=1,968,503,937 ops/sec
Run 3, Concurrentor=1,968,503,937 ops/sec
Run 4, Concurrentor=1,968,503,937 ops/sec
Run 5, Concurrentor=1,968,503,937 ops/sec
Run 6, Concurrentor=1,998,001,998 ops/sec
Run 7, Concurrentor=1,968,503,937 ops/sec
Run 8, Concurrentor=1,968,503,937 ops/sec
Run 9, Concurrentor=1,968,503,937 ops/sec
Run 10, Concurrentor=2,000,000,000 ops/sec
Run 11, Concurrentor=1,968,503,937 ops/sec
Run 12, Concurrentor=1,968,503,937 ops/sec
Run 13, Concurrentor=1,968,503,937 ops/sec
Run 14, Concurrentor=1,968,503,937 ops/sec
Run 15, Concurrentor=2,000,000,000 ops/sec
Run 16, Concurrentor=2,000,000,000 ops/sec
Run 17, Concurrentor=2,000,000,000 ops/sec
Run 18, Concurrentor=1,968,503,937 ops/sec
Run 19, Concurrentor=1,968,503,937 ops/sec

So, is this result of any interest?

mikeb01 commented Jun 12, 2013

Actually that test is a little bit of an inside joke on my part, demonstrating how you can lie with a benchmark. All it is doing is testing how fast the sequencer can signal the consumer, but only on every 10th update. It doesn't do any actual useful work.

How does the modest-lock fare with the OnePublisherToOneProcessorUniCastThroughputTest? Also, do you have a link to the code?

If all I wanted to do was improve that test I could just have one thread polling a sequence with another thread updating it in batches of 10.
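
A minimal sketch of that polling idea (a reconstruction for illustration, not the exact code behind the numbers below; the class name, iteration count and output format are made up): one thread busy-polls an AtomicLong while another publishes in batches of 10, so only one visible write happens per batch.

import java.util.concurrent.atomic.AtomicLong;

public class SimplePollingSketch
{
    static final long ITERATIONS = 1_000_000_000L;
    static final int BATCH_SIZE = 10;
    static final AtomicLong sequence = new AtomicLong(-1);

    public static void main(String[] args) throws InterruptedException
    {
        Thread consumer = new Thread(new Runnable()
        {
            public void run()
            {
                // Busy-poll until the final batch has been published.
                while (sequence.get() < ITERATIONS - 1)
                {
                    // spin
                }
            }
        });

        long start = System.nanoTime();
        consumer.start();

        // Publish in batches of 10: only one visible write per batch.
        for (long i = BATCH_SIZE - 1; i < ITERATIONS; i += BATCH_SIZE)
        {
            sequence.lazySet(i);
        }

        consumer.join();
        long opsPerSec = ITERATIONS * 1_000_000_000L / (System.nanoTime() - start);
        System.out.printf("Simple polling=%,d ops/sec%n", opsPerSec);
    }
}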

Existing Disruptor:
Starting Disruptor tests
Run 0, Disruptor=1,998,001,998 ops/sec
Run 1, Disruptor=2,079,002,079 ops/sec
Run 2, Disruptor=2,157,497,303 ops/sec
Run 3, Disruptor=2,114,164,904 ops/sec
Run 4, Disruptor=2,152,852,529 ops/sec
Run 5, Disruptor=2,205,071,664 ops/sec
Run 6, Disruptor=3,577,817,531 ops/sec
Run 7, Disruptor=3,546,099,290 ops/sec
Run 8, Disruptor=3,610,108,303 ops/sec

Simple polling code:
Starting Disruptor tests
Run 0, Disruptor=6,191,950,464 ops/sec
Run 1, Disruptor=6,042,296,072 ops/sec
Run 2, Disruptor=6,369,426,751 ops/sec
Run 3, Disruptor=6,289,308,176 ops/sec
Run 4, Disruptor=6,389,776,357 ops/sec

qinxian commented Jun 13, 2013

OnePublisherToOneProcessorRawBatchThroughputTest:
It seems both use a batch of 10, yet there is only a little improvement on my machine. Strange! I expected roughly 10x.
Run 0, Disruptor=871,839,581 ops/sec
Run 1, Disruptor=811,030,008 ops/sec
Run 2, Disruptor=1,231,527,093 ops/sec
Run 3, Disruptor=1,320,132,013 ops/sec
Run 4, Disruptor=1,320,132,013 ops/sec
Run 5, Disruptor=1,185,536,455 ops/sec
Run 6, Disruptor=1,143,510,577 ops/sec
Run 7, Disruptor=1,067,235,859 ops/sec
Run 8, Disruptor=1,243,008,079 ops/sec
Run 9, Disruptor=1,268,230,818 ops/sec
Run 10, Disruptor=1,391,788,448 ops/sec
Run 11, Disruptor=1,334,222,815 ops/sec
Run 12, Disruptor=1,320,132,013 ops/sec
Run 13, Disruptor=1,292,824,822 ops/sec
Run 14, Disruptor=1,255,492,780 ops/sec
Run 15, Disruptor=1,267,427,122 ops/sec
Run 16, Disruptor=1,219,512,195 ops/sec
Run 17, Disruptor=1,243,008,079 ops/sec
Run 18, Disruptor=1,164,144,353 ops/sec
Run 19, Disruptor=1,085,187,194 ops/sec

OK, let's get back to the 10:1 pattern.
Indeed, the modest-lock is very simple, just like this:

if ((counter & 1) == 1) Thread.yield();
return counter - 1;
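
For context, a rough sketch of how those two lines might sit inside a busy-wait loop (this is a guess at the surrounding shape, modelled on the counter-based wait methods in the Disruptor's yielding strategies; the real code is in the gist linked in a later comment, and the names here are illustrative). The effect is simply to call Thread.yield() on every other spin.

import java.util.concurrent.atomic.AtomicLong;

final class ModestSpin
{
    // Yield on every other call; otherwise just burn one spin iteration.
    private static long applyModestWait(final long counter)
    {
        if ((counter & 1) == 1)
        {
            Thread.yield();
        }
        return counter - 1;
    }

    // Spin until the cursor reaches the sequence we are waiting for.
    static long waitFor(final long sequence, final AtomicLong cursor)
    {
        long counter = 0;
        long available;
        while ((available = cursor.get()) < sequence)
        {
            counter = applyModestWait(counter);
        }
        return available;
    }
}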

These are the results after applying the modest-lock to this test, still using the write-10:read-1 pattern:
Run 0, Disruptor=1,257,071,024 ops/sec
Run 1, Disruptor=1,292,824,822 ops/sec
Run 2, Disruptor=1,579,778,830 ops/sec
Run 3, Disruptor=1,542,020,046 ops/sec
Run 4, Disruptor=1,506,024,096 ops/sec
Run 5, Disruptor=1,581,027,667 ops/sec
Run 6, Disruptor=1,523,229,246 ops/sec
Run 7, Disruptor=1,506,024,096 ops/sec
Run 8, Disruptor=1,471,670,345 ops/sec
Run 9, Disruptor=1,471,670,345 ops/sec
Run 10, Disruptor=1,506,024,096 ops/sec
Run 11, Disruptor=1,542,020,046 ops/sec
Run 12, Disruptor=1,542,020,046 ops/sec
Run 13, Disruptor=1,543,209,876 ops/sec
Run 14, Disruptor=1,506,024,096 ops/sec
Run 15, Disruptor=1,542,020,046 ops/sec
Run 16, Disruptor=1,506,024,096 ops/sec
Run 17, Disruptor=1,454,545,454 ops/sec
Run 18, Disruptor=1,543,209,876 ops/sec
Run 19, Disruptor=1,506,024,096 ops/sec

Next, change SingleProducerSequencer so its next() reserve operation spins with Thread.yield() instead of parkNanos (see the sketch after these results):
Run 0, Disruptor=1,257,071,024 ops/sec
Run 1, Disruptor=1,292,824,822 ops/sec
Run 2, Disruptor=1,579,778,830 ops/sec
Run 3, Disruptor=1,542,020,046 ops/sec
Run 4, Disruptor=1,506,024,096 ops/sec
Run 5, Disruptor=1,581,027,667 ops/sec
Run 6, Disruptor=1,523,229,246 ops/sec
Run 7, Disruptor=1,506,024,096 ops/sec
Run 8, Disruptor=1,471,670,345 ops/sec
Run 9, Disruptor=1,471,670,345 ops/sec
Run 10, Disruptor=1,506,024,096 ops/sec
Run 11, Disruptor=1,542,020,046 ops/sec
Run 12, Disruptor=1,542,020,046 ops/sec
Run 13, Disruptor=1,543,209,876 ops/sec
Run 14, Disruptor=1,506,024,096 ops/sec
Run 15, Disruptor=1,542,020,046 ops/sec
Run 16, Disruptor=1,506,024,096 ops/sec
Run 17, Disruptor=1,454,545,454 ops/sec
Run 18, Disruptor=1,543,209,876 ops/sec
Run 19, Disruptor=1,506,024,096 ops/sec
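
A simplified sketch of the spot being changed (my reading of the spin loop in SingleProducerSequencer.next(); field names are simplified and the gating sequences are reduced to a single AtomicLong): the producer spins until the slowest consumer has moved far enough that claiming the next slot cannot wrap the ring buffer, and only the back-off inside that spin differs between the variants discussed here.

import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

final class ProducerSpinSketch
{
    enum Backoff { PARK_NANOS, YIELD, MODEST }

    // Spin until the gating (consumer) sequence has passed wrapPoint, i.e. until
    // claiming the next slot can no longer overwrite an unconsumed entry.
    static long waitForCapacity(final long wrapPoint, final AtomicLong gatingSequence, final Backoff backoff)
    {
        long counter = 0;
        long minSequence;
        while (wrapPoint > (minSequence = gatingSequence.get()))
        {
            switch (backoff)
            {
                case PARK_NANOS:
                    LockSupport.parkNanos(1L);   // original behaviour
                    break;
                case YIELD:
                    Thread.yield();              // first variant above
                    break;
                case MODEST:
                    if ((counter & 1) == 1)      // modest-lock: yield on every other spin
                    {
                        Thread.yield();
                    }
                    counter--;
                    break;
            }
        }
        return minSequence;
    }
}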

Apply the modest-lock to the next() reserve operation:
Run 0, Disruptor=1,490,312,965 ops/sec
Run 1, Disruptor=1,489,203,276 ops/sec
Run 2, Disruptor=1,662,510,390 ops/sec
Run 3, Disruptor=1,683,501,683 ops/sec
Run 4, Disruptor=1,662,510,390 ops/sec
Run 5, Disruptor=1,683,501,683 ops/sec
Run 6, Disruptor=1,683,501,683 ops/sec
Run 7, Disruptor=1,662,510,390 ops/sec
Run 8, Disruptor=1,662,510,390 ops/sec
Run 9, Disruptor=1,662,510,390 ops/sec
Run 10, Disruptor=1,661,129,568 ops/sec
Run 11, Disruptor=1,662,510,390 ops/sec
Run 12, Disruptor=1,662,510,390 ops/sec
Run 13, Disruptor=1,662,510,390 ops/sec
Run 14, Disruptor=1,683,501,683 ops/sec
Run 15, Disruptor=1,683,501,683 ops/sec
Run 16, Disruptor=1,662,510,390 ops/sec
Run 17, Disruptor=1,684,919,966 ops/sec
Run 18, Disruptor=1,683,501,683 ops/sec
Run 19, Disruptor=1,684,919,966 ops/sec

BTW, a strange feeling: in the end we are just guessing at what the OS scheduler will do.
All we really need is the right scheduler, but ...

qinxian commented Jun 13, 2013

I created a gist here: https://gist.github.com/qinxian/5771879

mikeb01 commented Jun 13, 2013

Are you running with HyperThreading enabled?

qinxian commented Jun 13, 2013

No!
It's an AMD X3 :)

qinxian commented Jun 13, 2013

BTW, I tried the JDK 8 @Contended annotation on the field.
The padded volatile long field seems faster than the long[] implementation.
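
To make the comparison concrete, a rough sketch of the two layouts being compared (not the Disruptor's actual Sequence class; the @Contended variant assumes JDK 8 with -XX:-RestrictContended so the annotation is honoured on application classes, and AtomicLongArray stands in here for the Unsafe-based array access the real code uses):

import java.util.concurrent.atomic.AtomicLongArray;
import sun.misc.Contended;

final class PaddedCounters
{
    // JDK 8 approach: let the JVM pad the field so it sits on its own cache line.
    static final class ContendedSequence
    {
        @Contended
        volatile long value = -1L;
    }

    // Array approach: put the value in the middle of an oversized array so that
    // neighbouring hot fields cannot share its cache line.
    static final class ArrayPaddedSequence
    {
        private final AtomicLongArray paddedValue = new AtomicLongArray(15);

        long get()        { return paddedValue.get(7); }
        void set(long v)  { paddedValue.lazySet(7, v); }
    }
}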

mikeb01 commented Jun 14, 2013

The difference I see with ModestLock is not as marked as in your results, and the difference on the OnePublisherToOneProcessorUniCastThroughputTest is lower than the noise. I think these small optimisations will vary between hardware platforms. One of the reasons we made the WaitStrategy pluggable is to allow these types of optimisations. If it speeds up your system end to end, then go for it, but don't base your decision on the OnePublisherToOneProcessorRawBatchThroughputTest, as it doesn't test anything useful; base it on your own macro-benchmarks.
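
To show what "pluggable" means in practice, a minimal wiring sketch (Disruptor 3.x DSL; the event class, buffer size and strategy choice are illustrative, not a recommendation): the wait strategy is just a constructor argument, so an application can drop in its own implementation without touching the library.

import java.util.concurrent.Executors;

import com.lmax.disruptor.EventFactory;
import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.YieldingWaitStrategy;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;

public class WaitStrategyWiring
{
    static final class MyEvent { long value; }

    public static void main(String[] args)
    {
        Disruptor<MyEvent> disruptor = new Disruptor<MyEvent>(
            new EventFactory<MyEvent>()
            {
                public MyEvent newInstance() { return new MyEvent(); }
            },
            1024,                               // ring buffer size (power of two)
            Executors.newCachedThreadPool(),    // runs the event processors
            ProducerType.SINGLE,
            new YieldingWaitStrategy());        // <-- swap in any WaitStrategy here

        disruptor.handleEventsWith(new EventHandler<MyEvent>()
        {
            public void onEvent(MyEvent event, long sequence, boolean endOfBatch) { /* consume */ }
        });

        disruptor.start();
    }
}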

I've also had a go with @Contended. It didn't make a massive difference, but it should be a little bit quicker as it would remove one indirection. Unfortunately it will be a while before Java 8 is the standard. I might do a Java 8 specific version if there is enough interest.

qinxian commented Jun 14, 2013

I expected the P10-C10 pattern to show roughly a 2x improvement over the P10-C1 pattern in your results.

Indeed, I always use JDK 8 with Windows 8 on AMD hardware.
In these cases, if both ends employ the modest-lock, that version shows about a 2x effect,
so one speculative deduction is that multiple publishers might be able to profit in a similar way.
From the messages above it seems you test on Intel with HyperThreading, so perhaps it comes down to Intel HT vs. AMD.
For now, though, I am still interested in the modest-lock results on Intel HT.

Of course, these results come from just one test case; whether that is useful or useless depends.
But it is some kind of reference point, right?

BTW, as with my earlier "guessing" remark, there is some sadness about the kernel. Someone like me only works at the high level, with no real willingness (or perhaps ability) to go lower; maybe I can't, maybe the kernel can't. The real world!

BTW, do you plan to refactor WaitStrategy into something more general? I have done some work on that.

mikeb01 commented Jun 15, 2013

I'm going to close this as it's not really an issue, just a discussion, which can happen on the Google Groups page.

mikeb01 closed this as completed Jun 15, 2013