Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace bit shift with __builtin_ctzll in HyperLogLog #13218

Open
wants to merge 5 commits into
base: unstable
Choose a base branch
from

Conversation

panzhongxian
Copy link

@panzhongxian panzhongxian commented Apr 17, 2024

Replace bit shift with __builtin_ctzll in HyperLogLog

Builtin function __builtin_ctzll is more effective than bit shift even though "in the average case there are high probabilities to find a 1 after a few iterations" mentioned in the source file comment.

I wrote a program to test whether it's more effective. Let me try to explain the test cases and the result.

Here is the source code.

There are 4 define cases in the program:

  • RANDOM: just generate random uint64_t. This is a base time cost when the next two cases is run.
  • BITSHIFT: counting the trailing zeros of the random numbers with bit shift method.
  • BUILTIN: counting the trailing zeros of the random numbers with builtin __builtin_ctzll
  • CHECK: call two functions and compare their results; print out the distribution of tailing zeros length.

More explainations:

  • ret storing the sum of trailing zeros length, is use to void skipping the process when -O2 flag is used.
  • It is less posible to get a longer trailing zeros. To make the CHECK case can cover more long trailing zeros numbers, I left-shift the random number: num = (num << (n % 50)) | ((uint64_t)1 << 51);

Now let me show the result:

1. Run first 3 cases and compare the time

The result is as following:

> gcc -DRANDOM -O2 comparison.c && ./a.out
time consume: 10.820000 seconds
ret: 0xfa55d526137dcde3

> gcc -DBITSHIFT -O2 comparison.c && ./a.out
time consume: 14.440000 seconds
ret: 0x2fb03566

> gcc -DBUILTIN -O2 t.c && ./a.out
time consume: 10.720000 seconds
ret: 0x2fb03566

After removing the random number generating costs, we got this(much faster):

random generation Bit Shift __builtin_ctzll
include 14.44 s 10.72 s
exclude 3.62 s -0.1s (almost same with only random number generation)

Meanwhile the ret of two cases is the same on. This means the correction of the new method.

2. Run check case

As mentioned before, I left shifted the number. The result of two different counting method for each random number is same. And the distribution of trailing zeros length is as following:

> gcc -DCHECK -O2 comparison.c && ./a.out
time consume: 19.620000 seconds
 0: 0
 1: 3999773
 2: 6001116
 3: 6998319
 4: 7499012
 5: 7752334
 6: 7874867
 7: 7940717
 8: 7967613
 9: 7979342
10: 7995361
11: 7994381
12: 8001097
13: 7998487
14: 7998411
15: 8001045
16: 7998618
17: 7996643
18: 8000381
19: 8002905
20: 7994363
21: 8001226
22: 8004708
23: 8000645
24: 7996479
25: 8002017
26: 7996515
27: 8003087
28: 7999442
29: 7995457
30: 8002225
31: 8000386
32: 8000367
33: 7997993
34: 8001343
35: 7998145
36: 8006298
37: 7999528
38: 8000649
39: 7999591
40: 7998841
41: 7998956
42: 7997533
43: 7997693
44: 7998085
45: 8004006
46: 8000330
47: 8001280
48: 7998936
49: 8001060
50: 8000476
51: 8001918

3. Conclusion

The builtin function __builtin_ctzll is correct in our case and much more effective than raw bit shift.

A replacement will bring a significant help.

@panzhongxian panzhongxian changed the title Replace bit-shift hllPatLen with De Bruijn method Replace bit-shift hllPatLen with De Bruijn method in HyperLogLog Apr 17, 2024
@sundb
Copy link
Collaborator

sundb commented Apr 17, 2024

@panzhongxian thanks, what about just using __builtin_ctzll?

@panzhongxian panzhongxian changed the title Replace bit-shift hllPatLen with De Bruijn method in HyperLogLog Replace bit shift with __builtin_ctzll in HyperLogLog Apr 17, 2024
@panzhongxian
Copy link
Author

Hi @sundb . I tried the __builtin_ctzll and updated my test file. It has almost same time cost with De Bruijn method. And they are both faster than the raw bit shift.

Can you merge it? Thanks.

@sundb
Copy link
Collaborator

sundb commented Apr 17, 2024

@panzhongxian thanks, please wait for another eye.

src/hyperloglog.c Outdated Show resolved Hide resolved
@sundb
Copy link
Collaborator

sundb commented Apr 17, 2024

@panzhongxian please also update the top comment which doesn't mention __builtin_ctzll.

src/hyperloglog.c Outdated Show resolved Hide resolved
@panzhongxian
Copy link
Author

panzhongxian commented Apr 17, 2024

@panzhongxian please also update the top comment which doesn't mention __builtin_ctzll.

@sundb Updated.

@sundb
Copy link
Collaborator

sundb commented Apr 17, 2024

outdated?

     * This may sound like inefficient, but actually in the average case
     * there are high probabilities to find a 1 after a few iterations. */

@panzhongxian
Copy link
Author

outdated?

     * This may sound like inefficient, but actually in the average case
     * there are high probabilities to find a 1 after a few iterations. */

@sundb Yes. Removed now.

@sundb
Copy link
Collaborator

sundb commented Apr 17, 2024

@panzhongxian thanks, It's ok now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Review
Development

Successfully merging this pull request may close these issues.

None yet

2 participants