Skip to content

intvnut/bit_count

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 

Repository files navigation

Bit Counting Benchmark

License

Everything in this project is authored by me (Joe Zbiciak, joe.zbiciak@leftturnonly.info), and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license, aka. CC BY-SA 4.0.

Background

This is a really simple benchmark of different methods for counting 1 bits in a 32-bit value.

Description

The benchmark itself uses the IEEE CRC-32 to generate a sequence of random 32-bit values. The IEEE CRC-32 polynomial is an maximal sequence polynomial for an LFSR.

The benchmark first checks that the three computational methods (popcnt32_a, popcnt32_b, and popcnt32_c) all return the same values for the full 32-bit input range. This takes a few seconds on a modern computer.

Next, it initializes a lookup table for a fourth version, popcnt32_z, that merely looks up the value in the lookup table. A fifth version, popcnt32_e, uses the first 256 entries of this same table for byte-at-a-time lookup.

Finally, it measures the amount of time it takes to run through the entire 2^32 - 1 sequence of the IEEE CRC-32, counting the 1 bits in each state. In order to force the compiler to measure this, the code uses a volatile sum variable to sum up the 1 bits counted.

Because computing the CRC sequence also takes time, the code also measures that as the "Null" loop. You can see from the run time that the Null loop takes nearly as long as popcnt32_a, popcnt32_b, and popcnt32_c on many systems. Meanwhile, popcnt32_z takes a lot longer than the others.

Data

The following output comes from my M1 Max based MacBook Pro, compiled with clang -mtune=native -mcpu-native -DHAVE_BUILTIN_POPCOUNT -O3 -o bit_count bit_count.c with Apple Clang version 13.1.6 (clang-1316.0.21.2.3). (See below for details on HAVE_BUILTIN_POPCOUNT.)

$ ./bit_count
Testing implementations...
Errs: 0  OK: 4294967296
Initializing LUT implementation... Done.
  Null:           4225023 clocks  7FFFFFFF80000000
 Ver A:           7921125 clocks  1000000000
 Ver B:           7544140 clocks  1000000000
 Ver C:           7032806 clocks  1000000000
 Ver D:           4237567 clocks  1000000000
 Ver E:           5532954 clocks  1000000000
 Ver Z:          35113958 clocks  1000000000
$

And this data comes from my aging AMD Phenom™ II X4 965 machine that has plenty of DDR3 DRAM. My machine wasn't swapping during this test. It just has that much less memory bandwidth than the Apple M1 Max, apparently?

$ ./bit_count
Testing implementations...
Errs: 0  OK: 4294967296
Initializing LUT implementation... Done.
  Null:           8650000 clocks  7FFFFFFF80000000
 Ver A:          15610000 clocks  1000000000
 Ver B:          16450000 clocks  1000000000
 Ver C:          13080000 clocks  1000000000
 Ver E:          13240000 clocks  1000000000
 Ver Z:         304970000 clocks  1000000000

HAVE_BUILTIN_POPCOUNT

If you want to try the GCC extension __builtin_popcount, compile with -DHAVE_BUILTIN_POPCOUNT. To actually use your CPU's native instruction (if it has one), you will likely need additional flags, such as -march=native -mtune=native (GCC) or -march=native -mcpu=native (Clang).

When built with this macro defined, Ver D will appear in the results.

The following data is from my Phenom™ II X4 965 built this way:

$ ./bit_count 
Testing implementations...
Errs: 0  OK: 4294967296
Initializing LUT implementation... Done.
  Null:           8650000 clocks  7FFFFFFF80000000
 Ver A:          15610000 clocks  1000000000
 Ver B:          16450000 clocks  1000000000
 Ver C:          13080000 clocks  1000000000
 Ver D:          11330000 clocks  1000000000
 Ver E:          13240000 clocks  1000000000
 Ver Z:         304970000 clocks  1000000000
$

Copyright © 2023, Joe Zbiciak joe.zbiciak@leftturnonly.info
SPDX-License-Identifier: CC-BY-SA-4.0

About

Benchmark different approaches to counting bits

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages