RobThree/ASCII-Benchmark

ASCII-Benchmark

Inspired by "Avoid character-by-character processing when performance matters": https://lemire.me/blog/2020/07/21/avoid-character-by-character-processing-when-performance-matters/

The test strings in allCountries.txt.gz and cities500.txt.gz are sourced from Geonames.org; I extracted the second column (name) from the corresponding files. The enwik8.gz file is sourced from the Large Text Compression Benchmark. The test files are gzipped to save space. Together they offer a good mix of ASCII and non-ASCII data: depending on the file, roughly 6% to 24% of the lines contain non-ASCII characters (see Results).
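
The preparation described above can be sketched roughly as follows. This is an illustration only, not the script used for this repository: the helper names `extract_names` and `summarize` are hypothetical, the input is assumed to be an already unpacked, tab-separated, UTF-8 Geonames dump, and lengths are counted in characters, which may differ from how the benchmark counts them.

```python
import gzip


def extract_names(src_path, dst_path, column=1):
    """Copy one column of a tab-separated file into a gzipped file, one value per line.

    column=1 is the second column (0-based index), assumed here to be the name field.
    Returns the number of lines written.
    """
    written = 0
    with open(src_path, "rt", encoding="utf-8", newline="") as src, \
            gzip.open(dst_path, "wt", encoding="utf-8", newline="") as dst:
        for row in src:
            fields = row.rstrip("\r\n").split("\t")
            if column < len(fields):  # skip rows that are too short
                dst.write(fields[column] + "\n")
                written += 1
    return written


def summarize(gz_path):
    """Compute line count, average/maximum length and the share of non-ASCII lines."""
    lines = total = longest = non_ascii = 0
    with gzip.open(gz_path, "rt", encoding="utf-8") as fh:
        for line in fh:
            text = line.rstrip("\r\n")
            lines += 1
            total += len(text)
            longest = max(longest, len(text))
            if not text.isascii():
                non_ascii += 1
    return {
        "lines": lines,
        "avg_length": total / lines if lines else 0.0,
        "max_length": longest,
        "non_ascii_pct": 100.0 * non_ascii / lines if lines else 0.0,
    }
```

Calling `summarize` on one of the gzipped test files should yield figures comparable to the header lines printed before each benchmark run, provided the length assumption above holds.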

Results

Intel Core i9-10900X:

Benching allCountries.txt.gz
        Lines           : 7,522,986
        Avg. length     : 14.58
        Max. length     : 151
        Non-Ascii lines : 23.52 %
Measuring methods... please be patient...
Regex           Avg: 1.1677s    Min: 1.1001s    Max: 1.2178s       6,442,421 strings/sec
Branchy1        Avg: 0.0552s    Min: 0.0497s    Max: 0.0581s     136,181,375 strings/sec
Branchy2        Avg: 0.0533s    Min: 0.0481s    Max: 0.0615s     141,014,802 strings/sec
Branchless      Avg: 0.0561s    Min: 0.0519s    Max: 0.0584s     134,095,915 strings/sec
Hybrid          Avg: 0.0518s    Min: 0.0483s    Max: 0.0553s     145,139,683 strings/sec

Benching cities500.txt.gz
        Lines           : 165,957
        Avg. length     : 10.14
        Max. length     : 65
        Non-Ascii lines : 20.12 %
Measuring methods... please be patient...
Regex           Avg: 0.0224s    Min: 0.0218s    Max: 0.0231s       7,404,511 strings/sec
Branchy1        Avg: 0.0011s    Min: 0.0008s    Max: 0.0013s     152,820,546 strings/sec
Branchy2        Avg: 0.0011s    Min: 0.0008s    Max: 0.0013s     152,532,605 strings/sec
Branchless      Avg: 0.0011s    Min: 0.0009s    Max: 0.0013s     148,741,642 strings/sec
Hybrid          Avg: 0.0012s    Min: 0.0008s    Max: 0.0015s     142,158,282 strings/sec

Benching enwik8.gz
        Lines           : 1,128,024
        Avg. length     : 87.32
        Max. length     : 4,173
        Non-Ascii lines : 6.35 %
Measuring methods... please be patient...
Regex           Avg: 0.2885s    Min: 0.2559s    Max: 0.3619s       3,910,404 strings/sec
Branchy1        Avg: 0.0163s    Min: 0.0158s    Max: 0.0173s      69,345,382 strings/sec
Branchy2        Avg: 0.0150s    Min: 0.0141s    Max: 0.0168s      75,191,574 strings/sec
Branchless      Avg: 0.0160s    Min: 0.0156s    Max: 0.0164s      70,479,739 strings/sec
Hybrid          Avg: 0.0141s    Min: 0.0134s    Max: 0.0151s      80,107,376 strings/sec
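
For readers unfamiliar with the method names, here is a rough sketch of what each kind of check does, loosely following the ideas in the blog post linked above. These are not the implementations benchmarked here: the function names are hypothetical, the checks run on UTF-8 bytes, the `chunk` size of 16 is an arbitrary assumption, and the README does not say how Branchy1 and Branchy2 differ, so only one branchy variant is shown. Python is used purely to illustrate the logic; it says nothing about relative speed.

```python
import re

# Any byte with the high bit set (0x80-0xFF) is non-ASCII.
_NON_ASCII = re.compile(rb"[\x80-\xff]")


def is_ascii_regex(data: bytes) -> bool:
    """Regex approach: search for the first non-ASCII byte."""
    return _NON_ASCII.search(data) is None


def is_ascii_branchy(data: bytes) -> bool:
    """Branchy approach: test every byte and exit on the first non-ASCII one."""
    for b in data:
        if b >= 0x80:
            return False
    return True


def is_ascii_branchless(data: bytes) -> bool:
    """Branchless approach: OR all bytes together and test the high bit once."""
    acc = 0
    for b in data:
        acc |= b
    return acc < 0x80


def is_ascii_hybrid(data: bytes, chunk: int = 16) -> bool:
    """Hybrid approach: OR the bytes of each chunk, then test once per chunk.

    This keeps the inner loop free of per-byte tests while still allowing
    an early exit on long inputs.
    """
    for start in range(0, len(data), chunk):
        acc = 0
        for b in data[start:start + chunk]:
            acc |= b
        if acc >= 0x80:
            return False
    return True
```

All four functions return the same answer for any input, for example `True` for `"Paris".encode("utf-8")` and `False` for `"Zürich".encode("utf-8")`; they differ only in how much work they do per byte and when they can stop early.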
