-
Basic implementation of different substring search algorithms:
Algorithm Explanation #Cycles per iteration Time taken for 1M iteartions (seconds) STL Find 166K cycles 6.25 s Brute Force For every position in the text consider it as a starting point and check for match (2 loops) 164K cycles 6.17s Brute Force Optimized Use a single loop and backtrack text and pattern when there is a mismatch 151K cycles 5.68s KMP Algo - Uses a Discrete Finite State Machine (DFA) to create a prefix table for searching (longest prefix which is also a suffix).
- Best for long text and pattern with short radix (e.g. DNA sequence)156K cycles 5.86s Booyer Moore - Scan pattern characters from right to left and skip as many as M characters whenever there is a mismatch.
- Best for longer patterns with a bigger radix.212K cycles 7.96s Rabin Karp (Monte-Carlo Approach) - Use a rolling hash to compare the pattern hash for each position in the text.
- Monte-Carlo approach tolerates hash false-positives (do not manually check substring with text)378K cycles 14.19 s -
Note:
- Micro-benchmarks were measured on an Ubuntu 16.04 VM running on Intel(R) Xeon(R) CPU X5650 @ 2.67GHz with GCC 5.4.0 with C++11 and OPT flags(-O3 -Os, -march=native)
- The above data was obtained with 1M randomized test strings with a text size of 100K characters and a pattern size of 100.
string
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
parent directory.. | ||||