moved comments out of the code

1 parent de75a87 commit 5084f05b14ce366c13dbd1f39008eab9379c7a84 @jabagawee jabagawee committed Oct 2, 2012
Showing with 16 additions and 7 deletions.
  1. +16 −7 algorithms/searching/
23 algorithms/searching/
@@ -8,6 +8,22 @@
Uses a bad-character shift of the rightmost character of the window to
compute shifts.
+ The trick to this algorithm is `bmbc`, a lookup table with a default
value equal to the length of the pattern to be searched, so that
the algorithm can skip `len(pattern)` indices through the string
for efficiency's sake. For example, if we're searching through the
string "adventure time" for the pattern "grumble" and the window ends
on the final "e", we find the "e" to match but mismatch the "m" and
"l" in the string and pattern, respectively. Since "e" occurs nowhere
else in "grumble", no smaller jump could line the pattern up with it,
so we can afford to jump our search index a whole seven characters.
+ However, not all the entries in `bmbc` are equal to `len(pattern)`.
If the window ends on the "r" of "cotton milled paper", the "r" is
not equal to "e", but it does occur in "grumble", five characters
before the end. In this case, we can only jump five characters
safely, which is why `bmbc` contains values that are not
simply `len(pattern)`.
Pre: a string > substring.
Post: returns a list of indices where the substring was found.
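To make the table concrete, here is a minimal sketch that builds it for the pattern "grumble" using the rule from this commit's loop (`pattern_length - index - 1` over `pattern[:-1]`). The helper name `build_bmbc` is hypothetical and not part of the repository, and the sketch assumes every character has a code point below 256, as the 256-entry table implies.

```python
def build_bmbc(pattern):
    """Build a 256-entry bad-character table (hypothetical helper name)."""
    pattern_length = len(pattern)
    # Default: a character absent from the pattern allows a full-length skip.
    bmbc = [pattern_length] * 256
    # Every character except the last records its distance from the end of
    # the pattern; a later occurrence overwrites an earlier one.
    for index, char in enumerate(pattern[:-1]):
        bmbc[ord(char)] = pattern_length - index - 1
    return tuple(bmbc)


table = build_bmbc("grumble")
for char in "grumble z":
    print(repr(char), table[ord(char)])
```

Under these assumptions the entries for "g", "r", "u", "m", "b" and "l" come out as 6, 5, 4, 3, 2 and 1, while "e" (which only appears as the final character), the space and "z" keep the default of 7.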
@@ -30,14 +46,7 @@ def search(text, pattern):
     offsets = []
     if pattern_length > text_length:
         return offsets
-    # bmbc is a lookup-tuple of "skip values"
-    # if we're looking at an index of text, and we
-    # can't find part of pattern there, we can safely
-    # skip back up to pattern_length characters
     bmbc = [pattern_length] * 256
-    # if we do find part of pattern there, but it's a
-    # failed search at that index, we jump back
-    # (pattern_length - index - 1) characters
     for index, char in enumerate(pattern[:-1]):
         bmbc[ord(char)] = pattern_length - index - 1
     bmbc = tuple(bmbc)
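The rest of `search` is not shown in this diff. As an illustration of how a table like `bmbc` is typically consumed, the following is a sketch of a textbook Horspool loop, not necessarily the repository's code. The function name `horspool_search` is hypothetical, and the sketch assumes all characters have code points below 256.

```python
def horspool_search(text, pattern):
    """Return every index at which pattern occurs in text (sketch)."""
    pattern_length = len(pattern)
    text_length = len(text)
    offsets = []
    if pattern_length == 0 or pattern_length > text_length:
        return offsets

    # Same table construction as in the diff; assumes code points below 256.
    bmbc = [pattern_length] * 256
    for index, char in enumerate(pattern[:-1]):
        bmbc[ord(char)] = pattern_length - index - 1

    # search_index marks the rightmost character of the current window.
    search_index = pattern_length - 1
    while search_index < text_length:
        # Compare the window against the pattern from right to left.
        pattern_index = pattern_length - 1
        text_index = search_index
        while pattern_index >= 0 and text[text_index] == pattern[pattern_index]:
            pattern_index -= 1
            text_index -= 1
        if pattern_index == -1:
            offsets.append(text_index + 1)
        # Shift by the table entry for the window's rightmost character.
        search_index += bmbc[ord(text[search_index])]
    return offsets


print(horspool_search("adventure time", "time"))  # [10]
print(horspool_search("abababa", "aba"))          # [0, 2, 4]
```

The shift always uses the window's rightmost character, whether or not the comparison succeeded, which is why overlapping matches such as those in "abababa" are still reported.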

