Boyer-Moore algorithm updates #330

Merged
merged 9 commits into from
Dec 26, 2016
1 change: 1 addition & 0 deletions .travis.yml
@@ -12,6 +12,7 @@ script:
- xcodebuild test -project ./Array2D/Tests/Tests.xcodeproj -scheme Tests
- xcodebuild test -project ./AVL\ Tree/Tests/Tests.xcodeproj -scheme Tests
- xcodebuild test -project ./Binary\ Search/Tests/Tests.xcodeproj -scheme Tests
- xcodebuild test -project ./Boyer-Moore/Tests/Tests.xcodeproj -scheme Tests
# - xcodebuild test -project ./Binary\ Search\ Tree/Solution\ 1/Tests/Tests.xcodeproj -scheme Tests
- xcodebuild test -project ./Bloom\ Filter/Tests/Tests.xcodeproj -scheme Tests
# - xcodebuild test -project ./Bounded\ Priority\ Queue/Tests/Tests.xcodeproj -scheme Tests
140 changes: 78 additions & 62 deletions Boyer-Moore/BoyerMoore.playground/Contents.swift
@@ -1,72 +1,88 @@
//: Playground - noun: a place where people can play

/*
Boyer-Moore string search

This code is based on the article "Faster String Searches" by Costas Menico
from Dr Dobb's magazine, July 1989.
http://www.drdobbs.com/database/faster-string-searches/184408171
*/
extension String {
func indexOf(pattern: String) -> String.Index? {
// Cache the length of the search pattern because we're going to
// use it a few times and it's expensive to calculate.
let patternLength = pattern.characters.count
assert(patternLength > 0)
assert(patternLength <= characters.count)

// Make the skip table. This table determines how far we skip ahead
// when a character from the pattern is found.
var skipTable = [Character: Int]()
for (i, c) in pattern.characters.enumerated() {
skipTable[c] = patternLength - i - 1
}

// This points at the last character in the pattern.
let p = pattern.index(before: pattern.endIndex)
let lastChar = pattern[p]

// The pattern is scanned right-to-left, so skip ahead in the string by
// the length of the pattern. (Minus 1 because startIndex already points
// at the first character in the source string.)
var i = index(startIndex, offsetBy: patternLength - 1)

// This is a helper function that steps backwards through both strings
// until we find a character that doesn’t match, or until we’ve reached
// the beginning of the pattern.
func backwards() -> String.Index? {
var q = p
var j = i
while q > pattern.startIndex {
j = index(before: j)
q = index(before: q)
if self[j] != pattern[q] { return nil }
}
return j
}

// The main loop. Keep going until the end of the string is reached.
while i < endIndex {
let c = self[i]

// Does the current character match the last character from the pattern?
if c == lastChar {

// There is a possible match. Do a brute-force search backwards.
if let k = backwards() { return k }

// If no match, we can only safely skip one character ahead.
i = index(after: i)
} else {
// The characters are not equal, so skip ahead. The amount to skip is
// determined by the skip table. If the character is not present in the
// pattern, we can skip ahead by the full pattern length. However, if
// the character *is* present in the pattern, there may be a match up
// ahead and we can't skip as far.
i = index(i, offsetBy: skipTable[c] ?? patternLength)
}
func index(of pattern: String, usingHorspoolImprovement: Bool = false) -> Index? {
// Cache the length of the search pattern because we're going to
// use it a few times and it's expensive to calculate.
let patternLength = pattern.characters.count
guard patternLength > 0, patternLength <= characters.count else { return nil }

// Make the skip table. This table determines how far we skip ahead
// when a character from the pattern is found.
var skipTable = [Character: Int]()
for (i, c) in pattern.characters.enumerated() {
skipTable[c] = patternLength - i - 1
}

// This points at the last character in the pattern.
let p = pattern.index(before: pattern.endIndex)
let lastChar = pattern[p]

// The pattern is scanned right-to-left, so skip ahead in the string by
// the length of the pattern. (Minus 1 because startIndex already points
// at the first character in the source string.)
var i = index(startIndex, offsetBy: patternLength - 1)

// This is a helper function that steps backwards through both strings
// until we find a character that doesn’t match, or until we’ve reached
// the beginning of the pattern.
func backwards() -> Index? {
var q = p
var j = i
while q > pattern.startIndex {
j = index(before: j)
q = index(before: q)
if self[j] != pattern[q] { return nil }
}
return j
}

// The main loop. Keep going until the end of the string is reached.
while i < endIndex {
let c = self[i]

// Does the current character match the last character from the pattern?
if c == lastChar {

// There is a possible match. Do a brute-force search backwards.
if let k = backwards() { return k }

if !usingHorspoolImprovement {
// If no match, we can only safely skip one character ahead.
i = index(after: i)
} else {
// Jump at least one character (this is needed because the last
// character is in the skipTable, and `skipTable[lastChar] = 0`)
let jumpOffset = max(skipTable[c] ?? patternLength, 1)
i = index(i, offsetBy: jumpOffset, limitedBy: endIndex) ?? endIndex
}
} else {
// The characters are not equal, so skip ahead. The amount to skip is
// determined by the skip table. If the character is not present in the
// pattern, we can skip ahead by the full pattern length. However, if
// the character *is* present in the pattern, there may be a match up
// ahead and we can't skip as far.
i = index(i, offsetBy: skipTable[c] ?? patternLength, limitedBy: endIndex) ?? endIndex
}
}
return nil
}
return nil
}
}

// A few simple tests

let s = "Hello, World"
s.indexOf(pattern: "World") // 7
let str = "Hello, World"
str.index(of: "World") // 7

let animals = "🐶🐔🐷🐮🐱"
animals.indexOf(pattern: "🐮") // 6
animals.index(of: "🐮") // 6

let lorem = "Lorem ipsum dolor sit amet"
lorem.index(of: "sit", usingHorspoolImprovement: true) // 18
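The skip-table loop in `index(of:)` maps each pattern character to its distance from the end of the pattern. Below is a minimal sketch of that step on its own. It assumes current Swift, where `String` is itself a collection of `Character`, so `pattern.characters` becomes `pattern`. The helper name `makeSkipTable(for:)` is hypothetical and not part of the PR:

```swift
// Hypothetical helper: builds the same table as the loop in index(of:),
// mapping each pattern character to its distance from the pattern's end.
// When a character repeats, the later (rightmost) occurrence overwrites the
// earlier one, so the smaller, safer shift is the one that is kept.
func makeSkipTable(for pattern: String) -> [Character: Int] {
    let patternLength = pattern.count
    var skipTable = [Character: Int]()
    for (i, c) in pattern.enumerated() {
        skipTable[c] = patternLength - i - 1
    }
    return skipTable
}

let table = makeSkipTable(for: "abcab")
// "a" last occurs at offset 3 -> 5 - 3 - 1 = 1
// "b" last occurs at offset 4 -> 5 - 4 - 1 = 0
// "c" occurs at offset 2      -> 5 - 2 - 1 = 2
print(table["a"]!, table["b"]!, table["c"]!)   // 1 0 2

// A character that is not in the pattern falls back to the full length.
print(table["z"] ?? "abcab".count)             // 5
```

The `?? patternLength` fallback in the main loop plays the same role as the last line here. The entry for the pattern's last character is always 0, which is why the Horspool branch clamps its jump with `max(..., 1)`.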
40 changes: 23 additions & 17 deletions Boyer-Moore/BoyerMoore.swift
@@ -6,33 +6,32 @@
http://www.drdobbs.com/database/faster-string-searches/184408171
*/
extension String {
func indexOf(pattern: String) -> String.Index? {
func index(of pattern: String, usingHorspoolImprovement: Bool = false) -> Index? {
// Cache the length of the search pattern because we're going to
// use it a few times and it's expensive to calculate.
let patternLength = pattern.characters.count
assert(patternLength > 0)
assert(patternLength <= self.characters.count)

guard patternLength > 0, patternLength <= characters.count else { return nil }

// Make the skip table. This table determines how far we skip ahead
// when a character from the pattern is found.
var skipTable = [Character: Int]()
for (i, c) in pattern.characters.enumerated() {
skipTable[c] = patternLength - i - 1
}

// This points at the last character in the pattern.
let p = pattern.index(before: pattern.endIndex)
let lastChar = pattern[p]

// The pattern is scanned right-to-left, so skip ahead in the string by
// the length of the pattern. (Minus 1 because startIndex already points
// at the first character in the source string.)
var i = self.index(startIndex, offsetBy: patternLength - 1)
var i = index(startIndex, offsetBy: patternLength - 1)

// This is a helper function that steps backwards through both strings
// until we find a character that doesn’t match, or until we’ve reached
// the beginning of the pattern.
func backwards() -> String.Index? {
func backwards() -> Index? {
var q = p
var j = i
while q > pattern.startIndex {
@@ -42,26 +41,33 @@ extension String {
}
return j
}

// The main loop. Keep going until the end of the string is reached.
while i < self.endIndex {
while i < endIndex {
let c = self[i]

// Does the current character match the last character from the pattern?
if c == lastChar {

// There is a possible match. Do a brute-force search backwards.
if let k = backwards() { return k }

// If no match, we can only safely skip one character ahead.
i = index(after: i)

if !usingHorspoolImprovement {
// If no match, we can only safely skip one character ahead.
i = index(after: i)
} else {
// Jump at least one character (this is needed because the last
// character is in the skipTable, and `skipTable[lastChar] = 0`)
let jumpOffset = max(skipTable[c] ?? patternLength, 1)
i = index(i, offsetBy: jumpOffset, limitedBy: endIndex) ?? endIndex
}
} else {
// The characters are not equal, so skip ahead. The amount to skip is
// determined by the skip table. If the character is not present in the
// pattern, we can skip ahead by the full pattern length. However, if
// the character *is* present in the pattern, there may be a match up
// ahead and we can't skip as far.
i = self.index(i, offsetBy: skipTable[c] ?? patternLength)
i = index(i, offsetBy: skipTable[c] ?? patternLength, limitedBy: endIndex) ?? endIndex
}
}
return nil
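Two behavioral changes in `BoyerMoore.swift` stand out. First, the `assert`s became a `guard` that returns `nil` for an empty or over-long pattern. Second, every skip now uses `index(_:offsetBy:limitedBy:)`. The sketch below illustrates the second change and assumes current Swift.

Advancing a `String.Index` past `endIndex` with the unbounded `index(_:offsetBy:)` is a precondition violation, which in practice is a runtime trap. The bounded form returns `nil` instead, and the PR maps that `nil` to `endIndex` so that `while i < endIndex` exits cleanly. The text and skip distance are illustrative values only:

```swift
let text = "Hello, World"            // 12 characters

// Index of the "l" at character offset 10; only "l" and "d" remain after it.
let i = text.index(text.startIndex, offsetBy: 10)

// A full-pattern-length skip, as if the pattern were 5 characters long.
let skip = 5

// text.index(i, offsetBy: skip) would step past endIndex here.
// The bounded form returns nil instead, and ?? clamps the result to endIndex.
let next = text.index(i, offsetBy: skip, limitedBy: text.endIndex) ?? text.endIndex
print(next == text.endIndex)         // true

// A skip that stays in range behaves exactly like the unbounded form.
let near = text.index(text.startIndex, offsetBy: 2, limitedBy: text.endIndex)
print(near.map { text[$0] } ?? "-")  // l
```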
106 changes: 65 additions & 41 deletions Boyer-Moore/README.markdown
@@ -32,12 +32,11 @@ Here's how you could write it in Swift:

```swift
extension String {
func indexOf(pattern: String) -> String.Index? {
func index(of pattern: String) -> Index? {
// Cache the length of the search pattern because we're going to
// use it a few times and it's expensive to calculate.
let patternLength = pattern.characters.count
assert(patternLength > 0)
assert(patternLength <= self.characters.count)
guard patternLength > 0, patternLength <= characters.count else { return nil }

// Make the skip table. This table determines how far we skip ahead
// when a character from the pattern is found.
@@ -53,12 +52,12 @@ extension String {
// The pattern is scanned right-to-left, so skip ahead in the string by
// the length of the pattern. (Minus 1 because startIndex already points
// at the first character in the source string.)
var i = self.index(startIndex, offsetBy: patternLength - 1)
var i = index(startIndex, offsetBy: patternLength - 1)

// This is a helper function that steps backwards through both strings
// until we find a character that doesn’t match, or until we’ve reached
// the beginning of the pattern.
func backwards() -> String.Index? {
func backwards() -> Index? {
var q = p
var j = i
while q > pattern.startIndex {
@@ -70,7 +69,7 @@ extension String {
}

// The main loop. Keep going until the end of the string is reached.
while i < self.endIndex {
while i < endIndex {
let c = self[i]

// Does the current character match the last character from the pattern?
@@ -87,7 +86,7 @@ extension String {
// pattern, we can skip ahead by the full pattern length. However, if
// the character *is* present in the pattern, there may be a match up
// ahead and we can't skip as far.
i = self.index(i, offsetBy: skipTable[c] ?? patternLength)
i = index(i, offsetBy: skipTable[c] ?? patternLength, limitedBy: endIndex) ?? endIndex
}
}
return nil
@@ -157,41 +156,66 @@ Here's an implementation of the Boyer-Moore-Horspool algorithm:

```swift
extension String {
func indexOf(pattern: String) -> String.Index? {
let patternLength = pattern.characters.count
assert(patternLength > 0)
assert(patternLength <= self.characters.count)

var skipTable = [Character: Int]()
for (i, c) in pattern.characters.enumerated() {
skipTable[c] = patternLength - i - 1
}
func index(of pattern: String) -> Index? {
// Cache the length of the search pattern because we're going to
// use it a few times and it's expensive to calculate.
let patternLength = pattern.characters.count
guard patternLength > 0, patternLength <= characters.count else { return nil }

let p = pattern.index(before: pattern.endIndex)
let lastChar = pattern[p]
var i = self.index(startIndex, offsetBy: patternLength - 1)

func backwards() -> String.Index? {
var q = p
var j = i
while q > pattern.startIndex {
j = index(before: j)
q = index(before: q)
if self[j] != pattern[q] { return nil }
}
return j
}
// Make the skip table. This table determines how far we skip ahead
// when a character from the pattern is found.
var skipTable = [Character: Int]()
for (i, c) in pattern.characters.enumerated() {
skipTable[c] = patternLength - i - 1
}

while i < self.endIndex {
let c = self[i]
if c == lastChar {
if let k = backwards() { return k }
i = index(after: i)
} else {
i = index(i, offsetBy: skipTable[c] ?? patternLength)
}
}
return nil
// This points at the last character in the pattern.
let p = pattern.index(before: pattern.endIndex)
let lastChar = pattern[p]

// The pattern is scanned right-to-left, so skip ahead in the string by
// the length of the pattern. (Minus 1 because startIndex already points
// at the first character in the source string.)
var i = index(startIndex, offsetBy: patternLength - 1)

// This is a helper function that steps backwards through both strings
// until we find a character that doesn’t match, or until we’ve reached
// the beginning of the pattern.
func backwards() -> Index? {
var q = p
var j = i
while q > pattern.startIndex {
j = index(before: j)
q = index(before: q)
if self[j] != pattern[q] { return nil }
}
return j
}

// The main loop. Keep going until the end of the string is reached.
while i < endIndex {
let c = self[i]

// Does the current character match the last character from the pattern?
if c == lastChar {

// There is a possible match. Do a brute-force search backwards.
if let k = backwards() { return k }

// Jump at least one character (this is needed because the last
// character is in the skipTable, and `skipTable[lastChar] = 0`)
let jumpOffset = max(skipTable[c] ?? patternLength, 1)
i = index(i, offsetBy: jumpOffset, limitedBy: endIndex) ?? endIndex
} else {
// The characters are not equal, so skip ahead. The amount to skip is
// determined by the skip table. If the character is not present in the
// pattern, we can skip ahead by the full pattern length. However, if
// the character *is* present in the pattern, there may be a match up
// ahead and we can't skip as far.
i = index(i, offsetBy: skipTable[c] ?? patternLength, limitedBy: endIndex) ?? endIndex
}
}
return nil
}
}
```
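In the code above, the Horspool jump only runs when `c == lastChar`. Because the last pattern character's entry in `skipTable` is always 0, `max(skipTable[c] ?? patternLength, 1)` evaluates to 1 there, so that branch advances exactly one character, just like the plain version.

For comparison, here is a sketch of Boyer-Moore-Horspool as it is commonly presented, where the shift table is built from the first m - 1 pattern characters only. This is not the PR's code, and it makes these assumptions:

- It targets current Swift.
- It copies both strings into `[Character]` arrays, which also avoids passing one string's index to another string's `index(before:)`.
- It returns a `Character` offset rather than a `String.Index`.
- The name `horspoolSearch(_:for:)` is hypothetical.

```swift
// Hypothetical sketch of the textbook Boyer-Moore-Horspool search.
// Returns the Character offset of the first match, or nil.
func horspoolSearch(_ text: String, for pattern: String) -> Int? {
    let t = Array(text)
    let p = Array(pattern)
    let m = p.count
    guard m > 0, m <= t.count else { return nil }

    // Shift table from the first m - 1 characters only, so the last pattern
    // character's entry reflects its previous occurrence (or m if it has none).
    // Every stored shift is therefore at least 1.
    var shift = [Character: Int]()
    for k in 0..<(m - 1) {
        shift[p[k]] = m - 1 - k
    }

    // i is the text position aligned with the pattern's last character.
    var i = m - 1
    while i < t.count {
        // Compare right to left.
        var j = m - 1
        var k = i
        while j >= 0 && t[k] == p[j] {
            j -= 1
            k -= 1
        }
        if j < 0 { return k + 1 }   // full match starts at k + 1

        // Shift by the text character under the pattern's last position,
        // whether or not that character matched.
        i += shift[t[i]] ?? m
    }
    return nil
}

print(horspoolSearch("Hello, World", for: "World") ?? -1)              // 7
print(horspoolSearch("Lorem ipsum dolor sit amet", for: "sit") ?? -1)  // 18
print(horspoolSearch("🐶🐔🐷🐮🐱", for: "🐮") ?? -1)                    // 3
```

The emoji example reports a `Character` offset. It therefore need not match the value the playground displays for the `String.Index` returned by `index(of:)`.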
@@ -200,4 +224,4 @@ In practice, the Horspool version of the algorithm tends to perform a little bet

Credits: This code is based on the paper: [R. N. Horspool (1980). "Practical fast searching in strings". Software - Practice & Experience 10 (6): 501–506.](http://www.cin.br/~paguso/courses/if767/bib/Horspool_1980.pdf)

_Written for Swift Algorithm Club by Matthijs Hollemans, updated by Andreas Neusüß_
_Written for Swift Algorithm Club by Matthijs Hollemans, updated by Andreas Neusüß_, [Matías Mazzei](https://github.com/mmazzei).