Fix slow execution time when using CharacterSet or CharacterSetComple… #11991

Merged
merged 2 commits into from Nov 29, 2022
11 changes: 11 additions & 0 deletions src/Collections-Strings-Tests/StringTest.class.st
@@ -981,6 +981,17 @@ StringTest >> testFindTokens [
self assert: ('test this' findTokens: 't') equals: s
]

{ #category : #'tests - tokenizing' }
StringTest >> testFindTokensCharacterSetComplement [

| tokens delims |
string := 'one, two, three, and four, one, five, one '.
delims := (CharacterSet newFrom: (Character alphabet, Character alphabet asUppercase)) complement.
tokens := string findTokens: delims.
self assert: tokens size equals: 8.
self assert: tokens third equals: 'three'
]

{ #category : #tests }
StringTest >> testFindTokensEscapedBy [

14 changes: 10 additions & 4 deletions src/Collections-Strings/String.class.st
@@ -1371,11 +1371,12 @@ String >> findCloseParenthesisFor: startIndex [
{ #category : #'finding/searching' }
String >> findDelimiters: delimiters startingAt: start [
"Answer the index of the character within the receiver, starting at start, that matches one of the delimiters. If the receiver does not contain any of the delimiters, answer size + 1."

+ "delimiters is any collection of characters and is often passed as a String. This is fine when the number of possible delimiters is small even though String>>includes: is an O(n) operation because n is small. When using a large number of possible delimiters, using a CharacterSet with a lookup efficiency of O(1) will produce much better performance."
+
start to: self size do: [:i |
- delimiters do: [:delim |
- delim = (self at: i)
- ifTrue: [^ i]]].
+ (delimiters includes: (self at: i))
+ ifTrue: [^ i]].
^ self size + 1
]
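
To make the return values described in the method comment concrete, here is a minimal Playground sketch, assuming the patched `findDelimiters:startingAt:` shown above. The sample string is hypothetical, and the indices in the comments follow from tracing the loop by hand rather than from a recorded run.

```smalltalk
"Hypothetical sample: 'one, two' has 8 characters, with $, at index 4."
| s |
s := 'one, two'.

"Delimiters passed as a String: the first delimiter at or after index 1 is the comma."
s findDelimiters: ', ' startingAt: 1.    "4"

"No delimiter remains from index 6 onward, so the answer is size + 1."
s findDelimiters: ', ' startingAt: 6.    "9"

"The same delimiters as a CharacterSet; only the cost of includes: differs."
s findDelimiters: (CharacterSet newFrom: ', ') startingAt: 1.    "4"
```

Because the loop now only sends `includes:` to the delimiters, any collection that answers `includes:` for a character should work here.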

@@ -1561,6 +1562,8 @@ String >> findSubstringViaPrimitive: key in: body startingAt: start matchTable:
{ #category : #'finding/searching' }
String >> findTokens: delimiters [
"Answer the collection of tokens that result from parsing self. Return strings between the delimiters. Any character in the Collection delimiters marks a border. Several delimiters in a row are considered as just one separation. Also, allow delimiters to be a single character."

"delimiters can be any collection of characters and is often passed as a String. This is fine when the number of possible delimiters is small even though String>>includes: is an O(n) operation because n is small. When using a large number of possible delimiters, using a CharacterSet with a lookup efficiency of O(1) will produce much better performance."

| tokens keyStart keyStop separators |

@@ -2342,9 +2345,12 @@ String >> skipAnySubstring: delimiters startingAt: start [
String >> skipDelimiters: delimiters startingAt: start [

"Answer the index of the first character within the receiver, starting at start, that does NOT match any element of delimiters (a collection of characters). If the end of the receiver is reached, answer size + 1."

+ "delimiters is any collection of characters and is often passed as a String. This is fine when the number of possible delimiters is small even though String>>includes: is an O(n) operation because n is small. When using a large number of possible delimiters, using a CharacterSet with a lookup efficiency of O(1) will produce much better performance."
+
+
start to: self size do: [ :i |
- (delimiters anySatisfy: [ :delim | delim = (self at: i) ])
+ (delimiters includes: (self at: i))
ifFalse: [ ^ i ] ].
^ self size + 1
]
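
The trade-off described in the new method comments can be explored with a rough Playground sketch like the one below. It assumes the patched methods above. The sample text, the `punct` delimiter list, and all variable names are hypothetical, and no timings are claimed here since they depend on the image and machine.

```smalltalk
"Rough timing sketch, not a benchmark. All names and sample data are illustrative."
| text punct fromString fromSet stringMs setMs |

"Build a reasonably long input by repeating a short phrase."
text := String streamContents: [ :stream |
	1000 timesRepeat: [ stream nextPutAll: 'one, two; three. and four! ' ] ].

"A larger delimiter collection, first as a String, then as a CharacterSet."
punct := ' ,.;:!?-()[]{}/\|@#$%^&*+=<>~'.

stringMs := Time millisecondsToRun: [ fromString := text findTokens: punct ].
setMs := Time millisecondsToRun: [ fromSet := text findTokens: (CharacterSet newFrom: punct) ].

"Both forms name the same characters, so the token lists are expected to match."
fromString = fromSet.

"Inspect the two timings side by side."
{ stringMs. setMs }
```

With only a handful of delimiters the two timings may be close; the comments above suggest the CharacterSet form pays off as the delimiter collection grows.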