Change from token type to tokenizer type (#2)

* Update API Switch internal terminology from use of `Token` to `Tokenizer`. Change external API from `matches(:)` back to the original `tokens(:)`. Add a convenience method `components(:)` that returns just substrings rather than array of `Token`. * Update file name. * Update README to show `Token` type. * Include `import Mustard` in code snippet.
mathewsanders · Jan 2, 2017 · 6cbc2d8 · 6cbc2d8
1 parent 56bfad7
commit 6cbc2d8
Show file tree

Hide file tree

Showing 19 changed files with 614 additions and 567 deletions.
diff --git a/Documentation/Expressive matching.md b/Documentation/Expressive matching.md
@@ -1,83 +1,79 @@
 # Example: expressive matching
 
-The results returned by `matches(from:)`returns an array tuples with the signature `(tokenizer: TokenType, text: String, range: Range<String.Index>)`
+The results returned by `tokens(matchedWith:)`returns an array `Token` which in turn is a tuple with the signature `(tokenizer: TokenizerType, text: String, range: Range<String.Index>)`
 
 To make use of the `tokenizer` element, you need to either use type casting (using `as?`) or type checking (using `is`) for the `tokenizer` element to be useful.
 
-Maybe we want to filter out only tokens that are numbers:
+Maybe we want to filter out only tokens that were matched with a number tokenizer:
 
 ````Swift
 import Mustard
 
-let messy = "123Hello world&^45.67"
-let matches = messy.matches(from: .decimalDigits, .letters)
-// matches.count -> 5
+let tokens = "123Hello world&^45.67".tokens(matchedWith: .decimalDigits, .letters)
+// tokens.count -> 5
 
-let numbers = matches.filter({ $0.tokenizer is NumberToken })
-// numbers.count -> 0
+let numberTokens = tokens.filter({ $0.tokenizer is NumberTokenizer })
+// numberTokens.count -> 0
 
 ````
 
-This can lead to bugs in your logic-- in the example above `numberTokens` will be empty because the tokenizers used were the character sets `.decimalDigits`, and `.letters`, so the filter won't match any of the tokens.
+This can lead to bugs in your logic-- in the example above `numberTokens` will be empty because the tokenizers used were  `CharacterSet.decimalDigits`, and `CharacterSet.letters`, so the filter won't match any of the tokens.
 
 This may seem like an obvious error, but it's the type of unexpected bug that can slip in when we're using loosely typed results.
 
-Thankfully, Mustard can return a strongly typed set of matches if a single `TokenType` is used:
+Thankfully, Mustard can return a strongly typed set of matches if a single `TokenizerType` is used:
 
 ````Swift
 import Mustard
 
-let messy = "123Hello world&^45.67"
-
-// call `matches()` method on string to get matching tokens from string
-let numberMatches: [NumberToken.Match] = messy.matches()
-// numberMatches.count -> 2
+// call `tokens()` method on `String` to get matching tokens from the string
+let numberTokens: [NumberTokenizer.Token] = "123Hello world&^45.67".tokens()
+// numberTokens.count -> 2
 
 ````
 
-Used in this way, this isn't very useful, but it does allow for multiple `TokenType` to be bundled together as a single `TokenType` by implementing a TokenType using an `enum`.
+Used in this way, this isn't very useful, but it does allow for multiple `TokenizerType` to be bundled together as a single tokenizer by implementing with an `enum`.
 
-An enum token type can either manage it's own internal state, or potentially act as a lightweight wrapper to existing tokenizers.
-Here's an example `TokenType` that acts as a wrapper for word, number, and emoji tokenizers:
+An enum tokenizer can either manage it's own internal state, or potentially act as a lightweight wrapper to other existing tokenizers.
 
-````Swift
+Here's an example `TokenizerType` that acts as a wrapper for word, number, and emoji tokenizers:
 
-enum MixedToken: TokenType {
+````Swift
+enum MixedTokenizer: TokenizerType {
 
     case word
     case number
     case emoji
     case none // 'none' case not strictly needed, and
               // in this implementation will never be matched
-
     init() {
         self = .none
     }
 
-    static let wordToken = WordToken()
-    static let numberToken = NumberToken()
-    static let emojiToken = EmojiToken()
+    static let wordTokenizer = WordTokenizer()
+    static let numberTokenizer = NumberTokenizer()
+    static let emojiTokenizer = EmojiTokenizer()
 
-    func canAppend(next scalar: UnicodeScalar) -> Bool {
+    func tokenCanTake(_ scalar: UnicodeScalar) -> Bool {
         switch self {
-        case .word: return MixedToken.wordToken.canAppend(next: scalar)
-        case .number: return MixedToken.numberToken.canAppend(next: scalar)
-        case .emoji: return MixedToken.emojiToken.canAppend(next: scalar)
+        case .word: return MixedTokenizer.wordTokenizer.tokenCanTake(scalar)
+        case .number: return MixedTokenizer.numberTokenizer.tokenCanTake(scalar)
+        case .emoji: return MixedTokenizer.emojiTokenizer.tokenCanTake(scalar)
         case .none:
             return false
         }
     }
 
-    func token(startingWith scalar: UnicodeScalar) -> TokenType? {
+    func token(startingWith scalar: UnicodeScalar) -> TokenizerType? {
 
-        if let _ = MixedToken.wordToken.token(startingWith: scalar) {
-            return MixedToken.word
+        if let _ = MixedTokenizer.wordTokenizer.token(startingWith: scalar) {
+            return MixedTokenizer.word
         }
-        else if let _ = MixedToken.numberToken.token(startingWith: scalar) {
-            return MixedToken.number
+        else if let _ = MixedTokenizer.numberTokenizer.token(startingWith: scalar) {
+            return MixedTokenizer.number
         }
-        else if let _ = MixedToken.emojiToken.token(startingWith: scalar) {
-            return MixedToken.emoji
+        else if let _ = MixedTokenizer.emojiTokenizer.token(startingWith: scalar) {
+            return MixedTokenizer.emoji
         }
         else {
             return nil
@@ -90,25 +86,25 @@ Mustard defines a default typealias for `Token` that exposes the specific type i
 results tuple.
 
 ````Swift
-public extension TokenType {
-    typealias Match = (tokenizer: Self, text: String, range: Range<String.Index>)
+public extension TokenizerType {
+    typealias Token = (tokenizer: Self, text: String, range: Range<String.Index>)
 }
 ````
 
-Setting your results array to this type gives you the option to use the shorter `matches()` method,
+Setting your results array to this type gives you the option to use the shorter `tokens()` method,
 where Mustard uses the inferred type to perform tokenization.
 
-Since the matches array is strongly typed, you can be more expressive with the results, and the
+Since the tokens array is strongly typed, you can be more expressive with the results, and the
 complier can give you more hints to prevent you from making mistakes.
 
 ````Swift
 
-// use the `matches()` method to grab matching substrings using a single tokenizer
-let matches: [MixedToken.Match] = "123👩‍👩‍👦‍👦Hello world👶 again👶🏿 45.67".matches()
-// matches.count -> 8
+// use the `tokens()` method to grab matching substrings using a single tokenizer
+let tokens: [MixedTokenizer.Token] = "123👩‍👩‍👦‍👦Hello world👶 again👶🏿 45.67".tokens()
+// tokens.count -> 8
 
-matches.forEach({ match in
-    switch (match.tokenizer, match.text) {
+tokens.forEach({ token in
+    switch (token.tokenizer, token.text) {
     case (.word, let word): print("word:", word)
     case (.number, let number): print("number:", number)
     case (.emoji, let emoji): print("emoji:", emoji)

diff --git a/Documentation/Greedy tokens and tokenizer order.md b/Documentation/Greedy tokens and tokenizer order.md
@@ -1,43 +1,43 @@
 # Greedy tokens and tokenizer order
 
-Tokenizers are greedy. The order that tokenizers are passed into the `matches(from: TokenType...)` will effect how substrings are matched.
+Tokenizers are greedy. The order that tokenizers are passed into the `matches(from: TokenizerType...)` will effect how substrings are matched.
 
-Here's an example using the `CharacterSet.decimalDigits` tokenizer and the custom tokenizer `DateToken` that matches dates in the format `MM/dd/yy` ([see example](Tokens with internal state.md) for implementation).
+Here's an example using the `CharacterSet.decimalDigits` tokenizer and the custom tokenizer `DateTokenizer` that matches dates in the format `MM/dd/yy` ([see example](Tokens with internal state.md) for implementation).
 
 ````Swift
 import Mustard
 
 let numbers = "03/29/17 36"
-let matches = numbers.matches(from: CharacterSet.decimalDigits, DateToken.tokenizer)
-// matches.count -> 4
+let tokens = numbers.tokens(matchedWith: CharacterSet.decimalDigits, DateTokenizer.defaultTokenizer)
+// tokens.count -> 4
 //
-// matches[0].text -> "03"
-// matches[0].tokenizer -> CharacterSet.decimalDigits
+// tokens[0].text -> "03"
+// tokens[0].tokenizer -> CharacterSet.decimalDigits
 //
-// matches[1].text -> "29"
-// matches[1].tokenizer -> CharacterSet.decimalDigits
+// tokens[1].text -> "29"
+// tokens[1].tokenizer -> CharacterSet.decimalDigits
 //
-// matches[2].text -> "17"
-// matches[2].tokenizer -> CharacterSet.decimalDigits
+// tokens[2].text -> "17"
+// tokens[2].tokenizer -> CharacterSet.decimalDigits
 //
-// matches[3].text -> "36"
-// matches[3].tokenizer -> CharacterSet.decimalDigits
+// tokens[3].text -> "36"
+// tokens[3].tokenizer -> CharacterSet.decimalDigits
 ````
 
-To get expected behavior, the `matches` method should be called with more specific tokenizers placed before more general tokenizers:
+To get expected behavior, the `tokens` method should be called with more specific tokenizers placed before more general tokenizers:
 
 ````Swift
 import Mustard
 
 let numbers = "03/29/17 36"
-let matches = numbers.matches(from: DateToken.tokenizer, CharacterSet.decimalDigits)
-// matches.count -> 2
+let tokens = numbers.tokens(matchedWith: DateTokenizer.defaultTokenizer, CharacterSet.decimalDigits)
+// tokens.count -> 2
 //
-// matches[0].text -> "03/29/17"
-// matches[0].tokenizer -> DateToken()
+// tokens[0].text -> "03/29/17"
+// tokens[0].tokenizer -> DateTokenizer()
 //
-// matches[1].text -> "36"
-// matches[1].tokenizer -> CharacterSet.decimalDigits
+// tokens[1].text -> "36"
+// tokens[1].tokenizer -> CharacterSet.decimalDigits
 ````
 
 If the more specific tokenizer fails to match a token, the more general tokens still have a chance to perform matches:
@@ -46,18 +46,18 @@ If the more specific tokenizer fails to match a token, the more general tokens s
 import Mustard
 
 let numbers = "99/99/99 36"
-let matches = numbers.matches(from: DateToken.tokenizer, CharacterSet.decimalDigits)
-// matches.count -> 4
+let tokens = numbers.tokens(matchedWith: DateTokenizer.defaultTokenizer, CharacterSet.decimalDigits)
+// tokens.count -> 4
 //
-// matches[0].text -> "99"
-// matches[0].tokenizer -> CharacterSet.decimalDigits
+// tokens[0].text -> "99"
+// tokens[0].tokenizer -> CharacterSet.decimalDigits
 //
-// matches[1].text -> "99"
-// matches[1].tokenizer -> CharacterSet.decimalDigits
+// tokens[1].text -> "99"
+// tokens[1].tokenizer -> CharacterSet.decimalDigits
 //
-// matches[2].text -> "99"
-// matches[2].tokenizer -> CharacterSet.decimalDigits
+// tokens[2].text -> "99"
+// tokens[2].tokenizer -> CharacterSet.decimalDigits
 //
-// matches[3].text -> "36"
-// matches[3].tokenizer -> CharacterSet.decimalDigits
+// tokens[3].text -> "36"
+// tokens[3].tokenizer -> CharacterSet.decimalDigits
 ````
diff --git a/Documentation/Matching emoji.md b/Documentation/Matching emoji.md
@@ -6,21 +6,21 @@ As an example, the character '👶🏿' is comprised by two scalars: '👶', and
 The rainbow flag character '🏳️‍🌈' is again comprised by two adjacent scalars '🏳' and '🌈'.
 A final example, the character '👨‍👨‍👧‍👦' is actually 7 scalars: '👨' '👨' '👧' '👦' joined by three ZWJs (zero-with joiner).
 
-To create a TokenType that matches emoji we can instead check to see if a scalar falls within known range, or if it's a ZWJ.
+To create a `TokenizerType` that matches emoji we can instead check to see if a scalar falls within known range, or if it's a ZWJ.
 
 This isn't the most *accurate* emoji tokenizer because it would potentially matches an emoji scalar followed by 100 zero-width joiners, but for basic use it might be enough.
 
 ````Swift
-struct EmojiToken: TokenType {
+struct EmojiTokenizer: TokenizerType {
 
     // (e.g. can't start with a ZWJ)
-    func canStart(with scalar: UnicodeScalar) -> Bool {
-        return EmojiToken.isEmojiScalar(scalar)
+    func tokenCanStart(with scalar: UnicodeScalar) -> Bool {
+        return EmojiTokenizer.isEmojiScalar(scalar)
     }
 
     // either in the known range for a emoji, or a ZWJ
-    func canTake(_ scalar: UnicodeScalar) -> Bool {
-        return EmojiToken.isEmojiScalar(scalar) || EmojiToken.isJoiner(scalar)
+    func tokenCanTake(_ scalar: UnicodeScalar) -> Bool {
+        return EmojiTokenizer.isEmojiScalar(scalar) || EmojiTokenizer.isJoiner(scalar)
     }
 
     static func isJoiner(_ scalar: UnicodeScalar) -> Bool {

diff --git a/Documentation/TallyType protocol.md b/Documentation/TallyType protocol.md