# Mustard 🌭


Mustard is a Swift library for tokenizing strings when splitting by whitespace doesn't cut it.

## Quick start using character sets

Foundation includes the `String` method `components(separatedBy:)`, which splits a string into substrings at the given separator characters:

```swift
let sentence = "hello 2017 year"
let words = sentence.components(separatedBy: .whitespaces)
// words.count -> 3
// words = ["hello", "2017", "year"]
```
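One caveat worth knowing about this Foundation method (standard behavior, not specific to Mustard): consecutive separators produce empty substrings, which you often need to filter out yourself:

```swift
import Foundation

// Consecutive separators each produce an empty substring.
let padded = "hello  2017  year" // note the double spaces
let pieces = padded.components(separatedBy: .whitespaces)
// pieces -> ["hello", "", "2017", "", "year"]
let nonEmpty = pieces.filter { !$0.isEmpty }
// nonEmpty -> ["hello", "2017", "year"]
```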

Mustard provides a similar feature with the opposite approach: instead of splitting on separators, you match against one or more character sets. This is useful when separators simply don't exist:

```swift
import Mustard

let sentence = "hello2017year"
let words = sentence.components(matchedWith: .letters, .decimalDigits)
// words.count -> 3
// words = ["hello", "2017", "year"]
```

If you want more than just the substrings, you can use the `tokens(matchedWith: CharacterSet...)` method, which returns an array of tokens conforming to `TokenType`.

At a minimum, `TokenType` requires a `text` property (the substring matched) and a `range` property (the range of the substring in the original string). When using `CharacterSet`s as tokenizers, the more specific type `CharacterSetToken` is returned, which adds a `set` property holding the `CharacterSet` instance that produced the match.
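To make the shape of a token concrete, here is a from-scratch sketch of the same idea — this is not Mustard's implementation, and the names `SimpleToken` and `simpleTokens(in:matchedWith:)` are illustrative only. It scans a string and records the same three pieces of information described above (text, range, and matching set):

```swift
import Foundation

// Illustrative only: a minimal token carrying the same properties
// described above (text, range, and the matching character set).
struct SimpleToken {
    let text: String
    let range: Range<String.Index>
    let set: CharacterSet
}

// Emit a token for each maximal run of scalars belonging to one
// of the given character sets; anything else is skipped.
func simpleTokens(in string: String, matchedWith sets: [CharacterSet]) -> [SimpleToken] {
    let scalars = string.unicodeScalars
    var tokens: [SimpleToken] = []
    var index = scalars.startIndex
    while index < scalars.endIndex {
        // Find the first set that matches the current scalar.
        guard let set = sets.first(where: { $0.contains(scalars[index]) }) else {
            index = scalars.index(after: index)
            continue
        }
        // Extend the run while the same set keeps matching.
        var end = scalars.index(after: index)
        while end < scalars.endIndex, set.contains(scalars[end]) {
            end = scalars.index(after: end)
        }
        let range = index..<end
        tokens.append(SimpleToken(text: String(scalars[range]), range: range, set: set))
        index = end
    }
    return tokens
}

let tokens = simpleTokens(in: "123Hello world&^45.67",
                          matchedWith: [.decimalDigits, .letters])
print(tokens.map { $0.text }) // ["123", "Hello", "world", "45", "67"]
```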

```swift
import Mustard

let tokens = "123Hello world&^45.67".tokens(matchedWith: .decimalDigits, .letters)
// tokens: [CharacterSet.Token]
// tokens.count -> 5 (characters '&', '^', and '.' are ignored)
//
// second token...
// tokens[1].text -> "Hello"
// tokens[1].range -> Range<String.Index>(3..<8)
// tokens[1].set -> CharacterSet.letters
//
// last token...
// tokens[4].text -> "67"
// tokens[4].range -> Range<String.Index>(19..<21)
// tokens[4].set -> CharacterSet.decimalDigits
```

## Advanced matching with custom tokenizers

Mustard can do more than match from character sets. You can create your own tokenizers with more sophisticated matching behavior by implementing the `TokenizerType` and `TokenType` protocols.

Here's an example of using `DateTokenizer` (see the example for implementation) to find substrings matching a `MM/dd/yy` format.

`DateTokenizer` returns tokens of type `DateToken`. Along with the substring `text` and `range`, a `DateToken` includes a `Date` object corresponding to the date in the substring:

```swift
import Mustard

let text = "Serial: #YF 1942-b 12/01/17 (Scanned) 12/03/17 (Arrived) ref: 99/99/99"

let tokens = text.tokens(matchedWith: DateTokenizer())
// tokens: [DateTokenizer.Token]
// tokens.count -> 2
// ('99/99/99' is *not* matched by `DateTokenizer` because it's not a valid date)
//
// first date
// tokens[0].text -> "12/01/17"
// tokens[0].date -> Date(2017-12-01 05:00:00 +0000)
//
// last date
// tokens[1].text -> "12/03/17"
// tokens[1].date -> Date(2017-12-03 05:00:00 +0000)
```
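To see the underlying idea without Mustard, here is a Foundation-only sketch (my own illustration, not the `DateTokenizer` implementation) that finds `MM/dd/yy`-shaped substrings with `NSRegularExpression` and uses a strict `DateFormatter` to reject impossible dates such as `99/99/99`:

```swift
import Foundation

// Illustrative only: find candidate MM/dd/yy substrings, then validate
// them with a non-lenient DateFormatter so out-of-range dates are dropped.
let formatter = DateFormatter()
formatter.dateFormat = "MM/dd/yy"
formatter.isLenient = false // reject components like month 99

let sample = "Serial: #YF 1942-b 12/01/17 (Scanned) 12/03/17 (Arrived) ref: 99/99/99"
let pattern = "\\b\\d{2}/\\d{2}/\\d{2}\\b"
let regex = try! NSRegularExpression(pattern: pattern)
let matches = regex.matches(in: sample, range: NSRange(sample.startIndex..., in: sample))

// Keep only substrings the formatter accepts as real dates.
let dates: [(text: String, date: Date)] = matches.compactMap { match in
    guard let range = Range(match.range, in: sample) else { return nil }
    let candidate = String(sample[range])
    guard let date = formatter.date(from: candidate) else { return nil }
    return (candidate, date)
}

print(dates.map { $0.text }) // ["12/01/17", "12/03/17"]
```

This two-phase shape — match a candidate pattern, then validate it — is the same design the README describes: `99/99/99` passes the regex but fails validation, so it produces no token.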

## Documentation & Examples

Detailed guides live in the repository's Documentation folder, and working examples in the Playgrounds folder.

## Roadmap

- Include detailed examples and documentation
- Ability to skip/ignore characters within match
- Include more advanced pattern matching for matching tokens
- Make project logo 🌭
- Performance testing / benchmarking against Scanner
- Include interface for working with Character tokenizers

## Requirements

- Swift 4.1

## Author

Made with ❤️ by @permakittens

## Contributing

Feedback and contributions for bug fixes or improvements are welcome. Feel free to submit a pull request or open an issue.

## License

MIT