A minor release, which introduces the following changes:
- Added positional methods to the interface Processor.
- Updated dependency "JavaUtil" to version 2.4.0.
- Updated Kotlin to version 1.3.50.
A minor release, which introduces the following changes:
- Updated dependency "JavaUtil" to version 2.1.1.
- Updated Kotlin to version 1.3.21.
A minor release, which introduces the following changes:
- Tokenizers now handle empty tokens properly.
- Added a predefined Processor for removing leading and trailing whitespace from tokens.
A feature release, which introduces the following changes:
- Added the interface TextParser as well as the implementations AbstractTextParser and GradualTextParser.
- Added types of tokens, such as MutableToken, ValueToken and TokenSequence, that are useful when parsing texts.
- Added the class Dictionary, which allows translating parts of texts using key-value pairs.
- Added the class DictionaryTokenizer, which allows splitting texts into tokens by using a Dictionary.
- Added the interface Processor and the class ProcessorChain.
- Added a helper method for creating case-insensitive metrics to the interface TextMetric.
- Added the class TextMetric.Comparator.
- Tokenizers and TextMetrics can now be applied to CharSequences.
- Replaced getter and setter methods with properties, in accordance with the Kotlin paradigm.
- Updated dependency "JavaUtil" to version 2.0.2.
- Updated Kotlin to version 1.3.11.
A major release, which introduces the following changes:
- Migrated the project to use the Kotlin programming language instead of Java.
- Converted the inner interface Tokenizer.Token into a separate interface Token.
- Converted the inner class NGramTokenizer.NGram into a separate class NGram.
- Changed the return type of the method Tokenizer#tokenize from Set to Collection.
A feature release, which introduces the following changes:
- Added the metrics OptimalStringAlignmentDistance, OptimalStringAlignmentDissimilarity and OptimalStringAlignmentSimilarity.
- Added the metrics DamerauLevenshteinDistance, DamerauLevenshteinDissimilarity and DamerauLevenshteinSimilarity.
- Added the tokenizers FixedLengthTokenizer and RegexTokenizer.
A minor release, which provides the following changes:
- Updated dependency "JavaUtil" to version 1.2.0.
A feature release, which provides the following changes:
- Tokenizers no longer return multiple n-grams or substrings with the same token. Instead, the positions of all duplicates are aggregated in a single NGram or Substring.
- Added constructors that allow specifying only a maximum length, without a minimum length, to the classes NGramTokenizer and DiceCoefficient.
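To illustrate the duplicate aggregation described above, the following Kotlin sketch splits a text into n-grams of one fixed length and merges duplicates, recording every start position of each distinct n-gram. The names NGramToken and aggregateNGrams are hypothetical and do not reflect the library's actual NGram class or NGramTokenizer API; the library's tokenizer also supports a range of lengths, which this sketch omits.

```kotlin
// Hypothetical token type (not the library's NGram class): one entry per
// distinct n-gram, holding every start index at which it occurs.
data class NGramToken(val value: String, val positions: MutableSet<Int> = mutableSetOf())

// Hypothetical helper: splits `text` into all n-grams of length `n` and merges
// duplicates, so each distinct n-gram appears once with all of its positions.
fun aggregateNGrams(text: CharSequence, n: Int): Collection<NGramToken> {
    require(n > 0) { "n must be positive" }
    // LinkedHashMap keeps tokens in order of their first occurrence.
    val tokens = LinkedHashMap<String, NGramToken>()
    // The range is empty when the text is shorter than n.
    for (start in 0..text.length - n) {
        val value = text.subSequence(start, start + n).toString()
        tokens.getOrPut(value) { NGramToken(value) }.positions.add(start)
    }
    return tokens.values
}
```

For "banana" with n = 2, this yields three tokens: "ba" at position 0, "an" at positions 1 and 3, and "na" at positions 2 and 4.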
The first stable release of the library, which provides the following utility classes:
- The classes SubstringTokenizer and NGramTokenizer for splitting texts into shorter subtexts.
- The metrics DiceCoefficient, HammingDistance, HammingLoss, HammingAccuracy, LevenshteinDistance, LevenshteinDissimilarity and LevenshteinSimilarity.