- Major internal refactoring in defining and running rules and subrules:
  - Easier code maintenance: fewer subrule objects and cleaner code
  - 20-40% speed up in raw benchmarks
- Buffer support:
  - `Atok#write(data)`
    - `data` (String | Buffer): always pass one of these two types. Use `Atok#setEncoding()` when using strings (default=utf-8).
  - `Atok#addRule(pattern...type)`
    - `pattern` (String | Buffer): Buffers can be used instead of strings, except for `{start,end}` and `{firstOf}` subrules.
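When buffers arrive in chunks, a multi-byte utf-8 character can be split across two `write()` calls, so decoding has to carry state from one write to the next (the Internals notes mention Node's `StringDecoder` for this). The sketch below exercises only Node's own `string_decoder` module, not Atok's code: it compares naive per-chunk `Buffer#toString()` with a `StringDecoder` on a split character.

```javascript
const { StringDecoder } = require('string_decoder');

// "é" is encoded in utf-8 as two bytes (0xc3 0xa9).
const bytes = Buffer.from('café', 'utf8');
// Split the input so the two bytes of "é" land in different chunks.
const chunks = [bytes.subarray(0, 4), bytes.subarray(4)];

// Naive decoding converts each chunk on its own, so the partial
// character is replaced by U+FFFD replacement characters.
const naive = chunks.map((chunk) => chunk.toString('utf8')).join('');

// StringDecoder holds back incomplete trailing bytes until the next write().
const decoder = new StringDecoder('utf8');
let decoded = '';
for (const chunk of chunks) decoded += decoder.write(chunk);
decoded += decoder.end(); // flush anything still pending

console.log(naive === 'café');   // false
console.log(decoded === 'café'); // true
```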
- Behaviour changes:
  - Subrules defined as a Number no longer apply the following subrules to the matched token. Use another Atok instance to emulate the previous behaviour.
  - `{firstOf}` subrules cannot be in first position. Use `addRule('', {firstOf})` instead.
  - Function subrules returning 0 must set `continue()` properly to avoid potential infinite loops.
  - `currentRule` property is now a method
- Fix: infinite loop detection
- Fix: corrupted result when `addRule(number|firstOf)` returns an empty token
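Infinite loop detection can be pictured with a toy rule runner. The sketch below is hypothetical: `runRules` and its rule shape (`match()` returning the number of characters consumed or -1, plus an optional `next` index standing in for `continue()`) are made up for illustration and are not Atok's internals. The idea is that, with deterministic rules, reaching the same rule index twice without the offset moving means the runner would cycle forever.

```javascript
// Hypothetical toy runner for illustration only; not Atok's internals.
// A rule is { match(input, offset) -> chars consumed or -1, next? }.
// After a match the runner restarts at rule 0, or at `next` if given.
function runRules(rules, input) {
  const tokens = [];
  let offset = 0;
  let index = 0;
  let visited = new Set(); // rule indexes tried at the current offset

  while (index < rules.length && offset < input.length) {
    if (visited.has(index)) {
      throw new Error(`Infinite loop detected at rule ${index}, offset ${offset}`);
    }
    visited.add(index);

    const rule = rules[index];
    const size = rule.match(input, offset);
    if (size < 0) {
      index++; // no match: try the next rule
      continue;
    }
    tokens.push(input.slice(offset, offset + size));
    if (size > 0) {
      offset += size;
      visited = new Set(); // progress was made: start a fresh history
    }
    index = rule.next !== undefined ? rule.next : 0;
  }
  return tokens; // stops when the input is consumed or no rule matches
}

const digits = {
  match(input, offset) {
    let end = offset;
    while (end < input.length && input[end] >= '0' && input[end] <= '9') end++;
    return end > offset ? end - offset : -1;
  },
};
const space = { match: (input, offset) => (input[offset] === ' ' ? 1 : -1) };
const empty = { match: () => 0 }; // always matches zero characters

console.log(runRules([digits, space], '12 34')); // [ '12', ' ', '34' ]
try {
  runRules([empty], 'abc');
} catch (err) {
  console.log(err.message); // Infinite loop detected at rule 0, offset 0
}
```

Because the history is cleared whenever the offset advances, a handler that moved the offset back would make every pass look like progress, which matches the limitation stated in the changelog's infinite loop detection entry.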
- Cleanups:
  - `deleteRuleSet()` -> `removeRuleSet()`
  - `loadProps()` -> `setProps()`: all or a subset of properties are returned
  - `escaped()` -> `escape()`
  - Internal properties renamed with a leading `_`
  - `offsetBuffer` -> `markedOffset`
- Deprecated:
  - `saveProps()`: use `getProps()` to retrieve all properties
  - `existsRule()`
  - `getAllRuleSet()`
  - `bytesRead` property
  - `seek()`: use the `offset` property directly
  - `ruleIndex` is now private (`_ruleIndex`) and may not be systematically updated during parsing
  - Boolean subrules in `addRule()`: set the handler/type to `false` to get the same behaviour
  - `addRule(0)`: use the [empty] event (NB. `addRule(..., 0, ...)` is still supported)
- Internals:
  - Use of Node's `StringDecoder` for utf-8 decoding
  - Better compliance with the Node.js Stream API: the `writable` and `readable` properties are set to false after an error, `end()` and `destroy()`
  - The buffer is always truncated from min(`offset`, `markedOffset`) to try and minimize memory usage
  - `.continue(-1).ignore(true).next().addRule(subrule, handler)` rules are optimized with a `while()` loop
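Truncating from the smaller of the two offsets drops only data that neither the read position nor the marked position still needs. A minimal sketch, assuming a string buffer with an `offset` and a `markedOffset` where a negative `markedOffset` means "no mark set"; the `truncate` helper and that convention are hypothetical, not Atok's implementation.

```javascript
// Hypothetical illustration: drop the consumed head of the buffer,
// keeping everything from min(offset, markedOffset) onwards.
// Assumption: markedOffset < 0 means no mark is set.
function truncate(state) {
  const start = state.markedOffset >= 0
    ? Math.min(state.offset, state.markedOffset)
    : state.offset;
  return {
    buffer: state.buffer.slice(start),
    // Shift both offsets so they still point at the same characters.
    offset: state.offset - start,
    markedOffset: state.markedOffset >= 0 ? state.markedOffset - start : -1,
  };
}

console.log(truncate({ buffer: 'abcdef', offset: 4, markedOffset: -1 }));
// { buffer: 'ef', offset: 0, markedOffset: -1 }
console.log(truncate({ buffer: 'abcdef', offset: 4, markedOffset: 1 }));
// { buffer: 'bcdef', offset: 3, markedOffset: 0 }
```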
- Features:
  - `getProps()` returns all current properties
  - `continue(string|function)`: resolution is automatically performed on `saveRuleSet()` and `write()`. This means that stricter checks are imposed:
    - `continue(x)` with x >= 0 cannot be set on the last rule
    - `continue(x)` with x < -1 cannot be set on the first rule
  - Infinite loop detection. It cannot detect rule handlers that reset the `offset` property to its value before the rule was executed.
  - Added `slice([startIndex[, endIndex]])`: returns a slice of the buffer. NB. the buffer is not altered.
  - Added `groupRule(boolean)`: bind the following rules to the same index (useful for writing helpers and making them behave as a single rule). Groups can be set at any level. Empty or single-rule groups are ignored.
  - `addRule(function)` is now considered a successful rule
  - If the last argument to `addRule()` is `false`, the rule is ignored
  - Subrule `{ firstOf: (string|array) }` accepts a string as well as an array
  - Fix: `addRule(null)` throws an error
  - Fix: `addRule([1,2])` returns the proper token
  - Fix: first rule validation enforced (waits for more data if required, which means a rule starting with an array of numbers is equivalent to a rule with the max of those numbers: `addRule([1,2])` <=> `addRule(2)`)
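The stricter `continue()` checks follow from how a relative jump maps to an absolute rule index. Assuming that `continue(x)` on the rule at index `i` targets rule `i + 1 + x` (so `continue(-1)` stays on the current rule, consistent with the `break()` note elsewhere in this log), the two restrictions are exactly the out-of-range cases for the last and first rules. The `resolveContinue` helper and the `name`/`handler` rule fields below are hypothetical, used only to sketch name- or function-based resolution into a relative jump.

```javascript
// Hypothetical sketch, not Atok's implementation.
// Assumption: continue(x) on the rule at index i jumps to rule i + 1 + x,
// so continue(0) is the next rule and continue(-1) re-runs the current rule.
// A target may also be given as a rule name or a handler function.
function resolveContinue(rules, i, jump) {
  let x = jump;
  if (typeof jump === 'string' || typeof jump === 'function') {
    const found = rules.findIndex((rule) => rule.name === jump || rule.handler === jump);
    if (found < 0) throw new Error(`continue(): unknown rule ${String(jump)}`);
    x = found - i - 1; // turn the absolute index into a relative jump
  }
  const target = i + 1 + x;
  // For example, x >= 0 on the last rule or x < -1 on the first rule lands here.
  if (target < 0 || target >= rules.length) {
    throw new Error(`continue(${x}) on rule ${i} is out of range`);
  }
  return x;
}

const rules = [{ name: 'word' }, { name: 'space' }, { name: 'end' }];
console.log(resolveContinue(rules, 2, 'word')); // -3
console.log(resolveContinue(rules, 0, -1));     // -1
try {
  resolveContinue(rules, 2, 0); // non-negative jump from the last rule
} catch (err) {
  console.log(err.message); // continue(0) on rule 2 is out of range
}
```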
- Added `offsetBuffer` property: when set to a positive value, the buffer is not sliced when `write()` ends. Use with caution as this can make the buffer grow continuously.
- Added `currentRule` property: name of the current rule set; `getRuleSet()` is deprecated
- Added `getProps()`: returns an object containing the requested property values (default=all properties)
- Added a second parameter to `continue()`: used when the rule fails (Number, String or Function)
- Added boolean subrules to `addRule()`: the whole rule is discarded if `false`; a `true` subrule is ignored
- `addRule()` can now accept a single parameter (type|handler)
- Added a second parameter to `loadRuleSet()` and `next()`: index to be used when loading the rule set
- Added handlers to the [debug] event
- Fix: `continue(String|Function)` now uses proper indexes
- Fix: `addRule(0)` can now be invoked many times in a rule set
- Switch to JSDoc format, documentation automatically generated on build
- `continue()` accepts `null`
- Added [pipe], [listening], [open] and [close] events to the event set
- Added support for array of functions in rule definitions: addRule([fn1, fn2...], ...)
- Fixed wrong array size in sliceArguments()
- Fix: `firstOf` now honors `escaped()`
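Honoring the escape character in `firstOf` means skipping any delimiter occurrence that is itself escaped. A minimal sketch, assuming a single-character escape (defaulting to a backslash here) and treating an occurrence as escaped when it is preceded by an odd number of escape characters; the `firstOf` and `isEscaped` helpers are hypothetical, not Atok's code.

```javascript
// Hypothetical sketch: find the earliest unescaped occurrence of any pattern.
// Assumptions: escapeChar is a single character and patterns are non-empty.
function firstOf(input, patterns, escapeChar = '\\', from = 0) {
  let best = null;
  for (const pattern of patterns) {
    let pos = input.indexOf(pattern, from);
    // Skip escaped occurrences of this pattern.
    while (pos >= 0 && isEscaped(input, pos, escapeChar)) {
      pos = input.indexOf(pattern, pos + 1);
    }
    if (pos >= 0 && (best === null || pos < best.index)) {
      best = { index: pos, pattern };
    }
  }
  return best; // null when no unescaped pattern is found
}

// An odd run of escape characters right before `pos` escapes it.
function isEscaped(input, pos, escapeChar) {
  let count = 0;
  for (let i = pos - 1; i >= 0 && input[i] === escapeChar; i--) count++;
  return count % 2 === 1;
}

console.log(firstOf('a\\,b;c,d', [',', ';'])); // { index: 4, pattern: ';' }
console.log(firstOf('a\\\\,b', [',']));        // { index: 3, pattern: ',' }
```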
- Fix: invalid rule index after using `loadRuleSet()` or `next()`
- Code refactoring
- Automatic masked `Rule#test()` method
- `escaped()` subrules
- Fixed rule set name not being reset upon `clearRule()`
- Fixed `clearProps()` not being chainable
- Added `break()`: abort the current rule set. Use `continue(-1)` to resume at the current subrule.
- Added `version` property
- Fix: multiple calls to `debug()`
- `split()` removed, as it can be achieved with the current rule definitions and adds little value
- Performance improvements (~50% compared to v0.1.10)
- [match] event removed as redundant with the [debug] event
- [loadruleset] and [seek] events moved under the [debug] event
- [debug] event signature: (method name, type, data)
- `debug` option moved to the `debug()` method so debug mode can be turned on and off dynamically
- Added `events` property to Atok
- Code cleanups
- Performance improvements
- Added benchmarks for every subrule type
- New `split(flag)` property: split token by subrules. No effect if the number of subrules is < 3.
- New `debug` option: emits the [debug] event if set to true, or triggers the given function for debugging purposes. Note that thanks to dynamic method setting, this has absolutely no impact on performance if not set!
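The zero-overhead claim for debug mode relies on swapping methods instead of testing a flag on every call. A minimal sketch of that pattern, assuming a hypothetical `Tokenizer` class and a made-up `'call'` type value (only the (method name, type, data) argument order comes from the [debug] signature in this log): enabling debug installs wrapper methods on the instance, and disabling it deletes them so the untouched prototype methods are used again.

```javascript
const { EventEmitter } = require('events');

// Hypothetical class used only to illustrate dynamic method setting.
class Tokenizer extends EventEmitter {
  write(data) { return data.length; }
  next() { return this; }

  debug(flag) {
    for (const name of ['write', 'next']) {
      if (flag) {
        const original = Tokenizer.prototype[name];
        // Own-property wrapper: emit [debug], then call the original method.
        this[name] = function (...args) {
          this.emit('debug', name, 'call', args);
          return original.apply(this, args);
        };
      } else {
        // Deleting the own property exposes the prototype method again,
        // so the non-debug path carries no extra check.
        delete this[name];
      }
    }
    return this;
  }
}

const t = new Tokenizer();
const calls = [];
t.on('debug', (method, type, data) => calls.push(method));
t.debug(true).write('abc');
t.debug(false).write('def');
console.log(calls); // [ 'write' ]
```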
- `continue()` supports string and function input; a `saveRuleSet()` must be performed for it to take effect
- Some refactoring:
  - `Array.prototype.slice` calls
  - Rule index internal fetching
- `clearProps()` resets properties to their default values
- `saveProps()` -> `saveProps(name)`, with name=default if not set
- `addRule(rule, 123)` fixed when used with `quiet(true)`
- `continue()` applied when a handler uses `pause()`
- `write()` will continue at the last rule index if the last successful rule was subject to `continue()`
- `next()` and `ignore()` can now be applied to `addRule(0)`; note that `continue()` cannot
- [loadruleset] and [seek] events
- `addRule(123)` now honors `quiet()`
- New property: `Atok.ending` (Boolean): indicates if `end()` was called
- New events:
  - match (replaces matchEventHandler): rule match (current offset, matched size, matched rule object)
  - empty: empty buffer (ending flag). Cf. TODO about the performance impact when listeners are attached
- `addRule([rules], 0)` fixed
- `addRule('', handler)` now honors `quiet()`
- Handlers triggered in `quiet()` mode give the non-extracted token size as the first argument (actually introduced in the previous release)
- emptyHandler now triggered on a per rule set basis
- New matchEventHandler property (Function): triggered upon a rule match with arguments: , ,
- `continue()` accepts negative input
- `addRule(-1, handler)` triggers [handler] when the tokenizer has ended
- When `quiet()`, the token is set to the would-be token length
- `seek()` decreases bytes on negative seek
- Added `existsRule()`
- Added `deleteRuleSet()`
- Added `getRuleSet()`
- Added `getAllRuleSet()`
- Changed `length()` to be a property
- Added `addRuleFirst()`
- Added `continue()`
- If there is any remaining data, the `end()` signature is set to (token, -1, ruleSetName)
- `setEncoding()` default value is UTF-8; utf-8 and utf8 are also accepted
- `removeRule()` fixed processing Function
- Fixed utf8 handling in `write()`
- Fixed return flag in `write()`, resolving `pipe()` freeze
- First release