Version 0.4.0

pierrec committed Nov 6, 2012
1 parent 53ec9bd commit b7a0c55165b0808c7e2713c4a6f3499cfe92499a
@@ -1,4 +1,4 @@
-0.4.0 / 2012-10-30
+0.4.0 / 2012-11-06
==================
* Major internal refactoring in defining and running rules and subrules:
@@ -8,12 +8,12 @@
* `Atok#write(data)`
* `data` (_String_ | _Buffer_): always pass either type. Use `Atok#setEncoding()` when using strings (default=utf-8).
* `Atok#addRule(pattern...type)`
- * `pattern` (_String_ | _Buffer_): Buffers can be used instead of strings, except for {start,end} and {firstOf} patterns.
+ * `pattern` (_String_ | _Buffer_): Buffers can be used instead of strings, except for {start,end} and {firstOf} subrules.
* Behaviour changes:
* Subrules defined as a Number now do not apply the following ones to the matched token. Use another Atok instance to emulate previous behaviour.
* {firstOf} subrules cannot be in first position. Use addRule('', {firstOf}) instead.
- * Custom subrules returning 0 __must__ set continue() properly to avoid potential infinite loops.
- * All rules are cleared after a saveRuleSet()
+ * Function subrules returning 0 __must__ set continue() properly to avoid potential infinite loops.
+ * currentRule property is now a method
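
The last behaviour change above affects any caller that read `currentRule` as a property. The sketch below uses a stand-in object (hypothetical, not the real Atok class) whose `currentRule()` body is copied from the method added in this commit; the name `tokenizer` and the way `_firstRule` is populated are assumptions made for illustration only.

```javascript
// Stand-in object, NOT the real Atok class: it only mimics the
// currentRule() method as defined in this commit.
var tokenizer = {
  _firstRule: null,
  currentRule: function () {
    return this._firstRule ? this._firstRule.currentRule : null
  }
}

// Before any rule set is loaded there is no first rule
var before = tokenizer.currentRule()
console.log(before) // null

// Simulate what loading a rule set named 'main' would leave behind:
// the first rule carries the name of the rule set it belongs to
tokenizer._firstRule = { currentRule: 'main' }
var after = tokenizer.currentRule()
console.log(after) // 'main'

// 0.3.x style property access now yields the function itself,
// so comparisons such as `tokenizer.currentRule === 'main'` are always false
console.log(typeof tokenizer.currentRule) // 'function'
```

Code written against 0.3.x that compares `atok.currentRule` to a string would therefore need to call the method instead.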
0.3.2 / 2012-09-16
==================
@@ -3,24 +3,22 @@
## Overview
-Atok is a fast, easy and flexible tokenizer designed for use with [node.js](http://nodejs.org). It is based around the [Stream](http://nodejs.org/docs/latest/api/streams.html) concept and is implemented as a read/write one.
+Atok is a fast, easy and dynamic tokenizer designed for use with [node.js](http://nodejs.org). It is based around the [Stream](http://nodejs.org/docs/latest/api/streams.html) concept and is implemented as a read/write one.
It was originally inspired by [node-tokenizer](https://github.com/floby/node-tokenizer), but quickly grew into its own form as I wanted it to be RegExp agnostic so it could be used on node Buffer instances and more importantly *faster*.
Atok is built using [ekam](https://github.com/pierrec/node-ekam) as it abuses includes and dynamic method generation.
Atok is the foundation for the [atok-parser](https://github.com/pierrec/node-atok-parser), which provides the environment for quickly building efficient and easier to maintain parsers.
-This is a work in progress as Buffer data is still converted into String before being processed. Removing this drawback is planned for the next version (0.4.0).
-
## Core concepts
First let's see some definitions. In atok's terms:
* a `subrule` is an atomic check against the current data. It can be represented by a user defined function (rarely), a string or a number, or an array of those, as well as specific objects defining a range of values for instance (e.g. { start: 'a', end: 'z' } is equivalent to /[a-z]/ in RegExp)
* a `rule` is an __ordered__ combination of subrules. Each subrule is evaluated in order and if any fails, the whole rule is considered failed. If all of them are valid, then the handler supplied at rule instantiation is triggered, or if none was supplied, a data event is emitted instead.
-* a `ruleSet` is a list of `rules` that are saved under a given name. Using `ruleSets` is useful when writing a parser to break down its complexity into smaller, easier to solve chunks.
+* a `ruleSet` is a list of `rules` that are saved under a given name. Using `ruleSets` is useful when writing a parser to break down its complexity into smaller, easier to solve chunks. RuleSets can be created or altered __on the fly__ by any of its handlers.
* a `property` is an option applicable to the current rules being created.
* properties are set using their own methods. For instance, a `rule` may load a different `ruleSet` upon match using `next()`
* properties are defined before the rules they need to be applied to. E.g. atok.next('rules2').addRule(...)
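
The relationship between subrules and rules described above can be sketched with a toy model. This is not atok's implementation or API: the helpers `literal`, `range` and `rule` are hypothetical names introduced here, and the convention of returning the matched length (or -1 on failure) is an assumption chosen to keep the example short.

```javascript
// Toy model only: NOT atok's implementation or API.
// A subrule takes (data, offset) and returns the number of characters
// it matched, or -1 on failure.
function literal (str) {
  return function (data, offset) {
    return data.slice(offset, offset + str.length) === str ? str.length : -1
  }
}

// Range subrule: matches one character between start and end (inclusive),
// similar in spirit to { start: 'a', end: 'z' } or /[a-z]/
function range (start, end) {
  return function (data, offset) {
    var c = data[offset]
    return c !== undefined && c >= start && c <= end ? 1 : -1
  }
}

// A rule is an ordered list of subrules: every one must match, in order,
// each starting where the previous one stopped.
function rule (subrules) {
  return function (data, offset) {
    var pos = offset
    for (var i = 0; i < subrules.length; i++) {
      var n = subrules[i](data, pos)
      if (n < 0) return -1 // one failed subrule fails the whole rule
      pos += n
    }
    return pos - offset // total length matched by the rule
  }
}

var hashLetter = rule([ literal('#'), range('a', 'z') ])
console.log(hashLetter('#a1', 0)) // 2
console.log(hashLetter('#A1', 0)) // -1
```

In atok itself, a successful rule would trigger its handler or emit a data event rather than return a length; the sketch only illustrates the ordered, all-or-nothing evaluation.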
@@ -1,5 +1,10 @@
# TODO
+## 0.5.0
+
+* Compile rule sets to JS code
+
+
## 0.4.0
* looping rules optimization
@@ -45,7 +45,7 @@ function stringHandler (token, idx) {
}
function rawStringHandler (token, idx) {
addLine(idx)
- data[ data.length-1 ].push(token)
+ data[ data.length-1 ].push( token.toString() )
}
function emptyHandler (token, idx) {
addLine(idx)
@@ -55,7 +55,7 @@ function numberHandler (token, idx) {
addLine(idx)
var num = Number(token)
// Valid Number?
- data[ data.length-1 ].push( isFinite(num) ? num : token )
+ data[ data.length-1 ].push( isFinite(num) ? num : token.toString() )
}
// Define the main parser rules
@@ -20,22 +20,23 @@ module.exports = Rule
* @constructor
* @api private
*/
-function Rule (subrules, type, handler, atok) {
+function Rule (subrules, type, handler, props, groupProps, encoding) {
var self = this
var n = subrules.length
- this.atok = atok
- this.props = atok.getProps()
+ this.props = props
this.debug = false
// Used for cloning
this.subrules = subrules
// Required by Atok#_resolveRules
- this.group = atok._group
- this.groupStart = atok._groupStart
- this.groupEnd = atok._groupEnd
+ for (var p in groupProps)
+ this[p] = groupProps[p]
+ // this.group = atok._group
+ // this.groupStart = atok._groupStart
+ // this.groupEnd = atok._groupEnd
// Runtime values for continue props
this.continue = this.props.continue[0]
@@ -55,7 +56,7 @@ function Rule (subrules, type, handler, atok) {
// First subrule
var subrule = this.first = n > 0
- ? SubRule.firstSubRule( subrules[0], this.props, atok._encoding )
+ ? SubRule.firstSubRule( subrules[0], this.props, encoding )
// Special case: no rule given -> passthrough
: SubRule.emptySubRule
@@ -77,7 +78,7 @@ function Rule (subrules, type, handler, atok) {
var prev = subrule
// Many subrules or none
for (var i = 1; i < n; i++) {
- subrule = SubRule.SubRule( subrules[i], this.props, atok._encoding )
+ subrule = SubRule.SubRule( subrules[i], this.props, encoding )
prev.next = subrule
prev = subrule
if (this.length < subrule.length) this.length = subrule.length
@@ -121,9 +122,8 @@ function wrapDebug (rule, id, atok) {
return rule._test(buf, offset)
}
}
-Rule.prototype.setDebug = function (debug) {
+Rule.prototype.setDebug = function (debug, atok) {
var self = this
- var atok = this.atok
// Rule already in debug mode
if (this.debug === debug) return
@@ -173,15 +173,8 @@ Rule.prototype.setDebug = function (debug) {
*
* @api private
*/
-Rule.prototype.clone = function () {
- var self = this
- // Instantiate a dummy rule
- var rule = new Rule(this.subrules, this.type, this.handler, this.atok)
-
- // Overwrite its props
- Object.keys(self).forEach(function (k) {
- rule[k] = self[k]
- })
-
+Rule.prototype.clone = function (name) {
+ var rule = new Rule(this.subrules, this.type, this.handler, this.props, this)
+ rule.currentRule = name
return rule
-}
+}
@@ -444,6 +444,8 @@ exports.firstSubRule = function (rule, props, encoding) {
if (rule === null || rule === undefined)
throw new Error('Tokenizer#addRule: Invalid rule ' + rule + ' (function/string/integer/array only)')
+ // var loop = props.ignore && props.continue[0] === -1 && !props.next[0] ? '_loop' : ''
+ // var type = typeOf(rule) + loop
var type = typeOf(rule)
switch (type) {
@@ -94,8 +94,6 @@ function Atok (options) {
this._groupStartPrev = []
- // this.currentRule = { get: function (ruleSet) { return this._firstRule.currentRule }, set: function () { throw new Error('Atok: Cannot set currentRule') } } // Name of the current rule
- this.currentRule = null // Name of the current rule
this._rules = [] // Rules to be checked against
this._defaultHandler = null // Matched token default handler
this._savedRules = {} // Saved rules
@@ -122,6 +120,10 @@ function Atok (options) {
}
inherits(Atok, EV, Stream.prototype)
+// Atok.prototype.__defineGetter__('currentRule', function () {
+// return this._firstRule ? this._firstRule.currentRule : null
+// })
+
Atok.prototype._error = function (err) {
this.readable = false
this.writable = false
@@ -161,8 +163,6 @@ Atok.prototype.clear = function (keepRules) {
if (!keepRules) {
- // this.currentRule = { get: function (ruleSet) { return this._firstRule.currentRule }, set: function () { throw new Error('Atok: Cannot set currentRule') } } // Name of the current rule
- this.currentRule = null // Name of the current rule
this._rules = [] // Rules to be checked against
this._defaultHandler = null // Matched token default handler
this._savedRules = {} // Saved rules
@@ -182,14 +182,6 @@ Atok.prototype.clear = function (keepRules) {
* @api public
*/
Atok.prototype.slice = function (start, end) {
- // switch (arguments.length) {
- // case 0:
- // start = this.offset
- // case 1:
- // end = this.length
- // }
-
- // return this.buffer.substr(start, end - start)
return this.buffer.slice(start, end)
}
/**
@@ -241,12 +233,12 @@ Atok.prototype.debug = function (flag) {
this.debugMode = _debug
// Apply debug mode to all defined rules...
+ var self = this
this._rulesForEach(function (rule) {
- rule.setDebug(_debug)
+ rule.setDebug(_debug, self)
})
// Apply debug mode to some methods
- var self = this
;[ 'loadRuleSet' ].forEach(function (method) {
if (_debug) {
var prevMethod = self[method]
@@ -277,7 +269,15 @@ Atok.prototype._rulesForEach = function (fn) {
saved[ruleSet].rules.forEach(fn)
})
}
-// include("methods_ruleprops.js")
+/**
+ * Get the current rule set name
+ *
+ * @return {String} rule set name
+ * @api public
+ */
+Atok.prototype.currentRule = function () {
+ return this._firstRule ? this._firstRule.currentRule : null
+}// include("methods_ruleprops.js")
/**
* Set the default handler.
* Triggered on all subsequently defined rules if the handler is not supplied
@@ -611,15 +611,22 @@ Atok.prototype.addRule = function (/*rule1, rule2, ... type|handler*/) {
if ( first === 0 )
this._error( new Error('Atok#addRule: invalid first subrule, must be > 0') )
- else
+ else {
+ var groupProps = Object.create(null)
+ groupProps.group = this._group
+ groupProps.groupStart = this._groupStart
+ groupProps.groupEnd = this._groupEnd
this._rules.push(
new Rule(
args
, type
, handler
- , this
+ , this.getProps()
+ , groupProps
+ , this._encoding
)
)
+ }
this._rulesToResolve = true
@@ -653,9 +660,9 @@ Atok.prototype.removeRule = function (/* name ... */) {
*/
Atok.prototype.clearRule = function () {
this.clearProps()
+ this._firstRule = null
this._rules = []
this._defaultHandler = null
- this.currentRule = null
this._rulesToResolve = false
return this
@@ -671,18 +678,15 @@ Atok.prototype.saveRuleSet = function (name) {
if (arguments.length === 0 || name === null)
return this._error( new Error('Atok#saveRuleSet: invalid rule name supplied') )
- this.currentRule = name
this._savedRules[name] = {
- rules: this._rules.slice() // Make sure to make a copy of the list
- // Clone the rules
- // .map(function (rule) { return rule.clone() })
- // Assign the current rule set name
- .map(function (rule) { rule.currentRule = name; return rule })
+ rules: this._rules
+ .map(function (rule) { // Clone and assign the current rule set name
+ return rule.clone(name)
+ })
}
// Resolve and check continues
this._resolveRules(name)
- this.clearRule()
return this
}
@@ -701,9 +705,7 @@ Atok.prototype.loadRuleSet = function (name, index) {
index = typeof index === 'number' ? index : 0
- this.currentRule = name
this._rules = ruleSet.rules
- this._rulesToResolve = false
// Set the rule index
this._firstRule = this._rules[index]
this._resetRule = true
@@ -719,8 +721,6 @@ Atok.prototype.loadRuleSet = function (name, index) {
*/
Atok.prototype.removeRuleSet = function (name) {
delete this._savedRules[name]
- // Make sure no reference to the rule set exists
- if (this.currentRule === name) this.currentRule = null
return this
}
@@ -740,10 +740,9 @@ Atok.prototype._resolveRules = function (name) {
var self = this
// Check and set the continue values
var rules = name ? this._savedRules[name].rules : this._rules
- var groupStartPrev = this._groupStartPrev
function getErrorData (i) {
- return ( self.currentRule ? '@' + self.currentRule : ' ' )
+ return ( self.currentRule() ? '@' + self.currentRule() : ' ' )
+ (arguments.length > 0
? '[' + i + ']'
: ''
@@ -1133,25 +1132,25 @@ Atok.prototype._tokenize = function () {
p = this._firstRule
this._resetRule = false
- while ( p !== null && this.offset < this.length ) {
+ while ( p && this.offset < this.length ) {
props = p.props
// Return the size of the matched data (0 is valid!)
- //TODO matched = p.first.test(this.buffer, this.offset) - this.offset
matched = p.test(this.buffer, this.offset)
if ( matched < 0 ) {
- p = p.nextFail
+ // End of the rule set, end the loop
+ if (!p.nextFail) break
// Next rule exists, carry on
- if (p) continue
-
- // End of the rule set, end the loop
- break
+ p = p.nextFail
+ continue
}
// Is the token to be processed?
- if ( !props.ignore ) {
+ if ( props.ignore ) {
+ p = p.next
+ } else {
// Emit the data by default, unless the handler is set
token = props.quiet
? matched - (p.single ? 0 : p.last.length) - p.first.length
@@ -1173,12 +1172,8 @@ Atok.prototype._tokenize = function () {
} else {
p = p.next
}
- // p = this._resetRule ? this._firstRule : p.next
- } else {
- p = p.next
}
-
this.offset += matched
// NB. `break()` prevails over `pause()`
@@ -1193,7 +1188,7 @@ Atok.prototype._tokenize = function () {
}
// Keep track of the rule we are at
- this._firstRule = p || this._firstRule
+ if (p) this._firstRule = p
// Truncate the buffer if possible: min(offset, markedOffset)
if (this.markedOffset < 0) {
@@ -18,8 +18,6 @@
//if(keepRules)
if (!keepRules) {
//endif
- // this.currentRule = { get: function (ruleSet) { return this._firstRule.currentRule }, set: function () { throw new Error('Atok: Cannot set currentRule') } } // Name of the current rule
- this.currentRule = null // Name of the current rule
this._rules = [] // Rules to be checked against
this._defaultHandler = null // Matched token default handler
this._savedRules = {} // Saved rules