
Have "Complex Tokens" that are length-independent... #3

Open
TooTallNate opened this Issue · 2 comments

2 participants

@TooTallNate

Would it be possible to create a custom complex token whose len property isn't known ahead of time?

I.e., I'm trying to make a CharDelimiter custom token that works similarly to String#split and returns a token when it finds a specified char (say, a space, " "):

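function CharDelimiter(delim) {
  var delimByte = delim.charCodeAt(0);
  this.get = function(buf, off) {
    for (var i=off, l=buf.length; i<l; i++) {
      if (buf[i] === delimByte) {
        // slice() excludes its end index, so this stops just before the delimiter
        return buf.slice(off, i);
      }
    }
    // Delimiter not found in this buffer: there is no way to say "wait"
  };
}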

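var EventEmitter = require('events').EventEmitter;
var strtok = require('strtok');

var stream = new EventEmitter();
strtok.parse(stream, function(v) {
  // v is undefined on the first call; afterwards it is the previous token's value
  if (v !== undefined) {
    console.error(v.toString());
  }
  return new CharDelimiter(" ");
});

stream.emit("data", Buffer.from("this is a test"));
  //-> this
  //-> is
  //-> a
  //-> test  ("test" has no trailing delimiter, so it needs end-of-stream handling)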

Given how strtok is currently set up, this might be hard to implement, and I realize it might be easier to do in a new module. But frankly, I expected some way to split a token on a given char or string to be built-in functionality, since that's roughly how the C version (strtok(3)) works.
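For comparison, the core of a standalone "separate module" version is small. This is a minimal sketch that uses nothing from strtok; `DelimiterSplitter` and its `push()`/`end()` methods are hypothetical names. It carries unconsumed bytes across `data` events, so a word split between two chunks is still emitted whole, and it flushes the tail on `end()`, which is what lets the final "test" come out.

```javascript
// Hypothetical standalone splitter (not part of strtok): buffers partial
// input across chunks and emits one Buffer per delimiter-terminated piece.
class DelimiterSplitter {
  constructor(delim, onToken) {
    this.delimByte = Buffer.from(delim)[0]; // single-byte delimiter assumed
    this.onToken = onToken;
    this.pending = Buffer.alloc(0);         // bytes not yet terminated
  }

  push(chunk) {
    const buf = Buffer.concat([this.pending, chunk]);
    let start = 0;
    for (let i = 0; i < buf.length; i++) {
      if (buf[i] === this.delimByte) {
        this.onToken(buf.subarray(start, i)); // exclude the delimiter itself
        start = i + 1;                        // skip past the delimiter
      }
    }
    // Keep the unterminated tail for the next chunk.
    this.pending = buf.subarray(start);
  }

  end() {
    // Flush whatever is left when the stream ends (e.g. the final word).
    if (this.pending.length > 0) this.onToken(this.pending);
    this.pending = Buffer.alloc(0);
  }
}

const words = [];
const splitter = new DelimiterSplitter(" ", (tok) => words.push(tok.toString()));
splitter.push(Buffer.from("this is a te"));
splitter.push(Buffer.from("st"));
splitter.end();
console.log(words); // [ 'this', 'is', 'a', 'test' ]
```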

I think it would work if the get function were allowed to return a special value (similar to DEFER; let's call it WAIT) meaning that the Buffer doesn't contain what the custom token is looking for (yet), and that get should be re-invoked after the next data event is received. Then my CharDelimiter function might look something like:

function CharDelimiter(delim) {
  var delimByte = delim.charCodeAt(0);
  this.get = function(buf, off) {
    for (var i=off, l=buf.length; i<l; i++) {
      if (buf[i] === delimByte) {
        // End index is exclusive: return the word without the delimiter
        return buf.slice(off, i);
      }
    }
    // If we didn't find the delimiter, wait until the next 'data' event
    return strtok.WAIT;
  };
}

What are your thoughts on an API like this? Maybe even reusing the DEFER object itself, since their meaning is similar.
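To make the WAIT idea concrete, here is a minimal sketch of a driver that honors such a sentinel. It is self-contained and does not use strtok: `WAIT`, `makeParser`, and the convention that `get()` records how many bytes it consumed in `this.len` are all assumptions. That last point is the awkward part of this design, since the parser needs both a value and a byte count from a single `get()` call.

```javascript
// Hypothetical sentinel; strtok has no WAIT export.
const WAIT = Symbol("WAIT");

// Assumption: get() stores the number of bytes to consume in this.len.
function CharDelimiter(delim) {
  const delimByte = delim.charCodeAt(0);
  this.len = 0;
  this.get = function (buf, off) {
    for (let i = off; i < buf.length; i++) {
      if (buf[i] === delimByte) {
        this.len = i - off + 1;        // consume the word plus the delimiter
        return buf.subarray(off, i);   // but return only the word
      }
    }
    return WAIT; // delimiter not seen yet; retry after the next chunk
  };
}

// Minimal driver: keeps unconsumed bytes and re-invokes get() on each chunk.
function makeParser(cb) {
  let pending = Buffer.alloc(0);
  let token = cb(undefined); // first call asks for the initial token
  return function onData(chunk) {
    pending = Buffer.concat([pending, chunk]);
    let off = 0;
    for (;;) {
      const value = token.get(pending, off);
      if (value === WAIT) break; // need more data
      off += token.len;
      token = cb(value);
    }
    pending = pending.subarray(off); // drop consumed bytes
  };
}

const out = [];
const onData = makeParser((v) => {
  if (v !== undefined) out.push(v.toString());
  return new CharDelimiter(" ");
});
onData(Buffer.from("this is a te"));
onData(Buffer.from("st "));
console.log(out); // [ 'this', 'is', 'a', 'test' ]
```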

Thanks in advance!

@pgriess
Owner

I've been holding off on functionality like this because it falls into the bucket of parsing ASCII streams, which has a whole slew of other problems and far better developer APIs (e.g. specifying tokens as regexes, allowing tok1 || tok2 specifiers, etc). I've kind of been thinking that handling ASCII streams would probably end up with its own API (and possibly separate module).

That said, ASCII streams are definitely something I'm interested in. If you're willing to work in a non-master branch, we can iterate on the current API towards something that works with ASCII streams. If so, I think we could change the 'len' property into a function that is passed the list of currently-known buffers and returns either a positive value (the real length) or a negative value (don't know yet).

What do you think?
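One way to picture `len` as a function is the minimal sketch below; it is not strtok's code. `makeParser`, `LengthPrefixed`, and the rule that the parser also waits while fewer than `len` bytes are buffered are assumptions. The example token is deliberately binary (a one-byte length prefix followed by a payload) to show a length that only becomes known once some data has arrived.

```javascript
// Hypothetical token: a one-byte length prefix followed by that many bytes.
function LengthPrefixed() {
  this.len = function (bufferList) {
    const first = bufferList.find((b) => b.length > 0);
    if (first === undefined) return -1; // prefix byte not seen yet
    return 1 + first[0];                // prefix + payload
  };
  this.get = function (buf, off, len) {
    return buf.subarray(off + 1, off + len); // payload only
  };
}

// Minimal driver: len(bufferList) < 0 means "don't know yet";
// otherwise wait until at least that many bytes are buffered.
function makeParser(cb) {
  let bufferList = [];
  let token = cb(undefined);
  return function onData(chunk) {
    bufferList.push(chunk);
    for (;;) {
      const len = token.len(bufferList);
      const buffered = bufferList.reduce((n, b) => n + b.length, 0);
      if (len < 0 || buffered < len) break; // wait for more data
      const buf = Buffer.concat(bufferList);
      const value = token.get(buf, 0, len);
      bufferList = [buf.subarray(len)];     // keep only unconsumed bytes
      token = cb(value);
    }
  };
}

const out = [];
const onData = makeParser((v) => {
  if (v !== undefined) out.push(v.toString());
  return new LengthPrefixed();
});
onData(Buffer.from([3, 0x61, 0x62]));       // "ab": payload incomplete
onData(Buffer.from([0x63, 2, 0x68, 0x69])); // "c", then a complete "hi"
console.log(out); // [ 'abc', 'hi' ]
```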

@TooTallNate

Turning len into a function would work as well. I kind of like that better! I'd recommend that we add a third, optional parameter to the get callback: the expected length, i.e. the value most recently returned by the len function.

Then I think my example CharDelimiter custom token would look like:

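function CharDelimiter(delim) {
  var delimByte = delim.charCodeAt(0);

  // Length of the next token *including* the delimiter, so the parser
  // advances past it (otherwise every later token would have length 0)
  this.len = function(bufferList) {
    var len = 0;
    for (var n=0; n < bufferList.length; n++) {
      for (var i=0; i < bufferList[n].length; i++) {
        len++;
        if (bufferList[n][i] === delimByte) {
          return len;
        }
      }
    }
    // If we didn't find the delimiter, wait until the next 'data' event
    return -1;
  };

  // len is the value last returned by this.len(); leave off the delimiter
  this.get = function(buf, off, len) {
    return buf.slice(off, off+len-1);
  };
}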
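To exercise this token end to end, here is a minimal sketch of a parser loop for the proposed API, including the third `len` argument to `get`. `makeParser` is hypothetical, not strtok's implementation; it assumes `bufferList` only ever holds unconsumed bytes, which is what lets `len()` count from the start of the list.

```javascript
// CharDelimiter under the proposed len(bufferList) / get(buf, off, len) API.
function CharDelimiter(delim) {
  const delimByte = delim.charCodeAt(0);
  this.len = function (bufferList) {
    let len = 0;
    for (const b of bufferList) {
      for (let i = 0; i < b.length; i++) {
        len++;
        if (b[i] === delimByte) return len; // word + delimiter
      }
    }
    return -1; // delimiter not seen yet
  };
  this.get = function (buf, off, len) {
    return buf.subarray(off, off + len - 1); // drop the trailing delimiter
  };
}

// Hypothetical driver: calls len() after each chunk, and get() with the
// expected length whenever len() reports a complete token.
function makeParser(cb) {
  let bufferList = [];
  let token = cb(undefined);
  return function onData(chunk) {
    bufferList.push(chunk);
    let len;
    while ((len = token.len(bufferList)) >= 0) {
      const buf = Buffer.concat(bufferList);
      const value = token.get(buf, 0, len); // third arg: expected length
      bufferList = [buf.subarray(len)];     // keep only unconsumed bytes
      token = cb(value);
    }
  };
}

const out = [];
const onData = makeParser((v) => {
  if (v !== undefined) out.push(v.toString());
  return new CharDelimiter(" ");
});
onData(Buffer.from("this is a te"));
onData(Buffer.from("st "));
console.log(out); // [ 'this', 'is', 'a', 'test' ]
```

Because `len()` here only returns a non-negative value once the delimiter is actually buffered, the driver never needs a separate "enough bytes yet?" check for this token.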

I had a hacked-together version of the API I originally proposed, but it still failed on some edge cases. I think I'll start over and take a look at this API, unless of course you get to it first.
