This repository has been archived by the owner on Sep 20, 2021. It is now read-only.

Ability to parse a "blob" (sequence of bytes) which length is defined by a previous token #68

Closed
Savageman opened this issue Apr 24, 2017 · 8 comments

Comments

@Savageman
Member

Savageman commented Apr 24, 2017

For example 5 abcde
would be parsed into

token(number, 5)
token(fixedlengthstring, abcde)

with the constraint that the length of the 2nd string token "abcde" is equal to the value of the first number token 5.
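To make the constraint concrete, here is a minimal hand-written sketch in Python. It is not Hoa/Compiler code; the function name `tokenize_length_prefixed` is made up for illustration, and it assumes a decimal length followed by exactly one space.

```python
import re


def tokenize_length_prefixed(text):
    """Split '<n> <payload>' into a number token and a string token of exactly n characters.

    Returns (tokens, remaining_text). Hypothetical helper, for illustration only.
    """
    match = re.match(r"(\d+) ", text)
    if match is None:
        raise ValueError("expected a decimal length followed by a space")

    length = int(match.group(1))
    start = match.end()
    payload = text[start:start + length]

    # The value of the first token decides where the second token ends.
    if len(payload) != length:
        raise ValueError(f"expected {length} characters, found {len(payload)}")

    tokens = [("number", length), ("fixedlengthstring", payload)]
    return tokens, text[start + length:]
```

The point is that the lexer cannot decide where the second token ends without first reading the value of the first one.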

Here are 3 examples that would need this feature (maybe the Compiler is not the right tool for the job, in which case this issue can be closed):

1️⃣ Parsing the response of an IMAP FETCH command:

* 1 FETCH (BODY[HEADER]<0> {100}
The first 100 byte literals of the headers would be here)

In this example, the size of the next data token is included not far before the data to parse.
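A rough Python sketch of reading such a literal by hand, assuming the `{n}` marker is followed by CRLF and then exactly `n` bytes. The helper name `read_imap_literal` is hypothetical, and the sketch naively searches for the first `{n}` marker without handling quoted strings.

```python
import re

# '{n}' followed by CRLF announces a literal of n bytes.
LITERAL_HEADER = re.compile(rb"\{(\d+)\}\r\n")


def read_imap_literal(data, pos=0):
    """Find the next '{n}CRLF' marker at or after pos.

    Returns (literal_bytes, position_after_literal). Illustration only.
    """
    match = LITERAL_HEADER.search(data, pos)
    if match is None:
        raise ValueError("no literal marker found")

    size = int(match.group(1))
    start = match.end()
    end = start + size
    if end > len(data):
        raise ValueError("input ends before the literal is complete")

    return data[start:end], end
```

For instance, on `b"* 1 FETCH (BODY[HEADER]<0> {5}\r\nhello)\r\n"` this returns the 5 bytes `b"hello"` and leaves the position on the closing parenthesis.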

2️⃣ Creating an AST from a PDF file:

5 0 obj
<< /Length 42 >>
stream
This (possibly encoded) stream contains 42 bytes of dataendstream
endobj

This one is a bit trickier since the size of the data stream (42 bytes) is contained in the previous "Dictionary Node" (from an AST point of view), thus requiring knowledge of previous nodes already emitted.
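A simplified Python sketch of that dependency, assuming `/Length` is a direct integer in the dictionary (not an indirect reference such as `42 0 R`) and that the `stream` keyword is followed by a line break. The function name `read_pdf_stream` is hypothetical and this is nowhere near a full PDF parser.

```python
import re

# Direct integer only: reject the indirect-reference form '/Length 42 0 R'.
LENGTH_ENTRY = re.compile(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R\b)")
STREAM_KEYWORD = re.compile(rb"stream\r?\n")


def read_pdf_stream(obj_bytes):
    """Return the raw stream bytes of a single PDF object. Illustration only."""
    length_match = LENGTH_ENTRY.search(obj_bytes)
    if length_match is None:
        raise ValueError("no direct /Length entry found")
    length = int(length_match.group(1))

    # The value read from the dictionary decides how many bytes to consume.
    stream_match = STREAM_KEYWORD.search(obj_bytes, length_match.end())
    if stream_match is None:
        raise ValueError("no stream keyword found after the dictionary")

    start = stream_match.end()
    data = obj_bytes[start:start + length]
    if len(data) != length:
        raise ValueError("input ends before the stream is complete")

    # After exactly `length` bytes, only optional whitespace may precede endstream.
    if re.match(rb"\s*endstream", obj_bytes[start + length:]) is None:
        raise ValueError("endstream not found where /Length says it should be")
    return data
```

The stream body is never scanned for a terminator; its end is computed from a value that appeared in an earlier node.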

3️⃣ Other protocols use this as well:

  • the Content-Length header (HTTP), same as 2️⃣
  • the payload length in a WebSocket frame, though this one is even more challenging because it would additionally require reading a streamed input
@jubianchi
Member

jubianchi commented Apr 24, 2017

I would add such a constraint on the visitor side...

Since Hoa/Compiler does not allow executable code in grammars (unlike JavaCC, for example), you will have to define and check such constraints when traversing your AST.

@Grummfy
Member

Grummfy commented Apr 24, 2017

I'm interested too. Many files provided by banks are based on this kind of format.

The purpose here is to have a way to describe this kind of format without developing a lot of custom code.

@Savageman
Member Author

@jubianchi You mean some post-processing (from the Compiler POV) that takes an AST as input and returns another "fixed" AST?

@Hywan
Member

Hywan commented Apr 24, 2017

Hello,

Unfortunately, this is not possible with such parsers. You might want to either look for a parser generator (like nom), or add this constraint in a visitor, but the latter is not ideal at all, and it covers only a few use cases.

@Grummfy
Member

Grummfy commented Apr 24, 2017

Hywan, well, perhaps it could evolve into something different or be another lib based on the compiler.

@jubianchi
Member

takes an AST as input and returns another "fixed" AST?

I mean a visitor which validates an AST: it takes an AST as input, validates it, and returns the exact same AST if every constraint is satisfied.
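As a rough illustration of that idea, here is a Python sketch of a validating pass. It does not use the Hoa/Compiler visitor API; the `Node` class and `validate_lengths` function are hypothetical stand-ins, and the node kinds `number` and `fixedlengthstring` are borrowed from the example in the issue description.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal stand-in for an AST node (not a Hoa/Compiler class)."""
    kind: str
    value: object = None
    children: list = field(default_factory=list)


def validate_lengths(node):
    """Check that every fixedlengthstring has the length given by the number before it.

    Returns the very same node when all constraints hold, raises ValueError otherwise.
    """
    previous = None
    for child in node.children:
        if child.kind == "fixedlengthstring":
            if previous is None or previous.kind != "number":
                raise ValueError("fixedlengthstring must directly follow a number")
            if len(child.value) != previous.value:
                raise ValueError(
                    f"expected {previous.value} characters, found {len(child.value)}"
                )
        validate_lengths(child)  # recurse into nested nodes
        previous = child
    return node
```

Note that a pass like this only checks the constraint after the fact. The lexer still has to split the tokens at the right place on its own, which it cannot do for arbitrary binary data.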

@Hywan
Member

Hywan commented Apr 25, 2017

@Grummfy We could create a parser generator and add it to hoa/compiler, but making it efficient would not be trivial. Parser generators are slow and safe, or fast and unsafe, but rarely both, except in some languages like Rust or Haskell. I have a plan to add one parser of this kind to hoa/compiler, but it requires several other libraries…

@Savageman
Member Author

Thank you for the answer. :) I'm closing the issue since it's out of scope for this library.

@ghost ghost removed the in progress label May 11, 2017
SerafimArts added a commit to railt/compiler that referenced this issue Jan 6, 2018