Provide ability to reference underlying source from a returned JParsec object #5

jasons2000 · 2013-06-16T18:04:29Z

I would like to produce an annotated version of an input source file. To do this, one needs the ability to reference the original source, either as an "image" (i.e. a String) or through pointers into the original source, perhaps using two Location objects, begin and end. It should include any whitespace too.

Does this sound reasonable?

abailly · 2013-06-16T20:41:46Z

Definitely reasonable! Will have a look at it tomorrow (2013-06-17) and assess the complexity of the development needed.

jasons2000 · 2013-06-20T16:35:33Z

Any update?

fluentfuture · 2013-06-20T16:48:08Z

So once you have a Parser that returns the fully parsed object, parser.source() is a parser that returns the original source string. Is that useful?

jasons2000 · 2013-06-20T18:43:30Z

Yes, sounds like it could be useful. Is it easy to implement?

abailly · 2013-06-20T19:20:19Z

I wonder whether you can retrieve both an Object and the source of the object using the same Parser object? I will try to build an example using the proposed technique and push it to the repo.

jasons2000 · 2013-06-20T19:40:06Z

looking forward to seeing the results.

fluentfuture · 2013-06-20T19:58:06Z

source() is available in v2.0: http://jparsec.codehaus.org/jparsec2/api/org/codehaus/jparsec/Parser.html#source()

Though to get both the object and the underlying source, maybe try token():

http://jparsec.codehaus.org/jparsec2/api/org/codehaus/jparsec/Parser.html#token()

Token has the value, and the index and length in the original source.
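To illustrate what such a token-like result carries, here is a minimal sketch in plain Java. `SpanToken` and `sourceIn` are hypothetical names invented for this example and are not part of jparsec; the sketch only assumes a value stored together with an index and a length, as described above.

```java
// Toy stand-in for a token: a parsed value plus its position in the source.
// Hypothetical class for illustration only; this is NOT jparsec's Token.
final class SpanToken<T> {
    final T value;
    final int index;   // offset of the first matched character
    final int length;  // number of matched characters

    SpanToken(T value, int index, int length) {
        this.value = value;
        this.index = index;
        this.length = length;
    }

    // String.substring takes (begin, end), so the end offset is index + length.
    String sourceIn(String source) {
        return source.substring(index, index + length);
    }

    public static void main(String[] args) {
        String source = "x = 42 ;";
        // The literal 42 starts at offset 4 and is 2 characters long.
        SpanToken<Integer> number = new SpanToken<>(42, 4, 2);
        System.out.println(number.sourceIn(source));
    }
}
```

With an index and a length, the original text can always be recovered from the source on demand, so the substring does not need to be stored in every node.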

abailly · 2013-06-20T20:15:12Z

Actually, I really don't see how this could work as is. Method source() returns a Parser<String> on which one needs to call parse(CharSequence) to extract a string result. But to construct the AST and its objects, one needs a Parser or equivalent, depending on the structure of the AST, and must call parse() on it.

But working on it I had a simpler idea than I initially thought: Provide a method Parser.locate() that would annotate the Parser's result, provided it implements a specific interface, eg. Locatable. Here is a sample use: https://gist.github.com/abailly/5826206

WDYT ?

jasons2000 · 2013-06-20T21:20:55Z

Hmm, does that mean that, as an end-user client, I would have to add the same interface to every parser object?

fluentfuture · 2013-06-20T21:35:18Z

So token() may feel a bit crude but it seems to work if you already have access to the source at parser creation time:

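final String source = ...;

// Token exposes index(), length() and value(); String.substring() takes (begin, end).
parser.token().map(token -> {
  int begin = token.index();
  return new AnnotatedFoo((Foo) token.value(), source.substring(begin, begin + token.length()));
});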

fluentfuture · 2013-06-21T03:38:19Z

Some more alternatives that're supported in v2:

If you can live with [begin, end) indices in the source, instead of the actual substring, you could do:

Mapper.curry(LocationAnnotated.class).sequence(Parsers.INDEX, parser, Parsers.INDEX);

public class LocationAnnotated<T> {
  public LocationAnnotated(int begin, T value, int end) {...}
}

If you don't mind the cost of parsing over the same source twice, you can do:

Parser<SourceAnnotated<T>> annotateWithSource(Parser<T> parser) {
  return Mapper.curry(SourceAnnotated.class).sequence(parser.source().peek(), parser);
}

public class SourceAnnotated<T> {
  public SourceAnnotated(String source, T value) {...}
}
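The [begin, end) idea can be sketched without jparsec at all. The following is a toy model, not the library's implementation: `Parser`, `Result`, `annotateWithLocation` and `INTEGER` are hypothetical names defined here for illustration, and the only assumption is that a parser starts at a known offset and reports the offset just past its match.

```java
final class IndexAnnotationSketch {
    /** A successful parse: the value plus the offset just past the match. */
    static final class Result<T> {
        final T value;
        final int next;
        Result(T value, int next) { this.value = value; this.next = next; }
    }

    /** Toy parser: reads 'input' starting at 'from'; returns null on failure. */
    interface Parser<T> {
        Result<T> parse(String input, int from);
    }

    /** A value together with the half-open range [begin, end) it came from. */
    static final class LocationAnnotated<T> {
        final int begin;
        final T value;
        final int end;
        LocationAnnotated(int begin, T value, int end) {
            this.begin = begin;
            this.value = value;
            this.end = end;
        }
    }

    /** Records the offsets before and after 'parser', like sequencing index, parser, index. */
    static <T> Parser<LocationAnnotated<T>> annotateWithLocation(Parser<T> parser) {
        return (input, from) -> {
            Result<T> r = parser.parse(input, from);
            if (r == null) return null;
            LocationAnnotated<T> located = new LocationAnnotated<T>(from, r.value, r.next);
            return new Result<LocationAnnotated<T>>(located, r.next);
        };
    }

    /** Example parser: one or more digits, producing an Integer. */
    static final Parser<Integer> INTEGER = (input, from) -> {
        int i = from;
        while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
        if (i == from) return null;
        return new Result<Integer>(Integer.parseInt(input.substring(from, i)), i);
    };

    public static void main(String[] args) {
        Result<LocationAnnotated<Integer>> r = annotateWithLocation(INTEGER).parse("ab123cd", 2);
        System.out.println(r.value.value + " at [" + r.value.begin + ", " + r.value.end + ")");
    }
}
```

The wrapped parser stays a pure function of its input: the location travels in the returned value rather than being written into the parsed object.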

abailly · 2013-06-21T07:25:23Z

@jasons2000 Yes, you would need to add the interface to each node you want to make Locatable.

@fluentfuture Quite a few different options, indeed! Looks like I still have a lot to learn about the capabilities of jparsec :-)

I have committed a small piece of code implementing the Locatable idiom so that you may try it. Let us know which solution suits you best. I will add a small section on the wiki to document this feature which might interest other people.

jasons2000 · 2013-06-21T07:39:22Z

Thanks for your help, guys. I'm going to take a look at the weekend.

fluentfuture · 2013-06-25T13:22:19Z

Arnaud, I have concerns over the Locatable approach. It's a side-effect-based solution, while jparsec has been immutable and functional so far.

I'm afraid introducing side effects could cause surprising or confusing behavior, or even bugs. Here's one use case I was thinking of:

class Foo implements Locatable {...}

Parser<Foo> fooParser = (...).peek();

The 'fooParser' will return foo, but its source will be empty, counter-intuitively.

sequence(INDEX, parser.peek(), INDEX)

has the same effect. But at least it's obvious to the reader of the code that the INDEX is obtained after peek().
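The peek() concern can be reproduced in a small toy model. This is only a sketch under the assumption that peek() runs a parser and then rewinds to the starting offset; `Parser`, `Result`, `peek`, `observedSource` and `WORD` are hypothetical names defined here, not jparsec's classes.

```java
final class PeekSketch {
    /** A successful parse: the value plus the offset just past the match. */
    static final class Result<T> {
        final T value;
        final int next;
        Result(T value, int next) { this.value = value; this.next = next; }
    }

    /** Toy parser: reads 'input' starting at 'from'; returns null on failure. */
    interface Parser<T> {
        Result<T> parse(String input, int from);
    }

    /** Lookahead: runs 'parser' but reports that no input was consumed. */
    static <T> Parser<T> peek(Parser<T> parser) {
        return (input, from) -> {
            Result<T> r = parser.parse(input, from);
            if (r == null) return null;
            return new Result<T>(r.value, from); // value kept, position rewound
        };
    }

    /** Source text between the offsets observed before and after 'parser'. */
    static <T> String observedSource(Parser<T> parser, String input, int from) {
        Result<T> r = parser.parse(input, from);
        if (r == null) return null;
        return input.substring(from, r.next);
    }

    /** Example parser: one or more letters. */
    static final Parser<String> WORD = (input, from) -> {
        int i = from;
        while (i < input.length() && Character.isLetter(input.charAt(i))) i++;
        if (i == from) return null;
        return new Result<String>(input.substring(from, i), i);
    };

    public static void main(String[] args) {
        System.out.println("[" + observedSource(WORD, "foo bar", 0) + "]");
        System.out.println("[" + observedSource(peek(WORD), "foo bar", 0) + "]");
    }
}
```

In this model the peeked parser still yields the value, but the range observed around it is empty, which is the counter-intuitive result described above.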

I'm also unsure whether Locatable will make it more difficult to implement an incremental parser.

WDYT?

abailly · 2013-06-25T17:09:18Z

Ben,
Sorry for the delayed reply; I was quite busy today.

I hear and understand your concerns. I initially was not very confident in this solution, and now I see it is probably because it feels somewhat clunky with respect to the rest of jparsec. However, I was guided by the fact that it does not clutter the parser's grammar much and decouples the expression of the language we are parsing from the technical details of how it is constructed. Using the INDEX parser is IM(NS)HO more intrusive.

What about something like:

public class Parser<T> {
  public Parser<Locatable<T>> locate() {
    return new LocatableParser<T>(this);
  }
}

Parser<Locatable<Foo>> fooParser = (...).locate();

Instead of relying on a side-effect, return a wrapped object from which you can retrieve both the location information and the successful parser? Locatable would be a functor then, providing common mapping operations to allow composition inside a locatable while preserving the structure of the parse tree.
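A wrapper of this kind might look like the sketch below. It is an assumption about the shape of the idea, not code from the repository: `Located`, `map` and `sourceIn` are hypothetical names, and the class is deliberately immutable so that no side effect on the parsed object is needed.

```java
import java.util.function.Function;

// Hypothetical immutable wrapper: a value plus the half-open source range
// [begin, end) it was parsed from. Not a jparsec class.
final class Located<T> {
    final T value;
    final int begin;
    final int end;

    Located(T value, int begin, int end) {
        this.value = value;
        this.begin = begin;
        this.end = end;
    }

    // Functor-style map: transforms the value and keeps the location unchanged.
    <R> Located<R> map(Function<? super T, ? extends R> f) {
        return new Located<R>(f.apply(value), begin, end);
    }

    // Recovers the original text only when it is actually asked for.
    String sourceIn(String source) {
        return source.substring(begin, end);
    }

    public static void main(String[] args) {
        String source = "x = 42 ;";
        Located<String> raw = new Located<>("42", 4, 6);
        Located<Integer> number = raw.map(Integer::parseInt);
        System.out.println(number.value + " from \"" + number.sourceIn(source) + "\"");
    }
}
```

Because `map` returns a new `Located`, domain classes can remain plain objects and the location survives later transformations of the value.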

Another possibility would be to augment the Token object with the source, thus alleviating the need for maintaining a secondary source.

Both solutions would better fit an incremental parser.

Anyhow, I do not take it as personal offense to be contradicted by the creator of the library so feel free to (positively) criticize what I do, this is ok with me. I will revert the current Locatable and move it to a branch, which I should have done in the first place anyway.

fluentfuture · 2013-06-26T01:56:51Z

Thanks for continuing the brainstorming!

I suppose by "intrusive" you meant "less convenient", because "intrusive" can also mean that user code has to implement a framework-specified interface. In that sense, implementing Locatable is the more intrusive option, because the user's domain code would then need to implement a jparsec-specific interface instead of being a POJO.

Using INDEX isn't very convenient: if a user wants to track the index for all of his tree nodes, he'll need a lot of Parser<LocationAnnotated>, and every one of his parsers would need to be wrapped in annotateWithLocation().

So yeah, I definitely agree that using INDEX or an annotateWithLocation() helper leaves much to be desired. I was just not sure how strong the demand is and was hoping to learn from actual use cases before trying to think about the API.

I'm very curious about what jasons2000 finds out in his use case, since it's the first time we've heard about it.

locate() is very similar to the annotateWithLocation() method, with two main differences:

parser.locate() feels easier to use than annotateWithLocation(parser).
locate() also offers the String.

As I said above, I'm still not very sure how much this is needed to justify a core API addition vs. a small helper in user's code. It's not clear to me either whether user code really needs the source string in addition to the indices.

Jason's original question was: "either as a "image" ie String or through pointers to the original source, perhaps using two Location objects,begin and end". So it sounded like he's fine without the actual "image".

Creating a lot of substring objects may not be the most efficient if most clients end up only using the indices anyway.

So, I guess what I'm saying is: I'd like to learn a bit more about the requirement/use-case.

Again, thank you!

abailly · 2013-06-26T06:17:00Z

You are right, "less convenient" suits what I wanted to say better than "more intrusive": forcing users to implement an interface is actually quite intrusive! About the "alternate" locate() implementation I proposed, you are also right: it is very close to your proposal above. I overlooked it when I replied.

And finally, yes, let's wait for @jasons2000's feedback.

jasons2000 · 2013-06-26T06:40:44Z

Hi guys, I didn't get the chance to look at this at the weekend as planned, but I should get the chance this coming weekend.

I've been following the debate keenly, and agree that the original proposal seemed very intrusive.

I'd expect to be able to walk the parse tree and print off the tokens as I go along, without impacting the core grammar definition too much.
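As a rough picture of that use case, the sketch below re-emits a source string with labelled ranges marked up, leaving all other text (whitespace included) untouched. It assumes the ranges were already collected while walking the parse tree; `Span` and `annotate` are hypothetical names for this example and not part of jparsec.

```java
import java.util.Arrays;
import java.util.List;

final class AnnotatedSourceSketch {
    /** A labelled half-open range [begin, end) in the source (hypothetical type). */
    static final class Span {
        final String label;
        final int begin;
        final int end;
        Span(String label, int begin, int end) {
            this.label = label;
            this.begin = begin;
            this.end = end;
        }
    }

    /**
     * Copies the source, wrapping each span as [label: text].
     * Spans must be sorted by 'begin' and must not overlap.
     */
    static String annotate(String source, List<Span> spans) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Span s : spans) {
            out.append(source, cursor, s.begin);   // untouched text before the span
            out.append('[').append(s.label).append(": ")
               .append(source, s.begin, s.end).append(']');
            cursor = s.end;
        }
        out.append(source, cursor, source.length()); // trailing text
        return out.toString();
    }

    public static void main(String[] args) {
        List<Span> spans = Arrays.asList(new Span("id", 0, 1), new Span("num", 5, 7));
        System.out.println(annotate("x  = 42", spans));
    }
}
```

Only begin and end offsets are needed for this; the substrings are taken from the original source at output time.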

Should be able to give you good feedback by the weekend.

Jason

abailly · 2013-08-12T12:21:33Z

@jasons2000 Did you manage to test one or another of the solutions? Otherwise, I think we can close the issue for now and move on, as there seem to be quite a few ways to retrieve this information using the framework's current implementation.

I will update the documentation to reflect the proposed solutions that do not impact the existing codebase.

jasons2000 · 2013-10-20T21:09:05Z

Haven't had a chance to look at this in detail, but I will be getting to it at some point.

Thanks for your help again

abailly pushed a commit that referenced this issue Jun 20, 2013

basic implementation of Locatable logic #5

d0b5a99

* Locatable interface is called by successful parser when implemented by result

abailly added a commit that referenced this issue Jun 26, 2013

Revert "basic implementation of Locatable logic #5"

9b2be3d

This reverts commit d0b5a99.

abailly pushed a commit that referenced this issue Jun 30, 2013

basic implementation of Locatable logic #5

c937694

* Locatable interface is called by successful parser when implemented by result

abailly mentioned this issue Aug 14, 2013

Add Logging Capabilities #12

Closed

abailly closed this as completed Dec 27, 2013

abailly mentioned this issue Feb 18, 2014

Parser.withSource() parser #16

Merged

scolomer mentioned this issue Jan 6, 2015

Listeners on Parser.map() #36

Closed
