Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide ability to reference underlying source from a returned JParsec object #5

Closed
jasons2000 opened this issue Jun 16, 2013 · 20 comments

Comments

@jasons2000
Copy link

I would like to produce an annotated version of an input source file. To be able to do this one needs the ability to reference the original source, either as a "image" ie String or through pointers to the original source, perhaps using two Location objects,begin and end. It should include any white spaces too.

Does this sound reasonable?

@abailly
Copy link
Contributor

abailly commented Jun 16, 2013

Definitely reasonable! Will have a look at it tomorrow (2013-06-17) and assess complexity of needed devs.

@jasons2000
Copy link
Author

Any update?
On Jun 16, 2013 9:41 PM, "Arnaud Bailly" notifications@github.com wrote:

Definitely reasonable! Will have a look at it tomorrow (2013-06-17) and
assess complexity of needed devs.


Reply to this email directly or view it on GitHubhttps://github.com//issues/5#issuecomment-19518720
.

@fluentfuture
Copy link
Contributor

So once you have a Parser that returns the fully parsed object, parser.source() is a parser that returns the original source string. Is that useful?

@jasons2000
Copy link
Author

Yes, sounds like it could be useful. Is it easy to to implement?

@abailly
Copy link
Contributor

abailly commented Jun 20, 2013

I wonder whether you can retrieve both an Object and the source of the object using the same Parser object? I will try to build an example using the proposed technique and push it to the repo.

@jasons2000
Copy link
Author

looking forward to seeing the results.

@fluentfuture
Copy link
Contributor

source() is available in v2.0: http://jparsec.codehaus.org/jparsec2/api/org/codehaus/jparsec/Parser.html#source()

Though to get both the object and the underlying source, maybe try token():

http://jparsec.codehaus.org/jparsec2/api/org/codehaus/jparsec/Parser.html#token()

Token has the value, and the index and length in the original source.

@abailly
Copy link
Contributor

abailly commented Jun 20, 2013

Actually, I really don't see how this could work as is. Method source() returns a Parser<String> on which one needs to call parse(CharSequence) to extract a string result. But to construct the AST and its objects, one needs a Parser or equivalent depending on structure of the AST, and calling parse() on it.

But working on it I had a simpler idea than I initially thought: Provide a method Parser.locate() that would annotate the Parser's result, provided it implements a specific interface, eg. Locatable. Here is a sample use: https://gist.github.com/abailly/5826206

WDYT ?

abailly pushed a commit that referenced this issue Jun 20, 2013
* Locatable interface is called by successful parser when implemented by result
@jasons2000
Copy link
Author

hmm, does that mean adding the same interface to every parser object, as an
end user client?

On Thu, Jun 20, 2013 at 9:15 PM, Arnaud Bailly notifications@github.comwrote:

Actually, I really don't see how this could work as is. Method source()returns a
Parser on which one needs to call parse(CharSequence) to extract
a string result. But to construct the AST and its objects, one needs a
Parser or equivalent depending on structure of the AST, and calling
parse() on it.

But working on it I had a simpler idea than I initially thought: Provide a
method Parser.locate() that would annotate the Parser's result, provided
it implements a specific interface, eg. Locatable. Here is a sample use:
https://gist.github.com/abailly/5826206

WDYT ?


Reply to this email directly or view it on GitHubhttps://github.com//issues/5#issuecomment-19780118
.

@fluentfuture
Copy link
Contributor

So token() may feel a bit crude but it seems to work if you already have access to the source at parser creation time:

final String source = ...;

parser.token().map(token-> {
  return new AnnotatedFoo((Foo) token.getValue(), source.substring(token.getIndex(), token.length()));
});

@fluentfuture
Copy link
Contributor

Some more alternatives that're supported in v2:

  • If you can live with [begin, end) indices in the source, instead of the actual substring, you could do:
Mapper.curry(LocationAnnotated.class).sequence(Parsers.INDEX, parser, Parsers.INDEX);

public class LocationAnnotated<T> {
  public LocationAnnotated(int begin, T value, int end) {...}
}
  • If you don't mind the cost of parsing over the same source twice, you can do:
Parser<SourceAnnotated<T>> annotateWithSource(Parser<T> parser) {
  return Mapper.curry(SourceAnnotated.class).sequence(parser.source().peek(), parser);
}

public class SourceAnnotated<T> {
  public SourceAnnotated(String source, T value) {...}
}

@abailly
Copy link
Contributor

abailly commented Jun 21, 2013

@jasons2000 Yes, you would need to add the interface to each node you want to make Locatable.

@fluentfuture Quite a few different options, indeed! Looks like I still have a lot to learn about the capabilities of jparsec :-)

I have committed a small piece of code implementing the Locatable idiom so that you may try it. Let us know which solution suits you best. I will add a small section on the wiki to document this feature which might interest other people.

@jasons2000
Copy link
Author

thanks for you help guys, I'm going to take a look at the weekend

@fluentfuture
Copy link
Contributor

Arnaud, I have concerns over the Locatable approach. It's a side-effect based solution while jparsec has been immutable and functional so far.

I'm afraid introducing side-effect could cause surprising or confusing behavior or even bugs. Here's one use case I was thinking:

class Foo implements Locatable {...}

Parser<Foo> fooParser = (...).peek();

The 'fooParser' will return foo, but its source will be empty, counter-intuitively.

sequence(INDEX, parser.peek(), INDEX)

has the same effect. But at least it's obvious to code reader that the INDEX is obtained after peek().

I'm also unsure whether Locatable will make it more difficult to implement incremental parser.

WDYT?

@abailly
Copy link
Contributor

abailly commented Jun 25, 2013

Ben,
Sorry for delayed reply I was quite busy today.

I hear and understand your concerns. I initially was not very confident with this solution and now I see it is probably because it somehow feels clunky with respect to the rest of jparsec. However, I was guided by the fact it does not clutter much the parser's grammar and decouples the expression of the language we are parsing from the technical details of how it is constructed. Using INDEX parser is IM(NS)HO more intrusive.

What about something like:

public class Parser<T> {
  public Parser<Locatable<T>> locate() {
     return new LocatableParser(this); 
   }
}
Parser<Locatable<Foo>> fooParser = (...).locate();

Instead of relying on a side-effect, return a wrapped object from which you can retrieve both the location information and the successful parser? Locatable would be a functor then, providing common mapping operations to allow composition inside a locatable while preserving the structure of the parse tree.

Another possibility would be to augment the Token object with the source, thus alleviating the need for maintaining a secondary source.

Both solutions would better fit an incremental parser.

Anyhow, I do not take it as personal offense to be contradicted by the creator of the library so feel free to (positively) criticize what I do, this is ok with me. I will revert the current Locatable and move it to a branch, which I should have done in the first place anyway.

@fluentfuture
Copy link
Contributor

Thanks for continuing the brainstorming!

On Jun 25, 2013, at 12:09 PM, Arnaud Bailly notifications@github.com wrote:

Ben,
Sorry for delayed reply I was quite busy today.

I hear and understand your concerns. I initially was not very confident with this solution and now I see it is probably because it somehow feels clunky with respect to the rest of jparsec. However, I was guided by the fact it does not clutter much the parser's grammar and decouples the expression of the language we are parsing from the technical details of how it is constructed. Using INDEX parser is IM(NS)HO more intrusive.

I suppose by "intrusive", you meant "less convenient" because intrusive can also mean that the user code has to implement a framework specified interface, in which sense, implementing Locatable is more intrusive because user domain code would then need to implement a jparsec-specific interface, as opposed to being pojo.

Using INDEX isn't very convenient since if user wants to track index for all of his tree nodes, he'll have to have a lot of Parser<LocationAnnotated> and every of his parser would need to be wrapped in annotateWithLocation().

So yeah, I definitely agree that using INDEX or an annotateWithLocation() helper leaves much to be desired. I was just not sure how strong the demand is and was hoping to learn from actual use cases before trying to think about the API.

I'm very curious about what jsons2000 find out in his use case, since it's the first time we hear about it.

What about something like:

public class Parser {
public Parser<Locatable> locate() {
return new LocatableParser(this);
}
}
Parser<Locatable> fooParser = (...).locate();

locate() is almost similar to the annotateWithLocation() method, with 2 main differences:

  1. parser.locate() feels easier to use than annotateWithLocation(parser).
  2. locate() also offers the String.

As I said above, I'm still not very sure how much this is needed to justify a core API addition vs. a small helper in user's code. It's not clear to me either whether user code really needs the source string in addition to the indices.

Jason's original question was: "either as a "image" ie String or through pointers to the original source, perhaps using two Location objects,begin and end". So it sounded like he's fine without the actual "image".

Creating a lot of substring objects may not be the most efficient if most clients end up only using the indices anyway.

So, I guess what I'm saying is: I'd like to learn a bit more about the requirement/use-case.

Instead of relying on a side-effect, return a wrapped object from which you can retrieve both the location information and the successful parser? Locatable would be a functor then, providing common mapping operations to allow composition inside a locatable while preserving the structure of the parse tree.

Another possibility would be to augment the Token object with the source, thus alleviating the need for maintaining a secondary source.

Both solutions would better fit an incremental parser.

Anyhow, I do not take it as personal offense to be contradicted by the creator of the library so feel free to (positively) criticize what I do, this is ok with me. I will revert the current Locatable and move it to a branch, which I should have done in the first place anyway.

Again, thank you!


Reply to this email directly or view it on GitHub.

abailly added a commit that referenced this issue Jun 26, 2013
@abailly
Copy link
Contributor

abailly commented Jun 26, 2013

You are right, "less convenient" suits better what I wanted to say than "more intrusive": Forcing implementing an interface is actually quite intrusive! About the "alternate" locate() implementation I proposed, you are also right, this is very close to your proposal above. I overlooked it when I replied.

And finally yes, let's wait about @jasons2000's feedback.

@jasons2000
Copy link
Author

HI Guys, I didn't get the chance to look at this at the weekend as planned,
but i should get the chance this coming weekend.

I've been following the debate keenly, and agree that the original proposal
seemed very intrusive.

I'd expect to be able to walk the parse tree and print of the tokens as I
go along, with out impacting the core grammar definition too much.

Should be able to give you good feedback by the weekend.

Jason

You are right, "less convenient" suits better what I wanted to say than
"more intrusive": Forcing implementing an interface is actually quite
intrusive! About the "alternate" locate() implementation I proposed, you
are also right, this is very close to your proposal above. I overlooked it
when I replied.

And finally yes, let's wait about @jasons2000https://github.com/jasons2000's
feedback.


Reply to this email directly or view it on
GitHubhttps://github.com//issues/5#issuecomment-20028983
.

abailly pushed a commit that referenced this issue Jun 30, 2013
* Locatable interface is called by successful parser when implemented by result
@abailly
Copy link
Contributor

abailly commented Aug 12, 2013

@jasons2000 Did you manage to test one or another solution? Otherwise, I think we can close the issue for now and move on to as there seems to exist quite a few solutions to retrieve this information using current framework's implementation.

I will update the documentation to reflect proposed solution that do not impact existing codebase.

@jasons2000
Copy link
Author

haven't had a chance t look at this in detail, but I will be getting to it
at some point.

Thanks for your help again

On 12 August 2013 13:21, Arnaud Bailly notifications@github.com wrote:

@jasons2000 https://github.com/jasons2000 Did you manage to test one or
another solution? Otherwise, I think we can close the issue for now and
move on to as there seems to exist quite a few solutions to retrieve this
information using current framework's implementation.

I will update the documentation to reflect proposed solution that do not
impact existing codebase.


Reply to this email directly or view it on GitHubhttps://github.com//issues/5#issuecomment-22489295
.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants