Import/include other grammars #38

Closed
ceymard opened this Issue Aug 15, 2011 · 32 comments

ceymard commented Aug 15, 2011

It could be extremely useful to have the ability to define grammars by importing rules from other grammars.

Several ideas:

@include "expression.pegjs"
(or @from "expression.pegjs" import expression)

tag_if
    = "if" space? expression space? { ... }

@import "expression.pegjs" as expr

tag_if
    = "if" space? expr.expression space?

Ideally, this would not regenerate the whole code in every .pegjs file that includes another; maybe we would have to slightly modify the behaviour of parse() to something like the following.

Edited as per what you were saying in the options issue:

parse(input, startRule)
->
parse(input, { startRule: "...", startPos : 9000 })

And at the end, if startPos != 0 && result !== null, we would not check that parsing reached input.length, but instead return the result as well as the endPos (I don't really know how to do that elegantly; maybe simply by modifying the options parameter?).
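
A minimal sketch of that calling convention, using a hand-written stand-in rather than a PEG.js-generated parser. The `parseInteger` rule, the `rules` table, and the `{ result, endPos }` return shape are all hypothetical; this is only one way the proposal could be read, not an existing PEG.js API.

```javascript
// Hypothetical rule function: matches one or more digits starting at `pos`.
// Returns { value, endPos } on success or null on failure.
function parseInteger(input, pos) {
  let end = pos;
  while (end < input.length && input[end] >= "0" && input[end] <= "9") {
    end++;
  }
  return end === pos
    ? null
    : { value: parseInt(input.slice(pos, end), 10), endPos: end };
}

// Hypothetical rule table standing in for a generated parser's rules.
const rules = { integer: parseInteger };

// Sketch of parse(input, { startRule, startPos }).
// With startPos === 0 the whole input must be consumed; otherwise the
// partial result is returned together with the position where it ended.
function parse(input, options = {}) {
  const startRule = options.startRule || "integer";
  const startPos = options.startPos || 0;
  const match = rules[startRule](input, startPos);
  if (match === null) {
    throw new SyntaxError(`Expected ${startRule} at position ${startPos}`);
  }
  if (startPos === 0 && match.endPos !== input.length) {
    throw new SyntaxError(`Unexpected input at position ${match.endPos}`);
  }
  return { result: match.value, endPos: match.endPos };
}
```

Returning a fresh object avoids mutating the caller's options, which is the part the comment above was unsure about; an including grammar could then resume its own parsing at `endPos`.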

It would allow reusability of grammars and modularisation of the code, which I think are two extremely important aspects of coding in general.

Contributor

dmajda commented Aug 20, 2011

I agree that this is an important feature. I want to do this after version 1.0.

(BTW I don't like the Python-like syntax you propose — something similar to Node.js's require would be better because it would be more familiar to JavaScript programmers. But this is a minor thing that can be ironed out later.)

ceymard commented Aug 20, 2011

Would you consider it for inclusion before 1.0 if provided with a patch?

I agree with your remark about the Python syntax.

s3u commented Oct 2, 2011

+1 for this feature

Contributor

dmajda commented Jan 10, 2012

@ceymard Yes, I would consider it.

+1 for the feature and +1 for require style inclusion

@dmajda @ceymard Do you have any thoughts already on how to implement this? I need this for a project at work and will try to implement it. The question is whether this should be just a way to split grammars into multiple files, or something like inheritance, so one could inherit all rules, for example, and then override specific rules in the new grammar.

Contributor

dmajda commented Feb 23, 2013

@dignifiedquire I am currently thinking about syntax & semantics that can probably be best explained by an example:

static-languages.pegjs

languages  = "C" / "C++" / "Java" / "C#"

dynamic-languages.pegjs

languages = "Ruby" / "Python" / "JavaScript"

all-languages.pegjs

static  = require("./static-languages")
dynamic = require("./dynamic-languages")

all = static.languages / dynamic.languages

Each .pegjs file would implicitly define a module that would export all the rules it contains. The <name> = require(<module>) construct would import such a module. Its rules would then be available inside a namespace.

This design is deliberately similar to Node.js. Using namespaces will avoid conflicts. There are two downsides I see:

  1. The <name> = require(<module>) construct is too similar to rule definitions and thus can be confusing (one might think that just one rule is imported).
  2. The . syntax conflicts with the current meaning of ., which is “any character”. This can be solved by ugly hacks (e.g. . surrounded by whitespace means “any character”, while . surrounded by identifiers separates a namespace name from a rule name) or by changing the syntax (e.g. using any keyword to represent “any character”).
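
To make the namespace semantics concrete, here is a rough sketch in plain JavaScript. The data model (rules as lists of literal alternatives or `{ ref }` objects, a `requires` table, the `expand` and `match` helpers) is an assumption made purely for illustration and is not PEG.js's internal representation.

```javascript
// Each "module" stands in for a .pegjs file: it maps rule names to a list
// of alternatives. Both modules define `languages` without any conflict,
// because each lives in its own namespace.
const modules = {
  "./static-languages": { languages: ["C", "C++", "Java", "C#"] },
  "./dynamic-languages": { languages: ["Ruby", "Python", "JavaScript"] },
};

// Stand-in for all-languages.pegjs: `requires` maps a namespace name to a
// module path; references use the "namespace.rule" form.
const allLanguages = {
  requires: { static: "./static-languages", dynamic: "./dynamic-languages" },
  rules: { all: [{ ref: "static.languages" }, { ref: "dynamic.languages" }] },
};

// Expand a rule into the flat list of literals it can match, following
// namespaced references through the `requires` table.
function expand(grammar, ruleName) {
  return grammar.rules[ruleName].flatMap((alt) => {
    if (typeof alt === "string") return [alt];
    const [namespace, rule] = alt.ref.split(".");
    const path = grammar.requires[namespace];
    if (!path || !modules[path] || !modules[path][rule]) {
      throw new Error(`Unknown rule reference: ${alt.ref}`);
    }
    return modules[path][rule];
  });
}

// Try each literal in order (PEG ordered choice) at the start of the input.
function match(grammar, ruleName, input) {
  return expand(grammar, ruleName).find((lit) => input.startsWith(lit)) || null;
}
```

Note that with ordered choice, `"C"` is tried before `"C++"`, so an input starting with `C++` matches just `C` in this sketch, exactly as the example grammar is written.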

@dmajda As the <identifier> = <expression> pattern is already taken by the rule definitions, why not do something like this:

static := require("./static-languages")
dynamic := require("./dynamic-languages")

all = static::languages / dynamic::languages

The :: is not used anywhere that I know of in PEG.js and makes it easy to distinguish between namespaces and other things. I'm not sure about the :=; it gets the point across but feels very foreign for JavaScript.

Also if you want to use namespaces, do you think there should be only one namespace per file or should there be a way of creating multiple namespaces in one file like this:

static := {
  languages  = "C" / "C++" / "Java" / "C#"
}

dynamic := {
  languages = "Ruby" / "Python" / "JavaScript"
}
Contributor

dmajda commented Feb 24, 2013

I'm not much of a fan of :: and :=; they look alien in the JavaScript/CoffeeScript world.

I'd also like to keep things simple and define namespaces implicitly only by requiring files. I don't see a big need for anything more complicated.

otac0n commented Mar 1, 2013

How about simply:

@require foo = "./foo"

bar = foo:languages

Colons are a compromise, but they are used to separate namespaces in many places: C++, C#, XML, etc.

: will always be associated with cons for many, many functional programmers. I suggest staying away from that operator. :: looks fine to me. Isn't that used for C++ namespaces? I'm not convinced yet that . is a bad choice, either.

otac0n commented Mar 1, 2013

. can't be used without a breaking change. It would be ambiguous in the language.

:: is used in C++ for namespaces, and in C# for namespace prefixes (global::System, for example).

Contributor

andreineculau commented May 24, 2013

I was thinking of a quick workaround on this topic - to solve simple inheritance only - glue pegjs files together, while having everything namespaced.

This might make grammars too verbose, and it involves a build step, but looking at the bright side, it would force you to have granular DRY&OTW grammars.

As for the markup, I'm not saying that this is a proper fit for this thread, just an option to consider: I was going for a simple __

languages = static__languages / dynamic__languages
<static-languages.pegjs>
<dynamic-languages.pegjs>
/* alternative */
languages = STATIC__languages / DYNAMIC__languages
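
A rough sketch of what such a glue step could look like. The `namespaceGrammar` and `glue` helpers are hypothetical and do not reflect how any existing tool works; the sketch assumes rule definitions start at the beginning of a line and only protects double-quoted literals (actions, character classes, and comments are not handled).

```javascript
// Prefix every rule defined in `source` with `<namespace>__`, so several
// grammars can be concatenated without name clashes.
function namespaceGrammar(namespace, source) {
  // Collect the names of all rules defined in this grammar.
  const defined = new Set();
  for (const m of source.matchAll(/^([A-Za-z_][A-Za-z0-9_]*)\s*=/gm)) {
    defined.add(m[1]);
  }
  // Split on double-quoted literals; odd-indexed parts are the literals
  // themselves and must be left untouched.
  const parts = source.split(/("(?:[^"\\]|\\.)*")/);
  return parts
    .map((part, i) =>
      i % 2 === 1
        ? part
        : part.replace(/[A-Za-z_][A-Za-z0-9_]*/g, (id) =>
            defined.has(id) ? `${namespace}__${id}` : id
          )
    )
    .join("");
}

// Concatenate the main grammar with its namespaced includes.
function glue(main, includes) {
  const namespaced = Object.entries(includes).map(([ns, src]) =>
    namespaceGrammar(ns, src)
  );
  return [main, ...namespaced].join("\n");
}
```

The main grammar is placed first so that its first rule remains the default start rule of the combined grammar.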

@andreineculau I'm basically already doing this with a build step, so if you and others are just looking for something to generate useful parsers from a grammar with a dependency tree (where a single parser implementing the combined grammar is generated), I might clean what I have up and release it so the discussion can refocus on how to deal with this in a more permanent way.

Another thing: approaching this primarily by designing extensions to the grammar syntax misses something important, which is that one of the main reasons we all have the itch to pull in rules from other grammars (another being clarity) is the need to write parsers that share a lot of logic. So, while generated parsers might never be meaningfully re-composable at parse-time, it seems important that a tree of grammars generate a tree of parsers, rather than one monolithic parser. It's most important when a set of parsers will be part of a web UI, but it generally doesn't hurt to avoid unnecessary bloat in generated code.

Contributor

andreineculau commented May 25, 2013

@odonnell +1 for releasing anything - no matter if you have the time to clean it up

and +1 for the clarification. This should be treated as a quick workaround, not a long-term proper solution.

Contributor

andreineculau commented May 26, 2013

@odonnell my take on it is online at https://github.com/andreineculau/core-pegjs - please poke me if you have something better.

cpettitt commented Sep 3, 2013

+1 for this feature

👍

👍

👍

I went and wrote a plugin/extension for PEG.js that does imports: https://github.com/casetext/pegjs-import.

yinso commented Oct 23, 2014

+1 for this as well.

Contributor

Mingun commented Feb 3, 2015

I implemented this in #308 in a generic way: inclusion of a grammar is only one way to implement decomposition of rules.

@dmajda dmajda changed the title from Import/Include other grammars to Import/include other grammars Aug 14, 2015

Mingun added a commit to Mingun/pegjs that referenced this issue Dec 27, 2015

Issue #38 (import feature): Implement base support for import in grammar and compiler. Import clause syntax:

```
@<alias> = <string with path to .pegjs file>
```
Import clauses are expected before the initializer code block.

Usage of imported rules:
- @<alias> -- use the default parse rule (implementation-defined; for example, the first rule in the imported grammar).
- @<alias>:<rule> -- use the specified parse rule.

Implementation:
- All import clauses appear in the `imports` property of the AST `grammar` node.
  This property contains an array of AST `import` nodes with properties `alias` and `path`.
- All AST `rule_ref` nodes now have a `namespace` property holding the alias of the imported
  grammar, or `null` if the rule is defined in the grammar itself. Also, if the `name` property is
  `null` and the `namespace` property is not, the default rule of the imported grammar is used.
- Support in the compiler -- skip some checks for references to imported rules.
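
As an illustration of the node shapes described in this commit message, here is a hand-written AST fragment and a small lookup helper. Only the `imports`, `alias`, `path`, `namespace`, and `name` properties come from the message; the example values and the `resolveRef` helper are assumptions, not code from the commit.

```javascript
// Hand-written fragment for a grammar that begins with:
//   @expr = "./expression.pegjs"
const grammar = {
  type: "grammar",
  imports: [{ type: "import", alias: "expr", path: "./expression.pegjs" }],
};

// The three kinds of rule reference described above.
const localRef = { type: "rule_ref", namespace: null, name: "space" };
const namedRef = { type: "rule_ref", namespace: "expr", name: "expression" };
const defaultRef = { type: "rule_ref", namespace: "expr", name: null };

// Hypothetical helper: work out which grammar file and rule a reference
// points at. `path: null` means the current grammar; `rule: null` means
// the imported grammar's default rule.
function resolveRef(grammarNode, ref) {
  if (ref.namespace === null) {
    return { path: null, rule: ref.name };
  }
  const imp = grammarNode.imports.find((i) => i.alias === ref.namespace);
  if (!imp) {
    throw new Error(`Unknown import alias: ${ref.namespace}`);
  }
  return { path: imp.path, rule: ref.name };
}
```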

Mingun added a commit to Mingun/pegjs that referenced this issue Dec 27, 2015

Mingun added a commit to Mingun/pegjs that referenced this issue Dec 27, 2015

Mingun added a commit to Mingun/pegjs that referenced this issue Jan 3, 2016

Mingun added a commit to Mingun/pegjs that referenced this issue Jan 3, 2016

Mingun added a commit to Mingun/pegjs that referenced this issue Jan 3, 2016

Mingun added a commit to Mingun/pegjs that referenced this issue Jan 3, 2016

Mingun added a commit to Mingun/pegjs that referenced this issue Jan 3, 2016

Mingun added a commit to Mingun/pegjs that referenced this issue Jan 4, 2016

Mingun added a commit to Mingun/pegjs that referenced this issue Jan 4, 2016

Mingun added a commit to Mingun/pegjs that referenced this issue Jan 4, 2016

Mingun added a commit to Mingun/pegjs that referenced this issue Jan 4, 2016

@justinvdm justinvdm referenced this issue in praekelt/numl Mar 9, 2016

Merged

Support multiple choice properties #10

Great feature 👍

Looking forward to seeing it released.

rumkin commented Apr 8, 2016

Awesome! 👍

Mingun added a commit to Mingun/pegjs that referenced this issue Sep 16, 2016

Mingun added a commit to Mingun/pegjs that referenced this issue Sep 16, 2016

Mingun added a commit to Mingun/pegjs that referenced this issue Sep 16, 2016

Mingun added a commit to Mingun/pegjs that referenced this issue Nov 26, 2016

Mingun added a commit to Mingun/pegjs that referenced this issue Nov 26, 2016

Mingun added a commit to Mingun/pegjs that referenced this issue Nov 26, 2016

Mingun added a commit to Mingun/pegjs that referenced this issue Nov 26, 2016

dmsnell commented Dec 28, 2016

@dmajda I'm coming late to this party, but I wonder how often we need to import many rules from another library. I would love to be able to import things like Url and Email into my composed grammars but I don't care that Url may also have things like HierarchicalPart and AsciiLetter. Do you think something like Node's named exports would be a viable way forward, keeping the benefits of namespacing but allowing direct named imports?

import { SchemalessUrl, Url } from "./Urls.pegjs"

Token
  = PhoneNumber
  / Url
  / SchemalessUrl

Namespacing has been an issue for me as I try and explore writing otherwise-composable grammars. I'm stuck right now including files in files and naming things the way PHP functions were named before they introduced proper namespaces: UrlIpHost, HtmlQuotedString, etc…
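
One way to picture named imports: only the listed rules become visible to the importing grammar, while the rules they depend on are pulled in but stay hidden. The sketch below is hypothetical; the dependency table for Urls.pegjs is invented for illustration (including the `Scheme` rule), and `resolveNamedImports` is not an existing PEG.js function.

```javascript
// Hypothetical model of Urls.pegjs: each rule lists the rules it references.
const urlsGrammar = {
  Url: ["Scheme", "HierarchicalPart"],
  SchemalessUrl: ["HierarchicalPart"],
  Scheme: ["AsciiLetter"],
  HierarchicalPart: ["AsciiLetter"],
  AsciiLetter: [],
  Email: ["AsciiLetter"],
};

// Compute which rules are exposed by name and which are only needed
// internally (the transitive dependencies of the exposed rules).
function resolveNamedImports(grammar, names) {
  const exposed = new Set(names);
  const needed = new Set();
  const visit = (name) => {
    if (!(name in grammar)) throw new Error(`Unknown rule: ${name}`);
    if (needed.has(name)) return;
    needed.add(name);
    grammar[name].forEach(visit);
  };
  names.forEach(visit);
  return {
    exposed: [...exposed],
    internal: [...needed].filter((n) => !exposed.has(n)),
  };
}
```

Rules that are never reached, such as `Email` when only `Url` and `SchemalessUrl` are imported, would not need to be generated at all.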

@dmajda @futagoza

Any progress on this issue? Or is the primary discussion now living in #473?
My grammar file is growing very fast :(
It would be nice to split it into several files.

I wouldn't mind being able to split grammars between files, simply for organization and composition. It would make them easier to test and re-use, as well as providing a way to swap grammars dynamically, maybe? Just some thoughts.

The JavaScript example that I used as a base is over 1,300 lines. It took a while to learn where everything was, and jump around and edit different sections.

@mikeaustin I see this feature as some kind of Node.js require:

cat bash.pegjs
{
const _ = require("whitespace");
const LB = require("line_break");
const CodeBlock = require("code_block");
const BoolExpr = require("boolean_expression");
}
...
IfStatement = "if" _ "[" BoolExpr "]" _ ";" _ "then" LB? CodeBlock "fi"

jodevsa commented Jun 5, 2017

I agree, splitting grammars and making them modular is a great feature. However, handling these cases would be a problem:
1- a sub-grammar that relies on a global variable defined in the main grammar's code?
2- duplicate variable and grammar names?

IMO, a temporarily convenient approach would be to create a new add-on for PEG.js (independent from PEG.js) that defines a keyword for importing (for example, @load(anotherGrammarFileLocation)). The keyword should not be part of the JavaScript/PEG.js grammar. Build a regexp or a PEG grammar to detect that keyword, substitute it with the content of the referenced grammar file, and send the substituted code to PEG.js.

Example:

integers.pegjs

integers=[0-9]+ {return parseInt(text(), 10)}

main.pegjs
arrayOfInteger="["(integers ",")* integers"]"
@load("integers.pegjs")

Note that with this method, if someone did not define the start rule and placed @load before "arrayOfInteger", PEG.js would assume that the first rule (the integers rule) is the start.

One approach to handle this is to use the same name for the file and its start rule and let the new add-on configure the start rule from the file name, or to substitute all loaded content at the end of the file.

The user should be responsible for any duplication.
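
A minimal sketch of such a preprocessor, assuming the `@load("...")` form used in the example above. The `expandLoads` function is hypothetical, and the `files` map is an in-memory stand-in for the file system so the sketch is self-contained; a real add-on would read the files from disk before handing the result to PEG.js.

```javascript
// Replace every @load("path") directive with the content of the named
// grammar, recursively, so loaded grammars may themselves use @load.
// `seen` tracks the chain of files being expanded to detect cycles.
function expandLoads(source, files, seen = new Set()) {
  return source.replace(/@load\("([^"]+)"\)/g, (_, path) => {
    if (seen.has(path)) {
      throw new Error(`Circular @load: ${path}`);
    }
    if (!(path in files)) {
      throw new Error(`Cannot find grammar: ${path}`);
    }
    return expandLoads(files[path], files, new Set(seen).add(path));
  });
}
```

Because this is plain text substitution, duplicate rule names and the position of the start rule remain the user's responsibility, as noted above.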

Contributor

andreineculau commented Jun 9, 2017

I just want to highlight that this issue is primarily an optimization request, because composability/modularity is something that you can achieve on your own, especially when you control the full spectrum of the grammar.

If you're not comfortable with a grammar 1k-lines long, then split it up, and concatenate it back as you see fit before pumping it into pegjs.

Mingun added a commit to Mingun/pegjs that referenced this issue Jun 12, 2017

Mingun added a commit to Mingun/pegjs that referenced this issue Jun 12, 2017

Mingun added a commit to Mingun/pegjs that referenced this issue Jun 12, 2017

@futagoza futagoza closed this Aug 22, 2017

Mingun added a commit to Mingun/pegjs that referenced this issue Oct 10, 2017

Mingun added a commit to Mingun/pegjs that referenced this issue Oct 10, 2017

Mingun added a commit to Mingun/pegjs that referenced this issue Oct 10, 2017

Mingun added a commit to Mingun/pegjs that referenced this issue Oct 14, 2017

@futagoza futagoza removed this from the post-1.0.0 milestone Nov 29, 2017

Mingun added a commit to Mingun/pegjs that referenced this issue Jan 22, 2018

Mingun added a commit to Mingun/pegjs that referenced this issue Jan 22, 2018

Mingun added a commit to Mingun/pegjs that referenced this issue Jan 22, 2018

Mingun added a commit to Mingun/pegjs that referenced this issue Jan 22, 2018

Issue #38 (import): Add pass for include (at AST level) one grammar to another.

Conflicts:
	lib/compiler/index.js
	package.json