
ftomassetti/JavaCC2ANTLR


JavaCC2ANTLR

JavaCC is an old and venerable tool, used in many projects. In recent years, however, ANTLR seems to have a growing community, and a variety of tools now support it. ANTLR can also generate parsers for many target languages that JavaCC does not support.

So I hacked together this little project, in Kotlin.

For now it takes a JavaCC grammar and produces an ANTLR lexer grammar and an ANTLR parser grammar, which should hopefully be equivalent.

Generate ANTLR Lexer & Parser

Simply look at the class JavaCCToAntlrConverter. It takes the file name of the JavaCC grammar and outputs a lexer grammar and a parser grammar.

Generate an ANTLR parser in memory

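import java.io.File

// Load the JavaCC grammar and derive the ANTLR grammar name from the file name
val file = File("src/test/resources/java.jj")
val grammarName = file.nameWithoutExtension.capitalize()

// Convert the grammar, build a parser in memory, and parse a sample input
val javaCCGrammar = loadJavaCCGrammar(file)
val antlrGrammar = javaCCGrammar.convertToAntlr(grammarName)
val genericParser = antlrGrammar.genericParser()
val ast = genericParser.parse("class A { }")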

Push/Pop Mode Commands

By default, JavaCC has no way for tokens to change the token manager's lexical state while remembering the previous one, as ANTLR does with its pushMode and popMode commands. For example, to parse a balanced set of parentheses such as ((()) ()) as a single token, you might write the following JavaCC grammar:

TOKEN_MGR_DECLS : {
    static List<Integer> lexicalStateStack = new ArrayList<Integer>();

    static void openParen() {
        lexicalStateStack.add(curLexState);
    }

    static void closeParen() {
        SwitchTo(lexicalStateStack.remove(lexicalStateStack.size() - 1));
    }
}

<DEFAULT, LEVEL1, LEVELN> SKIP : {
    < " " >
}

<LEVELN> MORE : {
    <LPAREN:    "("> { openParen(); }
|   <RPAREN:    ")"> { closeParen(); }
}

MORE : {
    < "(" > { openParen(); } : LEVEL1
}

<LEVEL1> MORE : {
    < "(" > { openParen(); } : LEVELN
}

<LEVEL1> TOKEN : {
    <BALANCED_PARENS: ")" > { closeParen(); } : DEFAULT
}

void Start(): {} { <BALANCED_PARENS> <EOF> }

However, the generated ANTLR lexer would not behave correctly, because the converter cannot infer from the SwitchTo statements executed inside the actions whether the corresponding ANTLR rules should use the mode, pushMode, or popMode command:

lexer grammar Lexer;

SKIP0 : ' ' -> skip ;
MORE0 : '(' -> more, mode(LEVEL1) ;

mode LEVEL1;
LEVEL1_SKIP0 : SKIP0 -> skip ;
MORE1 : '(' -> more, mode(LEVELN) ;
BALANCED_PARENS : ')' -> mode(DEFAULT_MODE) ;

mode LEVELN;
LEVELN_SKIP0 : SKIP0 -> skip ;
LPAREN : '(' -> more ;
RPAREN : ')' -> more ;  // PROBLEM: Cannot escape this mode!


parser grammar Parser;

options { tokenVocab=Lexer; }

start :  BALANCED_PARENS EOF  ;
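
To make the problem concrete, here is a minimal sketch in Java that hand-simulates the mode transitions of the lexer above. It is not part of JavaCC2ANTLR: the class ModeOnlyLexerSketch and its run method are hypothetical names introduced for illustration, and the simulation tracks only the current mode and the emitted token types, not the token text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JavaCC2ANTLR: simulates the mode-only lexer.
class ModeOnlyLexerSketch {

    // Token types emitted so far and the mode the lexer ended in.
    static final class Result {
        final List<String> tokens;
        final String finalMode;

        Result(List<String> tokens, String finalMode) {
            this.tokens = tokens;
            this.finalMode = finalMode;
        }
    }

    static Result run(String input) {
        String mode = "DEFAULT_MODE";
        List<String> tokens = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (c == ' ') {
                continue; // every mode has a skip rule for spaces
            }
            if (c != '(' && c != ')') {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
            switch (mode) {
                case "DEFAULT_MODE":
                    if (c == ')') {
                        throw new IllegalArgumentException("no rule for ')' in DEFAULT_MODE");
                    }
                    mode = "LEVEL1"; // MORE0 : '(' -> more, mode(LEVEL1)
                    break;
                case "LEVEL1":
                    if (c == '(') {
                        mode = "LEVELN"; // MORE1 : '(' -> more, mode(LEVELN)
                    } else {
                        tokens.add("BALANCED_PARENS"); // ')' -> mode(DEFAULT_MODE)
                        mode = "DEFAULT_MODE";
                    }
                    break;
                default:
                    // LEVELN: LPAREN and RPAREN are "-> more" with no mode
                    // change, so the lexer can never leave this mode.
                    break;
            }
        }
        return new Result(tokens, mode);
    }

    public static void main(String[] args) {
        Result r = run("(())");
        System.out.println(r.tokens + " " + r.finalMode); // prints "[] LEVELN"
    }
}
```

For the input (()) the simulation ends in LEVELN without ever emitting BALANCED_PARENS, which is the stuck state flagged in the lexer comment.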

To handle such actions, add the following fields to your TOKEN_MGR_DECLS, set to the names of the functions that should map to the pushMode and popMode commands respectively:

TOKEN_MGR_DECLS : {
    ...
    final static String pushStateFunc = "openParen";
    final static String popStateFunc = "closeParen";
}
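
One way to picture how these two fields could drive the choice of lexer command is the sketch below. It is an assumption inferred from the example grammars in this README, not the converter's actual code: the class ModeCommandSketch, its command method, and the naive substring check on the action text are all hypothetical.

```java
// Hypothetical sketch, not the converter's implementation: picks an ANTLR
// lexer command for one JavaCC token definition.
class ModeCommandSketch {

    // JavaCC's DEFAULT state corresponds to ANTLR's DEFAULT_MODE.
    static String antlrMode(String state) {
        return "DEFAULT".equals(state) ? "DEFAULT_MODE" : state;
    }

    // action:       Java code attached to the token, e.g. "openParen();"
    // currentState: lexical state the token is defined in
    // nextState:    state after ":" in the JavaCC rule, or null if absent
    static String command(String action, String currentState, String nextState,
                          String pushFunc, String popFunc) {
        if (action.contains(popFunc + "(")) {
            return "popMode"; // the declared next state is ignored
        }
        if (action.contains(pushFunc + "(")) {
            // With no explicit next state, the current state is re-entered.
            String target = (nextState != null) ? nextState : currentState;
            return "pushMode(" + antlrMode(target) + ")";
        }
        // No push or pop function in the action: a plain switch, or nothing.
        return (nextState != null) ? "mode(" + antlrMode(nextState) + ")" : "";
    }

    public static void main(String[] args) {
        // prints "pushMode(LEVEL1)"
        System.out.println(command("openParen();", "DEFAULT", "LEVEL1", "openParen", "closeParen"));
    }
}
```

A substring check like this would also match a function whose name merely ends in openParen, so a real implementation would more likely inspect the parsed action.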

Now the lexer gets generated correctly:

SKIP0 : ' ' -> skip ;
MORE0 : '(' -> more, pushMode(LEVEL1) ;

mode LEVEL1;
LEVEL1_SKIP0 : SKIP0 -> skip ;
MORE1 : '(' -> more, pushMode(LEVELN) ;
BALANCED_PARENS : ')' -> popMode ;

mode LEVELN;
LEVELN_SKIP0 : SKIP0 -> skip ;
LPAREN : '(' -> more, pushMode(LEVELN) ;
RPAREN : ')' -> more, popMode ;
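
The effect of pushMode and popMode on the example input can be traced with a small Java sketch. It is a hand-written simulation of the mode stack, not code produced by JavaCC2ANTLR; the class ModeStackLexerSketch and its run method are hypothetical names, and only modes and token types are tracked, not token text.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of JavaCC2ANTLR: simulates the lexer that
// uses pushMode and popMode.
class ModeStackLexerSketch {

    // Token types emitted so far and the mode the lexer ended in.
    static final class Result {
        final List<String> tokens;
        final String finalMode;

        Result(List<String> tokens, String finalMode) {
            this.tokens = tokens;
            this.finalMode = finalMode;
        }
    }

    static Result run(String input) {
        Deque<String> modeStack = new ArrayDeque<>();
        String mode = "DEFAULT_MODE";
        List<String> tokens = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (c == ' ') {
                continue; // every mode has a skip rule for spaces
            }
            if (c == '(') {
                // pushMode: remember the current mode, then enter the next one.
                modeStack.push(mode);
                mode = mode.equals("DEFAULT_MODE") ? "LEVEL1" : "LEVELN";
            } else if (c == ')') {
                if (modeStack.isEmpty()) {
                    throw new IllegalArgumentException("no rule for ')' in DEFAULT_MODE");
                }
                if (mode.equals("LEVEL1")) {
                    tokens.add("BALANCED_PARENS"); // outermost ')' ends the token
                }
                mode = modeStack.pop(); // popMode: return to the remembered mode
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return new Result(tokens, mode);
    }

    public static void main(String[] args) {
        Result r = run("((()) ())");
        // prints "[BALANCED_PARENS] DEFAULT_MODE"
        System.out.println(r.tokens + " " + r.finalMode);
    }
}
```

Because each ')' returns to whichever mode was active before its matching '(', the lexer is back in LEVEL1 for the final ')' and emits exactly one BALANCED_PARENS.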

Licensing

The project is made available under the Apache License, Version 2.0. Please see the LICENSE file.
