My language of choice is Python, but our shop is 90% Oracle PL/SQL. The codebase is huge, scattered across stored procedures (80%), SQL scripts (10%), Oracle Forms (9%) and triggers (1%).
My goals are (in order of possibility):
- generate nice dependency/call graphs with GraphViz - for this I need the function names and boundaries in the source code.
- auto-format source to our coding style - I think this is a faraway target for me...
I could create a stream reader which treats comments (block and line) as needed, and counts with string literals (''), but function boundaries are hard with my simple path with regexes or simple hand-crafted state machines.
I've found then ANTLR3 project, with its nice AntlrWorks 1.4 IDE and a community-provided PLSQL3.g grammar file - it could produce a more-or-less syntactically correct Python parser, which is provided here.
I'll try to use it to find function names and boundaries (esp. recursive declarations seems problematic).
First test shows this is a dead-end (for Python backend): parsing the 7Mb source is terminated after 4m28s and consumed 3Gb memory.
You can experiment it with
cd antlr3 && python PLSQL3Parser.py <somesql.sql
Use PyParsing (1.5.7 atm) - the plsql_parser.py in the root dir uses this, but it is in embryo stage at the moment, just usable for testing purposes!
python plsql_parser.py statement somesql.sql
Good Luck!