Skip to content

ruhank-dev/predictive-parser-engine

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LL(1) Predictive Parser

A complete LL(1) parser pipeline for Context-Free Grammars — left factoring, left recursion removal, FIRST/FOLLOW computation, LL(1) table construction, stack-based predictive parsing, and parse tree generation. All in pure Java with zero dependencies.

CS4031 Compiler Construction · Assignment 02 · Spring 2026 Muhammad Ruhan Kamran · 23i-0062 · Section CS-A


Pipeline

CFG Input File
      │
      ▼
 Left Factoring              removes common prefixes (Expr → a b | a c  becomes  Expr → a Expr')
      │
      ▼
 Left Recursion Removal      eliminates direct + indirect left recursion (Dragon Book algorithm)
      │
      ▼
 FIRST / FOLLOW Sets         standard fixed-point iteration over all non-terminals
      │
      ▼
 LL(1) Parsing Table         M[NT, terminal] → production  (conflicts flagged)
      │
      ▼
 Predictive Stack Parser     token-by-token simulation with panic-mode error recovery
      │
      ▼
 Parse Tree                  built in parallel with the stack; rendered with Unicode box-drawing

Features

Feature Details
Left Factoring Iterative, handles arbitrarily deep common prefixes
Left Recursion Removal Direct + indirect, Dragon Book §4.3 algorithm
FIRST / FOLLOW Fixed-point iteration; correctly handles epsilon chains
LL(1) Parsing Table Conflict detection; reports non-LL(1) grammars
Predictive Parser Stack-based; full step-by-step trace
Error Recovery Panic mode — two strategies (pop on FOLLOW, skip to FIRST/FOLLOW)
Parse Tree Unicode box-drawing; preorder traversal; yield extraction
Output Files All results written to output/ automatically

Example: Arithmetic Expression Grammar

Input grammar (input/grammar2.txt):

Expr   -> Expr + Term | Term
Term   -> Term * Factor | Factor
Factor -> ( Expr ) | id

This grammar has left recursion (Expr -> Expr + Term) — it cannot be parsed by any top-down parser as-is.

After left recursion removal:

Expr      -> Term ExprPrime
Term      -> Factor TermPrime
Factor    -> ( Expr ) | id
ExprPrime -> + Term ExprPrime | epsilon
TermPrime -> * Factor TermPrime | epsilon

The grammar is now LL(1). The parser correctly handles operator precedence (* before +) and associativity through the structure of the production rules.

Grammar transformation output


FIRST and FOLLOW Sets

FIRST and FOLLOW sets drive every entry in the parsing table.

Non-Terminal FIRST FOLLOW
Expr (, id $, )
Term (, id +, $, )
Factor (, id *, +, $, )
ExprPrime +, ε $, )
TermPrime *, ε +, $, )

FIRST and FOLLOW sets output


LL(1) Parsing Table

Each cell M[NT, terminal] holds the production to apply. Empty cells are error entries. Multiple entries in one cell indicate an LL(1) conflict.

LL(1) parsing table


Parsing Trace

The parser logs every step: the current stack contents, the remaining input, and the action taken (match, expand, or error + recovery).

Input: id + id * id

Step   Stack (bottom → top)           Remaining Input        Action
──────────────────────────────────────────────────────────────────────────
1      $ Expr                         id + id * id $         Expr -> Term ExprPrime
2      $ ExprPrime Term               id + id * id $         Term -> Factor TermPrime
3      $ ExprPrime TermPrime Factor   id + id * id $         Factor -> id
4      $ ExprPrime TermPrime id       id + id * id $         Match 'id'
5      $ ExprPrime TermPrime          + id * id $            TermPrime -> epsilon
6      $ ExprPrime                    + id * id $            ExprPrime -> + Term ExprPrime
...
17     $                              $                      ACCEPT

Full parsing trace for multiple inputs


Parse Tree

For every accepted string the parser builds a complete parse tree rendered with Unicode box-drawing characters.

Input: ( id + id ) * id

Expr
├── Term
│   ├── Factor
│   │   ├── (
│   │   ├── Expr
│   │   │   ├── Term
│   │   │   │   ├── Factor
│   │   │   │   │   └── id
│   │   │   │   └── TermPrime
│   │   │   │       └── epsilon
│   │   │   └── ExprPrime
│   │   │       ├── +
│   │   │       ├── Term
│   │   │       │   └── ...
│   │   │       └── ExprPrime
│   │   │           └── epsilon
│   │   └── )
│   └── TermPrime
│       ├── *
│       ├── Factor
│       │   └── id
│       └── TermPrime
│           └── epsilon
└── ExprPrime
    └── epsilon

Parse tree output


Error Recovery

When no table entry exists for M[NT, token], the parser uses panic mode with two strategies:

Strategy A — Pop: if the current token is in FOLLOW(NT), pop the non-terminal and continue.

Strategy B — Skip: scan forward past tokens until FIRST(NT) or FOLLOW(NT) is found, then retry or pop.

Terminal mismatches are handled by deletion recovery — the expected terminal is popped from the stack.


Project Structure

LL1Parser/
├── src/
│   ├── Main.java            # Entry point — orchestrates all 6 steps
│   ├── Grammar.java         # CFG reader, left factoring, left recursion removal
│   ├── FirstFollow.java     # FIRST and FOLLOW set computation
│   ├── ParsingTable.java    # LL(1) table construction + conflict detection
│   ├── Parser.java          # Predictive stack parser + trace logging
│   ├── Tree.java            # Parse tree node + Unicode renderer
│   ├── Stack.java           # Generic stack (ArrayList-backed)
│   └── ErrorHandler.java    # Panic mode recovery + error reporting
├── input/
│   ├── grammar1.txt         # Simple grammar (no recursion)
│   ├── grammar2.txt         # Arithmetic expressions (left recursive)
│   ├── grammar3.txt         # If-else with dangling-else ambiguity
│   ├── grammar4.txt         # More complex grammar
│   ├── *_valid.txt          # Valid input strings per grammar
│   ├── *_errors.txt         # Inputs with syntactic errors
│   └── *_edge.txt           # Edge cases (empty input, single token, etc.)
└── output/                  # Auto-generated on each run
    ├── grammar_transformed.txt
    ├── first_follow_sets.txt
    ├── parsing_table.txt
    ├── parsing_trace.txt
    └── parse_trees.txt

Running the Project

Requirements: Java JDK 8+

Terminal

# Compile
javac -d bin src/*.java

# Run with any grammar + input pair
java -cp bin Main input/grammar2.txt input/grammar2_valid.txt

Output files are written to output/ automatically after every run.

Eclipse

  1. File → Import → General → Existing Projects into Workspace → select LL1Parser/
  2. Right-click Main.java → Run As → Run Configurations
  3. Arguments tab → Program arguments:
    input/grammar2.txt  input/grammar2_valid.txt
    
  4. Working Directory → Other → Workspace → LL1Parser
  5. Click Apply → Run

Grammar / Input File Pairs

Grammar Description Valid Input Error Input
grammar1.txt Simple, no recursion grammar1_valid.txt grammar1_errors.txt
grammar2.txt Arithmetic expressions grammar2_valid.txt grammar2_errors.txt
grammar2.txt Edge cases grammar2_edge.txt
grammar3.txt If-else grammar3_valid.txt grammar3_errors.txt
grammar4.txt Complex grammar grammar4_valid.txt grammar4_errors.txt

Grammar File Format

# Lines starting with # are comments

Expr   -> Expr + Term | Term
Term   -> Term * Factor | Factor
Factor -> ( Expr ) | id | epsilon

Rules:

  • Non-terminals must start with an uppercase letter
  • Alternatives separated by |
  • Use epsilon or @ for the empty string
  • Multiple lines for the same non-terminal are merged automatically

Output Files

All five output files are written to output/ on every run:

File Contents
grammar_transformed.txt Grammar after left factoring + left recursion removal
first_follow_sets.txt FIRST and FOLLOW sets for all non-terminals
parsing_table.txt Full LL(1) table; flags conflicts
parsing_trace.txt Step-by-step parse trace for every input string
parse_trees.txt Unicode parse trees for all accepted strings

Pre-generated outputs for all grammars are in output/grammar1/, output/grammar2/, etc.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages