A complete LL(1) parser pipeline for Context-Free Grammars — left factoring, left recursion removal, FIRST/FOLLOW computation, LL(1) table construction, stack-based predictive parsing, and parse tree generation. All in pure Java with zero dependencies.
CS4031 Compiler Construction · Assignment 02 · Spring 2026
Muhammad Ruhan Kamran · 23i-0062 · Section CS-A
CFG Input File
│
▼
Left Factoring removes common prefixes (Expr → a b | a c becomes Expr → a Expr')
│
▼
Left Recursion Removal eliminates direct + indirect left recursion (Dragon Book algorithm)
│
▼
FIRST / FOLLOW Sets standard fixed-point iteration over all non-terminals
│
▼
LL(1) Parsing Table M[NT, terminal] → production (conflicts flagged)
│
▼
Predictive Stack Parser token-by-token simulation with panic-mode error recovery
│
▼
Parse Tree built in parallel with the stack; rendered with Unicode box-drawing
| Feature | Details |
|---|---|
| Left Factoring | Iterative, handles arbitrarily deep common prefixes |
| Left Recursion Removal | Direct + indirect, Dragon Book §4.3 algorithm |
| FIRST / FOLLOW | Fixed-point iteration; correctly handles epsilon chains |
| LL(1) Parsing Table | Conflict detection; reports non-LL(1) grammars |
| Predictive Parser | Stack-based; full step-by-step trace |
| Error Recovery | Panic mode — two strategies (pop on FOLLOW, skip to FIRST/FOLLOW) |
| Parse Tree | Unicode box-drawing; preorder traversal; yield extraction |
| Output Files | All results written to output/ automatically |
Input grammar (input/grammar2.txt):
Expr -> Expr + Term | Term
Term -> Term * Factor | Factor
Factor -> ( Expr ) | id
This grammar has left recursion (Expr -> Expr + Term) — it cannot be parsed by any top-down parser as-is.
After left recursion removal:
Expr -> Term ExprPrime
Term -> Factor TermPrime
Factor -> ( Expr ) | id
ExprPrime -> + Term ExprPrime | epsilon
TermPrime -> * Factor TermPrime | epsilon
The grammar is now LL(1). The parser correctly handles operator precedence (* before +) and associativity through the structure of the production rules.
FIRST and FOLLOW sets drive every entry in the parsing table.
| Non-Terminal | FIRST | FOLLOW |
|---|---|---|
| Expr | (, id |
$, ) |
| Term | (, id |
+, $, ) |
| Factor | (, id |
*, +, $, ) |
| ExprPrime | +, ε |
$, ) |
| TermPrime | *, ε |
+, $, ) |
Each cell M[NT, terminal] holds the production to apply. Empty cells are error entries. Multiple entries in one cell indicate an LL(1) conflict.
The parser logs every step: the current stack contents, the remaining input, and the action taken (match, expand, or error + recovery).
Input: id + id * id
Step Stack (bottom → top) Remaining Input Action
──────────────────────────────────────────────────────────────────────────
1 $ Expr id + id * id $ Expr -> Term ExprPrime
2 $ ExprPrime Term id + id * id $ Term -> Factor TermPrime
3 $ ExprPrime TermPrime Factor id + id * id $ Factor -> id
4 $ ExprPrime TermPrime id id + id * id $ Match 'id'
5 $ ExprPrime TermPrime + id * id $ TermPrime -> epsilon
6 $ ExprPrime + id * id $ ExprPrime -> + Term ExprPrime
...
17 $ $ ACCEPT
For every accepted string the parser builds a complete parse tree rendered with Unicode box-drawing characters.
Input: ( id + id ) * id
Expr
├── Term
│ ├── Factor
│ │ ├── (
│ │ ├── Expr
│ │ │ ├── Term
│ │ │ │ ├── Factor
│ │ │ │ │ └── id
│ │ │ │ └── TermPrime
│ │ │ │ └── epsilon
│ │ │ └── ExprPrime
│ │ │ ├── +
│ │ │ ├── Term
│ │ │ │ └── ...
│ │ │ └── ExprPrime
│ │ │ └── epsilon
│ │ └── )
│ └── TermPrime
│ ├── *
│ ├── Factor
│ │ └── id
│ └── TermPrime
│ └── epsilon
└── ExprPrime
└── epsilon
When no table entry exists for M[NT, token], the parser uses panic mode with two strategies:
Strategy A — Pop: if the current token is in FOLLOW(NT), pop the non-terminal and continue.
Strategy B — Skip: scan forward past tokens until FIRST(NT) or FOLLOW(NT) is found, then retry or pop.
Terminal mismatches are handled by deletion recovery — the expected terminal is popped from the stack.
LL1Parser/
├── src/
│ ├── Main.java # Entry point — orchestrates all 6 steps
│ ├── Grammar.java # CFG reader, left factoring, left recursion removal
│ ├── FirstFollow.java # FIRST and FOLLOW set computation
│ ├── ParsingTable.java # LL(1) table construction + conflict detection
│ ├── Parser.java # Predictive stack parser + trace logging
│ ├── Tree.java # Parse tree node + Unicode renderer
│ ├── Stack.java # Generic stack (ArrayList-backed)
│ └── ErrorHandler.java # Panic mode recovery + error reporting
├── input/
│ ├── grammar1.txt # Simple grammar (no recursion)
│ ├── grammar2.txt # Arithmetic expressions (left recursive)
│ ├── grammar3.txt # If-else with dangling-else ambiguity
│ ├── grammar4.txt # More complex grammar
│ ├── *_valid.txt # Valid input strings per grammar
│ ├── *_errors.txt # Inputs with syntactic errors
│ └── *_edge.txt # Edge cases (empty input, single token, etc.)
└── output/ # Auto-generated on each run
├── grammar_transformed.txt
├── first_follow_sets.txt
├── parsing_table.txt
├── parsing_trace.txt
└── parse_trees.txt
Requirements: Java JDK 8+
# Compile
javac -d bin src/*.java
# Run with any grammar + input pair
java -cp bin Main input/grammar2.txt input/grammar2_valid.txtOutput files are written to output/ automatically after every run.
- File → Import → General → Existing Projects into Workspace → select
LL1Parser/ - Right-click
Main.java→ Run As → Run Configurations - Arguments tab → Program arguments:
input/grammar2.txt input/grammar2_valid.txt - Working Directory → Other → Workspace →
LL1Parser - Click Apply → Run
| Grammar | Description | Valid Input | Error Input |
|---|---|---|---|
grammar1.txt |
Simple, no recursion | grammar1_valid.txt |
grammar1_errors.txt |
grammar2.txt |
Arithmetic expressions | grammar2_valid.txt |
grammar2_errors.txt |
grammar2.txt |
Edge cases | grammar2_edge.txt |
— |
grammar3.txt |
If-else | grammar3_valid.txt |
grammar3_errors.txt |
grammar4.txt |
Complex grammar | grammar4_valid.txt |
grammar4_errors.txt |
# Lines starting with # are comments
Expr -> Expr + Term | Term
Term -> Term * Factor | Factor
Factor -> ( Expr ) | id | epsilon
Rules:
- Non-terminals must start with an uppercase letter
- Alternatives separated by
| - Use
epsilonor@for the empty string - Multiple lines for the same non-terminal are merged automatically
All five output files are written to output/ on every run:
| File | Contents |
|---|---|
grammar_transformed.txt |
Grammar after left factoring + left recursion removal |
first_follow_sets.txt |
FIRST and FOLLOW sets for all non-terminals |
parsing_table.txt |
Full LL(1) table; flags conflicts |
parsing_trace.txt |
Step-by-step parse trace for every input string |
parse_trees.txt |
Unicode parse trees for all accepted strings |
Pre-generated outputs for all grammars are in output/grammar1/, output/grammar2/, etc.




