LL(1) Predictive Parser

A complete LL(1) parser pipeline for Context-Free Grammars — left factoring, left recursion removal, FIRST/FOLLOW computation, LL(1) table construction, stack-based predictive parsing, and parse tree generation. All in pure Java with zero dependencies.

CS4031 Compiler Construction · Assignment 02 · Spring 2026 Muhammad Ruhan Kamran · 23i-0062 · Section CS-A

Pipeline

CFG Input File
      │
      ▼
 Left Factoring              removes common prefixes (Expr → a b | a c  becomes  Expr → a Expr')
      │
      ▼
 Left Recursion Removal      eliminates direct + indirect left recursion (Dragon Book algorithm)
      │
      ▼
 FIRST / FOLLOW Sets         standard fixed-point iteration over all non-terminals
      │
      ▼
 LL(1) Parsing Table         M[NT, terminal] → production  (conflicts flagged)
      │
      ▼
 Predictive Stack Parser     token-by-token simulation with panic-mode error recovery
      │
      ▼
 Parse Tree                  built in parallel with the stack; rendered with Unicode box-drawing

Features

Feature	Details
Left Factoring	Iterative, handles arbitrarily deep common prefixes
Left Recursion Removal	Direct + indirect, Dragon Book §4.3 algorithm
FIRST / FOLLOW	Fixed-point iteration; correctly handles epsilon chains
LL(1) Parsing Table	Conflict detection; reports non-LL(1) grammars
Predictive Parser	Stack-based; full step-by-step trace
Error Recovery	Panic mode — two strategies (pop on FOLLOW, skip to FIRST/FOLLOW)
Parse Tree	Unicode box-drawing; preorder traversal; yield extraction
Output Files	All results written to `output/` automatically

Example: Arithmetic Expression Grammar

Input grammar (input/grammar2.txt):

Expr   -> Expr + Term | Term
Term   -> Term * Factor | Factor
Factor -> ( Expr ) | id

This grammar has left recursion (Expr -> Expr + Term) — it cannot be parsed by any top-down parser as-is.

After left recursion removal:

Expr      -> Term ExprPrime
Term      -> Factor TermPrime
Factor    -> ( Expr ) | id
ExprPrime -> + Term ExprPrime | epsilon
TermPrime -> * Factor TermPrime | epsilon

The grammar is now LL(1). The parser correctly handles operator precedence (* before +) and associativity through the structure of the production rules.

FIRST and FOLLOW Sets

FIRST and FOLLOW sets drive every entry in the parsing table.

Non-Terminal	FIRST	FOLLOW
Expr	`(`, `id`	`$`, `)`
Term	`(`, `id`	`+`, `$`, `)`
Factor	`(`, `id`	`*`, `+`, `$`, `)`
ExprPrime	`+`, `ε`	`$`, `)`
TermPrime	`*`, `ε`	`+`, `$`, `)`

LL(1) Parsing Table

Each cell M[NT, terminal] holds the production to apply. Empty cells are error entries. Multiple entries in one cell indicate an LL(1) conflict.

Parsing Trace

The parser logs every step: the current stack contents, the remaining input, and the action taken (match, expand, or error + recovery).

Input: id + id * id

Step   Stack (bottom → top)           Remaining Input        Action
──────────────────────────────────────────────────────────────────────────
1      $ Expr                         id + id * id $         Expr -> Term ExprPrime
2      $ ExprPrime Term               id + id * id $         Term -> Factor TermPrime
3      $ ExprPrime TermPrime Factor   id + id * id $         Factor -> id
4      $ ExprPrime TermPrime id       id + id * id $         Match 'id'
5      $ ExprPrime TermPrime          + id * id $            TermPrime -> epsilon
6      $ ExprPrime                    + id * id $            ExprPrime -> + Term ExprPrime
...
17     $                              $                      ACCEPT

Parse Tree

For every accepted string the parser builds a complete parse tree rendered with Unicode box-drawing characters.

Input: ( id + id ) * id

Expr
├── Term
│   ├── Factor
│   │   ├── (
│   │   ├── Expr
│   │   │   ├── Term
│   │   │   │   ├── Factor
│   │   │   │   │   └── id
│   │   │   │   └── TermPrime
│   │   │   │       └── epsilon
│   │   │   └── ExprPrime
│   │   │       ├── +
│   │   │       ├── Term
│   │   │       │   └── ...
│   │   │       └── ExprPrime
│   │   │           └── epsilon
│   │   └── )
│   └── TermPrime
│       ├── *
│       ├── Factor
│       │   └── id
│       └── TermPrime
│           └── epsilon
└── ExprPrime
    └── epsilon

Error Recovery

When no table entry exists for M[NT, token], the parser uses panic mode with two strategies:

Strategy A — Pop: if the current token is in FOLLOW(NT), pop the non-terminal and continue.

Strategy B — Skip: scan forward past tokens until FIRST(NT) or FOLLOW(NT) is found, then retry or pop.

Terminal mismatches are handled by deletion recovery — the expected terminal is popped from the stack.

Project Structure

LL1Parser/
├── src/
│   ├── Main.java            # Entry point — orchestrates all 6 steps
│   ├── Grammar.java         # CFG reader, left factoring, left recursion removal
│   ├── FirstFollow.java     # FIRST and FOLLOW set computation
│   ├── ParsingTable.java    # LL(1) table construction + conflict detection
│   ├── Parser.java          # Predictive stack parser + trace logging
│   ├── Tree.java            # Parse tree node + Unicode renderer
│   ├── Stack.java           # Generic stack (ArrayList-backed)
│   └── ErrorHandler.java    # Panic mode recovery + error reporting
├── input/
│   ├── grammar1.txt         # Simple grammar (no recursion)
│   ├── grammar2.txt         # Arithmetic expressions (left recursive)
│   ├── grammar3.txt         # If-else with dangling-else ambiguity
│   ├── grammar4.txt         # More complex grammar
│   ├── *_valid.txt          # Valid input strings per grammar
│   ├── *_errors.txt         # Inputs with syntactic errors
│   └── *_edge.txt           # Edge cases (empty input, single token, etc.)
└── output/                  # Auto-generated on each run
    ├── grammar_transformed.txt
    ├── first_follow_sets.txt
    ├── parsing_table.txt
    ├── parsing_trace.txt
    └── parse_trees.txt

Running the Project

Requirements: Java JDK 8+

Terminal

# Compile
javac -d bin src/*.java

# Run with any grammar + input pair
java -cp bin Main input/grammar2.txt input/grammar2_valid.txt

Output files are written to output/ automatically after every run.

Eclipse

File → Import → General → Existing Projects into Workspace → select LL1Parser/
Right-click Main.java → Run As → Run Configurations

Arguments tab → Program arguments:

input/grammar2.txt  input/grammar2_valid.txt

Working Directory → Other → Workspace → LL1Parser
Click Apply → Run

Grammar / Input File Pairs

Grammar	Description	Valid Input	Error Input
`grammar1.txt`	Simple, no recursion	`grammar1_valid.txt`	`grammar1_errors.txt`
`grammar2.txt`	Arithmetic expressions	`grammar2_valid.txt`	`grammar2_errors.txt`
`grammar2.txt`	Edge cases	`grammar2_edge.txt`	—
`grammar3.txt`	If-else	`grammar3_valid.txt`	`grammar3_errors.txt`
`grammar4.txt`	Complex grammar	`grammar4_valid.txt`	`grammar4_errors.txt`

Grammar File Format

# Lines starting with # are comments

Expr   -> Expr + Term | Term
Term   -> Term * Factor | Factor
Factor -> ( Expr ) | id | epsilon

Rules:

Non-terminals must start with an uppercase letter
Alternatives separated by |
Use epsilon or @ for the empty string
Multiple lines for the same non-terminal are merged automatically

Output Files

All five output files are written to output/ on every run:

File	Contents
`grammar_transformed.txt`	Grammar after left factoring + left recursion removal
`first_follow_sets.txt`	FIRST and FOLLOW sets for all non-terminals
`parsing_table.txt`	Full LL(1) table; flags conflicts
`parsing_trace.txt`	Step-by-step parse trace for every input string
`parse_trees.txt`	Unicode parse trees for all accepted strings

Pre-generated outputs for all grammars are in output/grammar1/, output/grammar2/, etc.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
assets		assets
docs		docs
input		input
output		output
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LL(1) Predictive Parser

Pipeline

Features

Example: Arithmetic Expression Grammar

FIRST and FOLLOW Sets

LL(1) Parsing Table

Parsing Trace

Parse Tree

Error Recovery

Project Structure

Running the Project

Terminal

Eclipse

Grammar / Input File Pairs

Grammar File Format

Output Files

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

LL(1) Predictive Parser

Pipeline

Features

Example: Arithmetic Expression Grammar

FIRST and FOLLOW Sets

LL(1) Parsing Table

Parsing Trace

Parse Tree

Error Recovery

Project Structure

Running the Project

Terminal

Eclipse

Grammar / Input File Pairs

Grammar File Format

Output Files

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages