A JSON parser built from scratch in Go.
```
smol-parser/
├── main.go       # Core parser implementation
├── main_test.go  # Comprehensive test suite
├── go.mod        # Go module file
└── README.md     # This file
```
The lexer breaks the input string into tokens. It's like reading words in a sentence.
Key Concepts:
- Token: A meaningful unit (e.g., `{`, `"hello"`, `123`, `true`)
- Scanning: Reading one character at a time
- Lookahead: Peeking at the next character without consuming it
```go
type Lexer struct {
	input string // The JSON string to parse
	pos   int    // Current position in input
	ch    byte   // Current character
}
```

How it works:
- `readChar()` advances to the next character
- `skipWhitespace()` ignores spaces, tabs, and newlines
- `NextToken()` identifies and returns the next token
Example:
Input: {"name": "John"}
Tokens: { → STRING("name") → : → STRING("John") → }
The parser takes tokens and builds data structures. It uses recursive descent parsing.
Key Concepts:
- Recursive descent: Each grammar rule becomes a function
- Current token: The token we're looking at
- Advance: Move to the next token
```go
type Parser struct {
	lexer    *Lexer
	curToken Token // Current token being examined
}
```

Grammar Rules (simplified):
```
Value    → Object | Array | String | Number | Boolean | Null
Object   → { } | { Members }
Members  → String : Value | String : Value , Members
Array    → [ ] | [ Elements ]
Elements → Value | Value , Elements
```
How it works:
- `parseValue()` decides what type of value to parse
- `parseObject()` handles `{...}` structures
- `parseArray()` handles `[...]` structures
Let's trace: {"name": "John", "age": 30}
1. parseValue() sees { → calls parseObject()
2. parseObject():
- Advance past {
- See "name" (string key)
- Advance, expect :
- Call parseValue() → returns "John"
- Store in map: {"name": "John"}
- See , → continue
- See "age" (string key)
- Advance, expect :
- Call parseValue() → returns 30.0
- Store in map: {"name": "John", "age": 30.0}
- See } → return map
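The trace above maps directly onto code. The sketch below is a reduced recursive descent parser that covers only what the trace needs (objects, string values, numbers). To stay self-contained it reads a hand-built token slice in place of the `*Lexer`; the `Token` shape and the token type names are assumptions. A `parseArray()` would follow the same loop with `]` as the closer and no keys.

```go
package main

import (
	"fmt"
	"strconv"
)

// Token is a hypothetical token shape for this sketch only.
type Token struct {
	Type    string // "{", "}", ":", ",", "STRING", "NUMBER", or "EOF"
	Literal string
}

type Parser struct {
	tokens   []Token // pre-lexed input (stands in for *Lexer)
	pos      int
	curToken Token // Current token being examined
}

// advance moves to the next token, yielding EOF past the end.
func (p *Parser) advance() {
	p.pos++
	if p.pos < len(p.tokens) {
		p.curToken = p.tokens[p.pos]
	} else {
		p.curToken = Token{Type: "EOF"}
	}
}

// parseValue decides what type of value to parse.
func (p *Parser) parseValue() (interface{}, error) {
	tok := p.curToken
	switch tok.Type {
	case "{":
		return p.parseObject()
	case "STRING":
		p.advance()
		return tok.Literal, nil
	case "NUMBER":
		p.advance()
		n, err := strconv.ParseFloat(tok.Literal, 64)
		if err != nil {
			return nil, err
		}
		return n, nil // all numbers become float64
	}
	return nil, fmt.Errorf("unexpected token %q", tok.Type)
}

// parseObject handles { } and { "key": value, ... }.
func (p *Parser) parseObject() (interface{}, error) {
	obj := make(map[string]interface{})
	p.advance() // advance past {
	if p.curToken.Type == "}" {
		p.advance()
		return obj, nil
	}
	for {
		if p.curToken.Type != "STRING" {
			return nil, fmt.Errorf("expected string key")
		}
		key := p.curToken.Literal
		p.advance()
		if p.curToken.Type != ":" {
			return nil, fmt.Errorf("expected colon after key")
		}
		p.advance()
		val, err := p.parseValue() // recursion: val may itself be an object
		if err != nil {
			return nil, err
		}
		obj[key] = val
		switch p.curToken.Type {
		case ",": // more members follow
			p.advance()
		case "}":
			p.advance()
			return obj, nil
		default:
			return nil, fmt.Errorf("expected , or } after value")
		}
	}
}

func main() {
	// Tokens for {"name": "John", "age": 30}
	tokens := []Token{
		{"{", "{"}, {"STRING", "name"}, {":", ":"}, {"STRING", "John"}, {",", ","},
		{"STRING", "age"}, {":", ":"}, {"NUMBER", "30"}, {"}", "}"},
	}
	p := &Parser{tokens: tokens, pos: -1}
	p.advance() // load the first token
	v, err := p.parseValue()
	fmt.Println(v, err) // map[age:30 name:John] <nil>
}
```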
String parsing handles escape sequences:
- `\"` → quote
- `\\` → backslash
- `\n` → newline
- `\uXXXX` → Unicode character

```go
// Input:  "Hello\nWorld"
// Output: Hello
//         World
```

Number parsing supports the full JSON number spec:
- Integers: `123`, `-456`
- Decimals: `123.456`
- Scientific: `1.5e-10`, `1E+10`
Errors are returned with context:

```go
return nil, fmt.Errorf("expected colon after key")
```

Create the module:

```
mkdir smol-parser
cd smol-parser
go mod init github.com/smol-go/smol-parser
```

Use the parser:

```go
package main

import (
	"fmt"
	"log"
)

func main() {
	jsonStr := `{"name": "Alice", "age": 30}`
	// Parse comes from the parser implementation in this package
	result, err := Parse(jsonStr)
	if err != nil {
		log.Fatal(err)
	}
	// Result is map[string]interface{}
	obj := result.(map[string]interface{})
	fmt.Println(obj["name"]) // Alice
	fmt.Println(obj["age"])  // 30
}
```

Run the tests and benchmarks:

```
go test -v
go test -bench=.
```

Problem: How to handle `\n`, `\t`, and `\uXXXX`?
Solution: Switch statement in readString() with special handling for Unicode escapes. Read 4 hex digits, parse to int, convert to rune.
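A sketch of that escape `switch`, written as a hypothetical standalone `unescape` function over the text between the quotes rather than as the project's `readString()`. The `\uXXXX` branch reads 4 hex digits with `strconv.ParseUint` and converts the result to a rune; as an assumed simplification, UTF-16 surrogate pairs (e.g. `\ud83d\ude00`) are not combined.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// unescape is a hypothetical helper that decodes the body of a JSON string
// (the text between the quotes). Surrogate pairs are not combined here;
// a lone surrogate is written as U+FFFD by WriteRune.
func unescape(s string) (string, error) {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] != '\\' {
			b.WriteByte(s[i]) // ordinary byte, including UTF-8 continuation bytes
			continue
		}
		i++ // move to the character after the backslash
		if i >= len(s) {
			return "", fmt.Errorf("unterminated escape")
		}
		switch s[i] {
		case '"', '\\', '/':
			b.WriteByte(s[i])
		case 'b':
			b.WriteByte('\b')
		case 'f':
			b.WriteByte('\f')
		case 'n':
			b.WriteByte('\n')
		case 'r':
			b.WriteByte('\r')
		case 't':
			b.WriteByte('\t')
		case 'u':
			// Read 4 hex digits, parse to int, convert to rune.
			if i+5 > len(s) {
				return "", fmt.Errorf("short \\u escape")
			}
			code, err := strconv.ParseUint(s[i+1:i+5], 16, 32)
			if err != nil {
				return "", fmt.Errorf("bad \\u escape: %v", err)
			}
			b.WriteRune(rune(code))
			i += 4 // skip the hex digits
		default:
			return "", fmt.Errorf("invalid escape \\%c", s[i])
		}
	}
	return b.String(), nil
}

func main() {
	out, err := unescape(`Hello\nWorld`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
	fmt.Println(unescape(`caf\u00e9`))
}
```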
Problem: JSON numbers can be complex: -123.456e-10
Solution: State machine approach:
- Optional minus
- Integer part (0 or 1-9 followed by digits)
- Optional decimal point + digits
- Optional exponent (e/E, optional +/-, digits)
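The four states can be sketched as a scanner that only validates the shape of a number and reports where it ends; converting the accepted text with `strconv.ParseFloat` is a separate step. `scanNumber` and `isDigit` are hypothetical names, not functions from `main.go`.

```go
package main

import (
	"fmt"
	"strconv"
)

func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// scanNumber is a hypothetical helper: it walks the JSON number grammar
// starting at s[i] and returns the index just past the number.
func scanNumber(s string, i int) (int, error) {
	// 1. Optional minus
	if i < len(s) && s[i] == '-' {
		i++
	}
	// 2. Integer part: a single 0, or 1-9 followed by digits
	switch {
	case i < len(s) && s[i] == '0':
		i++
	case i < len(s) && s[i] >= '1' && s[i] <= '9':
		for i < len(s) && isDigit(s[i]) {
			i++
		}
	default:
		return i, fmt.Errorf("expected digit at %d", i)
	}
	// 3. Optional decimal point, which must be followed by at least one digit
	if i < len(s) && s[i] == '.' {
		i++
		if i >= len(s) || !isDigit(s[i]) {
			return i, fmt.Errorf("expected digit after '.' at %d", i)
		}
		for i < len(s) && isDigit(s[i]) {
			i++
		}
	}
	// 4. Optional exponent: e/E, optional +/-, at least one digit
	if i < len(s) && (s[i] == 'e' || s[i] == 'E') {
		i++
		if i < len(s) && (s[i] == '+' || s[i] == '-') {
			i++
		}
		if i >= len(s) || !isDigit(s[i]) {
			return i, fmt.Errorf("expected exponent digit at %d", i)
		}
		for i < len(s) && isDigit(s[i]) {
			i++
		}
	}
	return i, nil
}

func main() {
	for _, in := range []string{"-123.456e-10", "0", "1E+10", "01", "1.", "-"} {
		end, err := scanNumber(in, 0)
		if err == nil && end != len(in) {
			err = fmt.Errorf("trailing characters at %d", end)
		}
		if err != nil {
			fmt.Printf("%-14s invalid: %v\n", in, err)
			continue
		}
		f, _ := strconv.ParseFloat(in[:end], 64)
		fmt.Printf("%-14s %g\n", in, f)
	}
}
```

Splitting validation from conversion keeps the grammar strict (`01` and `1.` are rejected, as the JSON spec requires); `ParseFloat` on its own is more permissive and accepts forms such as `Inf` and `NaN`.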
Problem: Objects and arrays can nest inside each other to arbitrary depth
Solution: Recursive descent - parseValue() calls parseObject() which calls parseValue() again for nested values.
Problem: How to report meaningful errors?
Solution: Track position in token, return descriptive error messages with context.
| JSON Type | Go Type |
|---|---|
| object | map[string]interface{} |
| array | []interface{} |
| string | string |
| number | float64 |
| boolean | bool |
| null | nil |
- Performance: This parser prioritizes clarity over speed
- Numbers: All numbers become float64 (JSON spec doesn't distinguish int/float)
- Big Numbers: Integers larger than 2^53 in magnitude may lose precision, because float64 has a 53-bit significand
- Memory: Large JSON files load entirely into memory
Concepts Demonstrated:
- Lexical analysis
- Recursive descent parsing
- State machines
- Go interfaces (`interface{}` for dynamic types)
- Error handling patterns
- Table-driven tests
Next Steps:
- Add streaming parser (don't load entire file)
- Support custom struct unmarshaling
- Add JSON schema validation
- Implement JSON pointer (RFC 6901)
- Pretty printing/formatting
The test suite covers:
- All JSON primitive types
- Nested structures
- Edge cases (empty arrays/objects)
- Error conditions
- Whitespace handling
- Escape sequences
- Benchmarks
Run specific tests:

```
go test -run TestParseString
go test -run TestParseObject
go test -bench=BenchmarkParseArray
```

References:
- JSON Specification (RFC 8259)
- Recursive Descent Parsing
- Go's official `encoding/json` package (for comparison)