Commit
Introduce a non-recursive JSON parser
This parser uses an explicit prediction stack, unlike the present recursive descent parser where the parser state is represented on the call stack. This difference makes the new parser suitable for use in incremental parsing of huge JSON documents that cannot be conveniently handled piece-wise by the recursive descent parser. One potential use for this will be in parsing large backup manifests associated with incremental backups.

Because this parser is somewhat slower than the recursive descent parser, it is not replacing that parser, but is an additional parser available to callers.

For testing purposes, if the build is done with -DFORCE_JSON_PSTACK, all JSON parsing is done with the non-recursive parser, in which case only trivial regression differences in error messages should be observed.

Author: Andrew Dunstan
Reviewed-By: Jacob Champion
Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net
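To illustrate what an explicit prediction stack buys, here is a minimal Python sketch of a table-driven LL(1) checker for JSON structure. It is not the C code from this commit: the class name `StackParser`, the token names (`STR`, `SCALAR`, `END`) and the grammar symbols are all hypothetical, and lexing is assumed to have happened already. The point is only that the whole parser state lives in `self.stack`, so parsing can stop at the end of one chunk and resume with the next, and nesting depth does not consume call-stack frames.

```python
# Illustrative sketch only; all names here are hypothetical.
VALUE, ARR_REST, MORE_ELEMS, OBJ_REST, MORE_PAIRS = (
    "VALUE", "ARR_REST", "MORE_ELEMS", "OBJ_REST", "MORE_PAIRS")
NONTERMINALS = {VALUE, ARR_REST, MORE_ELEMS, OBJ_REST, MORE_PAIRS}

# Prediction table: (nonterminal, lookahead token) -> right-hand side.
TABLE = {
    (VALUE, "STR"): ["STR"],
    (VALUE, "SCALAR"): ["SCALAR"],
    (VALUE, "["): ["[", ARR_REST],
    (VALUE, "{"): ["{", OBJ_REST],
    (ARR_REST, "]"): ["]"],
    (MORE_ELEMS, ","): [",", VALUE, MORE_ELEMS],
    (MORE_ELEMS, "]"): ["]"],
    (OBJ_REST, "}"): ["}"],
    (OBJ_REST, "STR"): ["STR", ":", VALUE, MORE_PAIRS],
    (MORE_PAIRS, ","): [",", "STR", ":", VALUE, MORE_PAIRS],
    (MORE_PAIRS, "}"): ["}"],
}
# Any token that can start a value also starts a non-empty array body.
for tok in ("STR", "SCALAR", "[", "{"):
    TABLE[(ARR_REST, tok)] = [VALUE, MORE_ELEMS]


class StackParser:
    def __init__(self):
        # The top of the stack is the end of the list.
        self.stack = ["END", VALUE]

    def feed(self, tokens):
        """Consume one chunk of tokens; state persists between calls."""
        for tok in tokens:
            while True:
                if not self.stack:
                    raise ValueError(f"unexpected {tok!r} after end of input")
                top = self.stack.pop()
                if top in NONTERMINALS:
                    rhs = TABLE.get((top, tok))
                    if rhs is None:
                        raise ValueError(f"unexpected {tok!r} while expanding {top}")
                    # Push the production reversed so its first symbol is on top.
                    self.stack.extend(reversed(rhs))
                elif top == tok:
                    break  # terminal matched; move on to the next token
                else:
                    raise ValueError(f"expected {top!r}, found {tok!r}")

    def finish(self):
        """Signal end of input; True if a complete document was seen."""
        self.feed(["END"])
        return not self.stack


parser = StackParser()
parser.feed(["[", "SCALAR", ","])                # chunk ends mid-array
parser.feed(["{", "STR", ":", "STR", "}", "]"])  # parsing resumes here
print(parser.finish())                           # prints True
```

A recursive descent parser would have to unwind its call stack at the first chunk boundary and could not easily pick up where it left off; here the second `feed` call simply continues from the saved stack.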
Showing 16 changed files with 21,563 additions and 9 deletions.
@@ -0,0 +1,36 @@

PGFILEDESC = "standalone json parser tester"
PGAPPICON = win32

TAP_TESTS = 1

OBJS = test_json_parser_incremental.o test_json_parser_perf.o

ifdef USE_PGXS
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
else
subdir = src/test/modules/test_json_parser
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
include $(top_srcdir)/contrib/contrib-global.mk
endif

all: test_json_parser_incremental$(X) test_json_parser_perf$(X)

%.o: $(top_srcdir)/$(subdir)/%.c

PARSER_LIBS = $(top_builddir)/src/common/libpgcommon.a $(top_builddir)/src/port/libpgport.a

test_json_parser_incremental$(X): test_json_parser_incremental.o $(PARSER_LIBS)
	$(CC) $(CFLAGS) $^ -o $@

test_json_parser_perf$(X): test_json_parser_perf.o $(PARSER_LIBS)
	$(CC) $(CFLAGS) $^ -o $@

speed-check: test_json_parser_perf$(X)
	@echo Standard parser:
	time ./test_json_parser_perf 10000 $(top_srcdir)/$(subdir)/tiny.json
	@echo Incremental parser:
	time ./test_json_parser_perf -i 10000 $(top_srcdir)/$(subdir)/tiny.json
@@ -0,0 +1,25 @@
Module `test_json_parser`
=========================

This module contains two programs for testing the JSON parsers.

- `test_json_parser_incremental` is for testing the incremental parser. It
  reads in a file and passes it in very small chunks (60 bytes at a time) to
  the incremental parser. It's not meant to be a speed test but to test the
  accuracy of the incremental parser. It takes one argument: the name of the
  input file.
- `test_json_parser_perf` is for speed testing both the standard
  recursive descent parser and the non-recursive incremental
  parser. If given the `-i` flag it uses the non-recursive parser,
  otherwise the standard parser. The remaining arguments are the number of
  parsing iterations and the file containing the input. Even when
  using the non-recursive parser, the input is passed to the parser in a
  single chunk. The results are thus comparable to those of the
  standard parser.

The easiest way to use these is to run `make check` and `make speed-check`.

The sample input file is a small extract from a list of `delicious`
bookmarks taken some years ago, all wrapped in a single JSON
array. 10,000 iterations of parsing this file gives a reasonable
benchmark, and that is what the `speed-check` target does.
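
The reason for feeding the input in very small chunks is that chunk boundaries can then fall in awkward places, such as between a backslash and the character it escapes. The following Python sketch shows the general testing pattern with a toy consumer; it is not the test program from this module. `DepthTracker`, `chunks` and `run` are hypothetical names, and the consumer only tracks bracket nesting while skipping string contents. An incremental consumer should report the same result for every chunk size.

```python
class DepthTracker:
    """Toy incremental consumer: tracks nesting depth outside of strings."""

    def __init__(self):
        self.depth = 0
        self.max_depth = 0
        self.in_string = False
        self.escaped = False  # previous character was a backslash in a string

    def feed(self, chunk):
        for ch in chunk:
            if self.in_string:
                if self.escaped:
                    self.escaped = False      # escaped character, skip it
                elif ch == "\\":
                    self.escaped = True       # may be the last char of a chunk
                elif ch == '"':
                    self.in_string = False
            elif ch == '"':
                self.in_string = True
            elif ch in "[{":
                self.depth += 1
                self.max_depth = max(self.max_depth, self.depth)
            elif ch in "]}":
                self.depth -= 1


def chunks(text, size):
    """Yield successive pieces of text, each at most size characters long."""
    for i in range(0, len(text), size):
        yield text[i:i + size]


def run(text, size):
    tracker = DepthTracker()
    for piece in chunks(text, size):
        tracker.feed(piece)
    return (tracker.depth, tracker.max_depth, tracker.in_string)


# Brackets and quotes inside strings must not confuse the tracker.
doc = '{"a": ["x\\"]", {"b": "\\\\"}], "c": "[{"}'
whole = run(doc, len(doc))
# Sweep chunk sizes from 64 down to 1; every size must agree.
print(all(run(doc, size) == whole for size in range(64, 0, -1)))
```

Because the escape state is stored on the object rather than in a local variable, a chunk that ends right after a backslash is handled the same way as one that does not.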
@@ -0,0 +1,52 @@
# Copyright (c) 2024, PostgreSQL Global Development Group

test_json_parser_incremental_sources = files(
  'test_json_parser_incremental.c',
)

if host_system == 'windows'
  test_json_parser_incremental_sources += rc_bin_gen.process(win32ver_rc, extra_args: [
    '--NAME', 'test_json_parser_incremental',
    '--FILEDESC', 'standalone json parser tester',
  ])
endif

test_json_parser_incremental = executable('test_json_parser_incremental',
  test_json_parser_incremental_sources,
  dependencies: [frontend_code],
  kwargs: default_bin_args + {
    'install': false,
  },
)

test_json_parser_perf_sources = files(
  'test_json_parser_perf.c',
)

if host_system == 'windows'
  test_json_parser_perf_sources += rc_bin_gen.process(win32ver_rc, extra_args: [
    '--NAME', 'test_json_parser_perf',
    '--FILEDESC', 'standalone json parser tester',
  ])
endif

test_json_parser_perf = executable('test_json_parser_perf',
  test_json_parser_perf_sources,
  dependencies: [frontend_code],
  kwargs: default_bin_args + {
    'install': false,
  },
)

tests += {
  'name': 'test_json_parser',
  'sd': meson.current_source_dir(),
  'bd': meson.current_build_dir(),
  'tap': {
    'tests': [
      't/001_test_json_parser_incremental.pl',
      't/002_inline.pl',
      't/003_test_semantic.pl'
    ],
  },
}
23 changes: 23 additions & 0 deletions
src/test/modules/test_json_parser/t/001_test_json_parser_incremental.pl
@@ -0,0 +1,23 @@

use strict;
use warnings;

use PostgreSQL::Test::Utils;
use Test::More;
use FindBin;

use File::Temp qw(tempfile);

my $test_file = "$FindBin::RealBin/../tiny.json";

my $exe = "test_json_parser_incremental";

for (my $size = 64; $size > 0; $size--)
{
	my ($stdout, $stderr) = run_command( [$exe, "-c", $size, $test_file] );

	like($stdout, qr/SUCCESS/, "chunk size $size: test succeeds");
	is($stderr, "", "chunk size $size: no error output");
}

done_testing();
@@ -0,0 +1,83 @@
use strict;
use warnings;

use PostgreSQL::Test::Utils;
use Test::More;

use File::Temp qw(tempfile);

sub test
{
	local $Test::Builder::Level = $Test::Builder::Level + 1;

	my ($name, $json, %params) = @_;
	my $exe = "test_json_parser_incremental";
	my $chunk = length($json);

	if ($chunk > 64)
	{
		$chunk = 64;
	}

	my ($fh, $fname) = tempfile(UNLINK => 1);
	print $fh "$json";
	close($fh);

	foreach my $size (reverse(1..$chunk))
	{
		my ($stdout, $stderr) = run_command( [$exe, "-c", $size, $fname] );

		if (defined($params{error}))
		{
			unlike($stdout, qr/SUCCESS/, "$name, chunk size $size: test fails");
			like($stderr, $params{error}, "$name, chunk size $size: correct error output");
		}
		else
		{
			like($stdout, qr/SUCCESS/, "$name, chunk size $size: test succeeds");
			is($stderr, "", "$name, chunk size $size: no error output");
		}
	}
}

test("number", "12345");
test("string", '"hello"');
test("false", "false");
test("true", "true");
test("null", "null");
test("empty object", "{}");
test("empty array", "[]");
test("array with number", "[12345]");
test("array with numbers", "[12345,67890]");
test("array with null", "[null]");
test("array with string", '["hello"]');
test("array with boolean", '[false]');
test("single pair", '{"key": "value"}');
test("heavily nested array", "[" x 3200 . "]" x 3200);
test("serial escapes", '"\\\\\\\\\\\\\\\\"');
test("interrupted escapes", '"\\\\\\"\\\\\\\\\\"\\\\"');
test("whitespace", ' "" ');

test("unclosed empty object", "{", error => qr/input string ended unexpectedly/);
test("bad key", "{{", error => qr/Expected string or "}", but found "\{"/);
test("bad key", "{{}", error => qr/Expected string or "}", but found "\{"/);
test("numeric key", "{1234: 2}", error => qr/Expected string or "}", but found "1234"/);
test("second numeric key", '{"a": "a", 1234: 2}', error => qr/Expected string, but found "1234"/);
test("unclosed object with pair", '{"key": "value"', error => qr/input string ended unexpectedly/);
test("missing key value", '{"key": }', error => qr/Expected JSON value, but found "}"/);
test("missing colon", '{"key" 12345}', error => qr/Expected ":", but found "12345"/);
test("missing comma", '{"key": 12345 12345}', error => qr/Expected "," or "}", but found "12345"/);
test("overnested array", "[" x 6401, error => qr/maximum permitted depth is 6400/);
test("overclosed array", "[]]", error => qr/Expected end of input, but found "]"/);
test("unexpected token in array", "[ }}} ]", error => qr/Expected array element or "]", but found "}"/);
test("junk punctuation", "[ ||| ]", error => qr/Token "\|" is invalid/);
test("missing comma in array", "[123 123]", error => qr/Expected "," or "]", but found "123"/);
test("misspelled boolean", "tru", error => qr/Token "tru" is invalid/);
test("misspelled boolean in array", "[tru]", error => qr/Token "tru" is invalid/);
test("smashed top-level scalar", "12zz", error => qr/Token "12zz" is invalid/);
test("smashed scalar in array", "[12zz]", error => qr/Token "12zz" is invalid/);
test("unknown escape sequence", '"hello\vworld"', error => qr/Escape sequence "\\v" is invalid/);
test("unescaped control", "\"hello\tworld\"", error => qr/Character with value 0x09 must be escaped/);
test("incorrect escape count", '"\\\\\\\\\\\\\\"', error => qr/Token ""\\\\\\\\\\\\\\"" is invalid/);

done_testing();
@@ -0,0 +1,36 @@
use strict;
use warnings;

use PostgreSQL::Test::Utils;
use Test::More;
use FindBin;

use File::Temp qw(tempfile);

my $test_file = "$FindBin::RealBin/../tiny.json";
my $test_out = "$FindBin::RealBin/../tiny.out";

my $exe = "test_json_parser_incremental";

my ($stdout, $stderr) = run_command( [$exe, "-s", $test_file] );

is($stderr, "", "no error output");

my ($fh, $fname) = tempfile();

print $fh $stdout,"\n";

close($fh);

($stdout, $stderr) = run_command(["diff", "-u", $fname, $test_out]);

is($stdout, "", "no output diff");
is($stderr, "", "no diff error");

done_testing();