Commit

Squashed 'misc/packcc/' changes from 6da5a4c6a..739b3ee9e
739b3ee9e Fix argument type mismatch
150372de0 Merge branch 'feature/memory-recycling'
142660fb1 Reduce memory allocation frequency
b3f745496 Add a typecast and const modifiers
176f5c0f8 Simplify the code using pcc_context_t typedef
4dbcaae48 Rename identifiers related to memory recycling
08a6f0c56 Merge branch 'master' into feature/memory-recycling
3a0ecca3f Rename macros in generated parsers
e50f8b233 Merge pull request universal-ctags#63 from masatake/recycle-list
58ad04747 Merge pull request universal-ctags#64 from dolik-rce/benchmark-memory
e559f4c4e add memory measurement to benchmark script
f3a5c7e77 Preallocate memory objects for pcc_thunk_chunk_t, pcc_lr_head_t, and pcc_lr_answer_t
7cd6dffb7 Pass pcc_context_t instead of pcc_auxil_t in many places
710b51f7f Update the copyright years
70389ec19 Conform to the coding style
59668cf87 Divide the character_classes_0.d test into two tests
657508c52 Merge pull request universal-ctags#61 from mingodad/fix-charset-plus-minus
572951a8c Fix handling charset "[+-]"
0e3ee0c8b Update README.md
03c90e03e Fix the reopened issue universal-ctags#56
c2f499eb2 Ensures that all values of unevaluated rules are zero-cleared
f376e099d Support exact column numbers in the PEG source even if UTF-8 multibyte characters are contained
9dfcd9153 Modify a dump function
e27c05d91 Add codes for safety
da750a9a7 Refine code block output
afd64bc61 Update README.md
cea483b89 Support insertion of #line directives in the generated code (universal-ctags#55)
62130fe96 Add a feature to count text lines output to a stream
4982d72ea Introduce a structure to hold code block data
86874c214 Fix incorrect update of the parsing position
41be80f02 Introduce a structure to hold options
5b9f23d18 Rename functions
803317bc4 Update README.md

git-subtree-dir: misc/packcc
git-subtree-split: 739b3ee9edd62b8623d30272069e6fd446270591
masatake committed May 27, 2022
1 parent d742742 commit 73c7f3e
Showing 15 changed files with 1,387 additions and 871 deletions.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
PackCC: a packrat parser generator for C.

Copyright (c) 2014, 2019-2021 Arihiro Yoshida. All rights reserved.
Copyright (c) 2014, 2019-2022 Arihiro Yoshida. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
16 changes: 14 additions & 2 deletions README.md
@@ -133,7 +133,10 @@ packcc -o parser example.peg

By running this, the parser source `parser.h` and `parser.c` are generated.

If you want to disable UTF-8 support, specify the command line option `-a` (version 1.4.0 or later).
If you want to disable UTF-8 support, specify the command line option `-a` or `--ascii` (version 1.4.0 or later).

If you want to insert `#line` directives in the generated source and header files, specify the command line option `-l` or `--lines` (version 1.7.0 or later).
It is helpful to trace compilation errors of the generated source and header files back to the codes written in the PEG source file.
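
As a rough illustration of what a `#line` directive does at the C level (independent of PackCC's actual output, which is not shown here), the sketch below places a directive in front of a statement and checks the position the compiler then reports. The file name `example.peg`, the line number 42, and both function names are made up for the example.

```c
#include <string.h>

/* Illustration only: "example.peg" and 42 are made-up values standing in
 * for whatever position a generator would record for an action. */
static int action_line(void) {
#line 42 "example.peg"
    return __LINE__; /* the compiler now treats this as line 42 of example.peg */
}

/* Returns 0 when the compiler reports the position set by the directive. */
int line_directive_demo(void) {
    if (action_line() != 42) return 1;
    /* The file name set by #line stays in effect for the rest of the file. */
    if (strcmp(__FILE__, "example.peg") != 0) return 2;
    return 0;
}
```

A diagnostic raised on the `return __LINE__;` statement would therefore be attributed to `example.peg:42` rather than to the generated C file, which is the effect the option is meant to provide.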

If you want to confirm the version of the `packcc` command, execute the below.

@@ -300,6 +303,7 @@ This matches `[[`...`]]`, `[=[`...`]=]`, `[==[`...`]==]`, etc.
Curly braces surround an action.
The action is arbitrary C source code to be executed at the end of matching.
Any braces within the action must be properly nested.
Note that braces in directive lines and in comments (`/*`...`*/` and `//`...) are appropriately ignored.
One or more actions can be inserted in any places between elements in the pattern.
Actions are not executed where matching fails.

@@ -323,6 +327,7 @@ In the action, the C source code can use the predefined variables below.
The default data type is `void *`.
- _variable_
The result of another rule that has already been evaluated.
If the rule has not been evaluated, it is ensured that the value is zero-cleared (version 1.7.1 or later).
The data type is the one specified by `%value`.
The default data type is `int`.
- **`$`**_n_
@@ -368,6 +373,7 @@ The data type is `size_t` (before version 1.4.0, it was `int`).
Curly braces following tilde (`~`) surround an error action.
The error action is arbitrary C source code to be executed at the end of matching only if the preceding _element_ matching fails.
Any braces within the error action must be properly nested.
Note that braces in directive lines and in comments (`/*`...`*/` and `//`...) are appropriately ignored.
One or more error actions can be inserted in any places after elements in the pattern.
The operator tilde (`~`) binds less tightly than any other operator except alternation (`/`) and sequencing.
The error action is intended to make error handling and recovery code easier to write.
@@ -382,15 +388,21 @@ rule2 <- (e1 e2 e3) ~{ error("one of e[123] has failed"); }
**`%header` `{` _c source code_ `}`**

The specified C source code is copied verbatim to the C header file before the generated parser API function declarations.
Any braces in the C source code must be properly nested.
Note that braces in directive lines and in comments (`/*`...`*/` and `//`...) are appropriately ignored.

**`%source` `{` _c source code_ `}`**

The specified C source code is copied verbatim to the C source file before the generated parser implementation code.
Any braces in the C source code must be properly nested.
Note that braces in directive lines and in comments (`/*`...`*/` and `//`...) are appropriately ignored.

**`%common` `{` _c source code_ `}`**

The specified C source code is copied verbatim to both of the C header file and the C source file
before the generated parser API function declarations and the implementation code respectively.
Any braces in the C source code must be properly nested.
Note that braces in directive lines and in comments (`/*`...`*/` and `//`...) are appropriately ignored.

**`%earlyheader` `{` _c source code_ `}`**

@@ -641,7 +653,7 @@ while (pcc_parse(ctx, &ret));
pcc_destroy(ctx);
```

## Example ##
## Examples ##

### Desktop calculator ###

103 changes: 68 additions & 35 deletions benchmark/benchmark.sh
@@ -1,9 +1,9 @@
#!/usr/bin/env bash
#
# Generates, builds and runs parsers from grammar directory for each git reference supplied as argument.
# Each action is performed multiple times and the times are averaged. First reference is always
# taken as a "baseline" and others are compared to it. This should allow to compare how any given commit
# affects PackCCs performance.
# Generates, builds and runs parsers from grammars directory for each git reference supplied as argument.
# Each action is performed multiple times and the times are averaged. Peak memory consumption is also measured.
# First reference is always taken as a "baseline" and others are compared to it. This should allow to compare
# how any given commit affects PackCCs performance.
#
# Usage:
# ./benchmark.sh <git ref> ...
@@ -35,79 +35,105 @@ format() {
elif [ $((TIME / 1000)) -gt 10 ]; then
echo "$((TIME / 1000)) us"
else
echo "$((TIME)) ns"
echo "$TIME ns"
fi
}

format_mem() {
MEM="$1"
if [ -z "$TIME_CMD" ]; then
echo "??? kB"
elif [ $((MEM / 1048576)) -gt 10 ]; then
echo "$((MEM / 1048576)) GB"
elif [ $((MEM / 1024)) -gt 10 ]; then
echo "$((MEM / 1024)) MB"
else
echo "$MEM kB"
fi
}

measure() {
COUNT="$1"
shift
MEM=0
if [ "$TIME_CMD" ]; then
MEM="$(${TIME_CMD[@]} -f %M "$@" 2>&1 >/dev/null)"
fi
START="$(date '+%s%N')"
for ((i=0; i<COUNT; i++)); do
"$@"
"$@" > /dev/null
done
END="$(date '+%s%N')"
TIME=$(( END - START ))
}

run() {
"$1" < "$2" > /dev/null
}

benchmark() {
KEY="${GRAMMAR}_${REF//\//_}"
NAME="tmp/parser_$KEY"

echo "Generating $GRAMMAR parser in $REF ($GEN_REPEATS times)..."
measure "$GEN_REPEATS" "$PACKCC" -o "$NAME" "$GRAMMAR_FILE"
GEN["$KEY"]=$TIME
echo " Repeated $GEN_REPEATS times in $(format $TIME)"
GEN_TIME["$KEY"]=$TIME
GEN_MEM["$KEY"]=$MEM
echo " Repeated $GEN_REPEATS times in $(format $TIME), peak memory $(format_mem $MEM)"

echo "Building $GRAMMAR parser in $REF ($BUILD_REPEATS times)..."
measure "$BUILD_REPEATS" $CC -I. "$NAME".c -o "$NAME"
BUILD["$KEY"]=$TIME
echo " Built $BUILD_REPEATS times in $(format $TIME)"
BUILD_TIME["$KEY"]=$TIME
BUILD_MEM["$KEY"]=$MEM
echo " Built $BUILD_REPEATS times in $(format $TIME), peak memory $(format_mem $MEM)"

echo "Running $GRAMMAR parser in $REF ($RUN_REPEATS times)..."
measure "$RUN_REPEATS" run "./$NAME" "$INPUT"
RUN["$KEY"]=$TIME
echo " Repeated $RUN_REPEATS times in $(format $TIME)"
measure "$RUN_REPEATS" "./$NAME" "$INPUT"
RUN_TIME["$KEY"]=$TIME
RUN_MEM["$KEY"]=$MEM
echo " Repeated $RUN_REPEATS times in $(format $TIME), peak memory $(format_mem $MEM)"
}

print_table() {
declare -n RESULTS="$1"
declare -n RESULTS_TIME="${1}_TIME"
declare -n RESULTS_MEM="${1}_MEM"
printf "%-12s" ""
for REF in "${REFS[@]}"; do
printf "%-16s" "$REF"
printf "%-32s" "$REF"
done
printf "\n"
MEMORY=0
RELATIVE_MEM="???"
COLOR_MEM=0
for GRAMMAR in "${GRAMMARS[@]}"; do
printf "%-12s" "$GRAMMAR"
for REF in "${REFS[@]}"; do
KEY="${GRAMMAR}_${REF//\//_}"
BASE="${GRAMMAR}_${REFS[0]//\//_}"
TIME="$((${RESULTS["$KEY"]} / RUN_REPEATS))"
RELATIVE="$((100 * RESULTS["$KEY"] / RESULTS["$BASE"]))"
COLOR=$((RELATIVE == 100 ? 0 : ( RELATIVE > 100 ? 31 : 32)))
printf "\033[0;${COLOR}m%-16s\033[0m" "$(format $TIME) ($RELATIVE%)"
TIME="$((${RESULTS_TIME["$KEY"]} / RUN_REPEATS))"
RELATIVE_TIME="$((100 * RESULTS_TIME["$KEY"] / RESULTS_TIME["$BASE"]))"
COLOR=$((RELATIVE_TIME == 100 ? 0 : ( RELATIVE_TIME > 100 ? 31 : 32)))
if [ "$TIME_CMD" ]; then
MEMORY="${RESULTS_MEM["$KEY"]}"
RELATIVE_MEM="$((100 * RESULTS_MEM["$KEY"] / RESULTS_MEM["$BASE"]))"
COLOR_MEM=$((RELATIVE_MEM == 100 ? 0 : ( RELATIVE_MEM > 100 ? 31 : 32)))
fi
printf "\033[0;${COLOR}m%-16s\033[0;${COLOR_MEM}m%-16s\033[0m" "$(format $TIME) ($RELATIVE_TIME%)" "$(format_mem $MEMORY) ($RELATIVE_MEM%)"
done
printf "\n"
done
}

print_results() {
echo
echo "Generation times:"
echo "================="
echo "Generation performance:"
echo "======================="
print_table GEN
echo
echo "Build times:"
echo "============"
echo "Build performance:"
echo "=================="
print_table BUILD
echo
echo "Run times:"
echo "=========="
echo "Run performance:"
echo "================"
print_table RUN
echo
}

main() {
@@ -116,13 +142,11 @@ main() {
BENCHDIR="$(cd "$(dirname "$0")" && pwd)"
ROOTDIR="$BENCHDIR/.."
declare -a GRAMMARS=()
declare -A BUILD=()
declare -A GEN=()
declare -A RUN=()
declare -A BUILD_TIME GEN_TIME RUN_TIME BUILD_MEM GEN_MEM RUN_MEM

declare -i GEN_REPEATS="${GEN_REPEATS:-10}"
declare -i BUILD_REPEATS="${BUILD_REPEATS:-5}"
declare -i RUN_REPEATS="${RUN_REPEATS:-20}"
declare -i GEN_REPEATS="${GEN_REPEATS:-1}"
declare -i BUILD_REPEATS="${BUILD_REPEATS:-1}"
declare -i RUN_REPEATS="${RUN_REPEATS:-1}"
CC="${CC:-cc -O2}"
REFS=("$@")

@@ -131,6 +155,15 @@ main() {
exit 0
fi

if which busybox &> /dev/null; then
TIME_CMD=(busybox time)
elif which time &> /dev/null; then
TIME_CMD=("$(which time)")
else
echo "NOTE: No time command found, please install GNU time or busybox to measure memory consumption."
TIME_CMD=""
fi

START_REF="$(git name-rev --name-only HEAD)"
trap "echo 'Returning to $START_REF...' && git checkout $START_REF" EXIT ERR INT

5 changes: 4 additions & 1 deletion benchmark/grammars/calc.peg
@@ -29,7 +29,10 @@ _ <- [ \t]*
EOL <- '\n' / '\r\n' / '\r' / ';'

%%
int main() {
int main(int argc, char **argv) {
if (argc > 1) {
freopen(argv[1], "r", stdin);
}
calc_context_t *ctx = calc_create(NULL);
while (calc_parse(ctx, NULL));
calc_destroy(ctx);
5 changes: 4 additions & 1 deletion benchmark/grammars/json.peg
@@ -12,7 +12,10 @@ _ <- [ \n\r\t]*
_ <- [ \n\r\t]*

%%
int main() {
int main(int argc, char **argv) {
if (argc > 1) {
freopen(argv[1], "r", stdin);
}
json_context_t *ctx = json_create(NULL);
while (json_parse(ctx, NULL));
json_destroy(ctx);
3 changes: 3 additions & 0 deletions benchmark/grammars/kotlin.peg
@@ -441,6 +441,9 @@ EOF <- !.
%%

int main(int argc, char **argv) {
if (argc > 1) {
freopen(argv[1], "r", stdin);
}
int ret;
pcc_context_t *ctx = pcc_create(NULL);
while (pcc_parse(ctx, &ret));
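
The `main` functions added to the three benchmark grammars above call `freopen` without checking its result, so a mistyped input path would go unnoticed by the benchmark. A minimal sketch of a checked variant follows; the helper name `redirect_stdin` is hypothetical and does not appear in the commit.

```c
#include <stdio.h>

/* Hypothetical helper, not part of the benchmark grammars: redirects stdin
 * to `path` if one is given. Returns 0 on success, -1 if the file could not
 * be opened. */
int redirect_stdin(const char *path) {
    if (path == NULL) {
        return 0; /* no argument: keep reading from the original stdin */
    }
    if (freopen(path, "r", stdin) == NULL) {
        perror(path); /* report the reason on stderr */
        return -1;
    }
    return 0;
}
```

Under these assumptions, each `main` could start with `if (redirect_stdin(argc > 1 ? argv[1] : NULL) != 0) return 1;` so that a missing input file fails the run instead of being timed as an empty parse.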
12 changes: 6 additions & 6 deletions examples/ast-tinyc/README.md
@@ -46,19 +46,19 @@ You must have [Build Tools for Visual Studio](https://visualstudio.microsoft.com
You can get the executable by executing the following commands using 'Developer Command Prompt for VS 2019' or 'Developer PowerShell for VS 2019':

```
cd /path/to/this_directory
cd \path\to\this_directory
mkdir build
cd build
cmake -DPACKCC=/path/to/packcc ..
cmake -DPACKCC=\path\to\packcc ..
MSBuild ALL_BUILD.vcxproj
```

Here, `/path/to/this_directory` represents the path name of this directory,
and `/path/to/packcc` represents the path name of `packcc` command.
Here, `\path\to\this_directory` represents the path name of this directory,
and `\path\to\packcc` represents the path name of `packcc` command.
If `packcc` command is installed in one of the directories specified in the environment variable `PATH`,
the option `-DPACKCC=/path/to/packcc` is not necessary.
the option `-DPACKCC=\path\to\packcc` is not necessary.

The executable `ast.exe` will be created in the directory `build`.
The executable `ast.exe` will be created in the directory `build\Debug`.

#### Using MinGW-w64 ####
