Performance Issue on a Nested JSON example #15

Closed
soasme opened this issue Feb 16, 2021 · 1 comment
soasme commented Feb 16, 2021

PeppaPEG version: 1.2.0.

Given the program below, more than half of the execution time (55.80% of executed instructions) is spent in P4_NeedLoosen. We should optimize it.

$ cat examples/json.c
#include <stdio.h>
#include <stdlib.h>
#include "../peppapeg.h"
#include "json.h"

# define NESTING_DEPTH          1000

int main(int argc, char* argv[]) {
    char* input = malloc(sizeof(char) * (NESTING_DEPTH*2 + 1));
    int i;

    for (i = 0; i < NESTING_DEPTH; i++) {
        input[i] = '[';
        input[NESTING_DEPTH+i] = ']';
    }
    input[NESTING_DEPTH*2] = '\0';

    P4_Grammar* grammar = P4_CreateJSONGrammar();
    P4_Source* source = P4_CreateSource(input, P4_JSONEntry);

    printf("%u\n", P4_Parse(grammar, source));

    free(input);
    P4_DeleteSource(source);
    P4_DeleteGrammar(grammar);

    return 0;
}

Run it with valgrind:

$ gcc -g  peppapeg.c examples/json.c && valgrind --tool=callgrind ./a.out
$ callgrind_annotate callgrind.out.63642

The profiling result is below:

--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
1,265,470,759 (100.0%)  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir                    file:function
--------------------------------------------------------------------------------
706,187,168 (55.80%)  peppapeg.c:P4_NeedLoosen [/app/a.out]
320,620,450 (25.34%)  peppapeg.c:P4_IsTight [/app/a.out]
170,320,160 (13.46%)  peppapeg.c:P4_IsScoped [/app/a.out]
 25,020,000 ( 1.98%)  peppapeg.c:P4_NeedSquash [/app/a.out]
 10,000,000 ( 0.79%)  peppapeg.c:P4_IsSquashed [/app/a.out]
  4,617,160 ( 0.36%)  ???:_int_free [/usr/lib64/libc-2.28.so]
  3,382,226 ( 0.27%)  ???:malloc [/usr/lib64/ld-2.28.so]
  2,727,160 ( 0.22%)  peppapeg.c:P4_Match'2 [/app/a.out]
  1,925,056 ( 0.15%)  ???:__strlen_avx2 [/usr/lib64/libc-2.28.so]
  1,848,485 ( 0.15%)  ???:free [/usr/lib64/ld-2.28.so]
  1,841,351 ( 0.15%)  peppapeg.c:P4_MatchLiteral [/app/a.out]
  1,705,988 ( 0.13%)  peppapeg.c:P4_MatchChoice'2 [/app/a.out]
  1,540,594 ( 0.12%)  peppapeg.c:P4_IsRule [/app/a.out]
  1,518,598 ( 0.12%)  peppapeg.c:P4_Expression_dispatch'2 [/app/a.out]
$ time ./a.out
real	0m0.157s
user	0m0.154s
sys	0m0.002s

A full profiling result can be seen here: https://gist.github.com/soasme/38471063511aa14302e5ccad173767de

soasme added a commit that referenced this issue Feb 16, 2021
This should partially address issue #15.
The execution time was reduced from 0.16s to 0.09s.

    # time ./a.out
    real    0m0.090s
    user    0m0.084s
    sys     0m0.003s
soasme added a commit that referenced this issue Feb 16, 2021
…al. (#16)

* Reduce the number of call executions: P4_NeedLoosen by caching to local.

* update roadmap.
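
The "caching to local" change mentioned in this commit could look roughly like the sketch below. It is only an illustration of the general technique (call a side-effect-free check once and reuse the result), not the actual PeppaPEG patch. The names `expensive_check`, `count_gaps_uncached`, and `count_gaps_cached` are hypothetical, and the check body is a stand-in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an expensive, side-effect-free check such as
 * P4_NeedLoosen. `calls` counts invocations, for illustration only. */
static size_t calls = 0;

static bool expensive_check(int context) {
    calls++;
    return context % 2 == 0;
}

/* Before: the check runs once per gap between sequence members. */
static size_t count_gaps_uncached(int context, size_t n_members) {
    assert(n_members >= 1);
    size_t gaps = 0;
    for (size_t i = 1; i < n_members; i++) {
        if (expensive_check(context)) gaps++;
    }
    return gaps;
}

/* After: the result is cached in a local before the loop. This is valid
 * only because the answer cannot change while the loop is running. */
static size_t count_gaps_cached(int context, size_t n_members) {
    assert(n_members >= 1);
    const bool need_loosen = expensive_check(context);
    size_t gaps = 0;
    for (size_t i = 1; i < n_members; i++) {
        if (need_loosen) gaps++;
    }
    return gaps;
}
```

Both functions return the same value. The cached version calls the check once per invocation rather than once per loop iteration.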
soasme added a commit that referenced this issue Feb 17, 2021
The silent and space states are now also cached inside the stack frame, so the expensive P4_NeedSquash and P4_NeedLoosen calls are no longer needed.

The performance improvement is significant.

Running the example from #15 against the current HEAD (about 10x faster than v1.2.0, 0.157s down to 0.014s):

```
$ time ./a.out
real	0m0.014s
user	0m0.006s
sys	0m0.004s
```

See callgrind output: https://gist.github.com/soasme/f31ea5f78420304a1b12b434a4a808e9
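
A minimal sketch of the stack-frame caching idea from this commit follows. It assumes the original check had to scan the enclosing frames on every call, which this thread does not state explicitly. The types, flags, and functions (`Frame`, `FrameStack`, `RULE_TIGHT`, `RULE_SCOPED`, `need_loosen_scan`, `push_frame`, `need_loosen_cached`) are hypothetical and are not PeppaPEG's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags and frame layout, for illustration only. */
enum { RULE_TIGHT = 1u << 0, RULE_SCOPED = 1u << 1 };
#define MAX_DEPTH 2048

typedef struct {
    unsigned flags;  /* flags of the rule this frame entered */
    bool     space;  /* cached answer: is implicit whitespace allowed here? */
} Frame;

typedef struct {
    Frame  frames[MAX_DEPTH];
    size_t size;
} FrameStack;

/* Uncached: scan from the innermost frame outwards until a frame decides.
 * Costs O(depth) per call; `steps` counts the frames visited. */
static bool need_loosen_scan(const FrameStack *s, size_t *steps) {
    for (size_t i = s->size; i-- > 0; ) {
        (*steps)++;
        if (s->frames[i].flags & RULE_TIGHT)  return false;
        if (s->frames[i].flags & RULE_SCOPED) return true;
    }
    return true;  /* nothing on the stack forbids whitespace */
}

/* Cached: derive the answer once, at push time, from the parent frame. */
static void push_frame(FrameStack *s, unsigned flags) {
    assert(s->size < MAX_DEPTH);
    bool space = s->size ? s->frames[s->size - 1].space : true;
    if (flags & RULE_TIGHT)       space = false;
    else if (flags & RULE_SCOPED) space = true;
    s->frames[s->size++] = (Frame){ flags, space };
}

/* O(1) lookup that replaces the scan. */
static bool need_loosen_cached(const FrameStack *s) {
    return s->size ? s->frames[s->size - 1].space : true;
}
```

Under these assumptions, pushing 1000 unflagged frames and checking once after each push makes the scanning version visit 500,500 frames in total, while the cached version does one lookup per check. That quadratic-versus-linear gap would be consistent with the deep-nesting profile reported in this issue.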
soasme commented Feb 18, 2021

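The current HEAD is more than 10x faster than v1.2.0: wall-clock time dropped from 0.157s to 0.011s, and the instruction count (Ir) from 1,265,470,759 to 33,294,688: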

$ time ./a.out

real	0m0.011s
user	0m0.007s
sys	0m0.003s
--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
33,294,688 (100.0%)  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir                  file:function
--------------------------------------------------------------------------------
5,259,163 (15.80%)  ???:_int_free [/usr/lib64/libc-2.28.so]
3,846,555 (11.55%)  ???:malloc [/usr/lib64/ld-2.28.so]
2,595,108 ( 7.79%)  peppapeg.c:P4_Match'2 [/app/a.out]
2,078,669 ( 6.24%)  ???:free [/usr/lib64/ld-2.28.so]
1,841,351 ( 5.53%)  peppapeg.c:P4_MatchLiteral [/app/a.out]
1,805,004 ( 5.42%)  ???:__strlen_avx2 [/usr/lib64/libc-2.28.so]
1,705,988 ( 5.12%)  peppapeg.c:P4_MatchChoice'2 [/app/a.out]
1,518,598 ( 4.56%)  peppapeg.c:P4_Expression_dispatch'2 [/app/a.out]
1,260,936 ( 3.79%)  ???:strdup [/usr/lib64/ld-2.28.so]
1,174,097 ( 3.53%)  peppapeg.c:P4_MatchRepeat [/app/a.out]
1,092,247 ( 3.28%)  peppapeg.c:P4_RaiseError [/app/a.out]
1,003,802 ( 3.01%)  ???:__memcpy_avx_unaligned_erms [/usr/lib64/libc-2.28.so]
  900,542 ( 2.70%)  peppapeg.c:P4_MatchSequence'2 [/app/a.out]
  806,160 ( 2.42%)  peppapeg.c:P4_RescueError [/app/a.out]
  805,790 ( 2.42%)  peppapeg.c:P4_PushFrame [/app/a.out]
  791,322 ( 2.38%)  peppapeg.c:P4_GetPosition [/app/a.out]
  743,611 ( 2.23%)  ???:_int_malloc [/usr/lib64/libc-2.28.so]
  671,409 ( 2.02%)  peppapeg.c:P4_DeleteToken [/app/a.out]
  440,200 ( 1.32%)  peppapeg.c:P4_SetPosition [/app/a.out]
  414,244 ( 1.24%)  peppapeg.c:P4_MatchReference'2 [/app/a.out]
  410,080 ( 1.23%)  peppapeg.c:P4_RemainingText [/app/a.out]
  310,279 ( 0.93%)  peppapeg.c:P4_PopFrame [/app/a.out]
  297,096 ( 0.89%)  ???:__memcmp_avx2_movbe [/usr/lib64/libc-2.28.so]
  210,129 ( 0.63%)  peppapeg.c:P4_GetWhitespaces [/app/a.out]
  196,095 ( 0.59%)  peppapeg.c:P4_GetReference [/app/a.out]
  185,000 ( 0.56%)  peppapeg.c:P4_MatchSpacedExpressions [/app/a.out]
  140,100 ( 0.42%)  ???:0x0000000004c4e760 [???]
  140,100 ( 0.42%)  ???:0x0000000004c4e820 [???]
  140,100 ( 0.42%)  ???:0x0000000004c4e880 [???]
  122,000 ( 0.37%)  peppapeg.c:P4_NeedLift [/app/a.out]
   90,144 ( 0.27%)  peppapeg.c:cleanup_freep [/app/a.out]

The results show a significant performance improvement: P4_NeedLoosen, P4_IsTight, and P4_IsScoped no longer appear among the top entries of the profile. Closing the issue.

soasme closed this as completed Feb 18, 2021