OMFG. string lexing improvements that yield about a 20% improvement in parsing performance. inspiration provided by mike hanson.
Showing 1 changed file with 70 additions and 28 deletions.
ec8204d
Could we make it even faster by never referring to the buffered data (lexer->buf)? Instead of rescanning and checking for skips, just maintain the previous parsing state and continue parsing the incoming data. This requires a small state machine:
typedef enum {
    yajl_string_parse_start = 0,
    yajl_string_parse_complete = 1,
    yajl_string_parse_escape_start,
    yajl_string_parse_found_char_u,
    yajl_string_parse_got_hex_first_byte,
    yajl_string_parse_got_hex_second_byte,
    yajl_string_parse_got_hex_third_byte,
    yajl_string_parse_ut8_mode_2,
    yajl_string_parse_ut8_mode_3,
    yajl_string_parse_ut8_mode_3_2,
    yajl_string_parse_ut8_mode_3_3,
    yajl_string_parse_ut8_mode_4,
    yajl_string_parse_ut8_mode_4_2,
    yajl_string_parse_ut8_mode_4_3,
    yajl_string_parse_ut8_mode_4_4,
    yajl_string_parse_invalid
} yajl_string_parsing_state;
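To illustrate the idea, here is a rough, self-contained sketch of a resumable string scanner. It is not the yajl code: the names `scan_state`, `string_scanner` and `scanner_feed` are hypothetical, the state set is a simplified version of the enum above (the four hex-digit states are folded into a counter), and UTF-8 validation is left out entirely. It assumes the opening quote has already been consumed. The point is only that the state lives in a struct, so a chunk boundary in the middle of an escape needs no rescan of buffered data.

```c
#include <stddef.h>

/* Hypothetical, simplified states for illustration only. */
typedef enum {
    SS_NORMAL = 0,  /* inside the string, no escape pending */
    SS_ESCAPE,      /* previous byte was a backslash */
    SS_HEX,         /* inside \uXXXX, hex_left digits still expected */
    SS_COMPLETE,    /* closing quote has been consumed */
    SS_INVALID      /* malformed input was seen */
} scan_state;

typedef struct {
    scan_state state;   /* survives between calls */
    int hex_left;       /* remaining hex digits of a \u escape */
    int has_escapes;    /* set once any backslash escape is seen */
} string_scanner;

static int is_hex(unsigned char c)
{
    return (c >= '0' && c <= '9') ||
           (c >= 'a' && c <= 'f') ||
           (c >= 'A' && c <= 'F');
}

/* Feed one chunk of input. Returns the number of bytes consumed from
 * this chunk; scanning stops early once the string is complete or
 * invalid. The caller inspects s->state afterwards. */
static size_t scanner_feed(string_scanner *s,
                           const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len && s->state != SS_COMPLETE && s->state != SS_INVALID) {
        unsigned char c = buf[i++];
        switch (s->state) {
        case SS_NORMAL:
            if (c == '"') {
                s->state = SS_COMPLETE;
            } else if (c == '\\') {
                s->state = SS_ESCAPE;
                s->has_escapes = 1;
            } else if (c < 0x20) {
                s->state = SS_INVALID;  /* raw control character */
            }
            break;
        case SS_ESCAPE:
            switch (c) {
            case 'u':
                s->state = SS_HEX;
                s->hex_left = 4;
                break;
            case '"': case '\\': case '/': case 'b':
            case 'f': case 'n': case 'r': case 't':
                s->state = SS_NORMAL;
                break;
            default:
                s->state = SS_INVALID;
            }
            break;
        case SS_HEX:
            if (!is_hex(c)) {
                s->state = SS_INVALID;
            } else if (--s->hex_left == 0) {
                s->state = SS_NORMAL;
            }
            break;
        default:
            break;
        }
    }
    return i;
}
```

A caller would zero-initialize a `string_scanner` when it sees the opening quote, call `scanner_feed` for each incoming chunk, and stop once `state` becomes `SS_COMPLETE` or `SS_INVALID`. Whether this actually beats the buffered approach would need to be measured.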
My Changes in yajl_lex.c:
static yajl_tok
yajl_lex_string(yajl_lexer lexer, const unsigned char * jsonText,
                unsigned int jsonTextLen, unsigned int * offset)
{
    yajl_tok tok = yajl_tok_error;

  finish_string_lex:
    /* tell our buddy, the parser, whether he needs to process this string
     * again */
    if (lexer->hasEscapes && tok == yajl_tok_string) {
        tok = yajl_tok_string_with_escapes;
        lexer->hasEscapes = 0;
    }

    return tok;
}
Please let me know if this turns out to be faster.
Regards
Abhi