Build de-escaped JSON strings in larger chunks during lexing
During COPY BINARY with large JSONB blobs, it was found that half
the time was spent parsing JSON, with much of that spent in separate
appendStringInfoChar() calls for each input byte.

Add lookahead loop to json_lex_string() to allow batching multiple bytes
via appendBinaryStringInfo(). Also use this same logic when de-escaping
is not done, to avoid code duplication.

Report and proof of concept patch by Jelte Fennema, reworked by Andres
Freund and John Naylor

Discussion: https://www.postgresql.org/message-id/CAGECzQQuXbies_nKgSiYifZUjBk6nOf2%3DTSXqRjj2BhUh8CTeA%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/flat/PR3PR83MB0476F098CBCF68AF7A1CA89FF7B49@PR3PR83MB0476.EURPRD83.prod.outlook.com
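
The batching idea described in the commit message can be illustrated outside PostgreSQL with a minimal, self-contained sketch. Everything in it is hypothetical and for illustration only: `OutBuf`, `outbuf_append`, `plain_run_length`, and `copy_string_body` are not PostgreSQL APIs, the buffer is a fixed-size stand-in for a StringInfo, and only the `\"` and `\\` escapes are handled. The point is the shape of the loop: scan ahead to the next byte that needs special handling, then append the whole run with one call instead of one call per byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size output buffer; a stand-in for a StringInfo. */
typedef struct
{
	char		data[256];
	size_t		len;
} OutBuf;

/* Append n bytes with a single call (the batched counterpart of a per-byte append). */
static void
outbuf_append(OutBuf *out, const char *src, size_t n)
{
	assert(out->len + n < sizeof(out->data));
	memcpy(out->data + out->len, src, n);
	out->len += n;
	out->data[out->len] = '\0';
}

/*
 * Length of the run starting at s that needs no special handling. The run
 * stops at a backslash, a double quote, a control byte (< 32), or end.
 */
static size_t
plain_run_length(const char *s, const char *end)
{
	const char *p = s;

	while (p < end && *p != '\\' && *p != '"' && (unsigned char) *p >= 32)
		p++;
	return (size_t) (p - s);
}

/*
 * Copy the body of a quoted string into out; s points just past the opening
 * quote. Returns the number of input bytes consumed, including the closing
 * quote, or -1 on an unescaped control byte, an unsupported escape, or a
 * missing closing quote.
 */
static long
copy_string_body(const char *s, size_t len, OutBuf *out)
{
	const char *start = s;
	const char *end = s + len;

	while (s < end)
	{
		size_t		run = plain_run_length(s, end);

		if (run > 0)
		{
			/* One append for the whole run of ordinary bytes. */
			outbuf_append(out, s, run);
			s += run;
			continue;
		}

		if (*s == '"')
			return (long) (s - start) + 1;	/* found the end of the string */

		if (*s == '\\' && s + 1 < end && (s[1] == '"' || s[1] == '\\'))
		{
			outbuf_append(out, s + 1, 1);	/* de-escaped byte */
			s += 2;
			continue;
		}

		return -1;				/* control byte or unsupported escape */
	}
	return -1;					/* ran out of input before the closing quote */
}
```

Called on the input `ab\"cd" tail`, `copy_string_body` appends `ab`, then the de-escaped quote, then `cd`, and stops at the closing quote; ordinary bytes cost one append per run rather than one per byte, which is where the savings come from on long strings with few escapes.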
j-naylor committed Jul 11, 2022
1 parent a6434b9 commit 3838fa2
Showing 1 changed file with 39 additions and 19 deletions.
58 changes: 39 additions & 19 deletions src/common/jsonapi.c
@@ -686,15 +686,6 @@ json_lex_string(JsonLexContext *lex)
 			lex->token_terminator = s;
 			return JSON_INVALID_TOKEN;
 		}
-		else if (*s == '"')
-			break;
-		else if ((unsigned char) *s < 32)
-		{
-			/* Per RFC4627, these characters MUST be escaped. */
-			/* Since *s isn't printable, exclude it from the context string */
-			lex->token_terminator = s;
-			return JSON_ESCAPING_REQUIRED;
-		}
 		else if (*s == '\\')
 		{
 			/* OK, we have an escape character. */
@@ -849,22 +840,51 @@ json_lex_string(JsonLexContext *lex)
 				return JSON_ESCAPING_INVALID;
 			}
 		}
-		else if (lex->strval != NULL)
+		else
 		{
+			char	   *p;
+
 			if (hi_surrogate != -1)
 				return JSON_UNICODE_LOW_SURROGATE;
 
-			appendStringInfoChar(lex->strval, *s);
-		}
-	}
+			/*
+			 * Skip to the first byte that requires special handling, so we
+			 * can batch calls to appendBinaryStringInfo.
+			 */
+			for (p = s; p < end; p++)
+			{
+				if (*p == '\\' || *p == '"')
+					break;
+				else if ((unsigned char) *p < 32)
+				{
+					/* Per RFC4627, these characters MUST be escaped. */
+					/*
+					 * Since *p isn't printable, exclude it from the context
+					 * string
+					 */
+					lex->token_terminator = p;
+					return JSON_ESCAPING_REQUIRED;
+				}
+			}
 
-	if (hi_surrogate != -1)
-		return JSON_UNICODE_LOW_SURROGATE;
+			if (lex->strval != NULL)
+				appendBinaryStringInfo(lex->strval, s, p - s);
 
-	/* Hooray, we found the end of the string! */
-	lex->prev_token_terminator = lex->token_terminator;
-	lex->token_terminator = s + 1;
-	return JSON_SUCCESS;
+			if (*p == '"')
+			{
+				/* Hooray, we found the end of the string! */
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = p + 1;
+				return JSON_SUCCESS;
+			}
+
+			/*
+			 * s will be incremented at the top of the loop, so set it to just
+			 * behind our lookahead position
+			 */
+			s = p - 1;
+		}
+	}
 }
 
 /*