Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvment: Add format "raw" for "string" type (skip escaping) #685

Closed
wants to merge 14 commits into from
Closed

Conversation

cesco69
Copy link
Contributor

@cesco69 cesco69 commented Mar 8, 2024

JSON with a lot of strings take time during escaping.

This PR introduce "raw" string format (no need to escape).

"schema": {
  "type": "string",
  "format": "raw"
}

benchmark (compare classic bench VS same bench with format raw)

short string............................................. x 22,581,464 ops/sec ±0.79% (186 runs sampled)
short string and raw..................................... x 1,038,659,823 ops/sec ±0.40% (189 runs sampled)

long string without double quotes........................ x 22,477 ops/sec ±0.46% (192 runs sampled)
long string without double quotes raw.................... x 1,035,763,724 ops/sec ±0.39% (189 runs sampled)

long string.............................................. x 15,865 ops/sec ±0.42% (190 runs sampled)
long string raw.......................................... x 1,060,214,143 ops/sec ±0.49% (190 runs sampled)

raw string seem x45 faster

Signed-off-by: francesco <francesco.bagnoli.69@gmail.com>
Signed-off-by: francesco <francesco.bagnoli.69@gmail.com>
Signed-off-by: francesco <francesco.bagnoli.69@gmail.com>
Signed-off-by: francesco <francesco.bagnoli.69@gmail.com>
Signed-off-by: francesco <francesco.bagnoli.69@gmail.com>
@cesco69 cesco69 changed the title Raw string Performance improvment: Add format "raw" for "string" type (skip escaping) Mar 8, 2024
Signed-off-by: francesco <francesco.bagnoli.69@gmail.com>
fix lint

Signed-off-by: francesco <francesco.bagnoli.69@gmail.com>
@mcollina
Copy link
Member

mcollina commented Mar 8, 2024

Can you add some docs?

index.js Outdated Show resolved Hide resolved
@@ -113,7 +113,9 @@ module.exports = class Serializer {
}
}

if (str.length < 42) {
if (format === 'raw') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would create a different serializer method instead of adding the check here, because you know the if condition at the compile time and there is no need to to it in a serialization time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mhh eg. asStringRaw(str) ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeap

@@ -4,6 +4,22 @@ const t = require('tap')
const test = t.test
const build = require('..')

test('serialize short string raw', (t) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add some test cases where it doesn't escape chars

@@ -113,7 +113,9 @@ module.exports = class Serializer {
}
}

if (str.length < 42) {
if (format === 'raw') {
Copy link
Member

@ivan-tymoshenko ivan-tymoshenko Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use a different name than raw to emphasize that is format has a very dangerous potential security issue. Raw format would not escape a double quote char which makes it very easy to inject something into the data. You can use it only if you 200% sure in your data. I would call it something like unsafe.

@mcollina WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for unsafe ...

Signed-off-by: francesco <francesco.bagnoli.69@gmail.com>
Signed-off-by: francesco <francesco.bagnoli.69@gmail.com>
Signed-off-by: francesco <francesco.bagnoli.69@gmail.com>
Signed-off-by: francesco <francesco.bagnoli.69@gmail.com>
Signed-off-by: francesco <francesco.bagnoli.69@gmail.com>
Signed-off-by: francesco <francesco.bagnoli.69@gmail.com>
Signed-off-by: francesco <francesco.bagnoli.69@gmail.com>
@cesco69
Copy link
Contributor Author

cesco69 commented Mar 8, 2024

Can you add some docs?

Done. 4f30811

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -284,7 +284,7 @@ function buildExtraObjectPropertiesSerializer (context, location, addComma) {
code += `
if (/${propertyKey.replace(/\\*\//g, '\\/')}/.test(key)) {
${addComma}
json += serializer.asString(key) + ':'
json += serializer.asString(key,null) + ':'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why pass null? It’s type is object while the parameter type is string — to me that gives me the impression that an object argument is expected there

What’s wrong with leaving it as undefined?

Suggested change
json += serializer.asString(key,null) + ':'
json += serializer.asString(key) + ':'

@nigrosimone nigrosimone mentioned this pull request Mar 10, 2024
4 tasks
@ivan-tymoshenko
Copy link
Member

superseded by #686

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants