Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the dom::parser internal loaded_bytes buffer as a tmp padded buffer when realloc_if_needed #1518

Merged
merged 1 commit into from
Apr 23, 2021

Conversation

bobergj
Copy link
Contributor

@bobergj bobergj commented Mar 25, 2021

Use the dom::parser internal loaded_bytes buffer as a tmp padded buffer when realloc_if_needed is true.

Motivation:

  • Avoid allocating a new buffer for every parse when realloc_if_needed
  • If it's good enough for file parsing (doesn't adjust capacity downwards) it should be good enough for this use case

@lemire
Copy link
Member

lemire commented Mar 25, 2021

@bobergj The fuzzer appears to have found an issue, can you have a look?

../include/simdjson/dom/parser-inl.h:112:17: runtime error: null pointer passed as argument 1, which is declared to never be null
310
/usr/include/string.h:44:28: note: nonnull attribute specified here
311
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../include/simdjson/dom/parser-inl.h:112:17 in 

}
_loaded_bytes_capacity = len;
}
std::memcpy(static_cast<void *>(loaded_bytes.get()), buf, len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be unsafe in the sense that a pointer could be null.

Copy link
Contributor Author

@bobergj bobergj Mar 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this could indeed happen if the input buffer was empty (len == 0). Fixed in new commit.
It could be that the parse file code path (not modified by this PR) has a similar issue when a zero byte file is parsed, not sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be that the parse file code path (not modified by this PR) has a similar issue when a zero byte file is parsed, not sure.

I will add a test for that specifically.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #1519

@lemire lemire added this to the 1.0 milestone Mar 25, 2021
@lemire lemire mentioned this pull request Mar 25, 2021
@lemire
Copy link
Member

lemire commented Mar 26, 2021

This looks pretty good.

@lemire
Copy link
Member

lemire commented Mar 27, 2021

Marked for 1.0 release.

@lemire
Copy link
Member

lemire commented Apr 23, 2021

Merging.

@lemire lemire merged commit ef8c2c4 into simdjson:master Apr 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants