Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integer parsing allows illegal spaces or structural indices. #1878

Closed
MashPlant opened this issue Jul 29, 2022 · 5 comments · Fixed by #1880
Closed

Integer parsing allows illegal spaces or structural indices. #1878

MashPlant opened this issue Jul 29, 2022 · 5 comments · Fixed by #1880

Comments

@MashPlant
Copy link

MashPlant commented Jul 29, 2022

Describe the bug

Integer parsing (get_int64, get_uint64) may allow illegal JSON strings containing an integer followed by illegal spaces or structural indices.

To Reproduce

#include "simdjson.h"
#include <iostream>

using namespace simdjson;

int main() {
  ondemand::parser parser;
  auto json = R"(123_abc)"_padded;
  for (char ch : {'[', ']', '{', '}', ',', ' '}) {
    json.data()[3] = ch;
    ondemand::document doc = parser.iterate(json);
    std::cout << doc.get_int64().value() << "\n";
  }
  json.data()[3] = '-';
  ondemand::document doc = parser.iterate(json);
  std::cout << doc.get_int64().value() << "\n";
}

With the loop, the parsed JSON string will be "123[abc", "123]abc", ... None of them is a valid JSON string. However, get_int64 will return 123 regardless of the error. Only when the string is set to "123-abc", where - is not a space or structural index, can the error be correctly reported.

The expected behavior of the code is throwing an exception at the first doc.get_int64().value() in the loop. The actual behavior is outputting 123 six times in the loop, and only throwing an exception at the last doc.get_int64().value() outside the loop.

simjson release

v1.1.0 (commit 8c1bfe7).

Configuration (please complete the following information if relevant)

  • OS: macOS Monterey
  • Compiler Apple clang version 13.0.0 (clang-1300.0.29.3) x86_64-apple-darwin21.4.0
  • Version: 12.3.1
  • Optimization setting: None
@lemire
Copy link
Member

lemire commented Jul 30, 2022

I will review!

@jkeiser
Copy link
Member

jkeiser commented Jul 30, 2022

It is acceptable for 123] to produce 123 and then an error, or to just produce an error, but unacceptable to produce 123 6 times!

@lemire
Copy link
Member

lemire commented Jul 30, 2022

We are potentially missing a check when it is a lone number.

@MashPlant
Copy link
Author

It is acceptable for 123] to produce 123 and then an error, or to just produce an error, but unacceptable to produce 123 6 times!

Sorry for any confusion. 6 times actually mean one time in each loop iteration. They are different strings.

@lemire
Copy link
Member

lemire commented Aug 4, 2022

So what happens is that we consume the number, but even though we advance, we never read the trailing component that follows so we never create an error. The solution is to simply add an additional check when the parsing is a success, we must also verify that there is nothing else in the document.

It is a small thing...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants