Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documents better the type method and makes is_null return an error condition in some instances #1909

Merged
merged 7 commits into from
Oct 6, 2022

Conversation

lemire
Copy link
Member

@lemire lemire commented Oct 5, 2022

Some extra documentation and a substantial change: the is_null method will not return an error when the input is obviously not JSON (a token begins with 'n' but is not null).

doc.is_null();
// We check that the value is indeed null
// otherwise: an error is thrown.
if(doc.is_null()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simdjson_inline bool document::is_null() noexcept {
  return get_root_value_iterator().is_root_null();
}

I wasn't sure about this - I think the root level is special cased for document (not value), and this won't throw for doc specifically. i.e. it returns bool and not simdjson_result<bool>

Makes sense for the value iteration

Copy link
Contributor

@TysonAndre TysonAndre Oct 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simdjson_warn_unused simdjson_inline simdjson_result<bool> value_iterator::get_root_bool() noexcept {
  auto max_len = peek_start_length();
  auto json = peek_root_scalar("bool");
  uint8_t tmpbuf[5+1];
  if (!_json_iter->copy_to_buffer(json, max_len, tmpbuf)) { return incorrect_type_error("Not a boolean"); }
  auto result = parse_bool(tmpbuf);
  if(result.error() == SUCCESS) {
    if (!_json_iter->is_single_token()) { return TRAILING_CONTENT; }
    advance_root_scalar("bool");
  }
  return result;
}
simdjson_inline bool value_iterator::is_root_null() noexcept {
  // If there is trailing content, then the document is not null.
  if (!_json_iter->is_single_token()) { return false; }
  auto max_len = peek_start_length();
  auto json = peek_root_scalar("null");
  bool result = (max_len >= 4 && !atomparsing::str4ncmp(json, "null") &&
         (max_len == 4 || jsoncharutils::is_structural_or_whitespace(json[5])));
  if(result) { advance_root_scalar("null"); }
  return result;
}

The implementations of helpers for is_bool throw, but not is_null. (just for the root document)

It seems like use cases like this might benefit from a helper such as doc.throw_if_not_null() or changing doc.is_null to simdjson_result

@TysonAndre TysonAndre self-requested a review October 5, 2022 13:29
@TysonAndre
Copy link
Contributor

So examples such as json2json are also affected

» echo tada > example.json
» tools/json2json --file example.json --ondemand
tada

@lemire
Copy link
Member Author

lemire commented Oct 5, 2022

So examples such as json2json are also affected

I don't think that's related.

@lemire
Copy link
Member Author

lemire commented Oct 5, 2022

The json2json utility just calls...

/**
 * Create a string-view instance out of a document instance. The string-view instance
 * contains JSON text that is suitable to be parsed as JSON again.
 */
inline simdjson_result<std::string_view> to_json_string(SIMDJSON_IMPLEMENTATION::ondemand::document& x) noexcept;

It just returns a direct pointer to the source, nothing more.

@lemire lemire changed the title Documents better the type method. Documents better the type method and makes is_null return an error condition in some instances Oct 5, 2022
@lemire lemire requested review from TysonAndre and removed request for TysonAndre October 5, 2022 14:55
@lemire
Copy link
Member Author

lemire commented Oct 5, 2022

@TysonAndre It turns out that the design was not ideal. I think we should validate null values for the user. I am now proposing to change the behaviour somewhat so that it is easier to validate the null values. This will be a breaking change, but it will affect few users, I expect, and even the affected users can deal with the change without much hassle.

Note that it is entirely distinct from extracting a string_view instance from a document instance: that is not meant to be a validating operation. The idea in ondemand is that if you have an array, you can just compute the string_view instance and print it out... If you want to actually parse and validate it, then you have to do more work. By default, ondemand is lazy: it does what it says and no more.

@@ -311,7 +310,7 @@ simdjson_inline simdjson_result<bool> simdjson_result<SIMDJSON_IMPLEMENTATION::o
if (error()) { return error(); }
return first.get_bool();
}
simdjson_inline bool simdjson_result<SIMDJSON_IMPLEMENTATION::ondemand::value>::is_null() noexcept {
simdjson_inline simdjson_result<bool> simdjson_result<SIMDJSON_IMPLEMENTATION::ondemand::value>::is_null() noexcept {
if (error()) { return false; }
Copy link
Contributor

@TysonAndre TysonAndre Oct 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to return false instead of error()? If so, document it?

I thought error() would make more sense to return

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

return result;
simdjson_inline simdjson_result<bool> value_iterator::is_null() noexcept {
bool is_null_value;
auto error = parse_null(peek_non_root_scalar("null")).get(is_null_value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can this be shortened by replacing this and error check with SIMDJSON_TRY(peek_non_root_scalar("null")).get(is_null_value))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point taken.

@TysonAndre
Copy link
Contributor

I think we should validate null values for the user. I am now proposing to change the behaviour somewhat so that it is easier to validate the null values. This will be a breaking change, but it will affect few users, I expect, and even the affected users can deal with the change without much hassle.

The proposed change makes sense to me

Note that it is entirely distinct from extracting a string_view instance from a document instance: that is not meant to be a validating operation. The idea in ondemand is that if you have an array, you can just compute the string_view instance and print it out... If you want to actually parse and validate it, then you have to do more work. By default, ondemand is lazy: it does what it says and no more.

Got it

@TysonAndre
Copy link
Contributor

lgtm

@lemire
Copy link
Member Author

lemire commented Oct 6, 2022

Merging. I will issue a release.

@lemire lemire merged commit db08d78 into master Oct 6, 2022
@lemire lemire deleted the dlemire/better_documentation_for_type branch October 6, 2022 15:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants