Why don't we have a std::string_view version of key() for field (not key_raw_json_token)? #2149
Comments
Sure! It's not unreasonable to have … Just for context, the rationales behind not making …
I agree with @jkeiser that it is reasonable to extend the API further.
Note that it is not difficult to implement. We effectively have the code already.
I think the length of the …
simdjson_inline std::string_view field::key_raw_json_token() const noexcept {
  SIMDJSON_ASSUME(first.buf != nullptr); // We would like to call .alive() but Visual Studio won't let us.
  return std::string_view(reinterpret_cast<const char*>(first.buf-1), second.iter._json_iter->token.peek(-1) - first.buf + 1);
}
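To make the pointer arithmetic in key_raw_json_token() concrete, here is a small standalone model that runs on a plain character buffer instead of simdjson's internal iterators. It assumes that first.buf points one byte past the key's opening quote and that token.peek(-1) points at the ':' that follows the key; raw_token_like is a hypothetical name used only for illustration, not part of simdjson.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Standalone model of the arithmetic in field::key_raw_json_token().
// Assumptions (illustrative, not taken from simdjson's internals):
//   key_start stands in for first.buf (one past the opening quote),
//   colon stands in for token.peek(-1) (the ':' after the key).
std::string_view raw_token_like(const char* key_start, const char* colon) {
  // The colon always comes after the first byte following the opening quote.
  assert(colon > key_start);
  // Start one byte earlier to include the opening quote; stop just before ':'.
  return std::string_view(key_start - 1,
                          static_cast<std::size_t>(colon - key_start + 1));
}
```

For the input {"name" : 1}, with key_start at n and colon at the ':', the view is "name" followed by a space: both quotes and the whitespace before the colon are kept, which is why a plain string_view of the key needs one more step to drop them.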
Unfortunately, …
It might indeed require scanning for the closing double quote to find the length of the key, but for ease of use I think this may be worthwhile. If we compare the key with other strings multiple times, then we have almost certainly scanned the key multiple times already. Additionally, in situations where the key is used as the key in a hash map, having the length information and a … We can remind users that obtaining a string_view this way has some cost, though less than unescaped_key().
@renzibei When checking for equality, we do not actually need to find the closing quote, so it is not work that we do in any case, nor work that we could necessarily amortize in practice. There is no argument against the fact that the feature request is valid. Let me be clear: we will provide this functionality in a future release. In fact, I am openly inviting folks to submit a pull request. If nobody does it, I will.
Yeah, to be specific, the code to compare the key with a string is basically …
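The point that equality checks never need to locate the closing quote can be illustrated with a simplified sketch. This is not simdjson's actual comparison code; key_equals is a hypothetical helper. It assumes key_start points one past the key's opening quote, that the target contains no quote or backslash, and that the buffer is readable for at least target.size() + 1 bytes (simdjson pads its input; this sketch does not check).

```cpp
#include <cassert>
#include <string_view>

// Hypothetical sketch: compare an unescaped target against a raw key without
// ever searching for the key's closing quote.
// Assumptions: key_start points one past the opening quote, and the buffer is
// readable for target.size() + 1 bytes (padding is not checked here).
bool key_equals(const char* key_start, std::string_view target) {
  // A target with quotes or backslashes would require escape handling.
  assert(target.find('"') == std::string_view::npos);
  assert(target.find('\\') == std::string_view::npos);
  // Compare exactly target.size() bytes, then require that the very next byte
  // is the closing quote, so a longer key with the same prefix does not match.
  return std::string_view(key_start, target.size()) == target &&
         key_start[target.size()] == '"';
}
```

The only byte inspected past the target's length is the one that must be the closing quote, so the cost depends on the target's length rather than on finding where the key ends.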
@lemire Thanks for the reply. You mentioned that the function has already been implemented somewhere?
I understand. What I'm saying is that, when we compare key() to other strings multiple times, we may already have touched the whole key in memory. The …
But that's not what we do in the code.
We can locate the start and the end: it is a simple matter of backtracking to find the quote. Apart from the copy-pasting, the whole thing can be implemented with one or two extra lines of code. The expensive part is writing the documentation and the new tests.
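The backtracking step can be sketched as a small free function that works on the std::string_view returned by key_raw_json_token(). key_from_raw_token is a hypothetical name, not a simdjson API. It assumes the token runs from the opening quote to just before the ':', so only whitespace can follow the closing quote. The result is still the raw (escaped) key, not the output of unescaped_key().

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of simdjson): strip the quotes and any
// trailing whitespace from a raw key token such as "\"name\" ".
// Assumption: the token starts at the opening quote and ends just before ':',
// so the last '"' in the token is the closing quote.
// The result is still escaped: "a\"b" yields a\"b, not a"b.
std::string_view key_from_raw_token(std::string_view raw) {
  assert(!raw.empty() && raw.front() == '"');
  // Backtrack from the end of the token to the closing quote.
  std::size_t close = raw.find_last_of('"');
  // A well-formed key has a closing quote distinct from the opening one.
  assert(close != 0 && close != std::string_view::npos);
  // Everything strictly between the two quotes.
  return raw.substr(1, close - 1);
}
```

Because the scan starts at the end of the token and stops at the first quote it meets, it only walks over the whitespace between the key and the colon, which is typically zero or one byte.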
This week I'm preoccupied with other commitments. If you have the bandwidth to tackle this soon, that would be fantastic. Otherwise, I'd be happy to contribute a pull request, probably in one or two weeks. Of course, if anyone else has the capacity to jump in sooner, that would be great as well.
This will be part of the next release.
Hi, I've been exploring simdjson's API, particularly iterating over JSON objects and accessing their keys. From my understanding and current experimentation, there are several ways to access keys, each with certain limitations:
- field::key() returns a simdjson::ondemand::raw_json_string, which provides the raw key but not as a directly usable std::string_view without further processing.
- field::unescaped_key() does return an unescaped key as a std::string_view, which is great but involves the overhead of unescaping.
- field::key_raw_json_token() returns a std::string_view, but it retains the quotes.
Considering the efficiency and ease of use of std::string_view in C++ for string operations, I'm wondering why there isn't a method like field::key_sv() (or a similarly named method) that directly returns a std::string_view of the key, akin to field::key() but without the need to deal with raw_json_string or token specifics. This addition would streamline operations that benefit from the lightweight nature of std::string_view, especially where string manipulation or comparisons are frequent.
Is there a technical or design rationale for this absence, or could this be considered for a future release? Such a feature would significantly improve usability for developers working on performance-critical applications.