Replies: 6 comments 26 replies
-
I just saw how raw_json_string::unsafe_is_equal() is implemented... I guess that I could apply SIMDJSON_PADDING to my lexers... all their tokens are much smaller than SIMDJSON_PADDING...
-
A raw_json_string is basically meant to support comparisons.
The next simdjson will include a new method to allow you to get exactly that... it will be called key_raw_json_token().
Almost all data structures in On Demand are lightweight and meant to be passed by value, not by reference. However, I think that the problem you are encountering is not with simdjson per se. It is a C++ language issue. The following C++ code is invalid:

```cpp
#include <vector>

std::vector<int> get(int num) {
  return std::vector<int>(num);
}

int f(std::vector<int>& ref) {
  return ref.size();
}

int test() {
  return f(get(10)); // error: cannot bind a temporary to a non-const lvalue reference
}
```

If you want to pass a reference, you first have to create a value to reference:

```cpp
#include <vector>

std::vector<int> get(int num) {
  return std::vector<int>(num);
}

int f(std::vector<int>& ref) {
  return ref.size();
}

int test() {
  auto value = get(10);
  return f(value);
}
```

(This code is now valid.) The same applies with simdjson types. To pass an array to a function, just pass it by value. There is typically no point to the pass-by-reference paradigm in simdjson. Example:

```cpp
#include "simdjson.h"
#include <iostream>

using namespace simdjson;

// prints the content of the array as hexadecimal 64-bit integers
void f(simdjson::ondemand::array v) {
  for (uint64_t val : v) {
    std::cout << "0x" << std::hex << val << std::endl;
  }
}

int main(void) {
  simdjson::padded_string json = R"( [ 897314173811950000, 3122321 ])"_padded;
  simdjson::ondemand::parser parser;
  simdjson::ondemand::document doc = parser.iterate(json);
  f(doc.get_array());
  return EXIT_SUCCESS;
}
```
-
Hi Daniel, yes... key_raw_json_token() is going to be very appreciated in my simdjson usage... in the meantime, I have discovered raw_json()... I think I can make good use of it because I know for sure that the vast majority of strings in my documents contain zero escape chars... Perhaps another suggestion... maybe having a function to trim the quotes could be handy... I see myself doing it all over the place as soon as I start using raw_json()... I guess the main reason for keeping them is to make the string a valid JSON document by itself (especially when called on arrays or objects).
You have nailed correctly what I am encountering: attempting to pass an rvalue reference to a function expecting a reference... Another unintended consequence of your very elegant solution for allowing the API to be used with or without exceptions is that it makes it harder to use the auto keyword with return values... Concerning the simdjson lightweight object philosophy, I'll try to consider these objects a little bit like string_view... I will need to clarify that point because it is a little bit confusing...
FYI, my migration to simdjson is going well... I have so far migrated well-contained, small JSON parsers... mostly to get the hang of how to use simdjson well... No performance-sensitive JSON handling has been touched yet. The performance-sensitive part is the core of the app, where most of the JSON parsing code lives. I am keeping this part for last because it is going to be the bulk of the migration: the biggest, the hardest and the longest. For that part, there is no partial migration possible. It is an all-or-nothing situation. I have created a thin abstraction layer using type erasure that allows me to switch from one lib to another at compile time in case there is a future need for that.
One thing that I can say so far from my experience is that the binary size is significantly increasing. I am a bit surprised by that because... one caveat is that I am keeping the debug symbols in to make possible core dumps easy to analyze... maybe the code size is not that much bigger, but using simdjson generates a lot of debug symbols... I am not sure if you can comment on my last observation... about what you know of the binary size generated from using simdjson compared to other JSON libs.
-
I have another question popping into my mind... Let's say that I have an object... I fetch the first field with find_field, i.e. find_field("event"). Next, depending on the value of that field, the object is dispatched to different functions. Each function then processes the remaining fields. (JSON polymorphism?) What is the best approach to continue the field traversal from where the object was left off? If I know that "event" is going to be the first field, would it be better to access it with an iterator and pass the iterator to the functions to continue the iteration?
-
You are correct, but my understanding with get_string() is that the whole string is going to be scanned for possible escape substitution. I don't know if you think the idea is unreasonable, but I believe there would be an audience for having a raw string without quotes... I did not dig deep enough into the simdjson code to figure out whether quotes are indexed along with the other important JSON markers... so maybe it is possible to leverage SIMD magic to make that operation fast... Otherwise, the best way that I know to do it is to search for the last quote char and trim the string_view from there up to the end... I did not make any benchmark, but I would think that this is faster than performing the regular escape processing... If you tell me that you feel it is a good idea that you would like to see added to the project... I could look into it...
-
FYI, I have completed my migration from rapidjson to simdjson... The difference is not immediately obvious since my program parses very small JSON packets. The speed gain might be on the order of microseconds, while the network RTT has a standard deviation of 0.5 msec, so basically any speed gain is not immediately visible. CPU usage may have been reduced, but this has not been scientifically measured. Maybe my new simdjson code will shine during the occasional packet burst, but I'll need another 24-48h to conclude anything in that regard... Bottom line, I am glad to have replaced that ugly SAX code. I think it was worth the move for code maintainability alone, and for the ease of writing new JSON code in the future...
-
I am currently in the process of migrating my app code from rapidjson to simdjson... it is long, tedious and boring, but the task is progressing well and I am getting better at it...
it is very satisfying to be able to simplify the SAX style code and replace dozens of stateful methods with a simple loop function...
I wrote my first full specialization of a template method for a custom type!
My first difficulty or incomprehension is:
I am getting weird typecast compiler errors...
Let's say I create a function taking an ondemand::array reference... then I call
func(field.value().get_array())
and the compiler will complain that it cannot convert an ondemand::implementation::icelake::array rvalue to a reference to ondemand::array...
I guess this is some sort of implementation artifact. If I pin the return value to a local variable, I can then pass a reference to that local variable. I am just confused about having to create a local copy of the return value when the documentation specifically warns against making copies...
I am having a hard time working with raw_json_string... A common idiom in my code is: if a key is unrecognized, log it.
The API exposes an ostream operator, but it forces me into some gymnastics because my logging subsystem is more of the printf style...
Another use I make of JSON keys is to pass them to a lexer... raw_json_string would be workable, but it does not expose any length info... Having that would be very useful... The length does not need to point precisely at the end of the key; some arbitrary limit would be fine. I would be happy for the length to be set to the next marker/pointer simdjson has in its index data...
For now, I guess that I can provide the length of the longest token in the lexer specifications... It should be safe since the quote char appears nowhere in the specs, therefore if there is no match, the lexer will not go further than the closing quote...
Yeah... I just wanted to let you know that, in my opinion, there is a very small something missing from raw_json_string to make it really user friendly...