Location and lifetime of ondemand strings #2087
-
If you specifically ask for raw JSON, then it points at the input. In other words, if it is possible for us to just point at the input, then we do so. If you are asking for the string value of a JSON string, then, in general, it needs to be unescaped (e.g.,
You can store them directly into an
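As a rough illustration of that difference (plain C++, not simdjson code), the hypothetical helpers below show why raw JSON can be a view into the input while the string value generally cannot: decoding an escape such as \n changes the bytes, so the decoded result has to live in a separate buffer. Only single-character escapes are handled; \uXXXX is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, not a simdjson API: the raw JSON of a string token is
// just a slice of the input, so returning it needs no copy at all.
std::string_view raw_view(std::string_view input, std::size_t begin, std::size_t end) {
  return input.substr(begin, end - begin);
}

// Hypothetical helper, not a simdjson API: decoding escapes produces
// different bytes, so the result must be written to its own storage.
// Only single-character escapes are handled in this sketch (no \uXXXX).
std::string decode_copy(std::string_view raw) {
  std::string out;
  for (std::size_t i = 0; i < raw.size(); ++i) {
    char c = raw[i];
    if (c != '\\' || i + 1 == raw.size()) { out.push_back(c); continue; }
    switch (raw[++i]) {
      case 'n': out.push_back('\n'); break;
      case 't': out.push_back('\t'); break;
      case 'r': out.push_back('\r'); break;
      case 'b': out.push_back('\b'); break;
      case 'f': out.push_back('\f'); break;
      default:  out.push_back(raw[i]); break;  // \" \\ \/
    }
  }
  return out;
}
```

For the input {"msg":"a\nb"}, the raw view of the value is the four bytes a\nb inside the input itself, while the decoded value is three bytes (a, newline, b) that exist nowhere in the input.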
-
It sounds like you want to keep the strings around from one document while you parse other documents? The parser.unescape() API lets you manage the buffer yourself when the parser's lifetime doesn't suit you.

simdjson allocates a single buffer that is large enough to hold all the strings that could possibly be written into it. This is probably what you'd want to do as well. You need to hold on to the original pointer so you can free it, of course, but you'll also need a pointer to where simdjson should write the next string:

```cpp
uint8_t* const string_buf = static_cast<uint8_t*>(malloc(ENOUGH_SPACE_TO_HOLD_ALL_THE_STRINGS));
uint8_t* string_buf_end = string_buf;
```

Anytime you are about to get a string_view (either by casting to string_view or using …), call parser.unescape() with that pointer instead:

```cpp
string_view name = parser.unescape(json_obj["name"], string_buf_end);
string_view description = parser.unescape(json_obj["description"], string_buf_end);
```

This will unescape name and description into string_buf, one after the other, and advance string_buf_end to point just past the description. Even if the parser is destroyed or used to parse a new document, the string_views remain valid as long as string_buf is valid and doesn't get overwritten. We could make this easier, but I think this will do what you're after if you really want the fewest allocations possible.
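The pattern above amounts to a bump pointer into one caller-owned buffer. Here is a minimal, self-contained sketch of that pattern that assumes nothing from simdjson: unescape_to is a hypothetical stand-in that follows the same contract described for parser.unescape() (write at the destination pointer, advance it, return a view of what was written). It handles single-character escapes only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for the contract described above: write the decoded
// bytes starting at dst, advance dst past them, and return a view of the
// bytes just written. Single-character escapes only (no \uXXXX).
std::string_view unescape_to(std::string_view raw, char*& dst) {
  char* const start = dst;
  for (std::size_t i = 0; i < raw.size(); ++i) {
    char c = raw[i];
    if (c == '\\' && i + 1 < raw.size()) {
      switch (raw[++i]) {
        case 'n': c = '\n'; break;
        case 't': c = '\t'; break;
        case 'r': c = '\r'; break;
        case 'b': c = '\b'; break;
        case 'f': c = '\f'; break;
        default:  c = raw[i]; break;  // \" \\ \/
      }
    }
    *dst++ = c;  // append to the caller's buffer and bump the pointer
  }
  return std::string_view(start, static_cast<std::size_t>(dst - start));
}
```

Since unescaping never makes a string longer, sizing the buffer to the total raw input length is enough for this sketch. With real simdjson it is probably prudent to leave extra slack on top of that, because its string routines may use wide SIMD stores that write past the end of a short string; the simdjson documentation is the place to confirm the exact requirement.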
-
I see, thank you for the clarification! I'll take this approach if I can't get the string without escape-sequence decoding, which, as I replied above, I don't think I need or will make use of.
-
Just make sure that when you write the string back out to JSON, you write it raw: if you use a JSON-writing library, by default it will escape the escape characters, turning
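The double-escaping pitfall can be shown with a hypothetical, minimal writer (not any particular JSON library): if text that is already raw JSON goes through an escaping function, every backslash is doubled and the output no longer means the same thing. Keys are assumed to need no escaping in this sketch.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical writer helper: escape a *decoded* string value so it can be
// emitted between JSON quotes (only a few characters handled here).
std::string escape_json(std::string_view value) {
  std::string out;
  for (char c : value) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n"; break;
      case '\t': out += "\\t"; break;
      default:   out += c; break;
    }
  }
  return out;
}

// Emit a field whose value is already raw (still-escaped) JSON text: copy as-is.
std::string write_raw_field(std::string_view key, std::string_view raw) {
  return "\"" + std::string(key) + "\":\"" + std::string(raw) + "\"";
}

// Emit a field whose value is a decoded string: the writer escapes it.
std::string write_string_field(std::string_view key, std::string_view value) {
  return "\"" + std::string(key) + "\":\"" + escape_json(value) + "\"";
}
```

Passing raw text to write_string_field produces line1\\nline2 in the output, which a reader would decode as a literal backslash followed by n rather than a newline.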
-
@SpeedyCraftah You may enjoy this code written by @anonrig for the Node.js runtime environment: it basically does what @jkeiser and I were discussing. It grabs raw JSON that does not need to be processed immediately and stores it in a structure, so it can be parsed later as JSON. As with everything @anonrig does, it is great code. It is going to be stupidly efficient compared to the obvious alternative of deserializing, then serializing, then deserializing again.
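A miniature version of the "keep the raw JSON, process it later" idea: the hypothetical lazy_string below copies the raw, still-escaped text once and decodes it only if the value is actually requested, caching the result. This illustrates the general technique only; it is not the Node.js code mentioned above, and it handles single-character escapes only.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical type: owns the raw JSON text of a string value and defers
// decoding until someone actually needs the decoded value.
class lazy_string {
 public:
  explicit lazy_string(std::string_view raw) : raw_(raw) {}  // one copy, no decoding

  std::string_view raw() const { return raw_; }  // cheap: can be re-emitted as-is

  const std::string& value() {
    if (!decoded_) decoded_ = decode(raw_);  // decode on first use, then cache
    return *decoded_;
  }

  bool decoded() const { return decoded_.has_value(); }

 private:
  // Single-character escapes only (no \uXXXX) in this sketch.
  static std::string decode(std::string_view raw) {
    std::string out;
    for (std::size_t i = 0; i < raw.size(); ++i) {
      char c = raw[i];
      if (c == '\\' && i + 1 < raw.size()) {
        switch (raw[++i]) {
          case 'n': c = '\n'; break;
          case 't': c = '\t'; break;
          case 'r': c = '\r'; break;
          case 'b': c = '\b'; break;
          case 'f': c = '\f'; break;
          default:  c = raw[i]; break;  // \" \\ \/
        }
      }
      out.push_back(c);
    }
    return out;
  }

  std::string raw_;
  std::optional<std::string> decoded_;
};
```

If the value is only ever written back out, raw() is all that is needed and no decoding work happens at all.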
-
I'm looking to parse a JSON object with the ondemand API, but I'm unsure how string_views behave.
The documentation mentions that a string_view may point either to the source JSON string or to an internal temporary buffer. Is there a set of rules, or specific operations, that determine where the string ends up?
I need the processing to involve minimal memory allocation and copying, so I'm not sure whether I should copy these parsed strings somewhere else. In my use case, the input JSON string will always outlive the next iterate call on another JSON string, so if string_views pointed to the input JSON string, this wouldn't be a problem. However, I may access different elements in the object, and potentially even iterate another JSON string, before I need the value of that string_view. If the string does end up in the temporary buffer before I get to use its contents, I'm worried I would need to copy it somewhere else immediately on reading it, which I want to avoid. Thanks!
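To make the concern concrete, here is a toy model (not simdjson) of a parser that owns one scratch buffer and recycles it on every iterate call. A view handed out for the first document silently shows different bytes once the second document's string is written into the same space, while an owning std::string copy is unaffected. The class, its methods, and its buffer size are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical toy parser: one internal scratch buffer, reused from the
// start for each new document.
class toy_parser {
 public:
  void iterate() { used_ = 0; }  // new document: scratch space is recycled

  // Copy a string into scratch space (no escape handling in this toy)
  // and return a view into that internal buffer.
  std::string_view get_string(std::string_view raw) {
    assert(used_ + raw.size() <= buf_.size());
    char* start = buf_.data() + used_;
    raw.copy(start, raw.size());
    used_ += raw.size();
    return std::string_view(start, raw.size());
  }

 private:
  std::array<char, 256> buf_{};
  std::size_t used_ = 0;
};
```

Because the toy's buffer itself stays alive, reading the stale view is well defined here; it simply shows the second document's bytes. With a real parser whose buffer might be reallocated or freed, the same pattern could leave a dangling view instead.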