Decoding of string with null-byte. #108
Comments
A solution could be to change the linkhash function to accept a c_string (char *str + int len) as the key, using an _ex function and keeping the current function as a wrapper so the API stays stable. What do you think?
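A minimal sketch of the `_ex`-plus-wrapper idea, using a toy linear table rather than json-c's real linkhash. The names `kv_table`, `kv_get_ex` and `kv_get` are hypothetical and not part of json-c. The length-taking `_ex` function does the real work, and the old NUL-terminated signature survives as a thin wrapper around `strlen()`, so existing callers keep compiling.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy table: keys are (pointer, length) pairs, so they may
 * contain embedded NUL bytes. This is not json-c's lh_table. */
struct kv_entry {
    const char *key;
    size_t key_len;
    const char *value;
};

struct kv_table {
    struct kv_entry entries[16];
    size_t count;
};

/* New length-aware lookup: compares exactly key_len bytes. */
const char *kv_get_ex(const struct kv_table *t, const char *key, size_t key_len)
{
    for (size_t i = 0; i < t->count; i++) {
        const struct kv_entry *e = &t->entries[i];
        if (e->key_len == key_len && memcmp(e->key, key, key_len) == 0)
            return e->value;
    }
    return NULL;
}

/* Old NUL-terminated API kept as a wrapper for source compatibility. */
const char *kv_get(const struct kv_table *t, const char *key)
{
    return kv_get_ex(t, key, strlen(key));
}
```

For example, `kv_get_ex(t, "foo\0bar", 7)` looks up the full 7-byte key, while `kv_get(t, "foo\0bar")` searches for the 3-byte key `"foo"` because `strlen()` stops at the embedded NUL.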
Extending the linkhash code to take a c_string won't be a problem, since you can already choose the types of keys that a particular hash table stores. That part should be just a matter of defining a lh_c_string_hash() function, and then allocating the jso->o.c_object table with that. However, that will increase memory usage and might slightly slow things down too, so I'm reluctant to have it enabled all the time.
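A sketch of what a length-aware key type with its hash and equality functions might look like. `lh_c_string_hash` is only the name proposed in this comment, not an existing json-c function; the `struct c_string` layout, the `lh_c_string_equal` helper and the choice of FNV-1a are assumptions made for illustration. The extra `len` field stored with every key is one source of the additional memory cost.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counted-string key: may contain embedded NUL bytes. */
struct c_string {
    const char *str;
    size_t len;
};

/* FNV-1a over exactly len bytes; a NUL byte is hashed like any other. */
uint32_t lh_c_string_hash(const struct c_string *k)
{
    uint32_t h = 2166136261u;            /* FNV offset basis */
    for (size_t i = 0; i < k->len; i++) {
        h ^= (unsigned char)k->str[i];
        h *= 16777619u;                  /* FNV prime */
    }
    return h;
}

/* Keys are equal only if both the length and every byte match. */
int lh_c_string_equal(const struct c_string *a, const struct c_string *b)
{
    return a->len == b->len && memcmp(a->str, b->str, a->len) == 0;
}
```

With these, `{"foo\0bar", 7}` and `{"foo", 3}` are distinct keys, whereas a `strcmp()`-based table would treat them as the same key `"foo"`.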
I have been bitten by the same problem: I receive JSON from external sources over which I have no control. Sometimes the source sends "\u0000" as part of a key. It would be really great if json-c could handle this, as it is perfectly valid JSON. In my opinion, it's a bug, not a missing feature. I'd rather accept a small performance impact on parsing than be unable to handle embedded null characters correctly. A quick and dirty hack to at least avoid losing most of the key would be to simply not unescape \u0000 when parsing JSON. Alternatively, json-c could provide a parser callback so the user can specify how to handle embedded null characters.
I don't care much for that hack. You'll run into issues if a string legitimately contains the literal text \u0000.
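The ambiguity can be shown with a toy unescaper that applies the suggested hack. `unescape_keep_nul` is a hypothetical helper, not json-c's tokener, and it only understands `\\`, `\"` and `\u0000`. With the hack in place, the JSON string `"a\u0000b"` (which contains a NUL character) and the JSON string `"a\\u0000b"` (which contains a literal backslash followed by `u0000`) decode to the same C string, so the caller can no longer tell them apart.

```c
#include <stddef.h>
#include <string.h>

/* Toy unescaper for the contents of a JSON string (without the quotes),
 * implementing the suggested hack: "\u0000" is copied through verbatim
 * instead of being turned into a NUL byte. Only \\ and \" are unescaped;
 * everything else is copied as-is. out must hold strlen(in) + 1 bytes. */
void unescape_keep_nul(const char *in, char *out)
{
    while (*in) {
        if (in[0] == '\\' && (in[1] == '\\' || in[1] == '"')) {
            *out++ = in[1];          /* \\ -> \   and   \" -> " */
            in += 2;
        } else if (strncmp(in, "\\u0000", 6) == 0) {
            memcpy(out, in, 6);      /* the hack: leave \u0000 escaped */
            out += 6;
            in += 6;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

Feeding it the C literals `"a\\u0000b"` and `"a\\\\u0000b"` (the two JSON texts above) yields identical output, which is exactly the collision the hack introduces.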
Sorry for reviving this old thread. I came across the NUL byte problem while working on some performance enhancements. It looks like json-c supports them as a producer, so there is indeed some inconsistency. But what I mostly wonder is how one would expect to work with them in a generic API. All the APIs basically work with C strings. So let's assume a NUL is read in: how would you expect to pass this buffer to a caller? Of course, you can pass a buffer pointer and size, but is that what a typical C program expects? I would really appreciate some feedback on this issue.
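To make the trade-off concrete, here is a sketch of what a caller sees in each case, assuming the parser has already decoded `"foo\u0000bar"` into a 7-byte buffer. `decoded`, `copy_asciiz` and `copy_counted` are illustrative names, not json-c functions.

```c
#include <string.h>

/* Hypothetical result of decoding the JSON string "foo\u0000bar":
 * 7 bytes of data, followed by a terminating NUL for convenience. */
const char decoded[] = "foo\0bar";
const size_t decoded_len = sizeof decoded - 1;   /* 7, not 3 */

/* Caller given only a char *: strcpy() and strlen() stop at the first
 * NUL, so only "foo" is copied. dst needs strlen(src) + 1 bytes. */
size_t copy_asciiz(char *dst, const char *src)
{
    strcpy(dst, src);
    return strlen(dst);
}

/* Caller given pointer + size: memcpy() copies every byte, including
 * the embedded NUL. dst needs len + 1 bytes. */
size_t copy_counted(char *dst, const char *src, size_t len)
{
    memcpy(dst, src, len);
    dst[len] = '\0';   /* keep a terminator so NUL-terminated code cannot overrun */
    return len;
}
```

Only the pointer-plus-size path preserves `"bar"`; every existing caller written against `char *` would silently see `"foo"`.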
Well, no, a C program written to work with "normal", nul-terminated strings will not expect a buffer and size. As I mentioned before, there would need to be "c_string" (i.e. buffer + length) variants of all of the json-c API functions. If you wanted to take advantage of any features that allow embedded nul characters, your C code would need to change to use those new (not yet written) functions.
Well, I have no desire at all to use unencoded NUL bytes in my program. But if we do not want to support that in json-c, we could state so officially. In that case, some simplifications could be made in the code base (there is even a test for NUL byte encoding in the testbench...).
I think you misunderstand me. I am not saying that json-c should not support that, I'm saying that it will be a significant change to fully do so. I expect that adding c_string variants of all the API functions will actually mean changing all internal handling of strings to use c_string, and turning the current asciiz API calls into wrappers around the c_string ones. However, I haven't actually spent any time evaluating the actual scope of the change, so who knows, maybe it'll be easy. :)
Regarding this issue, do you have a mature, preferred fix in mind? Can you give me some advice?
So let me get this straight. In the current Debian sid, widely diverse and important programs happen to depend on this, your json-c library: libcryptsetup12 (which is actually vital for systemd), sway, thunderbird, and others. Yet here you are, going on 10 years now, still trying to fix this pesky issue of actually conforming to the JSON standard when it comes to null bytes in strings/keys. And to really get across how much you aim for standards conformance, you say so within the very first sentences of your README.md file. You know, JSON, the gargantuan notation format that took Ecma International an unbelievable 16-page PDF to write down, though most of those pages are either blank or contain large, colorful state-machine diagrams, to really drive home how it all operates. Yet even the most colorful pictures in this boundless sea of words didn't convey to you that it might not be the smartest idea to use standard C strings, which are themselves terminated by a null byte? Is that about the gist of it? Many thanks.
Oh, and of course that's a "new feature" too, figures.
json-c, just like many other projects and the C language itself, is constrained by its history. Are there ways that nul bytes could be handled, and different APIs that allow access to JSON objects containing them? Sure, of course, but those would be very different from the existing APIs that, as you say, many programs depend on. @MikuChan03, clearly you're quite frustrated by this limitation. I wish there were a magic bullet to address json-c's inability to handle embedded nul bytes, but unfortunately I don't believe that's possible. If you'd like to start a conversation on our discussion forum (https://groups.google.com/forum/#!forum/json-c) about the details of the problems you've run into, perhaps we could figure out a workaround. Alternatively, if you're willing to spend some time hacking on json-c itself (after all, this is an entirely volunteer, open-source project), I'd be happy to work with you to figure out some approaches.
Hello, just my 2c.
@MikuChan03 There is an open merge request to fix this (#715); it should properly handle null bytes in keys. I'm just not sure whether hawicz has seen the request yet.
A string may contain a null byte, e.g. "foo\u0000bar".
While json-c 0.11 allows decoding such a value, I can't find an easy way to decode such a key in an object.
Example: {"foo\u0000bar":"bar\u0000baz"}
Related to remicollet/pecl-json-c#7