Decoding of string with null-byte. #108

Open
remicollet opened this issue Sep 26, 2013 · 14 comments

Comments

@remicollet
Contributor

A string may contain a null byte, e.g. "foo\u0000bar".

While json-c 0.11 can decode such a value, I can't find an easy way to decode such a key in an object.

Example: {"foo\u0000bar":"bar\u0000baz"}
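
To illustrate why such a key is hard to handle, here is a minimal sketch in plain C. It makes no json-c calls, and the helper name `key_equal_len` is hypothetical. Once the key is decoded, `strlen()` and `strcmp()` stop at the embedded NUL, so any API that takes only a `char *` sees "foo" rather than the full 7-byte key.

```c
#include <stddef.h>
#include <string.h>

/* Decoded bytes of the JSON key "foo\u0000bar": 7 bytes, followed by the
 * terminating NUL that the string literal appends. */
static const char decoded_key[] = "foo\0bar";
static const size_t decoded_key_len = sizeof(decoded_key) - 1; /* 7 */

/* Hypothetical helper: compare two keys by explicit length, so that
 * embedded NUL bytes take part in the comparison. */
static int key_equal_len(const char *a, size_t a_len,
                         const char *b, size_t b_len)
{
    return a_len == b_len && memcmp(a, b, a_len) == 0;
}
```

With these definitions, `strlen(decoded_key)` is 3 while `decoded_key_len` is 7, and `strcmp()` cannot tell "foo\0bar" from "foo\0baz", whereas the length-aware comparison can.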

Related to remicollet/pecl-json-c#7

@remicollet
Contributor Author

One solution would be to change the linkhash functions to accept a c_string (char *str + int len) as the key, adding _ex functions and keeping the current functions as wrappers so the API stays stable.

What do you think?
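
The `_ex`-plus-wrapper idea can be sketched roughly as follows. This is not json-c code: `struct entry`, `table_lookup_ex` and `table_lookup` are hypothetical names, the table is a plain array scanned linearly rather than a linkhash table, and `size_t` stands in for the `int len` mentioned above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry type: the key is stored with an explicit length. */
struct entry {
    const char *key;
    size_t key_len;
    int value;
};

/* Hypothetical "_ex" lookup: the key is passed as pointer + length,
 * so it may contain embedded NUL bytes. */
static const struct entry *table_lookup_ex(const struct entry *entries,
                                           size_t n,
                                           const char *key, size_t key_len)
{
    for (size_t i = 0; i < n; i++) {
        if (entries[i].key_len == key_len &&
            memcmp(entries[i].key, key, key_len) == 0)
            return &entries[i];
    }
    return NULL;
}

/* Existing-style entry point kept as a thin wrapper: callers that pass
 * NUL-terminated strings keep working unchanged. */
static const struct entry *table_lookup(const struct entry *entries,
                                        size_t n, const char *key)
{
    return table_lookup_ex(entries, n, key, strlen(key));
}
```

Under this sketch, `table_lookup(t, n, "foo\0bar")` still finds only the "foo" entry, because the wrapper measures the key with `strlen()`. Reaching the 7-byte key requires calling the `_ex` variant with an explicit length.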

@hawicz
Member

hawicz commented Mar 3, 2014

Extending the linkhash code to take a c_string won't be a problem, since you can already choose the types of keys that a particular hash table stores. That part should be just a matter of defining a lh_c_string_hash() function, and then allocating the jso->o.c_object table with that. However, that will increase the memory usage and might slightly slow things down too, so I'm reluctant to have it enabled all the time.
Another difficult part of this will be the json_object_object_{get_ex,add,del,foreach} functions in json_object.h which work with normal C strings (char *'s). We'd need to add c_string versions of each of these.
Also, when the c_string hash table isn't enabled, parsing something with embedded 0 characters should fail, as should attempting to use the new c_string functions.
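
As a rough sketch of the hash-table side of this, the following shows a counted-string key type with a hash and an equality function. Everything here is an assumption for illustration: `struct c_string`, `c_string_hash` and `c_string_equal` are hypothetical names, FNV-1a is an arbitrary choice of hash, and the `const void *` callback shape follows a common hash-table convention that would need to be checked against the actual linkhash callback types.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counted-string key: buffer + length, may contain NULs. */
struct c_string {
    const char *str;
    size_t len;
};

/* 32-bit FNV-1a over exactly len bytes; embedded NULs are hashed too. */
static unsigned long c_string_hash(const void *k)
{
    const struct c_string *s = k;
    uint32_t h = 2166136261u; /* FNV offset basis */
    for (size_t i = 0; i < s->len; i++) {
        h ^= (unsigned char)s->str[i];
        h *= 16777619u; /* FNV prime */
    }
    return h;
}

/* Keys are equal only if both the lengths and all bytes match. */
static int c_string_equal(const void *k1, const void *k2)
{
    const struct c_string *a = k1, *b = k2;
    return a->len == b->len && memcmp(a->str, b->str, a->len) == 0;
}
```

The extra `len` field stored per key is one source of the additional memory usage mentioned above.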

@skaes

skaes commented Apr 24, 2014

I have been bitten by the same problem: I receive JSON from external sources over which I have no control. Sometimes the source sends "\u0000" as part of a key. It would be really great if json-c could handle this, as it is perfectly valid JSON.

In my opinion, it's a bug, and not a missing feature. I'd rather have a small performance impact on parsing, vs. not being able to handle embedded null characters correctly.

A quick and dirty hack to at least not lose a lot of the key part would be to simply avoid unescaping \u0000 when parsing json. Or provide a parser callback so the user can specify how to handle embedded null characters.

@hawicz
Member

hawicz commented Apr 24, 2014

I don't care much for that hack. You'll run into issues if someone includes \u0000 in the string.
Also, I don't think parser callbacks will help if there isn't already a way for json-c to do something reasonable with the nul-containing string. However, if you've got an API and example usage (or even better some actual code) then I'm all ears.
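
One way to read the objection to the hack is as a collision problem: if `\u0000` is left escaped, the result cannot be told apart from a string that legitimately contained a backslash followed by `u0000`. The toy decoder below is a sketch under that assumption. It is not json-c's parser, `toy_unescape_keep_nul` is a hypothetical name, and it understands only the `\\` and `\u0000` escapes.

```c
#include <stddef.h>
#include <string.h>

/* Toy decoder for the contents of a JSON string (no surrounding quotes).
 * Handles only the escapes "\\" and "\u0000"; every other byte is copied
 * as-is. It applies the proposed hack: "\u0000" is left escaped instead
 * of being turned into a NUL byte.
 * out must have room for strlen(in) + 1 bytes. */
static void toy_unescape_keep_nul(const char *in, char *out)
{
    while (*in) {
        if (in[0] == '\\' && in[1] == '\\') {
            *out++ = '\\';          /* escaped backslash -> one backslash */
            in += 2;
        } else if (strncmp(in, "\\u0000", 6) == 0) {
            memcpy(out, "\\u0000", 6); /* the hack: keep the escape text */
            out += 6;
            in += 6;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

Feeding it the JSON string contents `a\u0000b` and `a\\u0000b` produces the same 8-character C string in both cases, so the caller can no longer distinguish an embedded NUL from literal text.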

@rgerhards
Contributor

Sorry for reviving this old thread. I came across the NUL byte problem while working on some performance enhancements. It looks like json-c as a producer supports them, so there is indeed some inconsistency. What I mostly wonder, though, is how one would expect to work with them in a generic API. All APIs basically seem to work with C strings. So let's assume a NUL is read in. How would you expect to pass this buffer to a caller? Of course, you can pass a buffer pointer and size, but is that what a typical C program expects? I would really appreciate some feedback on this issue.

@hawicz
Member

hawicz commented Nov 17, 2015

Well, no, a C program written to work with "normal", nul terminated strings will not expect a buffer and size. As I mentioned before, there would need to be "c_string" (i.e. the buffer+len data structure) variants of all of the json-c API functions. If you wanted to take advantage of any features that allow for embedded nul characters your C code would need to change to use those new (not yet written) functions.

@rgerhards
Contributor

Well, I have totally no desire to use NUL bytes unencoded in my program. But if we do not want to support that in json-c, we could officially state so. In this case, some simplifications could be done inside the code base (there is even a test for NUL byte encoding in the testbench...).

@hawicz
Member

hawicz commented Nov 18, 2015

I think you misunderstand me. I am not saying that json-c should not support that, I'm saying that it will be a significant change to fully do so. I expect that adding c_string variants of all the API functions will actually mean changing all internal handling of strings to use c_string, and turning the current asciiz API calls into wrappers around the c_string ones. However, I haven't actually spent any time evaluating the actual scope of the change, so who knows, maybe it'll be easy. :)
Clearly, something should be done, and if not the full conversion to c_string, then it would be a good idea to at least cause embedded \u0000's to result in an error, perhaps with "quick and dirty hack" options (as @skaes said) to either pass those through as-is, or re-enable the current rather broken behavior.
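
For the "result in an error" option, a pre-check along these lines could detect the escape before decoding. This is only a sketch: `has_escaped_nul` is a hypothetical helper, not a json-c function, and it scans the raw JSON text without validating it. It skips escape pairs so that an escaped backslash followed by `u0000` is not misreported.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if the raw JSON text contains a "\u0000" escape.
 * Backslashes are consumed in pairs with the character they escape, so
 * the text \\u0000 (escaped backslash + "u0000") does not count. */
static bool has_escaped_nul(const char *json)
{
    for (const char *p = json; *p; p++) {
        if (*p != '\\')
            continue;
        if (p[1] == 'u' && strncmp(p + 2, "0000", 4) == 0)
            return true;
        if (p[1] != '\0')
            p++; /* skip the escaped character, e.g. the second '\' */
    }
    return false;
}
```

A caller could run this on the input and report an error instead of silently truncating the key or value at the NUL.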

@dota17
Member

dota17 commented Jan 16, 2020

Regarding this issue, is there a mature, preferred fix? Can you give me some advice?

@MikuChan03

So let me get this straight. In the current Debian sid, widely diverse and important programs happen to depend on this, your json-c library. Like libcryptsetup12 (which is actually vital for systemd), sway, thunderbird, and others. Yet here you are, going on 10 years now, trying to fix this pesky issue about actually conforming to the JSON standard in terms of null bytes in strings/keys. And to really get across how much you aim for standard conformity, you decide to say so within the very first sentences of your README.md file. You know, JSON, the gargantuan notation format which took Ecma International an unbelievable 16-page PDF to write down, though most of these pages are either blank or contain large, colorful state-machine representations, to really drive home how it all operates. Yet even the most colorful pictures in this boundless sea of words didn't convey to you that it might not be the smartest idea to use standard C strings that end with a null byte themselves? Is that about the gist of it?

Many thanks

@MikuChan03

Oh, and of course that's a "new feature" too, figures.

@hawicz
Member

hawicz commented Jun 27, 2021

json-c, just like many other projects, and the C language itself, is constrained by its history. Are there ways that nul bytes could be handled, and different APIs to allow for access to JSON objects that contain them? Sure, of course, but those would be very different from the existing APIs that are depended on by, as you say, "widely diverse and important programs". Even with boundless development resources within json-c we would not be able to simply drop the existing, not-nul-safe API functions, as that would render the library useless for any existing dependents.
C programs almost always deal in zero-terminated strings, and while you might rail against the limitations of that data structure, the fact is that it is the common, expected convention for those programs. Any attempt to ditch that must balance the benefit of being able to handle embedded nul's against a developer's lack of familiarity with such an API, and thus a far steeper learning curve for any dev to use json-c, plus likely some additional overhead within the library itself.

@MikuChan03, clearly you're quite frustrated by this limitation. I wish there was a magic bullet to address json-c's inability to handle embedded nul bytes, but unfortunately I don't believe that's possible. If you'd like to start a conversation over on our discussion forum (https://groups.google.com/forum/#!forum/json-c) about the details of the problems you've run into, perhaps we could figure out a workaround. Alternately, if you're willing to spend some time hacking on json-c itself (after all, this is an entirely volunteer, open-source project), I'd be happy to work with you to try to figure out some approaches.

@tu-alex-rico

Hello,
Some time ago I had the same problem, though not with json-c.
What I did was put a middle layer in place to encode/decode in base64.

Just my 2c
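
A middle layer like that could be sketched as below, assuming both the producer and the consumer of the JSON agree to base64-encode the affected keys or values. `base64_encode` here is a hypothetical helper written for illustration, not part of json-c. Because the base64 alphabet contains no NUL byte, the encoded form is always safe to pass through `char *` APIs.

```c
#include <stddef.h>

/* Encode len bytes from src as standard base64 (RFC 4648 alphabet with
 * '=' padding). out must have room for 4 * ((len + 2) / 3) + 1 bytes.
 * Returns the encoded length, not counting the terminating NUL. */
static size_t base64_encode(const unsigned char *src, size_t len, char *out)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t o = 0;

    for (size_t i = 0; i < len; i += 3) {
        size_t rem = len - i; /* bytes available for this group, >= 1 */
        unsigned long v = (unsigned long)src[i] << 16;
        if (rem > 1)
            v |= (unsigned long)src[i + 1] << 8;
        if (rem > 2)
            v |= src[i + 2];

        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = rem > 1 ? tbl[(v >> 6) & 63] : '=';
        out[o++] = rem > 2 ? tbl[v & 63] : '=';
    }
    out[o] = '\0';
    return o;
}
```

For example, the 7 bytes of "foo\0bar" encode to `Zm9vAGJhcg==`. The cost is that keys are no longer human-readable in the JSON, and a matching decoder is needed on the receiving side.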

@Hex052
Contributor

Hex052 commented Jul 18, 2021

@MikuChan03 There is an open merge request to fix this (#715); it should properly handle null bytes in keys. I'm just not sure whether hawicz has seen the request yet.
