Decoding of string with null-byte. #108

Open
remicollet opened this Issue Sep 26, 2013 · 8 comments

@remicollet
Contributor

Strings may contain a null byte, e.g. "foo\u0000bar".

While json-c 0.11 allows decoding such a value, I can't find an easy way to decode such a key in an object.

Ex: {"foo\u0000bar":"bar\u0000baz"}

Related to remicollet/pecl-json-c#7

@remicollet referenced this issue in remicollet/pecl-json-c on Sep 26, 2013:
json_decode: strings cut off after first null-byte #7 (Open)

@remicollet
Contributor

A solution could be to change the linkhash function to accept a c_string (char *str + int len) as key, using an _ex function and keeping the current function as a wrapper, to keep the API stable.
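
A minimal sketch of that wrapper pattern, using hypothetical names rather than the actual linkhash signatures:

```c
#include <stddef.h>
#include <string.h>

struct my_table;                      /* stand-in for struct lh_table */

/* New length-aware entry point that tolerates embedded NUL bytes in the key. */
void *my_table_lookup_ex(struct my_table *t, const char *key, size_t key_len);

/* The existing NUL-terminated call becomes a thin wrapper, so callers that
 * never see embedded NULs keep the current API unchanged. */
void *my_table_lookup(struct my_table *t, const char *key)
{
	return my_table_lookup_ex(t, key, strlen(key));
}
```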

What do you think?

@hawicz
Member
hawicz commented Mar 3, 2014

Extending the linkhash code to take a c_string won't be a problem, since you can already choose the types of keys that a particular hash table stores. That part should be just a matter of defining a lh_c_string_hash() function, and then allocating the jso->o.c_object table with that. However, that will increase the memory usage and might slightly slow things down too, so I'm reluctant to have it enabled all the time.
Another difficult part of this will be the json_object_object_{get_ex,add,del,foreach} functions in json_object.h, which work with normal C strings (char *'s). We'd need to add c_string versions of each of these.
Also, when the c_string hash table isn't enabled, parsing something with embedded 0 characters should fail, as should attempting to use the new c_string functions.
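
For illustration, a length-aware key and hash/equality pair could look roughly like this; the struct layout and the djb2-style hash are assumptions for the sketch, not what json-c actually defines:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string key: the length is stored explicitly, so
 * embedded NUL bytes are just ordinary data. */
struct c_string {
	const char *data;
	size_t      len;
};

static unsigned long lh_c_string_hash(const void *k)
{
	const struct c_string *s = k;
	unsigned long h = 5381;
	size_t i;

	/* Hash every byte up to len instead of stopping at the first NUL. */
	for (i = 0; i < s->len; i++)
		h = h * 33 + (unsigned char)s->data[i];
	return h;
}

static int lh_c_string_equal(const void *k1, const void *k2)
{
	const struct c_string *a = k1, *b = k2;

	return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}
```

A table allocated with such functions would carry (pointer, length) keys, which is where the extra memory use and slight slowdown mentioned above would come from.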

@skaes
skaes commented Apr 24, 2014

I have been bitten by the same problem: I receive JSON from external sources over which I have no control. Sometimes the source sends "\u0000" as part of a key. It would be really great if json-c could handle this, as it is perfectly valid JSON.

In my opinion, it's a bug, not a missing feature. I'd rather have a small performance impact on parsing than be unable to handle embedded null characters correctly.

A quick and dirty hack, to at least not lose most of the key, would be to simply not unescape \u0000 when parsing JSON. Or provide a parser callback so the user can specify how to handle embedded null characters.
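
Purely as an illustration of the callback idea, an option like the following could be imagined; none of these names exist in json-c:

```c
#include <stddef.h>

/* Hypothetical policy the application could choose per parser instance. */
enum nul_policy {
	NUL_POLICY_ERROR,        /* reject the document                      */
	NUL_POLICY_KEEP_ESCAPED, /* leave the literal "\u0000" text in place */
	NUL_POLICY_TRUNCATE      /* current behavior: cut at the NUL         */
};

/* Hypothetical callback: given the string parsed so far, decide what to do
 * when an embedded NUL is encountered. */
typedef enum nul_policy (*nul_handler)(void *userdata,
                                       const char *str_so_far, size_t len);
```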

@hawicz
Member
hawicz commented Apr 24, 2014

I don't care much for that hack. You'll run into issues if someone includes \u0000 in the string.
Also, I don't think parser callbacks will help if there isn't already a way for json-c to do something reasonable with the nul-containing string. However, if you've got an API and example usage (or even better some actual code) then I'm all ears.

@rgerhards
Contributor

Sorry for reviving this old thread. I came across the NUL byte problem when working on some performance enhancements. It looks like json-c as a producer supports them, so there is indeed some inconsistency. But what I mostly wonder is how one would expect to work with them in a generic API. All APIs basically work with C strings. So let's assume a NUL is read in. How would you expect to pass this buffer to a caller? Of course, you can pass a buffer pointer and size, but is that what a typical C program expects? I would really appreciate some feedback on this issue.

@hawicz
Member
hawicz commented Nov 17, 2015

Well, no, a C program written to work with "normal", nul terminated strings will not expect a buffer and size. As I mentioned before, there would need to be "c_string" (i.e. the buffer+len data structure) variants of all of the json-c API functions. If you wanted to take advantage of any features that allow for embedded nul characters your C code would need to change to use those new (not yet written) functions.
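
To make the shape of such an API concrete, length-aware variants might look like the following from the caller's side; these functions do not exist in json-c and the names are only assumptions:

```c
#include <stddef.h>
#include <json-c/json.h>   /* usual install path for the json-c headers */

/* Hypothetical length-aware counterparts of the existing asciiz calls. */
int json_object_object_get_len(struct json_object *obj,
                               const char *key, size_t key_len,
                               struct json_object **value);
int json_object_object_add_len(struct json_object *obj,
                               const char *key, size_t key_len,
                               struct json_object *value);

/* Usage sketch: look up a key that contains an embedded NUL byte. */
static struct json_object *lookup_foo_nul_bar(struct json_object *obj)
{
	static const char key[] = "foo\0bar";   /* 7 bytes, NUL in the middle */
	struct json_object *val = NULL;

	if (json_object_object_get_len(obj, key, sizeof(key) - 1, &val))
		return val;
	return NULL;
}
```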

@rgerhards
Contributor

Well, I have no desire at all to use unencoded NUL bytes in my program. But if we do not want to support that in json-c, we could officially state so. In that case, some simplifications could be made inside the code base (there is even a test for NUL byte encoding in the testbench...).

@hawicz
Member
hawicz commented Nov 18, 2015

I think you misunderstand me. I am not saying that json-c should not support that; I'm saying that fully doing so will be a significant change. I expect that adding c_string variants of all the API functions will actually mean changing all internal handling of strings to use c_string, and turning the current asciiz API calls into wrappers around the c_string ones. However, I haven't actually spent any time evaluating the actual scope of the change, so who knows, maybe it'll be easy. :)
Clearly, something should be done, and if not the full conversion to c_string, then it would be a good idea to at least cause embedded \u0000's to result in an error, perhaps with "quick and dirty hack" options (as @skaes said) to either pass those through as-is, or re-enable the current rather broken behavior.

@hawicz added the new-feature label on Jun 8, 2016