Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Q: few questions about json construction #1092

Closed
crusader-mike opened this issue May 16, 2018 · 20 comments
Closed

Q: few questions about json construction #1092

crusader-mike opened this issue May 16, 2018 · 20 comments

Comments

@crusader-mike
Copy link

crusader-mike commented May 16, 2018

I was digging in json.hpp (v2.1.0) in order to understand how stuff gets created and check for unnecessary copies/allocations in simple cases like:

json x = { {"val1", 1}, {"val2", true} };

As I see it:

  • basic_json has expensive copy semantics as evidenced by it's cctor and op=:
    basic_json(const basic_json& other)
        : m_type(other.m_type)
    {
        ...
        switch (m_type)
        {
            case value_t::object:
            {
                m_value = *other.m_value.object;   <-- mem alloc + cctor call
                break;
            }
        ...

    reference& operator=(basic_json other) ...    <-- note lack of const&
    {
        ...
        using std::swap;
        swap(m_type, other.m_type);
        swap(m_value, other.m_value);
  • which means in my example we will end up copying the same data many times (depending how deep in hierarchy given piece of data is)

  • {"val1", 1} will be used to create a basic_json of type array that contains two basic_json objects of types string and whatever respectively (underlying vector will be allocated on heap, same for that string)

  • then it will be used to populate an std::map entry in basic_json object of type object here:

    basic_json(std::initializer_list<basic_json> init,
               bool type_deduction = true,
               value_t manual_type = value_t::array)
    {
        ...
            std::for_each(init.begin(), init.end(), [this](const basic_json & element)
            {
                m_value.object->emplace(*(element[0].m_value.string), element[1]);
            });

where m_value.object is of type std::map<string, basic_json> and emplace will end up passing a string& to related constructor (which will have to make another copy of a string)

  • and at some point initializer_list instances will be destroyed causing destruction of related basic_json objects and aforementioned vector and string objects will be deallocated

Please, correct me if this is incorrect or I missed smth important.

Now, as I see it there is a quick and easy way to avoid few copies:

m_value.object->emplace(std::move(*(element[0].m_value.string)), std::move(element[1]));

Q1: do you think it is ok? if not -- why?

Q2: this process has a lot of extra allocations (basic_json copies, underlying vectors and strings) -- is there an approach to construct my json object and avoid them?

Q3: pushing it further a bit -- inserting new sub-element into object means std::map lookup, can I avoid it somehow? For example by providing sub-elements in proper order and forcing related std::map::emplace/insert to use end() as a hint.

@crusader-mike
Copy link
Author

crusader-mike commented May 16, 2018

Q1: I've realized that std::initializer_list::begin/end() return T const* which can be const_casted away with (very remote) possibility of problems if compiler somehow manages to put pre-created basic_json objects into read-only memory. This will never happen, but from purity standpoint I could see how this might be an obstacle...

@jaredgrubb
Copy link
Contributor

I'll take a stab at this.

The lack of "const&" on the "assignment operator" is ok. This is a technique to avoid having to define two assignment operators (one lvalue const-ref, one rvalue-ref) that have identical implementations. Instead you pass-by-value and then rely on move+swap to get about the same thing.

Q1: No, this doesn't work because you cannot move-from a std::initializer_list :( [see https://stackoverflow.com/questions/8193102/initializer-list-and-move-semantics#8193157]

If you're seeing a performance penalty using initializer-ilst construction, then you should use the other ways (eg, create an empty json::object, and then load it iteratively with the things). The initializer_list is a convenience but it will be faster to avoid it.

Q3: if you create a json::object, you can use the 'insert' function to insert a range of elements.

@crusader-mike
Copy link
Author

@jaredgrubb I am pretty sure it is ok to const_cast in this case -- basic_json object is non-trivial object with non-constexpr ctor, compiler simply can't pre-compute it (which is a requirement to be able to stuff it into read-only segment). But, yeah... purists will disagree

If you're seeing a performance penalty using initializer-ilst construction, then you should use the other ways

I apologize for my ignorance of nlohmann/json API, but what would be the alternative way to code this:

json x = { {"val1", 1}, {"val2", true} };

?

@crusader-mike
Copy link
Author

crusader-mike commented May 16, 2018

Nevermind, I think I pulled it apart:

  • Q1: can be done, but iffy (see above)

  • Q2: more efficient way would be smth like this:

json j;
j.emplace("val1", 1);
j.emplace("val2", true);
  • Q3: there is no (generic enough) ranged insert, nor hinted insert -- no way to avoid std::map lookups during basic_json population

@jaredgrubb
Copy link
Contributor

Although casting away the const may appear to work, it's not legal to do so. You might get away with it on some versions of some compilers, but you should not rely on it. Seriously: avoid the temptation to work around the system on things like this. The more "hacks" you use, karma will bite you hard one day! (To be more technical about this, casting away the const is undefined behavior; eg, https://stackoverflow.com/questions/4042626/const-cast-and-ub).

@crusader-mike
Copy link
Author

crusader-mike commented May 17, 2018

It is certainly legal to do if you can prove that given reference points to a modifiable object in this particular case. That SO link doesn't tell the whole picture and there is a good reason why standard say "may" instead of "shall".

In case of initializer_list -- yes, you can argue that extremely smart compiler can potentially figure out details and put your list of very trivial basic_json objects into read-only segment. But in practice it will never happen -- not until you change your basic_json to be (smth like) POD with constexpr ctor.

@jaredgrubb
Copy link
Contributor

If you can prove a very specific thing like you describe about type definitions you control (or types whose behavior is explicitly defined), then const_cast is legal -- that's what it's for.

However, std::initializer_list is not a type you control and the standard says nothing about how it is implemented or how its elements are stored -- as you concede, smart compilers and library authors are permitted to be very clever.

Look, I am not arguing with your intuition here. I agree that "in practice", it'll probably just work on almost every compiler and library you try it on. Your reasoning is sound -- but that is not the same thing as whether you should write code that does it (answer: absolutely not; intentionally writing UB just because you can prove it works is not worth it).

Hopefully, one day, the standard will permit non-const initializer lists. It would be valuable. I'm sure we're not the first ones to want this, and there could be good reasons why it is not done (yet) or is impossible, and there may be no good reasons and it's just waiting for someone with enough initiative to shepherd it through.

@crusader-mike
Copy link
Author

crusader-mike commented May 17, 2018

it'll probably just work on almost every compiler and library you try it on.

I bet $50 -- as of now it works on every compiler that can compile this library :-)

However, std::initializer_list is not a type you control and the standard says nothing about how it is implemented or how its elements are stored ...

Actually standard does say few things about that -- for example elements are stored in contiguous memory.

You could also imply certain behavior even if standard doesn't promise it specifically. For example, the fact that initializer_list::begin() returns T const* implies that you can call it's const methods, which (as we know) can modify object state. So, unless compiler can prove that we don't -- it can't put that object in read-only memory and/or pool it (see "string pooling").

Can we guarantee that compiler won't be able to find out? I bet there is a way to do that (some weird method declaration?)... I don't know. For example, I can provide some level of assurance by declaring a "do-nothing" method (implemented in another translation unit) and call it. This will certainly work for any compiler that doesn't do link-time optimization.

In any case -- std::initializer_list definitely needs some improvement in standard... Pretty sure they'll add something along the lines of "non-trivial objects should not be stored in read-only memory". Or maybe it is already implied in some obscure corner of standard.

@jaredgrubb
Copy link
Contributor

Well, the compiler won't stop you from writing non-portable or UB. I am a firm believer that playing with fire will get you burned, but do as you like!

@nlohmann
Copy link
Owner

The handling of initializer lists has been changed a bit in version 3.0.0 (see #663). I'm not sure whether it is relevant to this discussion, but I wanted to still let you know.

@crusader-mike
Copy link
Author

lol... No, actually, HA-HA!!

Hey, @jaredgrubb what where you telling me about non-portability or UB again? I am sorry, you were too far up on your high horse, I didn't hear it very well... Check out this and this.

I find it quite amusing -- you ended up using extra proxy class, while all you needed is a one-liner with exactly the same "UB". :)

@nlohmann
Copy link
Owner

@crusader-mike please remain constructive.

@nlohmann
Copy link
Owner

What do you propose to improve the code?

@jaredgrubb
Copy link
Contributor

@crusader-mike Just for the record, please note that I am not an official representative of this project -- just an avid and eager user, supporter, and occassional contributor. I don't want any confusion that my opinions are those of the official maintainers.

I spent some time looking at that proxy class, and in my personal opinion, it looks great. I will remember that approach the next time I need to work around the lack of move support in std::initializer_list.

That said, I'll bow out of participating in this thread further since my engagements have not been helpful for you.

@crusader-mike
Copy link
Author

@jaredgrubb Oh, come on, dude -- I think this situation ended up being quite amusing. Happened to me many times in my life -- where I was absolutely sure in my correctness and yet somehow "failed to win". :)

@nlohmann Code is ok -- in light of my findings I am unlikely to use initializer_list-related functionality anyway. You may consider removing proxy class and replacing it with aforementioned one-liner (+const_cast) for simplicity. This won't fix problems mentioned in Q2. (and of course it may not work at all for some reason).

For Q2 you may consider changing proxy class -- make it represent (name+value) pair (instead of having two instances of proxy class). Not sure how it is going to play C++ rules related to brace-initializations, etc -- there is a chance it may just work. I.e. ideally this:

json x = { {"val1", 1}, {"val2", true} };

should result in creation of only two proxy objects where each one is (conceptually) a pair<string, basic_json> -- in related basic_json ctor you just move these elements to destination std::map instance. No wasted allocations this way...

There will be some trickery involved as (I assume) you'll also want to support constructs like this:

json x = { 1, 2, 3, 4}; // <-- builds array

tbh, I don't know about every usage pattern this library supports -- you should accept my recommendations with grain of salt.

@crusader-mike
Copy link
Author

@nlohmann another thing you may consider is storing std::string by value (in basic_json) -- modern implementation of std::string use SSO, you'll end up increasing size of basic_json a bit, but improving performance in typical case (of short field names) and half of Q2 problems gone,

@crusader-mike
Copy link
Author

... almost the same for std::vector -- if you hold it by value, you'll avoid one allocation in Q2. Unfortunately vector doesn't have any SSO equivalent, but if it had one (for size 2 or less) -- this would have solved second half of Q2 problems completely.

@nlohmann
Copy link
Owner

@jaredgrubb Thank you for your contributions! I do find them helpful, and I would like to keep you in the conversation - especially, since I lost track about that proxy class myself.

@crusader-mike I think your #1092 (comment) was lacking respect. Please note we have a code of conduct in this project. It's not about "being right" or "proving someone wrong" - it should be about helping each other out and improving the project.

@crusader-mike
Copy link
Author

Guys, you need a vacation. Seriously.

@jaredgrubb
Copy link
Contributor

@nlohmann No need for thanks, I really appreciate the hard work you do in maintaining this awesome project. It's popular for a good reason!

@crusader-mike I love C++ and coming here is a kind of "vacation" to engage in interesting applications of nerdy edge cases of rvalues and such. But unless we're debating over a beer and have know each other a while, personal jabs ("high horses" and "vacations") will distract from your technical arguments. I encourage you to raise the professionalism a notch and you'll find a larger and more receptive audience for your ideas.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants