Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvs: return const from cache_entry_get_json() #1269

Merged
merged 6 commits into from Nov 3, 2017

Conversation

Projects
None yet
5 participants
@chu11
Copy link
Contributor

chu11 commented Nov 3, 2017

This is a refactor for the cache_entry_get_json() function to return a const json_t pointer. It's not valid for a caller to modify a json object that is sitting in the KVS cache. The idea to make this const came from #1265, as it took me awhile to hunt down a corruption issue when I began modifying json objects stored in the cache.

Making cache_entry_get_json() return a const pointer had a lot of fallout.

A very large number of treeobj functions had to be made const, which is the right thing to do and is semi-obvious (i.e. functions like treeobj_is_dir() or treeobj_get_count()).

In the few cases that it made no sense for a treeobj function to take or return a const, an alternate function to take / return const was supplied (i.e. treeobj_peek_entry() over treeobj_get_entry()).

In the KVS, there were several major changes that while correct / safe is probably a performance loss in the aggregate. Given more changes coming for the KVS, I think the correct / safe thing is preferable.

Most notably, a large number of json_incref() had to be changed to treeobj_deep_copy() in the KVS internal lookup API. B/c json_incref() is obviously non-const and the caller could modify the json value returned from the KVS cache.

In addition, in the KVS internal commit API, some shallow copies (treeobj_copy()) had to be converted to deep copies (treeobj_deep_copy()) as a shallow copy is non-const. This was probably the one change I disliked the most, as I know its not something we have to do performance wise. But for const safety, it's probably the best thing to do.

This is a patch with both @garlick and @chu11 patches in it, so neither of us can merge. I've already reviewed @garlick's patch and approve.

As an aside, the shallow treeobj_copy() function is no longer needed. We can remove it, but I left it in there for the time being.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Nov 3, 2017

Oh yeah, @garlick , this comment

treeobj_validate(), which now calls treeobj_peek() to access the
const data member, the data const had to be dropped while iterating
over its directory entries with json_object_foreach(), since
json_object_iter() does not accept a const object.  I'm hopeful
that the iterator does not actually modify the object as it seems
self-contained, and this is just an oversight in the jansson API design.

My suspicion is that it's not const b/c of the potential to later call functions like json_object_iter_set_new. My presumption is its safe to cast away const if you never call functions like that. A comment along those lines may be worthwhile.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Nov 3, 2017

Coverage Status

Coverage decreased (-0.03%) to 78.506% when pulling 3dfbb4a on chu11:issue1264 into 07780af on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Nov 3, 2017

one builder hit:

ERROR: t2001-jsc.t - exited with status 137 (terminated by signal 9?)

haven't seen that one before, but I wonder if oom killer went off inside travis VM

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Nov 3, 2017

Codecov Report

Merging #1269 into master will decrease coverage by 0.03%.
The diff coverage is 79.41%.

@@            Coverage Diff             @@
##           master    #1269      +/-   ##
==========================================
- Coverage   77.91%   77.88%   -0.04%     
==========================================
  Files         154      154              
  Lines       29104    29138      +34     
==========================================
+ Hits        22677    22694      +17     
- Misses       6427     6444      +17
Impacted Files Coverage Δ
src/modules/kvs/commit.c 74.56% <100%> (ø) ⬆️
src/modules/kvs/cache.c 88.72% <100%> (ø) ⬆️
src/modules/kvs/kvs.c 62.37% <100%> (-0.26%) ⬇️
src/modules/kvs/lookup.c 80.47% <45%> (-2.25%) ⬇️
src/common/libkvs/treeobj.c 85.54% <93.02%> (+0.72%) ⬆️
src/common/libflux/keepalive.c 86.66% <0%> (-6.67%) ⬇️
src/modules/connector-local/local.c 76.26% <0%> (-1.13%) ⬇️
src/common/libflux/rpc.c 93.38% <0%> (-0.83%) ⬇️
src/common/libflux/response.c 83.73% <0%> (-0.82%) ⬇️
src/common/libkvs/kvs_txn.c 74.71% <0%> (-0.57%) ⬇️
... and 13 more
@garlick

This comment has been minimized.

Copy link
Member

garlick commented Nov 3, 2017

I'll try the soak test and see if there's any obvious performance hit for the additional copying.

@grondo

grondo approved these changes Nov 3, 2017

Copy link
Contributor

grondo left a comment

LGTM, just one very trivial comment (hesitated to even make it).

I was going to also do the soak test but looks like @garlick is already on it.

if (!(str = json_string_value (data)))
return NULL;
return str;
}

This comment has been minimized.

@grondo

grondo Nov 3, 2017

Contributor

Very minor, but any reason not to just

return (json_string_value (data));

? (thought perhaps you meant to set an errno here)

This comment has been minimized.

@garlick

garlick Nov 3, 2017

Member

Yeah I think that should set errno = EINVAL. Thanks!

This comment has been minimized.

@chu11

chu11 Nov 3, 2017

Author Contributor

Doh, I had forgotten the EINVAL part. Thanks for the catch.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Nov 3, 2017

Baseline (master)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1200  0.1400  0.1400  0.1429  0.1500  0.1900 

tmp 6jolcr5hd4

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Nov 3, 2017

This PR

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1200  0.1400  0.1400  0.1446  0.1500  0.1900 

tmp zqc6h9whk1

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Nov 3, 2017

Just as a sanity check I ran test_treeobj.t under valgrind and it found some leaks, though they were in the test not in the treeobj code itself.

diff --git a/src/common/libkvs/test/treeobj.c b/src/common/libkvs/test/treeobj.c
index 7301f13..353e29c 100644
--- a/src/common/libkvs/test/treeobj.c
+++ b/src/common/libkvs/test/treeobj.c
@@ -23,6 +23,7 @@ json_t *create_large_dir (void)
             json_decref (dir);
             return NULL;
         }
+        json_decref (ent);
     }
     return dir;
 }
@@ -675,9 +676,11 @@ void test_corner_cases (void)
     ok (treeobj_get_count (val) < 0 && errno == EINVAL,
         "treeobj_get_count detects invalid type");
 
-    ok (treeobj_decode (treeobj_encode (val)) == NULL && errno == EPROTO,
+    char *s = treeobj_encode (val);
+    ok (treeobj_decode (s) == NULL && errno == EPROTO,
         "treeobj_decode returns EPROTO on bad treeobj");
 
+    free (s);
     json_decref (val);
 
     valref = treeobj_create_valref (NULL);

After these fixes the test runs clean, so it is up to you guys if you want to fix the test or not.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Nov 3, 2017

Thanks @grondo, will add that to the treeobj fixes I'm making right now.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Nov 3, 2017

@garlick, can the axes be sized to fit the data on the soak test? Runtimes are obviously ok, but it is difficult to see if anything is going on with the RSS. (doubtful, but hard to see)

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Nov 3, 2017

Sorry, yeah, I'll fix and edit those comments with new graphs. That RSS difference looked significant to me, but it's hard to get a sense of it with those units.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Nov 3, 2017

I pushed some fixes to my issue1264 branch:

commit d2e583a
Author: Jim Garlick garlick.jim@gmail.com
Date: Fri Nov 3 10:13:09 2017 -0700

test/treeobj: fix memory leaks

Fix mem leaks in the test code spotted by @grondo with valgrind.

commit e7d4296
Author: Jim Garlick garlick.jim@gmail.com
Date: Fri Nov 3 10:04:50 2017 -0700

libkvs/treeobj: treeobj_get_symlink set errno

Problem: treeobj_get_symlink() does not set errno if
symlink target is NULL, but sets it in other cases where
NULL is returned.

Set errno = EINVAL and return NULL if symlink target is NULL.

commit 0331256
Author: Jim Garlick garlick.jim@gmail.com
Date: Fri Nov 3 10:03:38 2017 -0700

libkvs/treeobj: drop lame comment on iter vs const

Feel free to cherry pick @chu11.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Nov 3, 2017

Just updated those graphs. It's easier to see the diff now.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Nov 3, 2017

@garlick thanks for fixes and the soak runs.

I probably should have mentioned I didn't anticipate massive performance slowdown. Only slowdown in the sense that there are more mallocs & frees effectively. Seems there could be a nominal slowdown, but it's within noise of just being even.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Nov 3, 2017

Agreed performance impact looks negligible. The rss effect is worrisome though. I was just trying to walk through the code and see if any of the new copies can be avoided some way...

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Nov 3, 2017

I had an 8000 job soak run going in the background and it finished. The slope of the RSS growth doesn't seem too concerning.

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1300  0.1500  0.1600  0.1606  0.1700  0.2800 

Max RSS was about 338188

soak-53517

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Nov 3, 2017

my suspicion to the cause of the rss increase is the deep copying being done in commit.c. They were shallow copies before. There's going to be a nice chunk of extra allocations that aren't really necessary. Some experiments verify this.

Mulling over if there's a way to do the shallow copy but keep const-ness (outside of outright casting).
My first thought was if I could some way "unroll" deep copied json objects (i.e. replace newly allocated copies with a copy already exists in the KVS cache), but I don't think that's going to be doable.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Nov 3, 2017

Coverage Status

Coverage decreased (-0.01%) to 78.528% when pulling c936f4b on chu11:issue1264 into 07780af on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Nov 3, 2017

Maybe the additional memory is OK if it helps ensure correctness. And maybe we can optimize later.

Couple other comments:

It seems like we can put a const on the data returned by cache_entry_get_raw() for the same reason as get_treeobj().

I'm a little worried about cache_entry_set_raw() and cache_entry_set_treeobj(). Both allow their argument to change after it's been assigned to the cache entry. Maybe they should copy their argument.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Nov 3, 2017

interesting one of the test runs seemed to fail with a new valgrind failure, that looks bogus

==11436== 
==11436== HEAP SUMMARY:
==11436==     in use at exit: 119,792 bytes in 156 blocks
==11436==   total heap usage: 852,195 allocs, 852,039 frees, 79,303,766 bytes allocated
==11436== 
==11436== 288 bytes in 1 blocks are possibly lost in loss record 45 of 62
==11436==    at 0x4C2CC70: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11436==    by 0x4012FE4: allocate_dtv (dl-tls.c:296)
==11436==    by 0x4012FE4: _dl_allocate_tls (dl-tls.c:460)
==11436==    by 0x59C9D92: allocate_stack (allocatestack.c:589)
==11436==    by 0x59C9D92: pthread_create@@GLIBC_2.2.5 (pthread_create.c:500)
==11436==    by 0x52BE8DA: zactor_new (zactor.c:122)
==11436==    by 0x4E49E95: flux_sec_comms_init (security.c:197)
==11436==    by 0x407010: main (broker.c:409)
==11436== 
@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Nov 3, 2017

It seems like we can put a const on the data returned by cache_entry_get_raw() for the same reason
as get_treeobj().

I actually totally forgot about that. I also forgot about possibly moving kvs_util stuff into treeobj (which I described in #1264). I think I'd prefer those changes in another PR. This one is a bit involved which is why I perhaps mentally stopped where I did (I need to remove the "Fixes #1264" in the commit message).

I'm a little worried about cache_entry_set_raw() and cache_entry_set_treeobj(). Both allow their
argument to change after it's been assigned to the cache entry. Maybe they should copy their
argument.

True. Perhaps worth considering as well. I'll note it in the issue.

Shall I squash?

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Nov 3, 2017

@chu11 I think you can go ahead and squash as far as I'm concerned. Not sure if you need to wait for @garlick though.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Nov 3, 2017

Works for me to do this in pieces. Plus, it's nice to start with a clean slate on monday!

garlick and others added some commits Nov 1, 2017

libkvs/treeobj: use const on json_t where approp
Add a new internal function treeobj_peek() which accepts
a const json_t * input parameter, and produces a const json_t
*data output parameter.

Internally, treeobj_peek() calls json_unpack(), which required
the const on the input parameter to be cast away - safe since,
while we use the "o" format specifier to extract the data object,
we put a const on it before we return it to internal callers.

The treeobj_peek() function allows many functions to accept
const json_t * where before they did not.

There was one other place where a const had to be cast away.  In
treeobj_validate(), which now calls treeobj_peek() to access the
const data member, the data const had to be dropped while iterating
over its directory entries with json_object_foreach(), since
json_object_iter() does not accept a const object.  I'm hopeful
that the iterator does not actually modify the object as it seems
self-contained, and this is just an oversight in the jansson API design.

As discussed in #1264
test/treeobj: fix memory leaks
Fix mem leaks in the test code spotted by @grondo with valgrind.
common/libkvs/treeobj: Add treeobj_peek_entry()
Add const version of treeobj_get_entry(), so users can use const
treeobjs safely.

Add unit tests appropriately.
modules/kvs/lookup: Refactor return val logic
Refactor internal logic when returning values to callers.  Do not
return pointer to cache entries, as callers could mistakenly modify
those entries.  Always return a copy of those entries.
modules/kvs: Refactor cache_entry_get_treeobj()
Return const json_t * from function, as we do not want users
to modify any json object stored in the cache.
common/libkvs/treeobj: Add treeobj_get_symlink()
Add convenience function to get symlink string directly from a
treeobj entry.  This convenience function also allows const symlink
tree objects to be used without casting const away.

@chu11 chu11 force-pushed the chu11:issue1264 branch from c936f4b to bc0a6e2 Nov 3, 2017

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Nov 3, 2017

just squashed and re-pushed

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Nov 3, 2017

Did you want me to fix that treeobj commit messasge? I could just squash and re-push on top of yours... LMK

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Nov 3, 2017

Er, actually, I don't think it's a big deal if it goes in with the orig commit message. At least I took out the comment in the code that was a bit misguided.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Nov 3, 2017

nah, i think the fix in the code was enough.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Nov 3, 2017

Coverage Status

Coverage decreased (-0.03%) to 78.507% when pulling bc0a6e2 on chu11:issue1264 into 07780af on flux-framework:master.

@grondo grondo merged commit 85353aa into flux-framework:master Nov 3, 2017

4 checks passed

codecov/patch 79.41% of diff hit (target 77.91%)
Details
codecov/project Absolute coverage decreased by -0.03% but relative coverage increased by +1.49% compared to 07780af
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage decreased (-0.03%) to 78.507%
Details

@grondo grondo referenced this pull request May 10, 2018

Closed

0.9.0 Release #1479

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.