Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Object field caching #280

Merged
merged 1 commit into from
Jul 23, 2019
Merged

Object field caching #280

merged 1 commit into from
Jul 23, 2019

Conversation

sbarzowski
Copy link
Collaborator

So far I did this in a way which is intended to be minimally
invasive, even at the cost of worse performance and some weirdness.
Most importantly valueCachedObject should probably replace valueObject
interface to avoid some indirection (and it shouldn't be at the same
"level" as valueSimpleObject/valueExtendedObject (these should no longer
even be values on their own).

This is a reasonable proof of concept, which we can try benchmarking
on the real world code.

@sbarzowski
Copy link
Collaborator Author

Related: google/jsonnet#663

I think we can do object local caching in another change after this gets finished.

@coveralls
Copy link

coveralls commented May 25, 2019

Coverage Status

Coverage increased (+0.06%) to 77.406% when pulling 7e5d7ff on sbarzowski:object-caching into 4996d46 on google:master.

@davidreuss
Copy link

We have a fairly large corpus of configuration that takes ~5s to render -- I read the article on sjsonnet (https://databricks.com/blog/2018/10/12/writing-a-faster-jsonnet-compiler.html), and realized that object caching might be what is causing our configuration to be slow to render.

We got a few native functions so we can't apply sjsonnet directly as a measure, but .. there's very real benefits with this patch for us so, that's interesting to pursue.

We go from ~4.5s to about ~2.8s with this patch applied, and it makes a noticeably difference when "developing" on the config, and trying to expand the entire thing.

Is there anything hindering this from being merged?

@sparkprime
Copy link
Member

Thanks for describing your situation. I haven't reviewed it yet but it's high up my priorities. I don't suppose your config is public is it? :)

@davidreuss
Copy link

No it’s not — it’s non prod sensitive though (it’s generating a complete environment for our ci/e2e testing) so we can possibly figure something out if it would be of value in determining if these perf changes are working on a non-trivial config

we do have a couple of custom functions and currently we’re using an in-house wrapper for that but we can maybe figure something out if you’re interested .. might also be able to strip/customize some specific bits so a native jsonnet evaluator could be used ..

@sparkprime
Copy link
Member

I suspect it'd be possible to swap out the native bits with hardcoded values or jsonnet functions without changing the performance of it

@sparkprime
Copy link
Member

Also might be possible to anonymise it by changing all the strings to "DDDDDDDDDDDDD" or whatever

@davidreuss
Copy link

i’ll get back to you when i have something simplified to try things out with :)

@sbarzowski
Copy link
Collaborator Author

FYI I expect to finish the refactoring and drop WIP mark today or tomorrow.

@davidreuss
Copy link

davidreuss commented Jun 7, 2019

I've put together a container which allows expanding our configuration with plain command line

I tried just for kicks both with the current go version and with the c++ version ...

c++

$ time ./expand.sh >/dev/null
./expand.sh > /dev/null  24.18s user 1.19s system 99% cpu 25.562 total

go

$ time ./expand.sh >/dev/null
./expand.sh > /dev/null  4.07s user 0.20s system 139% cpu 3.069 total

I'm trying to figure out how much i'm allowed to share from the configuration, or finding other ways of anonymizing the data

value.go Outdated
valueBase
assertionError error
cache map[objectCacheKey]value
underlying objectBody
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any ideas for a better names than "underlying" and "objectBody"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could rename the top-level one valueCachedObject and then the other one could just be object or valueObject or uncachedObject. I don't really like the word "body" as it usually implies the bit that isn't a header but there's no header here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The objectBody, so valueObject is out of the question. I like the uncachedObject name the most.

So I renamed objectBody to uncachedObject and underlying to uncached. I left valueObject as it was, because I think it's clear enough - it is the only representation of object values (uncachedObject is not a value on its own).

@@ -647,10 +657,20 @@ func objectIndex(i *interpreter, trace TraceElement, sb selfBinding, fieldName s
return nil, i.Error(fmt.Sprintf("Field does not exist: %s", fieldName), trace)
}

if val, ok := sb.self.cache[objectCacheKey{field: fieldName, depth: foundAt}]; ok {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, in this function the actual caching happens.

@sbarzowski sbarzowski changed the title [WIP] Object field caching Object field caching Jun 7, 2019
@sbarzowski
Copy link
Collaborator Author

Ready for review

@davidreuss
Copy link

I’ll try and apply this tomorrow and test it out... i’ll also see if i can anonymize our code so you can see for yourselves, if you got time for it

@sbarzowski
Copy link
Collaborator Author

@sparkprime Are we going forward with this?

@sparkprime
Copy link
Member

sparkprime commented Jul 21, 2019 via email

value.go Outdated
}

func (*valueObjectBase) getType() *valueType {
return objectType
func (obj *valueObject) inheritanceSize() int {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this ever get called on the top level (or if it does, do you happen to have the inside object readily available)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does, but only in context where we are playing with the object internals anyway, so we can reach for the uncachedObject. I removed it.

It comes with an added advantage - valueObject no longer conforms to uncachedObject interface - so they cannot be passed there by mistake (and there were some places where it was unnecessarily passed instead of uncachedObject). It is now fixed.

Thanks for noticing this.

@sparkprime
Copy link
Member

LGTM

This change adds caching to objects fields, i.e. now subsequent
references to an object field are going to be served from cache.

Cache is kept within an object. Objects created with operator +
start with a clean cache (they have to, because in general all
the fields may have changed their values due to late binding).

This change comes naturally with a change of structure of objects,
now valueObject is a concrete struct which keeps "uncachedObject"
which is roughly equivalent to old objects.
@sparkprime
Copy link
Member

Alright, merge it when you're ready. Nice job!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants