-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Application code should not be able to modify locacl cache except through Put/Get #27
Conversation
We aren't loading/unloading interface types, only structs
…odified since we're passing a *struct back to the application application code should not be able to modify locacl cache except through Put/Get Implemented code that allows for you to pass an interface variable into Get. If both *struct and struct satisfy the interface, whichever was placed in the cache is what is pulled. Conflicts: goon_test.go
Is this how NDB works? We should do whatever they do. |
I am not an NDB user and the negative assertion about this specific issue I cannot find. That being said, the positive is also not documented which if it existed, I'm sure it would be discussed.
In no circumstances would I expect to find the changes in step 2 represented in the object fetched in step 4. That's what I'm fixing in goon. Python doesn't have pointers like Go, so it seems NDB would have to go out of their way to do this, but again, I'm not a Python expert. |
Speaking purely on the merits of this change in theory. I, too, have thought about bringing this issue up, where a modification will result in a subsequent get from the cache to return a reference to the same struct. I hadn't yet thought through the pros/cons of the many approaches we could take, but I guess now is as good of a time as any to start. First, talking about the 4-point example that @mzimmerman brought up. Even with the current reference-based solution, there wouldn't be a problem in this example. Unless we're using the same goon instance to serve multiple http requests. Edit: Is this even possible though? I haven't tried, but I know that the appengine context is created with the request parameter. In any case, some of my text follows with the presumption that it's possible. Should using the same goon instance for different requests be a thing? It's an interesting question, to be sure. By making a single goon instance usable concurrently from goroutines, we've also effectively made it usable from concurrent http requests. Using a shared goon instance like this would provide us with a shared local cache. Having a shared local cache has the good side of being able to skip memcache requests for very frequently accessed data. But then having a shared local cache also means that the cache keeps on growing and growing and growing, because it keeps data from all the different requests in memory. For a popular app, it takes no time at all for 1000+ users to hit the same instance, causing the goon local cache to be filled with 1000x user-specific data. Depending on the application, this can mean that the instance memory usage goes above GAE limits. Periodically clearing this shared local cache can result in undesirable effects as well. Imagine a request fetching 100 entities, and then fetching them again a bit later with the presumption that they're cached. However a concurrent request, using the same shared goon instance, cleared the cache. I guess it comes down to the app developr to decide whether using a shared goon instance across requests is desirable. I think it's neat that goon even allows this sharing, as this certainly isn't possible in Objectify, the java datastore library that I used to use. Whichever way we end up going, I think having the goon documentation spell out the cache mechanics is key, so that app developers could make an educated decision. Now, even if we use a different goon instance per request, arbitrary changes will still be very visible with a reference based cache system. The most alarming example that I can currently think of is this:
Not a pretty sight. The app developers would need to be well informed of pitfalls like this by the goon documentation, if we stick with the reference based cache system. Having a copy based cache system would avoid such problems. At a cost to memory, naturally. Though I'm not sure if the added memory cost is big. With a reference based cache, there will basically always be a single copy of the data in memory. With a copy based cache, there will be a single copy of everything, plus an additional copy for every entity that will still need to be referenced by the app code. I haven't put much thought into this, but this may be a small time window for most cases, where the app retrieves an entity, does something based on it, and never references it again, thus being available for cleanup by the garbage collector. My thoughts are still developing on this issue, and I'm not quite ready to rally behind one side or another. Still, I hope I provided some food for thought. |
Sorry, I gave a poor example. Each goon object should be at the http
|
I did some research to find out what other libraries do with their local cache. Objectify was initially going to go with a reference based cache, but then decided to go with a copy based approach. However for the final release, they went with a reference based cache. I'm still trying to probe them for the reasons. I also looked into a non-GAE but widely used library, the Java Persistence API. They are also using reference based caching. Then I proceeded to dig in the NDB source and found a reference to an issue, which describes pretty much the same phenomenon that we're discussing here. Another issue confirms that NDB also has a reference based cache. One interesting partial remedy that was implemented in NDB is that if there's a cache hit, check that the key still matches. This means that the side-effects of changing a struct's data is at least limited to the same exact id, and thus the most alarming achievements example that I described earlier would not be possible. If we decide to go with a reference based cache, I think this sanity check should be included. |
Wow, excellent research. I must admit I'm very surprised to see that other implementations have a reference cache. Memory usage wise, it's going to be better, but it opens up all kinds of interesting application level bugs (like this test)[https://github.com//pull/27/files#diff-94b3ba1603e0ae9f46877bb1a21560b5R87] Either way I'm fine, I just happened to think about this case when looking at the code. If this isn't wanted it's fine to close the issue; there was just no part of me that thought this was intended behavior. The additional tests in this pull request too are superseded by what @xStrom did in #28 |
A copy based cache would require additional allocations by definition. Allocating memory takes time, but how much time? I did some benchmarking of GetMulti hitting the local cache to find out. The results, on avarage, are as follows:
There is definitely a significant relative performance cost to pay when using a copy based cache. However the absolute added cost may be acceptable. Even when retrieving 100x 10KB entities from the cache, the added cost is merely a quarter of a millisecond. PS. The code in this pull request is currently unsuitable for replicating these performance tests, because it only makes a copy of the struct. Meaning, if a struct has a []byte member variable (which is a pointer to a byte array, with some metadata), then the copy of this struct has the same []byte member variable, i.e. the same pointer to the same underlying byte array. |
In the current scheme, we're storing Put and Get objects in local cache, often struct pointers. If an application does a Put/Get and then modifies a field in their struct for some local processing, the struct in local cache gets updated too (since it's the same one). That wouldn't necessarily be a problem if the application then does a Put to save the changes, but that's not a requirement. The application may want to use it for some temporary processing since the object was already allocated. Temporary processing doesn't change the data in the datastore (and hence it shouldn't change it in goon either). Temporary p
This change stores whatever the native format of the request is into cache. That is, if the object is a struct pointer, a new allocation occurs, and the data is populated. If the object is a struct, then assignment by value naturally stores a copy.
When retrieving from the local cache we also have to make a copy, otherwise any temporary calculations without a Put request would update the local cache as well.