Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New serialization code is too complicated for its benefits #31

Closed
maddyblue opened this issue Mar 12, 2014 · 5 comments
Closed

New serialization code is too complicated for its benefits #31

maddyblue opened this issue Mar 12, 2014 · 5 comments

Comments

@maddyblue
Copy link
Owner

I'm happy with the recent changes to the memcache layer. However, the complexity of the serialization code prevented me from reading any of the changes. While the performance benefits are nice, I do not think they are enough to justify this complexity of code. I think it is more important that many people agree that it is good.

I propose that we remove this magic and revert back to the gob version. I definitely want to keep the new features that were added, but I want to understand how they work.

If there is strong disagreement to this proposal, I'd like to discuss it. My goal is to have the code in a state where I can understand it and have motivation to read through all changes. I thought I could ignore that, but goread broke so many times that this is clearly not possible.

Thoughts?

@maddyblue
Copy link
Owner Author

Some other thoughts: the PR that triggered this was huge. It involved many things including the changes mentioned above. Perhaps a better approach would have been submitting these as separate PRs so that each feature change could be considered as a whole, and then merged in after discussion. As it was, no one reviewed it because it was too big.

Here is, I think, my new rule: I do not plan on accepting any new code that I do not fully understand myself. This will prevent work that may be beneficial but complicated. I'm quite alright with having it be a bit slower if it is easier to reason about what is going on. The race condition bug from before is an exceedingly strange bug that appeared to take a long time to diagnose and fix. I'm willing to pay the speed price for the benefit of not having bugs like that be possible.

@mzimmerman
Copy link
Contributor

I have to admit I was surprised you pulled the serialization code in given the known issues with it that @xStrom detailed.

I too only quickly reviewed it; it was quite large. I didn't see any issues, but I didn't try to understand every piece of it either; just at a high level. Combined with being busy on other projects at the moment it didn't get my full attention.

I was really surprised by the metrics/results. I wouldn't have thought the standard gob encoder was that slow. It wasn't clear to me why the new solution was actually faster, but again, I didn't really dig into it.

@xStrom
Copy link
Collaborator

xStrom commented Mar 12, 2014

Increasing the complexity of the code base is a slippery slope that needs to be treaded carefully. If it were only the data migration features at the cost of the complexity, then I would agree that it's not worth it. However the performance gains aren't marginal. As evidenced by my benchmarking, deserializing simple entities has a 10x (!!) speedup with this more complex version. The performance difference is even greater in long-lived instances. Additionally, it's possible to tweak this complex version even more for gains, which isn't really possible with a simple gob.Encode call. I think the added complexity is worth it for this case, primarily due to the performance gains, which can be massive.

That said, I am completely willing to work to make this compex version more understandable. There could be some low hanging refactoring fruit, which I could look for. For a bigger effect though, I could greatly increase the number of comments. I originally kept the comments minimal because the existing goon code had few comments, and over-verbose comments can make code tedious to read. However a better balance of verbosity/obscurity could be found by adding some comments to the current version.

Additionally, perhaps we should introduce an architecture document for internal development, where we describe some inner workings & design decisions. Basically this would be a place where I could describe at a high level how the complex serialization engine works and why it's so much faster than encoding/gob. Spoiler: basically encoding/gob is optimized for serializing 100MB, not 10KB.

Regarding huge pull requests, I completely agree that pull requests should be smaller rather than bigger. However for this case it wasn't really possible to do a smaller changeset. The pull request could have been reduced in size by not having any of the performance enhancements in it, but then it wouldn't even meet my own merge requirements.

PS. Regarding goread breaking, are there still some unresolved issues with this new code?

@maddyblue
Copy link
Owner Author

@xStrom No, there are no currently-known issues with the code and goread.

I'll read this later tonight, but putting this here for my memory: if it is decided to remove the code, we should consider using appengine/memcache.Gob as a replacement.

@maddyblue
Copy link
Owner Author

I misread some of those benchmark results. I didn't realize the performance gain was as high as 10x, I thought it was somewhat less than twice as fast. Yes, since the gain is that high I think it's worth keeping this. I'm not thrilled that we got hit with that race condition bug since it suggests bugs of a similar class could be out there and they are nasty.

If you feel like either documenting those functions or producing some longer multi-paragraph comment somewhere about the serialization methods, I'm not opposed. It's still some hundreds of lines of pretty dense code, so I think it may, even then, be unlikely that I sit down and read through it all carefully. Do what you want. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants