WIP: first attempt at #776 / #941 #942

Open · wants to merge 1 commit into base: main
Conversation

marvinthepa

WIP as it contains a few FIXMEs that need to be addressed (especially error handling).

Uploaded anyway as a basis for discussion.

@marvinthepa force-pushed the interleaved_fetch branch 3 times, most recently from 350d7b4 to 03ad88e (December 12, 2022 07:18)
end

def server_pipelined_get(server, request)
buffer_size = server.socket_sndbuf
Owner

I had been thinking about doing it by batching the number of keys, but I think this approach of looking at it from the perspective of number of bytes is a better way to address the issue. It also allows us to give the end user direction about how to control the behavior (by tuning sndbuf).
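For reference, a minimal sketch of what a `socket_sndbuf` helper presumably wraps (the helper name comes from the diff above; this standalone version is an illustration, not Dalli's code): the kernel's send-buffer size is readable via `getsockopt(SO_SNDBUF)`.

```ruby
require "socket"

# Illustrative stand-in for the diff's socket_sndbuf: ask the kernel for
# the socket's send-buffer size in bytes.
def sndbuf_size(sock)
  sock.getsockopt(Socket::SOL_SOCKET, Socket::SO_SNDBUF).int
end

sock = Socket.new(:INET, :STREAM)
puts sndbuf_size(sock) # kernel-dependent, commonly tens of kilobytes
sock.close
```

Users can tune this per socket with `setsockopt(SO_SNDBUF)`, which is the tuning knob referred to above.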

Author

I started out with batching the keys, but I got the idea to batch by bytes instead when I struggled to come up with a good default value for the batch size: it seemed to me that the original problem was caused by the response size rather than by the number of keys requested.

I don't know if sndbuf is a good value, though; maybe it is more efficient to write batches larger than the buffer.
It also will not allow users of the library to tweak the send buffer and the chunk size independently (I do not know if that makes sense).

It might not matter at all, as this uses nonblocking I/O for the writes, which could mean that there would just be partial writes (which are already correctly handled, AFAIK) if we try to write everything at once. Maybe I will do some experiments, but I think there are other areas I should polish first (basic correctness being the most important one).
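A minimal sketch (not Dalli's actual code) of the partial-write handling mentioned above: `write_nonblock` may accept only part of the buffer, so the caller waits for writability and continues from the unwritten remainder.

```ruby
require "socket"
require "io/wait"

# Write the whole buffer over a nonblocking socket, handling partial
# writes: retry from the unwritten tail whenever the kernel buffer fills.
def write_fully(sock, data)
  until data.empty?
    begin
      written = sock.write_nonblock(data)
      data = data.byteslice(written, data.bytesize - written)
    rescue IO::WaitWritable
      sock.wait_writable # block until the socket can accept more bytes
    end
  end
end
```

Because partial writes are already handled this way, writing a chunk larger than the send buffer simply degenerates into several kernel-level writes, which is why the chunk size may matter less than it first appears.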

Owner

I think we might do something like this:

  1. Add an explicit chunking parameter that can be set on the client
  2. Have that chunking parameter fall back to the sndbuf size if it isn't explicitly set
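The two-step suggestion could look something like the sketch below. The option name `:pipelined_chunk_size` is made up for illustration; only the fallback-to-sndbuf behavior comes from the discussion.

```ruby
# Hypothetical option lookup: prefer an explicit client option, fall back
# to the socket's send-buffer size when the option is not set.
def effective_chunk_size(options, server)
  options[:pipelined_chunk_size] || server.socket_sndbuf
end

FakeServer = Struct.new(:socket_sndbuf) # stand-in for a Dalli server object
puts effective_chunk_size({ pipelined_chunk_size: 4096 }, FakeServer.new(65_536)) # 4096
puts effective_chunk_size({}, FakeServer.new(65_536))                             # 65536
```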

start_time = Time.now
servers = requests.keys

# FIXME: this was executed before the finish request was sent. Why?
Owner

I have to look at this in a little detail. The behavior in the case of no servers available is supposed to be returning [] with no error. But if I recall correctly, there are cases where a server is not connected that can trigger raising a Dalli::Error without this check. I'll find some time to look.

Author

Any hints on the intended original behavior of the multi-server code are highly appreciated. I haven't been able to fully wrap my head around it yet, and I guess it is hardest to get right when things fail in between reading/writing chunks. It is hard to test correctly as well.

@@ -143,6 +145,14 @@ def quiet?
end
alias multi? quiet?

def pipelined_get_request(keys)
req = +''
Owner

I know this line was just copied and moved from pre-existing code, but looking at it we can almost certainly reduce allocations by pre-allocating `req` to a reasonable size. Something like `keys.size * 24 + keys.map(&:size).sum` for binary? And something like `keys.size * 18 + keys.map(&:size).sum` for meta.
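For context, `String.new` takes a capacity hint, so the request buffer can be sized once up front instead of growing as keys are appended. The 24-bytes-per-key figure below is the binary-protocol estimate from this comment, not a verified constant.

```ruby
# Pre-allocate the request buffer using the per-key overhead estimate
# discussed above, then append without intermediate reallocations.
keys = %w[foo bar quux]
capacity = keys.size * 24 + keys.map(&:size).sum
req = String.new(capacity: capacity)
keys.each { |key| req << key } # appending stays within the reserved capacity
puts req # => "foobarquux"
```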

Author

Will do. But it seems that it should be 17 for meta; at least it was in my tests (non-base64, which would completely break this calculation).
It does not seem to make a significant difference in speed, though.

I have another idea for a performance optimization: start off by only generating as many bytes as are needed for the first chunk and send it, then pre-calculate upcoming bytes whenever the select blocks (or whenever the socket is writable and no pre-calculated data is available).
This means the server can start generating the response earlier, Dalli would use less memory, and time that is "lost" waiting for I/O could be used for calculation.
But it will be a bit harder to do elegantly without a library like async or eventmachine, and I don't know if it is worth the added complexity. In any case, I would put this in an extra PR once this one is finished.

Owner

It's very possible I got the number wrong for meta (I did the calculation quickly) but just make sure you're including the separator.

I think the benefit for that optimization is unlikely to be worth the tradeoff, but we can discuss.

Collaborator

I think that's a neat idea for an optimization, but yeah it feels like it could be an improvement for a future PR.

The `.map(&:size).sum` is faster in my little benchmark, but only by something like 10 nanoseconds per key.

Author

> The `.map(&:size).sum` is faster in my little benchmark, but only by something like 10 nanoseconds per key.

I had the (naive) assumption that the cost of allocating an array should be avoided, not knowing that it is outweighed by doing the sum in C instead of in Ruby. I should have remembered Knuth.
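The thread does not say exactly what the benchmark compared `.map(&:size).sum` against; one plausible reproduction, assuming the alternative was the allocation-free `sum(&:size)`, is:

```ruby
require "benchmark"

# Compare the array-allocating map(&:size).sum against sum(&:size).
# Which one wins is Ruby-version dependent; the thread's takeaway is that
# the gap is only on the order of nanoseconds per key either way.
keys = Array.new(10_000) { |i| "key-#{i}" }
Benchmark.bm(16) do |x|
  x.report("map(&:size).sum") { 1_000.times { keys.map(&:size).sum } }
  x.report("sum(&:size)")     { 1_000.times { keys.sum(&:size) } }
end
```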

@petergoldstein
Owner

There's a bit of "devil in the details", but directionally this looks correct. Thanks for drafting, @marvinthepa. With a little back and forth we can almost certainly get this in.

@marvinthepa
Author

> Thanks for drafting @marvinthepa

Happy to give back and spend some company time on a library we have been using for a few years now.
