WIP: track and reuse empty slots in HTTP probe #578
Conversation
1fddae7 → 6dc5ba0 (force-push)
@benner Sorry for the delay in reviewing. It's been a bit busy for the past week. I'll try to review in a couple of days.
@manugarg, did you find time to look into it?
@benner Sorry for the silence on this. I think I now understand the problem better. When we go through the targets list, we skip existing targets but don't advance "startWaitTime", which means the first new target will start in the first slot and, overall, there will be overlaps between goroutines. (cloudprober/probes/http/http.go, Line 495 in 1a77284)
I think this problem will be much less pronounced if we just updated "startWaitTime" even for skipped targets: (cloudprober/probes/http/http.go, Line 511 in 1a77284)
It will not be a perfect distribution, because "gapBetweenTargets" will keep changing based on the targets list size, and there may be some crowding over time (new target's …). Slots is an interesting idea. I think if we go that way, we should think about a more formal way to create and distribute slots. For example, if we get more targets than last time, we should probably double the number of slots, keeping half of them assigned to the old goroutines, and pick the first free slot from the beginning. What do you think about just advancing the start time for skipped targets for now and working on the slots method later on?
It may work. I need to look at exact stats to make a better estimate of whether this helps. Also, I was initially thinking about something like what memory allocators do - linked lists (for free or used slots, singly or doubly linked) - but as a demonstration I just WIP'ed it with an array :-)
When I started thinking more deeply about it, that's where I was heading too. :-)
Then I'll be back with a free list soon. Still not sure what the best way to handle overfill is. Maybe it can be a custom metric, for Cloudprober itself or per probe. Any suggestions?
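The allocator-style free list being discussed could look like this minimal sketch: freed probe slots are remembered and handed out again before any brand-new slot is created. All names here are hypothetical illustrations, not code from the PR.

```go
package main

import "fmt"

// slotList tracks empty probe slots the way a simple memory allocator tracks
// free blocks: released slots go on a free list and are reused LIFO.
type slotList struct {
	free []int // indices of released slots, reused before new ones
	next int   // next brand-new slot index if the free list is empty
}

// acquire returns a slot index, preferring previously freed slots.
func (s *slotList) acquire() int {
	if n := len(s.free); n > 0 {
		slot := s.free[n-1]
		s.free = s.free[:n-1]
		return slot
	}
	slot := s.next
	s.next++
	return slot
}

// release marks a slot as empty so a future target can reuse it.
func (s *slotList) release(slot int) {
	s.free = append(s.free, slot)
}

func main() {
	var s slotList
	a, b, c := s.acquire(), s.acquire(), s.acquire() // slots 0, 1, 2
	s.release(b)                                     // slot 1 becomes free
	fmt.Println(a, c, s.acquire())                   // the freed slot is reused
}
```

Overfill (more targets than slots) would still need a policy on top of this - e.g. growing `next` past the intended capacity and exporting that as a metric, as suggested above.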
I think let's try the simple thing first. Distributing probing slots equally may not be a very important requirement. My main goal behind trying to distribute equally was to minimize overlap, which hurts smaller-CPU instances trying to run many high-frequency probes. I am worried that if we try to do something clever, it may introduce bugs and corner cases that are hard to discover.
My main goal is the opposite 😄 - to protect the targets (actually, the physical servers hosting these targets). The initial description and proposed solution was to limit concurrency: #510. To repeat the problem in short: I have highly overbooked physical servers (business specifics) which host thousands of services, and triggering some of them triggers other dependencies (e.g. MySQL, Redis, memcached). I want to ensure that Cloudprober will not probe too many targets on the same physical server at once.
6dc5ba0 → 1f05536 (force-push)
1f05536 → 3903a9f (force-push)
@benner I'm leaving Google. To continue working on Cloudprober, I am forking it to github.com/cloudprober/cloudprober. As I was pretty much the only person maintaining this project from Google, I'll be archiving this repository (for some internal reasons, Google doesn't want to migrate it). I hope you'll be able to resubmit this PR. Sorry for not getting to it until now.
Sure. I'll resubmit the PR. Thank you for informing me.
@benner Just a heads up: I know you starred the new location, github.com/cloudprober/cloudprober, but I had to delete that repository to remove its "fork" relationship. I've recreated it, and it should be pretty stable now.
Dirty suggestion for: #575
Not intended for merge. Comments are welcome.
Open questions:
I'll improve the PR (tests, speed, etc.) if this approach seems reasonable.