This repository has been archived by the owner on Nov 5, 2021. It is now read-only.

WIP: track and reuse empty slots in HTTP probe #578

Closed

Conversation

@benner (Contributor) commented Mar 16, 2021

A rough suggestion for: #575

Not intended for merge; comments are welcome.

Open questions:

  • What to do when there are no free slots?
  • Should this behavior be optional?
  • Memory considerations with very large intervals and small gaps between targets

I'll improve the PR (tests, speed, etc.) if this approach seems reasonable.
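
For readers following along, here is a minimal sketch of the slot-array idea described above; the names (slotTracker, Acquire, Release) are illustrative assumptions, not the actual PR code:

```go
// Illustrative sketch of the slot-array idea from this WIP; the names
// (slotTracker, Acquire, Release) are assumptions, not the actual PR code.
package main

import "fmt"

// slotTracker keeps one entry per probing slot; an empty string means the
// slot is free and can be reused by a newly discovered target.
type slotTracker struct {
	slots []string
}

// Acquire assigns the first free slot to a target and returns its index,
// or -1 if all slots are taken (the "no free slots" open question above).
func (st *slotTracker) Acquire(target string) int {
	for i, t := range st.slots {
		if t == "" {
			st.slots[i] = target
			return i
		}
	}
	return -1
}

// Release frees the slot used by a target that disappeared from discovery.
func (st *slotTracker) Release(target string) {
	for i, t := range st.slots {
		if t == target {
			st.slots[i] = ""
			return
		}
	}
}

func main() {
	st := &slotTracker{slots: make([]string, 4)}
	fmt.Println(st.Acquire("a"), st.Acquire("b")) // 0 1
	st.Release("a")                               // "a" goes away
	fmt.Println(st.Acquire("c"))                  // 0 -- slot 0 is reused
}
```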

@benner benner force-pushed the feat/http_autodiscovery_slots branch 4 times, most recently from 1fddae7 to 6dc5ba0 on March 18, 2021 12:53
@manugarg (Contributor) commented

@benner Sorry for the delay in reviewing. It's been a bit busy for the past week. I'll try to review it in a couple of days.

@benner (Contributor, Author) commented Apr 13, 2021

@manugarg, did you find time to look into it?

@manugarg (Contributor) commented

@benner Sorry for the silence on this. I think I now understand the problem better. When we go through the targets list, we skip existing targets but don't advance "startWaitTime", which means that the first new target will start in the first slot and, overall, there will be overlapping goroutines.

// This target is already initialized.

I think this problem will be much less pronounced if we just update startWaitTime even for skipped targets:

startWaitTime += gapBetweenTargets

It will not be a perfect distribution, because gapBetweenTargets will keep changing with the number of targets, and there may be some crowding over time (a new target's n*gapBetweenTargets may still fall near an old target's start time), but it will be quite random, which is not so bad.
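
For illustration, a simplified sketch of the suggested fix; only startWaitTime and gapBetweenTargets come from the discussion, while the surrounding function, parameters, and map are assumptions, not the actual HTTP probe code:

```go
// Simplified sketch of the suggestion above, not the actual HTTP probe code.
package main

import (
	"fmt"
	"time"
)

// scheduleTargets spreads targets across the probe interval. The key point
// from the comment above: advance startWaitTime even for targets that are
// already being probed, so newly discovered targets do not all crowd into
// the first slot.
func scheduleTargets(targets []string, running map[string]bool, interval time.Duration) map[string]time.Duration {
	offsets := make(map[string]time.Duration)
	if len(targets) == 0 {
		return offsets
	}
	gapBetweenTargets := interval / time.Duration(len(targets))
	startWaitTime := time.Duration(0)

	for _, t := range targets {
		if running[t] {
			// This target is already initialized: skip starting it, but
			// still advance the offset (the suggested fix).
			startWaitTime += gapBetweenTargets
			continue
		}
		offsets[t] = startWaitTime // a new target starts at this offset
		startWaitTime += gapBetweenTargets
	}
	return offsets
}

func main() {
	running := map[string]bool{"t1": true, "t2": true}
	fmt.Println(scheduleTargets([]string{"t1", "t2", "t3"}, running, 30*time.Second))
	// Output: map[t3:20s] -- t3 starts at 20s instead of 0s, avoiding overlap with t1.
}
```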

Slots is an interesting idea. I think if we go that way, we should probably think about a more formal way to create and distribute slots. For example, if we get more targets than last time, we should probably double the number of slots, keeping half of the slots assigned to the old goroutines, and pick the first free slot from the beginning.

What do you think about just advancing startWaitTime for skipped targets for now and working on the slots method later on?

@benner (Contributor, Author) commented Apr 25, 2021

It may work. I need to look at the exact stats to make a better estimate of whether this helps.

Also, I was initially thinking about something like memory allocators do: linked lists (for free or used slots, singly or doubly linked), but as a demonstration I just WIP'ed it with an array :-)

@manugarg (Contributor) commented Apr 26, 2021

> Also, I was initially thinking about something like memory allocators do: linked lists (for free or used slots, singly or doubly linked), but as a demonstration I just WIP'ed it with an array :-)

When I started thinking more deeply about it, that's where I was heading too. :-)

@benner (Contributor, Author) commented Apr 27, 2021

Then I will come back with a free list soon. I'm still not sure of the best way to handle overfill (more targets than slots). Maybe it can be a custom metric for Cloudprober itself, or per probe. Any suggestions?
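
A minimal sketch of what a free list of slot indices could look like, along the lines of the memory-allocator idea mentioned earlier; all names here are illustrative assumptions, not from the PR:

```go
// Illustrative sketch of a free list of slot indices, along the lines of what
// memory allocators do; names here are assumptions, not from the PR.
package main

import (
	"container/list"
	"fmt"
)

// freeSlots hands out slot indices, preferring slots that were freed when
// their targets disappeared over allocating brand-new ones.
type freeSlots struct {
	free *list.List // indices of released slots
	next int        // next never-used slot index
}

func newFreeSlots() *freeSlots {
	return &freeSlots{free: list.New()}
}

// Get returns a reusable slot index if one is available, otherwise a new one.
func (f *freeSlots) Get() int {
	if e := f.free.Front(); e != nil {
		f.free.Remove(e)
		return e.Value.(int)
	}
	idx := f.next
	f.next++
	return idx
}

// Put returns a slot index to the free list when its target goes away.
func (f *freeSlots) Put(idx int) {
	f.free.PushBack(idx)
}

func main() {
	f := newFreeSlots()
	a, b := f.Get(), f.Get() // 0, 1
	f.Put(a)                 // target in slot 0 disappears
	fmt.Println(a, b, f.Get()) // 0 1 0 -- slot 0 is reused
}
```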

@manugarg (Contributor) commented

> Then I will come back with a free list soon. I'm still not sure of the best way to handle overfill (more targets than slots). Maybe it can be a custom metric for Cloudprober itself, or per probe. Any suggestions?

I think let's try the simple thing first. Distributing probing slots equally may not be a very important requirement. My main goal behind trying to distribute equally was to minimize the overlap, which impacts smaller-CPU instances trying to run many high-frequency probes.

I am worried that if we try to do something clever, it may introduce bugs or corner cases that may be hard to discover.

@benner (Contributor, Author) commented Apr 28, 2021

> I think let's try the simple thing first.

Do you mean just keep increasing startWaitTime?

> Distributing probing slots equally may not be a very important requirement. My main goal behind trying to distribute equally was to minimize the overlap, which impacts smaller-CPU instances trying to run many high-frequency probes.

My main goal is the opposite 😄: to protect the targets (actually, the physical server that hosts these targets). The initial description and proposed solution was about limiting concurrency: #510

To repeat the problem in short: I have highly overbooked physical servers (business specifics) which host thousands of services, and probing some of them triggers other dependencies (e.g. MySQL, Redis, memcached, etc.). I want to ensure that Cloudprober will not probe too many targets on the same physical server at once.

> I am worried that if we try to do something clever, it may introduce bugs or corner cases that may be hard to discover.

I agree this is a possibility. Another suggestion is to keep the same slot array with an index to the last used slot and, on reaching the end of the slots, do the same walk-through from the beginning (see the sketch below).
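
A minimal sketch of that walk-through-with-wraparound idea; the names are illustrative assumptions, not the actual PR code:

```go
// Illustrative sketch of the walk-through-with-wraparound idea; the names
// are assumptions, not the actual PR code.
package main

import "fmt"

// ringSlots keeps the slot array plus the index of the last used slot; when
// the scan reaches the end, it wraps around and walks from the beginning.
type ringSlots struct {
	slots []string // "" means the slot is free
	last  int      // index of the last slot handed out (-1 before first use)
}

// Acquire scans from just after the last used slot, wrapping around once.
// It returns -1 if every slot is occupied (the overfill case).
func (r *ringSlots) Acquire(target string) int {
	n := len(r.slots)
	for i := 1; i <= n; i++ {
		idx := (r.last + i) % n
		if r.slots[idx] == "" {
			r.slots[idx] = target
			r.last = idx
			return idx
		}
	}
	return -1
}

func main() {
	r := &ringSlots{slots: make([]string, 3), last: -1}
	fmt.Println(r.Acquire("a"), r.Acquire("b"), r.Acquire("c")) // 0 1 2
	r.slots[1] = ""             // target "b" goes away, freeing slot 1
	fmt.Println(r.Acquire("d")) // 1 -- wraps around and reuses slot 1
}
```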

@benner benner force-pushed the feat/http_autodiscovery_slots branch from 6dc5ba0 to 1f05536 on April 28, 2021 07:59
@benner benner force-pushed the feat/http_autodiscovery_slots branch from 1f05536 to 3903a9f on October 13, 2021 19:16
@manugarg (Contributor) commented

@benner I'm leaving Google. To continue working on Cloudprober, I am forking it to github.com/cloudprober/cloudprober. As I was pretty much the only person maintaining this project from Google, I'll be archiving this repository (For some internal reasons, Google doesn't want to migrate the repository). I hope you'll be able to resubmit this PR. Sorry for not getting to it until now.

@benner (Contributor, Author) commented Oct 28, 2021

Sure, I'll resubmit the PR. Thank you for letting me know.

@benner benner closed this Oct 28, 2021
@benner benner deleted the feat/http_autodiscovery_slots branch October 28, 2021 18:28
@manugarg (Contributor) commented

@benner Just a heads up: I know you starred the new location github.com/cloudprober/cloudprober, but I had to delete that repository to remove its "fork" relationship. I've recreated it, and it should be pretty stable now.
