
Added Caching of network interface for Azure #12622

Merged
merged 9 commits into prometheus:main on Nov 21, 2023

Conversation

@shinji62 (Contributor) commented Aug 1, 2023

This PR adds caching of the network interface API responses for Azure service discovery; it is related to issue #12613.

This PR will:

  • add a new Go module, https://github.com/Code-Hex/go-generics-cache
  • use Go generics for the cache
  • add a new configuration setting, cache_refresh_interval, which is part of the calculation of the final cache expiration
  • add a new counter, prometheus_sd_azure_cache_hit_total, to count the number of cache hits (see the sketch after this list)
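
For reference, a counter with that name is typically declared with client_golang roughly as in the sketch below; the Help text and the registration shown here are illustrative, not the PR's exact code.

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
)

// cacheHitCount sketches how a counter named prometheus_sd_azure_cache_hit_total
// could be declared; the Help string is made up for this example.
var cacheHitCount = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "prometheus_sd_azure_cache_hit_total",
	Help: "Number of network interface cache hits during Azure service discovery.",
})

func main() {
	prometheus.MustRegister(cacheHitCount)
	cacheHitCount.Inc() // incremented on each cache hit
}
```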

The cache is disabled by default, which means the behavior is the same as without this PR.

The cache itself has a hard limit of 10K items. The expiration time is per item and is calculated as
cache_refresh_interval + refresh_interval + random(20s).
We add refresh_interval because the cache has no utility if we cannot keep an item for at least one refresh_interval.
We add random(20s) so that all objects do not expire within a small window of time.
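
For illustration, here is a minimal sketch of that per-item expiration with go-generics-cache; the expiration helper, the string payload, and the interval values are placeholders for this example, not the PR's actual code.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"

	cache "github.com/Code-Hex/go-generics-cache"
)

// expiration mirrors the formula described above:
// cache_refresh_interval + refresh_interval + random(20s).
func expiration(cacheRefreshInterval, refreshInterval time.Duration) time.Duration {
	jitter := time.Duration(rand.Int63n(int64(20 * time.Second)))
	return cacheRefreshInterval + refreshInterval + jitter
}

func main() {
	// A string payload stands in for the real NIC-ID -> network interface value.
	c := cache.New[string, string]()

	exp := expiration(time.Hour, 5*time.Minute)
	c.Set("nic-0", "interface payload", cache.WithExpiration(exp))

	if v, ok := c.Get("nic-0"); ok {
		// A hit here is where prometheus_sd_azure_cache_hit_total would be incremented.
		fmt.Println("cache hit:", v)
	}
}
```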

This has been tested:

  • against real Azure, with and without the cache
  • make test
  • the race detector (-race)

@roidelapluie (Member) commented Aug 2, 2023

Thanks for your contribution :-)

I don't think the cache interval should be configurable. Additionally, I am not keen on importing a full caching library into Prometheus. I feel like a simple cache based on a map could do the trick.
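
For context, a mutex-guarded map with per-entry expiry along those lines could be sketched as below; the package, type, and method names are made up for illustration, and it has no size bound or LRU eviction, which is what the follow-up discussion is about.

```go
package cachesketch

import (
	"sync"
	"time"
)

// entry pairs a cached value with its expiration time.
type entry[V any] struct {
	value     V
	expiresAt time.Time
}

// mapCache is a minimal mutex-guarded map cache with per-entry TTL.
type mapCache[V any] struct {
	mu    sync.Mutex
	items map[string]entry[V]
}

func newMapCache[V any]() *mapCache[V] {
	return &mapCache[V]{items: make(map[string]entry[V])}
}

// Set stores a value that expires after ttl.
func (c *mapCache[V]) Set(key string, v V, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = entry[V]{value: v, expiresAt: time.Now().Add(ttl)}
}

// Get returns a value if present and not yet expired; expired entries
// are dropped lazily on access.
func (c *mapCache[V]) Get(key string) (V, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.items[key]
	if !ok || time.Now().After(e.expiresAt) {
		delete(c.items, key)
		var zero V
		return zero, false
	}
	return e.value, true
}
```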

@shinji62 (Contributor, Author) commented Aug 3, 2023

@roidelapluie thanks for taking the time to review, really appreciated.

I understand your point about importing a library, and yes, a map plus sync can do the trick, but I think LRU eviction and expiration are important features of a cache, and implementing them would be a bit more complex than a simple map.

LRU (or another eviction policy) makes sense because we don't want the cache to grow unbounded, and we need a good way to evict keys. Expiration is also important because we need to refresh the cache's view of the world (in this case, the network interfaces). In addition, a per-item expiration time is a good way to avoid all items expiring at the same time, which may cause rate limiting on the Azure side.

At first I was thinking of using https://github.com/hashicorp/golang-lru, but it doesn't have expiration; I was hoping PR 116 would be merged, but that is still not the case.

So I found https://github.com/Code-Hex/go-generics-cache, which is still maintained, well tested, and implements the functionality I described.
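
Based on that library's README, combining an LRU bound with per-item expiration looks roughly like the sketch below; the key, payload, capacity, and TTL values are placeholders echoing the numbers discussed in this thread, not the PR's code.

```go
package main

import (
	"fmt"
	"time"

	cache "github.com/Code-Hex/go-generics-cache"
	"github.com/Code-Hex/go-generics-cache/policy/lru"
)

func main() {
	// LRU-bounded cache; the 10,000 capacity echoes the hard limit mentioned earlier.
	c := cache.New(cache.AsLRU[string, string](lru.WithCapacity(10_000)))

	// Per-item expiration keeps the cached view of the network interfaces fresh.
	c.Set("nic-a", "payload-a", cache.WithExpiration(90*time.Minute))

	if v, ok := c.Get("nic-a"); ok {
		fmt.Println("cache hit:", v)
	}
}
```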

Concerning cache_refresh_interval, I am happy to remove that setting and use an enable_caching setting to turn the cache on or off, with a refresh interval between 1h and 2h.

Of course, if you are still not keen on the dependency, I will just implement a simple LRU cache (inspired by the library's code).

@roidelapluie let me know what you think.

@shinji62 (Contributor, Author) commented Aug 8, 2023

@roidelapluie let me know what you think about my answer; I would like to move forward if possible.

@roidelapluie (Member) commented:

Thanks, we have discussed this during our bug scrub and we will go forward with this pull request. I will do a proper review later.

@shinji62 (Contributor, Author) commented Aug 9, 2023

> Thanks, we have discussed this during our bug scrub and we will go forward with this pull request. I will do a proper review later.

thanks @roidelapluie

@shinji62 (Contributor, Author):

@roidelapluie any luck reviewing the PR?

@rtsisyk commented Aug 30, 2023

Any news? I am desperately waiting for this change.

@shinji62 (Contributor, Author):

Sorry to keep pushing you @roidelapluie

```go
if d.cache == nil {
	return
}
rand.Seed(time.Now().UnixNano())
```
Member:

why do you use random here?

Contributor Author:

Because I didn't want the keys to expire at the same time, which would have the same effect as having no cache.

If all keys expire at the same time, we will just consume all the API calls that we are trying to avoid with the cache.

Member:

I don't agree; if the cache lifetime is greater than the refresh interval, the cache is worth it. I do not see why we need to introduce random sleep.

Another participant:

The idea of introducing a random bias to the expiration time, to mitigate spikes when the initial batches of nodes expire, is quite reasonable.

Contributor Author:

I have removed the seed but kept the random offset in [0, refreshInterval] added to the expiration time.

```go
if d.cache == nil {
	return
}
random := time.Duration(d.cfg.RefreshInterval)
```
Member:

Why do you want random instead of 10 * the refresh interval?

Contributor Author:

OK, I guess I am missing something:

We don't want the items to expire at the same time, because if that happens we will hit the API call limit every time the cache expires. To mitigate that, I use a random offset so that not everything expires at the same time (within a few seconds).
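
To make the jitter concrete, here is a minimal sketch: a fixed multiple of the refresh interval plus a random offset in [0, refreshInterval). The function name is made up, and the 5x factor is taken from the later commit "Enabled cache by default with 5x the default refresh time"; this is not necessarily the exact merged code.

```go
package cachesketch

import (
	"math/rand"
	"time"
)

// expirationFor sketches the jittered expiration being discussed: items
// written together get different expiration times, so they do not all
// expire (and trigger Azure API calls) at once.
func expirationFor(refreshInterval time.Duration) time.Duration {
	jitter := time.Duration(rand.Int63n(int64(refreshInterval)))
	return 5*refreshInterval + jitter
}
```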

Etourneau Gwenn added 7 commits September 27, 2023 18:09
Signed-off-by: Etourneau Gwenn <getourneau@yugabyte.com>
Signed-off-by: Etourneau Gwenn <getourneau@yugabyte.com>
Signed-off-by: Etourneau Gwenn <getourneau@yugabyte.com>
Enabled cache by default with 5x the default refresh time

Signed-off-by: Etourneau Gwenn <getourneau@yugabyte.com>
Signed-off-by: Etourneau Gwenn <getourneau@yugabyte.com>
Signed-off-by: Etourneau Gwenn <getourneau@yugabyte.com>
Signed-off-by: Etourneau Gwenn <getourneau@yugabyte.com>
Removed unneeded error

Signed-off-by: Etourneau Gwenn <getourneau@yugabyte.com>
@shinji62 (Contributor, Author) commented Oct 4, 2023

@roidelapluie anything you need from my side ? thanks

…hing

Signed-off-by: Etourneau Gwenn <getourneau@yugabyte.com>
@shinji62 (Contributor, Author):

@roidelapluie I have rebased from master and run the tests again; all good.

@shinji62 (Contributor, Author):

small bump

@bwplotka (Member) left a comment:

(During bug scrub).

LGTM! We were looking at whether we could reuse some existing caches, but we only have the k8s one or hard-coded ones. Plus https://github.com/Code-Hex/go-generics-cache/blob/main/go.mod seems pretty slim; it's fine for now.

@bwplotka merged commit b37258c into prometheus:main on Nov 21, 2023
24 checks passed