Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial Caffeine CacheStats recorder #1897

Merged
merged 16 commits into from Mar 7, 2024
Merged

Conversation

schlosna
Copy link
Contributor

@schlosna schlosna commented Mar 7, 2024

Before this PR

Begin to address #1895

Supersedes #1075

After this PR

==COMMIT_MSG==
Initial Caffeine CacheStats recorder

Example usage:

TaggedMetricRegistry taggedMetricRegistry = ...
Cache<Integer, String> cache = Caffeine.newBuilder()
        .recordStats(CacheStats.of(taggedMetricRegistry, "unique-cache-name"))
        .build();

LoadingCache<String, Integer> loadingCache = Caffeine.newBuilder()
        .recordStats(CacheStats.of(taggedMetricRegistry, "unique-loading-cache-name"))
        .build(key::length);

==COMMIT_MSG==

Possible downsides?

@changelog-app
Copy link

changelog-app bot commented Mar 7, 2024

Generate changelog in changelog/@unreleased

What do the change types mean?
  • feature: A new feature of the service.
  • improvement: An incremental improvement in the functionality or operation of the service.
  • fix: Remedies the incorrect behaviour of a component of the service in a backwards-compatible way.
  • break: Has the potential to break consumers of this service's API, inclusive of both Palantir services
    and external consumers of the service's API (e.g. customer-written software or integrations).
  • deprecation: Advertises the intention to remove service functionality without any change to the
    operation of the service itself.
  • manualTask: Requires the possibility of manual intervention (running a script, eyeballing configuration,
    performing database surgery, ...) at the time of upgrade for it to succeed.
  • migration: A fully automatic upgrade migration task with no engineer input required.

Note: only one type should be chosen.

How are new versions calculated?
  • ❗The break and manual task changelog types will result in a major release!
  • 🐛 The fix changelog type will result in a minor release in most cases, and a patch release version for patch branches. This behaviour is configurable in autorelease.
  • ✨ All others will result in a minor version release.

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Initial Caffeine CacheStats recorder

Check the box to generate changelog(s)

  • Generate changelog entry

Comment on lines 61 to 63
public static CacheStats of(CacheMetrics metrics, String name) {
return new CacheStats(metrics, name);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer if we made CacheMetrics private -- metric-schema doesn't make any guarantees around backward compatibility of generated code because it's expected to be consumed by the project which declares the metrics.

I suppose we may want to reuse the same metrics for guava and caffeine caches. If that's the case, I might recommend a new jar module with internal in the name, like tritium-cache-internal, where metrics may be defined and generated as public classes. The tritium-cache-internal module can be used as an implementation dep into the tritium modules which instrument specific cache implementations, so that the classes aren't directly visible to consumers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I can move the metric-schema definition and make package private.

Generally I'd like to deprecate all uses of Guava caches and migrate any remaining to Caffeine, so not sure its worth adding similar for Guava caches. We already have BanGuavaCaches in baseline

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

* @return Caffeine stats instance to register via
* {@link com.github.benmanes.caffeine.cache.Caffeine#recordStats(Supplier)}.
*/
public static CacheStats of(TaggedMetricRegistry taggedMetricRegistry, String name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well sprinkle safety annotations:

Suggested change
public static CacheStats of(TaggedMetricRegistry taggedMetricRegistry, String name) {
public static CacheStats of(TaggedMetricRegistry taggedMetricRegistry, @Safe String name) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Comment on lines 8 to 11
hit.count:
type: counter
tags: [cache]
docs: Count of cache hits
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed this in the previous round of review, I think all the counters should be updated to type: meter. Counter is interpreted as a gauge in most backends because it doesn't guarantee monotonic increasing values, where our Meter, confusingly, uses the Prometheus Counter type for slightly easier indexing.

We may want to remove the .count name suffixes as well (otherwise we'll end up with queries for cache_hit_count_count), since those are described by both the counter and meter metric types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I'm trying to keep metric name compatibility with the existing metrics produced by com.palantir.tritium.metrics.caffeine.CaffeineCacheStats#registerCache(com.palantir.tritium.metrics.registry.TaggedMetricRegistry, com.github.benmanes.caffeine.cache.Cache<?,?>, java.lang.String) for now as I'm not sure how complex the migration will be across the fleet.

Comment on lines 16 to 23
load.success.count:
type: counter
tags: [cache]
docs: Count of successful cache loads
load.failure.count:
type: counter
tags: [cache]
docs: Count of failed cache loads
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking out loud here, entirely up to you either way:

We could declare a single metric along these lines, using a tag to distinguish success and failure:

load:
  type: meter
  tags:
    - name: cache
    - name: result
       values: [success, failure]

Sometimes this makes it easier to track total interactions

@Override
public void recordLoadFailure(@NonNegative long loadTime) {
loadFailureMeter.mark();
totalLoadNanos.add(loadTime);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we could make the meter into a timer, which also provides a “count” value in addition to percentiles, more timeseries though

carterkozak
carterkozak previously approved these changes Mar 7, 2024
@schlosna schlosna marked this pull request as ready for review March 7, 2024 20:59
@policy-bot policy-bot bot dismissed carterkozak’s stale review March 7, 2024 21:13

Invalidated by push of 1bcf527

@bulldozer-bot bulldozer-bot bot merged commit a672c71 into develop Mar 7, 2024
5 checks passed
@bulldozer-bot bulldozer-bot bot deleted the ds/cache-metrics branch March 7, 2024 21:17
@svc-autorelease
Copy link
Collaborator

Released 0.83.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants