
Log the sizes allocated through the SimpleMemoryPool for analysis. #380

Open · wants to merge 2 commits into base: 3.0-li
Conversation


@wyuka wyuka commented Aug 29, 2022

[LI-HOTFIX]
TICKET = LIKAFKA-43625
LI_DESCRIPTION = We want to reuse memory buffers for request deserialization. To do this, we need to gather statistics on the most frequent allocation sizes on current clusters, i.e. a histogram of the number of requests at each allocation size. This change adds a Kafka configuration to log this histogram at a configurable interval (every few minutes).
EXIT_CRITERIA = This PR will be reverted once we determine the right range of sizes to use for the memory pool as part of the work on the aforementioned ticket.
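The allocation-size histogram described above can be sketched roughly as below. Class and method names here are hypothetical; only the `AtomicInteger[]` field and the `(bytes - 1) / segmentSizeBytes` bucket index appear in the actual diff.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a fixed-bucket allocation-size histogram.
public class AllocationHistogramSketch {
    private final AtomicInteger[] histogram;
    private final int segmentSizeBytes;
    private final int maxSizeBytes;

    public AllocationHistogramSketch(int maxSizeBytes, int segmentSizeBytes) {
        this.maxSizeBytes = maxSizeBytes;
        this.segmentSizeBytes = segmentSizeBytes;
        // One bucket per segment, rounding up so maxSizeBytes itself is covered.
        this.histogram = new AtomicInteger[(maxSizeBytes + segmentSizeBytes - 1) / segmentSizeBytes];
        for (int i = 0; i < histogram.length; i++) {
            histogram[i] = new AtomicInteger();
        }
    }

    public void record(int bytes) {
        if (bytes < 1 || bytes > maxSizeBytes) {
            return; // out-of-range sizes are not recorded
        }
        // Same bucket math as the diff: sizes 1..segmentSizeBytes land in bucket 0.
        histogram[(bytes - 1) / segmentSizeBytes].incrementAndGet();
    }

    public int countForBucket(int index) {
        return histogram[index].get();
    }
}
```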

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@wyuka wyuka requested a review from lmr3796 August 29, 2022 20:54
@wyuka wyuka requested a review from gitlw August 29, 2022 23:22
public class MemoryPoolStatsStore {
private static final Logger log = LoggerFactory.getLogger(MemoryPoolStatsStore.class);

private final AtomicInteger[] histogram;
Reviewer comment:

Is there a reason to not use the metrics object but implement this by hand?

Author reply:

Yes. Kafka metrics support publishing histograms only as percentiles, which is not useful for determining the range of the most frequent memory allocations. For example, suppose around 60% of requests are 1 KB in size. If the metrics show a p90 request size of 400 KB and a p99 of 1500 KB, we cannot infer that most requests fall within the 1 KB range.

Hence, we are publishing the full histogram as logs.
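The argument can be made concrete with a toy sample using the numbers from the comment above (the helper class and nearest-rank percentile choice are illustrative assumptions): with 60% of samples at 1 KB, p90/p99 can still come out at 400 KB / 1500 KB, hiding the dominant size that a full histogram would reveal.

```java
// Hypothetical illustration: percentiles hide the modal allocation size.
public class PercentileVsHistogram {
    // Nearest-rank percentile over a sorted array.
    static int percentile(int[] sorted, double p) {
        int idx = (int) Math.ceil(p * sorted.length) - 1;
        return sorted[Math.max(idx, 0)];
    }

    // Full-histogram view: count samples of exactly `size` bytes.
    static long countAt(int[] samples, int size) {
        return java.util.Arrays.stream(samples).filter(s -> s == size).count();
    }
}
```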

@@ -114,6 +119,7 @@ public boolean isOutOfMemory() {
//allows subclasses to do their own bookkeeping (and validation) _before_ memory is returned to client code.
protected void bufferToBeReturned(ByteBuffer justAllocated) {
this.allocateSensor.record(justAllocated.capacity());
Reviewer comment:

Are the metrics from this allocateSensor not good enough?
If they are insufficient, can you please explain why?

Author reply:

This sensor relies on the Stat class, which appears to record only aggregate statistics like max, sum, and avg, not the entire histogram.

memoryPoolStatsStore.clear()
}

val histogramPublisher = new KafkaScheduler(threads = 1, "histogram-publisher-")
Reviewer comment:

Can we reuse the same scheduler,

var kafkaScheduler: KafkaScheduler = null

since the startup and shutdown of that scheduler are already taken care of?

Author reply:

Yeah, we can do that. I was not sure myself, because we use this kafkaScheduler in some places inside KafkaServer, while in other places we allocate a new KafkaScheduler for other purposes, so I was unsure which way to go.

I have fixed it.
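For illustration, the periodic log-and-clear task could be wired up roughly as follows; ScheduledExecutorService stands in for Kafka's internal KafkaScheduler, and the class and parameter names are hypothetical:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: run the histogram log-and-clear task on a shared
// scheduler at a fixed, configurable interval.
public class HistogramPublisherSketch {
    public static ScheduledExecutorService start(Runnable logAndClear, long interval, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // First run after one interval, then repeat every interval.
        scheduler.scheduleAtFixedRate(logAndClear, interval, interval, unit);
        return scheduler;
    }
}
```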

class MemoryPoolStatsLogger extends Logging {
override lazy val logger = MemoryPoolStatsLogger.logger

def logStats(memoryPoolStatsStore: MemoryPoolStatsStore): Unit = {
Reviewer comment:

It seems everything in this class can be moved into the object MemoryPoolStatsLogger

Author reply:

Thanks for the good suggestion.

log.debug("Requested bytes {} for allocation exceeds maximum recorded value {}", bytes, maxSizeBytes);
return -1;
} else {
return (bytes - 1) / segmentSizeBytes;
Reviewer comment:

Would it be better to record the bytes in log scale instead of linear scale?
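A minimal sketch of the two bucketing schemes under discussion (the linear index matches the diff; the log-scale variant is a hypothetical alternative that gives small sizes finer resolution with far fewer buckets):

```java
// Hypothetical comparison of linear vs. logarithmic bucket indices.
public class BucketIndexSketch {
    // Linear: as in the diff, sizes 1..segmentSizeBytes share bucket 0.
    static int linearBucket(int bytes, int segmentSizeBytes) {
        return (bytes - 1) / segmentSizeBytes;
    }

    // Log scale: bucket k holds sizes in [2^k, 2^(k+1)),
    // i.e. floor(log2(bytes)) for bytes >= 1.
    static int logBucket(int bytes) {
        return 31 - Integer.numberOfLeadingZeros(bytes);
    }
}
```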

3 participants