[server] Do not insert records into transient record cache while the drainer queue is full #14

ZacAttack wants to merge 5 commits into linkedin:master
Conversation
This is a tactical fix. This change prevents the transient record cache from growing if the memory budget for the drainer queue is close to full or exhausted.
```java
public void blockOnDrainerCapacity() {
  while (storeBufferService.getTotalRemainingMemory() < 1000) {
```
Can we replace it with wait-notify?
I agree... would be great to avoid sleeping, if possible. I've been meaning to get rid of sleeps (e.g., did it in 6db7c05) and I hope we continue getting rid of them, rather than add more.
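The wait-notify suggestion could look something like this minimal sketch. The class and member names (`DrainerCapacityGate`, `remainingMemory`, `capacityMonitor`, `onMemoryFreed`) are hypothetical stand-ins for the StoreBufferService's real bookkeeping, not the actual Venice API:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: replace the sleep-poll loop with wait/notify.
// All names are illustrative, not the real StoreBufferService API.
class DrainerCapacityGate {
  private static final long MIN_REMAINING_BYTES = 1000;
  private final Object capacityMonitor = new Object();
  private final AtomicLong remainingMemory;

  DrainerCapacityGate(long initialRemainingBytes) {
    this.remainingMemory = new AtomicLong(initialRemainingBytes);
  }

  // Consumer threads park here instead of sleep-polling.
  void blockOnDrainerCapacity() {
    synchronized (capacityMonitor) {
      while (remainingMemory.get() < MIN_REMAINING_BYTES) {
        try {
          capacityMonitor.wait(); // released when the drainer frees memory
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
      }
    }
  }

  // Called by drainer threads after they process a record and free memory.
  void onMemoryFreed(long bytes) {
    remainingMemory.addAndGet(bytes);
    synchronized (capacityMonitor) {
      capacityMonitor.notifyAll();
    }
  }

  long remaining() {
    return remainingMemory.get();
  }
}
```

The drainer thread would call `onMemoryFreed` after draining a record, waking any consumer thread parked in `blockOnDrainerCapacity`, instead of having consumers wake up on a fixed sleep interval.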
```java
  // either. So, there is no need to tell the follower replica to do anything.
  }
} else {
  blockOnDrainerCapacity();
```
Calling `blockOnDrainerCapacity` on almost every record will negatively impact performance, as it calls `AtomicLong::get` and these atomic variables are being accessed/updated by threads on different cores. It will be interesting to see the ingestion perf after this change.
Yeah. We're already examining and updating the capacity for every submission to the drainer, though admittedly, this method calls every single drainer's atomic variable. We could make the argument for augmenting it to only refer to the specific drainer variable that's related to the ingestion task. That would perhaps free it up.
Nevermind, it's an array not a map, so getting the aggregate is probably the best way.
Maybe feasible to add a function to the StoreBufferService along the lines of `boolean isDrainerNearCapacity(ConsumerRecord<KafkaKey, KafkaMessageEnvelope> consumerRecord, int subPartition)`, which would then be able to call `SBS::getDrainerIndexForConsumerRecord` and look up that specific drainer's capacity. Could also be achieved by passing just a `String consumedTopic` rather than a full record, given a minor refactoring within SBS.
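A rough sketch of the suggested per-drainer check. The array of per-drainer counters and the topic-hash routing below are assumptions standing in for StoreBufferService's real `getDrainerIndexForConsumerRecord` logic, not the actual implementation:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hedged sketch: check only the one drainer a record would route to,
// instead of summing every drainer's counter. Names are illustrative.
class StoreBufferServiceSketch {
  private static final long NEAR_CAPACITY_THRESHOLD_BYTES = 1000;
  private final AtomicLong[] remainingMemoryPerDrainer;

  StoreBufferServiceSketch(int drainerCount, long perDrainerBudgetBytes) {
    remainingMemoryPerDrainer = new AtomicLong[drainerCount];
    for (int i = 0; i < drainerCount; i++) {
      remainingMemoryPerDrainer[i] = new AtomicLong(perDrainerBudgetBytes);
    }
  }

  // Stand-in for getDrainerIndexForConsumerRecord: route by topic + partition
  // so each record deterministically maps to one drainer.
  int getDrainerIndex(String consumedTopic, int subPartition) {
    int hash = (consumedTopic.hashCode() * 31 + subPartition) & Integer.MAX_VALUE;
    return hash % remainingMemoryPerDrainer.length;
  }

  // Touches a single atomic variable rather than all of them.
  boolean isDrainerNearCapacity(String consumedTopic, int subPartition) {
    int index = getDrainerIndex(consumedTopic, subPartition);
    return remainingMemoryPerDrainer[index].get() < NEAR_CAPACITY_THRESHOLD_BYTES;
  }
}
```

The point of the design is that the hot-path check reads one atomic instead of N, which reduces cross-core cache-line traffic when many consumer threads poll capacity.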
> We're already examining and updating the capacity for every submission to the drainer
IIUC, this happens in the drainer threads which are limited in number; however, we have far more shared consumer threads than the drainer threads. Hence the impact of updating atomic variables millions of times per second might be even more pronounced.
Maybe I'm being paranoid here. Let's keep this code change as one of the reference points if we see a substantial impact on ingestion performance.
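As a hedged aside on the contention concern: if a single counter updated millions of times per second by many shared-consumer threads does become a hot spot, `java.util.concurrent.atomic.LongAdder` is the standard JDK alternative. It spreads writes across internal cells, trading a slower, weakly consistent `sum()` for much cheaper concurrent updates. This is not what the PR does; it is just the usual mitigation for this class of problem:

```java
import java.util.concurrent.atomic.LongAdder;

// Illustrative counter: LongAdder absorbs write contention better than a
// single AtomicLong when many threads update it concurrently.
class ContendedCounter {
  private final LongAdder usedBytes = new LongAdder();

  void add(long bytes) {
    usedBytes.add(bytes); // cheap under contention; no CAS retry storms
  }

  long used() {
    return usedBytes.sum(); // weakly consistent snapshot across cells
  }
}
```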
FelixGV left a comment:
Comment about the unmodified code (which unfortunately GH does not allow me to comment on directly): I think we should consider renaming the other overload of `PCS::setTransientRecord` (the one where you do not call the new blocking function first), so that it's clearer that it sets the transient record to null. It would also make the intent clearer as to why we're not blocking before that one (presumably because a delete may relieve memory pressure? Although that is actually uncertain... a delete-heavy workload may in fact still cause a lot of garbage, even if the value part is null).
```java
maybeBlockOnDrainerCapacity();
partitionConsumptionState.setTransientRecord(
```
Is it ideal to precede every call to `PartitionConsumptionState::setTransientRecord` with this new function? Would it make sense to include this functionality inside the `setTransientRecord` function itself? That may require passing a closure into the PCS to evaluate the buffer service's remaining capacity, which is maybe not ideal from an encapsulation standpoint. On the other hand, the convention that calls to `setTransientRecord` should always be preceded by `maybeBlockOnDrainerCapacity` seems a bit fragile as well. Maybe there should be a function in LFSIT to wrap both of these functions?
Yeah, I think we should do the closure. Will add.
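The closure approach being agreed to could be sketched roughly as follows. The simplified class, the `BooleanSupplier` gate, and the `Object` record type are illustrative assumptions, not the real Venice signatures:

```java
import java.util.function.BooleanSupplier;

// Hedged sketch of the closure idea: PartitionConsumptionState receives a
// capacity gate at construction, so setTransientRecord itself blocks until
// the drainer has room. Names are simplified stand-ins for the Venice classes.
class PartitionConsumptionStateSketch {
  private final BooleanSupplier drainerHasCapacity;
  private Object transientRecord;

  PartitionConsumptionStateSketch(BooleanSupplier drainerHasCapacity) {
    this.drainerHasCapacity = drainerHasCapacity;
  }

  void setTransientRecord(Object record) {
    while (!drainerHasCapacity.getAsBoolean()) {
      Thread.onSpinWait(); // placeholder back-off; wait/notify preferred per review
    }
    this.transientRecord = record;
  }

  Object getTransientRecord() {
    return transientRecord;
  }
}
```

With this shape, callers cannot forget the capacity check, at the cost of the PCS holding a reference (via the closure) to the buffer service's state, which is the encapsulation trade-off raised above.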
```java
public void maybeBlockOnDrainerCapacity() {
  while (storeBufferService.getTotalRemainingMemory() < 1000) {
```
Should this `1000` be configurable? Or even if left un-configurable, should it at least be made into a constant?
I was torn on adding another tunable parameter. But maybe there's a happy middle: instead of using a strict measure, we could wait if it's at, say, 90-95% of capacity? That way it's at least tunable relative to the size of the configured drainer, but doesn't add extra noise.
Why block at some margin below the actual limit? Can we instead block if `remaining_capacity - incoming_payload_size < 0`, or something like that? This is not fully precise, since the memory overhead of the transient record is more than just the payload size, but probably close enough to not matter?
If the issue is due to drainer records being uncompressed while producer buffers are not, just checking drainer capacity could still lead to very high memory usage. We should probably make this tunable and add a metric to track transient cache usage.
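The payload-size-aware check suggested above could be reduced to something like this. The method and parameter names are illustrative, not the real StoreBufferService API:

```java
// Hedged sketch: block only when admitting this record would overdraw the
// drainer budget, instead of blocking at a fixed margin like 1000 bytes.
class CapacityCheck {
  static boolean shouldBlock(long remainingCapacityBytes, long incomingPayloadBytes) {
    // Not fully precise: the transient record's memory overhead exceeds the
    // raw payload size, but this tracks actual pressure better than a
    // hard-coded threshold.
    return remainingCapacityBytes - incomingPayloadBytes < 0;
  }
}
```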
This method won't actually solve our problem. We'll need to do a more in-depth fix.