Redefine local low and high watermarks for enhanced log stream clarity #708

Open
ijsong opened this issue Feb 18, 2024 · 0 comments
ijsong commented Feb 18, 2024

Description

The StorageNode includes a LogStreamReplicaMetadata RPC that supplies metadata for a log stream replica. The RPC's response uses local low and high watermarks to denote the positions of the first and last stored log entries. Together, these watermarks identify the range of logs held in a stream at any moment. Initially, both watermarks are set to {LLSN: 0, GLSN: 0}, indicating an empty log stream. As logs are appended, the watermarks are updated to reflect the stored log range.

A notable limitation emerges when all logs within a stream are trimmed: both watermarks revert to {LLSN: 0, GLSN: 0}. A newly created log stream and a fully trimmed one therefore report identical watermark values and cannot be told apart. As a result, users have no way to tell which log sequence numbers (LSNs) were stored previously, or to predict future LSNs from the current watermark values.

Example for Clarification

  • Initial State: A new log stream starts with no logs, and both local low and high watermarks are {LLSN: 0, GLSN: 0}.
  • After Writing Logs: Adding ten logs to the stream adjusts the watermarks to indicate the new range, setting the local low watermark at {LLSN: 1, GLSN: 1} and the high watermark at {LLSN: 10, GLSN: 10}.
  • After Trimming Logs: Removing the first five logs changes the watermarks to {LLSN: 6, GLSN: 6} for the low and {LLSN: 10, GLSN: 10} for the high, indicating the presence of logs 6 through 10.
  • After Complete Trimming: Trimming all logs from the stream resets both watermarks to {LLSN: 0, GLSN: 0}. This hides both the history of the stream (positions 1 through 10 were used) and the position of the next log.
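
The four steps above can be reproduced with a small Go model. This is only a sketch of the semantics described in this issue, not the StorageNode implementation: the `LSN` and `stream` types and their methods are hypothetical, and LLSN and GLSN are assumed to advance together as in the example.

```go
package main

import "fmt"

// LSN pairs a local and a global log sequence number. It is a simplified,
// hypothetical stand-in for the type used in the RPC response.
type LSN struct {
	LLSN uint64
	GLSN uint64
}

// stream is a toy model of one log stream replica. As in the example above,
// LLSN and GLSN are assumed to advance together.
type stream struct {
	lastTrimmed uint64 // position of the last trimmed log, 0 if none
	lastWritten uint64 // position of the last appended log, 0 if none
}

// Append stores n more logs at the end of the stream.
func (s *stream) Append(n uint64) { s.lastWritten += n }

// TrimUpTo removes every stored log at or before pos.
func (s *stream) TrimUpTo(pos uint64) {
	if pos > s.lastWritten {
		pos = s.lastWritten
	}
	if pos > s.lastTrimmed {
		s.lastTrimmed = pos
	}
}

// CurrentWatermarks follows the current semantics: the positions of the first
// and last stored logs, or zero values when nothing is stored.
func (s *stream) CurrentWatermarks() (low, high LSN) {
	if s.lastTrimmed == s.lastWritten {
		return LSN{}, LSN{}
	}
	first := s.lastTrimmed + 1
	return LSN{LLSN: first, GLSN: first}, LSN{LLSN: s.lastWritten, GLSN: s.lastWritten}
}

func main() {
	s := &stream{}
	report := func(step string) {
		low, high := s.CurrentWatermarks()
		fmt.Printf("%-22s low=%+v high=%+v\n", step, low, high)
	}
	report("initial state")
	s.Append(10)
	report("after writing logs")
	s.TrimUpTo(5)
	report("after trimming 1-5")
	s.TrimUpTo(10)
	report("after complete trim")
}
```

The first and last lines printed by this sketch carry the same watermark values, which is exactly the ambiguity this issue describes.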

Proposal for Improvement

To overcome this issue, I suggest redefining the local low and high watermarks as follows:

  • Local Low Watermark: Should indicate the position immediately after the last trimmed log. For a newly created log stream, it would be set to 1. After all logs are trimmed, it would move to 11 in the given example, denoting the starting position for future logs.
  • Local High Watermark: Should mark the position immediately after the last stored log, that is, where the next log entry will be placed. Like the low watermark, it would move to 11 after complete trimming, keeping the definition of the log range consistent.

This redefinition guarantees that:

  • A local low and high watermark of {LLSN: 1, GLSN: 1} signals a newly initiated log stream.
  • Identical local low and high watermarks, aside from {LLSN: 1, GLSN: 1}, indicate a fully trimmed log stream.
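
A minimal Go sketch of the proposed semantics, under the same assumptions as the example in this issue (LLSN and GLSN advance together). The `LSN` and `stream` types, `ProposedWatermarks`, and `classify` are hypothetical names introduced for illustration only.

```go
package main

import "fmt"

// LSN is a simplified, hypothetical stand-in for a watermark value.
type LSN struct {
	LLSN uint64
	GLSN uint64
}

// firstLSN is the value both watermarks take on a newly created stream.
var firstLSN = LSN{LLSN: 1, GLSN: 1}

// stream is a toy model of one log stream replica.
type stream struct {
	lastTrimmed uint64 // position of the last trimmed log, 0 if none
	lastWritten uint64 // position of the last appended log, 0 if none
}

// Append stores n more logs at the end of the stream.
func (s *stream) Append(n uint64) { s.lastWritten += n }

// TrimUpTo removes every stored log at or before pos.
func (s *stream) TrimUpTo(pos uint64) {
	if pos > s.lastWritten {
		pos = s.lastWritten
	}
	if pos > s.lastTrimmed {
		s.lastTrimmed = pos
	}
}

// ProposedWatermarks follows the proposed semantics: low is the position
// after the last trimmed log, high is the position after the last stored log.
func (s *stream) ProposedWatermarks() (low, high LSN) {
	l, h := s.lastTrimmed+1, s.lastWritten+1
	return LSN{LLSN: l, GLSN: l}, LSN{LLSN: h, GLSN: h}
}

// classify applies the two guarantees listed above.
func classify(low, high LSN) string {
	switch {
	case low != high:
		return "holds logs"
	case low == firstLSN:
		return "new"
	default:
		return "fully trimmed"
	}
}

func main() {
	s := &stream{}
	report := func(step string) {
		low, high := s.ProposedWatermarks()
		fmt.Printf("%-22s low=%+v high=%+v (%s)\n", step, low, high, classify(low, high))
	}
	report("initial state")
	s.Append(10)
	report("after writing logs")
	s.TrimUpTo(5)
	report("after trimming 1-5")
	s.TrimUpTo(10)
	report("after complete trim")
}
```

Under this reading of the definition, the high watermark is exclusive: with logs 1 through 10 stored it sits at 11, so an empty stream is always the case where the two watermarks are equal.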

Impact

Implementing these changes will substantially improve the clarity and utility of our log stream data. It will enable a more intuitive understanding of a log stream's current status and history, especially concerning trimming operations.

Alternatives

We could keep the current semantics of the local low and high watermarks and add a field to the LogStreamReplicaMetadataDescriptor that reports the next log sequence number to be stored. With this approach, both a new and a fully trimmed log stream would still report local low and high watermarks of {LLSN: 0, GLSN: 0}, but the next log sequence number would start at {LLSN: 1, GLSN: 1} for a new stream and move to {LLSN: 11, GLSN: 11} after complete trimming in the example above. New and fully trimmed streams could then be told apart by this next log sequence number. Although this requires adding a new field to the LogStreamReplicaMetadata response, it preserves the current watermark semantics.
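
As a rough illustration of this alternative, the Go sketch below shows how a client might tell the two cases apart. The `replicaMetadata` struct is a hypothetical, trimmed-down view of the response, and `NextLSN` is a placeholder name for the proposed field; neither is taken from the actual descriptor.

```go
package main

import "fmt"

// LSN is a simplified, hypothetical stand-in for a watermark value.
type LSN struct {
	LLSN uint64
	GLSN uint64
}

// replicaMetadata is a hypothetical view of the RPC response. NextLSN is a
// placeholder name for the additional field discussed above.
type replicaMetadata struct {
	LocalLowWatermark  LSN
	LocalHighWatermark LSN
	NextLSN            LSN
}

// describe keeps the current watermark semantics and uses NextLSN only to
// separate a new stream from a fully trimmed one.
func describe(m replicaMetadata) string {
	var zero LSN
	first := LSN{LLSN: 1, GLSN: 1}
	empty := m.LocalLowWatermark == zero && m.LocalHighWatermark == zero
	switch {
	case !empty:
		return "holds logs"
	case m.NextLSN == first:
		return "new"
	default:
		return "fully trimmed"
	}
}

func main() {
	fresh := replicaMetadata{NextLSN: LSN{LLSN: 1, GLSN: 1}}
	trimmed := replicaMetadata{NextLSN: LSN{LLSN: 11, GLSN: 11}}
	fmt.Println(describe(fresh))
	fmt.Println(describe(trimmed))
}
```

The trade-off is that clients must read a third value to interpret an empty range, whereas existing readers of the two watermarks keep working unchanged.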

Request for Comments

I invite other developers to offer feedback on this proposal, focusing on potential implications or enhancements that could further refine our approach to managing and representing log stream watermarks.

@ijsong ijsong self-assigned this Feb 18, 2024