Support record rotation for overwriting old records #186

Open
elon0823 opened this issue Jun 28, 2022 · 5 comments
Labels: enhancement (Enhancement of existing feature)

Comments

@elon0823
Contributor

elon0823 commented Jun 28, 2022

  • Currently, topic records only grow; they never shrink.
  • Old records should be deleted or overwritten with new records.
@elon0823 elon0823 added the enhancement Enhancement of existing feature label Jun 28, 2022
@1dennispark
Contributor

1dennispark commented Jun 28, 2022

Here is my proposal. I think each piece of data has a period attached at creation time. Could that time be used as our deletion point?
For this, we would need an additional configuration for the data-period value.

@elon0823
Contributor Author

elon0823 commented Jun 28, 2022

I think using a data-period as the deletion strategy is convenient, but it could be dangerous, for example when a massive amount of data is published before the data-period has elapsed. If the goal is only to distinguish old data, the offset is enough.

However, the data-period seems useful for grouping data by age, so that the broker can delete the records with the oldest data-period when the disk is almost full.

@elon0823
Contributor Author

elon0823 commented Jul 9, 2022

New column family for data-period

I had considered appending the data-period to the value of the record, but I think it is much more convenient to add a new column family for the data-period when it comes to finding expired records.

Proposed Idea

Add a new column family (ExpCF) for the data-period, with the row-key designed as below.

byte-expression : [exp-timestamp][topic-name][fragment-id][offset]
byte-length     :      8(uint64)      any      1(uint8)   8(uint64)
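
The row-key layout above could be packed roughly as follows. This is only a sketch: the function names `encode_exp_key` and `decode_exp_key` are hypothetical, and the big-endian byte order, the UTF-8 topic encoding, and the timestamp unit (seconds) are assumptions not stated in the proposal. Big-endian is chosen here so that the byte order of the keys matches the numeric order of exp-timestamp.

```python
import struct

# Assumed encodings (not specified in the proposal):
_TS = struct.Struct(">Q")     # 8-byte big-endian exp-timestamp
_TAIL = struct.Struct(">BQ")  # 1-byte fragment-id followed by 8-byte offset


def encode_exp_key(exp_timestamp: int, topic: str, fragment_id: int, offset: int) -> bytes:
    """Build [exp-timestamp][topic-name][fragment-id][offset] as bytes."""
    return _TS.pack(exp_timestamp) + topic.encode("utf-8") + _TAIL.pack(fragment_id, offset)


def decode_exp_key(key: bytes):
    """Split a key back into its fields.

    The topic name has variable length, so it is recovered as whatever lies
    between the fixed 8-byte prefix and the fixed 9-byte suffix.
    """
    if len(key) < _TS.size + _TAIL.size:
        raise ValueError("key is shorter than the fixed-size fields")
    (exp_timestamp,) = _TS.unpack_from(key, 0)
    tail_start = len(key) - _TAIL.size
    fragment_id, offset = _TAIL.unpack_from(key, tail_start)
    topic = key[_TS.size:tail_start].decode("utf-8")
    return exp_timestamp, topic, fragment_id, offset
```

Because the topic name sits in the middle without a length prefix, decoding only works from both ends as shown; a length-prefixed topic would be an alternative design.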

When the broker receives a record with a data-period, it calculates the expiration timestamp (exp-timestamp) from the data-period.
The expiration detector then iterates the ExpCF column family, which is ordered by expiration date, from the start and can easily find the sequence of records to erase.
Finding each record to delete takes O(1) time, because [topic-name][fragment-id][offset] is the row-key of each record.
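
The detector's scan might look something like the sketch below. A sorted Python list stands in for the ordered ExpCF column family, and the names `exp_key` and `collect_expired` are hypothetical. It also assumes big-endian timestamps and that a record counts as expired once exp-timestamp is less than or equal to the current time; neither detail is fixed by the proposal.

```python
import struct


def exp_key(exp_ts: int, topic: str, fragment_id: int, offset: int) -> bytes:
    """Assumed ExpCF key: big-endian exp-timestamp, then the record's row-key."""
    return struct.pack(">Q", exp_ts) + topic.encode("utf-8") + struct.pack(">BQ", fragment_id, offset)


def collect_expired(sorted_exp_keys, now: int):
    """Return the record row-keys whose exp-timestamp is <= now.

    sorted_exp_keys must be in ascending byte order, as an ordered column
    family iterator would yield them. The scan stops at the first key that
    has not expired yet, so unexpired records are never visited.
    """
    expired = []
    for key in sorted_exp_keys:
        (exp_ts,) = struct.unpack_from(">Q", key, 0)
        if exp_ts > now:
            break
        # Dropping the 8-byte timestamp leaves [topic-name][fragment-id][offset],
        # which is the row-key of the record itself.
        expired.append(key[8:])
    return expired
```

Each returned value can be used directly as the key for deleting the record, which is the direct lookup the proposal relies on.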

In addition, a data-period field should be added to the publish message. What do you think the maximum and default data-period should be? @1dennispark

@elon0823
Contributor Author

Terminology change: data-period is renamed to retention-period.

@elon0823
Contributor Author

elon0823 commented Jul 13, 2022

Is it okay to delete the oldest records when the disk is full?
We should consider additional policies alongside the data-period so that records are deleted properly when the system's disk is full (such as limiting the total number of records that can be stored per topic).
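
As one possible shape for the per-topic limit mentioned above, the sketch below computes which offsets would be evicted. The function name `offsets_to_evict` and the `max_records` parameter are hypothetical, and it assumes that offsets within a topic fragment are contiguous and that a smaller offset means an older record.

```python
def offsets_to_evict(first_offset: int, last_offset: int, max_records: int) -> range:
    """Return the oldest offsets that exceed the per-topic record limit.

    first_offset and last_offset are the inclusive bounds of the records
    currently stored; max_records is the assumed configured cap.
    """
    if max_records < 0:
        raise ValueError("max_records must be non-negative")
    stored = last_offset - first_offset + 1
    excess = max(0, stored - max_records)
    # The oldest records have the smallest offsets, so evict from the front.
    return range(first_offset, first_offset + excess)
```

For example, with offsets 10 through 19 stored and a cap of 4 records, offsets 10 through 15 would be evicted and 16 through 19 kept.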
