Safety of reducing PulsarV3 defaultRetentionTimeInMinutes from 7 days to 3 days in Milvus cluster #50339
Replies: 1 comment
In Pulsar, defaultRetentionTimeInMinutes controls topic retention, while ttlDurationDefaultInSeconds controls message TTL (expiry).
If a datanode or streaming node crashes and restarts, it reads the unacked messages starting from the consume checkpoint (we call this recovery). Some different concepts here:
So, for your questions:
[BookKeeperClientWorker-OrderedExecutor-0-0] WARN org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/by-dev-rootcoord-dml_4] Failed to persist msg in store: Not enough non-faulty bookies available error code: -6
This error indicates that too many BookKeeper bookies are down, so Pulsar could not safely write the message.
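To see which topics were affected by this write failure, the broker log can be scanned for the message quoted above. The following is a minimal sketch, not a Pulsar or Milvus tool: the function name `topics_with_bookie_errors` is hypothetical, and the regular expression is built only from the single log line in this thread, so other broker versions may format the message differently.

```python
import re

# Pattern derived from the one log line quoted in this discussion (assumption:
# the broker keeps this exact wording for the error).
BOOKIE_ERROR = re.compile(
    r"\[(?P<topic>persistent://[^\]]+)\] Failed to persist msg in store: "
    r"Not enough non-faulty bookies available"
)


def topics_with_bookie_errors(log_lines):
    """Return the distinct topics that hit the write failure, in first-seen order."""
    seen = []
    for line in log_lines:
        match = BOOKIE_ERROR.search(line)
        if match and match.group("topic") not in seen:
            seen.append(match.group("topic"))
    return seen


sample = [
    "[BookKeeperClientWorker-OrderedExecutor-0-0] WARN "
    "org.apache.pulsar.broker.service.persistent.PersistentTopic - "
    "[persistent://public/default/by-dev-rootcoord-dml_4] Failed to persist "
    "msg in store: Not enough non-faulty bookies available error code: -6",
    "INFO some unrelated broker log line",
]
print(topics_with_bookie_errors(sample))
```

In practice the lines would come from the broker log file rather than an in-memory list; the sample above only reuses the line reported in the question.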
Environment:
Current Pulsar broker config:
Problem statement:
We periodically run large-scale backfilling exercises (bulk-inserting historical data into multiple collections). During a recent backfill, the Pulsar BookKeeper ledger storage filled up, causing ingestion failures across the cluster. We have since added more storage, but we want a longer-term solution before the next backfill exercise.
We are considering reducing defaultRetentionTimeInMinutes from 10080 (7 days) to 4320 (3 days), matching our existing ttlDurationDefaultInSeconds (3 days TTL). Our hypothesis is that since messages are already TTL'd at 3 days, the 7-day retention setting may be keeping acknowledged messages on disk unnecessarily.
Questions for Zilliz team:
1. Is it safe to reduce defaultRetentionTimeInMinutes from 7 days to 3 days with no data loss or corruption risk? Specifically, does Milvus ever need to replay Pulsar messages older than 3 days for recovery (e.g. after a DataNode or StreamingNode crash)?
2. Given that ttlDurationDefaultInSeconds is already set to 259200 (3 days), are messages already being expired at 3 days regardless of the retention setting? If so, does the 7-day defaultRetentionTimeInMinutes have any practical effect on ledger disk usage in our setup?
3. ... backlogQuotaDefaultLimitGB?
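Because the two settings use different units (minutes versus seconds), it is easy to misalign them. A minimal sketch of the unit arithmetic, using only the values quoted in this post; the helper names are hypothetical and not part of any Pulsar or Milvus API:

```python
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = 24 * 60 * 60


def retention_minutes_to_days(minutes):
    """Convert a defaultRetentionTimeInMinutes value to days."""
    return minutes / MINUTES_PER_DAY


def ttl_seconds_to_days(seconds):
    """Convert a ttlDurationDefaultInSeconds value to days."""
    return seconds / SECONDS_PER_DAY


current_retention_min = 10080   # current setting quoted in the post
proposed_retention_min = 4320   # proposed setting quoted in the post
ttl_seconds = 259200            # existing TTL quoted in the post

print(retention_minutes_to_days(current_retention_min))   # 7.0
print(retention_minutes_to_days(proposed_retention_min))  # 3.0
print(ttl_seconds_to_days(ttl_seconds))                   # 3.0
```

This only confirms that 4320 minutes and 259200 seconds both equal 3 days; it says nothing about whether the shorter retention is safe for recovery.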
Related error from last ingestion failures:
[BookKeeperClientWorker-OrderedExecutor-0-0] WARN org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/by-dev-rootcoord-dml_4] Failed to persist msg in store: Not enough non-faulty bookies available error code: -6
Since this is a production cluster exercise, we can't afford any data loss or data corruption, so we are reaching out to you for advice.