Save control_file to disk on compute disconnect #3836

petuhovskiy · 2023-03-16T17:54:56Z

There was an issue #3317 about lagging truncate_lsn. It was mostly fixed, but it can still be reproduced with safekeeper restarts:

It can probably still be reproduced by restarting safekeepers before peer_horizon_lsn (truncate_lsn) is flushed to disk, so we should try to flush state to disk more often. To do that, we can trigger a flush when a timeline becomes inactive (compute disconnects), on graceful shutdown, etc.

Before the fixes there were many timelines with MAX(backup_lsn) - MIN(disk_peer_horizon_lsn) around 16MB, and that triggered an S3 download. We can try to get MIN(flush_lsn) - MIN(disk_peer_horizon_lsn) close to zero, which should not be hard to do with additional flushes.

Originally posted by @petuhovskiy in #3317 (comment)

We should flush control_file to disk more aggressively:

when compute disconnects
Persist safekeeper control file once in a while. #4438
on safekeeper shutdown ?
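
The disconnect and shutdown triggers listed above could look roughly like the sketch below. It is written in Python for brevity and is not the safekeeper's actual code; the `Timeline` class, its method names, and the JSON file layout are hypothetical stand-ins for the real control file handling.

```python
import json
import os
import tempfile


class Timeline:
    """Toy model of a safekeeper timeline; every name here is hypothetical."""

    def __init__(self, control_path):
        self.control_path = control_path
        self.peer_horizon_lsn = 0  # in-memory value, may be ahead of disk
        self.dirty = False         # True while memory differs from disk
        self.active_computes = 0

    def advance_peer_horizon(self, lsn):
        # Only the in-memory state changes here; nothing is written yet.
        if lsn > self.peer_horizon_lsn:
            self.peer_horizon_lsn = lsn
            self.dirty = True

    def persist(self):
        """Write the control file atomically: temp file, fsync, rename."""
        if not self.dirty:
            return False
        directory = os.path.dirname(self.control_path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            # JSON is an assumption made for this sketch only.
            json.dump({"peer_horizon_lsn": self.peer_horizon_lsn}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.control_path)
        self.dirty = False
        return True

    def on_compute_connect(self):
        self.active_computes += 1

    def on_compute_disconnect(self):
        self.active_computes -= 1
        if self.active_computes == 0:
            # The timeline just became inactive: flush so that a restart
            # does not come back with a stale peer_horizon_lsn.
            self.persist()

    def on_shutdown(self):
        # Graceful shutdown is the other trigger mentioned above.
        self.persist()
```

The `dirty` flag keeps the extra triggers cheap: a disconnect or shutdown with nothing new to write does no I/O.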

Metrics that should improve after the fix (values as of issue creation):

MAX(backup_lsn) - MIN(disk_peer_horizon_lsn) – max=15223760, avg=3511177 (67 timelines in total where backup_lsn > disk_peer_horizon_lsn)
MIN(flush_lsn) - MIN(disk_peer_horizon_lsn) – max=16771592, avg=64808
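
For reference, here is a small sketch of how the two lag values could be computed for one timeline. It assumes the MAX/MIN aggregates are taken across the safekeepers of that timeline; the `SafekeeperState` type and `lag_metrics` function are made up for illustration, and the numbers in the usage note are not real data.

```python
from dataclasses import dataclass


@dataclass
class SafekeeperState:
    """Per-safekeeper view of one timeline (hypothetical structure)."""
    backup_lsn: int
    flush_lsn: int
    disk_peer_horizon_lsn: int  # value persisted in the control file


def lag_metrics(states):
    """Return (backup_lag, flush_lag) in bytes for one timeline.

    backup_lag = MAX(backup_lsn) - MIN(disk_peer_horizon_lsn)
    flush_lag  = MIN(flush_lsn)  - MIN(disk_peer_horizon_lsn)
    """
    min_disk_horizon = min(s.disk_peer_horizon_lsn for s in states)
    backup_lag = max(s.backup_lsn for s in states) - min_disk_horizon
    flush_lag = min(s.flush_lsn for s in states) - min_disk_horizon
    return backup_lag, flush_lag
```

With three made-up safekeepers at (backup, flush, disk horizon) = (200, 210, 100), (180, 205, 150), (200, 220, 190), this gives a backup lag of 100 and a flush lag of 105. Both are driven by the one safekeeper whose on-disk horizon is furthest behind.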


petuhovskiy · 2024-06-26T14:13:16Z

As an alternative, we can just set the control file save interval to 10 seconds. It shouldn't affect disk performance and will ensure safekeepers always have a fresh (~10 seconds old) control file on disk.
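
A minimal sketch of the interval-based alternative, again in Python rather than the safekeeper's own code. `PeriodicSaver`, `tick`, and `save_fn` are hypothetical names; the only value taken from the comment above is the 10 second interval.

```python
import time


class PeriodicSaver:
    """Calls save_fn at most once per interval (hypothetical helper)."""

    def __init__(self, save_fn, interval_secs=10.0, clock=time.monotonic):
        self.save_fn = save_fn
        self.interval_secs = interval_secs
        self.clock = clock            # injectable so the logic can be tested
        self.last_save = clock()

    def tick(self):
        """Call from the timeline's main loop; returns True if it saved."""
        now = self.clock()
        if now - self.last_save >= self.interval_secs:
            self.save_fn()
            self.last_save = now
            return True
        return False
```

With this approach the on-disk state is at most one interval plus one loop iteration behind memory, regardless of whether a disconnect or shutdown hook ever fires.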

petuhovskiy added the t/bug (Issue Type: Bug) and c/storage/safekeeper (Component: storage: safekeeper) labels Mar 16, 2023

petuhovskiy mentioned this issue May 6, 2024

log LSNs on walreceiver disconnections #7621

Closed
