chore(tsm1): skip WriteSnapshot during backup if snapshotter is busy
When an InfluxDB database is very busy writing new points, the backup
process can fail because it cannot write a new snapshot.

The error is: `operation timed out with error: create snapshot: snapshot in progress`.

This happens because the high number of ingested points makes InfluxDB
snapshot the cache almost "continuously".

When a backup is requested, this PR skips the snapshot if the `snapshotter`
does not become available after three attempts.

In that case, the backup will not contain the data still held in the cache or WAL.

Signed-off-by: Gianluca Arbezzano <gianarb92@gmail.com>
Gianluca Arbezzano committed Jan 22, 2020
1 parent 3cf826f commit 4f359b3
1 changed file with 16 additions and 2 deletions: tsdb/engine/tsm1/engine.go
```diff
@@ -1893,8 +1893,22 @@ func (e *Engine) WriteSnapshot() (err error) {
 // CreateSnapshot will create a temp directory that holds
 // temporary hardlinks to the underylyng shard files.
 func (e *Engine) CreateSnapshot() (string, error) {
-	if err := e.WriteSnapshot(); err != nil {
-		return "", err
+	var err error
+	for i := 0; i < 3; i++ {
+		err = e.WriteSnapshot()
+		if err != nil {
+			switch err {
+			case ErrSnapshotInProgress:
+				backoff := time.Duration(math.Pow(3.8, float64(i))) * time.Millisecond
+				time.Sleep(backoff)
+			default:
+				return "", err
+			}
+		}
 	}
+
+	if err != nil {
+		e.logger.Info("WAL busy: Backup proceeding without WAL contents", zap.Error(err))
+	}
 
 	e.mu.RLock()
```