Add flag for size based retention #5109

Merged
merged 4 commits into from Jan 18, 2019
Changes from 2 commits
22 changes: 18 additions & 4 deletions cmd/prometheus/main.go
@@ -41,10 +41,10 @@ import (
"github.com/prometheus/common/model"
"github.com/prometheus/common/version"
prom_runtime "github.com/prometheus/prometheus/pkg/runtime"
"gopkg.in/alecthomas/kingpin.v2"
kingpin "gopkg.in/alecthomas/kingpin.v2"
"k8s.io/klog"

"github.com/mwitkow/go-conntrack"
conntrack "github.com/mwitkow/go-conntrack"
"github.com/prometheus/common/promlog"
promlogflag "github.com/prometheus/common/promlog/flag"
"github.com/prometheus/prometheus/config"
@@ -171,9 +171,15 @@ func main() {
"Size at which to split the tsdb WAL segment files (e.g. 100MB)").
Hidden().PlaceHolder("<bytes>").BytesVar(&cfg.tsdb.WALSegmentSize)

a.Flag("storage.tsdb.retention", "How long to retain samples in storage.").
a.Flag("storage.tsdb.retention", "[DEPRECATED] How long to retain samples in storage. This flag has been deprecated, use \"storage.tsdb.retention.time\" instead").
Default("15d").SetValue(&cfg.tsdb.Retention)

a.Flag("storage.tsdb.retention.time", "How long to retain samples in storage.").
Default("15d").SetValue(&cfg.tsdb.RetentionDuration)

a.Flag("storage.tsdb.retention.size", "[EXPERIMENTAL] Maximum number of bytes that can be stored for blocks. This flag is experimental and can be changed in future releases.").
Default("0").Int64Var(&cfg.tsdb.MaxBytes)

a.Flag("storage.tsdb.no-lockfile", "Do not create lockfile in data directory.").
Default("false").BoolVar(&cfg.tsdb.NoLockfile)

@@ -245,14 +251,22 @@ func main() {
cfg.web.RoutePrefix = "/" + strings.Trim(cfg.web.RoutePrefix, "/")

if cfg.tsdb.MaxBlockDuration == 0 {
cfg.tsdb.MaxBlockDuration = cfg.tsdb.Retention / 10
cfg.tsdb.MaxBlockDuration = cfg.tsdb.RetentionDuration / 10
}

promql.LookbackDelta = time.Duration(cfg.lookbackDelta)
promql.SetDefaultEvaluationInterval(time.Duration(config.DefaultGlobalConfig.EvaluationInterval))

logger := promlog.New(&cfg.promlogConfig)

defaultDuration, err := model.ParseDuration("15d")
if err != nil {
panic(err)
}
if cfg.tsdb.Retention != defaultDuration {
level.Warn(logger).Log("deprecation_notice", `"storage.tsdb.retention" flag is deprecated; use "storage.tsdb.retention.time" instead.`)
}

// Above level 6, the k8s client would log bearer tokens in clear-text.
klog.ClampLevel(6)
klog.SetLogger(log.With(logger, "component", "k8s_client_runtime"))
6 changes: 5 additions & 1 deletion docs/storage.md
@@ -52,7 +52,9 @@ For further details on file format, see [TSDB format](https://github.com/prometh
Prometheus has several flags that allow configuring the local storage. The most important ones are:

* `--storage.tsdb.path`: This determines where Prometheus writes its database. Defaults to `data/`.
* `--storage.tsdb.retention`: This determines when to remove old data. Defaults to `15d`.
* `--storage.tsdb.retention.time`: This determines when to remove old data. Defaults to `15d`.
* `--storage.tsdb.retention.size`: [EXPERIMENTAL] This determines the maximum number of bytes that storage blocks can use (note that this does not include the WAL size, which can be substantial). The oldest data will be removed first. Defaults to `0`, which disables size-based retention. This flag is experimental and can be changed in future releases.
* `--storage.tsdb.retention`: This flag has been deprecated in favour of `--storage.tsdb.retention.time`.

On average, Prometheus uses only around 1-2 bytes per sample. Thus, to plan the capacity of a Prometheus server, you can use the rough formula:

@@ -64,6 +66,8 @@ To tune the rate of ingested samples per second, you can either reduce the numbe

If your local storage becomes corrupted for whatever reason, your best bet is to shut down Prometheus and remove the entire storage directory. However, you can also try removing individual block directories to resolve the problem. This means losing a time window of around two hours worth of data per block directory. Again, Prometheus's local storage is not meant as durable long-term storage.

If both time and size retention policies are specified, whichever policy triggers first will be used at that instant.
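
To illustrate how two independent retention policies can interact, here is a minimal sketch in Go. It is not the TSDB implementation: the `block` type, the `beyondRetention` function, and the millisecond timestamps are hypothetical names chosen for this example. It assumes that blocks are examined from newest to oldest and that a limit of `0` disables the corresponding policy.

```go
package main

import (
	"fmt"
	"sort"
)

// block is a hypothetical stand-in for a persisted block on disk.
type block struct {
	maxTime int64 // timestamp of the newest sample in the block, in milliseconds
	size    int64 // on-disk size in bytes
}

// beyondRetention returns the blocks that fall outside either limit.
// A limit of 0 disables that policy (assumption made for this sketch).
func beyondRetention(blocks []block, retentionMs, maxBytes int64) []block {
	// Copy and sort newest first so that the oldest data is dropped first.
	sorted := append([]block(nil), blocks...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].maxTime > sorted[j].maxTime })

	var total int64
	for i, b := range sorted {
		total += b.size
		tooOld := retentionMs > 0 && i > 0 && sorted[0].maxTime-b.maxTime > retentionMs
		tooBig := maxBytes > 0 && total > maxBytes
		if tooOld || tooBig {
			// Whichever policy triggers first decides the cut-off point.
			return sorted[i:]
		}
	}
	return nil
}

func main() {
	blocks := []block{{1000, 100}, {2000, 100}, {3000, 100}, {4000, 100}}

	// Size limit triggers first: the two oldest blocks are dropped.
	fmt.Println(len(beyondRetention(blocks, 2500, 250))) // 2

	// Size limit disabled: only the block older than the time limit is dropped.
	fmt.Println(len(beyondRetention(blocks, 2500, 0))) // 1
}
```

In this sketch the cut-off is the first block, counted from the newest, that violates either limit, so a tight size limit can remove data that is still well within the time limit.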

## Remote storage integrations

Prometheus's local storage is limited by single nodes in its scalability and durability. Instead of trying to solve clustered storage in Prometheus itself, Prometheus has a set of interfaces that allow integrating with remote storage systems.
38 changes: 36 additions & 2 deletions storage/tsdb/tsdb.go
@@ -108,6 +108,9 @@ type adapter struct {

// Options of the DB storage.
type Options struct {
// The interval at which the write ahead log is flushed to disc.
WALFlushInterval time.Duration

// The timestamp range of head blocks after which they get persisted.
// It's the minimum duration of any persisted block.
MinBlockDuration model.Duration
@@ -118,9 +121,15 @@ type Options struct {
// The maximum size of each WAL segment file.
WALSegmentSize units.Base2Bytes

// Duration for how long to retain data.
// Deprecated, use RetentionDuration.
Retention model.Duration

// Duration for how long to retain data.
RetentionDuration model.Duration

// Maximum number of bytes to be retained.
MaxBytes int64

// Disable creation and consideration of lockfile.
NoLockfile bool
}
@@ -167,6 +176,9 @@ func registerMetrics(db *tsdb.DB, r prometheus.Registerer) {

// Open returns a new storage backed by a TSDB database that is configured for Prometheus.
func Open(path string, l log.Logger, r prometheus.Registerer, opts *Options) (*tsdb.DB, error) {

retention := ChooseRetention(opts.Retention, opts.RetentionDuration)

if opts.MinBlockDuration > opts.MaxBlockDuration {
opts.MaxBlockDuration = opts.MinBlockDuration
}
@@ -183,7 +195,8 @@ func Open(path string, l log.Logger, r prometheus.Registerer, opts *Options) (*t

db, err := tsdb.Open(path, l, r, &tsdb.Options{
WALSegmentSize: int(opts.WALSegmentSize),
RetentionDuration: uint64(time.Duration(opts.Retention).Seconds() * 1000),
RetentionDuration: uint64(time.Duration(retention).Seconds() * 1000),
MaxBytes: opts.MaxBytes,
BlockRanges: rngs,
NoLockfile: opts.NoLockfile,
})
@@ -195,6 +208,27 @@ func Open(path string, l log.Logger, r prometheus.Registerer, opts *Options) (*t
return db, nil
}

// ChooseRetention is some roundabout code to support both RetentionDuration and Retention (for different flags).
// If Retention is 15d, then it means that the default value is set and the value of RetentionDuration is used.
func ChooseRetention(oldFlagDuration, newFlagDuration model.Duration) model.Duration {
defaultDuration, err := model.ParseDuration("15d")
if err != nil {
panic(err)
}

retention := oldFlagDuration
if retention == defaultDuration {
retention = newFlagDuration
}

// Further if both the flags are set, then RetentionDuration takes precedence.
if newFlagDuration != defaultDuration {
retention = newFlagDuration
}

return retention
}

// StartTime implements the Storage interface.
func (a adapter) StartTime() (int64, error) {
var startTime int64
31 changes: 31 additions & 0 deletions storage/tsdb/tsdb_test.go
@@ -61,3 +61,34 @@ func TestMetrics(t *testing.T) {
testutil.Equals(t, 0.003, metrics.Gauge.GetValue())

}

func TestChooseRetention(t *testing.T) {
defaultRetention, err := model.ParseDuration("15d")
testutil.Ok(t, err)

retention1, err := model.ParseDuration("20d")
testutil.Ok(t, err)
retention2, err := model.ParseDuration("30d")
testutil.Ok(t, err)

cases := []struct {
oldFlagRetention model.Duration
newFlagRetention model.Duration

chosen model.Duration
}{
// Case 1: both are default (unset flags).
{defaultRetention, defaultRetention, defaultRetention},
// Case 2: old flag is set and new flag is unset.
{retention1, defaultRetention, retention1},
// Case 3: old flag is unset and new flag is set.
{defaultRetention, retention2, retention2},
// Case 4: both flags are set.
{retention1, retention2, retention2},
}

for _, tc := range cases {
retention := tsdb.ChooseRetention(tc.oldFlagRetention, tc.newFlagRetention)
testutil.Equals(t, tc.chosen, retention)
}
}