Auto-tweak checkpoint frequency based on the observed storage device performance profile #1922

Closed
RichiH opened this Issue Aug 25, 2016 · 17 comments

RichiH (Member) commented Aug 25, 2016

While AWS etc. might hide this information, the common case on Linux 2.6.29 and later is:

cat /sys/block/sda/queue/rotational
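
In Go, such a check could look roughly like the sketch below (a hypothetical helper, assuming the device shows up directly under /sys/block; not existing Prometheus code):

```go
// Hypothetical sketch: read the per-device rotational flag that
// Linux >= 2.6.29 exposes in sysfs.
package main

import (
	"fmt"
	"os"
	"strings"
)

// isRotational reports whether the kernel flags the given block device
// (e.g. "sda") as rotational. It only works for devices that appear
// directly under /sys/block.
func isRotational(dev string) (bool, error) {
	b, err := os.ReadFile("/sys/block/" + dev + "/queue/rotational")
	if err != nil {
		return false, err
	}
	return strings.TrimSpace(string(b)) == "1", nil
}

func main() {
	rot, err := isRotational("sda")
	if err != nil {
		fmt.Println("could not detect:", err)
		return
	}
	fmt.Println("sda rotational:", rot)
}
```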


@beorn7 beorn7 closed this Aug 25, 2016

@beorn7 beorn7 reopened this Aug 25, 2016

beorn7 (Member) commented Aug 25, 2016

Sorry, my touchpad auto-clicked...

beorn7 (Member) commented Aug 25, 2016

Yes, that's the way once you have arrived at your "real" device.

However, on modern servers, there might be a whole stack of devices on top of each other.
For example, my home directory on the server I'm working with right now is mounted on device /dev/mapper/cr_home. That doesn't even have an entry in /sys/block. I have to run dmsetup ls to find out which volume group backs that device, in this case vg0. Then I have to run the lvm tool to find out what devices back that volume group, in this case /dev/md1, which is part of a RAID. It has an entry /sys/block/md1/queue/rotational, but then I'm not even sure if that represents the state of the underlying hardware devices, which have to be inquired separately with mdadm.

And that's all just one possible setup. We didn't even have overlay FSs or containers in the game.
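
To illustrate how involved resolving such a stack can get, here is a rough Go sketch (hypothetical, not Prometheus code) that walks the /sys/block/<dev>/slaves links down to the leaf devices; it already glosses over partitions, hardware RAID controllers, and anything a virtualization layer hides:

```go
// Hypothetical sketch: resolve a stacked (dm/md) block device to the
// devices underneath it by following the sysfs "slaves" directories.
package main

import (
	"fmt"
	"os"
)

// underlyingDevices recursively resolves the slaves of a block device
// (e.g. "dm-0" or "md1") down to devices that have no slaves of their own.
func underlyingDevices(dev string) []string {
	entries, err := os.ReadDir("/sys/block/" + dev + "/slaves")
	if err != nil || len(entries) == 0 {
		// Leaf device, or something we cannot resolve further: give up here.
		return []string{dev}
	}
	var leaves []string
	for _, e := range entries {
		leaves = append(leaves, underlyingDevices(e.Name())...)
	}
	return leaves
}

func main() {
	fmt.Println(underlyingDevices("dm-0"))
}
```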

RichiH (Member, Author) commented Aug 25, 2016

Hmm, meh. In my limited test across a few VMs, I found that all of them expose this.

I do wonder if this is a case of "if we can detect it easily, we should", ignoring the rest for now. It might be worthwhile to tell the user when we cannot detect it, but that would be a second step.

fabxc (Member) commented Sep 5, 2016

This issue would profit from a description of why we want this. Do we have any proposals for handling spinning disks differently that address existing performance issues and can be implemented with reasonable effort?

beorn7 (Member) commented Sep 5, 2016

Yeah, context is missing. This all stems from the default value of the -storage.local.checkpoint-dirty-series-limit flag, which should be set much higher than the default if you are on an SSD.

flaviusanton commented Sep 14, 2016

Disclaimer: Sorry if I am out of context; I found out what Prometheus is about 30 minutes ago. It seemed interesting, so I took a glance over the open issues.

It seems like this issue is an XY problem. You don't really care whether the underlying hardware is a plain old HDD or an SSD; you just care whether it can perform the checkpoint in under a minute, which, in my understanding, means lots of IOPS and/or throughput (I guess it's the former). If you really want to auto-determine the right value for storage.local.checkpoint-dirty-series-limit, why not run a quick benchmark at start time or some other time? Adding to @beorn7's point, nowadays lots of people use cloud services which have totally weird setups for storage; hell, you can even run on a RAM disk if you want to.
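
A crude version of such a start-time probe might look like this Go sketch (purely illustrative; as the next comment points out, writeback caches limit how much it can tell you):

```go
// Hypothetical sketch: time a burst of small fsync'ed writes in the data
// directory and treat the average latency as a rough hint about the device.
package main

import (
	"fmt"
	"os"
	"time"
)

// probeSyncWriteLatency writes n small blocks, syncing after each one, and
// returns the average latency per synced write.
func probeSyncWriteLatency(dir string, n int) (time.Duration, error) {
	f, err := os.CreateTemp(dir, "ioprobe-*")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	buf := make([]byte, 4096)
	start := time.Now()
	for i := 0; i < n; i++ {
		if _, err := f.Write(buf); err != nil {
			return 0, err
		}
		if err := f.Sync(); err != nil {
			return 0, err
		}
	}
	return time.Since(start) / time.Duration(n), nil
}

func main() {
	lat, err := probeSyncWriteLatency(os.TempDir(), 100)
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Println("avg latency per synced 4 KiB write:", lat)
}
```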

My 2¢.
Flavius

brian-brazil (Member) commented Sep 14, 2016

A quick benchmark would be defeated by a writeback cache, and a benchmark of sufficient length would take too long.

beorn7 (Member) commented Sep 14, 2016

In general, the idea might work. But let's not do a separate benchmark; let's observe the real-life behavior instead. Prometheus could, for a while, try to run series maintenance as fast as possible and measure the timing. (A high sustained maintenance frequency hints towards a device with fast seeks.) Similarly, it could evaluate the time needed for checkpointing, which we are measuring anyway; here a fast checkpoint hints towards a device with fast linear writes.
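
As a purely illustrative Go sketch (made-up names, not the actual storage code), the observed checkpoint duration could feed back into the checkpoint interval like this:

```go
// Hypothetical sketch of the "observe real-life behavior" idea: derive the
// next checkpoint interval from how long checkpoints actually take, instead
// of benchmarking the device separately.
package main

import (
	"fmt"
	"time"
)

// checkpointTuner keeps a smoothed estimate of the checkpoint duration and
// suggests the minimum wait before the next checkpoint.
type checkpointTuner struct {
	minInterval time.Duration // configured lower bound
	smoothed    time.Duration // smoothed observed checkpoint duration
}

// observe records the duration of a finished checkpoint.
func (t *checkpointTuner) observe(d time.Duration) {
	if t.smoothed == 0 {
		t.smoothed = d
		return
	}
	// Simple exponential smoothing with alpha = 0.5.
	t.smoothed = (t.smoothed + d) / 2
}

// nextInterval never returns less than the configured minimum, and never
// less than the time checkpoints have been taking, so a device with slow
// linear writes is checkpointed less often.
func (t *checkpointTuner) nextInterval() time.Duration {
	if t.smoothed > t.minInterval {
		return t.smoothed
	}
	return t.minInterval
}

func main() {
	t := checkpointTuner{minInterval: 5 * time.Minute}
	t.observe(8 * time.Minute) // a slow checkpoint was observed
	fmt.Println("wait before next checkpoint:", t.nextInterval())
}
```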

@beorn7 beorn7 changed the title Detect if Prometheus is running on spinning disk or SSD. Auto-tweak checkpoint frequency based on the observed storage device performance profile Sep 14, 2016

@beorn7 beorn7 self-assigned this Sep 14, 2016

beorn7 (Member) commented Sep 14, 2016

Changed the title accordingly. This is in a similar area as #455: essentially relieving the user from tuning many flags depending on the given situation.

mattbostock (Contributor) commented Sep 15, 2016

I wonder if detecting the storage type is better done in configuration management (using facts/grains etc.), which can then set the appropriate Prometheus configuration options.


beorn7 (Member) commented Sep 15, 2016

> I wonder if detecting the storage type is better done in configuration management (using facts/grains etc.), which can then set the appropriate Prometheus configuration options.

Yes, that's the currently recommended way.

The whole point of this issue is that people want to have fewer command line flags to understand and tweak.

The result might very well be that auto-tuning turns out to be harmful and we'll just stick with the current way. But it's at least worth a thought.

beorn7 (Member) commented Mar 27, 2017

Another aspect here is to checkpoint less often if the checkpoint is already quite large. A vicious cycle looms here: There are too many chunks waiting for persistence already. Which causes a large checkpoint. Which causes long checkpointing times. Which takes a lot of disk bandwidth away from persisting chunks. Which increases the number of chunks waiting for persistence even more. Which makes the checkpoint even larger.

I have seen SSDs locking up completely for minutes after a checkpoint. They really don't like the large sustained write of a checkpoint in combination with the many smaller random writes to series files. Things become really bad if the SSD is more than 70% full. (Which is why some people overprovision SSDs anyway, i.e. they only use 70% of the available space for the filesystem and leave the rest to the device as breathing room.)

We already prevent early checkpoints (because of the dirty series count) in rushed mode, but we probably need to extend those heuristics, e.g. scale up the checkpointing interval with persistence urgency, or wait at least as long between checkpoints as the last checkpoint took.
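
A minimal Go sketch of that extended heuristic (illustrative only; the urgency scaling is an assumption, not existing code):

```go
// Hypothetical sketch: wait at least as long as the last checkpoint took,
// and back off further under persistence pressure ("rushed mode").
package main

import (
	"fmt"
	"time"
)

// checkpointWait returns how long to wait after a checkpoint that took
// lastDuration, given a persistence urgency score in [0, 1] (1 = rushed).
func checkpointWait(base, lastDuration time.Duration, urgency float64) time.Duration {
	wait := base
	if lastDuration > wait {
		// Never checkpoint more often than the checkpoints themselves take.
		wait = lastDuration
	}
	// Back off further when persistence is urgent, so checkpointing does not
	// steal disk bandwidth from persisting chunks.
	return time.Duration(float64(wait) * (1 + urgency))
}

func main() {
	fmt.Println(checkpointWait(5*time.Minute, 12*time.Minute, 0.8))
}
```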

brian-brazil (Member) commented Mar 27, 2017

> wait at least as long between checkpoints as the last checkpoint took.

That sounds nice and simple to me.

I'd avoid putting too much work in here, as the new storage will obsolete it.

beorn7 (Member) commented Apr 5, 2017

I'll put an easy fix in place to always wait at least as long after each checkpoint as the checkpoint took.

As @brian-brazil said, in v2.0, there won't be checkpoints anymore.

beorn7 (Member) commented Apr 7, 2017

Closing as #2591 is as much as we will do about checkpointing before everybody uses v2.0...

@beorn7 beorn7 closed this Apr 7, 2017


@lock lock bot locked and limited conversation to collaborators Mar 23, 2019
