Our test setup consists of 3 machines, each with 1 NVMe drive and 12 HDDs.
The NVMe is split into 12 partitions, each serving as the caching layer for its associated HDD.
The machines are part of a Ceph cluster; each NVMe partition is used independently.
Non-default sysctl variables on all devices are:
`skip_seq_thresh_kb = 256`
`reclaim_policy = 1`
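For reference, this is how we apply those two values; a minimal sketch, assuming flashcache's usual sysctl layout under `/proc/sys/dev/flashcache/<cachedev>/`, where `cachedev` is a placeholder for the real cache device name:

```c
#include <stdio.h>

/* Hedged sketch: writes the two non-default values listed above.
 * "cachedev" is a placeholder; flashcache names its sysctl directory
 * after the disk+cache device pair (assumption based on its docs). */
static int write_sysctl(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");

	if (f == NULL) {
		perror(path);
		return -1;
	}
	fprintf(f, "%s\n", value);
	return fclose(f);
}

int main(void)
{
	write_sysctl("/proc/sys/dev/flashcache/cachedev/skip_seq_thresh_kb", "256");
	write_sysctl("/proc/sys/dev/flashcache/cachedev/reclaim_policy", "1");
	return 0;
}
```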
We've observed periodic stalls on the flashcache devices during tests (basically no IO at all), lasting from a few seconds to a few minutes. Most of the test IO ops were crossing the sequential threshold. During such stall periods:
- the `nr_queued` value on all flashcache devices is large (>10k)
- only one flashcache device at a time is able to lower its `nr_queued`; its underlying HDD shows 100% util in iotop, while all other HDDs show marginal values or no IO at all
- if `fallow_delay` is set to 0, perf top shows `flashcache_deq_pending` at the top; the hot spot is inside its internal for loop (near `if (node->index == index) {`, see the sketch after this list)
- if `fallow_delay` is set to 900, perf top shows either `flashcache_clean_set` or `_raw_spin_lock_irq` (called from `flashcache_clean_set`); the hot spot is in one of `flashcache_clean_set`'s for loops (near `if (!(cacheblk->cache_state & DIRTY_FALLOW_2))`)
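Since `fallow_delay = 0` pins perf top inside `flashcache_deq_pending`'s loop, here is a minimal sketch of the kind of linear scan that would produce that profile. This is paraphrased from my reading, not copied from the flashcache source; the struct layout and function name are illustrative:

```c
#include <stddef.h>

/* Illustrative only: a pending-job dequeue that walks the whole list
 * looking for the job tied to one cache block index. With nr_queued
 * in the tens of thousands, each call costs O(nr_queued); doing that
 * under a spinlock would pin one CPU, matching the perf top output. */
struct pending_job {
	int index;                  /* cache block the job waits on */
	struct pending_job *next;
};

static struct pending_job *
deq_pending_sketch(struct pending_job **head, int index)
{
	struct pending_job **prev = head, *node;

	for (node = *head; node != NULL; node = node->next) {
		if (node->index == index) {   /* the reported hot spot */
			*prev = node->next;   /* unlink and return the match */
			return node;
		}
		prev = &node->next;
	}
	return NULL;
}
```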
From what I read in the code, I understand that flashcache uses global kernel thread pools for its jobs. Is it possible that all cleaning jobs are executed on one core only? That would explain why only one HDD can clean itself at a time.
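To make the question concrete, the model I am worried about would look roughly like the kernel-style sketch below. This is an assumption about the threading layout, not the actual flashcache code; `clean_wq`, `clean_fn`, and `struct clean_work` are invented names:

```c
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/workqueue.h>

/* Assumed model: one single-threaded workqueue shared by ALL cache
 * devices. Every queued cleaning job then runs on one kernel thread
 * (one core), one after another, so only one backing HDD is written
 * at a time no matter how many devices have dirty blocks. */
static struct workqueue_struct *clean_wq;

struct clean_work {
	struct work_struct work;
	void *cache_dev;            /* hypothetical per-device context */
};

static void clean_fn(struct work_struct *w)
{
	struct clean_work *cw = container_of(w, struct clean_work, work);

	/* write back dirty blocks for cw->cache_dev; while this runs,
	 * every other device's cleaning job just sits in the queue */
	(void)cw;
}

static int sketch_setup(void)
{
	/* a single worker thread => cleaning serialized across devices */
	clean_wq = create_singlethread_workqueue("clean_sketch");
	return clean_wq ? 0 : -ENOMEM;
}
```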
When `skip_seq_thresh_kb = 0` and `fallow_delay = 0`, the cluster can handle its load without issues. When the dirty percentage is near 99%, queues start filling up, but they are cleaned up simultaneously and multiple HDDs are at 100% util.
Sysctl dump of a sample drive (configuration causing issues):