
Possible locking between multiple flashcache devices on one host #212

Closed

byo opened this issue Sep 25, 2015 · 0 comments

Comments


byo commented Sep 25, 2015

Our test setup consists of 3 machines, each with one NVMe drive and 12 HDDs.
The NVMe drive is split into 12 partitions, each serving as the caching layer for its associated HDD.
The machines are part of a Ceph cluster, and each flashcache device is used independently.

Non-default sysctl variables on all devices are:
skip_seq_thresh_kb = 256
reclaim_policy = 1
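
For context, here is a minimal, hedged sketch (plain C, not the flashcache source) of how a sequential-skip threshold like skip_seq_thresh_kb is commonly implemented: track the expected next sector per stream and bypass the cache once a contiguous run exceeds the configured size.

#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9

struct seq_stream {
    uint64_t next_sector;   /* sector expected if the stream stays sequential */
    uint64_t run_bytes;     /* bytes accumulated in the current contiguous run */
};

/* Return true when the I/O should bypass the cache as "sequential".
 * Illustrative only; the real detection logic lives inside flashcache. */
static bool skip_as_sequential(struct seq_stream *s, uint64_t sector,
                               uint32_t bytes, uint32_t skip_seq_thresh_kb)
{
    if (skip_seq_thresh_kb == 0)        /* feature disabled */
        return false;

    if (sector == s->next_sector)       /* continues the previous run */
        s->run_bytes += bytes;
    else                                /* random access, start a new run */
        s->run_bytes = bytes;

    s->next_sector = sector + (bytes >> SECTOR_SHIFT);
    return s->run_bytes >= (uint64_t)skip_seq_thresh_kb * 1024;
}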

We've observed periodic stalls on the flashcache devices during tests (basically no I/O at all) lasting from a few seconds to a few minutes. Most of the test I/O operations were crossing the sequential threshold. During such stall periods:

  • the nr_queued value on all flashcache devices is large (>10k)
  • only one flashcache device is able to lower its nr_queued at a time - its underlying HDD shows 100% util in iotop, while all other HDD devices show marginal values or no I/O at all
  • if fallow_delay is set to 0, perf top shows flashcache_deq_pending at the top; the hot spot is inside its internal for loop (near if (node->index == index) {) - see the sketch after this list
  • if fallow_delay is set to 900, perf top shows either flashcache_clean_set or _raw_spin_lock_irq (called from flashcache_clean_set); the hot spot is in one of flashcache_clean_set's for loops (near if (!(cacheblk->cache_state & DIRTY_FALLOW_2)))
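
For reference, a minimal sketch of the kind of scan that would produce this profile (illustrative only, not the actual flashcache code): if pending jobs sit on one list and every dequeue walks it looking for a matching cache-block index, the cost grows linearly with nr_queued, and since the walk happens under a spinlock it also blocks everything else waiting on that lock.

#include <stddef.h>

struct pending_job {
    int index;                    /* cache-block index the job waits on */
    struct pending_job *next;
};

/*
 * Illustrative sketch, assuming a single shared pending-job list: with >10k
 * queued jobs this loop dominates the profile, and the spinlock held by the
 * caller around it keeps the other cache devices waiting.
 */
static struct pending_job *deq_pending_sketch(struct pending_job **head, int index)
{
    struct pending_job *node, *prev = NULL;

    for (node = *head; node != NULL; prev = node, node = node->next) {
        if (node->index == index) {
            if (prev)
                prev->next = node->next;
            else
                *head = node->next;
            node->next = NULL;
            return node;
        }
    }
    return NULL;
}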

From what I read in the code, I understand that flashcache uses global kernel thread pools for its jobs. Is it possible that all cleaning jobs are executed on a single core only? That could explain why only one HDD can be cleaned at a time.
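
To make that question concrete, here is a hedged kernel-style sketch (names such as shared_clean_wq and cache_dev are assumptions, not flashcache's actual structures): if every cache device queues its set cleaning onto one single-threaded workqueue, the jobs run back to back on a single worker thread, which would match seeing only one backing HDD at 100% util at a time.

#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/workqueue.h>

/* Hypothetical: one workqueue shared by all 12 cache devices on the host. */
static struct workqueue_struct *shared_clean_wq;

struct cache_dev {
    struct work_struct clean_work;
    /* ... per-device state ... */
};

static void clean_sets(struct work_struct *w)
{
    struct cache_dev *dev = container_of(w, struct cache_dev, clean_work);
    /* write back dirty blocks of this device's sets */
    (void)dev;
}

static int cache_dev_setup(struct cache_dev *dev)
{
    if (!shared_clean_wq)
        shared_clean_wq = create_singlethread_workqueue("clean_wq");
    if (!shared_clean_wq)
        return -ENOMEM;
    INIT_WORK(&dev->clean_work, clean_sets);
    return 0;
}

static void kick_cleaning(struct cache_dev *dev)
{
    /*
     * All devices funnel into the same single worker thread here, so their
     * cleaning passes execute sequentially; per-device workqueues (or one
     * allowing more concurrency) would let the sets clean in parallel.
     */
    queue_work(shared_clean_wq, &dev->clean_work);
}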

When skip_seq_thresh_kb = 0 and fallow_delay = 0, the cluster can handle its load without issues. If the dirty percentage is near 99%, the queues start filling up, but they're cleaned up simultaneously and multiple HDDs are at 100% util.

Sysctl dump of a sample drive (configuration causing issues):

cache_all = 1
clean_on_read_miss = 0
clean_on_write_miss = 0
dirty_thresh_pct = 20
do_pid_expiry = 0
do_sync = 0
fallow_clean_speed = 2
fallow_delay = 900
fast_remove = 1
io_latency_hist = 0
lru_hot_pct = 75
lru_promote_thresh = 2
max_clean_ios_set = 2
max_clean_ios_total = 4
max_pids = 100
new_style_write_merge = 0
pid_expiry_secs = 60
reclaim_policy = 1
skip_seq_thresh_kb = 256
stop_sync = 0
zero_stats = 0
byo closed this as completed Dec 8, 2017