
block_cache: Compute the timeout dynamically based on write speeds.

We now record how long it takes to write a block (on average), and then
use this information to reduce the block writer thread's timeout
(to 2 * block_count * average_block_time), so we don't completely
congest the drive. This also removes the "TODO" about the I/O
scheduler; the new logic will be just fine even under an I/O scheduler.

Note that this change cuts both ways: faster writes mean more frequent
writes, while slower writes increase the timeout before the next write
begins. It also guards against queueing another write while one is
already in progress, which was not handled before.

Tested in KVM. Even on a SATA-backed spinning HDD, this reduces the
timeout to around *200ms* on average (!!), a 10x improvement. On a
ramdisk, it reduces the timeout to *10-30ms* on average (!!!), a
100-200x improvement. This change will benefit everyone, but SSDs
especially.

Since BFS inode and journal writes always go through the block_cache,
this dramatically improves inode-related write performance. The
"stop and start" stutters when emptying or moving items to Trash
seem to be completely gone, among many other improvements.

Change-Id: I41f46a6432ce1f50f896a853abdfe22dde0ba327
waddlesplash committed Jul 11, 2019
1 parent 9d06690 commit 6d336fda4aca32649de3e1d91403da4452f4bef8
Showing with 50 additions and 8 deletions.
  1. +50 −8 src/system/kernel/cache/block_cache.cpp
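
Before the diff, a minimal standalone sketch of the timeout formula
described in the commit message may help. This is an illustration under
stated assumptions, not the committed code: next_writer_timeout and
kBlocksPerPass are hypothetical names, and plain C++ stand-ins replace
the kernel types (the real logic appears in the diff below).

// Illustrative sketch only -- models the "2 * block_count *
// average_block_time" rule from the commit message. All names here are
// hypothetical; the committed logic lives in block_notifier_and_writer().
#include <algorithm>
#include <cstdint>

typedef int64_t bigtime_t;	// microseconds, as in Haiku

static const uint32_t kBlocksPerPass = 64;	// blocks written per pass
static const bigtime_t kDefaultTimeout = 2000000LL;	// 2-second fallback

// Given the measured average time to write one block, return how long
// the writer thread should wait before its next pass.
static bigtime_t
next_writer_timeout(bigtime_t averageBlockTime)
{
	if (averageBlockTime <= 0)
		return kDefaultTimeout;	// nothing measured yet: keep the old 2s
	// Wait 2x as long as a full 64-block pass takes, so that cache
	// writeback never completely congests the drive.
	return std::min(kDefaultTimeout,
		2 * (bigtime_t)kBlocksPerPass * averageBlockTime);
}

Under this formula, the ~200ms HDD timeout reported above corresponds to
an average of about 200ms / (2 * 64) ≈ 1.6ms per block, and the 10-30ms
ramdisk timeout to roughly 80-230µs per block.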
@@ -226,6 +226,9 @@ struct block_cache : DoublyLinkedListLinkImpl<block_cache> {
 	uint32			busy_writing_count;
 	bool			busy_writing_waiters;
 
+	bigtime_t		last_block_write;
+	bigtime_t		last_block_write_duration;
+
 	uint32			num_dirty_blocks;
 	bool			read_only;

@@ -1203,6 +1206,8 @@ BlockWriter::Write(cache_transaction* transaction, bool canUnlock)
 	qsort(fBlocks, fCount, sizeof(void*), &_CompareBlocks);
 	fDeletedTransaction = false;
 
+	bigtime_t start = system_time();
+
 	for (uint32 i = 0; i < fCount; i++) {
 		status_t status = _WriteBlock(fBlocks[i]);
 		if (status != B_OK) {
@@ -1216,9 +1221,17 @@ BlockWriter::Write(cache_transaction* transaction, bool canUnlock)
 		}
 	}
 
+	bigtime_t finish = system_time();
+
 	if (canUnlock)
 		mutex_lock(&fCache->lock);
 
+	if (fStatus == B_OK && fCount >= 8) {
+		fCache->last_block_write = finish;
+		fCache->last_block_write_duration = (fCache->last_block_write - start)
+			/ fCount;
+	}
+
 	for (uint32 i = 0; i < fCount; i++)
 		_BlockDone(fBlocks[i], transaction);

@@ -1392,6 +1405,8 @@ block_cache::block_cache(int _fd, off_t numBlocks, size_t blockSize,
 	busy_reading_waiters(false),
 	busy_writing_count(0),
 	busy_writing_waiters(false),
+	last_block_write(0),
+	last_block_write_duration(0),
 	num_dirty_blocks(0),
 	read_only(readOnly)
 {
@@ -2538,8 +2553,8 @@ get_next_locked_block_cache(block_cache* last)
 static status_t
 block_notifier_and_writer(void* /*data*/)
 {
-	const bigtime_t kTimeout = 2000000LL;
-	bigtime_t timeout = kTimeout;
+	const bigtime_t kDefaultTimeout = 2000000LL;
+	bigtime_t timeout = kDefaultTimeout;
 
 	while (true) {
 		bigtime_t start = system_time();
@@ -2552,15 +2567,32 @@ block_notifier_and_writer(void* /*data*/)
 			continue;
 		}
 
-		// write 64 blocks of each block_cache every two seconds
-		// TODO: change this once we have an I/O scheduler
-		timeout = kTimeout;
+		// Write 64 blocks of each block_cache roughly every 2 seconds,
+		// potentially more or less depending on congestion and drive speeds
+		// (usually much less.) We do not want to queue everything at once
+		// because a future transaction might then get held up waiting for
+		// a specific block to be written.
+		timeout = kDefaultTimeout;
 		size_t usedMemory;
 		object_cache_get_usage(sBlockCache, &usedMemory);
 
 		block_cache* cache = NULL;
 		while ((cache = get_next_locked_block_cache(cache)) != NULL) {
+			// Give some breathing room: wait 2x the length of the potential
+			// maximum block count-sized write between writes, and also skip
+			// if there are more than 16 blocks currently being written.
+			const bigtime_t next = cache->last_block_write
+				+ cache->last_block_write_duration * 2 * 64;
+			if (cache->busy_writing_count > 16 || system_time() < next) {
+				if (cache->last_block_write_duration > 0) {
+					timeout = min_c(timeout,
+						cache->last_block_write_duration * 2 * 64);
+				}
+				continue;
+			}
+
 			BlockWriter writer(cache, 64);
+			bool hasMoreBlocks = false;
 
 			size_t cacheUsedMemory;
 			object_cache_get_usage(cache->buffer_cache, &cacheUsedMemory);
@@ -2573,10 +2605,11 @@ block_notifier_and_writer(void* /*data*/)

 				while (iterator.HasNext()) {
 					cached_block* block = iterator.Next();
-					if (block->CanBeWritten() && !writer.Add(block))
+					if (block->CanBeWritten() && !writer.Add(block)) {
+						hasMoreBlocks = true;
 						break;
+					}
 				}
 
 			} else {
 				TransactionTable::Iterator iterator(cache->transaction_hash);

@@ -2594,13 +2627,22 @@ block_notifier_and_writer(void* /*data*/)

 					bool hasLeftOvers;
 						// we ignore this one
-					if (!writer.Add(transaction, hasLeftOvers))
+					if (!writer.Add(transaction, hasLeftOvers)) {
+						hasMoreBlocks = true;
 						break;
+					}
 				}
 			}
 
 			writer.Write();
 
+			if (hasMoreBlocks && cache->last_block_write_duration > 0) {
+				// There are probably still more blocks that we could write, so
+				// see if we can decrease the timeout.
+				timeout = min_c(timeout,
+					cache->last_block_write_duration * 2 * 64);
+			}
+
 			if ((block_cache_used_memory() / B_PAGE_SIZE)
 					> vm_page_num_pages() / 2) {
 				// Try to reduce memory usage to half of the available
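
To summarize the writer-loop pacing outside the diff context, here is a
condensed sketch. The Cache struct, should_skip_cache(), and
reduce_timeout() are hypothetical stand-ins, assuming the same constants
(64 blocks per pass, 16 in-flight blocks, 2x headroom) as the committed
code.

// Condensed sketch with hypothetical types -- not kernel code. It mirrors
// the pacing decisions block_notifier_and_writer() makes in the diff above.
#include <cstdint>

typedef int64_t bigtime_t;	// microseconds

struct Cache {
	bigtime_t last_block_write;	// when the last write pass finished
	bigtime_t last_block_write_duration;	// average time per block
	uint32_t busy_writing_count;	// blocks currently being written
};

// A cache is skipped this pass if too many writes are already in flight,
// or if the 2x "breathing room" window since its last 64-block pass has
// not elapsed yet.
static bool
should_skip_cache(const Cache& cache, bigtime_t now)
{
	const bigtime_t next = cache.last_block_write
		+ cache.last_block_write_duration * 2 * 64;
	return cache.busy_writing_count > 16 || now < next;
}

// Whether a cache was skipped or still has more dirty blocks, the loop
// shrinks its sleep so the next pass starts as soon as the window opens.
static bigtime_t
reduce_timeout(bigtime_t timeout, const Cache& cache)
{
	const bigtime_t window = cache.last_block_write_duration * 2 * 64;
	if (cache.last_block_write_duration > 0 && window < timeout)
		return window;
	return timeout;
}

Because a skipped cache still lowers the loop's timeout, a write already
in progress merely delays the next pass instead of stacking a second one
behind it -- the "guards against queueing another write while one is
already in progress" behavior described in the commit message.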
