writeback mode? #23

Open
syadnom opened this Issue Dec 18, 2017 · 4 comments


syadnom commented Dec 18, 2017

I have a use case where write-back mode is ideal. To be clear about what I mean: I want a write to be considered complete once it has been written to the rdX device, and then have that data written to disk asynchronously.

I'd also like to see a timed flush and delayed write.

Here is the scenario:

Many small writes come in, enough to overwhelm a spinning disk's random-write capacity.
Some reads come in, and that hurts random writes even more.
Data arrives at a high enough volume that it does need to be written pretty soon, or the cache will overflow.

What I want:
Writes go to rdX, rd holds them for a timer interval, then flushes them to disk, creating a write-deferred cache. This turns overwhelming random writes into a comfortable amount of sequential writes.
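To make the request concrete, here is a minimal user-space sketch of the write-deferred idea, not RapidDisk code. The class name `DeferredWriteCache`, its parameters, and the `backing_write` callback are all hypothetical. It assumes block-addressed writes, acknowledges each write as soon as it is held in RAM, and flushes dirty blocks in ascending block order when either a timer interval has elapsed or a dirty-block limit is reached. To keep it short, the timer is only checked when a write arrives; a real implementation would need a background flusher and would lose any unflushed data on power failure.

```python
import time
from typing import Callable, Dict


class DeferredWriteCache:
    """Toy write-deferred cache: acknowledge from RAM, flush in sorted batches."""

    def __init__(self, flush_interval: float, max_dirty_blocks: int,
                 backing_write: Callable[[int, bytes], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.flush_interval = flush_interval      # seconds between timed flushes
        self.max_dirty_blocks = max_dirty_blocks  # flush early before RAM overflows
        self.backing_write = backing_write        # hypothetical "write to disk" hook
        self.clock = clock
        self.dirty: Dict[int, bytes] = {}         # block number -> latest data
        self.last_flush = clock()

    def write(self, block: int, data: bytes) -> None:
        # The write counts as complete once it is in RAM. A later write to
        # the same block replaces the earlier one, so only one disk write
        # is issued for that block per flush.
        self.dirty[block] = data
        timer_expired = self.clock() - self.last_flush >= self.flush_interval
        if timer_expired or len(self.dirty) >= self.max_dirty_blocks:
            self.flush()

    def flush(self) -> None:
        # Issue the pending writes in ascending block order so the disk
        # sees a mostly sequential pass instead of scattered seeks.
        for block in sorted(self.dirty):
            self.backing_write(block, self.dirty[block])
        self.dirty.clear()
        self.last_flush = self.clock()


if __name__ == "__main__":
    flushed = []
    cache = DeferredWriteCache(flush_interval=5.0, max_dirty_blocks=4,
                               backing_write=lambda blk, _data: flushed.append(blk))
    for blk in (90, 7, 42, 7, 13):   # random-order writes, block 7 written twice
        cache.write(blk, b"frame")
    print(flushed)                   # [7, 13, 42, 90]
```

In this sketch the five incoming writes become four disk writes in block order, which is the random-to-sequential effect described above.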

This might look like a corner case, but it's a very common scenario in security camera storage systems. You might only have ~12 MBps in writes, but those are random and overwhelm a WD Red quickly. A write-deferred option solves this 100%.

There is evidence of this too. The Windows program PrimoCache does exactly this, and it allows a Windows server to handle 4x as many streams easily.

pkoutoupis self-assigned this Dec 18, 2017


pkoutoupis (Owner) commented Dec 20, 2017

Before rushing to reply to this one, I wanted to give it the proper amount of thought. As you are aware, RapidDisk-Cache lacks a write-back mode. To date, that has been by design. Traditional RAM is volatile, and enabling a write-back caching mode without some way of protecting the cached data in RAM would result in user data loss or possibly corruption. If, for instance, there were a BBU (battery backup unit) holding the PC in a powered state, an emergency "flush" could be invoked to persist all pending and cached data.

Even if that problem were easily solved, implementing a write-back solution is no trivial matter, even if you are only holding X transactions at a time. If you want to accomplish something similar, and assuming that you are using Ext4, you can easily relieve pressure on the disks by having the file system commit journal contents less frequently. For instance, setting a commit interval of 10 seconds as opposed to the default 5 may show a nice bump in performance and will also allow the file system, and in turn the block I/O subsystem, to group contiguous writes into fewer and larger commands. This could potentially reduce the bottleneck when accessing those disk drives. Another thing worth considering is enabling delayed allocation for the same file system. This feature holds off on allocating extents or block regions on the disk device by leaving the contents in the memory cache for a bit longer. You then have a higher probability of allocating larger contiguous block regions when the time comes for that data to flush.

Either way, both options, when mishandled, can and will result in data loss, at least for the data pending between sync intervals. Such are the woes of leveraging DRAM...
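As a rough illustration of the two Ext4 suggestions above, the sketch below builds a mount option string and a `mount` argument list without running anything. The helper names, the device `/dev/sdX1`, and the mount point `/mnt/video` are hypothetical placeholders; `commit=<seconds>`, `delalloc`, and `nodelalloc` are standard ext4 mount options. Note that ext4 normally has delayed allocation on by default, so in practice the point is to make sure it has not been disabled with `nodelalloc`.

```python
from typing import List


def ext4_mount_options(commit_seconds: int = 10,
                       delayed_allocation: bool = True) -> str:
    """Return an ext4 option string for use with `mount -o` or /etc/fstab."""
    if commit_seconds <= 0:
        # This sketch only accepts an explicit positive interval.
        raise ValueError("commit_seconds must be positive")
    options = [
        f"commit={commit_seconds}",  # journal commit interval in seconds (ext4 default: 5)
        "delalloc" if delayed_allocation else "nodelalloc",
    ]
    return ",".join(options)


def ext4_mount_command(device: str, mountpoint: str, options: str) -> List[str]:
    """Build (but do not run) the mount command for the given options."""
    return ["mount", "-t", "ext4", "-o", options, device, mountpoint]


if __name__ == "__main__":
    opts = ext4_mount_options(commit_seconds=10)
    # /dev/sdX1 and /mnt/video are placeholders, not real paths.
    print(" ".join(ext4_mount_command("/dev/sdX1", "/mnt/video", opts)))
```

A longer commit interval widens the window of data that is lost if power fails before the next commit, which is the trade-off described above.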



syadnom commented Dec 20, 2017


pkoutoupis (Owner) commented Dec 22, 2017

> XFS is by far the preferred filesystem for these workloads.

In my experience, XFS also suffers with small-file data, more specifically on reads. But I imagine that this is not the typical I/O profile hitting these disks anyway. They are probably very write-heavy and read less frequently (on an as-needed basis).

> I think rapiddisk is the closest project to accomplishing a true delayed-write cache because it's already able to do regular caching that is seamless.

Well, it doesn't yet, but let me give this some thought and see how much work it would take (i.e., to batch writes at set intervals).



syadnom commented Dec 22, 2017
