read-only cache caused data corruption #59

Open
ad65190 opened this issue Oct 3, 2020 · 17 comments

Comments

@ad65190

ad65190 commented Oct 3, 2020

I am writing to share my experience with this fork of EnhanceIO. After searching for the available SSD-caching solutions on Linux, I decided to at least try it out because it seemed like a good match for my needs. My OS is Ubuntu 20.04.1 with kernel 5.4.0-48-lowlatency; the user home is on an iSCSI target and the SSD cache is an NVMe drive attached over USB.

Compilation went fine, the modules were loaded and the cache was created successfully. I thought I was being careful by using the read-only cache mode and specifying the block devices by ID. I ran a few tests; the SSD was blinking and the statistics in /proc showed that the cache was in use.
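For reference, the setup was roughly the following (a reconstruction rather than my exact command; the device paths are placeholders for the /dev/disk/by-id links I actually used, and the options follow the eio_cli usage described in the EnhanceIO README):

```sh
# Create a read-only EnhanceIO cache (placeholder device paths):
#   -d  source device (the iSCSI-backed disk array)
#   -s  caching device (the NVMe SSD on USB)
#   -m  cache mode, ro = read-only
#   -c  cache name
sudo eio_cli create -d /dev/disk/by-id/scsi-EXAMPLE_SOURCE \
                    -s /dev/disk/by-id/usb-EXAMPLE_NVME \
                    -m ro -c home_cache

# Cache statistics then show up under /proc/enhanceio/home_cache/
```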

I then rebooted the system to see how that would work out. It didn't. The source disk array was marked dirty, and the filesystem check revealed extensive data corruption and non-recoverable errors. I then repeated the process once more for good measure: a clean reinstall and reinitialisation of the cache device. The results were the same. Comparing with my backups, the errors occurred around the areas of the filesystem I had touched during my tests.

How a read-only cache could damage the source device is beyond me. Perhaps someone more familiar with the project and the underlying code could shed some light. Time permitting, I am willing to assist in debugging.

@Ristovski

Ristovski commented Oct 3, 2020

I have had similar disk 'corruption' happen, but each time (twice so far, over months of use) it was only after a kernel hang or unexpected shutdown. I'm not sure what causes the source disk to corrupt, but the corruption persisted through fsck runs until I disengaged the cache, meaning you should never run fsck with the cache still attached. I have no idea why this can even happen, since, again, it's a read-only cache.
Once the cache was disabled, fsck was able to recover the corrupted blocks with ease.

EnhanceIO is pretty nice, but by no means perfect, not to mention that a recent 5.9 kernel block-subsystem update broke it (some BIO-related changes which I am not sure how to incorporate into EnhanceIO). I suppose a more permanent solution like LVM cache or Bcache would be more appropriate for sensitive data.

@onlyjob

onlyjob commented Oct 4, 2020

I suppose a more permanent solution like LVM cache or Bcache would be more appropriate for sensitive data.

IMHO dm-writeboost is the best alternative.

Bcache is probably the worst, as it requires a specially formatted device and therefore cannot be used with existing block devices (with data). At least that was the case the last time I checked it, years ago...

@Ristovski

Ristovski commented Oct 4, 2020

@onlyjob Nice! I had not heard of dm-writeboost until now. It does seem like a good alternative, but my only concern is what appears to be significantly higher memory usage compared to EnhanceIO (does dm-writeboost support a read-only cache? Would that alleviate the higher memory usage somewhat, and by what margin?).

Another drawback of dm-writeboost, I guess, would be its longer cache-metadata loading time after a reboot, which according to the docs can take "a long time" depending on the cache size, though I have not seen a specific metric (for example, for a 512 GB cache).

@tanantharaman

tanantharaman commented Feb 26, 2021

dm-writeboost does NOT support existing block devices with data, though there has been an open enhancement request since 2017 to add that feature, specifically because EnhanceIO has it!

With dm-writeboost, just as with Bcache, you combine an HDD block device and an SSD block device to form a dm-writeboost block device, which then has to be formatted as ext4 or xfs using mkfs. All prior data on the HDD block device is lost.

However, dm-writeboost can be used on CentOS 7, while Bcache requires at least CentOS 8 (CentOS 7 ships a Linux 3.10 kernel with certain features removed, hence the 3.10.0 versioning, so it cannot support Bcache, which requires at least the full Linux 3.10 kernel).

@onlyjob

onlyjob commented Feb 26, 2021

dm-writeboost does NOT support existing block devices with data

@tanantharaman, stop that nonsense already. I have been using dm-writeboost with existing devices (with data) all the time, ever since the early releases.

dm-writeboost is unlike Bcache, which requires a pre-formatted device and does not use log-structured writes.

You have utterly failed at your comparison research and misunderstood a great deal. :(

@Ristovski

Ristovski commented Feb 26, 2021

@onlyjob Do you happen to have a comparison of memory usage? From what I could gather from the docs, it appears that writeboost requires more memory (RAM equivalent to roughly 1% of the cache drive's capacity), while EnhanceIO needs significantly less, but I have yet to calculate the exact difference.

@tanantharaman

dm-writeboost does NOT support existing block devices with data

@tanantharaman, stop that nonsense already. I have been using dm-writeboost with existing devices (with data) all the time, ever since the early releases.

dm-writeboost is unlike Bcache, which requires a pre-formatted device and does not use log-structured writes.

You have utterly failed at your comparison research and misunderstood a great deal. :(

Well, the developer of dm-writeboost does not seem to think so? [https://github.com/akiradeveloper/dm-writeboost/issues/170]

Also, the CentOS 7 RPM install example shows, as its only usage example, creating a dm-writeboost device and then running mkfs on it: [https://github.com/kazuhisya/dm-writeboost-rpm]

If you know how to take an existing HDD block device formatted with an existing filesystem (e.g. ext4 or xfs) and add a dm-writeboost SSD cache to it without losing any data, could you please explain how you did it? This would help a lot of folks, including the author of dm-writeboost, who believes adoption of dm-writeboost is lagging because of the lack of this feature.

@Pix13

Pix13 commented Feb 26, 2021

@tanantharaman I use dm-writeboost, and I have set it up over a 4-year-old RAID5 device. It does NOT require recreating the backing device; you can enable or disable it as you wish. (The only drawback I found is that it duplicates the UUID of the backing device, so I had to use a path-based fstab entry instead of a UUID-based one to make sure I was going through dm-writeboost.) So no, only Bcache requires a "fresh start".
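For reference, the path-based fstab entry ends up looking roughly like this (the mapping name, mount point and options are illustrative, not my exact setup):

```
# /etc/fstab -- mount through the dm-writeboost mapping by path rather than by UUID,
# since the mapping duplicates the UUID of the backing device:
/dev/mapper/wbdev  /mnt/data  ext4  defaults,noatime  0  2
```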

@tanantharaman

tanantharaman commented Feb 26, 2021

@tanantharaman I use dm-writeboost, and I have set it up over a 4-year-old RAID5 device. It does NOT require recreating the backing device; you can enable or disable it as you wish. (The only drawback I found is that it duplicates the UUID of the backing device, so I had to use a path-based fstab entry instead of a UUID-based one to make sure I was going through dm-writeboost.) So no, only Bcache requires a "fresh start".

So which step did the developer of dm-writeboost get wrong in the following quickstart summary? [https://docs.google.com/presentation/d/1v-L8Ma138o7jNBFqRl0epyc1Lji3XhUH1RGj8p7DVe8/edit#slide=id.g9da4ead67_0_61]

Do you simply skip the last step (mkfs) and mount the dm-writeboost device /dev/mapper/mylv with the same fstab entry as the original backing device /dev/myraid?

BTW, Bcache has a companion package called blocks that can convert an existing formatted backing drive into a Bcache device without data loss, provided there is enough unused space before the start of the formatted partition of the original block device; alternatively, you can use LVM to prepend a small extra partition to the original block device.
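If I remember the blocks README correctly, the in-place conversion is a one-liner along these lines (from memory, so verify against the docs and have a backup before trying it on real data):

```sh
# Convert an existing formatted partition into a Bcache backing device in place
# (the partition name is an example):
blocks to-bcache /dev/sdX1
# The filesystem should then be reachable through the new /dev/bcacheN device.
```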

@onlyjob

onlyjob commented Feb 26, 2021

I don't have a comparison of RAM usage. IMHO it does not matter, because the real challenge is to choose the right caching technology.
I eliminated Bcache after a brief trial, then used EnhanceIO for some time, but I was never confident enough with it because of weak upstream support, long periods of breakage on newer kernels, and an attachment method that is ultimately not very safe. (That's why EnhanceIO never left Debian/experimental.) Back in those days I experimented with several caching technologies, and dm-writeboost was by far superior, with excellent stability/reliability and a very competent and responsive upstream. So dm-writeboost simply has no alternatives. IMHO its design is the most elegant, and it is pragmatically more useful than any other SSD caching technology I'm aware of.

@tanantharaman, you just need to pay attention to the dm-writeboost documentation. That's where we all learned how to use it with disks that have pre-existing data.

If you know how to take an existing HDD block device formatted with an existing filesystem (e.g. ext4 or xfs) and add a dm-writeboost SSD cache to it without losing any data, could you please explain how you did it?

Sure. :) I wrote a small wrapper script (which is mentioned on the dm-writeboost page) and introduced it to Debian. My package comes with a .service file, a self-documented /etc/writeboosttab example, and man pages.
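For those who want to see the underlying mechanism rather than the wrapper, the raw device-mapper step follows the dm-writeboost quickstart and looks roughly like this (device names are examples; check the documentation before running it on real data):

```sh
# Attach a dm-writeboost cache to an existing, already-formatted backing device.
# /dev/sdb = backing HDD (keeps its data), /dev/sdc = caching SSD (its contents are overwritten).
BACKING=/dev/sdb
CACHE=/dev/sdc
SZ=$(sudo blockdev --getsz "$BACKING")   # backing device size in 512-byte sectors
sudo dmsetup create wbdev --table "0 $SZ writeboost $BACKING $CACHE"
# Note: a brand-new caching device may need its header initialised first; see the docs.

# The filesystem on the backing device is now available as /dev/mapper/wbdev,
# so mount that (by path) instead of the raw backing device:
sudo mount /dev/mapper/wbdev /mnt
```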

@tanantharaman

tanantharaman commented Feb 26, 2021

@onlyjob: Your response is much appreciated. I am stuck with CentOS 7 and cannot use Bcache, so dm-writeboost seemed to be the only option, but I was dreading having to do a full backup and restore.

I alerted akiradeveloper that issue 170 (linked above) no longer requires a solution and can be closed with a link to your writeboost script.

@tanantharaman

@Ristovski: EnhanceIO with basic settings (the random cache replacement policy) uses 4 bytes per 4 KB SSD block, so around 0.1% of the SSD's capacity in RAM. If you choose the FIFO cache replacement policy (similar to dm-writeboost?), that doubles to 8 bytes per 4 KB SSD block (around 0.2%). If you choose the LRU cache replacement policy (best SSD utilization), it goes up to 12 bytes per 4 KB SSD block (around 0.3%). Those are average values that rely on the approximately 2x RAM data compression used by EnhanceIO. So EnhanceIO uses anywhere from 1/10 to 3/10 as much RAM as dm-writeboost, on average.
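To put those percentages into concrete numbers for the 512 GB cache mentioned earlier (back-of-the-envelope only, using the per-block figures above and the ~1% figure quoted for dm-writeboost):

```sh
# Rough metadata-RAM estimate for a 512 GiB SSD cache.
CACHE_GIB=512
BLOCKS=$(( CACHE_GIB * 1024 * 1024 / 4 ))                 # number of 4 KiB cache blocks
echo "EnhanceIO rand (4 B/block):  $(( BLOCKS * 4  / 1024 / 1024 )) MiB"    # ~512 MiB
echo "EnhanceIO FIFO (8 B/block):  $(( BLOCKS * 8  / 1024 / 1024 )) MiB"    # ~1024 MiB
echo "EnhanceIO LRU  (12 B/block): $(( BLOCKS * 12 / 1024 / 1024 )) MiB"    # ~1536 MiB
echo "dm-writeboost (~1% of SSD):  $(( CACHE_GIB * 1024 / 100 )) MiB"       # ~5242 MiB
```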

@Ristovski

@tanantharaman Indeed! The memory footprint of dm-writeboost is sadly not insignificant, especially for setups with little RAM. Looking at the code, a trivial (but sadly /very/ small) 'optimization' could be made, though only for read-only caches: keeping dirtiness does not (as far as I can tell) seem to be necessary for RO caching, which would shave off a whopping 2 bytes per metadata block entry, which is essentially useless. The biggest contributor seems to be the hash table itself; hlist_node alone comes in at 16 bytes.

@onlyjob

onlyjob commented Feb 27, 2021

I've found a read-only cache to be less useful than accelerating writes. A read-only cache tends to have a negative impact on performance under heavy I/O (especially multi-threaded), because almost every read from the cached device adds a write operation to the caching device, and any expected performance improvement is easily eaten away if the SSD stays busy processing writes that have a low cache hit ratio.

Read caching on an SSD helps to decrease access time only when the file system is not actively used.
When the file system is under heavy I/O, flash caching tends to make it slower and actually increases access time.

Write-only caching (buffering) on an SSD takes away most of the seek stress from rotational HDDs.

Test your configuration and you might be surprised how easy it is to make the system slower by turning your caching SSD into a performance bottleneck.

@Ristovski

@onlyjob I am aware; however, my specific use case benefits very little from write caching, as there are very few writes being done in the first place, as opposed to a lot of frequent reads. dm-writeboost would seem like a good fit, but sadly this is a memory-constrained system, so the RAM usage difference between it and EnhanceIO is very apparent. Since my use case is quite niche, I might end up experimenting with rolling my own caching driver à la EnhanceIO, but exclusively for read caching (it might end up being a fun summer project).

@onlyjob

onlyjob commented Mar 1, 2021

@Ristovski, have you actually measured the difference in RAM usage between dm-writeboost and EnhanceIO, or is your comparison purely theoretical?
I think dm-writeboost might fit your use case well enough if you care to try it.

@Ristovski

@onlyjob So far purely theoretical. I might conduct some testing later this week, but from what I could gather it should be pretty close to my RAM usage estimate.
