LFSv2: inline files performance drop #203
Hi @pasi-ojala-tridonic, thanks for creating an issue. This is a part of how littlefs works.

At its core, littlefs is built out of a bunch of small 2-block logs (metadata pairs, similar to journals). These logs are appended with new information as the filesystem is written to. Eventually they get full and need to be garbage collected to remove outdated entries. Unfortunately this garbage collection is expensive (O(erase size^2)). The main approach littlefs takes to prevent this from becoming unusable is to keep the logs small and limited to 2 erase blocks, building more scalable data structures out of the 2-block logs. But there's still a risk that this stops scaling as the erase size gets too large.

Inline files are files that are small enough that we can fit them directly in their parent directory's 2-block log, instead of just storing a pointer to the file's data structure. This can lead to more data being stored in the 2-block log. It's possible that limiting the size of inline files could improve the performance of garbage collection, but there's still the risk of long-running spikes, as the 2-block log will still have to garbage collect at some point.

More info here:
That's interesting; generally the opposite is true due to how time-consuming erase operations are. What chip is this? Also, what is your device's geometry (read size/prog size/cache size/erase size)? The spike from garbage collection gets worse as the erase size increases.
littlefs does use recursion, but all recursion should be bounded. I have not measured the maximum stack usage, though that would be useful. Also, there was a recent change that may have negatively impacted read performance: if you move back to v2.0.2, do you see an improvement?
Hey @geky, thanks for the reply. Some answers below.

The chip is the SST25PF040C, with 4k erase blocks. I didn't mean the erase is especially fast, but when that happens, it's surrounded by hundreds of reads (caused by lfs_dir_traverse, I assume), so in comparison the one erase really doesn't matter when the system stalls for half a minute. I'm seeing a lot of the same 1k blocks being read over and over again. (I had a read size of 64 previously; that just resulted in proportionally more reads.) The read size on line 1266 of the log seems a bit suspicious too, but I didn't debug into that.

Also, it looks to me like the garbage collection time grows (linearly, based on a whopping 2 observations) with the number of files: I tested with about 100 (inlined) files, and the collection was on the order of several minutes. Is that expected?
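For reference, the geometry being described maps onto an lfs_config roughly like this (only the 4 KiB erase block and the ~1 KiB reads come from this thread; the other values are assumptions for illustration):

```c
#include "lfs.h"

// Rough geometry for the SST25PF040C setup described above (4 Mbit part,
// 4 KiB erase sectors). Values other than block_size and read_size are
// assumptions; the block device callbacks are omitted.
const struct lfs_config cfg = {
    .read_size      = 1024,  // reads currently issued in 1 KiB chunks
    .prog_size      = 256,   // assumed page-program size
    .block_size     = 4096,  // 4 KiB erase sectors
    .block_count    = 128,   // assumed 512 KiB part
    .cache_size     = 1024,
    .lookahead_size = 16,
    .block_cycles   = 500,
};
```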
So just to be clear: when the inline files fill the block, garbage collection will slow down the system, OK. But are the tens to hundreds of seconds of downtime expected? I'll need to check the change you mentioned at some point. (Un)fortunately I'll be starting my vacation tomorrow, but I have colleagues who'll be interested in this and monitoring the thread, so any info at any time will be appreciated.
Hi! I ran into the same problem. I tried creating 100 files, each with 256 bytes of data. It works great for the most part, but sometimes there's a massive spike in flash access. For example, when closing ... it goes something like this:
I get that sometimes the FS has to do some garbage collection and other housekeeping, but it seems strange that it needs this many operations. In this case, creating a single 256 B file caused 24 MB of flash operations, which would take significant time on a real device.
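The test is essentially the following loop (a minimal sketch, assuming an already-mounted lfs_t named lfs; the file names, flags, and lack of error handling are assumptions):

```c
#include <stdio.h>
#include "lfs.h"

// Create 100 small files of 256 bytes each. Most closes are cheap, but the
// occasional one triggers a metadata-pair compaction, which shows up as the
// spike in flash accesses described above.
void create_small_files(lfs_t *lfs) {
    uint8_t data[256] = {0};

    for (int i = 0; i < 100; i++) {
        char name[16];
        snprintf(name, sizeof(name), "f%03d", i);

        lfs_file_t file;
        lfs_file_open(lfs, &file, name,
                      LFS_O_WRONLY | LFS_O_CREAT | LFS_O_TRUNC);
        lfs_file_write(lfs, &file, data, sizeof(data));
        lfs_file_close(lfs, &file);  // this is where the occasional spike hits
    }
}
```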
Thanks both of you for posting benchmarks. After running some ad-hoc tests of my own, I'm getting a better understanding of the issue.

I think something we will need before we can say this and other performance issues are truly fixed is a benchmarking framework. It will take more time, but I think it will be worth it. One of the issues is that I was primarily testing on an MX25R over QSPI with a relatively high SPI frequency, so reads were very fast, making erase time dominate. While I had theoretical numbers for different storage types, I hadn't put them into any sort of practical context.

@ksstms, interesting graph, is the x-axis access time? Also, do you know how long it takes to iterate over the directory when full? This should be equivalent to one of the vertical spikes in your graph.

To share more information, the culprit function responsible for compacting the metadata pairs (one of the two garbage collectors) is lfs_dir_compact, which works in two sweeps over the metadata log. The two sweeps are to:

1. Measure how large the compacted metadata will be, to check that it fits.
2. Actually write out the surviving entries.
I was also considering a previous solution that did only one pass, however I changed it in a548ce6 (explanation in the commit message). This could possibly cut the cost in half, but I don't think it fully solves the problem; the compaction is still O(erase size^2).

While putting together v2, this was chosen as a tradeoff to enable better erase usage and a lower minimum storage size without increasing RAM consumption. But it sounds like this tradeoff should be reconsidered. If you have RAM to throw at the problem, increasing prog_size is one option.

It may be possible to introduce an artificial limit as a short-term workaround.

I'm also looking into a better long-term solution based on some sort of append-tree, but this may take quite a bit of time to implement.
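As a rough illustration of that RAM-for-speed tradeoff (the values are assumptions; the constraint in the comment is littlefs's documented cache_size requirement):

```c
#include "lfs.h"

// Raising prog_size also forces cache_size up, because cache_size must be a
// multiple of both read_size and prog_size, and littlefs keeps a read cache,
// a prog cache, and one cache per open file of this size.
const struct lfs_config cfg = {
    // block device callbacks omitted
    .read_size      = 64,
    .prog_size      = 1024,  // artificially raised program size
    .cache_size     = 1024,  // must be a multiple of read_size and prog_size
    .block_size     = 4096,
    .block_count    = 128,
    .lookahead_size = 16,
    .block_cycles   = 500,
};
```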
It's just the number of transactions. Sorry for the uninformative graph, I just wanted to illustrate that there's an enormous number of read passes before anything happens. Currently we are using QSPI with DMA. I'm working on a solution to use DMA for big transactions only, so I can lower my
For now, all I know is that it's 64 2 KB reads (my flash has 64 2 KB pages per erasable block). I'm going to measure things on the real system and get back with the results.
I have encountered this problem recently. To shorten the time, I preload the superblock data into RAM ahead of time and read it directly from RAM. With this workaround, the time went from 2 s down to 200 ms.
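A rough sketch of that kind of workaround, assuming the first metadata blocks are copied into RAM at boot and the block device read callback serves them from there (flash_read() and the sizes are assumptions, not littlefs API):

```c
#include <string.h>
#include "lfs.h"

#define BLOCK_SIZE    4096
#define CACHED_BLOCKS 2  // blocks 0 and 1 hold the superblock metadata pair

static uint8_t block_cache[CACHED_BLOCKS][BLOCK_SIZE];  // filled at boot

// Hypothetical underlying flash driver.
int flash_read(const struct lfs_config *c, lfs_block_t block,
               lfs_off_t off, void *buffer, lfs_size_t size);

// Read callback that answers from RAM for the cached blocks and falls back
// to the real flash for everything else.
static int cached_read(const struct lfs_config *c, lfs_block_t block,
                       lfs_off_t off, void *buffer, lfs_size_t size) {
    if (block < CACHED_BLOCKS) {
        memcpy(buffer, &block_cache[block][off], size);
        return 0;
    }
    return flash_read(c, block, off, buffer, size);
}
```

Note that the RAM copy also has to be kept in sync with any writes to those blocks, or only be used for read-mostly data.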
I have created a little test with Mbed OS 6.1.0. It creates a number of short files and prints some statistics with the ProfilingBlockDevice. There I also see the very large performance drop, in this case after 71 files. Source code can be found here: https://github.com/JojoS62/testLFSv2/blob/master/source/main.cpp

The columns in the report are:

In the v2 diagram, the write times are 2x higher; there, the RTOS was enabled. For the later report I used the bare-metal profile.
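For anyone not on Mbed, a plain-C approximation of the byte counting that ProfilingBlockDevice provides (real_read, real_prog, and real_erase stand in for the actual driver and are assumptions):

```c
#include <stdint.h>
#include "lfs.h"

// Hypothetical underlying driver functions.
int real_read(const struct lfs_config *c, lfs_block_t block,
              lfs_off_t off, void *buffer, lfs_size_t size);
int real_prog(const struct lfs_config *c, lfs_block_t block,
              lfs_off_t off, const void *buffer, lfs_size_t size);
int real_erase(const struct lfs_config *c, lfs_block_t block);

// Byte counters; reset them between operations to see per-file spikes.
static struct { uint64_t read, prog, erase; } bd_stats;

static int counting_read(const struct lfs_config *c, lfs_block_t block,
                         lfs_off_t off, void *buffer, lfs_size_t size) {
    bd_stats.read += size;
    return real_read(c, block, off, buffer, size);
}

static int counting_prog(const struct lfs_config *c, lfs_block_t block,
                         lfs_off_t off, const void *buffer, lfs_size_t size) {
    bd_stats.prog += size;
    return real_prog(c, block, off, buffer, size);
}

static int counting_erase(const struct lfs_config *c, lfs_block_t block) {
    bd_stats.erase += c->block_size;
    return real_erase(c, block);
}
```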
Would a good interim solution be the ability to disable or limit the number of inline files? I'm using one of those 128kB-block NAND flashes (MT29F) and seeing terrible read performance. Forcing LFSv2 to always think there is no room for an inlined file drastically increases performance:
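Roughly, the idea is along these lines (a hedged illustration only: the helper and macro below are made up, and the real check lives in the inline-file handling in lfs.c and differs between littlefs versions):

```c
#include <stdbool.h>
#include "lfs.h"

#define LFS_FORCE_NO_INLINE 1

// Make the "does this still fit inline?" decision always answer no, so every
// file gets its own data blocks instead of living in the directory's
// metadata pair. This trades metadata-pair compaction cost for extra blocks.
static bool fits_inline(const struct lfs_config *cfg, lfs_size_t size) {
#if LFS_FORCE_NO_INLINE
    (void)cfg; (void)size;
    return false;  // pretend there is never room for inline data
#else
    // roughly: inline data must fit in the cache and in a fraction of a block
    lfs_size_t limit = cfg->cache_size;
    if (cfg->block_size / 8 < limit) {
        limit = cfg->block_size / 8;
    }
    return size <= limit;
#endif
}
```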
LFSv2 has provided a huge boost in write performance over LFSv1, so I don't want to ditch it, but hacking around the inline file logic feels bad.
@mon I'm not sure, since littlefs still tries to pack as much data into a metadata-pair as possible. If you limited inline files, then instead of filling up with file data, the metadata-pairs would fill up with metadata (these are logs, so outdated changes are not cleaned up until the expensive compact step). They would hit the same performance issue, just more slowly.

I haven't had time to experiment with this much, but as another short-term idea, has anyone tried artificially increasing prog_size? The tradeoff would be that you would need RAM (cache_size must be at least as large as prog_size).
@geky, excellent point on the metadata pairs becoming the limiter; it explains my immediate performance gains. What about some sort of "metadata limit", where there is a maximum of, say, 4k allowed for inline files and metadata pairs, and the remaining bytes in the block are only used by files big enough to fill it? I think this would bound the metadata/inlines enough to make the compaction pass more like O(1) regardless of the actual block size. Obviously that comes with drawbacks to the total number of storable files on these large NANDs, but I see it as a similar tradeoff to the existing cache sizes.

At first glance it seems like it should be quite simple to implement? I'm thinking of modifying the relevant size checks.

I can't increase prog_size easily, since we're almost near our RAM limit already.
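A hedged sketch of what such a limit might look like (METADATA_MAX and this helper are hypothetical illustrations, not existing littlefs API):

```c
#include <stdbool.h>
#include "lfs.h"

// Soft cap on how much of an erase block the metadata pair (including inline
// files) is allowed to use before compaction is forced. With a 128 KiB NAND
// block, compaction cost is then bounded by METADATA_MAX rather than by the
// physical block size.
#define METADATA_MAX 4096

static bool metadata_pair_full(lfs_size_t used, const struct lfs_config *cfg) {
    lfs_size_t limit = METADATA_MAX;
    if (cfg->block_size < limit) {
        limit = cfg->block_size;
    }
    return used >= limit;  // pretend the block ends at the soft limit
}
```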
Oh! That is a better idea. Forcing the metadata-pair to compact early should keep the amount of metadata manageable without a RAM cost. That might be the best short-term solution for this bottleneck. You will also need to change the corresponding size check. I believe these are the only three places that would need changing:
Running on a device with very little flash available, I decided to give the inline files of v2 a go. I'm not sure I got everything configured correctly; I tried to follow the instructions.
I create a couple of files each holding an integer and increment the values in a loop, and things basically seem to work. I see lots of reads to my flash, which I assume is caused by the inline files. I could live with that: the data seems valid and the file contents keep changing as expected.
However, after some loops, starting from line 1246 and lasting until line 1446 in the log, the reads just explode when updating a value in file 8, the update taking about 25 seconds. There's an ERASE happening in the middle, but that's not taking much time compared to all the reads happening.
Two minutes later (search for ERASE), this happens with file 6, then later for file 4, then 2 and then back to 8.
I was looking around a bit in the debugger: when these huge read blocks appear, the recursion in lfs_dir_traverse (https://github.com/ARMmbed/littlefs/blob/master/lfs.c#L700) seems to kick in, although I never saw the recursion go deeper than level 1.
In the test snippet, mountLittleFS() does the mount and prints out the directory contents.
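The update loop in the test is roughly the following (a minimal sketch; the file names, counts, and flags here are assumptions, and lfs is the mounted filesystem):

```c
#include <stdio.h>
#include "lfs.h"

// Each file holds a single integer that is re-read, incremented, and written
// back. Every close commits a small entry to the directory's metadata pair,
// which eventually fills up and triggers the long compaction seen in the log.
void update_counters(lfs_t *lfs) {
    for (int loop = 0; loop < 1000; loop++) {
        for (int f = 1; f <= 8; f++) {
            char name[16];
            snprintf(name, sizeof(name), "file%d", f);

            lfs_file_t file;
            uint32_t value = 0;
            lfs_file_open(lfs, &file, name, LFS_O_RDWR | LFS_O_CREAT);
            lfs_file_read(lfs, &file, &value, sizeof(value));
            value++;
            lfs_file_rewind(lfs, &file);
            lfs_file_write(lfs, &file, &value, sizeof(value));
            lfs_file_close(lfs, &file);
        }
    }
}
```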
lfstest.txt
lfsv2-8files.txt