Cache fileLength for fully written files #9683
We use fileLength extensively in our stats APIs and elsewhere. This can become a bottleneck when it is called very often, for example by monitoring. This commit adds simple caching of the lengths of fully written files to reduce the load on the FS.
final long len = fileLength(source);
super.renameFile(source, dest);
fileLengthCache.put(dest, len);
fileLengthCache.remove(source);
Should we call remove before put? (Was just trying to think about what would happen if source == dest)
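To make the ordering question concrete, here is a minimal sketch of the two variants. `FileLengthCacheSketch` and its method names are hypothetical stand-ins, not the class from the patch; the only assumption is that the cache behaves like a plain map keyed by file name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the cache in the patch; not the actual Elasticsearch class.
class FileLengthCacheSketch {
    private final Map<String, Long> lengths = new ConcurrentHashMap<>();

    void put(String name, long length) {
        lengths.put(name, length);
    }

    // Returns the cached length, or null if the file is not cached.
    Long get(String name) {
        return lengths.get(name);
    }

    // Order from the reviewed diff: put, then remove.
    // If source equals dest, the entry that was just written is removed again.
    void renamePutThenRemove(String source, String dest, long len) {
        lengths.put(dest, len);
        lengths.remove(source);
    }

    // Order suggested in the review: remove, then put.
    // If source equals dest, the entry survives the rename.
    void renameRemoveThenPut(String source, String dest, long len) {
        lengths.remove(source);
        lengths.put(dest, len);
    }
}
```

With put-then-remove and source == dest the cache merely loses an entry, so the next lookup would fall back to the filesystem. Remove-then-put keeps the entry and avoids that extra call.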
@jpountz applied feedback
+1. The only concern is some abuse case where something outside of ES modifies the index directory. In this case fileLength() could be wrong. However, you can't trust this metadata to be really accurate anyway; NFS caching seems to act the same way :) Also, based on the usage, fileLength() today is/should only be used for estimates like this. We should still review its usages and make sure nobody is abusing it for other purposes (e.g. using fileLength() > 0 to check for existence), because that's much more risky now.
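A rough sketch of why the existence-check pattern becomes riskier with a cache in front of fileLength(). `CachedLengthDirectorySketch` is a hypothetical class written for illustration, built only on `java.nio.file.Files`; it is not the Lucene or Elasticsearch directory implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; assumes lengths are cached after the first lookup.
class CachedLengthDirectorySketch {
    private final Map<Path, Long> lengthCache = new ConcurrentHashMap<>();

    // Serves the length from the cache when present, otherwise asks the filesystem once.
    long fileLength(Path file) throws IOException {
        Long cached = lengthCache.get(file);
        if (cached != null) {
            return cached;
        }
        long len = Files.size(file);
        lengthCache.put(file, len);
        return len;
    }

    // Risky: answers from the cache, so it can report a file that no longer exists.
    boolean existsByLength(Path file) throws IOException {
        return fileLength(file) > 0;
    }

    // Safer: asks the filesystem directly.
    boolean exists(Path file) {
        return Files.exists(file);
    }
}
```

If something outside the process deletes a file after its length was cached, `existsByLength` keeps returning true while `exists` returns false. That is acceptable for a size estimate but not for an existence check.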
I agree that on a shared FS this patch has problems, and in fact the estimated size is actually wrong... I will make sure I go back and use
Agreed. I added #9689 to address some of the issues.
OK, this means that e.g. retrieving disk usage will have to call listFiles() again, which I think is fine once we fix it (https://issues.apache.org/jira/browse/LUCENE-6241). That means we can fix master, but 1.5.0 is less obvious. (One option: we could do listFiles() differently, so it's not calling isDirectory() on each file internally, at least for our default directory implementation.) Nothing in Lucene cares about subdirectories except a copy-constructor in RAMDirectory, which Elasticsearch does not use. But I am worried about some listAll()-using code in Elasticsearch that would trip up on a .DS_Store or something. So we still have some things to fix before we see improvements, because we have to fix the listAll situation.
@s1monw This patch definitely brings us back to 1.2 levels. The others, for comparison:
The more I think about this patch, the more I think it might be a necessary evil. As mentioned on this thread, we still need to call listAll, but at least we can save on files that this store created and not call length, as we know their length. I suggest only using the cache in estimateSize, and not override the [...] I think we should also add some sort of stats caching similar to what we do here: https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/os/OsService.java#L60. This will help protect against many concurrent calls abusing listAll.
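The kind of stats caching suggested here could look roughly like the sketch below. `TimedStatsCache` is a hypothetical name and shape, not the class used in OsService: it assumes a single cached value that is recomputed at most once per refresh interval, with an injectable clock so the behavior can be checked without sleeping.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch: caches one expensive stats value and refreshes it
// at most once per interval, no matter how many callers ask for it.
class TimedStatsCache<T> {
    private final long refreshIntervalMillis;
    private final Supplier<T> loader;   // the expensive call, e.g. one that lists files
    private final LongSupplier clock;   // returns the current time in milliseconds
    private T cached;
    private long lastRefreshMillis;
    private boolean loaded;

    TimedStatsCache(long refreshIntervalMillis, Supplier<T> loader, LongSupplier clock) {
        this.refreshIntervalMillis = refreshIntervalMillis;
        this.loader = loader;
        this.clock = clock;
    }

    // Synchronized so concurrent callers share one refresh instead of each hitting the FS.
    synchronized T getOrRefresh() {
        long now = clock.getAsLong();
        if (!loaded || now - lastRefreshMillis >= refreshIntervalMillis) {
            cached = loader.get();
            lastRefreshMillis = now;
            loaded = true;
        }
        return cached;
    }
}
```

In production code the clock would typically be `System::currentTimeMillis`. The trade-off is that callers may see stats that are up to one interval old, which fits the "estimate" usage discussed above.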
this commit tries to reduce the filesystem calls to fetch metadata by using a simple cache on top of the stats call. Relates to elastic#9683
this commit tries to reduce the filesystem calls to fetch metadata by using a simple cache on top of the stats call. Relates to elastic#9683 Conflicts: src/main/java/org/elasticsearch/index/store/VerifyingIndexOutput.java src/main/java/org/elasticsearch/monitor/fs/FsService.java
Closing this; we can reopen if needed.
this commit tries to reduce the filesystem calls to fetch metadata by using a simple cache on top of the stats call. Relates to elastic#9683 Closes elastic#9709