Cache fileLength for fully written files #9683
The only concern is an abuse case where something outside of ES modifies the index directory. In that case fileLength() could be wrong. However, you can't trust this metadata to be really accurate anyway; NFS caching seems to act the same way :)
Also, based on the usage: fileLength() today is (and should be) used only for estimates like this. We should still review its usages and make sure nobody is abusing it for other purposes (e.g. using fileLength() > 0 to check for existence), because that's much more risky now.
I agree that on a shared FS this patch has problems, and in fact the estimated size is actually wrong... I will make sure I go back and use
Agreed, I added #9689 to address some of the issues.
OK, this means e.g. retrieving disk usage will have to call listFiles() again, which I think is fine once we fix it (https://issues.apache.org/jira/browse/LUCENE-6241).
That means we can fix master, but 1.5.0 is less obvious. (As one option, we could call listFiles() differently so it's not calling isDirectory() on each file internally, at least for our default directory implementation.)
Nothing in Lucene cares about subdirectories except a copy-constructor in RAMDirectory, which Elasticsearch does not use. But I am worried about some listAll()-using code in Elasticsearch that would trip up on a .DS_Store or something.
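To make the listing idea concrete, here is a rough sketch of listing a directory by name only, with no per-entry isDirectory() check. The class and method names (PlainListing, listNames) are hypothetical, and skipping dot-files is my own assumption about how stray entries such as .DS_Store could be tolerated; it is not what Lucene or Elasticsearch does.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper, not an existing Lucene or Elasticsearch class. */
class PlainListing {
    /** Returns the sorted entry names in dir without checking the type of each entry. */
    static List<String> listNames(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                String name = p.getFileName().toString();
                // Assumption: dot-files (e.g. .DS_Store) are never index files.
                if (name.startsWith(".")) {
                    continue;
                }
                names.add(name);
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

Note that subdirectories are still returned by this sketch, so any caller that assumes every name is an index file would need its own check.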
So we still have some things to fix before we see improvements, because we have to fix the listAll situation.
The more I think about this patch, the more I think it might be a necessary evil. As mentioned on this thread, we still need to call listAll, but at least we can save on files that this store created and not call length on them, as we already know their length.
I suggest only using the cache in estimateSize, and not overriding the
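A minimal sketch of that idea, assuming a hypothetical LengthCachingStore that writes each file in one shot: it remembers the length of files it finished writing and only asks the filesystem for the length of entries it does not know. None of these names come from the patch, and the real store is considerably more involved.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: remembers lengths of files this store has fully written. */
class LengthCachingStore {
    private final Path dir;
    // name -> final length, recorded only for files this store wrote completely
    private final Map<String, Long> knownLengths = new ConcurrentHashMap<>();

    LengthCachingStore(Path dir) {
        this.dir = dir;
    }

    /** Writes the whole file and records its final length. */
    void writeFile(String name, byte[] data) throws IOException {
        Files.write(dir.resolve(name), data);
        knownLengths.put(name, (long) data.length);
    }

    /** Drops the cached length before deleting the file. */
    void deleteFile(String name) throws IOException {
        knownLengths.remove(name);
        Files.deleteIfExists(dir.resolve(name));
    }

    /** Estimate only: still lists the directory, but skips the length call for known files. */
    long estimateSize() throws IOException {
        long total = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                Long cached = knownLengths.get(p.getFileName().toString());
                total += (cached != null) ? cached : Files.size(p);
            }
        }
        return total;
    }
}
```

If something outside the store rewrites a cached file, the estimate goes stale, which is exactly the abuse case raised earlier in the thread; that is why the result should only ever be treated as an estimate.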
I think we should also add some sort of stats caching similar to what we do here: https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/os/OsService.java#L60. This would help protect against many concurrent calls abusing listAll.
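Roughly, the stats caching could look like the sketch below: a single cached value that is recomputed at most once per refresh interval, so a burst of concurrent stats calls triggers one listAll instead of many. TimedCache and its injected clock are hypothetical names for illustration, not the class the linked OsService code uses.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical single-value cache that reloads at most once per refresh interval. */
class TimedCache<T> {
    private final long refreshNanos;
    private final Supplier<T> loader;
    private final LongSupplier clock; // injected so the refresh logic can be tested
    private T value;
    private long loadedAt;
    private boolean loaded;

    TimedCache(long refreshNanos, Supplier<T> loader, LongSupplier clock) {
        this.refreshNanos = refreshNanos;
        this.loader = loader;
        this.clock = clock;
    }

    /** Returns the cached value, reloading it first if it is missing or too old. */
    synchronized T get() {
        long now = clock.getAsLong();
        if (!loaded || now - loadedAt >= refreshNanos) {
            value = loader.get(); // the expensive call, e.g. listing files and summing sizes
            loadedAt = now;
            loaded = true;
        }
        return value;
    }
}
```

In practice the clock would be System::nanoTime and the loader would be the size estimate; callers arriving inside the interval get the previous value, which is acceptable because the number is only an estimate.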