Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache fileLength for fully written files #9683

Closed
wants to merge 4 commits into from

Conversation

s1monw
Copy link
Contributor

@s1monw s1monw commented Feb 12, 2015

We use fileLength extensively in our stats APIs etc. this can be a
bottleneck if called very often due to monitoring etc. This commit
adds simple caching of fully written files to reduce the load on the FS.

We use fileLength extensively in our stats APIs etc. this can be a
bottleneck if called very often due to monitoring etc. This commit
adds simple caching of fully written files to reduce the load on the FS.
final long len = fileLength(source);
super.renameFile(source, dest);
fileLengthCache.put(dest, len);
fileLengthCache.remove(source);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call remove before put? (Was just trying to think about what would happen if source == dest)

@s1monw
Copy link
Contributor Author

s1monw commented Feb 13, 2015

@jpountz applied feedback

@rmuir
Copy link
Contributor

rmuir commented Feb 13, 2015

+1

The only concern is some abuse case, where something outside of ES modifies the index directory. In this case fileLength() could be wrong. However, you cant trust this metadata to be really accurate, NFS caching seems to act the same way :)

Also based on the usage: fileLength() today is/should only be used for estimates like this. We should still review its usages and make sure nobody is abusing it for other purposes (e.g. using fileLength() > 0 to check for existence) because thats much more risky now.

@s1monw
Copy link
Contributor Author

s1monw commented Feb 13, 2015

The only concern is some abuse case, where something outside of ES modifies the index directory. In this case fileLength() could be wrong. However, you cant trust this metadata to be really accurate, NFS caching seems to act the same way :)

I agree on shared FS this patch has problems and in-fact the estimated size is actually wrong... I will make sure I go back and use listFiles()

Also based on the usage: fileLength() today is/should only be used for estimates like this. We should still review its usages and make sure nobody is abusing it for other purposes (e.g. using fileLength() > 0 to check for existence) because thats much more risky now.

agreed I added #9689 to address some of the issues.

@rmuir
Copy link
Contributor

rmuir commented Feb 13, 2015

I agree on shared FS this patch has problems and in-fact the estimated size is actually wrong... I will make sure I go back and use listFiles()

OK, this means e.g. retrieving disk usage will have to call listFiles() again, which i think is fine once we fix it (https://issues.apache.org/jira/browse/LUCENE-6241).

That means we can fix master, but 1.5.0 is less obvious (We could listFiles() differently, so its not calling isDirectory() on each file internally... at least for our default directory implementation, as one option).

Nothing in lucene cares about subdirectories except a copy-constructor in RAMDirectory, which elasticsearch does not use. But I am worried about some listAll()-using code in elasticsearch that would trip up on a .DS_Store or something.

So we still have some things to fix before we see improvements, because we have to fix the listAll situation.

@drewr
Copy link
Contributor

drewr commented Feb 13, 2015

@s1monw This patch definitely brings us back to 1.2 levels:

image

The others, for comparison:

image

@kimchy
Copy link
Member

kimchy commented Feb 15, 2015

the more I think about this patch, I think it might be a necessary evil. As mentioned on this thread, we need to still call listAll, but at least we can save on files that this store created and not call length as we know their length.

I suggest only use the cache in estimateSize, and not override the #fileLength to make it safer Lucene wise?

I think we should also add some sort of stats caching similar to what we do here: https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/os/OsService.java#L60, this will help protecting against many concurrent calls not abusing listAll.

s1monw added a commit to s1monw/elasticsearch that referenced this pull request Feb 18, 2015
this commit tries to reduce the filesystem calls to fetch metadata
by using a simple cache on top of the stats call.

Relates to elastic#9683
s1monw added a commit to s1monw/elasticsearch that referenced this pull request Feb 18, 2015
this commit tries to reduce the filesystem calls to fetch metadata
by using a simple cache on top of the stats call.

Relates to elastic#9683

Conflicts:
	src/main/java/org/elasticsearch/index/store/VerifyingIndexOutput.java
	src/main/java/org/elasticsearch/monitor/fs/FsService.java
@s1monw
Copy link
Contributor Author

s1monw commented Feb 23, 2015

closing this - we can reopen if needed

@s1monw s1monw closed this Feb 23, 2015
bleskes pushed a commit to bleskes/elasticsearch that referenced this pull request Mar 3, 2015
this commit tries to reduce the filesystem calls to fetch metadata
by using a simple cache on top of the stats call.

Relates to elastic#9683

Closes elastic#9709
@clintongormley clintongormley added :Distributed/Store Issues around managing unopened Lucene indices. If it touches Store.java, this is a likely label. and removed review labels Mar 19, 2015
@clintongormley clintongormley changed the title [STORE] Cache fileLength for fully written files Cache fileLength for fully written files Jun 7, 2015
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
this commit tries to reduce the filesystem calls to fetch metadata
by using a simple cache on top of the stats call.

Relates to elastic#9683

Closes elastic#9709
@clintongormley clintongormley added :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Distributed/Store Issues around managing unopened Lucene indices. If it touches Store.java, this is a likely label. labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v1.5.0 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants