New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only do a single listAll from FileSwitchDir #9666

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
5 participants
@mikemccand
Contributor

mikemccand commented Feb 12, 2015

With #6636 listAll now calls it twice (once for the mmap dir, once for niofs dir) ... I think we should fix this to be a single call?

@mikemccand mikemccand self-assigned this Feb 12, 2015

@mikemccand mikemccand added the v0.12.0 label Feb 12, 2015

@rmuir

This comment has been minimized.

Show comment
Hide comment
@rmuir

rmuir Feb 12, 2015

Contributor

Patch is good to me. I still think the thing to fix would be removing the Files.isDirectory() call on each entry. Alternatively, since we are specialized here for the FSDir case, and if we dont need to filter out subdirectories, we could just implement the obvious list (java 8 Files.list().toArray but a little more for java7).

But, if this file listing is a hotspot for some reason, these changes will only make it Nx faster. Instead code should not list files unnecessarily. I am extremely concerned if we overoptimize here, that those problems will never get fixed.

Contributor

rmuir commented Feb 12, 2015

Patch is good to me. I still think the thing to fix would be removing the Files.isDirectory() call on each entry. Alternatively, since we are specialized here for the FSDir case, and if we dont need to filter out subdirectories, we could just implement the obvious list (java 8 Files.list().toArray but a little more for java7).

But, if this file listing is a hotspot for some reason, these changes will only make it Nx faster. Instead code should not list files unnecessarily. I am extremely concerned if we overoptimize here, that those problems will never get fixed.

@rmuir

This comment has been minimized.

Show comment
Hide comment
@rmuir

rmuir Feb 12, 2015

Contributor

I ran a benchmark, on a static directory for simplicity. Test folder has 36,400 files, which is a bit on the high end, but its not unrealistic. I run one hundred iterations of each method.

The files.isDirectory() check is really the bad guy:
With my "fast drive" (ext4 on an SSD): its 2.6x faster (21ms per-call instead of 57ms per call)
With my "slow drive" (ntfs on a spinning disk): its over 60x faster (22ms per-call instead of 1.3s per call)

Still, as i said before, we are talking about something on the order of milliseconds and we need to ensure its not called unnecessarily unless its needed.

Contributor

rmuir commented Feb 12, 2015

I ran a benchmark, on a static directory for simplicity. Test folder has 36,400 files, which is a bit on the high end, but its not unrealistic. I run one hundred iterations of each method.

The files.isDirectory() check is really the bad guy:
With my "fast drive" (ext4 on an SSD): its 2.6x faster (21ms per-call instead of 57ms per call)
With my "slow drive" (ntfs on a spinning disk): its over 60x faster (22ms per-call instead of 1.3s per call)

Still, as i said before, we are talking about something on the order of milliseconds and we need to ensure its not called unnecessarily unless its needed.

gpstathis added a commit to gpstathis/elasticsearch-river-mongodb that referenced this pull request Feb 14, 2015

Clear unused flags to reduce load on cluster.
A regression was introduced in the ES stats API call somewhere between ES 1.2.x. and 1.4.x that makes stats calls a lot more expensive. See elastic/elasticsearch#9681, https://groups.google.com/d/topic/elasticsearch/bOyBxgI9cMA/discussion and elastic/elasticsearch#9666 which are related issues. A ticket describing the exact issue is yet to be filed but I've been working offline on this with @imotov. The river makes frequent calls to the stats API which compounds the issue. Requesting only the needed thread_pool flags will contribute to mitigating the load on the cluster while the ES team fixes the stats issue on their side.

@spinscale spinscale added v1.4.5 and removed v1.4.4 labels Feb 19, 2015

gpstathis added a commit to gpstathis/elasticsearch-river-mongodb that referenced this pull request Feb 19, 2015

Clear unused flags to reduce load on cluster.
A regression was introduced in the ES stats API call somewhere between ES 1.2.x. and 1.4.x that makes stats calls a lot more expensive. See elastic/elasticsearch#9681, https://groups.google.com/d/topic/elasticsearch/bOyBxgI9cMA/discussion and elastic/elasticsearch#9666 which are related issues. A ticket describing the exact issue is yet to be filed but I've been working offline on this with @imotov. The river makes frequent calls to the stats API which compounds the issue. Requesting only the needed thread_pool flags will contribute to mitigating the load on the cluster while the ES team fixes the stats issue on their side.
@s1monw

This comment has been minimized.

Show comment
Hide comment
@s1monw

s1monw Feb 23, 2015

Contributor

LGTM

Contributor

s1monw commented Feb 23, 2015

LGTM

mikemccand added a commit that referenced this pull request Feb 23, 2015

Core: don't listAll twice
In #6636 we switched to a default FileSwitchDirectory that made
.listAll run twice on the same underlying file system directory.

This fixes listAll to do a single directory listing again.

Closes #9666

mikemccand added a commit that referenced this pull request Feb 23, 2015

Core: don't listAll twice
In #6636 we switched to a default FileSwitchDirectory that made
.listAll run twice on the same underlying file system directory.

This fixes listAll to do a single directory listing again.

Closes #9666

@clintongormley clintongormley changed the title from Core: only do a single listAll from FileSwitchDir to Only do a single listAll from FileSwitchDir Jun 6, 2015

mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015

Core: don't listAll twice
In #6636 we switched to a default FileSwitchDirectory that made
.listAll run twice on the same underlying file system directory.

This fixes listAll to do a single directory listing again.

Closes #9666
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment