New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make a hybrid directory default using `mmapfs` / `niofs` #6636

Merged
merged 1 commit into from Jul 9, 2014

Conversation

Projects
None yet
7 participants
@s1monw
Contributor

s1monw commented Jun 27, 2014

mmapfs is really good for random access but can have sideeffects if
memory maps are large depending on the operating system etc. A hybrid
solution where only selected files are actually memory mapped but others
mostly consumed sequentially brings the best of both worlds and
minimizes the memory map impact.
This commit mmaps only the dvd and tim file for fast random access
on docvalues and term dictionaries.

@s1monw

This comment has been minimized.

Contributor

s1monw commented Jun 27, 2014

FYI - I think the name here can be improved... I just wanted to get the code out asap.

@areek areek assigned rjernst and unassigned rjernst Jun 29, 2014

@s1monw

This comment has been minimized.

Contributor

s1monw commented Jul 2, 2014

@rmuir @kimchy any comment or better ideas for the name?

@rjernst

This comment has been minimized.

Member

rjernst commented Jul 2, 2014

The name file_switch doesn't mean anything to me. What about something like minimal_mmap or mixed_mmap_nio?

@rmuir

This comment has been minimized.

Contributor

rmuir commented Jul 2, 2014

One idea is just default. Its whatever we think is a good default...

@kimchy

This comment has been minimized.

Member

kimchy commented Jul 2, 2014

I like default

@rjernst

This comment has been minimized.

Member

rjernst commented Jul 3, 2014

default is good

@dakrone

This comment has been minimized.

Member

dakrone commented Jul 3, 2014

I don't think default is a good name, what happens if we make an improvement in the future and this is no longer the default? It also makes it harder to talk about in conversation:

"what directory type are you using?"
"oh, I'm using the default"
"... what version of ES are you on?"
"I'm on Elasticsearch 0.90/1.0/1.3"
"oh, then the default is actually niofs/mmapfs/mixed"
"okay, then I'm using that"
"wait, which operating system are you using?"
"windows"
"oh, then the default is mmapfs"
"I'm using a 32-bit JDK"
"oh, then the default is simplefs, you should really stop using a 32-bit windows JVM..."

I like mixed_mmap_nio personally.

@rmuir

This comment has been minimized.

Contributor

rmuir commented Jul 3, 2014

@dakrone but that situation already exists today, no? Even if we name it mixed_mmap_nio, we'd have the exact same conversation.

@dakrone

This comment has been minimized.

Member

dakrone commented Jul 3, 2014

@rmuir yep, it totally does, I do think that naming it default would make it worse though. It means an extra "wait, did you set it to 'default', or do you have it unset and you're using the defaults?" question all the time.

@rmuir

This comment has been minimized.

Contributor

rmuir commented Jul 3, 2014

but if our default is always default then there is no confusion?

@dakrone

This comment has been minimized.

Member

dakrone commented Jul 3, 2014

Yep, there is none, so default is a fine name for this. If it changes in the future though, we're deferring the conversation to give this a name (if/when this changes in ES version 3.4.1 or so)

@dakrone

This comment has been minimized.

Member

dakrone commented Jul 4, 2014

Okay, after more discussion we agreed on default for now.

@dakrone dakrone removed the review label Jul 4, 2014

@s1monw

This comment has been minimized.

Contributor

s1monw commented Jul 9, 2014

@rmuir @rjernst @dakrone @kimchy I updated this commit - would love to get another review

@s1monw s1monw added the review label Jul 9, 2014

@dakrone

View changes

docs/reference/index-modules/store.asciidoc Outdated
`niofs` for the rest.
operating environment will be automatically chosen: `mmap` on
Windows 64bit, `simplefs` on Windows 32bit, and `default`
(hybrid `niofs` and `mmap`) for the rest.

This comment has been minimized.

@dakrone

dakrone Jul 9, 2014

Member

This should still be mmapfs (in both places) since that's what the name of the setting is, otherwise someone could set it to mmap and it wouldn't take effect.

@dakrone

View changes

docs/reference/index-modules/store.asciidoc Outdated
The `default` type stores the shard index on the file system depending on
the file type by mapping a file into memory (mmap) or using Java NIO. Currenlty
only the Lucene term dictionary and doc values files are memory mapped to reduce

This comment has been minimized.

@dakrone

dakrone Jul 9, 2014

Member

I think you should mention the file extensions here, .dvd and .tim

This comment has been minimized.

@s1monw

s1monw Jul 9, 2014

Contributor

IMO too low level - if they cahnge with lucene version we need to update etc.

This comment has been minimized.

@rmuir

rmuir Jul 9, 2014

Contributor

Currenlty -> Currently

@dakrone

View changes

src/main/java/org/apache/lucene/store/StoreUtils.java Outdated
@@ -39,6 +39,11 @@ public static String toString(Directory directory) {
SimpleFSDirectory simpleFSDirectory = (SimpleFSDirectory)directory;
return "simplefs(" + simpleFSDirectory.getDirectory() + ")";
}
if (directory instanceof FileSwitchDirectory) {
FileSwitchDirectory fileSwitchDirectory = (FileSwitchDirectory) directory;
return "fileswitch(" + toString(fileSwitchDirectory.getPrimaryDir()) + "," + toString(fileSwitchDirectory.getSecondaryDir()) + ")";

This comment has been minimized.

@dakrone

dakrone Jul 9, 2014

Member

Should this still be "fileswitch(...)" since it has been renamed to default? How about DefaultFileSwitchDirectory and returning the string "default(...)"?

This comment has been minimized.

@s1monw

s1monw Jul 9, 2014

Contributor

I will fix that too sry...

@dakrone

View changes

src/test/java/org/elasticsearch/indices/store/SimpleDistributorTests.java Outdated
@@ -39,7 +39,7 @@
*/
public class SimpleDistributorTests extends ElasticsearchIntegrationTest {
public final static String[] STORE_TYPES = {"fs", "simplefs", "niofs", "mmapfs"};
public final static String[] STORE_TYPES = {"fs", "simplefs", "niofs", "mmapfs", "fileswitch"};

This comment has been minimized.

@dakrone

dakrone Jul 9, 2014

Member

This should be default now.

This comment has been minimized.

@s1monw

s1monw Jul 9, 2014

Contributor

we need constants... I will add

@dakrone

View changes

src/test/java/org/elasticsearch/indices/store/SimpleDistributorTests.java Outdated
@@ -91,6 +91,18 @@ public void testDirectoryToString() throws IOException {
}
assertThat(storeString, endsWith(", type=MERGE, rate=20.0)])"));
createIndexWithStoreType("test", "fileswitch", "least_used");

This comment has been minimized.

@dakrone

dakrone Jul 9, 2014

Member

fileswitch -> default here

@dakrone

View changes

src/test/java/org/elasticsearch/indices/store/SimpleDistributorTests.java Outdated
storeString = getStoreDirectory("test", 0).toString();
logger.info(storeString);
dataPaths = dataPaths();
assertThat(storeString.toLowerCase(Locale.ROOT), startsWith("store(least_used[rate_limited(fileswitch(mmapfs(" + dataPaths[0].getAbsolutePath().toLowerCase(Locale.ROOT)));

This comment has been minimized.

@dakrone

dakrone Jul 9, 2014

Member

if you change the .toString() method to not use fileswitch this will need to be changed also.

@kimchy

This comment has been minimized.

Member

kimchy commented Jul 9, 2014

LGTM, some comments were already made :)

@@ -46,26 +92,32 @@ public IndexStoreModule(Settings settings) {
@Override
public Iterable<? extends Module> spawnModules() {
Class<? extends Module> indexStoreModule = NioFsIndexStoreModule.class;
// Same logic as FSDirectory#open ...

This comment has been minimized.

@rmuir

rmuir Jul 9, 2014

Contributor

Just FYI this is a bit out of date with respect to FSDirectory.open. Actually we default to mmap on all 64-bit systems now, as its faster on the macos X too.

@s1monw

This comment has been minimized.

Contributor

s1monw commented Jul 9, 2014

I pushed several new commits

@s1monw s1monw added the review label Jul 9, 2014

@s1monw

This comment has been minimized.

Contributor

s1monw commented Jul 9, 2014

I think it's ready

@rmuir

This comment has been minimized.

Contributor

rmuir commented Jul 9, 2014

looks good.

@kimchy

This comment has been minimized.

Member

kimchy commented Jul 9, 2014

+1, LGTM

[STORE] Make a hybrid directory default using `mmapfs` and `niofs`
`mmapfs` is really good for random access but can have sideeffects if
memory maps are large depending on the operating system etc. A hybrid
solution where only selected files are actually memory mapped but others
mostly consumed sequentially brings the best of both worlds and
minimizes the memory map impact.
This commit mmaps only the `dvd` and `tim` file for fast random access
on docvalues and term dictionaries.

Closes #6636

@s1monw s1monw merged commit d82a434 into elastic:master Jul 9, 2014

@s1monw s1monw removed the review label Jul 9, 2014

@s1monw s1monw deleted the s1monw:file_switch_dir branch Jul 9, 2014

@s1monw s1monw changed the title from [STORE] Make `file_switch` directory default using `mmapfs` / `niofs` to [STORE] Make a hybrid directory default using `mmapfs` / `niofs` Jul 9, 2014

s1monw added a commit that referenced this pull request Jul 9, 2014

[STORE] Make a hybrid directory default using `mmapfs` and `niofs`
`mmapfs` is really good for random access but can have sideeffects if
memory maps are large depending on the operating system etc. A hybrid
solution where only selected files are actually memory mapped but others
mostly consumed sequentially brings the best of both worlds and
minimizes the memory map impact.
This commit mmaps only the `dvd` and `tim` file for fast random access
on docvalues and term dictionaries.

Closes #6636
@uschindler

This comment has been minimized.

Contributor

uschindler commented Jul 16, 2014

There is already a similar issue in Lucene: https://issues.apache.org/jira/browse/LUCENE-1743

This one was not about random access (it did not exist at that time), but the idea is the same. A file switch by file name is more natural to me.

@uschindler

This comment has been minimized.

Contributor

uschindler commented Jul 16, 2014

@s1monw
ElasticSearch has settings "index.compound_format" and "index.compound_on_flush" (see http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules.html). If this is used (not many people do, but some do because they run out of file descriptors), this would make all files use NIO only, although they contain the term dictionary or docvalues.

Maybe add "cfs" to the list of extensions?

@rmuir

This comment has been minimized.

Contributor

rmuir commented Jul 16, 2014

We should not add .cfs in my opinion. if such cfs options are enabled, it means we are mmaping the whole index again, which we want to avoid (purpose of this issue). By default, the only thing using .cfs are tiny segments flushed from indexwriter. Because they are small performance is not really critical there.

@uschindler

This comment has been minimized.

Contributor

uschindler commented Jul 16, 2014

@s1monw @rmuir @kimchy One cool thing for the WeakRef haters: This reduces also load to GC, because we dont create so many weak refs when cloning MMapIndexinputs: Random access inputs dont really need to be cloned all the time and on every request (no state involved, as position is not needed). Stuff like posting lists are cloned all the time, but those are now not weak ref'ed because they are read using NIO.

@clintongormley clintongormley changed the title from [STORE] Make a hybrid directory default using `mmapfs` / `niofs` to Internal: Make a hybrid directory default using `mmapfs` / `niofs` Jul 16, 2014

mikemccand added a commit that referenced this pull request Feb 23, 2015

Core: don't listAll twice
In #6636 we switched to a default FileSwitchDirectory that made
.listAll run twice on the same underlying file system directory.

This fixes listAll to do a single directory listing again.

Closes #9666

mikemccand added a commit that referenced this pull request Feb 23, 2015

Core: don't listAll twice
In #6636 we switched to a default FileSwitchDirectory that made
.listAll run twice on the same underlying file system directory.

This fixes listAll to do a single directory listing again.

Closes #9666

mikemccand added a commit that referenced this pull request Feb 23, 2015

Core: don't listAll twice
In #6636 we switched to a default FileSwitchDirectory that made
.listAll run twice on the same underlying file system directory.

This fixes listAll to do a single directory listing again.

Closes #9666

@clintongormley clintongormley changed the title from Internal: Make a hybrid directory default using `mmapfs` / `niofs` to Make a hybrid directory default using `mmapfs` / `niofs` Jun 7, 2015

mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015

Core: don't listAll twice
In elastic#6636 we switched to a default FileSwitchDirectory that made
.listAll run twice on the same underlying file system directory.

This fixes listAll to do a single directory listing again.

Closes elastic#9666

@bleskes bleskes referenced this pull request Mar 8, 2016

Closed

Use `mmapfs` by default #16983

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment