Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to current Lucene 5.0.0 snapshot #8588

Closed
wants to merge 4 commits into from

Conversation

Projects
None yet
5 participants
@mikemccand
Copy link
Contributor

commented Nov 21, 2014

A few recent lucene changes to fold in:

We've removed IndexWriter.unLock in Lucene, so I also removed "clear
lock on startup" in Elasticsearch. A couple test cases were missing
engine.close() before opening a new engine.

I exposed the new SerbianNormalizationFilterFactory.

I noticed that we silently use NoLockFactory if you have a typo in
your index.store.fs.fs_lock; I changed this to throw
StoreException instead. I think it's scary to be lenient
here?

The trickiest change was DistributorDirectory, and how it tracks the
file created by the lock factory. For the writeFiles boolean, I had
to remove the check that "lockFactory instanceof FSLockFactory"; maybe
we could use reflection to get this, if it's really necessary? (Do we
really support using NoLockFactory w/ FSDirectory?). Or maybe instead
we could check if the class name of the lock/factory is NoLock, but
MockDirWrapper wraps in AssertingLock...

I added some harmless "synchronized" to methods (all callers of these
private methods are already sync'd) just to make it clear on quick
look that all access/changes to nameDirMapping are in fact sync'd.

I also fixed ConcurrentMergeSchedulerProvider to disable Lucene's CMS
stalling (override w/ an empty maybeStall method). We can simplify
things here: remove EnableMergeScheduler, the separate MERGE thread
pool, let IndexWriter launch merges when they are needed (not
separately, once every second), etc., since Lucene won't stall
indexing threads anymore ... I'll open a separate issue for this.

@mikemccand mikemccand self-assigned this Nov 21, 2014

@mikemccand mikemccand added the review label Nov 21, 2014

@@ -82,10 +84,10 @@ public NodeEnvironment(Settings settings, Environment environment) throws IOExce
if (Files.exists(dir) == false) {
Files.createDirectories(dir);
}
logger.trace("obtaining node lock on {} ...", dir.toAbsolutePath());
Directory luceneDir = FSDirectory.open(dir);
try {

This comment has been minimized.

Copy link
@s1monw

s1monw Nov 23, 2014

Contributor

can we use try / with here?

@@ -348,6 +348,9 @@ public void testSegments() throws Exception {
}

public void testStartAndAcquireConcurrently() {
// Close engine from setUp (we create our own):
engine.close();

This comment has been minimized.

Copy link
@s1monw

s1monw Nov 23, 2014

Contributor

lol nice catch

@s1monw

This comment has been minimized.

Copy link
Contributor

commented Nov 23, 2014

left once comment - LGTM otherwise

@rmuir

This comment has been minimized.

Copy link
Contributor

commented Nov 23, 2014

looks good to me. Maybe @uschindler is curious about lockfactory changes.

NativeFSLockFactory lockFactory = new NativeFSLockFactory(dir);
Lock tmpLock = lockFactory.makeLock("node.lock");
logger.trace("obtaining node lock on {} ...", dir.toAbsolutePath());
Lock tmpLock = NativeFSLockFactory.INSTANCE.makeLock(luceneDir, "node.lock");

This comment has been minimized.

Copy link
@uschindler

uschindler Nov 23, 2014

Contributor

this should be: Lock tmpLock = luceneDir.makeLock("node.lock");
(using LockFactories directly is no longer wanted, they are no first class citizen anymore - the actual locking is responsibility of the Directory impl, BaseDirectory in Lucene just delegates to a lock factory, but thats an impl detail).
And if you want NativeFSLockFactory explicit, pass it to the FSDirectory ctor/factory.

private class DistributorLockFactoryWrapper extends LockFactory {
private final Directory dir;
private final LockFactory delegate;
private static class DistributorLockFactoryWrapper extends LockFactory {

This comment has been minimized.

Copy link
@uschindler

uschindler Nov 23, 2014

Contributor

I would not make this a LockFactory at all. This should just be a wrapper Directory and override makeLock() to wrap the lock returned by the delegate as DistributorLock. Wrapping LockFactories was completely removed in Lucene's test-framework. Check out, how MockDirWrapper handles this - you should just wrap directories and its locks inside makeLock(). Keep in mind, LockFactories should really have no state at all and be singletons.
This will in my opinion also make the reflection hacks obsolete, because you dont need to do instanceof checks. MockDir's LockFactory was as crazy as this, but now its super-simple and obsolete. MockDirWrapper just wraps the lock returned by the delegate directory, no type checking needed.

Ah now I know the problem: DistributorDirectory should extend FilterDirectory or just Directory, but not BaseDirectory! The makeLock in BaseDirectory is final, because this one uses LockFactory. If you extend the right class, you don't need a LockFactory.

mikemccand added some commits Nov 23, 2014

@mikemccand

This comment has been minimized.

Copy link
Contributor Author

commented Nov 23, 2014

Thank you for all the feedback, I folded it all in @uschindler

public synchronized String getLockID() {
return distributor.primary().getLockID();
/** Called by makeLock's wrapped lock, to record the lock file. */
synchronized void addNameDirMapping(String name, Directory dir) {

This comment has been minimized.

Copy link
@uschindler

uschindler Nov 23, 2014

Contributor

this method can be private now

final Directory primary = distributor.primary();
final Lock delegateLock = primary.makeLock(lockName);
if (DirectoryUtils.getLeaf(primary, FSDirectory.class) != null) {
// Wrap the delegate's lock just so we can monitor when it actually wrote a lock file. We assume that an FSDirectory writes its

This comment has been minimized.

Copy link
@uschindler

uschindler Nov 23, 2014

Contributor

Is it a problem if we just track a non-existing file? To me this looks like this is already broken for NativeFSLockFactory, because this one may reuse already created lock files (the existence of lock file does not mean its locked). So we can just record here "there may be a lock file to track".

This comment has been minimized.

Copy link
@s1monw

s1monw Nov 24, 2014

Contributor

I wanted this directory to be consistent with whatever was written through the delegates as well. If there was an existing lock file on the Filesystem then we "loaded" it on startup via #listAll()

This comment has been minimized.

Copy link
@uschindler

uschindler Nov 24, 2014

Contributor

OK, did not know this :-)

lockFactory = NoLockFactory.getNoLockFactory();
lockFactory = SimpleFSLockFactory.INSTANCE;
} else {
throw new StoreException(shardId, "unrecognized fs_lock \"" + fsLock + "\": must be native, simple or none");

This comment has been minimized.

Copy link
@uschindler

uschindler Nov 23, 2014

Contributor

If we no longer support NoLockFactory (see my comments above), remove it from the error message.
In my opinion, it is still fine to use NoLockFactory - just discouraged in production. Any comments?

This comment has been minimized.

Copy link
@s1monw

s1monw Nov 24, 2014

Contributor

I'd love to remove this to be honest but ideally without specialcasing #listAll to filter out files we don't know about like the write lock. It's just wrong that this doesn't get written through the top level directory itself... but I don't know a way around this?

This comment has been minimized.

Copy link
@uschindler

uschindler Nov 24, 2014

Contributor

One way to implement this:
Ignore the LockFactory of the inner directories and require the LockFactory to be set on DistributorDirectory itsself (by extending BaseDirectory and taking a external LockFactory on ctor). In that case the Distributor would know the exact type of lock, because it is responsible for locking. The underlying directories would just need no locking at all (NoLockFactory). But this would be a change out of the scope of this issue.

But I like the current impl better. Locking should be the responsibility of the directory that actually holds the lock. This implementation is now implemented the same way like FileSwitchDirectory. FileSwitchDirectory just has the additional "feature" that it delegates the makeLock() call by its file extension, too (and no longer places it in primary dir), see MIGRATE.txt in Lucene.

Please also keep in mind, that the lock file itsself is an implementation detail, so maybe we get another lockFactory in the future that does not create any files at all (e.g., by locking the directory itsself or writing some information to its metadata). So to me it looks wrong to see the lock file as required to be listed in listAll(). Lucene itsself does not depend on that (because lock files are unknown to lucene), Lucene just uses the makeLock() method, nothing more.

@uschindler

This comment has been minimized.

Copy link
Contributor

commented Nov 23, 2014

See my comments above. Otherwise its correct API-wise. It also took me a long time while refactoring LockFactory in Lucene that you don't need one in most cases. LockFactory is just an implementation detail of BaseDirectory which should only be used for actual "real" directory implementations. Everything that wraps or filters should use Directory (for complex delegation like FileSwitchDirectory) or FilterDirectory (for simple delagation).

@mikemccand

This comment has been minimized.

Copy link
Contributor Author

commented Nov 24, 2014

I pushed another commit w/ feedback from @uschindler (thanks Uwe!); I think it's ready.

@s1monw

This comment has been minimized.

Copy link
Contributor

commented Nov 24, 2014

LGTM

@uschindler

This comment has been minimized.

Copy link
Contributor

commented Nov 24, 2014

+1

mikemccand added a commit that referenced this pull request Dec 11, 2014

Core: don't forcefully release IndexWriter lock on engine start
It's dangerous to release this lock because it means another ES
somehow still has an IndexWriter open on the directory, which can
lead to corruption.

Instead, we don't release the lock, and the user will see an exception
on init.

This was already fixed on master with #8588; I'm backporting to 1.x
here.

Closes #8892

mikemccand added a commit that referenced this pull request Dec 11, 2014

Core: don't forcefully release IndexWriter lock on engine start
It's dangerous to release this lock because it means another ES
somehow still has an IndexWriter open on the directory, which can
lead to corruption.

Instead, we don't release the lock, and the user will see an exception
on init.

This was already fixed on master with #8588; I'm backporting to 1.x
here.

Closes #8892

mikemccand added a commit that referenced this pull request Dec 11, 2014

Core: don't forcefully release IndexWriter lock on engine start
It's dangerous to release this lock because it means another ES
somehow still has an IndexWriter open on the directory, which can
lead to corruption.

Instead, we don't release the lock, and the user will see an exception
on init.

This was already fixed on master with #8588; I'm backporting to 1.x
here.

Closes #8892

@clintongormley clintongormley removed the review label Mar 19, 2015

@clintongormley clintongormley changed the title Core: upgrade to current Lucene 5.0.0 snapshot Upgrade to current Lucene 5.0.0 snapshot Jun 8, 2015

mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015

Core: don't forcefully release IndexWriter lock on engine start
It's dangerous to release this lock because it means another ES
somehow still has an IndexWriter open on the directory, which can
lead to corruption.

Instead, we don't release the lock, and the user will see an exception
on init.

This was already fixed on master with elastic#8588; I'm backporting to 1.x
here.

Closes elastic#8892

mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015

Core: don't forcefully release IndexWriter lock on engine start
It's dangerous to release this lock because it means another ES
somehow still has an IndexWriter open on the directory, which can
lead to corruption.

Instead, we don't release the lock, and the user will see an exception
on init.

This was already fixed on master with elastic#8588; I'm backporting to 1.x
here.

Closes elastic#8892
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.