Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Untangle Engine Constructor logic #28245

Merged
merged 39 commits into from
Mar 14, 2018
Merged

Conversation

bleskes
Copy link
Contributor

@bleskes bleskes commented Jan 16, 2018

Currently we have a fairly complicated logic in the engine and translog's constructors logic to deal with all the various ways we want to mutate the lucene index and translog we're opening.

We can:

  1. Create an empty index
  2. Use the lucene but create a new translog
  3. Use both
  4. Force a new history uuid in all cases.

This leads complicated code flows which makes it harder and harder to make sure we cover all the corner cases. This PR tries to take another approach. Constructing an InternalEngine always opens things as they are and all needed modifications are done by static methods directly on the directory, one at a time.

The PR is still WIP (mostly around documentation) to get initial feedback about what people want think about it. If we like it I can break it up into a translog related PR and an a follow up for the engine.

@s1monw
Copy link
Contributor

s1monw commented Jan 17, 2018

the approach makes a lot of sense to me.

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me. I left some minor comments and a reminder.

config().getIndexSettings().getIndexVersionCreated().before(Version.V_6_0_0_rc1) :
"existing index was created after 6_0_0_rc1 but has no history uuid";
uuid = UUIDs.randomBase64UUID();
private String loadOrGenerateHistoryUUID(final IndexWriter writer) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method name is no longer correct as we now only load the history uuid.

}


public static void verifyHasHistoryUUID(final Directory dir) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though this is an assertion but not. Should we give it a better name?

@@ -131,23 +133,19 @@
* translog file referenced by this generation. The translog creation will fail if this generation can't be opened.
*
* @param config the configuration of this translog
* @param expectedTranslogUUID the translog uuid to open, null for a new translog
* @param translogUUID the translog uuid to open, null for a new translog
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: reformat javadoc

@@ -181,22 +177,14 @@ public InternalEngine(EngineConfig engineConfig) {
assert translog.getGeneration() != null;
this.translog = translog;
final IndexCommit startingCommit = getStartingCommitPoint();
assert openMode != EngineConfig.OpenMode.OPEN_INDEX_AND_TRANSLOG || startingCommit != null :
"Starting commit should be non-null; mode [" + openMode + "]; startingCommit [" + startingCommit + "]";
assert startingCommit != null : "Starting commit should be non-null";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to remind you that we discussed whether to move startingCommitPoint to EngineDiskUtils.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to do this as a follow up.

@bleskes bleskes changed the title [POC] Untangle Engine & Translog Constructor logic Untangle Engine Constructor logic Mar 12, 2018
@bleskes bleskes requested review from dnhatn and s1monw and removed request for s1monw March 12, 2018 11:04
@bleskes
Copy link
Contributor Author

bleskes commented Mar 12, 2018

@dnhatn @s1monw I updated this PR and removed the POC label of it. Can you take another look? I think it's ready for full reviews. Thanks!

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a great change! LGTM

import java.util.List;
import java.util.Map;


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs javadocs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add.


public final class EngineDiskUtils {

private EngineDiskUtils() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as an alternative you can mark it as final abstract then you don't need the private ctor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't care. I've seen requests to go both ways. I just want people to be happy. I'll go with abstract. I

final String translogUUID = store.readLastCommittedSegmentsInfo().getUserData().get(Translog.TRANSLOG_UUID_KEY);
final long globalCheckpoint = Translog.readGlobalCheckpoint(translogConfig.getTranslogPath(), translogUUID);
replicationTracker.updateGlobalCheckpointOnReplica(globalCheckpoint, "read from translog checkpoint");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are these newlines for here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they seperate the flow into units that are related. Something like paragraphs. If you don't like them I'll remove.

…ning

# Conflicts:
#	test/framework/src/main/java/org/elasticsearch/index/engine/EngineTestCase.java
@bleskes bleskes merged commit bf65cb4 into elastic:master Mar 14, 2018
@bleskes bleskes deleted the engine_simple_opening branch March 14, 2018 20:00
@bleskes
Copy link
Contributor Author

bleskes commented Mar 14, 2018

Thx @s1monw & @dnhatn

@bleskes
Copy link
Contributor Author

bleskes commented Mar 15, 2018

While backporting this to 6.3 I found a flaw in the logic in EngineDiskUtils methods - even if the don't do anything they trigger lucenes deletion policy which messes up with our safe commits. I can disable deletion in these methods but this points a deeper flaw. Since it commits the writer, it also expects the commit it wrote to be used as a starting commit - otherwise it's pointless. For example, it it wants to add a history uuid, it will create a new commit, based on the last one, with the added uuid. We have no guarantee at that point that that commit is actually safe.

Instead of tweaking these methods, I rather push forward with moving the "remove all unsafe commits" logic to this class. Then I can make sure that the current method only use "safe" commits as a base line.

bleskes added a commit that referenced this pull request Mar 26, 2018
#28245 has introduced the utility class`EngineDiskUtils` with a set of methods to prepare/change
translog and lucene commit points. That util class bundled everything that's needed to create and
empty shard, bootstrap a shard from a lucene index that was just restored etc. 

In order to safely do these manipulations, the util methods acquired the IndexWriter's lock. That
would sometime fail due to concurrent shard store fetching or other short activities that require the
files not to be changed while they read from them. 

Since there is no way to wait on the index writer lock, the `Store` class has other locks to make
sure that once we try to acquire the IW lock, it will succeed. To side step this waiting problem, this
PR folds `EngineDiskUtils` into `Store`. Sadly this comes with a price - the store class doesn't and
shouldn't know about the translog. As such the logic is slightly less tight and callers have to do the
translog manipulations on their own.
dnhatn pushed a commit that referenced this pull request Mar 29, 2018
As follow up to #28245 , this PR removes the logic for selecting the 
right start commit from the Engine constructor in favor of explicitly
trimming them in the Store, before the engine is opened. This makes the
constructor in engine follow standard Lucene semantics and use the last
commit.

Relates #28245
Relates #29156
dnhatn pushed a commit that referenced this pull request Mar 31, 2018
Currently we have a fairly complicated logic in the engine constructor logic to deal with all the
various ways we want to mutate the lucene index and translog we're opening.

We can:
1) Create an empty index
2) Use the lucene but create a new translog
3) Use both
4) Force a new history uuid in all cases.

This leads complicated code flows which makes it harder and harder to make sure we cover all the
corner cases. This PR tries to take another approach. Constructing an InternalEngine always opens
things as they are and all needed modifications are done by static methods directly on the
directory, one at a time.
dnhatn pushed a commit that referenced this pull request Mar 31, 2018
#28245 has introduced the utility class`EngineDiskUtils` with a set of methods to prepare/change
translog and lucene commit points. That util class bundled everything that's needed to create and
empty shard, bootstrap a shard from a lucene index that was just restored etc. 

In order to safely do these manipulations, the util methods acquired the IndexWriter's lock. That
would sometime fail due to concurrent shard store fetching or other short activities that require the
files not to be changed while they read from them. 

Since there is no way to wait on the index writer lock, the `Store` class has other locks to make
sure that once we try to acquire the IW lock, it will succeed. To side step this waiting problem, this
PR folds `EngineDiskUtils` into `Store`. Sadly this comes with a price - the store class doesn't and
shouldn't know about the translog. As such the logic is slightly less tight and callers have to do the
translog manipulations on their own.
dnhatn pushed a commit that referenced this pull request Mar 31, 2018
As follow up to #28245 , this PR removes the logic for selecting the
right start commit from the Engine constructor in favor of explicitly
trimming them in the Store, before the engine is opened. This makes the
constructor in engine follow standard Lucene semantics and use the last
commit.

Relates #28245
Relates #29156
dnhatn added a commit that referenced this pull request Apr 1, 2018
This commit adds missing logic during backporting the untangle engine
constructor PR (#28245). These minor logic helps to handle an old indices
which do not have all 6.x commit tags such as max_unsafe_autoid_timestamp
or historyUUID or sequence numbers.
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v6.3.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants