Detect remnants of path.data/default.path.data bug #24099

Merged
jasontedor merged 2 commits into elastic:master from jasontedor:default-path-data-detection Apr 17, 2017

@jasontedor
Member

commented Apr 14, 2017

Elasticsearch 5.3.0 introduced a bug in the merging of default settings when the target setting existed as an array. When this bug affects path.data and default.path.data, the paths specified in both settings are used to write index data. Since our packaging sets default.path.data, users who configure multiple data paths via an array and use the packaging can end up with shards in the default.path.data paths, which is very likely not what they intended.
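
The description does not spell out the mechanism, so the following is only a toy model of how such a merge bug can arise. It assumes array values are stored flattened as `key.0`, `key.1`, and so on, and that the default is applied whenever the exact key is absent. `ToySettings` and all of its methods are hypothetical and do not mirror Elasticsearch's actual `Settings` API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical class, not Elasticsearch's Settings implementation.
final class ToySettings {
    private final Map<String, String> flat = new LinkedHashMap<>();

    // Assumption: array values are stored flattened as key.0, key.1, ...
    void putArray(String key, String... values) {
        for (int i = 0; i < values.length; i++) {
            flat.put(key + "." + i, values[i]);
        }
    }

    // Buggy merge: only the exact key is checked, so a flattened array is not seen as "already set".
    void applyDefaultBuggy(String key, String defaultValue) {
        if (flat.containsKey(key) == false) {
            flat.put(key, defaultValue);
        }
    }

    // Corrected merge: the presence of key.0 also counts as the setting being configured.
    void applyDefaultFixed(String key, String defaultValue) {
        if (flat.containsKey(key) == false && flat.containsKey(key + ".0") == false) {
            flat.put(key, defaultValue);
        }
    }

    // Reads the scalar form and the flattened array form together.
    List<String> getAsList(String key) {
        final List<String> values = new ArrayList<>();
        if (flat.containsKey(key)) {
            values.add(flat.get(key));
        }
        for (int i = 0; flat.containsKey(key + "." + i); i++) {
            values.add(flat.get(key + "." + i));
        }
        return values;
    }
}
```

In this model, configuring two array entries and then calling `applyDefaultBuggy` leaves three data paths in effect (the default plus the two configured ones), while `applyDefaultFixed` leaves only the two configured paths.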

This commit is an attempt to rectify this situation. If both path.data and default.path.data are configured, we check for the presence of indices under default.path.data. If we find any, we log messages explaining the situation and fail the node.
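
A minimal sketch of the check described above, written against plain `java.nio.file` rather than Elasticsearch classes. The class `DefaultPathDataCheck`, its method names, the error text, and the `nodes/<lock id>/indices` layout are assumptions made for illustration; the real change uses `NodeEnvironment` and a `Logger`, as the diff fragments in the review show.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the check; not the Elasticsearch implementation.
final class DefaultPathDataCheck {

    // Assumed on-disk layout: <data path>/nodes/<node lock id>/indices/<index folder>
    static Path indicesDirectory(Path dataPath, int nodeLockId) {
        return dataPath.resolve("nodes").resolve(Integer.toString(nodeLockId)).resolve("indices");
    }

    // Collects the index folders found under any of the default data paths.
    static List<Path> findIndexFolders(List<Path> defaultDataPaths, int nodeLockId) throws IOException {
        final List<Path> found = new ArrayList<>();
        for (final Path defaultDataPath : defaultDataPaths) {
            final Path indices = indicesDirectory(defaultDataPath, nodeLockId);
            if (Files.isDirectory(indices) == false) {
                continue;
            }
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(indices)) {
                for (final Path child : stream) {
                    if (Files.isDirectory(child)) {
                        found.add(child);
                    }
                }
            }
        }
        return found;
    }

    // Does nothing unless both settings are configured; otherwise reports any index data and fails.
    static void check(List<Path> pathData, List<Path> defaultPathData, int nodeLockId) throws IOException {
        if (pathData.isEmpty() || defaultPathData.isEmpty()) {
            return;
        }
        final List<Path> found = findIndexFolders(defaultPathData, nodeLockId);
        if (found.isEmpty()) {
            return;
        }
        for (final Path folder : found) {
            System.err.println("detected index data in default.path.data: " + folder);
        }
        throw new IllegalStateException("detected index data in default.path.data where there should not be any");
    }
}
```

The sketch throws rather than returning a flag so that a caller cannot continue startup by accident, which mirrors the "fail the node" behavior described above.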

Closes #23981, supersedes #24052, relates #24074, relates #24093

@s1monw
Contributor

left a comment

I am not sure this change is OK, and I would like to discuss different solutions for this down the road. I'd prefer not to add the permission, and to find the possible issue earlier in the game. I think we can only do that if we ignore the nodeLockId, which I think we should; it is an edge case we are trying to solve. If we ignored it and only checked whether there is data in default.path.data, we would do as well as this solution, IMO, but with much less trouble. I think we are over-engineering here, but I am OK with compromising by dropping the sophisticated method in 6.0, since our recommendation is to upgrade to the latest version first before you upgrade, even if it's a full cluster restart. We can still have a simpler check in 6.0 then. @rjernst WDYT, can you review this as well?
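
A rough sketch of what the simpler check suggested in this comment could look like: it ignores the node lock id and only asks whether any node directory under a default data path holds index data. `SimpleDefaultPathDataCheck`, `hasIndexData`, and the `nodes/*/indices` layout are assumptions for illustration, not code from this PR.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; not part of the PR.
final class SimpleDefaultPathDataCheck {

    // Returns true if any <defaultDataPath>/nodes/<anything>/indices directory is non-empty,
    // regardless of which node lock id the directory belongs to.
    static boolean hasIndexData(Path defaultDataPath) throws IOException {
        final Path nodes = defaultDataPath.resolve("nodes");
        if (Files.isDirectory(nodes) == false) {
            return false;
        }
        try (DirectoryStream<Path> nodeDirs = Files.newDirectoryStream(nodes)) {
            for (final Path nodeDir : nodeDirs) {
                final Path indices = nodeDir.resolve("indices");
                if (Files.isDirectory(indices) == false) {
                    continue;
                }
                try (DirectoryStream<Path> indexDirs = Files.newDirectoryStream(indices)) {
                    if (indexDirs.iterator().hasNext()) {
                        return true;
                    }
                }
            }
        }
        return false;
    }
}
```

Because it does not need the lock id, a check of this shape could in principle run before the node environment acquires its lock, which is the "earlier in the game" trade-off raised in the comment.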

@@ -724,6 +735,14 @@ public String nodeId() {
return nodePaths;
}

public int nodeLockId() {

s1monw Apr 14, 2017

Contributor

can we call this getNodeLockId() pls

jasontedor Apr 14, 2017

Author Member

I pushed b21702e.

* @param logger a logger where messages regarding the detection will be logged
* @throws IOException if an I/O exception occurs reading the directory structure
*/
static void checkForIndexDataInDefaultPathData(

s1monw Apr 14, 2017

Contributor

you have 140 chars use them :)

jasontedor Apr 14, 2017

Author Member

I pushed b21702e.

final Settings settings,
final NodeEnvironment nodeEnv,
final Logger logger) throws IOException {
if (!Environment.PATH_DATA_SETTING.exists(settings) || !Environment.DEFAULT_PATH_DATA_SETTING.exists(settings)) return;

s1monw Apr 14, 2017

Contributor

please wrap with {}

jasontedor Apr 14, 2017

Author Member

I pushed b21702e.

}
}

if (clean) return;

s1monw Apr 14, 2017

Contributor

please wrap {}

jasontedor Apr 14, 2017

Author Member

I pushed b21702e.

boolean clean = true;
for (final String defaultPathData : Environment.DEFAULT_PATH_DATA_SETTING.get(settings)) {
final Path nodeDirectory = NodeEnvironment.resolveNodePath(getPath(defaultPathData), nodeEnv.nodeLockId());
if (!Files.exists(nodeDirectory)) continue;

s1monw Apr 14, 2017

Contributor

please wrap in {} and use == false
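
For illustration, the requested style applied to a small hypothetical helper (`ExistingDirectories` is not part of the PR): the guard gets its own braces, and the negation is written as `== false` instead of a leading `!`.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper used only to show the requested style.
final class ExistingDirectories {

    // Keeps only the candidate paths that exist on disk.
    static List<Path> existing(List<Path> candidates) {
        final List<Path> result = new ArrayList<>();
        for (final Path candidate : candidates) {
            // Braced guard with an explicit comparison instead of "if (!Files.exists(candidate)) continue;"
            if (Files.exists(candidate) == false) {
                continue;
            }
            result.add(candidate);
        }
        return result;
    }
}
```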

jasontedor Apr 14, 2017

Author Member

I pushed b21702e.

if (!Files.exists(nodeDirectory)) continue;
final NodeEnvironment.NodePath nodePath = new NodeEnvironment.NodePath(nodeDirectory);
final Set<String> availableIndexFolders = nodeEnv.availableIndexFoldersForPath(nodePath);
if (availableIndexFolders.isEmpty()) continue;

s1monw Apr 14, 2017

Contributor

please wrap in {}

jasontedor Apr 14, 2017

Author Member

I pushed b21702e.

s1monw approved these changes Apr 17, 2017
@s1monw

Contributor

commented Apr 17, 2017

@jasontedor let's merge this and I will look into improving this in master. I think the solution is good enough, and we should favor progress over perfection.

@jasontedor jasontedor merged commit 8033c57 into elastic:master Apr 17, 2017

2 checks passed

CLA Commit author is a member of Elasticsearch
elasticsearch-ci Build finished.
jasontedor added a commit that referenced this pull request Apr 17, 2017
Detect remnants of path.data/default.path.data bug

Relates #24099

@jasontedor jasontedor deleted the jasontedor:default-path-data-detection branch Apr 17, 2017
