Add a tool to map static cluster map information into Helix #523
Conversation
- Add a tool to bootstrap and, from time to time, upgrade cluster map information in Helix from the static clustermap files.
- Adds tests to verify the tool's claims.
- Adds certain functionalities in the test classes of the static cluster map to help test the tool.
Just quickly went through the scaffolding around the upgrade tool. Have to take a look at the tool and its test.
for (int i = 0; i < datacenterCount; i++) {
  names.add(i, "DC" + i);
  dataNodes.add(i, getDataNodes(curBasePort, sslPort, getDisks()));
  curBasePort += dataNodeCount;
  curBasePort += 5000;
any reason to move to an absolute number?
Have refactored this a bit. Simply trying to partition things so that when new nodes are added at a later time (which is the new functionality added in these classes), the ports do not collide.
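The port-partitioning idea discussed above can be sketched as follows. This is a standalone illustration, not the patch's code; the class and constant names are made up for the example.

```java
// Sketch: carve out a disjoint port range per datacenter, so that nodes added
// to a DC later cannot collide with ports already handed out in another DC.
public class PortRanges {
    // Illustrative headroom per DC (the reviewed code uses a 5000 increment).
    static final int PORT_RANGE_PER_DC = 5000;

    // Base port for a datacenter, independent of how many nodes it has today.
    static int basePortForDc(int firstPort, int dcIndex) {
        return firstPort + dcIndex * PORT_RANGE_PER_DC;
    }
}
```

With this scheme, adding a new node to DC0 consumes a port from DC0's range and can never clash with DC1, which starts 5000 ports later.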
@@ -81,4 +83,15 @@ public static Properties createConnectionPoolProperties() {
  props.put("connectionpool.connect.timeout.ms", "2000");
  return props;
}

public static void ensureOrExit(List<OptionSpec> requiredArgs, OptionSet actualArgs, OptionParser parser)
doc
done.
@@ -17,4 +17,5 @@ ext {
  commonsVersion = "1.5"
  bouncycastleVersion = "1.52"
  javaxVersion = "3.0.1"
  helixVersion = "0.6.7-SNAPSHOT"
this can't be merged in this state right? Do we have to wait until the relevant version of helix is published?
Yeah - will figure this out (mentioned this in the description).
Will have to figure something out for the general case. For now, Helix is planning to cut a release later this week, so we are good.
Continuing to review ..
* The tool does the following:
* 1. bootstraps a static cluster map, adding nodes and partitions to Helix.
* 2. upgrades that involve new nodes and new partitions. To avoid over-complicating things, it assumes that the
* existing partition assignment do not change during an upgrade. Newly added partitions can be distributed in any
does this also mean it assumes no new replicas are added for existing partitions?
Correct. We can make the change if required. At this time, I do not see a requirement as this never happens in the current usage of Ambry.
private final TreeMap<Long, Long> existingPartitions = new TreeMap<>();
private final TreeSet<Long> existingResources = new TreeSet<>();
private final String localDc;
private String clusterName;
haven't seen the code yet but just leaving a comment here in case i forget.
Can this be final?
yes, changed.
private final String localDc;
private String clusterName;

static final int MAX_PARTITIONS_PER_RESOURCE = 100;
what does this represent and why is it necessary?
This states the maximum number of Ambry partitions that will be grouped under a single Helix resource. See this section in the design.
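One implication of such a limit can be sketched standalone: if partitions are assigned to resources in contiguous blocks of MAX_PARTITIONS_PER_RESOURCE, the resource for a given partition id is a simple division. This is an illustration of the idea, not necessarily the tool's exact grouping scheme (which is specified in the design doc).

```java
// Sketch: group Ambry partitions under Helix resources in contiguous blocks,
// capping the number of partitions a single resource's ideal state can hold.
public class ResourceGrouping {
    static final int MAX_PARTITIONS_PER_RESOURCE = 100;

    // Resource id for a partition under a contiguous-block assignment
    // (illustrative; assumes partition ids are dense, starting at 0).
    static long resourceIdForPartition(long partitionId) {
        return partitionId / MAX_PARTITIONS_PER_RESOURCE;
    }
}
```

So partitions 0-99 land in resource 0, 100-199 in resource 1, and so on, keeping any one resource's metadata bounded.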
listOpt.add(zkLayoutPathOpt);
ToolUtils.ensureOrExit(listOpt, options, parser);
bootstrapOrUpgrade(hardwareLayoutPath, partitionLayoutPath, zkLayoutPath, options.valueOf(localDcOpt));
} catch (Exception e) {
this catch isn't required. Since this is the main function, it will simply exit the program and print the stack trace of the exception (the stack trace is lost if you print it this way).
Ah, didn't know main() could throw - was simply following what other tools did.
try {
  partitionLayoutString = Utils.readStringFromFile(partitionLayoutPath);
} catch (FileNotFoundException e) {
  System.out.println("Partition layout path not found. Creating new file");
could we instead do

    String partitionLayoutString = null;
    if (new File(partitionLayoutPath).exists()) {
      partitionLayoutString = Utils.readStringFromFile(partitionLayoutPath);
    }

? I would vote against using exceptions for control flow.
In general, exceptions are implicitly used for control flow of the kind you have in mind, unless they are bail-out exceptions (which in this case it is not). Look at it this way - if lines 204 to 207 were encapsulated in a method, would you have had the same comment?
It so happens that File provides an exists() method, but if it hadn't, this exception would have to be used for control flow (the JSON parser class, for example, simply throws a JSONException if the contents are not valid JSON - and that might not be a bail-out case for the caller).
That said, your suggested change is much better, have made the change.
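The agreed-on control flow, sketched standalone: probe for the file instead of catching FileNotFoundException. Utils.readStringFromFile is replaced here by java.nio.file.Files so the example is self-contained; the class and method names are illustrative.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch: return the layout file's contents if it exists, null otherwise.
// A null result signals "no existing layout - create a new one".
public class LayoutReader {
    static String readLayoutIfPresent(String path) throws IOException {
        String layoutString = null;
        if (new File(path).exists()) {
            layoutString = new String(Files.readAllBytes(Paths.get(path)));
        }
        return layoutString;
    }
}
```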
JSONArray all = (root).getJSONArray("zkHosts");
for (int i = 0; i < all.length(); i++) {
  JSONObject entry = all.getJSONObject(i);
  dataCenterToZkAddress.put(entry.getString("datacenter"),
could you define static constants for these strings?
Done.
for (int i = 0; i < all.length(); i++) {
  JSONObject entry = all.getJSONObject(i);
  dataCenterToZkAddress.put(entry.getString("datacenter"),
      entry.getString("hostname") + ":" + entry.getString("port"));
if you are going to concatenate, instead of hostname and port, why not let the user provide an endpoint?
Just a more structured and extensible way of reading in fields in my opinion (say, if tomorrow the tool decides to make use of an API that takes host and port as arguments, or it makes a REST based connection).
Went over HelixBootstrapUpgradeTool. Will review the rest in the 2nd pass once the feedback is addressed
JSONObject jsonObject = jsonArray.getJSONObject(i);
currentPartitions.add((long) jsonObject.get("id"));
}
for (long i = 0; i < partitionCount; i++) {
minor: we can just start from currentPartitions.size(), right? Why do you want to start from 0?
Yes, thanks. Had started off with something more complex which ended up like this. Refactored.
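The refactor agreed on above can be sketched standalone: when upgrading, new partition ids simply continue from the number of partitions already present instead of scanning from 0. Class and method names here are illustrative, not the patch's.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: allocate ids for newly added partitions, starting right after the
// partitions that already exist in the layout.
public class PartitionIds {
    static List<Long> newPartitionIds(int existingCount, int targetCount) {
        List<Long> ids = new ArrayList<>();
        // Existing ids [0, existingCount) are untouched; only new ones are minted.
        for (long i = existingCount; i < targetCount; i++) {
            ids.add(i);
        }
        return ids;
    }
}
```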
jsonArray.put(getJsonPartition(i, partitionState, replicaCapacityInBytes, jsonReplicas));
}
return jsonArray;
}

static JSONArray getUpdatedJsonPartitions(JSONArray jsonArray, long partitionCount,
Can you add javadocs for the methods added as part of this patch?
fixed.
@@ -269,32 +271,52 @@ public static JSONObject getJsonPartition(long id, PartitionState partitionState
}

public static JSONArray getJsonPartitions(long partitionCount, PartitionState partitionState,
    long replicaCapacityInBytes, int replicaCount, TestHardwareLayout testHardwareLayout) throws JSONException {
    long replicaCapacityInBytes, int replicaCountPerDc, TestHardwareLayout testHardwareLayout) throws JSONException {
Can you resolve the access specifiers for all the methods in this class using IntelliJ? Most of them can be made package private. IntelliJ will show them color-coded in yellow and will give suggestions as well.
This file is not really core to this patch. Just added a method or so to help test the tool. If I go down the route of correcting these, this will become a huge patch.
Also, IntelliJ is basing these on what it knows about. You could argue that whoever authored a public method in TestUtils wanted it to be exposed outside of the package so that an outside component can make use of it (say, if this tool was not part of the clustermap package) - even though currently no component outside needs it.
jsonArray.put(getJsonPartition(i, partitionState, replicaCapacityInBytes, jsonReplicas));
}
return jsonArray;
}

static JSONArray getUpdatedJsonPartitions(JSONArray jsonArray, long partitionCount,
private
fixed.
* @param dataNodeCountPerDc number of datanodes to get from each datacenter.
* @return a list of datanodes.
*/
public List<DataNode> getIndependentDataNodes(int dataNodeCountPerDc) {
package private
fixed.
not sure if this was fixed?
sorry, fixed now.
if (!admin.getClusters().contains(clusterName)) {
  admin.addCluster(clusterName);
  admin.addStateModelDef(clusterName, "LeaderStandby",
      BuiltInStateModelDefinitions.LeaderStandby.getStateModelDefinition());
do you know what is the diff between this call and LeaderStandbySMD.build()?
Effectively the same, though it seems like LeaderStandbySMD.build() is the recommended way of defining this, so will use your suggestion (the other one eventually makes use of a deprecated method).
InstanceConfig instanceConfig = dcAdmin.getInstanceConfig(clusterName, instanceName);
List<String> currentSealedPartitions = instanceConfig.getRecord().getListField("SEALED");
if (isSealed && !currentSealedPartitions.contains(partitionName)) {
  List<String> newSealedPartitionsList = new ArrayList<>(currentSealedPartitions);
Lines 363 to 366 are repeated in more than one place. Do you think we can make it a private method?
reorganized to avoid duplication - thanks.
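The kind of helper being suggested can be sketched standalone, assuming the SEALED-list semantics visible in the diff (add the partition when sealing, drop it when unsealing). Names and signatures here are illustrative, not the patch's.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: compute the updated SEALED list for an instance in one place,
// instead of repeating the add/remove logic at every call site.
public class SealedListHelper {
    static List<String> updatedSealedList(List<String> current, String partitionName, boolean isSealed) {
        List<String> updated = new ArrayList<>(current);
        if (isSealed && !updated.contains(partitionName)) {
            updated.add(partitionName);   // seal: add if not already present
        } else if (!isSealed) {
            updated.remove(partitionName); // unseal: drop if present
        }
        return updated;
    }
}
```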
* tool groups together partitions under resources, with a limit to the number of partitions that will be grouped
* under a single resource.
*/
private void updateHelix() {
maybe "updateClusterMapInHelix"
sure.
String[] instances = updateInstancesAndGetInstanceNames(dcAdmin, partitionName, replicaList, sealed);
resourceISBuilder.assignPreferenceList(partitionName, instances);
}
resourceISBuilder.setNumReplica(numReplicas);
each helix resource contains N partitions. So, what does numReplicas mean for a resource?
numReplicas is the number of replicas for every partition in a resource (yes, all partitions of a resource have to have the same number of replicas).
private String clusterName;

static final int MAX_PARTITIONS_PER_RESOURCE = 100;
static final String CAPACITY_STR = "capacityInBytes";
these could be private (93 to 100)
They are used by the test.
Addressed feedback.
/**
 * Updates instances that hosts replicas of this partition with the replica information (including the mount points
 * on which these replicas should reside, which will be purely an instance level information).
 * @param dcAdmin the admin to the Zk server on which this operatin is to be done.
fixed.
for (int i = 0; i < replicaList.size(); i++) {
  Replica replica = (Replica) replicaList.get(i);
  DataNodeId node = replica.getDataNodeId();
  String instanceName = node.getHostname() + "_" + node.getPort();
done.
@vgkholla @nsivabalan please take a look when you get a chance. thanks.
LGTM. As discussed, it might be useful to have a verifier at the end of the tool itself to build confidence that the state in helix is consistent with the state in the static file.
parseZkJsonAndPopulateZkInfo(zkLayoutPath);

ClusterMapConfig clusterMapConfig = new ClusterMapConfig(new VerifiableProperties(new Properties()));
staticClusterMap =
ternary operator made this hard to read for me.
changed.
* ]
* }
*
* This tool should be run from an admin node that has access to the nodes in the hardware layout. The access is
do you think we should word it as "this tool should be run (in every datacenter) from an admin node..."
This needs to be run only once - does not have to be run from every datacenter.
oops. got you. I don't know why I made that comment.
private final Map<String, String> dataCenterToZkAddress = new HashMap<>();
private final Map<String, HelixAdmin> adminForDc = new HashMap<>();
// A mapping from partitions to Helix resource, populated at the start with what is present in Helix.
private final TreeMap<Long, Long> existingPartitions = new TreeMap<>();
this is a bit confusing. Can we name it partitionsToResourceMap? Also, why do we need to prepend "existing" to this one and the next data structure? We don't have any other variable which stores upcoming partitions or resources. In that case, would like to keep it simple.
changed this to just a set of existing partitions, as the partition to resource mapping was unnecessary.
These are the partitions and resource that are already present in Helix when the tool starts up. That way I think "existing" is appropriate. These are not modified after initialization.
private final String localDc;
private final String clusterName;

static final int MAX_PARTITIONS_PER_RESOURCE = 100;
Do you think this value should be made configurable ?
Made this an optional argument.
// Add a cluster entry in every DC
if (!admin.getClusters().contains(clusterName)) {
  admin.addCluster(clusterName);
  admin.addStateModelDef(clusterName, "LeaderStandby", LeaderStandbySMD.build());
replace the string "LeaderStandby" with LeaderStandbySMD.name
fixed.
DataNodeId node = replicaId.getDataNodeId();
String instanceName = getInstanceName(node);
InstanceConfig instanceConfig = dcAdmin.getInstanceConfig(clusterName, instanceName);
List<String> currentSealedPartitions = instanceConfig.getRecord().getListField("SEALED");
SEALED_STR
fixed.
HelixBootstrapUpgradeTool clusterMapToHelixMapper =
    new HelixBootstrapUpgradeTool(hardwareLayoutPath, partitionLayoutPath, zkLayoutPath, localDc);
clusterMapToHelixMapper.updateClusterMapInHelix();
}
after upgrade, can we close all the HelixAdmins that are currently open?
Done.
Done with prod code. Reviewing test code now
* @param basePort the starting standard port number for nodes generated
* @param sslPort the starting SSL port number for nodes generated
* @param hardwareState a {@link HardwareLayout} value for each node
* @param disks a {@link JSONArray} of disks for each node
does every data node added have the same set of disks?
Yes, that is how TestUtils is written. I don't think that matters for the purposes of testing this tool.
}

void shutdown() {
  zkServer.shutdown();
don't we need to clean up the directories ?
they get deleted because a deleteOnExit() call is made on the root directory after creation.
*/
private void verifyDataNodeAndDiskEquivalencyInDc(Datacenter dc, String clusterName, HardwareLayout hardwareLayout,
    PartitionLayout partitionLayout) {
  ClusterMapManager staticClusterMap = new ClusterMapManager(partitionLayout);
it's ironic that hardwareLayout is never used to check for data node and disk equivalency ;)
It gets used indirectly. The partitionLayout contains a reference to the associated hardwareLayout and that is what gets used. I have removed the unused variable from the signature.
yes I knew how it works :) just called it out
@@ -17,4 +17,5 @@ ext {
  commonsVersion = "1.5"
  bouncycastleVersion = "1.52"
  javaxVersion = "3.0.1"
  helixVersion = "0.6.7"
is this a stable released version?
this is the latest version as of this writing from the 0.6.x branch, which is the stable branch. This version itself has not yet been officially announced, but this has the changes we need.
mostly minor comments only.
IdealState resourceIS = admin.getResourceIdealState(clusterName, resourceName);
Set<String> resourcePartitions = resourceIS.getPartitionSet();
for (String resourcePartition : resourcePartitions) {
  Assert.assertNull(
why assertNull?
Validating that an Ambry partition is not seen under more than one resource.
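The assertNull idiom can be sketched independently of Helix: Map.put returns the previous mapping for the key, so a null return proves the partition had not already been recorded under another resource. This standalone sketch uses illustrative names, not the test's actual code.

```java
import java.util.Map;

// Sketch: record (partition -> resource) while verifying uniqueness.
// put() returning non-null would mean the partition was already seen
// under some resource - exactly what assertNull guards against.
public class PartitionUniquenessCheck {
    static boolean recordPartition(Map<String, String> partitionToResource, String partition, String resource) {
        return partitionToResource.put(partition, resource) == null;
    }
}
```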
Map<String, Set<String>> allPartitionsToInstancesInHelix = new HashMap<>();
long numResources = 0;
for (String resourceName : admin.getResourcesInCluster(clusterName)) {
  IdealState resourceIS = admin.getResourceIdealState(clusterName, resourceName);
Do you verify stateModelName and numReplicas for each resource?
Added.
Sorry, a couple of minor comments.
@nsivabalan @vgkholla Added validation within the tool itself and tested it with a "real-life" cluster map; addressed other comments. Looks like Helix 0.6.7 is not yet published; will follow up with the Helix team while you review the latest commit.
Minor comments. LGTM
private static String getInstanceName(DataNodeId dataNode) {
  return dataNode.getHostname() + "_" + dataNode.getPort();
}

void validateAndClose() {
Please add a doc comment.
"Replica information not consistent for instance " + instanceName + " disk " + disk.getMountPath()
    + "\n in Helix: " + replicaList + "\n in static clustermap: " + replicasInClusterMap);
}
Check for diskInfos.isEmpty()?
Incorporated comments; built and tested on OS X and Linux.
@nsivabalan could you merge this PR once you take a look at the latest changes?
[update] Made a clean build and ran it multiple times; the error was not reproduced. I saw
@xiahome this patch does not have any changes that should fail that test. Could you
  Collections.shuffle(Arrays.asList(instances));
  resourceISBuilder.assignPreferenceList(partitionName, instances);
}
resourceISBuilder.setNumReplica(numReplicas);
Does the resource have information about the number of partitions it has? Or do you just get the list of partitions and take its size?
I think it is the latter.
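As the reply suggests, the partition count is recovered from the assigned partition set itself. A minimal sketch of the shuffle-and-assign pattern in the diff above, outside Helix (instance and partition names hypothetical; a plain map stands in for the IdealState builder). It also illustrates why shuffling Arrays.asList works: the list is a fixed-size view backed by the array, so the shuffle permutes the array in place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class PreferenceListSketch {
  public static void main(String[] args) {
    // Hypothetical instance names, in the hostname_port form used by the tool.
    String[] instances = {"host1_6667", "host2_6667", "host3_6667"};
    Random random = new Random(0); // seeded only to keep the sketch deterministic
    Map<String, List<String>> preferenceLists = new HashMap<>();
    for (int p = 0; p < 4; p++) {
      // Arrays.asList returns a fixed-size view backed by the array, so
      // shuffling the view permutes the underlying array in place.
      Collections.shuffle(Arrays.asList(instances), random);
      preferenceLists.put("partition_" + p, new ArrayList<>(Arrays.asList(instances)));
    }
    // The resource's partition count is just the size of the assigned set.
    System.out.println("partitions: " + preferenceLists.size());
    System.out.println("replicas per partition: " + preferenceLists.get("partition_0").size());
  }
}
```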
    .describedAs("local_dc")
    .ofType(String.class);

ArgumentAcceptingOptionSpec<String> maxPartitionsInOneResourceOpt = parser.accepts("maxPartitionsInOneResource",
Do we have any suggestion for how to set this number? During different upgrades, the total partitions added might be different, right? Do you think adding a line to the documentation is required?
This does not have to be set. This is an override option, so the user would know.
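The pattern described here, where an override option needs no documentation of a recommended value because absence simply means "use the default", can be sketched without the option-parsing library (the default of 100 is purely hypothetical, not the tool's actual value):

```java
public class OverrideOption {
  // Hypothetical default; the real tool picks its own when the option is absent.
  static final int DEFAULT_MAX_PARTITIONS_IN_ONE_RESOURCE = 100;

  static int resolveMaxPartitions(String overrideArg) {
    // An absent option means "use the default"; a user passing the
    // override knows what value they want and why.
    return overrideArg == null ? DEFAULT_MAX_PARTITIONS_IN_ONE_RESOURCE
        : Integer.parseInt(overrideArg);
  }

  public static void main(String[] args) {
    System.out.println(resolveMaxPartitions(null));  // default path
    System.out.println(resolveMaxPartitions("250")); // user override
  }
}
```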
 * @param hardwareLayout the {@link HardwareLayout} of the static clustermap.
 * @param partitionLayout the {@link PartitionLayout} of the static clustermap.
 */
private void verifyEquivalencyWithStatic(HardwareLayout hardwareLayout, PartitionLayout partitionLayout) {
Maybe rename to "verifyEquivalencyWithStaticClusterMap" or "verifyEquivalencyWithStaticCS"?
renamed.
@@ -199,15 +210,18 @@ static void bootstrapOrUpgrade(String hardwareLayoutPath, String partitionLayout
  * @throws JSONException if there is an error parsing the JSON content in any of the files.
  */
 private HelixBootstrapUpgradeTool(String hardwareLayoutPath, String partitionLayoutPath, String zkLayoutPath,
-    String localDc) throws IOException, JSONException {
+    String localDc, int maxPartitionsInOneResource) throws IOException, JSONException {
Please add javadoc for this param.
 * @param clusterName the cluster to be verified.
 * @param partitionLayout the {@link PartitionLayout} of the static clustermap.
 */
private void verifyDataNodeAndDiskEquivalencyInDc(Datacenter dc, String clusterName,
I assume these verify methods are copied from the test class, and hence I am not reviewing them again. If there are many changes, let me know.
yeah, they are the same.
    + replicaHostsInHelix);
}
ensureOrThrow(allPartitionsToInstancesInHelix.isEmpty(),
    "More partitions in Helix than in clustermap, additional" + " partitions: "
Remove the unwanted "+" (string concatenation).
fixed.
@pnarayanan: very few minor comments. Once you are done, I will merge it.
@nsivabalan incorporated comments. Built and formatted.
Looks good. Merging. Can you confirm that you built using Java 1.7 too?
Yes, I always build with 1.7. It builds on both Linux and OS X.
Part of #524
--
Notes: HelixBootstrapUpgradeTool.java and HelixBootstrapUpgradeToolTest.java.