New Feature: Adds support for External Volumes #489
Adds support for external volumes to be used for both the config and data volumes of Elasticsearch nodes. This works with both the Docker and Mesos containerizers.
This functionality works by assigning an Elasticsearch node id to each node that is created. That node id is saved as metadata in the form of an environment variable that can be accessed later. When a node fails (it crashes or its health check fails), say node 2, its task is released, which returns the node id to the pool. When a new task is launched to replace the failed node, node id 2 will be free, so the new task assumes the id of the old node, perhaps on a completely different Mesos agent, reattaches the old node's volumes, and continues where it left off. This should yield a much quicker rebuild of the Elasticsearch node since it isn't rebuilding from scratch.
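The id pool described above can be sketched roughly as follows. This is a minimal illustration of the idea, not the PR's actual code (the class and method names here are made up); it uses a bitmask, which caps the pool at 32 ids:

```java
// Minimal sketch of the node-id pool: ids start at 1, a failed task's id
// is released back into the pool, and the replacement task takes the
// lowest free id so it can reattach that node's external volumes.
// (Implemented here as a bitmask, which caps the pool at 32 ids.)
public class NodeIdPool {
    private int mask = 0; // bit i set => id (i + 1) is in use

    public int acquire() {
        int bit = Integer.numberOfTrailingZeros(~mask); // lowest free bit
        if (bit >= 32) {
            throw new IllegalStateException("bitmask caps the pool at 32 ids");
        }
        mask |= (1 << bit);
        return bit + 1; // 1-based ids, matching the elasticsearch<id>... volume names
    }

    public void release(int id) {
        mask &= ~(1 << (id - 1));
    }

    public static void main(String[] args) {
        NodeIdPool pool = new NodeIdPool();
        System.out.print(pool.acquire() + " "); // 1
        System.out.print(pool.acquire() + " "); // 2
        System.out.print(pool.acquire() + " "); // 3
        pool.release(2);                        // node 2 fails
        System.out.println(pool.acquire());     // replacement reuses id 2
    }
}
```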
In the Docker container case (the default), external volumes are handled generically, which enables all Docker volume drivers. In TaskInfoFactory.java on line 134, we add the docker run parameter "volume-driver", which enables the external support. You will also notice that when "volume-driver" is set, the host path no longer references a path on the host node but rather the external volume name.
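Concretely, the effect on the generated `docker run` invocation can be sketched like this. This is illustrative only, not the PR's code (which builds Mesos protobufs in TaskInfoFactory.java); the container paths are assumptions, while the `elasticsearch<id>data` / `elasticsearch<id>config` volume names follow the scheme used elsewhere in this thread:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the flags the Docker containerizer ends up with.
// With --volume-driver set, the "host path" side of -v is no longer a
// directory on the agent but the name of an external volume, which the
// driver (e.g. rexray) resolves and attaches.
public class DockerVolumeArgs {
    public static List<String> buildArgs(String driver, int nodeId) {
        List<String> args = new ArrayList<>();
        if (driver != null) {
            args.add("--volume-driver=" + driver);
            // External volume names instead of host paths:
            args.add("-v");
            args.add("elasticsearch" + nodeId + "data:/data");
            args.add("-v");
            args.add("elasticsearch" + nodeId + "config:/config");
        } else {
            // Default behaviour: bind-mount a path on the agent itself
            // (path shown here is an assumption, not the framework's actual default).
            args.add("-v");
            args.add("/var/lib/elasticsearch/data:/data");
        }
        return args;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", buildArgs("rexray", 2)));
    }
}
```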
In the Mesos container case, external volumes are implemented using Mesos isolator modules such as mesos-module-dvdi (github.com/emccode/mesos-module-dvdi). You can find the environment variable interface specification there.
It is also worth noting that the original OfferStrategy class has been made into a parent class. OfferStrategyNormal extends OfferStrategy and provides exactly the strategy that the original OfferStrategy provided. The new OfferStrategyExternalStorage is enabled when an external storage driver is specified and removes the storage/disk checks from the strategy, because the volumes are now externally managed.
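The class split can be sketched as follows. This is a simplified illustration of the design, not the actual classes (the rule interface, method names, and the single disk rule shown are assumptions):

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the OfferStrategy split: the normal strategy keeps the disk
// check, while the external-storage strategy drops it because volumes
// no longer consume agent-local disk.
abstract class OfferStrategy {
    interface OfferRule {
        boolean passes(double offeredDiskMb, double requiredDiskMb);
    }

    // Each subclass supplies the checks an offer must pass.
    abstract List<OfferRule> acceptanceRules();

    boolean accept(double offeredDiskMb, double requiredDiskMb) {
        return acceptanceRules().stream()
                .allMatch(rule -> rule.passes(offeredDiskMb, requiredDiskMb));
    }
}

class OfferStrategyNormal extends OfferStrategy {
    @Override
    List<OfferRule> acceptanceRules() {
        // Original behaviour: reject offers without enough local disk.
        return Arrays.asList((offered, required) -> offered >= required);
    }
}

class OfferStrategyExternalStorage extends OfferStrategy {
    @Override
    List<OfferRule> acceptanceRules() {
        // Disk check removed: external volumes are managed off-agent.
        return Arrays.asList((offered, required) -> true);
    }
}

public class OfferStrategyDemo {
    public static void main(String[] args) {
        System.out.println(new OfferStrategyNormal().accept(100, 500));          // not enough disk
        System.out.println(new OfferStrategyExternalStorage().accept(100, 500)); // disk ignored
    }
}
```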
I should add that if someone else needs to verify this before merging, you can find an example of a Docker volume driver below. Please install both the rexray driver and the DVDCLI; configuration instructions can be found on that page.
To test the Mesos containerizer, you need to install mesos-module-dvdi, located below. Configuration instructions are also found on that page.
Special configuration considerations:
Tested using the supported Docker method for the Elasticsearch framework.
I have created a demo that I posted to YouTube. You can view it here:
If you have any questions about the functionality, feedback on the code, or need more clarification, please let me know. You can also find me on Slack at codecommunity.slack.com.
David, thank you for the fantastic video. It looks great! Everyone at Container Solutions is very excited!
I only have two minor problems with the code.
The first is that because you are not storing the volume id information within the TaskInfo or TaskStatus messages, it is not being persisted to ZooKeeper. So if the scheduler dies, it will not know which volumes should be attached to which executors.
The second, related to the first, is the use of a bitmask to store the volume number information. I would change this to a simple hashmap, which is easier to understand. But this will depend on where we can store it in ZooKeeper.
I will work on these two issues when I come to merging the code, and I also need to add some tests.
Again, thank you for this excellent work. I'm aiming to include it in the 0.8.0 release.
The first point you brought up is actually intentional. Since the config and data are no longer stored on the slave nodes' direct-attached storage, the executors are nothing more than compute engines and don't need to be tied back to their original volumes. For example, if we get a slave node failure where an executor needs to be brought up on a different Mesos slave node AND we scale the Elasticsearch executor nodes from 3 to 4, either the 3rd or the 4th node can take the volumes for node 3. If the 4th executor takes the volumes, it assumes the persona of node 3.
For the second, I can definitely make the change to a map if you would like. The bitmask does place an upper limit of 32 on the number of ES nodes; a map would not place an arbitrary limit on the number of nodes you could have.
@dvonthenen Hi David. I'm getting close.
The node id code is only generating an id of 1 for me. So all three nodes are trying to connect to
But I don't quite follow the logic. Say something catastrophic has happened and only ID 3 is running (assume not a bitmask, just a counter). The first node will come up and generate a list of what is running, so just 3. What should it connect to? 1? 0? There needs to be some logic around which nodes are created. I guess the bitmask provides that logic: 000001 will be created first. But I think it will be easier for other people to understand if we just use a counter: start at 0; there should be 3 nodes, so go through and start 0, 1 and 2.
Does that make sense?
In that case, the isolator will mount the volumes (and create them if you haven't already done so) on each executor node using environment variables that it auto-creates. It should auto-create them for node 1 as follows:
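As a rough sketch of what building that environment map looks like: the `DVDI_VOLUME_*` variable names below follow my reading of the mesos-module-dvdi README (treat them as assumptions and check the module's spec), and the volume names mirror the `elasticsearch<id>data` / `elasticsearch<id>config` scheme used in this thread:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of populating the environment map the dvdi isolator consumes.
// The isolator reads DVDI_VOLUME_* variables from the executor's
// environment and mounts (creating if necessary) the named volumes.
public class ExecutorEnvSketch {
    static Map<String, String> populateEnvMapForMesos(int nodeId, String driver) {
        Map<String, String> env = new LinkedHashMap<>();
        // First volume: data
        env.put("DVDI_VOLUME_NAME", "elasticsearch" + nodeId + "data");
        env.put("DVDI_VOLUME_DRIVER", driver);
        // Second volume: config (numbered suffix for additional volumes)
        env.put("DVDI_VOLUME_NAME1", "elasticsearch" + nodeId + "config");
        env.put("DVDI_VOLUME_DRIVER1", driver);
        return env;
    }

    public static void main(String[] args) {
        populateEnvMapForMesos(1, "rexray")
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```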
When you flip it into non-Docker mode, the code that creates this lives in ExecutorEnvironmentVariables.java, starting with the function populateEnvMapForMesos() on line 71.
The bitmask does contain that logic, but a simple counter won't work. The problem is that you need to recycle the IDs when nodes fail. If you only have 3 ES nodes, it would be wrong on a node failure for the new node to come up with ID 4, because the volumes elasticsearch4data and elasticsearch4config don't exist. If node ID 2 failed, you would want the new executor instance to come up with ID 2.
You can still use the list or map approach; you just need to walk the list starting with 1 (or 0 if 0-based indexing is used) and take the first available index. I think we might be saying the same thing.
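Walking the set of in-use ids from the lowest index, as suggested, might look like this. This is a sketch of the suggested alternative, not the PR's current bitmask code, and the names are illustrative:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the "walk from the lowest id" allocation discussed above:
// no 32-id cap, and a recycled id is handed out before any new higher id.
public class IdWalk {
    // Returns the lowest id >= 1 that is not currently in use.
    static int nextFreeId(Set<Integer> inUse) {
        int id = 1;
        while (inUse.contains(id)) {
            id++;
        }
        return id;
    }

    public static void main(String[] args) {
        Set<Integer> running = new HashSet<>();
        running.add(1);
        running.add(3); // id 2 failed and was released
        int id = nextFreeId(running);
        System.out.println(id); // the replacement comes up as node 2
        running.add(id);
        System.out.println(nextFreeId(running)); // scaling up then adds node 4
    }
}
```

This keeps the recycling behaviour (a failed node's id is reused first) while removing the 32-node limit that the bitmask imposes.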