add --ec2-master-instance-type option to support different master instance type #166
Conversation
Hi @pragnesh and thank you for this PR. People have asked for this feature in the past, so I know there is some interest in it. 👍

When growing an existing cluster, Flintrock makes sure to read all possible configurations from the existing nodes so that newly added nodes are automatically configured the same way. This means that the slave instance type should automatically be selected based on the type of the existing slaves. The only case where the user should be able to specify the slave instance type when adding nodes to an existing cluster is when that cluster has just a master and no slaves.

That said, I have to wonder out loud if it's better to just leave this feature out of Flintrock. The benefit is clear but marginal, and it adds a little bit of complexity. Since Flintrock is hard to test (the valuable tests -- i.e. the acceptance tests -- are slow and cost money), every little bit of added complexity will slowly make it harder to maintain the project over time. If it were practical to implement automated tests to cover all the possible use cases, I wouldn't be as concerned.

What are your thoughts on that? I'm also curious if @serialx, @BenFradet, or anyone else watching the repo has any thoughts on the utility of this feature weighed against the long-term maintenance burden as the supported use cases inevitably grow over time.
I understand your concern. It is ok if you don't merge this feature.
Users with small clusters would be sensitive to the master's instance type, since the master accounts for a larger share of the total cost when there are fewer slaves. Also, for performance tests and experimentation -- which is Flintrock's main use case -- it would be tempting to use a smaller master instance type. I also understand the long-term maintenance burden. Looking at the patch, wouldn't
Yes, that's true as long as we continue to disallow clusters from being created with 0 slaves. That would simplify the logic and supported use cases a bit. 👍

This comes up against another problem, though, which is that the manifest currently has no mechanism for persisting provider-specific settings like instance type. We'd need to add that in. It shouldn't be a big deal, but it may be beyond the scope of what @pragnesh was hoping to do in this PR.

I believe this was the same deficiency in our manifest implementation that @SoyeonBaek-dev hit up against. I forget what her use case was exactly, but she also wanted to be able to persist some EC2-specific things in the manifest. Is that correct @SoyeonBaek-dev? Do you remember what exactly it was?
Thanks for this PR @pragnesh 👍 This is a feature my team members and I are definitely interested in.
Thanks for chiming in @dm-tran and @serialx. This kind of feedback from the community is valuable and gives me the confidence to steer Flintrock in the right direction.

OK, I think given the feedback here we want to include a feature like this. To do it in a way that's consistent with Flintrock's design, we'll need to refine the manifest concept a bit to allow for provider-specific settings like EC2 instance type.

@pragnesh - Would you like to take that on? No worries if not. I will likely work on that next, as soon as I fix this one bug I'm working on at the moment.
Yes, I can try to work on this. Let me go through the manifest concept first.
OK great. It's a little half-baked at the moment, unfortunately, but I can give guidance as needed. The idea is basically to add a section to the manifest that is dedicated to EC2-specific settings and then persist/load the EC2 slave instance type to/from the manifest.
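To make that concrete, here is a minimal sketch of persisting and loading an EC2-specific section of the manifest. The `ec2` key, field names, and helper functions below are hypothetical illustrations, not Flintrock's actual manifest schema:

```python
import json


def dump_manifest(path, services, ec2_settings):
    """Write a cluster manifest with a dedicated provider-specific section.

    `ec2_settings` might look like {'slave-instance-type': 'm5.large'}.
    The 'ec2' section name is a hypothetical placeholder.
    """
    manifest = {
        'services': services,
        'ec2': ec2_settings,
    }
    with open(path, 'w') as f:
        json.dump(manifest, f, indent=2)


def load_slave_instance_type(path):
    """Read the slave instance type back, e.g. when adding slaves later."""
    with open(path) as f:
        manifest = json.load(f)
    return manifest.get('ec2', {}).get('slave-instance-type')


dump_manifest('manifest.json', services=['spark', 'hdfs'],
              ec2_settings={'slave-instance-type': 'm5.large'})
print(load_slave_instance_type('manifest.json'))  # -> m5.large
```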
While using my change, I found an issue: having different instance types for the master and slaves causes problems with the HDFS cluster. HDFS fails to start if the nodes have different disk configurations; I am not sure why.
The best way to avoid complexity while still using the master instance as a worker is to update the slave_hosts template parameter in core.py to include master_host: `'slave_hosts': '\n'.join([self.master_host] + self.slave_hosts),`. I have tested this change and it works without any issue.
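For illustration, here is a self-contained sketch of that idea. `build_slave_hosts_parameter` is a hypothetical helper and the hostnames are made up, but the join mirrors the one-line change quoted above:

```python
def build_slave_hosts_parameter(master_host, slave_hosts):
    """Hypothetical helper: render the 'slave_hosts' template value so that
    the master also runs a Spark worker and an HDFS datanode."""
    return '\n'.join([master_host] + slave_hosts)


# Example rendering; the master appears first in the list of worker hosts.
print(build_slave_hosts_parameter(
    'ec2-master.example.com',
    ['ec2-slave-1.example.com', 'ec2-slave-2.example.com']))
```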
Thanks for reporting these additional details @pragnesh.
This is probably because on larger instances HDFS will use instance store storage, whereas on smaller instances there will be no instance store storage so HDFS will use the EBS root drive. There's probably something going wrong there, especially if you restart the cluster.
Are you saying that this fixes the issue with HDFS when the master and slave instance types are very different?
No, I am saying that another approach to using the master instance instead of keeping it idle is to simply add the master host to the list of slave hosts, so the Spark and HDFS clusters will run on all slave instances plus the master. This is just the one-line change I mentioned in my previous comment; it works without issue and does not add much complexity. There is no need to have different master and slave instance types to save cost.
This solution looks elegant and simple. It's fine for Spark slaves. For HDFS, it means that the namenode and one datanode will be deployed on the master: I'm a bit worried that it might overload the master (usually, the namenode is deployed on the master and datanodes are deployed on slaves).
If we are worried about overloading the master, we can create a flag for this and activate the behavior only when the flag is set to true.
A flag would be good. I think we should experiment with having the master be one of the slaves. Let's add this as an experimental flag.
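Here is a rough sketch of how such an experimental flag might gate the one-line change, assuming a click-style option (Flintrock's CLI is built on click). The option name and wiring below are made-up placeholders, not the name ultimately chosen:

```python
import click


@click.command()
@click.option('--ec2-use-master-as-slave/--no-ec2-use-master-as-slave',
              default=False,
              help='Experimental: also run Spark/HDFS slave daemons on the master.')
def launch(ec2_use_master_as_slave):
    """Hypothetical sketch: include the master in the slave host list only
    when the experimental flag is enabled."""
    master_host = 'ec2-master.example.com'              # placeholder
    slave_hosts = ['ec2-slave-1.example.com',
                   'ec2-slave-2.example.com']           # placeholders
    if ec2_use_master_as_slave:
        slave_hosts = [master_host] + slave_hosts
    click.echo('slave_hosts =\n' + '\n'.join(slave_hosts))


if __name__ == '__main__':
    launch()
```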
Per the conversation above, it sounds like we want a new PR that implements a different solution where the master is flagged to simply be one of the slaves. This saves money -- which is the original impetus for this PR -- while not adding much complexity. Perhaps we can call the new option

@pragnesh - I'm closing this PR, but feel free to open a new one along the lines discussed here.
This PR makes the following changes:
I tested this PR by...
launching a cluster with a different master instance type, as well as testing add-slaves by adding slaves on EC2. It is working fine.
I have made this change since having a Spark cluster with a different master instance type will save money and improve resource utilisation.