
add --ec2-master-instace-type option to support different master instance type #166

Closed

Conversation

@pragnesh (Contributor) commented on Dec 8, 2016

This PR makes the following changes:

  • Added an ec2-master-instance-type option: if it is specified, the master instance type is set to the passed value; otherwise Flintrock's original behaviour remains the same (see the sketch after this list).
  • Also updated the add-slaves behaviour to read the instance type from the configuration file, since it is assumed to be the same as the master instance type.
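For illustration, a minimal sketch of how such an option might be declared with click (which Flintrock's CLI uses); the default value and the wiring into the launch command are assumptions, not taken from the patch itself:

```python
import click

@click.command()
@click.option('--ec2-instance-type', default='m3.medium', show_default=True)
# Hypothetical new option: when given, the master uses this type; when
# omitted, the master falls back to --ec2-instance-type (original behaviour).
@click.option('--ec2-master-instance-type', default=None)
def launch(ec2_instance_type, ec2_master_instance_type):
    """Sketch only, not Flintrock's real launch command."""
    master_type = ec2_master_instance_type or ec2_instance_type
    click.echo("master: {m}, slaves: {s}".format(m=master_type, s=ec2_instance_type))

if __name__ == '__main__':
    launch()
```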

I tested this PR by...

launching clusters with different instance types, as well as testing add-slaves by adding slaves on EC2. It is working fine.

I have made this change because running a Spark cluster with a different master instance type can save money and improve resource utilisation.

@nchammas (Owner)

Hi @pragnesh and thank you for this PR. People have asked for this feature in the past, so I know there is some interest in it. 👍

When growing an existing cluster, Flintrock makes sure to read all possible configurations from the existing nodes so that newly added nodes are automatically configured the same way. This means that the slave instance type should automatically be selected based on the type of the existing slaves.

The only case where the user should be able to specify the slave instance type when adding nodes to an existing cluster is when that cluster has just a master and no slaves.

That said, I have to wonder out loud if it's better to just leave this feature out of Flintrock. The benefit is clear but marginal, and it adds a little bit of complexity. Since Flintrock is hard to test (the valuable tests -- i.e. the acceptance tests -- are slow and cost money), every little bit of added complexity will slowly make it harder to maintain the project over time. If it were practical to implement automated tests to cover all the possible use cases, I wouldn't be as concerned.

What are your thoughts on that?

I'm also curious if @serialx, @BenFradet, or anyone else watching the repo has any thoughts on the utility of this feature weighed against the long-term maintenance burden as the supported use cases inevitably grow over time.

@pragnesh (Contributor, Author)

I understand your concern. It is ok if you don't merge this feature.

@serialx (Contributor) commented on Dec 11, 2016

Users with small clusters would be sensitive to the master's instance type, since the master's share of the total cost is larger when there are fewer slaves. Also, for performance tests and experimentation -- which is Flintrock's main use case -- it would be tempting to use a smaller master instance type.

I also understand the long-term maintenance burden. Looking at the patch, wouldn't the add_slaves ec2_instance_type parameter be avoidable if we saved the ec2_instance_type variable in the cluster metadata?

@nchammas (Owner)

> I also understand the long-term maintenance burden. Looking at the patch, wouldn't the add_slaves ec2_instance_type parameter be avoidable if we saved the ec2_instance_type variable in the cluster metadata?

Yes, that's true as long as we continue to disallow clusters from being created with 0 slaves. That would simplify the logic and supported use cases a bit. 👍

This comes up against another problem, though, which is that the manifest currently has no mechanism for persisting provider-specific settings like instance type. We'd need to add that in. It shouldn't be a big deal, but it may be beyond the scope of what @pragnesh was hoping to do in this PR.

I believe this was the same deficiency in our manifest implementation that @SoyeonBaek-dev hit up against. I forget what her use case was exactly, but she also wanted to be able to persist some EC2-specific things in the manifest. Is that correct @SoyeonBaek-dev? Do you remember what exactly it was?

@dm-tran (Contributor) commented on Dec 12, 2016

Thanks for this PR @pragnesh 👍

This is a feature my team members and I are definitely interested in.
Our daily Spark jobs are launched automatically using the Spark REST API, which needs --deploy-mode cluster. With deploy mode "cluster", the driver is deployed on one of the slaves and the master is kind of "useless" (it doesn't need an instance as large as the slaves'). This would allow us to reduce costs.

@nchammas (Owner)

Thanks for chiming in @dm-tran and @serialx. This kind of feedback from the community is valuable and gives me the confidence to steer Flintrock in the right direction.

OK, I think given the feedback here we want to include a feature like this. To do it in a way that's consistent with Flintrock's design, we'll need to refine the manifest concept a bit to allow for provider-specific settings like EC2 instance type.

@pragnesh - Would you like to take that on? No worries if not. I will likely work on that next, as soon as I fix this one bug I'm working on at the moment.

@pragnesh (Contributor, Author)

Yes, I can try to work on this. Let me go through the manifest concept first.

@nchammas (Owner)

OK great. It's a little half-baked at the moment, unfortunately, but I can give guidance as needed. The idea is basically to add a section to the manifest that is dedicated to EC2-specific settings and then persist/load the EC2 slave instance-type to/from the manifest.
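A rough sketch of what persisting and loading such an EC2-specific section might look like; the key names and structure here are assumptions for illustration, not Flintrock's actual manifest format:

```python
import json

def add_ec2_settings(manifest, slave_instance_type):
    """Attach a provider-specific section to the manifest (keys are hypothetical)."""
    manifest['ec2'] = {'slave_instance_type': slave_instance_type}
    return manifest

def load_slave_instance_type(manifest_json):
    """Read the slave instance type back when growing an existing cluster."""
    return json.loads(manifest_json)['ec2']['slave_instance_type']

# Example round trip; the 'services' key is also just an assumption:
manifest = add_ec2_settings({'services': ['spark', 'hdfs']}, 'r3.xlarge')
assert load_slave_instance_type(json.dumps(manifest)) == 'r3.xlarge'
```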

@pragnesh (Contributor, Author)

While using my change, I found that having different instance types for the master and the slaves creates an issue with the HDFS cluster: HDFS fails to start when the disk configurations differ, and I am not sure why.

@pragnesh (Contributor, Author)

The best way to avoid complexity and still make use of the master instance as a worker is to update the slave_hosts template parameter in core.py to include master_host:

'slave_hosts': '\n'.join([self.master_host] + self.slave_hosts),

I have tested this change and it works without any issue.
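For context, a sketch of the kind of template mapping that line would slot into; only the 'slave_hosts' entry is the change proposed above, and the class and attribute names are stand-ins:

```python
class FlintrockCluster:  # hypothetical stand-in for the class in core.py
    def __init__(self, master_host, slave_hosts):
        self.master_host = master_host
        self.slave_hosts = slave_hosts

    def generate_template_mapping(self):
        return {
            'master_host': self.master_host,
            # The proposed one-line change: count the master among the slave
            # hosts so Spark and HDFS workers run on it instead of leaving it idle.
            'slave_hosts': '\n'.join([self.master_host] + self.slave_hosts),
        }
```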

@nchammas (Owner)

Thanks for reporting these additional details @pragnesh.

> HDFS fails to start when the disk configurations differ, and I am not sure why.

This is probably because on larger instances HDFS will use instance store volumes, whereas smaller instances have no instance store, so HDFS will use the EBS root drive. There's probably something going wrong there, especially if you restart the cluster.

> The best way to avoid complexity and still make use of the master instance as a worker is to update the slave_hosts template parameter in core.py to include master_host.

Are you saying that this fixes the issue with HDFS when the master and slave instance types are very different?

@pragnesh (Contributor, Author) commented on Jan 25, 2017

> Are you saying that this fixes the issue with HDFS when the master and slave instance types are very different?

No. I am describing another approach: instead of keeping the master instance idle, just add the master host to the list of slave hosts, so the Spark and HDFS clusters run on all of the slave instances plus the master. This is just the one-line change I mentioned in my previous comment; it works without issue and does not add much complexity. There is then no need for different master and slave instance types to save cost.

@nchammas (Owner)

Ah, I see. That makes sense.

@serialx / @dm-tran - What do you think?

Having the master be one of the slaves would take care of the problems related to mismatched HDFS storage caused by different instance sizes.

@dm-tran (Contributor) commented on Jan 25, 2017

This solution looks elegant and simple.

It's fine for Spark slaves. For HDFS, it means that the namenode and one datanode will be deployed on the master: I'm a bit worried that it might overload the master (usually, the namenode is deployed on the master and datanodes are deployed on slaves).

@pragnesh (Contributor, Author)

If we are worried about overloading the master, we can add a flag for this and activate the behaviour only when it is set to true.

@serialx (Contributor) commented on Mar 29, 2017

A flag would be good. I think we should experiment with having the master be one of the slaves. Let's add it as an experimental flag.

@nchammas (Owner)

Per the conversation above, it sounds like we want a new PR that implements a different solution where the master is flagged to simply be one of the slaves. This saves money -- which is the original impetus for this PR -- while not adding much complexity. Perhaps we can call the new option --colocate-master/--no-colocate-master?
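A minimal sketch of how the proposed flag could tie into the one-line slave_hosts change discussed above, assuming Flintrock's click-based CLI; the flag name comes from this comment and everything else (command, hostnames) is illustrative:

```python
import click

@click.command()
# Proposed experimental flag; off by default so the existing behaviour
# (idle master) is preserved unless explicitly requested.
@click.option('--colocate-master/--no-colocate-master', default=False)
def launch(colocate_master):
    """Sketch only, not Flintrock's real launch command."""
    master_host = 'ec2-master.example.com'
    slave_hosts = ['ec2-slave-1.example.com', 'ec2-slave-2.example.com']
    # Include the master among the worker hosts only when the flag is on:
    worker_hosts = ([master_host] + slave_hosts) if colocate_master else slave_hosts
    click.echo('\n'.join(worker_hosts))

if __name__ == '__main__':
    launch()
```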

@pragnesh - I'm closing this PR, but feel free to open a new one along the lines discussed here.
