
add --ec2-master-instace-type option to support different master instance type #166

Closed

Conversation

@pragnesh (Contributor) commented on Dec 8, 2016

This PR makes the following changes:

  • Added an ec2-master-instance-type option: if it is specified, the master instance type is set to the passed value; otherwise Flintrock's original behaviour remains the same (see the sketch after this list).
  • Also updated the add-slaves behaviour to read the instance type from the configuration file, since it is assumed to be the same as the master instance type.
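For illustration, a minimal sketch of how such an option might be declared with click (which Flintrock's CLI uses); the default value and the wiring into the launch command are assumptions, not taken from the patch itself:

```python
import click

@click.command()
@click.option('--ec2-instance-type', default='m3.medium', show_default=True)
# Hypothetical new option: when given, the master uses this type; when
# omitted, the master falls back to --ec2-instance-type (original behaviour).
@click.option('--ec2-master-instance-type', default=None)
def launch(ec2_instance_type, ec2_master_instance_type):
    """Sketch only, not Flintrock's real launch command."""
    master_type = ec2_master_instance_type or ec2_instance_type
    click.echo("master: {m}, slaves: {s}".format(m=master_type, s=ec2_instance_type))

if __name__ == '__main__':
    launch()
```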

I tested this PR by...

launching clusters with different instance types, as well as testing add-slaves by adding slaves on EC2. It is working fine.

I have made this change because running a Spark cluster with a different master instance type can save money and improve resource utilisation.

@nchammas (Owner)

Hi @pragnesh and thank you for this PR. People have asked for this feature in the past, so I know there is some interest in it. 👍

When growing an existing cluster, Flintrock makes sure to read all possible configurations from the existing nodes so that newly added nodes are automatically configured the same way. This means that the slave instance type should automatically be selected based on the type of the existing slaves.

The only case where the user should be able to specify the slave instance type when adding nodes to an existing cluster is when that cluster has just a master and no slaves.

That said, I have to wonder out loud if it's better to just leave this feature out of Flintrock. The benefit is clear but marginal, and it adds a little bit of complexity. Since Flintrock is hard to test (the valuable tests -- i.e. the acceptance tests -- are slow and cost money), every little bit of added complexity will slowly make it harder to maintain the project over time. If it were practical to implement automated tests to cover all the possible use cases, I wouldn't be as concerned.

What are your thoughts on that?

I'm also curious if @serialx, @BenFradet, or anyone else watching the repo has any thoughts on the utility of this feature weighed against the long-term maintenance burden as the supported use cases inevitably grow over time.

@pragnesh (Contributor, Author)

I understand your concern. It is ok if you don't merge this feature.

@serialx (Contributor) commented on Dec 11, 2016

Users with small clusters would be sensitive to the master's instance type, since the master's share of the total cost is larger when there are fewer slaves. Also, for performance tests and experimentation -- which is Flintrock's main use case -- it would be tempting to use a smaller master instance type.

I also understand the long-term maintenance burden. Looking at the patch, wouldn't the add_slaves ec2_instance_type parameter be avoidable if we saved the ec2_instance_type variable in the cluster metadata?

@nchammas (Owner)

> I also understand the long-term maintenance burden. Looking at the patch, wouldn't the add_slaves ec2_instance_type parameter be avoidable if we saved the ec2_instance_type variable in the cluster metadata?

Yes, that's true as long as we continue to disallow clusters from being created with 0 slaves. That would simplify the logic and supported use cases a bit. 👍

This comes up against another problem, though, which is that the manifest currently has no mechanism for persisting provider-specific settings like instance type. We'd need to add that in. It shouldn't be a big deal, but it may be beyond the scope of what @pragnesh was hoping to do in this PR.

I believe this was the same deficiency in our manifest implementation that @SoyeonBaek-dev hit up against. I forget what her use case was exactly, but she also wanted to be able to persist some EC2-specific things in the manifest. Is that correct @SoyeonBaek-dev? Do you remember what exactly it was?

@dm-tran (Contributor) commented on Dec 12, 2016

Thanks for this PR @pragnesh 👍

This is a feature my team members and I are definitely interested in.
Our daily Spark jobs are launched automatically using the Spark REST API, which needs --deploy-mode cluster. With deploy mode "cluster", the driver is deployed on one of the slaves and the master is kind of "useless" (it doesn't need an instance as large as the slaves'). This would allow us to reduce costs.

@nchammas (Owner)

Thanks for chiming in @dm-tran and @serialx. This kind of feedback from the community is valuable and gives me the confidence to steer Flintrock in the right direction.

OK, I think given the feedback here we want to include a feature like this. To do it in a way that's consistent with Flintrock's design, we'll need to refine the manifest concept a bit to allow for provider-specific settings like EC2 instance type.

@pragnesh - Would you like to take that on? No worries if not. I will likely work on that next, as soon as I fix this one bug I'm working on at the moment.

@pragnesh (Contributor, Author)

Yes, I can try to work on this. Let me go through the manifest concept first.

@nchammas (Owner)

OK great. It's a little half-baked at the moment, unfortunately, but I can give guidance as needed. The idea is basically to add a section to the manifest that is dedicated to EC2-specific settings and then persist/load the EC2 slave instance-type to/from the manifest.
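A rough sketch of what persisting and loading such an EC2-specific section might look like; the key names and structure here are assumptions for illustration, not Flintrock's actual manifest format:

```python
import json

def add_ec2_settings(manifest, slave_instance_type):
    """Attach a provider-specific section to the manifest (keys are hypothetical)."""
    manifest['ec2'] = {'slave_instance_type': slave_instance_type}
    return manifest

def load_slave_instance_type(manifest_json):
    """Read the slave instance type back when growing an existing cluster."""
    return json.loads(manifest_json)['ec2']['slave_instance_type']

# Example round trip; the 'services' key is also just an assumption:
manifest = add_ec2_settings({'services': ['spark', 'hdfs']}, 'r3.xlarge')
assert load_slave_instance_type(json.dumps(manifest)) == 'r3.xlarge'
```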

@pragnesh (Contributor, Author)

While using my change, I found that having different instance types for the master and the slaves creates an issue with the HDFS cluster: HDFS fails to start when the disk configurations differ, and I am not sure why.

@pragnesh (Contributor, Author)

The best way to avoid complexity and still make use of the master instance as a worker is to update the slave_hosts template parameter in core.py to include master_host:

'slave_hosts': '\n'.join([self.master_host] + self.slave_hosts),

I have tested this change and it works without any issue.
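For context, a sketch of the kind of template mapping that line would slot into; only the 'slave_hosts' entry is the change proposed above, and the class and attribute names are stand-ins:

```python
class FlintrockCluster:  # hypothetical stand-in for the class in core.py
    def __init__(self, master_host, slave_hosts):
        self.master_host = master_host
        self.slave_hosts = slave_hosts

    def generate_template_mapping(self):
        return {
            'master_host': self.master_host,
            # The proposed one-line change: count the master among the slave
            # hosts so Spark and HDFS workers run on it instead of leaving it idle.
            'slave_hosts': '\n'.join([self.master_host] + self.slave_hosts),
        }
```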

@nchammas (Owner)

Thanks for reporting these additional details @pragnesh.

> HDFS fails to start when the disk configurations differ, and I am not sure why.

This is probably because on larger instances HDFS will use instance store volumes, whereas smaller instances have no instance store, so HDFS will use the EBS root drive. There's probably something going wrong there, especially if you restart the cluster.

> The best way to avoid complexity and still make use of the master instance as a worker is to update the slave_hosts template parameter in core.py to include master_host.

Are you saying that this fixes the issue with HDFS when the master and slave instance types are very different?

@pragnesh (Contributor, Author) commented on Jan 25, 2017

> Are you saying that this fixes the issue with HDFS when the master and slave instance types are very different?

No. I am describing another approach: instead of keeping the master instance idle, just add the master host to the list of slave hosts, so the Spark and HDFS clusters run on all of the slave instances plus the master. This is just the one-line change I mentioned in my previous comment; it works without issue and does not add much complexity. There is then no need for different master and slave instance types to save cost.

@nchammas (Owner)

Ah, I see. That makes sense.

@serialx / @dm-tran - What do you think?

Having the master be one of the slaves would take care of the problems related to mismatched HDFS storage caused by different instance sizes.

@dm-tran (Contributor) commented on Jan 25, 2017

This solution looks elegant and simple.

It's fine for Spark slaves. For HDFS, it means that the namenode and one datanode will be deployed on the master: I'm a bit worried that it might overload the master (usually, the namenode is deployed on the master and datanodes are deployed on slaves).

@pragnesh (Contributor, Author)

If we are worried about overloading the master, we can add a flag for this and activate the behaviour only when it is set to true.

@serialx (Contributor) commented on Mar 29, 2017

A flag would be good. I think we should experiment with having the master be one of the slaves. Let's add it as an experimental flag.

@nchammas (Owner)

Per the conversation above, it sounds like we want a new PR that implements a different solution where the master is flagged to simply be one of the slaves. This saves money -- which is the original impetus for this PR -- while not adding much complexity. Perhaps we can call the new option --colocate-master/--no-colocate-master?
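A minimal sketch of how the proposed flag could tie into the one-line slave_hosts change discussed above, assuming Flintrock's click-based CLI; the flag name comes from this comment and everything else (command, hostnames) is illustrative:

```python
import click

@click.command()
# Proposed experimental flag; off by default so the existing behaviour
# (idle master) is preserved unless explicitly requested.
@click.option('--colocate-master/--no-colocate-master', default=False)
def launch(colocate_master):
    """Sketch only, not Flintrock's real launch command."""
    master_host = 'ec2-master.example.com'
    slave_hosts = ['ec2-slave-1.example.com', 'ec2-slave-2.example.com']
    # Include the master among the worker hosts only when the flag is on:
    worker_hosts = ([master_host] + slave_hosts) if colocate_master else slave_hosts
    click.echo('\n'.join(worker_hosts))

if __name__ == '__main__':
    launch()
```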

@pragnesh - I'm closing this PR, but feel free to open a new one along the lines discussed here.
