Add a new option for an alternate mirror for spark binaries #104

Closed
rmessner wants to merge 3 commits

Contributor

@rmessner rmessner commented Apr 6, 2016

Fixes #101.

Owner

nchammas commented Apr 6, 2016

@rmessner thank you for tackling this. As you experienced, an option like this is useful when, for whatever reason, the default Spark packages on S3 that Flintrock uses are corrupt.

We will probably want to use this pattern here to also solve #71. As such, I'd like the option name and URL template to be consistent with what's proposed there. Is that OK with you?

@@ -1,6 +1,7 @@
services:
  spark:
    version: 1.6.0
    # preferred-mirror: # optional; default to 'https://s3.amazonaws.com/spark-related-packages/${file}'
Owner

As proposed in #71, I prefer {v} or even {version} instead of ${file}, since the latter's exact meaning to the user is not clear.

Contributor Author

OK, so to be clear, I will use these variables in the default value in the script:
spark_version
hadoop_distribution

Owner

If you're talking about the config template, I think we just need a {version} variable. It seems unnecessary to say spark_... inside the Spark config.

As for the Hadoop distribution, we currently don't support specifying that, unless you are also intending to tackle #88.
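
As a rough illustration of the {version} placeholder discussed in this thread, the sketch below expands a URL template with Python's built-in str.format. The helper name expand_download_source and the file name pattern are assumptions made for illustration only; they are not taken from this PR or from Flintrock's code. Only the S3 base URL comes from the config comment above.

```python
def expand_download_source(template: str, version: str) -> str:
    """Fill the {version} placeholder in a download URL template.

    Hypothetical helper for illustration; not part of Flintrock.
    """
    return template.format(version=version)


# Assumed template: the base URL is the one quoted in the config comment,
# while the file name pattern is an assumption based on how Spark binary
# packages are usually named.
template = (
    "https://s3.amazonaws.com/spark-related-packages/"
    "spark-{version}-bin-hadoop2.6.tgz"
)

url = expand_download_source(template, version="1.6.0")
print(url)
```

A named placeholder such as {version} tells the user exactly which value is substituted, which is the concern raised about ${file}.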

Contributor Author

rmessner commented Apr 6, 2016

Okay, I will use mirror instead of preferred_mirror, to be consistent across Flintrock.

Owner

nchammas commented Apr 6, 2016

#71 uses download-source, so I'd prefer that over mirror. I think it's clearer, though more verbose.

Contributor Author

rmessner commented Apr 6, 2016

Okay, I'll use download-source instead of mirror.

For the variable names, is that okay with you as well?

Owner

nchammas commented Apr 6, 2016

For the variable names, is that okay with you as well?

Not sure what you're referring to. I think the proposal laid out in #71 for Hadoop should give you a good template for what to name things.

If you're still not sure, it might be easier to just update the PR and I'll comment on the line items as necessary.
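
For reference, the naming discussed so far (a download-source option whose value contains a {version} placeholder) could be consumed roughly as in the sketch below. The function name resolve_spark_url, the dictionary layout, the default file name pattern, and the example mirror URL are all assumptions for illustration; this is not the code in this PR.

```python
# Assumed default: base URL from the config comment in this PR; the file
# name pattern is an assumption, not confirmed by the discussion.
DEFAULT_DOWNLOAD_SOURCE = (
    "https://s3.amazonaws.com/spark-related-packages/"
    "spark-{version}-bin-hadoop2.6.tgz"
)


def resolve_spark_url(spark_config: dict) -> str:
    """Return the download URL for the configured Spark version.

    Hypothetical helper: falls back to the default template when the
    optional 'download-source' key is missing or empty.
    """
    version = spark_config["version"]
    template = spark_config.get("download-source") or DEFAULT_DOWNLOAD_SOURCE
    return template.format(version=version)


# Config without the optional key uses the default source.
default_url = resolve_spark_url({"version": "1.6.0"})

# Config with a user-supplied source (example.org is a placeholder host).
custom_url = resolve_spark_url({
    "version": "1.6.0",
    "download-source": "https://mirror.example.org/spark/spark-{version}.tgz",
})

print(default_url)
print(custom_url)
```

Treating an empty value the same as a missing key means a commented-out or blank option in the config template still resolves to the default.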

@@ -1,6 +1,8 @@
services:
  spark:
    version: 1.6.0
    # distribution: # optional; default to '2.6'
Owner

Style nitpick: use two spaces before the #, and write "defaults" rather than "default".

Owner

Hmm, can we leave out the ability to specify distribution for now? I'm not sure about how best to name this option (e.g. there are non-Hadoop distributions like CDH, but we are assuming Hadoop) and, more importantly, I haven't fully considered the implications of supporting user-specified distributions.

@BenFradet
Contributor

@rmessner are you still working on this?
