Factor out all common spark/hadoop properties #87

Open
gerashegalov opened this issue Aug 24, 2018 · 8 comments

@gerashegalov
Contributor

gerashegalov commented Aug 24, 2018

Problem
We currently have several largely overlapping spark.gradle files, especially in terms of Spark properties.

$ git ls-files | grep spark.gradle
gradle/spark.gradle
helloworld/gradle/spark.gradle
templates/simple/spark.gradle

Solution
Provide a way to have a single spark.gradle or at least a single spark-transmogrifai.conf file with common properties that is passed via --properties-file to Spark.

Alternatives

  • common properties file
  • refactored spark.gradle

Additional context
DRY
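
A minimal sketch of the "common properties file" alternative, assuming the shared file lives at gradle/spark-transmogrifai.conf under the repository root. The task name, main class, and jar path below are hypothetical placeholders, not taken from the existing spark.gradle; only the `--properties-file` flag of `spark-submit` comes from the proposal above.

```groovy
// Sketch only: file location, task name, main class and jar path are assumptions.
def sparkConf = new File(rootProject.projectDir, 'gradle/spark-transmogrifai.conf')

task sparkSubmitExample(type: Exec) {
    doFirst {
        // Fail early with a clear message if the shared properties file is missing.
        if (!sparkConf.isFile()) {
            throw new GradleException("Missing shared Spark properties file: ${sparkConf}")
        }
    }
    commandLine 'spark-submit',
        '--properties-file', sparkConf.absolutePath,
        '--class', 'com.example.Main',        // hypothetical main class
        "${buildDir}/libs/app.jar"            // hypothetical application jar
}
```

With something like this, each spark.gradle would only need to point at the one shared file instead of repeating the individual Spark properties.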

@py-ranoid
Contributor

Hey @gerashegalov
I'm guessing this issue is still a concern, since helloworld/gradle/spark.gradle and templates/simple/spark.gradle are duplicates of gradle/spark.gradle but are still tracked by git.

To keep a single spark.gradle file, can we simply change the spark.gradle paths in the build.gradle files to reference ../gradle/spark.gradle?

PS. I'm fairly new to the project. Pardon me if I'm missing something. 😅

@gerashegalov
Contributor Author

gerashegalov commented Mar 4, 2019

Hi @py-ranoid, thanks for looking into this issue. That makes sense; however, if possible we should strive to use absolute paths built from project properties (to avoid dealing with relative path attacks via symlinks, etc.).

@py-ranoid
Contributor

How about keeping only gradle/spark.gradle in the repository, but copying it to helloworld/gradle/ and templates/simple/ during installation?

@py-ranoid
Contributor

@tovbinm @gerashegalov Could you suggest a solution?

  1. Removing helloworld/gradle/spark.gradle and templates/simple/spark.gradle and referring to gradle/spark.gradle using relative paths
  2. Keeping only spark.gradle but copying it to helloworld/gradle/ and templates/simple/ during installation
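
For comparison, option 2 could be sketched roughly as follows in the root build.gradle. The task name is hypothetical, and the thread does not specify what the "installation" step is, so how this task would be hooked into it is left open.

```groovy
// Root build.gradle (sketch). Task name is hypothetical.
task copySharedSparkGradle {
    // Single source of truth, resolved from the root project directory.
    def source = new File(rootProject.projectDir, 'gradle/spark.gradle')
    // Destinations named in the discussion above.
    def targets = ['helloworld/gradle', 'templates/simple']
    doLast {
        targets.each { dir ->
            copy {
                from source
                into new File(rootProject.projectDir, dir)
            }
        }
    }
}
```

The copies would then be generated files and would no longer need to be tracked by git.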

@gerashegalov
Contributor Author

Since helloworld is a source-controlled directory rather than an installed one, option 1 seems better (and I think you should be able to construct an absolute path).

@py-ranoid
Contributor

@gerashegalov In that case, can I replace
apply from: 'gradle/spark.gradle'
with
apply from: "${rootProject.projectDir}/../gradle/spark.gradle"
in helloworld/build.gradle ?
Would this still be vulnerable to a relative path attack ?

Also, I noticed that the following are duplicates too.

  1. helloworld/gradle/scalastyle-config.xml and gradle/scalastyle-config.xml.
  2. helloworld/gradle/wrapper/* and gradle/wrapper/*

Would you suggest factoring these out as well ?

@py-ranoid
Contributor

@gerashegalov @tovbinm Thoughts?

@gerashegalov
Contributor Author

Hi @py-ranoid, I suggest you try it out, and don't hesitate to submit a PR. We can discuss it more concretely on the PR. It does not have to be perfect, just something to iterate on. The preference is to avoid '..' in the path.
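
One possible way to avoid '..' is sketched below, assuming helloworld is its own Gradle build located directly under the repository root. The variable names are illustrative, and the symlink guard is an optional extra that was not discussed in the thread.

```groovy
// helloworld/build.gradle (sketch; assumes helloworld sits directly under the repo root)
def repoRoot = rootProject.projectDir.parentFile
def sharedSpark = new File(repoRoot, 'gradle/spark.gradle')

// Optional guard: after resolving symlinks, the script must still be inside the repository.
def rootPrefix = repoRoot.canonicalPath + File.separator
if (!sharedSpark.canonicalPath.startsWith(rootPrefix)) {
    throw new GradleException("Refusing to apply script outside the repository: ${sharedSpark.canonicalPath}")
}

apply from: sharedSpark
```

Using `parentFile` yields an absolute path built from project properties without any '..' segment appearing in the build script.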
