Look for Spark's make-distribution.sh script in its new location (plus its current one) #93

Closed
wants to merge 2 commits into from

BenFradet
Contributor

fixes #91
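
For readers without the diff at hand, the lookup named in the title could look roughly like the sketch below. It is not the code from this PR: the function name `find_make_distribution` is hypothetical, and it assumes the new location is `dev/` inside the Spark checkout (as the commands later in this thread suggest) and the older location is the repository root.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR diff): print the path to
# make-distribution.sh inside a Spark checkout, preferring the newer dev/
# location and falling back to the repository root.
find_make_distribution() {
    local spark_dir="$1"
    local candidate
    for candidate in "$spark_dir/dev/make-distribution.sh" \
                     "$spark_dir/make-distribution.sh"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "make-distribution.sh not found under $spark_dir" >&2
    return 1
}
```

A caller would then run something like `"$(find_make_distribution /tmp/spark)" -Phadoop-2.6`, which behaves the same on Spark commits from before and after the move.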

@nchammas
Owner

Looks good to me.

Have you tested this both against a recent Spark commit, like 99b7187c2dce4c73829b9b32de80b02a053763cc, as well as an older Spark commit from before the move of make-distribution.sh, like f19228eed89cf8e22a07a7ef7f37a5f6f8a3d455?

@BenFradet
Contributor Author

I tested the script itself, but not as part of Flintrock. Should I?

@BenFradet
Contributor Author

While trying to test my change, I'm getting:

paramiko.ssh_exception.SSHException: not a valid EC private key file

despite having a properly formatted .pem file.

Do you have any idea what could be causing this?

@nchammas
Owner

Hmm, I've never seen that error before. It seems to be ultimately coming from EC2?

Are you able to use that same private key file to log into EC2 instances outside of Flintrock?

@BenFradet
Contributor Author

Found my problem: the user was misconfigured.

I tested the change against today's commit: apache/spark@4eace4d
and the latest in the 1.6 branch: apache/spark@db4795a

The build starts correctly. However, the Spark core project won't compile (possibly because I'm using t2.micro instances).

@nchammas
Owner

Yeah, to build Spark in a reasonable amount of time you'd need at least m3.xlarge instances.

Thanks for contributing this patch and testing it out! I'll merge this in.

@nchammas
Owner

Hmm, actually I'm having trouble getting this to work against the latest commit of Spark. I get this error:

<snipped>
+ VERSION='[ERROR] Re-run Maven using the -X switch to enable full debug logging.'

Do you get the same error? This may be a subtle change on Spark's side that we have to handle.
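
The `+ VERSION=...` trace line shows a Maven error message sitting where a version string belongs. That is what one would expect if the version is taken from the last line of Maven's output without checking it first. This is an assumption about the script, whose source is not quoted in this thread. Under that assumption, a guard like the hypothetical `extract_version` below would make the failure visible right away instead of letting the error text propagate:

```shell
#!/usr/bin/env bash
# Hypothetical guard (an assumption, not code from Spark or Flintrock):
# read Maven's output on stdin, take the last line that does not contain
# "INFO", and accept it only if it starts with a digit.
extract_version() {
    local version
    version="$(grep -v "INFO" | tail -n 1)"
    case "$version" in
        [0-9]*)
            printf '%s\n' "$version"
            ;;
        *)
            echo "could not determine version, got: $version" >&2
            return 1
            ;;
    esac
}
```

Maven's output would be piped in, for example `VERSION="$(/tmp/spark/build/mvn help:evaluate -Dexpression=project.version | extract_version)" || exit 1`.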

@BenFradet
Contributor Author

On m3.xlarge I get the same error as you, which I didn't get on t2.micro. Weird.
I'll keep investigating and keep you posted.

@BenFradet
Contributor Author

After cd-ing into the dev dir before calling make-distribution.sh, I get the following when trying to compile:

[info] Error occurred during initialization of VM
[info] java.lang.Error: Properties init: Could not determine current working directory.
[info] at java.lang.System.initProperties(Native Method)
[info] at java.lang.System.initializeSystemClass(System.java:1166)

@BenFradet
Contributor Author

Apparently the parallel build option (-T 1C) is causing it to fail.

The first Maven invocation, /tmp/spark/build/mvn help:evaluate -X -Dexpression=project.version -T 1C -Phadoop-2.6, fails with:

[ERROR] java.util.concurrent.ExecutionException: java.lang.NullPointerException
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.NullPointerException
at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder.multiThreadedProjectTaskSegmentBuild(MultiThreadedBuilder.java:170)
at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder.build(MultiThreadedBuilder.java:91)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:863)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:199)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder.multiThreadedProjectTaskSegmentBuild(MultiThreadedBuilder.java:166)
... 16 more
Caused by: java.lang.NullPointerException
at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:185)
at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:181)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Also, there is a warning regarding parallel execution which might be causing the failure:

[WARNING] *****************************************************************
[WARNING] * Your build is requesting parallel execution, but project *
[WARNING] * contains the following plugin(s) that have goals not marked *
[WARNING] * as @threadsafe to support parallel building. *
[WARNING] * While this /may/ work fine, please look for plugin updates *
[WARNING] * and/or request plugins be made thread-safe. *
[WARNING] * If reporting an issue, report it against the plugin in *
[WARNING] * question, not against maven-core *
[WARNING] *****************************************************************
[WARNING] The following goals are not marked @threadsafe in Spark Project Parent POM:
[WARNING] org.apache.maven.plugins:maven-help-plugin:2.2:evaluate
[WARNING] *****************************************************************

Are you ok with removing it?

@BenFradet
Contributor Author

Btw, I tested the script as a part of flintrock with the two previously mentioned commits and it worked in both cases (having removed -T 1C from the 2.0 script).

@nchammas
Owner

I think something else is going on.

If I clone Spark locally and run

./dev/make-distribution.sh -T 1C -Phadoop-2.6

it works fine against the latest commit. This smells like something related to the shell environment over SSH.

Interestingly, it seems that the commit that moved make-distribution.sh (0eea12a3d956b54bbbd73d21b296868852a04494) is not responsible for the problem we are seeing, since I was just able to launch a cluster at that commit.

I think a good next step would be to try to find the exact Spark commit that breaks this. I'll poke around more myself later this week to try to find it.

Sorry this turned into more than a simple change @BenFradet!

I'd really like to keep the -T 1C working since people building Spark during cluster launches will really benefit from the shorter build times. It can be the difference between a 30 minute build and a 10 minute or even shorter build, depending on how many cores your cluster instances have.

@BenFradet
Contributor Author

For me, apache/spark@4eace4d fails to build both locally and remotely with ./dev/make-distribution.sh -T 1C -Phadoop-2.6.

I'll investigate later commits.
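
With one commit reported as working (0eea12a3d956b54bbbd73d21b296868852a04494) and one reported as failing (4eace4d), the search for the breaking commit could be automated with `git bisect`. The sketch below is only an illustration: the helper name `find_first_bad_commit` is hypothetical, and it assumes the good commit is an ancestor of the bad one and that the check command exits non-zero exactly when the parallel build breaks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bisect <repo> between a known-good and a known-bad
# commit. The remaining arguments are the check command, which must exit 0
# on a good commit and non-zero (1-127, except 125) on a bad one.
find_first_bad_commit() {
    local repo="$1" good="$2" bad="$3"
    shift 3
    (
        cd "$repo" || exit 1
        git bisect start "$bad" "$good" >/dev/null || exit 1
        status=0
        git bisect run "$@" || status=$?
        git bisect reset >/dev/null 2>&1   # return to the original checkout
        exit "$status"
    )
}

# Example with the commits mentioned in this thread (not run here):
# find_first_bad_commit /tmp/spark 0eea12a3d956b54bbbd73d21b296868852a04494 4eace4d \
#     ./dev/make-distribution.sh -T 1C -Phadoop-2.6
```

Each bisect step runs a full Spark build, so this is slow, but it needs no manual checkouts.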

@nchammas
Owner

I found it. This is the commit that breaks -T 1C: apache/spark@6ca990f

Source PR: apache/spark#11178

@BenFradet
Contributor Author

Mmh interesting

@nchammas
Owner

Revisiting the error message you posted above @BenFradet, it looks like some project changes are interfering with the parallel build option, as you pointed out. 😞 That PR I linked to is probably just where this change was introduced.

So I now agree with your earlier suggestion: the simplest fix is to remove the -T 1C.
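
In the patch itself the flag can presumably just be deleted from the command line. Where the build arguments are assembled elsewhere and passed through, the same removal could be expressed as a filter. The helper below is a hypothetical sketch under that assumption, not code from Flintrock:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the given build arguments, one per line,
# with any "-T <threads>" pair removed and everything else untouched.
strip_parallel_flag() {
    while [ "$#" -gt 0 ]; do
        if [ "$1" = "-T" ]; then
            shift                      # drop -T itself
            if [ "$#" -gt 0 ]; then
                shift                  # drop its value, e.g. 1C
            fi
            continue
        fi
        printf '%s\n' "$1"
        shift
    done
}
```

For example, `strip_parallel_flag -T 1C -Phadoop-2.6` leaves only `-Phadoop-2.6`.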

@BenFradet
Contributor Author

ok, will do
