compilation CH06 #27

Closed
christopher5106 opened this issue May 28, 2015 · 2 comments

@christopher5106

Hi,

I'm trying chapter 6 and I have two questions.
First, I build and submit the job like this:
cd aas
mvn install
cd ch06-lsa
mvn package
cd ..
./spark/bin/spark-submit --class com.cloudera.datascience.lsa.RunLSA aas/ch06-lsa/target/ch06-lsa-1.0.0.jar

but I get an error:

15/05/28 18:07:33 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
Exception in thread "main" java.lang.NoClassDefFoundError: edu/umd/cloud9/collection/wikipedia/WikipediaPage
at com.cloudera.datascience.lsa.RunLSA$.preprocessing(RunLSA.scala:54)
at com.cloudera.datascience.lsa.RunLSA$.main(RunLSA.scala:33)
at com.cloudera.datascience.lsa.RunLSA.main(RunLSA.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: edu.umd.cloud9.collection.wikipedia.WikipediaPage
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

I'm launching from the master node of an EC2 Spark install (https://spark.apache.org/docs/latest/ec2-scripts.html).

Second, how do I launch the main function of RunLSA from the Spark shell? I start it with:

./spark/bin/spark-shell --jars aas/ch06-lsa/target/ch06-lsa-1.0.0.jar

and then I have been trying:

import com.cloudera.datascience.lsa.RunLSA
RunLSA.main(Array("100","1000","0.1"))

but I get this error:

15/05/28 18:14:21 WARN spark.SparkContext: Multiple running SparkContexts detected in the same JVM!
org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:80)

Just looking for your best practice.

Thanks a lot.

@srowen
Collaborator

srowen commented May 28, 2015

That jar only has the classes from ch06; you need that plus all of its dependencies. That is, you need an assembly jar. Use ch06-lsa-1.0.0-jar-with-dependencies.jar instead.
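
For anyone hitting the same NoClassDefFoundError, here is a rough sketch of how that advice might look on the command line. It assumes the assembly jar is written to the same target directory as the plain jar from the original post (the thread only gives the file name, so the directory is a guess), and the variable names are just for illustration. It first checks that the class from the stack trace is actually inside the jar, then submits.

```shell
#!/bin/sh
# Jar file name and main class come from the thread; the target/ directory
# is an assumption based on where the plain jar was found.
ASSEMBLY_JAR="aas/ch06-lsa/target/ch06-lsa-1.0.0-jar-with-dependencies.jar"
MAIN_CLASS="com.cloudera.datascience.lsa.RunLSA"
# Class reported missing in the stack trace, written as a path inside the jar.
MISSING_CLASS="edu/umd/cloud9/collection/wikipedia/WikipediaPage.class"

if [ ! -f "$ASSEMBLY_JAR" ]; then
  # Nothing to submit yet; the jar has to be built first.
  echo "assembly jar not found: $ASSEMBLY_JAR" >&2
elif ! jar tf "$ASSEMBLY_JAR" | grep -qF "$MISSING_CLASS"; then
  # The jar exists but does not bundle the dependency.
  echo "$MISSING_CLASS is not inside $ASSEMBLY_JAR" >&2
else
  ./spark/bin/spark-submit --class "$MAIN_CLASS" "$ASSEMBLY_JAR"
fi
```

Running the same `jar tf ... | grep` check against the plain ch06-lsa-1.0.0.jar should come back empty, which is consistent with the error in the original post.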

@srowen srowen closed this as completed May 28, 2015
@christopher5106
Author

Great, thanks, everything works!
