
No Module in PySpark #37

Closed
shsmonteiro opened this issue Apr 28, 2017 · 4 comments

Comments

@shsmonteiro

Is this module compatible with PySpark? Every time I try to import it, it fails. It works fine in Scala.

@wnagele
Contributor

wnagele commented May 2, 2017

What sort of error or log messages do you get when it fails?

@shsmonteiro
Author

Sorry, I thought I had attached the error messages:

Using Python version 2.7.12 (default, Nov 19 2016 06:48:10)
SparkSession available as 'spark'.

>>> import net.ripe.hadoop.pcap.io.PcapInputFormat
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named net.ripe.hadoop.pcap.io.PcapInputFormat

>>> import net.ripe.hadoop.pcap.io.CombinePcapInputFormat
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named net.ripe.hadoop.pcap.io.CombinePcapInputFormat

>>> from net.ripe.hadoop.pcap.io import CombinePcapInputFormat
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named net.ripe.hadoop.pcap.io

I have started pyspark with these options:

$SPARK_HOME/bin/pyspark --master spark://base:7077 \
  --executor-memory 1000m --executor-cores 4 \
  --conf "spark.locality.wait.node=0" \
  --conf "spark.executor.extraJavaOptions=-XX:MaxDirectMemorySize=1000m" \
  --conf "spark.default.parallelism=3" \
  --driver-memory=600m \
  --jars $SPARK_HOME/mysql-connector-java-5.1.41-bin.jar,$SPARK_HOME/hadoop-pcap-serde-1.1-jar-with-dependencies.jar,$SPARK_HOME/hadoop-pcap-lib-1.1.jar \
  --verbose --num-executors 10 \
  --driver-class-path=/usr/local/spark/*

Using the same command line with spark-shell works fine.

@wnagele
Contributor

wnagele commented May 2, 2017

That won't work: hadoop-pcap is a Java library, so it cannot be imported as a Python module. You will have to figure out how to use it through something like Py4J or similar. I haven't done that myself yet, so I cannot help with the details.
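
For illustration, a minimal sketch of how a Java InputFormat is typically driven from PySpark: the fully qualified class names are passed as plain strings to sc.newAPIHadoopFile and resolved on the JVM side, so no Python import of net.ripe.* is needed (or possible). The input path and the key/value classes below are assumptions, not confirmed against the hadoop-pcap sources, and this further assumes PcapInputFormat implements the new (mapreduce) Hadoop API:

    # Minimal sketch, not a confirmed recipe: class names are strings
    # resolved by the JVM, never Python imports.
    # Assumptions: PcapInputFormat implements the new (mapreduce) Hadoop
    # API, and LongWritable/ObjectWritable are its key/value types --
    # check the hadoop-pcap sources before relying on these.
    rdd = sc.newAPIHadoopFile(
        "hdfs:///pcaps/capture.pcap",               # hypothetical input path
        "net.ripe.hadoop.pcap.io.PcapInputFormat",
        "org.apache.hadoop.io.LongWritable",        # assumed key class
        "org.apache.hadoop.io.ObjectWritable",      # assumed value class
    )
    print(rdd.take(1))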

@wnagele wnagele closed this as completed May 2, 2017
@shsmonteiro
Author

I have tried using Py4J, and it imports the library.
The real issue is when I pass the class to newAPIHadoopFile: it throws a ClassNotFoundException.
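
One way to narrow that down is to ask the driver JVM directly whether it can load the class at all. A sketch using sc._jvm, PySpark's internal Py4J gateway (a private API, used here only for debugging), with the class name taken from the traceback above:

    # Class.forName throws ClassNotFoundException if the hadoop-pcap jar
    # is not on the driver's classpath.
    sc._jvm.java.lang.Class.forName("net.ripe.hadoop.pcap.io.PcapInputFormat")

If that call fails, one common cause is that --jars shipped the jar to the executors but the driver's own classpath (--driver-class-path) does not include it; if it succeeds, the executors' classpath is the next place to look, since newAPIHadoopFile also resolves the class there.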
