
No Module in PySpark #37

Closed
shsmonteiro opened this issue Apr 28, 2017 · 4 comments

Comments

@shsmonteiro

Is this module compatible with PySpark? Every time I try to import it, it fails. It works fine in Scala.

@wnagele
Contributor

wnagele commented May 2, 2017

What sort of error or log messages do you get when it fails?

@shsmonteiro
Author

Sorry, I thought I had attached the error messages:

Using Python version 2.7.12 (default, Nov 19 2016 06:48:10)
SparkSession available as 'spark'.

>>> import net.ripe.hadoop.pcap.io.PcapInputFormat
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named net.ripe.hadoop.pcap.io.PcapInputFormat

>>> import net.ripe.hadoop.pcap.io.CombinePcapInputFormat
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named net.ripe.hadoop.pcap.io.CombinePcapInputFormat

>>> from net.ripe.hadoop.pcap.io import CombinePcapInputFormat
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named net.ripe.hadoop.pcap.io

I have started pyspark with these options:

$SPARK_HOME/bin/pyspark --master spark://base:7077 \
  --executor-memory 1000m --executor-cores 4 \
  --conf "spark.locality.wait.node=0" \
  --conf "spark.executor.extraJavaOptions=-XX:MaxDirectMemorySize=1000m" \
  --conf "spark.default.parallelism=3" \
  --driver-memory=600m \
  --jars $SPARK_HOME/mysql-connector-java-5.1.41-bin.jar,$SPARK_HOME/hadoop-pcap-serde-1.1-jar-with-dependencies.jar,$SPARK_HOME/hadoop-pcap-lib-1.1.jar \
  --verbose --num-executors 10 \
  --driver-class-path=/usr/local/spark/*

Using the same command line with spark-shell works fine.

@wnagele
Contributor

wnagele commented May 2, 2017

That won't work: hadoop-pcap is a Java library, so it cannot be imported as a Python module. You will have to figure out how to use it through something like Py4J or similar. I haven't done that myself yet, so I cannot help with the details.
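
For illustration, a minimal sketch of how a Java InputFormat is typically driven from PySpark: the fully qualified class names are passed as plain strings to sc.newAPIHadoopFile and resolved on the JVM side, so no Python import of net.ripe.* is needed (or possible). The input path and the key/value classes below are assumptions, not confirmed against the hadoop-pcap sources, and this further assumes PcapInputFormat implements the new (mapreduce) Hadoop API:

    # Minimal sketch, not a confirmed recipe: class names are strings
    # resolved by the JVM, never Python imports.
    # Assumptions: PcapInputFormat implements the new (mapreduce) Hadoop
    # API, and LongWritable/ObjectWritable are its key/value types --
    # check the hadoop-pcap sources before relying on these.
    rdd = sc.newAPIHadoopFile(
        "hdfs:///pcaps/capture.pcap",               # hypothetical input path
        "net.ripe.hadoop.pcap.io.PcapInputFormat",
        "org.apache.hadoop.io.LongWritable",        # assumed key class
        "org.apache.hadoop.io.ObjectWritable",      # assumed value class
    )
    print(rdd.take(1))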

@wnagele wnagele closed this as completed May 2, 2017
@shsmonteiro
Author

I have tried using Py4J, and it imports the library.
The real issue is when I pass the class to newAPIHadoopFile: it throws a ClassNotFoundException.
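
One way to narrow that down is to ask the driver JVM directly whether it can load the class at all. A sketch using sc._jvm, PySpark's internal Py4J gateway (a private API, used here only for debugging), with the class name taken from the traceback above:

    # Class.forName throws ClassNotFoundException if the hadoop-pcap jar
    # is not on the driver's classpath.
    sc._jvm.java.lang.Class.forName("net.ripe.hadoop.pcap.io.PcapInputFormat")

If that call fails, one common cause is that --jars shipped the jar to the executors but the driver's own classpath (--driver-class-path) does not include it; if it succeeds, the executors' classpath is the next place to look, since newAPIHadoopFile also resolves the class there.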
