How to add custom jars to jupyter notebook? #154

Closed

drdwitte opened this issue Mar 13, 2016 · 6 comments

drdwitte commented Mar 13, 2016

Hi, I would like to run a Spark Streaming application in the all-spark notebook, consuming from Kafka. This requires spark-submit with custom parameters (--jars and the Kafka consumer jar). I do not completely understand how I could do this from the Jupyter notebook. Has any of you tried this? The alternative is to add it with --packages. Is this easier?

I just submitted the same question to Stack Overflow, if you'd like more details: http://stackoverflow.com/questions/35946868/adding-custom-jars-to-pyspark-in-jupyter-notebook/35971594#35971594

parente (Member) commented Mar 14, 2016

The technique described under "Using Spark packages" on the docker-stacks recipes page might work. Can you give that approach a shot?
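
For reference, a minimal sketch of that recipe-style approach. The spark-csv coordinates below are illustrative; the essential part is setting PYSPARK_SUBMIT_ARGS before the SparkContext is created:

import os
# Illustrative Spark package coordinates; any package from spark-packages.org
# can be pulled in the same way via --packages.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.databricks:spark-csv_2.10:1.3.0 pyspark-shell'
import pyspark
sc = pyspark.SparkContext()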

drdwitte (Author) commented Mar 29, 2016

@parente Unfortunately that doesn't seem to work (apologies for the long delay, I had another project in between).

I was having a look in $SPARK_HOME/bin, since PYSPARK_SUBMIT_ARGS can be set there. Basically I have to run the notebook with some custom flags to ./bin/spark-submit, but it's not entirely clear to me when this command gets executed. I assume it runs the moment you start a new notebook? In that case, adding jars or a Maven reference afterwards won't work.

I tried the following in my notebook:
os.environ['PYSPARK_SUBMIT_ARGS'] = '--master local[*] pyspark-shell --jars $SPARK_HOME/spark-streaming-kafka-assembly_2.10-1.6.1.jar'

and then created a context, but later on I got the error:
Spark Streaming's Kafka libraries not found in class path

Another try was to use the --packages flag:

os.environ['PYSPARK_SUBMIT_ARGS'] = '--master local[*] pyspark-shell --packages org.apache.spark:spark-streaming-kafka:1.6.0'

But that had no success either.

Would the right way be to modify the pyspark.cmd files in the $SPARK_HOME/bin directory?

drdwitte (Author) commented Mar 29, 2016

@parente It seems to be working now; maybe this is something interesting to add to the recipes...
I've been eliminating the possible problems. The spark-csv example you provided was actually working, but that package is present in the Spark packages repository while the Kafka consumer isn't. This implied that I had to add the Kafka consumer jar to the environment via the --jars flag.
As far as I can see, I have something working right now (note that pyspark-shell, placed at the end, is also very important!):

import os
# Point PYSPARK_SUBMIT_ARGS at the local Kafka assembly jar; pyspark-shell comes last.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars /home/jovyan/spark-streaming-kafka-assembly_2.10-1.6.1.jar pyspark-shell'
import pyspark
from pyspark.streaming.kafka import KafkaUtils
from pyspark.streaming import StreamingContext
sc = pyspark.SparkContext()
ssc = StreamingContext(sc, 1)  # 1-second batch interval
broker = "<my_broker_ip>"
directKafkaStream = KafkaUtils.createDirectStream(ssc, ["test1"], {"metadata.broker.list": broker})
directKafkaStream.pprint()
ssc.start()

And this seems to be working. In my previous try, $SPARK_HOME was probably not being resolved inside the string?
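
For comparison, a sketch of what the equivalent --packages form would presumably look like, assuming a Spark 1.6.x / Scala 2.10 setup (the assembly coordinates below are illustrative and would need to match the installed Spark version). The key detail is again that pyspark-shell is the last token:

import os
# Illustrative coordinates: pull the Kafka assembly from Maven Central instead of
# pointing --jars at a local file; pyspark-shell must still come last.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages org.apache.spark:spark-streaming-kafka-assembly_2.10:1.6.1 pyspark-shell'
)
import pyspark
sc = pyspark.SparkContext()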

parente (Member) commented May 5, 2016

It would be great to get this on the recipes page if you did not hit any further problems with the approach you took above.

drdwitte (Author) commented May 5, 2016

No issues so far, but I haven't worked on this further since then. I will probably resume this work in June; if I encounter any new issues, you'll be the first to know!

parente (Member) commented May 6, 2016

https://github.com/jupyter/docker-stacks/wiki/Docker-recipes#using-local-spark-jars has the recipe for posterity. Closing this one as resolved.

parente closed this May 6, 2016
rochaporto pushed a commit to rochaporto/docker-stacks that referenced this issue Jan 23, 2019
Replace access_token with token_response.