
Unrecognized alias: '--profile=xxx', it will probably have no effect. #309

Closed
k-dahl opened this issue Aug 18, 2015 · 34 comments · Fixed by #310

Comments

@k-dahl commented Aug 18, 2015

The --profile option in jupyter appears to be ignored now when it's run with the notebook command. The usage for it still lists:

Examples

ipython notebook                       # start the notebook
ipython notebook --profile=sympy       # use the sympy profile
ipython notebook --certfile=mycert.pem # use SSL/TLS certificate
@minrk (Member) commented Aug 18, 2015

Sorry, missed that in the examples. Fixed by #310.

@k-dahl (Author) commented Aug 19, 2015

Out of curiosity, and to possibly clear up some confusion that I have seen on stackoverflow and such, how would one now specify startup initialization type options for Jupyter?

A specific scenario I am thinking of is with pySpark.

@Carreau (Member) commented Aug 19, 2015

See the mailing list discussion:
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/jupyter/7q02jjksvFU

But you shouldn't need a profile for that. PySpark can just be a kernel, if you really want it to be.
A kernel is basically just a way of launching a process, so it can use a different environment, or a different location.
Some national labs, for example, name kernels after the physical machine the process will run on.
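
To make that concrete: a kernelspec is just a small kernel.json file in a directory that jupyter kernelspec list will show you. A hypothetical PySpark kernel could look something like the sketch below (the display name, the Spark path, and the submit arguments are placeholders to adapt; the point is only that a kernel can carry its own environment):

{
  "display_name": "PySpark (example)",
  "language": "python",
  "argv": ["python", "-m", "ipykernel", "-f", "{connection_file}"],
  "env": {
    "SPARK_HOME": "/usr/local/opt/apache-spark/libexec",
    "PYSPARK_SUBMIT_ARGS": "--master local[2] pyspark-shell"
  }
}

The env block is what replaces the old per-profile environment tricks: each kernel can point at a different Spark install, Python environment, or machine.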

@vherasme commented:

Hi. I am trying to create a profile for pyspark too. Could you please tell me how to proceed? Thanks

@Carreau (Member) commented Aug 25, 2015

There is no notion of profiles in Jupyter or for the notebook.

It's roughly like asking to dual-boot a computer because you want to use vim and emacs,
and getting two hard drives just to set your $EDITOR differently.

As stated in the mailing list thread, you can if you like; it would be something like

$ JUPYTER_CONFIG_DIR=~/jupyter_pyspark_foo jupyter notebook

It should auto-create the needed files in ~/jupyter_pyspark_foo, but it is likely not what you want.

You most likely just want a separate kernel, or to import pySpark as a library. Still, without knowing more about what you want to do, it's hard to give you an answer...
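
If you do try the separate config dir, a quick way to check that the override took effect (a small sketch, assuming a POSIX shell and jupyter_core's jupyter --paths command) is:

$ JUPYTER_CONFIG_DIR=~/jupyter_pyspark_foo jupyter --paths

which should list ~/jupyter_pyspark_foo first in the config section.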

@vherasme commented:

I would like to use pySpark in the IPython notebook, either by calling it as a library or by creating a profile/kernel, etc.


@Carreau (Member) commented Aug 25, 2015

OK, here is what I just did during the last half hour, for me on OS X:

  • Install apache-spark ($ brew install apache-spark)
  • install findspark (clone https://github.com/minrk/findspark, cd findspark, then pip install -e .)
  • install Java (from here)
  • fire up a notebook (jupyter notebook)

enter the following:

import findspark
import os
findspark.init()

import pyspark
sc = pyspark.SparkContext()
lines = sc.textFile(os.path.expanduser('~/dev/ipython/setup.py'))
lines_nonempty = lines.filter( lambda x: len(x) > 0 )
lines_nonempty.count()

execute:

221

Yayyyyy !

@Carreau (Member) commented Aug 25, 2015

(Note: installing/downloading Java took 20 minutes.)

@vherasme commented:

After running:
import findspark
import os
findspark.init()

import pyspark
sc = pyspark.SparkContext()

I get this error:

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-1-0e2dcc62fef1> in <module>()
      4 
      5 import pyspark
----> 6 sc = pyspark.SparkContext()

/Users/victor/Downloads/spark-1.4.1/python/pyspark/context.pyc in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    108         """
    109         self._callsite = first_spark_call() or CallSite(None, None, None)
--> 110         SparkContext._ensure_initialized(self, gateway=gateway)
    111         try:
    112             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

/Users/victor/Downloads/spark-1.4.1/python/pyspark/context.pyc in _ensure_initialized(cls, instance, gateway)
    227         with SparkContext._lock:
    228             if not SparkContext._gateway:
--> 229                 SparkContext._gateway = gateway or launch_gateway()
    230                 SparkContext._jvm = SparkContext._gateway.jvm
    231 

/Users/victor/Downloads/spark-1.4.1/python/pyspark/java_gateway.pyc in launch_gateway()
     87                 callback_socket.close()
     88         if gateway_port is None:
---> 89             raise Exception("Java gateway process exited before sending the driver its port number")
     90 
     91         # In Windows, ensure the Java child processes do not linger after Python has exited.

Exception: Java gateway process exited before sending the driver its port number

@Carreau (Member) commented Aug 26, 2015

You should get this error if you installed the wrong Java (the 60 MB download instead of the 200+ MB one).

@vherasme commented:

I actually got jdk-8u60-macosx-x64.dmg, which is 238.1 MB. Maybe I should restart the machine.

@Carreau (Member) commented Aug 26, 2015

Hum, I did not have to restart, IIRC.

@Carreau (Member) commented Aug 26, 2015

Does the following work?

$ java -version
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)

Python 2.7 or 3?

@vherasme commented:

It works now:

In [2]: sc
Out[2]: <pyspark.context.SparkContext at 0x106296cd0>

Thanks a lot for your help. I've spent a loooong time trying to fix this

@Carreau (Member) commented Aug 26, 2015

🍰 🍸 🎉 !

Happy Sparking !

@ajschumacher commented:

@vherasme What did you do to make it work in the end? Thanks!

@vherasme commented:

I followed the steps @Carreau recommends above:

  1. Install apache-spark ($ brew install apache-spark). In my case I had Spark installed already.
  2. Install findspark (clone https://github.com/minrk/findspark, cd findspark, then pip install .).
  3. Install Java (from here: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html). This must be Java 1.8.0_60.
  4. Fire up a notebook (jupyter notebook).

enter the following:

import findspark
findspark.init()
import pyspark
sc = pyspark.SparkContext()

I also had these two in .bash_profile:

export SPARK_HOME="/Users/victor/Downloads/spark-1.4.1"
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"


@eulerreich commented:

I hope this change of policy about profiles is mentioned (more explicitly?) in the docs. I tried to put up a server on an Amazon EC2 image but following the instructions on the ipython docs didn't work because with ipython==4.0 it no longer accepted the --profile option.

@Carreau (Member) commented Aug 31, 2015

> I hope this change of policy about profiles is mentioned (more explicitly?) in the docs. I tried to put up a server on an Amazon EC2 image but following the instructions on the ipython docs didn't work because with ipython==4.0 it no longer accepted the --profile option.

IPython 4.0 still has profiles; you are just mistaking the Notebook for IPython.

If you want a different configuration for the notebook, you need to set the Jupyter config dir environment variable; if you want a profile for your kernel, you can set it in your kernelspec.

@eulerreich commented:

I tried both ipython notebook --profile=xxx and jupyter notebook --profile=xxx and both give the same error. (While the --help output for both still makes the same erroneous suggestion that --profile works.)

I think a separate tutorial for setting up a Jupyter remote server would help, since currently people will just go look at the ipython docs and be confused like I was. At least note in the ipython docs that this is now different for Jupyter.

@Carreau (Member) commented Sep 1, 2015

How did you get the help output to give you hints about profiles?

And again, --profile does not work with the notebook application, only with ipython / the ipython kernel.
If you want a profile for your kernel you need to modify your kernelspec. Use jupyter kernelspec list --debug to see where your kernelspecs are.
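
For example (a sketch, not an official recipe: the profile name nbserver and the exact argv are assumptions, so start from whatever your existing kernel.json contains), a kernelspec that launches the IPython kernel with a profile could look like:

{
  "display_name": "Python (nbserver profile)",
  "language": "python",
  "argv": ["python", "-m", "ipykernel", "-f", "{connection_file}", "--profile=nbserver"]
}

The profile then applies to the kernel (startup files, extensions, and so on), while notebook-server options such as certificates and ports live in jupyter_notebook_config.py instead.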

@eulerreich commented:

I did ipython notebook --help and jupyter notebook --help and both gave the same thing. The examples OP posted are listed at the end of the help output.

Right, I get it now that --profile no longer works with the notebook, but I'm saying the docs should be made clearer so that in the future, people switching from older versions of IPython don't have to look far to get an answer.

For example, if I google 'set up remote server jupyter' the first result is http://ipython.org/ipython-doc/1/interactive/public_server.html, and nowhere in there does it say that --profile no longer works for ipython/jupyter 4. Indeed, one of the instructions is

"You can then start the notebook and access it later by pointing your browser to https://your.host.com:9999 with ipython notebook --profile=nbserver."

Other top results are about jupyter hub, which requires python3. I don't think I saw a single mention that the --profile option no longer works for ipython/jupyter 4 among them.

Maybe you guys wrote a doc, but Google is just being dumb for the moment. Nevertheless, I never found it, and I searched for a long time before finding this issue posted here.

@Carreau (Member) commented Sep 1, 2015

> I did ipython notebook --help and jupyter notebook --help and both gave the same thing. The examples OP posted are listed at the end of the help output.

O_o do you have both IPython 4.x and notebook 4.x?

> Right, I get it now that --profile no longer works with the notebook, but I'm saying the docs should be made clearer so that in the future, people switching from older versions of IPython don't have to look far to get an answer.

Well, it's hard to bias Google. For whatever reason people are still referencing the docs for 1.0 and Google puts them on top. We'll try to find a solution.

@eulerreich commented:

I had ipython 4 initially but that kept giving errors as I said, so I installed jupyter, but that didn't solve anything.


@Ablomis commented Nov 7, 2015

Is there a way to avoid typing the following code in each notebook:
import findspark
findspark.init()
import pyspark
sc = pyspark.SparkContext()

and just make sure that whenever you launch a notebook it is already hooked up to Spark?

It isn't too hard, but it feels like jury-rigging, which I hate.

@minrk (Member) commented Nov 9, 2015

You can add it to a startup file, e.g. ~/.ipython/profile_default/startup/initspark.py
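
For example, a minimal sketch based on the snippet earlier in this thread (the file name is arbitrary; any .py file in that startup directory runs when a kernel using the default profile starts):

# ~/.ipython/profile_default/startup/initspark.py
import findspark
findspark.init()              # locate SPARK_HOME and put pyspark on sys.path

import pyspark
sc = pyspark.SparkContext()   # every new notebook then starts with `sc` defined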

@wlsherica commented:

I got the same issue, and the steps from @vherasme didn't work.

Python 2.7.10
Spark 1.4.1
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)

[screenshot of the error attached]

@sdlin commented Dec 5, 2015

@wlsherica, I had that same issue. For me, this was being caused by a bad spark configuration.

Specifically, I had:

export PYSPARK_SUBMIT_ARGS="--master local[2]"

So I just removed that.
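
For comparison, the variant @vherasme quoted above, which did work for them, ends with pyspark-shell, which is what newer Spark releases expect when PYSPARK_SUBMIT_ARGS is set. If you keep the variable at all, the usual form looks like:

export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"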

@Digital2Slave commented:

@Carreau amazing work. Thanks so much.


@M2shad0w commented:

@Carreau thanks for your answer

@hanxue commented Jan 27, 2016

Thanks @Carreau for the step-by-step instructions! Stumbled upon this issue when following instructions for IPython 3.x

In case anyone wants more detailed instructions and explanation, I have written http://flummox-engineering.blogspot.com/2016/01/how-to-configure-ipython4-for-apache-spark.html

@fabboe commented Feb 3, 2016

Using the findspark setup, are you able to use jars which are added via SparkConf spark.jars?

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext
conf = SparkConf().set("spark.jars","/usr/local/opt/spark-csv_2.10-1.3.0.jar")
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

The jar gets loaded when the SparkContext is started:

16/02/03 11:03:47 INFO SparkContext: Added JAR /usr/local/opt/spark-csv_2.10-1.3.0.jar at http://1.2.3.4:49318/jars/spark-csv_2.10-1.3.0.jar with timestamp 1454526227905

still:

df = sqlContext.read.format('com.databricks.spark.csv')\
.options(header='true', delimiter=',', inferschema=True)\
.load(csvpath)

Py4JJavaError: An error occurred while calling o247.load.
: java.lang.ClassNotFoundException: Failed to load class for data source: com.databricks.spark.csv.
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:67)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:87)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.databricks.spark.csv.DefaultSource
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:60)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:60)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:60)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:60)
at scala.util.Try.orElse(Try.scala:82)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scal
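
One workaround commonly suggested for this kind of findspark setup (a hedged suggestion, not something verified in this thread) is to let spark-submit resolve the CSV package before the gateway JVM is launched, via PYSPARK_SUBMIT_ARGS rather than spark.jars, e.g.:

# Hypothetical: resolve spark-csv and its dependencies before the JVM starts
export PYSPARK_SUBMIT_ARGS="--packages com.databricks:spark-csv_2.10:1.3.0 pyspark-shell"

set in the environment before the notebook (and therefore the SparkContext) starts.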

@versemonger commented:

Using the shell from the Spark tutorial is also a good solution to the issue.

@marianobilli commented:

I am running Spark jobs on a Hadoop cluster, triggered from a Jupyter notebook. The problem is that each cell of code consumes the configured number of executors, but they are never released, so after a number of executed cells all the resources of the cluster are blocked.

Has anyone had this problem?
