raydp.init_spark fails #89
Hi @yanivg10. In RayDP, we run Spark on Ray, which means we do not need another resource manager (e.g. standalone, yarn).

# first step, connect to or init a local ray cluster
ray.init(...)
# second step, start up a spark cluster on top of the ray cluster
spark = raydp.init_spark(...)
# third step, use spark as normal
spark....
# stop spark
raydp.stop_spark(...)

And also, you should keep your …
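For readers landing here later, a minimal runnable version of that flow might look like the following; the app name, resource sizes, and toy DataFrame are illustrative, not prescribed by RayDP.

import ray
import raydp

# Step 1: connect to an existing Ray cluster, or start a local one.
ray.init()

# Step 2: start a Spark cluster on top of Ray and get a SparkSession.
spark = raydp.init_spark(app_name="raydp_demo",   # illustrative values
                         num_executors=2,
                         executor_cores=2,
                         executor_memory="2g")

# Step 3: use the session like any normal SparkSession.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.show()

# Step 4: tear down the Spark cluster; the Ray cluster keeps running.
raydp.stop_spark()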
I already tried these steps using the spark on ray setup and could not get init_spark to work. ray.init() works fine.

Exception                                 Traceback (most recent call last)
/opt/anaconda3/envs/raydp/lib/python3.6/site-packages/raydp/context.py in init_spark(app_name, num_executors, executor_cores, executor_memory, configs)
/opt/anaconda3/envs/raydp/lib/python3.6/site-packages/raydp/context.py in get_or_create_session(self)
/opt/anaconda3/envs/raydp/lib/python3.6/site-packages/raydp/context.py in _get_or_create_spark_cluster(self)
/opt/anaconda3/envs/raydp/lib/python3.6/site-packages/raydp/spark/ray_cluster.py in __init__(self)
/opt/anaconda3/envs/raydp/lib/python3.6/site-packages/raydp/spark/ray_cluster.py in _set_up_master(self, resources, kwargs)
/opt/anaconda3/envs/raydp/lib/python3.6/site-packages/raydp/spark/ray_cluster_master.py in start_up(self, popen_kwargs)
/opt/anaconda3/envs/raydp/lib/python3.6/site-packages/raydp/spark/ray_cluster_master.py in _launch_gateway(self, class_path, popen_kwargs)

Exception: Java gateway process exited before sending its port number
@yanivg10 could you check whether there are any other pyspark programs running? And check whether port 25333 is occupied (with …)
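For reference, one way to test whether that port is already bound, using only the Python standard library (25333 is the Py4J gateway port mentioned above):

import socket

# Try to bind the Py4J port; an OSError means something already occupies it.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    try:
        s.bind(("127.0.0.1", 25333))
        print("port 25333 is free")
    except OSError:
        print("port 25333 is occupied")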
This port is not occupied and there are no pyspark programs running |
It seems like your Spark version is incompatible with raydp. raydp requires Spark 3.0.0 or 3.0.1.
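A quick way to confirm which versions are actually installed in the active environment (assuming both packages expose __version__, as the thread below suggests):

import pyspark
import raydp

# For this raydp release, pyspark should report 3.0.0 or 3.0.1.
print(pyspark.__version__)
print(raydp.__version__)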
I am using PySpark 3.0.1.
Could you provide more logs or messages? I tested locally on Ubuntu and it works fine.
I tried your example and here is what I get:
Usage: spark-submit [options] <app jar | python file | R file> [app arguments]

Options:
  --conf, -c PROP=VALUE       Arbitrary Spark configuration property.
  --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 1024M).
  --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).
  --proxy-user NAME           User to impersonate when submitting the application.
  --help, -h                  Show this help message and exit.

 Cluster deploy mode only:
 Spark standalone or Mesos with cluster deploy mode only:
 Spark standalone and Mesos only:
 Spark standalone and YARN only:
 YARN-only:

Traceback (most recent call last):
Oh, I see. You can not submit the job with spark-submit.
I am running the python script directly, not using spark-submit. Could it be related to the ray version? Which ray version do you recommend? I'm using 1.2.0, but it looks like you are using 2.0.0.dev0
Did you set …
@yanivg10, it seems there is a problem starting the Java process in your environment. Do you have a JDK installed and the environment variable JAVA_HOME set properly? Please try to run "java -version" and then run "pyspark" to see if pyspark itself can work properly.
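A small diagnostic sketch along those lines, runnable from the same Python environment (just a convenience wrapper around the checks suggested above):

import os
import subprocess

# RayDP launches a JVM through Py4J, so `java` must be resolvable here.
print("JAVA_HOME =", os.environ.get("JAVA_HOME"))
subprocess.run(["java", "-version"], check=True)   # the JDK prints its version to stderr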
Thanks. This was indeed related to the JAVA_HOME variable. It should not be set to any path for the code to run. I am able to run your taxi fare prediction example, but Torch is throwing errors while the training session runs. It seems related to issue #74 and I added a comment there with the output log. Can you take a look at that?
Closing this. You can reopen if you have further problems, @yanivg10.
I have a similar error at spark = raydp.init_spark("test", 1, 1, "2g"). (1) If I run it in ipython3, I get the following error:
@dalgual, we don't have a yaml or docker image published right now. Did you create your own image based on Ray's docker image, with RayDP and Java installed properly? There is an example RayDP docker file for reference: https://github.com/oap-project/raydp/tree/master/docker
@carsonwang

------ config.raydp.yaml ----------------
max_workers: 2
provider:
    # Whether to allow node reuse. If set to False, nodes will be terminated
    # instead of stopped.
    cache_stopped_nodes: True    # If not present, the default is True.
autoscaling_mode: default
auth:
available_node_types:
head_node_type: ray.head.default
setup_commands:
Unfortunately, we don't have such a yaml now.
@kira-lin

ubuntu@ip-172-30-0-170:~/examples$ cat test_ray_init.py
import raydp
raydp.__version__
import pyspark
pyspark.__version__
import ray
ray.init(address='auto', _redis_password='5241590000000000')
ubuntu@ip-172-30-0-170:~/examples$ python test_ray_init.py
I've verified that indeed raydp-0.4.1 does not work with ray 1.11.0. Can you please try to use previous versions of ray? Or you can use raydp-nightly, too.
@kira-lin Thanks, it works ;-)
@kira-lin @carsonwang I run XGBoost using raydp by reading a dataset on S3. Do you have any example code to read S3 using ray? I got the following error:

Traceback (most recent call last):
(RayDPSparkMaster pid=6477) 2022-03-25 01:52:02,284 INFO RayAppMaster [Thread-2]: Stopping RayAppMaster
Are you able to read it using pyspark? It seems like reading/writing S3 requires some config and an extra package, like hadoop-aws. Maybe you can refer to this …
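A hedged sketch of what that configuration could look like via init_spark's configs parameter. The hadoop-aws version is an assumption that must match the Hadoop your Spark was built against, the credentials provider assumes keys are available through the default AWS chain, and the bucket path is a placeholder:

import ray
import raydp

ray.init(address="auto")
spark = raydp.init_spark(
    app_name="s3_read_demo",          # illustrative values
    num_executors=1,
    executor_cores=1,
    executor_memory="2g",
    configs={
        # hadoop-aws provides the s3a:// filesystem; the version is an assumption.
        "spark.jars.packages": "org.apache.hadoop:hadoop-aws:3.2.0",
        # Pick up credentials from env vars, ~/.aws, or the instance profile.
        "spark.hadoop.fs.s3a.aws.credentials.provider":
            "com.amazonaws.auth.DefaultAWSCredentialsProviderChain",
    })
df = spark.read.csv("s3a://my_bucket/test.csv", header=True)
df.show()
raydp.stop_spark()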
Do ray and raydp access data at the same node/server?
raydp is Spark on Ray. Anything you can do on Spark should also be possible with raydp. Reading S3 should be possible with the library I mentioned above. This is another compact tutorial, hope it helps.
@kira-lin Thanks. I found out that the following works:

data = ray.data.read_csv("s3://my_bucket/test.csv")
I cannot open the Ray dashboard when I launch a cluster with raydp and xgboost_ray together using "pip install xgboost_ray raydp-nightly" - I can with raydp only. Do you have any idea how I can open the Ray dashboard or make it run?
That's strange. Do you mean the Ray dashboard or the Spark dashboard? The Ray dashboard should be available without raydp. It should start once you create the cluster.
@kira-lin I can run PySpark xgboost-ray code now with CPU. However, even though I have the following code with "gpu_hist", I don't have any clue whether the code runs using the GPU - the code actually works well with CPU but not with GPU. The dashboard only shows CPU resources. How can I tell if the code runs on the GPU? Do you have any example code link with GPU?

# g4dn.2xlarge, 1 GPU, 8 vCPUs, 32 GiB of memory, 225 NVMe SSD, up to 25 Gbps network performance
spark = raydp.init_spark(app_name, num_executors, cores_per_executor, memory_per_executor)

ubuntu@ip-172-30-0-247:~/tutorials$ ray status
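One way to check whether Ray sees a GPU at all: if no "GPU" key shows up in the cluster resources, nothing scheduled through Ray can use it. (The address argument assumes an already-running cluster.)

import ray

ray.init(address="auto")
# On a g4dn.2xlarge started with GPU support this should include a "GPU"
# entry; otherwise only CPU and memory resources are reported.
print(ray.cluster_resources())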
@dalgual …
I got a similar issue, I would appreciate it if you could check:

Environment

$ java --version
$ echo $JAVA_HOME
$ pip freeze | grep ray
$ pip freeze | grep spark

Error messages

When running:

raydp.init_spark(app_name='RayDP Example',
@yunju63 You should use raydp-nightly to work with Ray 2.4.0. Spark 3.1.3 is fine. Run pip install raydp-nightly.
@kira-lin It works, thank you :)
This line of code fails:
spark = raydp.init_spark(app_name="RayDP example",
num_executors=2,
executor_cores=2,
executor_memory="4GB")
I am getting the following errors (linux server with a standalone Spark cluster):
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.spark.deploy.raydp.AppMasterEntryPoint.main(AppMasterEntryPoint.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 13 more
Traceback (most recent call last):
  File "/home/guryaniv/try_raydp_simple.py", line 4, in <module>
    spark = raydp.init_spark(app_name="RayDP example", num_executors=2, executor_cores=2, executor_memory="4GB")
  File "/opt/anaconda3/envs/raydp/lib/python3.6/site-packages/raydp/context.py", line 122, in init_spark
    return _global_spark_context.get_or_create_session()
  File "/opt/anaconda3/envs/raydp/lib/python3.6/site-packages/raydp/context.py", line 68, in get_or_create_session
    spark_cluster = self._get_or_create_spark_cluster()
  File "/opt/anaconda3/envs/raydp/lib/python3.6/site-packages/raydp/context.py", line 62, in _get_or_create_spark_cluster
    self._spark_cluster = SparkCluster()
  File "/opt/anaconda3/envs/raydp/lib/python3.6/site-packages/raydp/spark/ray_cluster.py", line 31, in __init__
    self._set_up_master(None, None)
  File "/opt/anaconda3/envs/raydp/lib/python3.6/site-packages/raydp/spark/ray_cluster.py", line 37, in _set_up_master
    self._app_master_bridge.start_up()
  File "/opt/anaconda3/envs/raydp/lib/python3.6/site-packages/raydp/spark/ray_cluster_master.py", line 52, in start_up
    self._gateway = self._launch_gateway(extra_classpath, popen_kwargs)
  File "/opt/anaconda3/envs/raydp/lib/python3.6/site-packages/raydp/spark/ray_cluster_master.py", line 115, in _launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
To run the code I am using spark-submit with the spark master:
spark-submit --master spark://:7077 raydp_example.py