Investigate running R notebook on Yarn Cluster mode #8

Closed
sherryxg opened this issue May 12, 2017 · 4 comments

Comments

@sherryxg
Contributor

No description provided.

@akchinSTC
Collaborator

Installed IOP with HDFS/YARN/Spark, then installed the prerequisites:
yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum install -y python2-pip.noarch openssl-devel.x86_64 R libcurl-devel.x86_64
python -m pip install --upgrade --force-reinstall pip
pip install setuptools==33.1.1
pip install paramiko
pip install wheel

Download Elyra, cd into its root directory, then build and install the wheel (bdist_wheel writes it to dist/):
python setup.py bdist_wheel
pip install dist/jupyter_kernel_gateway-2.0.0.dev0-py2.py3-none-any.whl
Start an R session and run:
install.packages('devtools')
install.packages('RCurl') # Need RCurl for install_github
library(devtools)
install_github('IRkernel/repr')
install_github('IRkernel/IRdisplay')
install_github('IRkernel/IRkernel')
install_github('apache/spark@v2.1.0', subdir='R/pkg')
IRkernel::installspec(user = FALSE) # installs for all users
Start Jupyter as the spark user.
Open the notebook in a browser.
Set the kernel to R and run:
Sys.setenv(SPARK_HOME='/usr/iop/current/spark2-client')
.libPaths(c(file.path(Sys.getenv('SPARK_HOME'), 'R', 'lib'), .libPaths()))
library(SparkR)
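If library(SparkR) fails, the usual cause is that the directory prepended by .libPaths() does not contain SparkR. The Python 3 sketch below checks that directory before the notebook is started. It assumes the Spark layout $SPARK_HOME/R/lib/SparkR and falls back to the /usr/iop/current/spark2-client path used above; the helper name find_sparkr_lib is hypothetical.

```python
import os

# Fallback taken from the Sys.setenv() call in the R snippet; adjust per cluster.
DEFAULT_SPARK_HOME = "/usr/iop/current/spark2-client"


def find_sparkr_lib(spark_home=None):
    """Return the R library dir to prepend to .libPaths(), or None if SparkR is absent.

    Mirrors file.path(Sys.getenv('SPARK_HOME'), 'R', 'lib') from the R snippet.
    """
    spark_home = spark_home or os.environ.get("SPARK_HOME", DEFAULT_SPARK_HOME)
    lib_dir = os.path.join(spark_home, "R", "lib")
    # Assumption: SparkR is installed as a package directory named "SparkR" in R/lib.
    if os.path.isdir(os.path.join(lib_dir, "SparkR")):
        return lib_dir
    return None
```

A None result means the .libPaths() line will not make SparkR visible, so SPARK_HOME or the Spark R package installation needs fixing first.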

sherryxg added this to the Sprint 1 milestone on May 16, 2017
lresende modified the milestones: Sprint 2, Sprint 1 on May 23, 2017
@akchinSTC
Collaborator

No new updates. Still working on the chicken-and-egg problem in which the R kernel runs spark-submit before cluster mode has been set.

@akchinSTC
Collaborator

Following up on comment 2:
Open the notebook in a browser.
Create a notebook and set it to the R kernel so that it generates a connection file on the host file system.
Log in to the system hosting the notebook and locate the connection file.
On the command line, type "R" to start a new R shell.
Import the IRkernel library with:
library(IRkernel)
Start the IRkernel main method, passing it the connection file:
IRkernel::main("<local fs path to connection file>")
The R shell should now be connected to the kernel. Check the JSON in the connection file for the port numbers and run "netstat -nlp" to verify that those ports are in use.
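The verification in the last step can be scripted. The Python 3 sketch below reads the five channel ports from a Jupyter connection file and reports which of them do not appear as listening in saved `netstat -nlp` output. It assumes the Linux net-tools column layout (local address in the fourth column); the helper names are hypothetical.

```python
import json

# The five ZeroMQ channel ports recorded in a Jupyter kernel connection file.
PORT_KEYS = ("shell_port", "iopub_port", "stdin_port", "control_port", "hb_port")


def read_kernel_ports(path):
    """Return (ip, {field: port}) from a kernel connection file."""
    with open(path) as f:
        info = json.load(f)
    return info.get("ip", "127.0.0.1"), {key: int(info[key]) for key in PORT_KEYS}


def listening_ports(netstat_output):
    """Collect local TCP ports from `netstat -nlp` text (net-tools format assumed)."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows: proto recv-q send-q local-addr foreign-addr state pid/prog
        if len(fields) >= 4 and fields[0] in ("tcp", "tcp6"):
            local_addr = fields[3]  # e.g. 127.0.0.1:50001 or :::8888
            ports.add(int(local_addr.rsplit(":", 1)[1]))
    return ports


def missing_ports(conn_path, netstat_output):
    """Return the connection-file fields whose ports are NOT listening."""
    _, ports = read_kernel_ports(conn_path)
    live = listening_ports(netstat_output)
    return sorted(key for key, port in ports.items() if port not in live)
```

Usage: save the output of `netstat -nlp` to a file and pass its text together with the connection-file path to missing_ports. An empty list means every kernel channel is listening.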

@akchinSTC
Collaborator

Make sure to halt the notebook to free the ports it is using before passing the connection file to IRkernel::main; otherwise you'll see the following:
R_zmq_bind errno: 98 strerror: Address already in use
R_zmq_bind errno: 98 strerror: Address already in use
R_zmq_bind errno: 98 strerror: Address already in use
R_zmq_bind errno: 98 strerror: Address already in use
R_zmq_bind errno: 98 strerror: Address already in use
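errno 98 is EADDRINUSE on Linux: another socket still holds the port. A pre-flight check can catch this before IRkernel::main is called. The Python 3 sketch below tries to bind each port and reports the ones that are taken. The helper name busy_ports is hypothetical, and the probe does not set SO_REUSEADDR, so it is deliberately conservative (a port lingering in TIME_WAIT may also be reported as busy).

```python
import errno
import socket


def busy_ports(ip, ports):
    """Return the subset of `ports` that cannot be bound on `ip` (EADDRINUSE)."""
    busy = []
    for port in ports:
        probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            probe.bind((ip, port))
        except OSError as exc:
            # Only "address in use" means the kernel (or something else) holds it.
            if exc.errno == errno.EADDRINUSE:
                busy.append(port)
            else:
                raise
        finally:
            probe.close()
    return busy
```

Pass it the ip and port values from the connection file. A non-empty result means the notebook's kernel is still running and should be halted before the ports are handed to the R shell.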
