Create kernel launch application for Toree Scala kernels to enable pull-mode #34

Closed
kevin-bates opened this issue Jun 19, 2017 · 14 comments

Comments

@kevin-bates
Member

Similar to the launch script for the PySpark kernel (see launch_ipkernel.py), the Toree/Scala kernel invocations could also use a launch application that constructs the information needed for location-relative connection files, thereby dramatically decreasing the likelihood of port conflicts. This would include identifying the local ports and public IP, as well as generating the key used to sign socket traffic.

One approach could be to modify Toree itself to detect a non-existent connection file and produce the applicable information at startup, writing that information out to the specified file. However, a launch application gives us a place for other forms of logic (e.g., workspace management) that we might not otherwise have.
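As a rough illustration of what such a launcher has to produce, here is a minimal Python sketch that asks the OS for free ports, generates a signing key, and writes a Jupyter-style connection file. The helper names (`find_free_ports`, `build_connection_info`, `write_connection_file`) are hypothetical and not taken from launch_ipykernel.py or Toree; the field names follow the standard Jupyter connection file.

```python
import json
import socket
import uuid

# Port fields found in a standard Jupyter connection file.
PORT_NAMES = ("shell_port", "iopub_port", "stdin_port", "control_port", "hb_port")


def find_free_ports(count, ip="127.0.0.1"):
    """Ask the OS for `count` distinct unused TCP ports by binding to port 0."""
    sockets = []
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.bind((ip, 0))  # port 0 lets the OS choose a free port
            sockets.append(s)
        # All sockets are still open here, so the ports are guaranteed distinct.
        return [s.getsockname()[1] for s in sockets]
    finally:
        for s in sockets:
            s.close()


def build_connection_info(ip):
    """Assemble connection info for a kernel reachable at `ip` (hypothetical helper)."""
    info = dict(zip(PORT_NAMES, find_free_ports(len(PORT_NAMES), ip)))
    info.update(
        ip=ip,
        transport="tcp",
        key=str(uuid.uuid4()),  # key used to sign socket traffic
        signature_scheme="hmac-sha256",
    )
    return info


def write_connection_file(path, info):
    """Write the connection info as JSON to `path`."""
    with open(path, "w") as f:
        json.dump(info, f, indent=2)
```

Note that the ports are released before the kernel binds them, so a small race window remains; running this on the destination node only reduces the likelihood of conflicts, it does not eliminate them.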

@LK-Tmac1 LK-Tmac1 self-assigned this Jun 19, 2017
@LK-Tmac1
Contributor

If we want to modify Toree, should we open an issue on JIRA and later a PR?

@kevin-bates kevin-bates added this to the Sprint 4 milestone Jun 19, 2017
@kevin-bates
Member Author

Yes - that is correct. However, I believe we should implement a launch application instead so as to provide us with unlimited flexibility and consistency (with no kernel update necessary). Once we produce a few such launch scripts (these are probably language specific items - not kernel specific), then a template can be documented for kernels of other languages to be produced.

@ckadner
Collaborator

ckadner commented Jun 20, 2017

These "launchers" have 2 aspects:

  1. finding available local ports and creating the connection file (different per language)
  2. launching the actual kernel process (different per kernel)

In Spark we have compiled languages (Java, Scala) and scripted languages (Python, R). For the compiled languages we need to build and package a small launcher application as opposed to mere launcher scripts. For that we will need to add some build infrastructure, i.e. Maven or SBT to build and package jar files.

@kevin-bates
Member Author

+1 @ckadner
Regarding item 2, I was hoping the kernel process to launch would be conveyed as a parameter to the launch script/application, thereby only requiring language-specific launchers. I suspect you meant the same, but just want to be sure we only have an order of N thing and not NxM.

Also, it seems like the 'language' aspect of these launchers is really a function of the kernel's implementation language. For example, the Python and R kernels in Toree require that a Scala application be launched. As a result, the launcher for these kernels would still be written in Scala.
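To make the "N, not NxM" point concrete, here is a small Python sketch of a launcher that receives the kernel command as a parameter rather than hard-coding it. The `--connection-file` option, the `--` separator, and the `@CONNECTION_FILE@` placeholder are all hypothetical conventions chosen for illustration; they are not part of Elyra, Toree, or spark-submit.

```python
import argparse

# Hypothetical token that the launcher replaces with the connection file path.
PLACEHOLDER = "@CONNECTION_FILE@"


def build_kernel_command(argv):
    """Split launcher options from the kernel command and fill in the placeholder.

    Expected (hypothetical) shape:
        --connection-file PATH -- KERNEL_CMD [ARGS...]
    """
    if "--" not in argv:
        raise ValueError("expected '--' followed by the kernel command")
    split_at = argv.index("--")
    launcher_args, kernel_cmd = argv[:split_at], argv[split_at + 1:]
    if not kernel_cmd:
        raise ValueError("no kernel command given after '--'")

    parser = argparse.ArgumentParser(description="Hypothetical kernel launcher")
    parser.add_argument("--connection-file", required=True)
    options = parser.parse_args(launcher_args)

    # The same launcher serves any kernel written in this language,
    # because the kernel to start is supplied by the caller.
    return [part.replace(PLACEHOLDER, options.connection_file) for part in kernel_cmd]
```

The returned list could then be handed to `subprocess.call` once the connection file has been created, so one launcher per implementation language would be enough.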

@LK-Tmac1
Contributor

LK-Tmac1 commented Jun 20, 2017

Yes, I agree that we'd better create a launch application instead of changing the Toree kernel.

Would the easiest way be to do something similar to the launch_ipykernel.py script?

That is, simply add another Python script that handles the non-existent connection file and let run.sh call this script before spark-submit.

@kevin-bates
Member Author

Generation of the connection file needs to take place on the destination node, and it's via spark-submit that this occurs. I don't know spark-submit well enough to know if it can launch a script/application in one language and then have that thing start something in another language. I suspect spark-submit determines the interpreter based on the launch target - so I doubt switching languages would work. (Then again, I could be way off-base.)

@ckadner
Collaborator

ckadner commented Jun 20, 2017

Kevin is correct about this:

spark-submit determines the interpreter based on the launch target

@LK-Tmac1
Contributor

LK-Tmac1 commented Jun 20, 2017

I see your idea. Since the connection file should be on the node where the kernel will be running, we need to create the connection file after run.sh is called.

I am going to develop a layer (let's call it ToreeLauncher for now) around the Main method of Toree to generate the missing connection file, after spark-submit is issued in run.sh but before the kernel starts.

@ckadner
Collaborator

ckadner commented Jun 21, 2017

@liukun1016 -- to better understand how the IPython kernel finds empty ports and creates the connection file, start by taking a look at IPKernelApp.initialize(). For finding free ports you could also look at Akka (there the code is already in Scala), but for the session key I think we need to replicate what IPython does in ... jupyter_client/session.py
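For the session key, a minimal Python sketch of the idea follows: generate a random key and sign the serialized message parts with HMAC-SHA256, which is the `hmac-sha256` signature scheme described by the Jupyter messaging protocol. The function names are hypothetical, and the exact key format used in jupyter_client/session.py may differ from the plain UUID assumed here.

```python
import hashlib
import hmac
import uuid


def new_session_key():
    """Generate a random key as ASCII bytes (assumption: a plain UUID4 string)."""
    return str(uuid.uuid4()).encode("ascii")


def sign(key, parts):
    """Return the hex HMAC-SHA256 digest over the message parts, in order.

    In the Jupyter protocol the parts are the serialized header,
    parent header, metadata, and content of a message.
    """
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        mac.update(part)
    return mac.hexdigest()


def verify(key, parts, signature):
    """Check a received signature using a constant-time comparison."""
    return hmac.compare_digest(sign(key, parts), signature)
```

A Scala launcher only needs to generate the key and write it to the connection file; the kernel and the client then use it for signing as shown.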

Just a nit-pick on terminology 😉 ... the "launch application" you are writing is not really a "wrapper" ... it will simply call the ToreeMain application once it is done creating the connection file.

@LK-Tmac1
Contributor

@ckadner I see. Yes, "wrapper" might not be a good name. (updated the comment)

Will take a look at your references and thanks a lot!

@kevin-bates
Member Author

Please take a look at PR #43. That PR makes a change to the kernelspec format (where connection_file_mode can be specified) - so just wanted to bring that to your attention since it might affect your merge. Also, once pull mode is implemented, socket mode is a small step away.

@LK-Tmac1
Contributor

The source code of the Toree Launcher application: https://github.com/liukun1016/ToreeLauncher/blob/master/src/main/scala/launcher/ToreeLauncher.scala

Pull mode is already finished. Will test the socket mode.

@kevin-bates
Member Author

@liukun1016 - the launcher looks good! Just have a couple comments/suggestions...

  1. We should make sure the close calls for the BufferedWriter and Socket are in finally blocks in case the preceding write calls fail.
  2. Let's move the block of code for sending the file back on the socket into the block of code that detects the file doesn't exist (so just after creating the file). The reason is that the user may change the connection mode from 'socket' to 'push' (although extremely unlikely), and if they don't remove the --response-address parameter from the kernel.json file, the code will try to bind to a socket that won't exist - since Elyra has pushed the file. (Actually, it looks like the code will exit(-1) with an invalid socket address message since the value of that parameter will be ERROR__NO__KERNEL_RESPONSE_ADDRESS, but same effect.)
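Both suggestions can be sketched in Python as follows (the actual launcher is Scala, where the same shape would use try/finally around the BufferedWriter and Socket). The function names and the `host:port` form of the response address are assumptions made for illustration.

```python
import json
import os
import socket


def send_connection_info(response_address, info):
    """Send the connection info as JSON to `host:port`, always closing the socket."""
    host, port = response_address.rsplit(":", 1)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.connect((host, int(port)))
        sock.sendall(json.dumps(info).encode("utf-8"))
    finally:
        # Runs even if connect or sendall raises, so the socket never leaks.
        sock.close()


def ensure_connection_file(path, make_info, response_address=None):
    """Create the connection file only if it is missing.

    The info is sent back on the socket only when this call created the
    file. If the file already exists (for example because it was pushed),
    the response address is ignored. Returns True if the file was created.
    """
    if os.path.exists(path):
        return False
    info = make_info()
    with open(path, "w") as f:  # the with block closes the file on errors too
        json.dump(info, f)
    if response_address:
        send_connection_info(response_address, info)
    return True
```

With this ordering, a stale --response-address left in kernel.json has no effect when the connection file was pushed ahead of time.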

Also, have you had a chance to see if creating a Spark Context allows us to get around the Lazy load issue?

kevin-bates pushed a commit that referenced this issue Jul 6, 2017
…#49)

* Toree Launcher application supports pull and socket mode

* update run.sh

* use play.lib.json for json serialization; update run.sh + kernel.json for launcher opt/jar

* update println in writeToSocket method
@LK-Tmac1
Contributor

LK-Tmac1 commented Jul 6, 2017

PR merged. We are going to maintain the Toree Launcher app source code separately, so we don't have to include it in the elyra build process.

Project created under https://github.com/SparkTC/elyra-toree-launcher

Closing this issue now.

@LK-Tmac1 LK-Tmac1 closed this as completed Jul 6, 2017
@kevin-bates kevin-bates modified the milestones: Sprint 4, v0.6 Mar 26, 2018

3 participants