Unable to connect to skein driver #165

Closed
vdechand opened this issue Apr 19, 2019 · 11 comments

@vdechand
commented Apr 19, 2019

Hello,

as requested by @jcrist, this issue is about the problem I posted on Stack Overflow. Not sure if I should copy-paste the entire thing here.

The environment I am trying to get dask-yarn/skein working on is a 3-node EMR 5.21.0 cluster.

Kind regards
Vadim


Edit: copied from Stack Overflow below:

I am trying to get dask-yarn running. I figured out that the skein module is used for submitting applications to the YARN cluster, and that it is failing to do so.

import skein
c = skein.Client()

This code fails with:

raise ConnectionError("Unable to connect to %s" % self._server_name)
skein.exceptions.ConnectionError: Unable to connect to driver

Starting the driver from CLI with skein driver start --log-level debug --log log.txt produces the following log while printing localhost:someport to the console:

19/04/18 10:34:04 DEBUG skein.Driver: Logging in using ticket cache
19/04/18 10:34:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/04/18 10:34:05 INFO client.RMProxy: Connecting to ResourceManager at ip/ip:8032
19/04/18 10:34:06 INFO skein.Driver: Driver started, listening on 36963
19/04/18 10:34:06 DEBUG skein.Driver: Reporting gRPC server port back to the launching process

Stopping the driver or trying to get an empty list of running applications writes Error: Unable to connect to driver into the shell.

Some time ago dask-yarn was based on knit, so I tried that for application submission. With knit I successfully submitted a hello world app, but only after manually configuring the name node/resource manager in the client. Autodetection would not work. Maybe it is a similar problem? The skein driver is written in Java and I can't find a way to configure it the same way as knit.

Thanks

@jcrist

Owner

commented Apr 19, 2019

Thanks for the issue report. Dask-Yarn and Skein have worked on EMR for me and others, so I'm not sure what's going on here. Could you run the following script and report back with the full output?

import skein
print(skein.__version__)

client = skein.Client(log_level='debug')

# From your description above, the error should happen before
# you get here, but just in case let's try an operation
client.get_applications()

Autodetection would not work. Maybe it is a similar problem? The skein driver is written in Java and I can't find a way to configure it the same way as knit.

Knit tried to find and parse the Hadoop configuration files in Python, which was error-prone. Skein just relies on the Hadoop Java libraries for this, obviating the need for configuration. The issue you're running into is that the background Java process that Skein starts is failing to communicate with the Python process that started it.

@vdechand

Author

commented Apr 19, 2019

Failing to create a client, so no operations on that one.

Python 3.6.7 (default, Dec 21 2018, 20:31:01)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import skein
>>> print(skein.__version__)
0.6.1

>>> client = skein.Client(log_level='debug')
19/04/19 17:26:11 DEBUG skein.Driver: Logging in using ticket cache
19/04/19 17:26:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/04/19 17:26:12 INFO client.RMProxy: Connecting to ResourceManager at AWS DNS/IP ADDRESS:8032
19/04/19 17:26:12 INFO skein.Driver: Driver started, listening on 38509
19/04/19 17:26:12 DEBUG skein.Driver: Reporting gRPC server port back to the launching process
19/04/19 17:26:12 DEBUG skein.Driver: Starting process disconnected, shutting down
19/04/19 17:26:12 INFO skein.Driver: Driver shut down
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/USERNAME/.local/lib/python3.6/site-packages/skein/core.py", line 363, in __init__
    self._call('ping', proto.Empty())
  File "/home/USERNAME/.local/lib/python3.6/site-packages/skein/core.py", line 287, in _call
    raise ConnectionError("Unable to connect to %s" % self._server_name)
skein.exceptions.ConnectionError: Unable to connect to driver

>>> client.get_applications()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'client' is not defined
@jcrist

Owner

commented Apr 19, 2019

That's odd. We start the Java driver process with a pipe from Python connected to stdin of that process. The Java process blocks on reading from this pipe, and shuts down when it closes. This lets the Java process know when the Python process has exited so that it can shut down as well. For some reason the pipe is being closed unexpectedly, resulting in the Java process exiting early. I've never seen this error before.
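
The pipe-based liveness check described above can be sketched in plain Python. This is only an illustration under stated assumptions, not Skein's actual implementation: the child is a small Python script standing in for the Java driver, and `CHILD_SOURCE` and `run_demo` are hypothetical names introduced for the example.

```python
import subprocess
import sys

# Hypothetical stand-in for the Java driver: it blocks reading stdin and
# treats end-of-file (the parent closing the pipe) as a signal to shut down.
CHILD_SOURCE = """
import sys
sys.stdin.read()  # blocks until the parent's end of the pipe is closed
print("parent disconnected, shutting down")
"""


def run_demo():
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    # While the parent keeps the pipe open, the child stays alive.
    still_running = child.poll() is None
    # Closing the write end delivers EOF to the child, which then exits.
    child.stdin.close()
    output = child.stdout.read().strip()
    child.stdout.close()
    returncode = child.wait(timeout=10)
    return still_running, output, returncode


if __name__ == "__main__":
    print(run_demo())
```

If the pipe were closed early for any reason, the child would see EOF immediately and exit, which matches the "Starting process disconnected, shutting down" line in the log above.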

When run as a background process a different exit method is used. Let's see if that works.

import skein

skein.Client.stop_global_driver(force=True)
skein.Client.start_global_driver(log_level='debug', log='driver.log')
client = skein.Client.from_global_driver()
client.get_applications()

If the Java process starts, you'll want to shut it down later:

$ skein driver stop --force

Can you report back here with the results of that script and the driver.log output? Thanks.

@vdechand

Author

commented Apr 19, 2019

Your code snippet fails in the same way and the driver.log contains the exact same logs from the post before.

Java on the machine is:

openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
@jcrist

Owner

commented Apr 19, 2019

Can you post the log anyway?

@vdechand

Author

commented Apr 19, 2019

Sure!

Shell output:

>>> import skein
>>> skein.Client.stop_global_driver(force=True)
>>> skein.Client.start_global_driver(log_level='debug', log='driver.log')
'localhost:45165'
>>> client = skein.Client.from_global_driver()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/USERNAME/.local/lib/python3.6/site-packages/skein/core.py", line 379, in from_global_driver
    return Client(address=address, security=security)
  File "/home/USERNAME/.local/lib/python3.6/site-packages/skein/core.py", line 363, in __init__
    self._call('ping', proto.Empty())
  File "/home/USERNAME/.local/lib/python3.6/site-packages/skein/core.py", line 287, in _call
    raise ConnectionError("Unable to connect to %s" % self._server_name)
skein.exceptions.ConnectionError: Unable to connect to driver
>>> client.get_applications()

driver.log:

19/04/19 20:25:18 DEBUG skein.Driver: Logging in using ticket cache
19/04/19 20:25:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/04/19 20:25:19 INFO client.RMProxy: Connecting to ResourceManager at AWS DNS/IP ADDRESS:8032
19/04/19 20:25:19 INFO skein.Driver: Driver started, listening on 45165
19/04/19 20:25:19 DEBUG skein.Driver: Reporting gRPC server port back to the launching process
@jcrist

Owner

commented Apr 19, 2019

The driver doesn't report itself as shutting down like it did above - is it still running after doing this (check using ps)? I wonder if localhost isn't binding correctly - you could try using 127.0.0.1 manually instead:

import skein
client = skein.Client(address='127.0.0.1:45165')  # assuming same port as in the logs above
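
Besides `ps`, a plain TCP connect is a quick way to see whether the driver port is reachable under each name. The following is a minimal sketch; `can_connect` is a hypothetical helper, and port 45165 is simply the port from the logs above. It only checks raw TCP reachability, so it may not reproduce how the gRPC client resolves and picks addresses.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Assumes the driver is still listening on the port reported in the logs.
    for host in ("localhost", "127.0.0.1"):
        print(host, can_connect(host, 45165))
```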
@vdechand

Author

commented Apr 19, 2019

Thanks Jim!

That one did it. Any ideas what caused this problem?
In the /etc/hosts file localhost is mapped.

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost6 localhost6.localdomain6
@jcrist

Owner

commented Apr 20, 2019

Hmmm, I'm not sure. From that hosts file it doesn't look like localhost should ever resolve to the IPv6 address, but I'm not a networking expert.
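
One way to check what the operating system resolver actually returns for each name is the standard library's `socket.getaddrinfo`. This is a diagnostic sketch only; `resolve` is a hypothetical helper, and its output shows how Python resolves the name, which is not necessarily how the gRPC library does it.

```python
import socket


def resolve(host, port=0):
    """Return the distinct (family, address) pairs the OS resolver gives for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    seen = []
    for family, _socktype, _proto, _canonname, sockaddr in infos:
        entry = (family.name, sockaddr[0])
        if entry not in seen:
            seen.append(entry)
    return seen


if __name__ == "__main__":
    for name in ("localhost", "127.0.0.1"):
        try:
            print(name, "->", resolve(name))
        except socket.gaierror as exc:
            print(name, "-> resolution failed:", exc)
```

An `AF_INET6` entry for `localhost` would suggest the name can resolve to an IPv6 address on that machine despite the hosts file.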

I've pushed #166 to force use of 127.0.0.1 instead of localhost. If you could try installing that version to see if things are fixed that'd be appreciated.

git clone git@github.com:jcrist/skein.git
cd skein
git checkout no-localhost
# Requires Maven to already be installed
pip install .
@vdechand

Author

commented Apr 20, 2019

I've built and tested creating a client. Works.
Thanks for your help again.

@jcrist

Owner

commented Apr 21, 2019

Glad to hear it. I'll push a release sometime this week.
