Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Spark's Hadoop token file available to Python method #1532

Merged
merged 4 commits into from Dec 2, 2019

Conversation

@EnricoMi
Copy link
Contributor

EnricoMi commented Nov 22, 2019

Python code run via horovod.spark.run() needs the
Hadoop token file to access secure HDFS. This is
made available to the Spark executor via environment
variable HADOOP_TOKEN_FILE_LOCATION. SparkTaskService
needs to pass this to orted, which runs the Python code.

EnricoMi added 2 commits Nov 22, 2019
Python code run via horovod.spark.run() needs the
Hadoop token file to access secure HDFS. This is
made available to the Spark executor via environment
variable HADOOP_TOKEN_FILE_LOCATION. SparkTaskService
needs to pass this to orted, which runs the Python code.

Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
@EnricoMi EnricoMi force-pushed the G-Research:branch-spark-kerberos branch from 70c73f0 to da77bf3 Nov 23, 2019
@EnricoMi

This comment has been minimized.

Copy link
Contributor Author

EnricoMi commented Nov 23, 2019

Build failed on unrelated issues, might be transient. Please retry the build checks.

Copy link
Collaborator

tgaddair left a comment

Looks good! Just have a small request about the additional test dep.

import horovod.torch as hvd

from mock import MagicMock

if sys.version_info[0] < 3:

This comment has been minimized.

Copy link
@tgaddair

tgaddair Nov 23, 2019

Collaborator

Rather than add a dependency on backports.tempfile, I think it would be easier to just write our own temp directory function. We already did this in an upcoming PR (https://github.com/horovod/horovod/blob/spark_estimator/test/common.py#L63). Maybe you can cherry pick that context manager and use it here?

This comment has been minimized.

Copy link
@EnricoMi

EnricoMi Nov 24, 2019

Author Contributor

I know, this import and backports.tempfile dependency is ugly, but I'd prefer that over duplicating code, even if it is small.

A related question that may help me assess the situation: how long do you plan to support Python2.7? It will die in a month. Is it worth adding this dependency / your own implementation when this is only needed to support Python2.7, which will go away short term anyway?

This comment has been minimized.

Copy link
@tgaddair

tgaddair Nov 25, 2019

Collaborator

Most likely we will drop Python 2.7 support at the end of the year, but that decision needs to be brought before the steering committee, in case anyone is stuck with Python 2.7 for whatever reason.

If you're uncomfortable adding that extra function, one option would be to simply skip this test for Python 2 by adding the following decorator:

@pytest.mark.skipif(sys.version_info < (3, 0), reason="tempfile.TemporaryDirectory only available in Python 3")

This comment has been minimized.

Copy link
@EnricoMi

EnricoMi Nov 25, 2019

Author Contributor

Interesting alternative.

Since you are introducing the TemporaryDirectory anyway, I will use it. But it should get removed once Python2.7 support is dropped.

This comment has been minimized.

Copy link
@alasdairm-gr

alasdairm-gr Dec 2, 2019

@tgaddair is this okay to merge now? It would be really useful to have in the next Horovod release

This comment has been minimized.

Copy link
@tgaddair

tgaddair Dec 2, 2019

Collaborator

Sorry for the delay! LGTM!

EnricoMi added 2 commits Nov 25, 2019
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
@tgaddair tgaddair merged commit e3d63de into horovod:master Dec 2, 2019
2 checks passed
2 checks passed
DCO DCO
Details
buildkite/horovod/pr Build #1421 passed (1 hour, 6 minutes, 57 seconds)
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
3 participants
You can’t perform that action at this time.