misc(Python): Replace python 2 with python 3 #1341
Conversation
Force-pushed from 5ba4b97 to 04032d5
I don't think we can stop supporting Python 2 all of a sudden.
This PR does not drop support for python 2 at all. It just changes the default python version used in the test cases. If tests should still be executed with python 2, the env variable `PYTHON_EXECUTABLE` can be set accordingly.
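The selection logic behind that override amounts to the following sketch (an illustration of the `PYTHON_EXECUTABLE` fallback this PR introduces; the surrounding build scripts are not shown here):

```python
import os

# PYTHON_EXECUTABLE is the override hook this PR introduces; the build and
# test scripts fall back to "python3" when it is unset.
executable = os.environ.get("PYTHON_EXECUTABLE", "python3")
print("tests will run with:", executable)
```

So, hypothetically, invoking the test task with `PYTHON_EXECUTABLE=python2` in the environment would keep exercising the Python 2 path.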
In file src/python/sparkjobserver/subprocess.py, do we need to add something like a WHEELPATH?
Review thread on job-server-python/src/test/scala/spark/jobserver/python/SubprocessSpec.scala (resolved)
Force-pushed from 04032d5 to b5f30d5
I am not sure about this statement.
@bsikander: Yes, you are correct, my comment was incorrect. Regarding the ...
Force-pushed from b5f30d5 to 45fafe6
I agree.
Force-pushed from d5029f2 to b08d169
I tried to run a simple word count but I am getting the following error. My python version is 3.7.9. My Spark version is 2.4.7.

```
....
TypeError: an integer is required (got type bytes)
....
org.apache.spark.SparkException: No port number in pyspark.daemon's stdout
....
```

As per your comment, I set the PYTHON_EXECUTABLE directly before the command and also tried to put it in my config/<env>.sh, but I still get the same error.
Here are the commands that I have been using:

```shell
curl -X POST "localhost:8090/contexts/py-context?context-factory=spark.jobserver.python.PythonSessionContextFactory"
curl --data-binary @./job-server-python/target/python/sjs_python_examples-0.10.1_SNAPSHOT-py3-none-any.whl -H 'Content-Type: application/python-archive' localhost:8090/binaries/py_bin
curl -d 'input.strings = ["a", "b", "a", "b" ]' "localhost:8090/jobs?appName=py_bin&classPath=example_jobs.word_count.WordCountSparkSessionJob&context=py-context&sync=true" | jq
```
Any ideas?
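For reference, the word-count job invoked above essentially computes the following (a plain-Python stand-in for illustration only, not the actual `WordCountSparkSessionJob` implementation):

```python
from collections import Counter

def word_count(strings):
    """Count occurrences of each word, mirroring what the example job returns."""
    return dict(Counter(strings))

# The payload posted above: input.strings = ["a", "b", "a", "b"]
print(word_count(["a", "b", "a", "b"]))  # {'a': 2, 'b': 2}
```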
@bsikander The PYTHON_EXECUTABLE variable is only used for testing. If you want to change the binary for actual usage, you have to change the ... I think I should also add somewhere in the docs that python 3.8 is not supported, just for users encountering the same issues you have right now.

Edit: I have added a comment in the ...

Edit2: I just noticed that you uploaded a wheel file instead of an egg. I am not sure whether this will work, I have not tested it (but it would actually be really cool if it just works out of the box).
Force-pushed from 8f0422f to fc4e387
Codecov Report

```
@@           Coverage Diff           @@
##           master    #1341   +/-  ##
=======================================
  Coverage   80.89%   80.89%
=======================================
  Files          96       96
  Lines        3973     3973
  Branches      203      203
=======================================
  Hits         3214     3214
  Misses        759      759
```

Continue to review the full report at Codecov.
Ok. My ...

I tried both and got the same error. Interestingly, in both cases the job at least got deployed, so it seems that jobserver is not doing any checks on wheels/eggs.
Hm, weird. I just tried the following:
with output
Could you please verify that you are really using python 3.7 in the running application? A simple check would be adding ... What makes this even stranger: with the exception of the ...
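One such check (a hedged sketch; any print statement inside the job would serve the same purpose) is logging the interpreter version from within the running application:

```python
import sys

# Print the interpreter actually executing the job; Spark 2.4 requires < 3.8.
print(sys.version)
print("supported by Spark 2.4:", sys.version_info < (3, 8))
```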
Update: Somehow, if I just use ...
Cool that wheels work. Maybe I will find time this week to look a bit more at wheels and provide a small PR with code maintenance/doc changes if everything really just works.
@noorul The change looks fine. It can work with python2 also. Any concerns?
Steps to build and use the python jobserver. Writing them here to save time for others. ...
Python 2 reached EOL last year and should not be used anymore. This commit replaces all references to the "python" binary with the more explicit "python3" binary. If desired, the build can still be performed for Python 2 by setting the "PYTHON_EXECUTABLE" environment variable to an appropriate version. Additionally, python wheels are the preferred way to distribute python code (see https://packaging.python.org/discussions/wheel-vs-egg/). This commit additionally builds the job-server-python wheel. Spark-2.4 does not support python >= 3.8 (see apache/spark#26194), leading to failed test cases (TypeError: an integer is required (got type bytes)). If you encounter these issues, try to state a python executable < 3.8 explicitly.
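A defensive guard matching that constraint could look like this (a hypothetical helper for illustration, not part of this PR):

```python
import sys

def check_spark24_python_support():
    """Fail fast with a readable error instead of Spark's cryptic TypeError."""
    if sys.version_info >= (3, 8):
        raise RuntimeError(
            "Spark 2.4 does not support Python >= 3.8 (see apache/spark#26194); "
            "point PYTHON_EXECUTABLE at an older interpreter."
        )
```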
Force-pushed from 5403715 to 5ac3fe1
Ok, I messed up the commits. Should be fixed now.
@bsikander Looks good to me since we are documenting this as a breaking change.
Python 2 reached EOL last year and should not be used anymore. This commit replaces all references to the "python" binary with the more explicit "python3" binary. If desired, the build can still be performed for Python 2 by setting the "PYTHON_EXECUTABLE" environment variable to an appropriate version. Additionally, python wheels are the preferred way to distribute python code (see https://packaging.python.org/discussions/wheel-vs-egg/). This commit replaces the job-server-python egg with a wheel.
Pull Request checklist
Other information:
If your `python3` executable points to a python >= 3.8, the `SubprocessSpec` fails, as Spark-2.4 does not support python > 3.7. To resolve this issue, set `PYTHON_EXECUTABLE` to a python version < 3.8. I have added an according comment in the spec.