Problem running PySpark on Windows #4253
Comments
Thanks for the report. Unfortunately, this is not the correct place for it. This tracker is only for issues with the software that runs https://pypi.org itself. It cannot help with packages installed from there. You'll need to contact the PySpark maintainers. You can probably find out how to do that on the PySpark project page.
Hi Jason, I was precisely on that page and clicked on "Bugs & feedback" at the bottom, which brought me to this GitHub tracker. I haven't found the right place to report issues about PySpark.
That "Bugs & feedback" link is in the section titled "Contributing to PyPI" in the page footer. There are a number of links higher up about the project. Try the 'homepage' link on the left, or look at the "open issues/PRs" link, also on the left; in the body text there are a couple of links directly to the project's external website you could also look at.
I cannot find links in the places you suggest, except perhaps the Community/Issue Tracker menu on the main Spark project page, http://spark.apache.org/, but my issue concerns the configuration of PySpark, not Spark itself. It seems that Hadoop has not been configured properly by the pip installer. And I can't find any complete documentation about how to properly install PySpark.
The PySpark project page links directly to this issue tracker so presumably that's where issues should go. (Of course, the project page also says "This packaging is currently experimental and may change in future versions (although we will do our best to keep compatibility)" so there may not be much support to be had.) At any rate, this forum is definitely not the correct place. |
Oh ok, I thought there was an issue tracker dedicated to PySpark. |
Hello,
I've installed PySpark with pip on Windows 10 and started it. I first fixed the access-rights problem with /tmp/hive, then I tried a simple command:
textFile = spark.read.text(r"c:\files\text.txt")
Then I got an avalanche of Java exceptions starting with:
java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@3a5331e1, see the next exception for details.
What's the problem? Do I have to install Hadoop prior to PySpark? I thought Spark could be run as a standalone engine. I can't find any complete documentation about how to install PySpark.
Thanks in advance,
Mark
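For reference, this "Failed to start database 'metastore_db'" error on Windows is usually a permissions and working-directory problem rather than a missing Hadoop installation: Spark's embedded Derby metastore needs write access to the directory PySpark is launched from, and the Hive client needs write access to \tmp\hive. A commonly suggested workaround is sketched below; the C:\hadoop path and the winutils.exe binary are assumptions about a typical manual setup on Windows, since pip does not install them.

```shell
:: Point Spark at a directory containing bin\winutils.exe (example path)
set HADOOP_HOME=C:\hadoop

:: Give Hive's scratch directory the permissions Spark expects
%HADOOP_HOME%\bin\winutils.exe chmod -R 777 \tmp\hive

:: Launch PySpark from a directory the current user can write to,
:: so metastore_db and derby.log can be created there
cd %USERPROFILE%
pyspark
```

Alternatively, starting the session with spark.sql.warehouse.dir pointed at a writable folder avoids relying on the default location.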