
Problem running PySpark on Windows #4253

Closed
Mark531 opened this issue Jul 3, 2018 · 6 comments
Mark531 commented Jul 3, 2018

Hello,

I've installed PySpark with pip on Windows 10 and started it. I first fixed the access-rights problem with /tmp/hive, then I tried a simple command:
textFile = spark.read.text(r"c:\files\text.txt")

Then I got an avalanche of Java exceptions starting with:
java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@3a5331e1, see the next exception for details.

What's the problem? Do I have to install Hadoop before PySpark? I thought Spark could run standalone. I can't find any complete documentation on how to install PySpark.

Thanks in advance,
Mark

jamadden (Contributor) commented Jul 3, 2018

Thanks for the report. Unfortunately, this is not the correct place for it. This tracker is only for issues with the software that runs https://pypi.org itself. It cannot help with packages installed from there. You'll need to contact the PySpark maintainers. You can probably find how to do that on the PySpark project page.

jamadden closed this as completed Jul 3, 2018
Mark531 (Author) commented Jul 3, 2018

Hi Jason, I was on exactly that page and clicked "Bugs & feedback" at the bottom, which brought me to this GitHub tracker. I haven't found the right place to report issues about PySpark.

jamadden (Contributor) commented Jul 3, 2018

That "Bugs & feedback" link is in the section titled "Contributing to PyPI" in the page footer. There are a number of links higher up about the project itself. Try the "homepage" link on the left, or look at the "open issues/PRs" link, also on the left; in the body text there are a couple of links directly to the project's external website you could also look at.

Mark531 (Author) commented Jul 3, 2018

I cannot find links in the places you suggest, except perhaps the Community/Issue Tracker menu on the main Spark project page, http://spark.apache.org/, but my issue concerns the configuration of PySpark, not Spark. It seems that Hadoop was not configured properly by the pip installer, and I can't find any complete documentation on how to properly install PySpark.

jamadden (Contributor) commented Jul 3, 2018

The PySpark project page links directly to this issue tracker so presumably that's where issues should go. (Of course, the project page also says "This packaging is currently experimental and may change in future versions (although we will do our best to keep compatibility)" so there may not be much support to be had.)

At any rate, this forum is definitely not the correct place.

Mark531 (Author) commented Jul 3, 2018

Oh, OK, I thought there was an issue tracker dedicated to PySpark.
Thanks for your help.
