
Problem running PySpark on Windows #4253

Closed
Mark531 opened this issue Jul 3, 2018 · 6 comments
Mark531 commented Jul 3, 2018

Hello,

I've installed PySpark with pip on Windows 10 and started it. I first fixed the access-rights problem with /tmp/hive, then I tried a simple command:
textFile = spark.read.text(r"c:\files\text.txt")

Then I got an avalanche of Java exceptions starting with:
java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@3a5331e1, see the next exception for details.

What's the problem? Do I have to install Hadoop before PySpark? I thought Spark could run standalone. I can't find any complete documentation on how to install PySpark.

Thanks in advance,
Mark

jamadden (Contributor) commented Jul 3, 2018

Thanks for the report. Unfortunately, this is not the correct place for it. This tracker is only for issues with the software that runs https://pypi.org itself. It cannot help with packages installed from there. You'll need to contact the PySpark maintainers. You can probably find how to do that on the PySpark project page.

jamadden closed this as completed Jul 3, 2018
Mark531 (Author) commented Jul 3, 2018

Hi Jason, I was on exactly that page and clicked "Bugs & feedback" at the bottom, which brought me to this GitHub tracker. I haven't found the right place to report issues about PySpark.

jamadden (Contributor) commented Jul 3, 2018

That "Bugs & feedback" link is in the section titled "Contributing to PyPI" in the page footer. There are a number of links higher up about the project itself. Try the "homepage" link on the left, or look at the "open issues/PRs" link, also on the left; in the body text there are a couple of links directly to the project's external website you could also look at.

Mark531 (Author) commented Jul 3, 2018

I cannot find links in the places you suggest, except perhaps the Community/Issue Tracker menu on the main Spark project page, http://spark.apache.org/, but my issue concerns the configuration of PySpark, not Spark. It seems that Hadoop was not configured properly by the pip installer, and I can't find any complete documentation on how to properly install PySpark.

jamadden (Contributor) commented Jul 3, 2018

The PySpark project page links directly to this issue tracker so presumably that's where issues should go. (Of course, the project page also says "This packaging is currently experimental and may change in future versions (although we will do our best to keep compatibility)" so there may not be much support to be had.)

At any rate, this forum is definitely not the correct place.

Mark531 (Author) commented Jul 3, 2018

Oh, OK, I thought there was an issue tracker dedicated to PySpark.
Thanks for your help.
