PySpark kernel for Jupyterhub creates file “???????e?” in working directory #9613
Comments
That seems like a pyspark kernel issue... and I'm not sure which one... this one? |
But I do not use the Apache Toree kernel. I have modified the Python2.7 kernel in the following way:
This kernel is the same on both servers and with JupyterHUB 0.5.0 everything works just fine. I updated the stackoverflow question - please check the code I found. Please find attached shell.py - maybe it could help in further investigation. Thank you for your time. |
That sounds like something has used binary data as a filename. I'm not sure what might be doing that - I can't think of anything in our code that would. |
@LMtx what happens if you leave out the PYTHONSTARTUP env? It's likely to be something in the spark startup that's doing it. Can you compare the |
Hello, The only difference I found was: If I don't remove PYTHONSTARTUP, I'm not able to run |
Does it write the same file every time, or a different one? If it's the same, try making it read-only - maybe you'll get a failure that shows what's trying to open it. |
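A minimal sketch of the read-only experiment suggested above, assuming a POSIX filesystem. The helper name `make_read_only` is hypothetical and not part of IPython or JupyterHub. It clears the write bits on an existing file so that a later attempt to open it for writing should fail with a traceback pointing at the caller.

```python
import os
import stat


def make_read_only(path):
    """Clear every write bit on `path` and return the resulting permission bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    new_mode = mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)
    os.chmod(path, new_mode)
    return new_mode
```

Note that a process running as root ignores these permission bits, so the experiment is only informative when the kernel runs as an ordinary user.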
Yes, it writes the same file every time. Just once, it also created an additional file "????????????????????????????????????????????????????????????????????????????????????????????????????????????????????e-journal" with content similar to that of the "???..." file (but I could not reproduce this). The "???...-journal" file ended with:
That line is missing in the "???..." file. I removed all permissions for the file "???...". When I tried to start the pyspark kernel, it got stuck in a reboot loop. No additional file similar to "???..." was created. Please find attached the tail of the jupyterhub.log file. |
Interesting, it looks to be failing to open the IPython history database. Do you have any IPython configuration to set the history file to something in particular? |
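If the history database itself is suspect, one way to examine it without modifying it is to open it read-only and ask SQLite for an integrity report. This is a sketch under assumptions: it needs Python 3 (the `uri=True` argument is not available in the Python 2.7 used in this thread), the helper name `check_sqlite_file` is hypothetical, and the default history location is assumed to be `~/.ipython/profile_default/history.sqlite`.

```python
import sqlite3
from pathlib import Path


def check_sqlite_file(path):
    """Open an SQLite file read-only and return the rows of PRAGMA integrity_check."""
    # mode=ro makes SQLite refuse writes, so the check cannot alter the file.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return [row[0] for row in conn.execute("PRAGMA integrity_check")]
    finally:
        conn.close()
```

A healthy database reports a single row containing `ok`; anything else is a list of the problems SQLite found.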
No, we are using the default settings. The IPython version: Name: ipykernel |
In that case, can you move your IPython directory to a temporary location, so that it starts fresh:
and launch again? |
I moved the directory and started a notebook - still the reboot loop :( |
The file name "???..." decoded to ASCII: <80><80><90><81><81><90><81><90><81><81><81><90><81><80><80><81><90><80><90><81><80><81><81><80><80><81><90><81><90><81><80><81><9f><81><90><81><80><81><80><80><81><81><90><81><80><80><81><80><81>e^A Maybe that will help? |
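File names that display as "???..." can be inspected byte by byte from Python. This is a small sketch, assuming a POSIX system where file names are arbitrary byte strings; the helper name `list_raw_names` is hypothetical. Passing a bytes path to `os.listdir` returns the raw names, which are then rendered with `\xNN` escapes for every non-printable byte.

```python
import os


def list_raw_names(directory="."):
    """Return (raw_bytes, escaped_text) for every entry in `directory`."""
    entries = []
    # A bytes argument makes os.listdir return undecoded bytes names.
    for raw in sorted(os.listdir(os.fsencode(directory))):
        escaped = "".join(
            chr(b) if 32 <= b < 127 else "\\x%02x" % b for b in raw
        )
        entries.append((raw, escaped))
    return entries
```

The raw bytes value can be passed straight to `os.rename` or `os.remove`, which avoids having to type the garbled name in a shell.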
Try configuring IPython to store history in memory instead of on disk: |
Can I reconfigure IPython only for my user? I do not want to change the configuration of JupyterHUB on the production server for everyone (at least not yet). |
Yes, run |
PySpark kernel seems to work, but it still generates strange files: filename: <90><80><90><81><81>y<83>羳: content (plus lots of binary characters): <82>7tableoutput_historyoutput_history^FCREATE TABLE output_history Any more ideas what is going on? |
Not really. It looks like part of the history database, but I've no idea why it's getting written to garbage filenames. Maybe your sqlite library or the Python bindings are corrupted? That's a total guess, though. I don't think anyone's ever reported something like that. |
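The guess about a broken sqlite library or bindings can be probed without IPython. This is a sketch only: the function name and the file name `history_test.sqlite` are hypothetical, and the table merely borrows the `output_history` name seen in the stray file (its columns here are illustrative). The function creates a small database in a fresh temporary directory using the same interpreter the kernel runs, then lists what actually appeared on disk.

```python
import os
import sqlite3
import tempfile


def sqlite_roundtrip_files(expected_name="history_test.sqlite"):
    """Create a small database in a fresh directory and list the files that appear."""
    with tempfile.TemporaryDirectory() as workdir:
        db_path = os.path.join(workdir, expected_name)
        conn = sqlite3.connect(db_path)
        try:
            conn.execute(
                "CREATE TABLE output_history "
                "(session INTEGER, line INTEGER, output TEXT)"
            )
            conn.execute("INSERT INTO output_history VALUES (1, 1, 'ok')")
            conn.commit()
        finally:
            conn.close()
        # Only the database file itself should remain after a clean close.
        return sorted(os.listdir(workdir))
```

If the returned list contains anything other than the requested file name, the problem lies in the Python build or its sqlite bindings rather than in IPython or JupyterHub.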
For that matter, maybe the filesystem is corrupted so that part of the data that should be in a file is showing up as a filename. Can you |
That could be the case but I think that one should not run |
Yeah, I believe you have to unmount a filesystem to fsck it, so if it's the root fs, that means rebooting. |
What would you recommend to totally remove JupyterHUB and IPython (with all configuration files etc.)? I am going to take my chances and reinstall Jupyter - maybe it will help somehow. |
I have no reason to think that would fix it, but as I don't understand what's gone wrong, I can't rule it out. |
Quick update: neither reinstallation of version 0.6.1 nor downgrade to 0.5.0 solved my issue. I am seriously considering bringing down the server to Any additional thoughts on this subject are very welcome. |
The version of Jupyterhub almost certainly won't affect it, because the history database is written by IPython. But I don't think a different version of IPython is likely to fix it either. |
We have tried reinstalling IPython as well. The problem occurs not only for the PySpark kernel but also for the Python2.7 kernel. We are now verifying whether Python2.7 was compiled in the same way on both servers - we used the same shell commands, but maybe the packages installed on the two servers differ in a way that affected the compilation. |
It turned out that the compilation of Python2.7 on one server was somehow corrupted - copying the whole installation folder from one machine to the other solved our issue. I guess that there are some differences in the installed rpm packages on the two machines - we are going to investigate this, but the most urgent problem is fixed. Thank you for your time @takluyver and @minrk |
Great, thanks for letting us know. |
Hello again, It looks like IPython (python2.7 kernel) generates the "???..." files when SQLite3 is installed on the box. Did you encounter any issues regarding this version of SQLite? |
SQLite3 has been the standard version of sqlite for many years, as far as I know. I have 3.11.0 and 3.13.0 in different Pythons on my machine. I have never seen an issue similar to this with any version of sqlite. |
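To compare two servers, it can help to record which interpreter is running and which SQLite library that interpreter was linked against. This is a minimal sketch; the function name `sqlite_report` is hypothetical and it uses only standard-library attributes.

```python
import sqlite3
import sys


def sqlite_report():
    """Describe the running interpreter and the SQLite library it is linked to."""
    return {
        "python": sys.version.split()[0],
        "executable": sys.executable,
        # Version of the SQLite C library loaded at runtime.
        "sqlite_library": sqlite3.sqlite_version,
    }
```

Running it with the exact Python binary named in the kernel spec on each machine shows whether the two builds picked up different SQLite libraries.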
On CentOS 6 I use JupyterHUB 0.6.1 with PySpark kernel (Spark 1.5.0 + Python 2.7).
When I start the PySpark kernel, JupyterHUB creates a file named "????????????????????????????????????????????????????????????????????????????????????????????????????????????????????e?" in my working directory.
A mirror server with JupyterHUB 0.5.0 does not have this issue.
Please see details on http://stackoverflow.com/questions/37808659/pyspark-kernel-for-jupyterhub-creates-file-e-in-working-directory