PySpark kernel for Jupyterhub creates file “???????e?” in working directory #9613

LMtx · 2016-06-14T13:54:58Z

On CentOS 6 I use JupyterHub 0.6.1 with a PySpark kernel (Spark 1.5.0 + Python 2.7).

When I start the PySpark kernel, JupyterHub creates a file named "????????????????????????????????????????????????????????????????????????????????????????????????????????????????????e?" in my working directory.

A mirror server running JupyterHub 0.5.0 does not have this issue.

Please see the details at http://stackoverflow.com/questions/37808659/pyspark-kernel-for-jupyterhub-creates-file-e-in-working-directory

Carreau · 2016-06-14T21:33:10Z

That seems like a PySpark kernel issue... and I'm not sure which kernel... this one?

LMtx · 2016-06-15T12:52:55Z

But I do not use the Apache Toree kernel. I have modified the Python 2.7 kernel in the following way:

{
 "display_name": "pySpark (Spark 1.5.0)",
 "language": "python",
 "argv": [
  "/usr/bin/python2.7",
  "-m",
  "ipykernel",
  "-f",
  "{connection_file}"
 ],
 "env": {
  "PYSPARK_PYTHON": "python2.7",
  "SPARK_HOME": "/opt/cloudera/parcels/CDH/lib/spark",
  "PYTHONPATH": "/opt/cloudera/parcels/CDH/lib/spark/python/:/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.8.2.1-src.zip",
  "PYTHONSTARTUP": "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/shell.py",
  "PYSPARK_SUBMIT_ARGS": "--master yarn-client pyspark-shell"
 }
}

This kernel is the same on both servers, and with JupyterHub 0.5.0 everything works fine.

I updated the Stack Overflow question; please check the code I found.

Please find shell.py attached; maybe it will help with further investigation.

shell.zip

Thank you for your time.

takluyver · 2016-06-15T13:10:22Z

That sounds like something has used binary data as a filename. I'm not sure what might be doing that - I can't think of anything in our code that would.

minrk · 2016-06-16T11:23:28Z

@LMtx what happens if you leave out the PYTHONSTARTUP env variable? It's likely something in the Spark startup that's doing it. Can you compare the os.environ dict in the version that works and in the version that doesn't?
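A minimal sketch of that comparison, assuming you can run a cell in the kernel on each server: dump os.environ to a JSON file on each machine, copy one file across, then diff the two. The helper names and the file names env_good.json / env_bad.json are hypothetical.

```python
import json
import os


def dump_environ(path):
    """Write the current process environment to a JSON file (run once per kernel)."""
    with open(path, "w") as f:
        json.dump(dict(os.environ), f, indent=2, sort_keys=True)


def diff_environ(a, b):
    """Return (keys only in a, keys only in b, keys whose values differ), each sorted."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return only_a, only_b, changed


def diff_files(path_a, path_b):
    """Load two dumps written by dump_environ and print their differences."""
    with open(path_a) as fa, open(path_b) as fb:
        a, b = json.load(fa), json.load(fb)
    only_a, only_b, changed = diff_environ(a, b)
    for key in only_a:
        print("only in %s: %s=%r" % (path_a, key, a[key]))
    for key in only_b:
        print("only in %s: %s=%r" % (path_b, key, b[key]))
    for key in changed:
        print("differs: %s: %r vs %r" % (key, a[key], b[key]))
```

Usage: dump_environ("env_good.json") on the working server, dump_environ("env_bad.json") on the other one, then diff_files("env_good.json", "env_bad.json") once both files are on the same machine.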

Dom-nik · 2016-06-16T15:26:37Z

Hello,
I'm working with LMtx on this, and I ran the experiment you outlined: I removed the PYTHONSTARTUP variable from kernel.json, restarted Jupyter and printed os.environ on both instances.

The only difference I found was:
'PYTHONSTARTUP': '/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/shell.py'

If I don't remove PYTHONSTARTUP, I'm not able to run print os.environ.
The startup script is the same on both instances, as the underlying Cloudera distribution is the same.

takluyver · 2016-06-16T15:37:01Z

Does it write the same file every time, or a different one? If it's the same, try making it read-only - maybe you'll get a failure that shows what's trying to open it.
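A minimal sketch of that experiment, assuming the stray file sits in the current working directory. "Suspicious" here means the name contains bytes outside printable ASCII; that is a heuristic chosen for this sketch, not something established in the thread. Directory entries are listed as bytes so undecodable names are not mangled, and only the write bits are cleared. Note that write permission does not stop root, so this only helps if the kernel runs as an ordinary user.

```python
import os
import stat


def is_suspicious(name):
    """True if a bytes filename contains anything outside printable ASCII (heuristic)."""
    return any(b < 0x20 or b > 0x7E for b in bytearray(name))


def find_suspicious(directory=b"."):
    """Return the paths (as bytes) of entries whose names look like binary garbage."""
    return [os.path.join(directory, name)
            for name in os.listdir(directory)
            if is_suspicious(name)]


def make_read_only(path):
    """Clear the write bits for owner, group and others; keep all other mode bits."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


if __name__ == "__main__":
    for path in find_suspicious():
        make_read_only(path)
        print("made read-only: %r" % (path,))
```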

LMtx · 2016-06-17T08:06:58Z

Yes, it writes the same file every time. Just once, it created an additional file "????????????????????????????????????????????????????????????????????????????????????????????????????????????????????e-journal" with content similar to the "???..." file (but I could not reproduce this).

The "???...-journal" file ended with:

end timestamp, num_cmds integer, remark text)9!N¦

That line is missing in the "???..." file.

I removed all permissions on the file "???...". When I tried to start the PySpark kernel, it got stuck in a reboot loop. No additional file similar to "???..." was created.

Please find attached the tail of the jupyterhub.log file.
jupyterhub.log.tail.zip

minrk · 2016-06-17T08:56:42Z

Interesting, it looks to be failing to open the IPython history database. Do you have any IPython configuration to set the history file to something in particular?
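Two clues point the same way: "-journal" is the suffix SQLite uses for its rollback journal, and the recovered text (end timestamp, num_cmds integer, remark text) matches the column list of IPython's sessions history table. A minimal sketch for testing that theory on a stray file: check for the 16-byte header every SQLite 3 database starts with ("SQLite format 3" plus a NUL byte) and, if present, list its tables. The function names are hypothetical; if the garbled name is awkward to pass around, copy the file to a plain name first.

```python
import sqlite3

# Every SQLite 3 database file begins with this 16-byte magic string.
SQLITE_HEADER = b"SQLite format 3\x00"


def looks_like_sqlite(path):
    """Return True if the file starts with the SQLite 3 header."""
    with open(path, "rb") as f:
        return f.read(len(SQLITE_HEADER)) == SQLITE_HEADER


def list_tables(path):
    """Return the sorted table names recorded in the database's sqlite_master."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

An IPython history database would be expected to list history, output_history and sessions, plus sqlite_sequence, which SQLite creates automatically for the AUTOINCREMENT column in sessions.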

LMtx · 2016-06-17T09:13:45Z

No, we are using the default settings. The IPython version:

Name: ipykernel
Version: 4.3.1

minrk · 2016-06-17T09:21:06Z

In that case, can you move your IPython directory to a temporary location, so that it starts fresh:

mv ~/.ipython ~/.save_ipython

and launch again?

LMtx · 2016-06-17T09:40:03Z

I moved the directory and started the notebook: still the reboot loop :(

LMtx · 2016-06-17T09:46:29Z

The file name "???..." decoded to ASCII:

<80><80><90><81><81><90><81><90><81><81><81><90><81><80><80><81><90><80><90><81><80><81><81><80><80><81><90><81><90><81><80><81><9f><81><90><81><80><81><80><80><81><81><90><81><80><80><81><80><81>e^A

Maybe that will help?
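To capture the exact bytes of such a name (rather than the ? placeholders that ls prints), a minimal sketch: list the directory as bytes and print each unusual name as hex next to its repr. Nothing here is specific to IPython, and the helper names are hypothetical.

```python
import binascii
import os


def describe_name(name):
    """Return (hex string, repr) for a bytes filename."""
    return binascii.hexlify(name).decode("ascii"), repr(name)


def dump_odd_names(directory=b"."):
    """Print every entry whose name contains non-printable or non-ASCII bytes."""
    for name in sorted(os.listdir(directory)):
        if any(b < 0x20 or b > 0x7E for b in bytearray(name)):
            hex_str, rep = describe_name(name)
            print("%s  %s" % (hex_str, rep))


if __name__ == "__main__":
    dump_odd_names()
```

For example, the tail of the name above (<81>e^A) would show up as the hex bytes 816501.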

takluyver · 2016-06-17T09:56:10Z

Try configuring IPython to store history in memory instead of on disk: HistoryAccessor.hist_file=':memory:'. You lose persistent history by doing this, but it avoids opening a history database on disk.

LMtx · 2016-06-17T10:03:18Z

Can I reconfigure IPython only for my user? I do not want to change the configuration of JupyterHub on the production server for everyone (at least not yet).

takluyver · 2016-06-17T10:12:08Z

Yes: run ipython profile create as your user, and then edit ~/.ipython/profile_default/ipython_config.py.
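For reference, the per-user change would look roughly like this. It is a config fragment, not a standalone script: get_config() is supplied by IPython when it loads the file. ':memory:' is SQLite's special name for an in-memory database. The commented-out enabled line is an alternative that was not tried in this thread.

```python
# ~/.ipython/profile_default/ipython_config.py (affects only this user)
c = get_config()

# Keep the IPython history database in memory instead of history.sqlite on disk.
# Persistent history across sessions is lost with this setting.
c.HistoryAccessor.hist_file = ':memory:'

# Alternative (not tried here): disable the SQLite history entirely.
# c.HistoryAccessor.enabled = False
```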

LMtx · 2016-06-17T10:38:54Z

The PySpark kernel seems to work, but it still generates strange files:

filename: <90><80><90><81><81>y<83>羳:

content (plus lots of binary characters):

<82>7tableoutput_historyoutput_history^FCREATE TABLE output_history
(session integer, line integer, output text,
PRIMARY KEY (session, line));^F^F^WO)^A^@indexsqlite_autoindex_output_history_1output_history^G<81>*^C^G^W^[^[^A<82>+tablehistoryhistory^DCREATE TABLE history (session integer, line integer, ]^A<82>mtablesessionssessions^BCREATE TABLE sessions (session integer
primary key autoincrement, start timestamp,

Any more ideas what is going on?

takluyver · 2016-06-17T10:50:14Z

Not really. It looks like part of the history database, but I've no idea why it's getting written to garbage filenames. Maybe your sqlite library or the Python bindings are corrupted? That's a total guess, though. I don't think anyone's ever reported something like that.

takluyver · 2016-06-17T10:51:58Z

For that matter, maybe the filesystem is corrupted so that part of the data that should be in a file is showing up as a filename. Can you fsck it? Still guesswork, though.

LMtx · 2016-06-17T11:00:30Z

That could be the case, but I don't think one should run fsck on a running system, especially in production :/

takluyver · 2016-06-17T11:11:35Z

Yeah, I believe you have to unmount a filesystem to fsck it, so if it's the root fs, that means rebooting.

LMtx · 2016-06-17T12:34:01Z

What would you recommend to completely remove JupyterHub and IPython (with all configuration files etc.)? I'm going to take my chances and reinstall Jupyter; maybe it will help somehow.

takluyver · 2016-06-17T13:25:10Z

I have no reason to think that would fix it, but as I don't understand what's gone wrong, I can't rule it out.

LMtx · 2016-06-21T20:31:15Z

Quick update: neither reinstalling version 0.6.1 nor downgrading to 0.5.0 solved my issue. I am seriously considering bringing the server down to fsck the HDDs.

Any additional thoughts on this subject are very welcome.

takluyver · 2016-06-22T13:06:06Z

The version of JupyterHub almost certainly won't affect it, because the history database is written by IPython. But I don't think a different version of IPython is likely to fix it either.

LMtx · 2016-06-22T13:38:36Z

We have tried reinstalling IPython as well. The problem occurs not only with the PySpark kernel but also with the Python 2.7 kernel. We are now checking whether Python 2.7 was compiled the same way on both servers: we used the same shell commands, but the packages installed on the two servers may differ in a way that affected the compilation.
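A minimal sketch for that comparison, assuming the same script is run with each server's Python 2.7: it prints build details that commonly differ between separately compiled interpreters. Which of these (if any) matters here is an assumption; sys.maxunicode is included because Python 2 can be built "narrow" (65535) or "wide" (1114111), and the path of the _sqlite3 extension module tells you which shared object to inspect with ldd.

```python
import platform
import sqlite3
import sys
import sysconfig

import _sqlite3  # the C extension behind the sqlite3 package


def build_info():
    """Collect interpreter build details worth comparing between two servers."""
    return {
        "executable": sys.executable,
        "version": sys.version.replace("\n", " "),
        "platform": platform.platform(),
        # Version of the SQLite C library in use at run time.
        "sqlite_library": sqlite3.sqlite_version,
        # Location of the compiled extension (no __file__ if built into the binary).
        "sqlite_module": getattr(_sqlite3, "__file__", "built-in"),
        # ./configure arguments used to build this interpreter (may be None).
        "config_args": sysconfig.get_config_var("CONFIG_ARGS"),
        # 65535 on a narrow Python 2 build, 1114111 on a wide one.
        "maxunicode": sys.maxunicode,
    }


if __name__ == "__main__":
    for key, value in sorted(build_info().items()):
        print("%-15s %s" % (key, value))
```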

LMtx · 2016-06-22T14:48:03Z

It turned out that the Python 2.7 build on one server was somehow corrupted: copying the whole installation folder from one machine to the other solved our issue. I guess there are some differences in the RPM packages installed on the two machines; we are going to investigate further, but the most urgent problem is fixed.

Thank you for your time @takluyver and @minrk

takluyver · 2016-06-22T14:48:44Z

Great, thanks for letting us know.

LMtx · 2016-07-14T14:49:45Z

Hello again,

It looks like IPython (python2.7 kernel) generates the "???..." files when SQLite3 is installed on the box. Did you encounter any issues regarding this version of SQLite?

takluyver · 2016-07-14T15:53:07Z

SQLite3 has been the standard version of sqlite for many years, as far as I know. I have 3.11.0 and 3.13.0 in different Pythons on my machine. I have never seen an issue similar to this with any version of sqlite.
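One way to check whether the filename corruption comes from the local Python/SQLite build rather than from IPython is a small reproducer that bypasses IPython entirely: create an SQLite database under a known name in an empty temporary directory and check that exactly that name appears on disk. This is a sketch under the assumption that the fault lies in the sqlite3 module or the SQLite library; a passing run does not rule out other causes. The function name is hypothetical, and history.sqlite is used only as a familiar example name.

```python
import os
import shutil
import sqlite3
import tempfile


def sqlite_names_ok(db_name="history.sqlite"):
    """Create an SQLite DB in an empty temp dir; return (ok, names found on disk)."""
    workdir = tempfile.mkdtemp()
    try:
        path = os.path.join(workdir, db_name)
        conn = sqlite3.connect(path)
        try:
            conn.execute("CREATE TABLE t (x integer)")
            conn.execute("INSERT INTO t VALUES (1)")
            conn.commit()
        finally:
            conn.close()
        # After a clean close, only the database file itself should remain.
        names = sorted(os.listdir(workdir))
        return names == [db_name], names
    finally:
        shutil.rmtree(workdir)


if __name__ == "__main__":
    ok, names = sqlite_names_ok()
    print("OK" if ok else "unexpected files: %r" % (names,))
```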

Carreau added this to the not ipython milestone Jun 14, 2016

minrk mentioned this issue Jun 16, 2016

PySpark kernel for Jupyterhub creates file “???????e?” in working directory jupyterhub/jupyterhub#610

Closed

LMtx closed this as completed Jun 22, 2016

LMtx reopened this Jul 14, 2016

LMtx closed this as completed Mar 5, 2019
