History database locking fail #2845

Open
vbraun opened this Issue Jan 25, 2013 · 18 comments

Projects

None yet

4 participants

@vbraun
Contributor
vbraun commented Jan 25, 2013

We are trying to use IPython-0.13.1 in Sage but get doctesting failures with multiple concurrent processes. We updated history.py to commit a7cc583 but we still have trouble.

Crash log is here:

https://gist.github.com/4634931

The problem is apparently that IPython startup still dies if the history database happens to be locked. Ideally, IPython would catch that and revert to in-memory history only (:memory:) with the option of merging it later to the disk database.

@takluyver
Member

If the database is locked, I think sqlite should simply wait until it is unlocked, and then let the program continue. So it should only throw an error if it reaches the timeout (default 5 seconds) while waiting for the lock. (Reference).

One way that the database can stay locked for a long time is if you call a history search method (search, get_tail, get_range), and don't exhaust the iterator. That holds a read cursor open on the database (SQLite allows simultaneous reads, but not writes).

If this is for a test framework, you might want to set HistoryManager.hist_file = ':memory:' in configuration. That gives you a separate database for each process. It might decrease the test time as well.

@vbraun
Contributor
vbraun commented Jan 25, 2013

Ah yes, we just implemented --HistoryManager.hist_file=:memory:. Great minds think alike ;-)

But it would still be nice to have a bomb-proof startup sequence even if a parallel process does something stupid with the history db.

@takluyver
Member

I agree. We can probably just print a warning and disable the SQLite history.

@GaelVaroquaux
Contributor

I am hitting the same problem: IPython is no longer starting with a similar traceback. My home folder is not on an NFS drive (and besides, I believe that it would be a good thing is IPython worked reliably on NFS drives).

This bug is somewhat of a problem: it puts the user in a situation in which IPython stops working on a given box.

A simple solution, that would not be fully solving the problem, but making the situation less bad, would be to give in the error message what database file is lock, and how to remove it. I don't care that much about loosing history, but I want my IPython back :)

@takluyver
Member

This bug is about starting many IPython instances, and getting a timeout as they all try to access the history. That should never affect normal users starting IPython manually. @GaelVaroquaux , it sounds more like you're talking about a corrupt history database. We should already be handling that case - see #2389 for instance. Is there some error from corruption that we're not catching yet?

@takluyver
Member

See also #2431, which is another failure on startup.

@GaelVaroquaux
Contributor

@GaelVaroquaux , it sounds more like you're talking about a corrupt history database

Indeed. Sorry for pitching in on the wrong issue. The traceback looked
very similar.

Is there some error from corruption that we're not catching yet?

I upgraded to master, and still had the error. I solved it by removing
the .ipython/profile_default/.sqlite (which I initialy got wrong, and
tried .ipython/db).

Do you want me to open a new issue with the traceback (I still have it in
a terminal)?

@GaelVaroquaux
Contributor

See also #2431, which is another failure on startup.

No, that's not the one :). My traceback ends with:

  File "/usr/lib/python2.7/dist-packages/IPython/core/history.py", line
427, in new_session
    NULL, "") """, (datetime.datetime.now(),))
OperationalError: database is locked
@takluyver
Member

OK, feel free to start a new issue for it. If you've got a copy of the history.sqlite file that caused it, that would be useful too.

@GaelVaroquaux
Contributor

OK, feel free to start a new issue for it.

Ooops. Looking at my traceback, I just realized that I was using the
system IPython (version 0.13.2), not my git clone. My statement about the
problem persisting on latest IPython was wrong. Do you still want me to
open a new issue?

If you've got a copy of the history.sqlite file that caused it, that would be useful too.

I am sorry, I removed it.

@takluyver
Member

Probably not worth the bother unless we have a way to replicate it in master. There won't be any more 0.13.x releases. But paste a copy of the full traceback somewhere in case we come back to it later.

@GaelVaroquaux
Contributor

Here is the traceback! Sorry for the noise.

$ ipython
Traceback (most recent call last):
  File "/usr/bin/ipython", line 7, in 
    launch_new_instance()
  File
"/usr/lib/python2.7/dist-packages/IPython/frontend/terminal/ipapp.py",
line 389, in launch_new_instance
    app.initialize()
  File "", line 2, in initialize
  File "/usr/lib/python2.7/dist-packages/IPython/config/application.py",
line 84, in catch_config_error
    return method(app, *args, **kwargs)
  File
"/usr/lib/python2.7/dist-packages/IPython/frontend/terminal/ipapp.py",
line 324, in initialize
    self.init_shell()
  File
"/usr/lib/python2.7/dist-packages/IPython/frontend/terminal/ipapp.py",
line 340, in init_shell
    ipython_dir=self.ipython_dir)
  File "/usr/lib/python2.7/dist-packages/IPython/config/configurable.py",
line 318, in instance
    inst = cls(*args, **kwargs)
  File
"/usr/lib/python2.7/dist-packages/IPython/frontend/terminal/interactiveshell.py",
line 360, in __init__
    user_module=user_module, custom_exceptions=custom_exceptions
  File
"/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line
436, in __init__
    self.init_history()
  File
"/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line
1489, in init_history
    self.history_manager = HistoryManager(shell=self, config=self.config)
  File "/usr/lib/python2.7/dist-packages/IPython/core/history.py", line
409, in __init__
    self.new_session()
  File "", line 2, in new_session
  File "/usr/lib/python2.7/dist-packages/IPython/core/history.py", line
60, in needs_sqlite
    return f(*a,**kw)
  File "/usr/lib/python2.7/dist-packages/IPython/core/history.py", line
427, in new_session
    NULL, "") """, (datetime.datetime.now(),))
OperationalError: database is locked
If you suspect this is an IPython bug, please report it at:
    https://github.com/ipython/ipython/issues
or send an email to the mailing list at ipython-dev@scipy.org
You can print a more detailed traceback right now with "%tb", or use
"%debug"
to interactively debug it.
Extra-detailed tracebacks for bug-reporting purposes can be enabled via:
    c.Application.verbose_crash=True
@vbraun
Contributor
vbraun commented May 17, 2014

We removed our own patch in Sage with the update to IPython 1.2.1 and there are still random test failures:

File "src/sage/misc/displayhook.py", line 23, in sage.misc.displayhook
Failed example:
    shell = get_test_shell()
Expected nothing
Got:
    ERROR! History file wasn't a valid SQLite database. It was moved to /home/buildslave-sage/slave/sage_git/dot_sage/ipython-1.2.1/profile_default/history-corrupt.sqlite and a new file created.
    ERROR! History file wasn't a valid SQLite database. It was moved to /home/buildslave-sage/slave/sage_git/dot_sage/ipython-1.2.1/profile_default/history-corrupt.sqlite and a new file created.

Browsing history.py a bit I don't see any reasonable locking (also in IPython 2.0). There is threading.lock, but that of course doesn't work across processes. Locking should either be on the DB level, Unix flock, or use atomic filesystem changes. Or accept that you don't have a reasonable locking and ignore the inevitable errors.

@takluyver
Member

Sqlite implements locking itself, and I don't think we should try to add another locking later on top of that. The locking is known not to work reliably on NFS, though - is that relevant for sage?

@vbraun
Contributor
vbraun commented May 17, 2014

The buildbot in question doesn't use NFS, it runs off a local harddisk (btrfs).

I don't see any pragmas or locking_modes set, so I guess sqlite3 operates in the default mode. Afaik you have to handle locked databases yourself and the DatabaseError caught might very well have been an OperationalError (better logging of the error would be nice, too).

@takluyver
Member

As I understand it, it should only be locked briefly, and other processes trying to write simultaneously should just wait until the lock is released to carry on. Maybe there's an issue on btrfs with the locking sqlite tries to do.

@vbraun
Contributor
vbraun commented May 18, 2014

IMHO this is highly unlikely to be a btrfs bug, sqllite is a common test/benchmark. More likely we are hitting the 5 second timeout in sqllite's locking, running doctests in parallel at low priority. Still, randomly injecting error messages into the output just because the system load is high is bad in my book.

@vbraun
Contributor
vbraun commented Nov 17, 2014

For the record, this still happens with IPython 2.3 occasionally, for example on our MacOSX 10.10 buildbot:

Traceback (most recent call last):
  File "/Users/buildslave-sage/slave/sage_git/build/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 2151, in __call__
    atexit._run_exitfuncs()
  File "/Users/buildslave-sage/slave/sage_git/build/local/lib/python/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/Users/buildslave-sage/slave/sage_git/build/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 3190, in atexit_operations
    self.history_manager.end_session()
  File "/Users/buildslave-sage/slave/sage_git/build/local/lib/python2.7/site-packages/IPython/core/history.py", line 525, in end_session
    len(self.input_hist_parsed)-1, self.session_number))
OperationalError: disk I/O error
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment