Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLite history #282

Merged
merged 42 commits into from Mar 24, 2011
Merged

SQLite history #282

merged 42 commits into from Mar 24, 2011

Conversation

takluyver
Copy link
Member

This was discussed previously on the mailing list. The code will probably still need some refining, but I think it's working well enough to put it forward.

There are two major changes involved:

History is now stored on disk as an SQLite database. This replaces both the JSON history and the "shadow history" database. By default, the input is stored as each line is entered, but (as Fernando suggested), there is a configurable attribute, db_cache_size, which lets you determine how often (in number of commands) it will write to disk. Very large values will cause all input to be written when IPython quits. A separate configurable attribute, db_log_output, causes the plaintext repr of output to be written to a separate table in the file (Robert's suggestion).

Secondly, whereas history was previously treated as one long stream of commands, it is now stored by "sessions", i.e. runs of IPython. So, if you are particularly fond of the fifth command you gave IPython in one session, the next time you start IPython, you can refer to the fifth command of the last session (the syntax for magic commands is ~1/5). Sessions are consecutively numbered from when they start, so interleaved commands from two shells open at the same time will be kept separate.

This requires somewhat more complicated interfaces. There are now four main interfaces to retrieve history from HistoryManager:

  • get_history: Retrieves specified ranges of history, either from the database, or the current session.
  • get_hist_from_rangestr: Is a convenience method for magic functions, which accepts the "~1/3-6 12" style range specifications.
  • get_hist_tail: Gets the last n items from the database, ordered by session and line. Used to populate readline (and the equivalent in the Qt console).
  • get_hist_search: Searches the database using glob style matching (i.e. wildcards, not regexen).

All of these return iterators over 3-tuples, (session, lineno, item). If asked for output, item will be a 2-tuple of (input, output).

I've also given %hist the same range specification parsing as %save, %macro and %edit (it would previously only accept one or two numbers separated by a space). A single numeric argument, such as %hist 10 previously fetched the last 10 lines; now it gets the 10th line of the current session. So I've added a new option, -l (for last), which works with a single numeric argument. This is marginally less convenient if that's what you want, but I think the consistency among magic functions retrieving history is a good thing.

@minrk
Copy link
Member

minrk commented Feb 27, 2011

This looks great. I'll start exploring the code.

Thanks for the work.

@takluyver
Copy link
Member Author

One thought I've not yet implemented: A size check/trimmer operation that can run at startup. This isn't a high priority (I've got some 2700 commands stored now, taking only 152 KB), but if the file keeps growing, eventually we're going to want some way to reduce the size that's better than just deleting it.

@takluyver
Copy link
Member Author

All of the main get_hist* methods to retrieve history can now retrieve output history if it's available and requested (get_hist_search was the last one to be added). N.B. The search is always done on the input, even when output is being retrieved as well.

I've also simplified some code, so that the %history magic command has a standard interface for the three ways to get history, and added a couple of unit tests.

@minrk
Copy link
Member

minrk commented Mar 4, 2011

Just a note to help prevent dangling fixed Issues: Merging this will close #284

@fperez
Copy link
Member

fperez commented Mar 4, 2011

I'm going to start playing with this as well over the next few days, after I finish my conference madness. Many thanks for this work, Thomas!

@@ -1551,7 +1537,7 @@ Currently the magic system has the following functions:\n"""

stats = None
try:
self.shell.save_history()
#self.shell.save_history()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the reason for calling save_history and reload_history around this block (part of magic_run)? I've just commented them out for now in case it was significant. We no longer have those methods or direct equivalents, but I don't know if the code needs something else to achieve the same thing. Superficially, %run appears to work without them.

@minrk
Copy link
Member

minrk commented Mar 6, 2011

I notice that commands enter the history before history requests are processed. Is it possible to not do this?

e.g.:

  • In[-1] will always be 'In[-1]'
  • and %hist -l <n> will always return the last n-1 lines, plus '%hist -l n'

Probably not a big deal, but it would be nice for history to not include the present if it's straightforward to do so.

@takluyver
Copy link
Member Author

All commands are stored before being run at present, and I've not investigated the methods in interactiveshell which call store_inputs. I'll look into it.

(Aside: I've realised %hist -l has a corner case - if database caching is enabled, you won't see lines until they're flushed to the database.)

@takluyver
Copy link
Member Author

Hmmm, I've remembered that part of the motivation for storing commands before running them was that if IPython died during a command (e.g. if an SSH connection failed in a long-running command, or if you find a command that can crash IPython itself), the latest command could be retrieved.

I agree that it's more intuitive for In[-1] to be the previous line, rather than the current one. But I'm not sure whether that's more important than the ability to save commands instantly. I'd rather avoid working around it by turning In into an object that's almost a list but behaves differently in unexpected ways. I think I'll sleep on it, but feel free to add your thoughts in the meantime.

@takluyver
Copy link
Member Author

At the moment, I'm inclined to leave the history as is, so commands are stored as the "last" command before the history is retrieved. It seems to have been this way since at least 0.10, and I can't see a neat way to change it. Let me know if you think it's more important than that.

@minrk
Copy link
Member

minrk commented Mar 7, 2011

Fair enough. It's definitely much more important for commands to persist before execution than it is for edge cases to be neater. I just thought there might be a clean solution, but don't bother if it's not obvious. I don't imagine that users actually use the history for recent commands that often, using up-arrow or ctrl-r, etc. instead.

msg = self.session.send(self.reply_socket, 'history_reply',
def history_tail_request(self, ident, parent):
# parent['content'] should contain keys "n", "raw" and "output"
hist = self.shell.history_manager.get_hist_tail(**parent['content'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that parent['content'] will have unicode keys here, due to json unpacking. Python 2.6 doesn't seem to allow this, and gives TypeError: get_hist_tail() keywords must be strings

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's interesting - I can pass unicode arguments to a function with no problems (Python 2.6.6, Ubuntu). I'll go back to pulling them out as separate variables.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, further digging shows it was fixed in 2.6.5, I had just assumed it was 2.7. In any case, it fails on 2.6.1.

@minrk
Copy link
Member

minrk commented Mar 7, 2011

Does it make sense to store the date/time of sessions and/or commands? That might make later pruning and/or advanced searching efforts easier.

@takluyver
Copy link
Member Author

It's an interesting idea, and I can envisage slick timelines of my IPython usage. But I don't want to add further complexity without a good reason, and I'm not sure how useful it would really be (I don't think I've ever remembered a command I used by date; YMMV). We've got this far by simply recording a big list of commands. In fact, I doubt even the session interface will see a huge amount of use. It really grew out of a desire to do something sensible with history if two IPython shells are running at once (in the current setup, whichever one gets the last save_history will wipe away commands from the other).

For pruning, I envisage something like "when history db exceeds 500 MB, trim the first 20% of sessions". (At present, that would take something like 9 million commands before needing pruning!)

@minrk
Copy link
Member

minrk commented Mar 7, 2011

That makes sense. Are the tables organized in such a way that it would be easy to attach a single date (or two: for open/close) to each session? I think that has more value-per-data than a stamp for each command. Though if you want a close-time, that won't be easy to handle IPython crashing unless you are timestamping each command, which gives you a deducible open/close times from the first and last commands.

I can certainly imagine that one might want to load up the session for last Tuesday, but not want to sift through integer session ids to find the right one.

I'm just throwing out ideas to consider, you don't have to take them. I know the model for this is to have the same functionality on a database, which has obvious wins, but when you have a potentially large, efficiently searchable history, I do tend to think of fancier ways to use it. But maybe this stuff really belongs in the notebook interface.

@takluyver
Copy link
Member Author

I quite like the idea of loading up last Tuesday's session, yes. Especially with something like the qtconsole, it would be simple to have a little table view of sessions (perhaps with optional names/remarks).

At present, sessions are just a number in the session column for individual inputs/outputs. But it would be easy enough to add an extra table for sessions. Time would be automatically written when the session started, a remark could be added by the user, and the closing time and number of commands would be saved on a normal exit (but code shouldn't expect them to be there in every case).

@takluyver
Copy link
Member Author

Have a look at this. If it has trouble, you might need to delete $IPYTHON_DIR/history.sqlite, as the database is slightly different.

@fperez
Copy link
Member

fperez commented Mar 12, 2011

Hey Thomas, the branch currently doesn't merge onto trunk without conflicts:

(master)amirbar[ipython]> git checkout -b takluyver-sqlite-history master
Switched to a new branch 'takluyver-sqlite-history'
(takluyver-sqlite-history)amirbar[ipython]> git pull https://github.com/takluyver/ipython.git sqlite-history
From https://github.com/takluyver/ipython
 * branch            sqlite-history -> FETCH_HEAD
Auto-merging IPython/core/displayhook.py
Auto-merging IPython/core/history.py
CONFLICT (content): Merge conflict in IPython/core/history.py
Auto-merging IPython/core/interactiveshell.py
Auto-merging IPython/core/magic.py
Auto-merging IPython/core/tests/test_magic.py
Auto-merging IPython/frontend/qt/console/ipython_widget.py
Auto-merging IPython/frontend/terminal/interactiveshell.py
Auto-merging IPython/zmq/ipkernel.py
Auto-merging IPython/zmq/kernelmanager.py
Automatic merge failed; fix conflicts and then commit the result.

Any idea why? Might be worth fixing that so it applies cleanly before further review...

@takluyver
Copy link
Member Author

Nothing immediately springs to mind, but I'll remember to rebase it once I've sorted out the macro question.

@fperez
Copy link
Member

fperez commented Mar 13, 2011

Ok, thanks!

@takluyver
Copy link
Member Author

Rebased on trunk, and merged in some code for expanding macros in prefilter (work in progress).

@fperez
Copy link
Member

fperez commented Mar 13, 2011

Hey, you may want to rebase again quickly, just so you get a proper passing test suite. Trunk had inadvertently gone bad with a test suite that hangs because of a %cpaste example, I found it with bisection:

((a2d9aa2...)|BISECTING)amirbar[ipython]> git bisect bad
a2d9aa20c0f1cf170cc16a7e8eaa9289f9af1e29 is the first bad commit
commit a2d9aa20c0f1cf170cc16a7e8eaa9289f9af1e29
Author: Sathesh Chandra 
Date:   Tue Feb 15 19:27:38 2011 +0000

    Add example for %cpaste

I just pushed a small fix so you can rebase. That way we can evaluate this on top of a properly running test suite. This is complex enough code that we shouldn't review it without carefully testing it.

@fperez
Copy link
Member

fperez commented Mar 13, 2011

BTW, I'm not sure the timestamping is such a bad idea. Matlab does show old history timestamped; it's a small amount of data and it could come in handy in some scenarios...

Now that we have this rich database, we might as well store all the relevant information so that people can do interesting things with it, no? You could for example search for commands you typed on a given day while working on something, just like you search gmail with date controls...

I can easily see once this information is in and we're happy with the database, rich frontends like the qt/web ones building a little history navigator with all kinds of doodads :)

@takluyver
Copy link
Member Author

Rerebased.

By timestamping, do you mean timestamping each individual command? I've added timestamps now for the beginning and end of the session, as Min suggested. It's easy enough to add a timestamp per input, if we want that.

…anded and executed (+ unit test). History is now stored after all input transformations have been done.
@takluyver
Copy link
Member Author

OK, macros are now working. History is only stored after all the input transformations have been done. I've folded run_one_block and run_single_line into run_cell, so we're down to just three run_* methods (cell, source, code), which follow a simple hierarchy.

This will make a small practical difference - previously, a single line block at the end of a multi-block cell would be dynamically transformed (macros, automagic, and so on). Now, only simple single line cells get that treatment. The last part of a multi-block cell can still get compiled in "single" mode, so it produces output, but the dynamic transforms won't happen.

@takluyver
Copy link
Member Author

I've reworked the magic commands as I suggested on the mailing list, so %rep (=%recall) brings things to the next input prompt, and %rerun executes code.

…raw IPython code outside of the sequence of commands making the session. This also doesn't incremement the execution_count.
@fperez
Copy link
Member

fperez commented Mar 24, 2011

Completed code review during sage days 29 sprint. Upon final edits arising from code review, merge away.

Excellent work across the board, many thanks!

@fperez
Copy link
Member

fperez commented Mar 24, 2011

No description provided.

@takluyver
Copy link
Member Author

Dealt with the comments from today. It will move a corrupt database aside on startup, and if there's a collision on the session/line numbers, it will get a new session number from the database (and not crash).

Fernando, do you want to give this a final review, or shall I go ahead and merge it straight away?

@fperez
Copy link
Member

fperez commented Mar 24, 2011

Go for it, I think we did a very thorough review today. And there's always room to improve things later, so we don't need to hold up for perfection at each step :)

Thanks!

…licate session/line numbers are handled correctly (don't crash, get new session).
@takluyver takluyver merged commit 28fa7ad into ipython:master Mar 24, 2011
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants