Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

readline.set_history_length corrupts history files when used in a libedit build #121160

Open
whitequark opened this issue Jun 29, 2024 · 9 comments
Labels
topic-repl Related to the interactive shell type-bug An unexpected behavior, bug, or error

Comments

@whitequark
Copy link

whitequark commented Jun 29, 2024

Bug report

Bug description:

Since the libedit transition (correction: If an end user switches from a GNU readline build of Python to an editline build of Python), several important functions related to history files are broken.

  1. It is inconvenient enough that history files written by GNU readline based readline module are unreadable and result in a rather confusing OSError with exc.errno == errno.EINVAL (without a corresponding operating system routine that returns such an error), but a workaround intercepting this OSError can be written that converts the old format to the new one. This has been covered before, e.g. in The New REPL Does Not Load My Command History #120766.
  2. More importantly, history files written by libedit based readline module are unreadable by itself when set_history_length is used, as demonstrated by the code below.
  3. While a workaround for that is also possible (also below), as far as I can tell, until 3.13, so until a year from now (?) (whenever Add readline.backend for the backend readline uses #112510 becomes available in a release), your only option is "libedit" in readline.__doc__, which I expect will be what many people will leave in the codebase even once readline.backend becomes available.

Reproducer:

import readline

readline.add_history("foo bar")
readline.add_history("nope nope")
readline.write_history_file("history-works")

# this works:
readline.read_history_file("history-works")

readline.set_history_length(2)
readline.write_history_file("history-breaks")

# this breaks:
readline.read_history_file("history-breaks")

Workaround:

def _is_using_libedit():
    if hasattr(readline, "backend"):
        return readline.backend == "editline"
    else:
        return "libedit" in readline.__doc__

readline.set_history_length(1000)
if _is_using_libedit():
    readline.replace_history_item(
        max(0, readline.get_current_history_length() - readline.get_history_length()),
        "_HiStOrY_V2_")

CPython versions tested on:

3.12

Operating systems tested on:

Linux

Linked PRs

@whitequark whitequark added the type-bug An unexpected behavior, bug, or error label Jun 29, 2024
@whitequark
Copy link
Author

whitequark commented Jun 29, 2024

For posterity, here is the workaround I came up with for the incompatible format of the history files:

import re
import os
import errno
import logging
import readline

logger = logging.getLogger(__loader__.name)
history_filename = os.path.expanduser("~/.history")

def _is_using_libedit():
    if hasattr(readline, "backend"):
        return readline.backend == "editline"
    else:
        return "libedit" in readline.__doc__

# to load:
try:
    readline.read_history_file(history_filename)
except OSError as exc:
    if exc.errno == errno.EINVAL: # (screaming internally)
        assert _is_using_libedit()
        with open(history_filename, "r") as f:
            history = f.readlines()
        assert history[:1] != ["_HiStOrY_V2_"], \
            "History file has already been converted"
        assert not history or any(" " in line for line in history), \
            "Pre-conversion history file is expected to contain space characters"
        backup_filename = f"{history_filename}~"
        if not os.path.exists(backup_filename):
            with open(backup_filename, "w") as f:
                f.writelines(history)
        else:
            logger.warning(f"history backup {backup_filename} exists, leaving it intact")
        new_filename = f"{history_filename}.new"
        with open(f"{history_filename}.new", "w") as f:
            f.write("_HiStOrY_V2_\n")
            f.writelines([
                re.sub(r"[ \\]", lambda m: f"\\{ord(m[0]):03o}", line)
                for line in history
            ])
        os.rename(new_filename, history_filename)
        logger.warning(f"this Python distribution uses libedit instead of GNU readline, "
                        f"and their history file formats are not compatible")
        logger.warning(f"REPL history file has been converted from the GNU readline format "
                        f"to the libedit format; backup saved to {backup_filename}")
        readline.read_history_file(history_filename)

# to save:
readline.set_history_length(1000)
if _is_using_libedit():
    readline.replace_history_item(
        max(0, readline.get_current_history_length() - readline.get_history_length()),
        "_HiStOrY_V2_")
    readline.write_history_file(history_filename)

In my view, the readline module should not be presenting the same interface for GNU readline and libedit unless it takes measures such as above to convert persisted data. The workaround is very complex and despite my best efforts probably still fragile in some edge cases, which is why I hope that it won't live on by being copied into every codebase that attempts to preserve persisted history.

@encukou
Copy link
Member

encukou commented Jul 1, 2024

Hello @whitequark! Thanks for reporting this.

I'm not sure I know what you mean by “libedit transition”? CPython supports building with libedit, and the support was improved gradually over the years, but I'm not aware of a “transition”.

(1.) This is a tough one :(

If we go for transparently converting history files, we also need a way to get back, or at least ensure that libedit's _HiStOrY_V2_ format is compatible with GNU readline (and always will be).

The workaround looks like a pain to maintain in the long term -- poking too deep into internals. I couldn't find much documentation on the _HiStOrY_V2_ format. Reading EINVAL as “old file format” looks pretty error-prone. The _HiStOrY_V2_ token looks like something that could change in the future -- this code could, for example, start converting a v3 file back to v2 in a few years.

IMO (based on my current understanding), I think this kind of transition code should be in libedit. If it's not there as a function we can call, CPython shouldn't try to do more than report the error it gets from libedit.

(2.)

Hmm. Perhaps history_truncate_file is incompatible with the _HiStOrY_V2_ format? AFAIK libedit provides its own version; maybe it's not called? Or not called correctly?

(3.)

As you say, I'm afraid _is_using_libedit is here to stay. Sorry for the old mistake.
Backporting it to 3.12 would speed up the transition to readline.backend by as single year. In the long run, that's not much.
(Not to mention we could only backport it to future 3.12 releases; having a feature that works in 3.12.5 but not 3.12.4 is its own kind of pain.)

@encukou
Copy link
Member

encukou commented Jul 1, 2024

I'm not sure I know what you mean by “libedit transition”?

Oh, I see in the other issue:

It seems that several popular builds of Python (from what I can tell, at least Homebrew and python-build-standalone) have switched to libedit and projects using the builtin readline module functionality

Yeah, that sounds like a bad move :/
Perhaps CPython docs should spend more words saying that the two backends aren't really compatible.

@whitequark
Copy link
Author

whitequark commented Jul 1, 2024

I'm not sure I know what you mean by “libedit transition”?

So what happened is that I ran into this issue headfirst with PDM's cpython builds (which are from python-build-standalone project, where since 3.12 it uses libedit for reasons unknown to me), it had disrupted my workflow in the middle of something else, and I attempted to research something I had absolutely no context for. I found some issues where Homebrew users hit the same issue, with roughly the same version range, where also the Python build was moved to libedit.

From this, the knowledge that people sometimes avoid GNU packages due to their licensing, and the fact that it seemed unthinkable to me (after noticing how incompatible the implementations are) that someone would be willing to disrupt people's workflows this much without a good reason I had concluded that there is a concerted effort behind, and the effort is to move to libedit while doing away with GNU readline entirely.

As it's clear to me now, it is wrong, and this is just an issue that fell through the cracks that happened to be very visibly disruptive to end users, which got triggered by a few seemingly independent packager decisions. Sometimes it really is just a coincidence!

I'll edit my original post.

@whitequark
Copy link
Author

whitequark commented Jul 1, 2024

(1.) This is a tough one :(

If we go for transparently converting history files, we also need a way to get back, or at least ensure that libedit's _HiStOrY_V2_ format is compatible with GNU readline (and always will be).

I could definitely implement a backward transformation, and will do it the moment I encounter the reverse issue (which is probably soon, unfortunately).

The workaround looks like a pain to maintain in the long term -- poking too deep into internals. I couldn't find much documentation on the _HiStOrY_V2_ format. Reading EINVAL as “old file format” looks pretty error-prone. The _HiStOrY_V2_ token looks like something that could change in the future -- this code could, for example, start converting a v3 file back to v2 in a few years.

I completely agree. (But couldn't the same be said about the dual readline/editline build in first place?)

In any case, it seems to me to boil down to a question of who spends the effort to work around it, and it seems more useful to coordinate 'more upstream' in the CPython codebase.

IMO (based on my current understanding), I think this kind of transition code should be in libedit. If it's not there as a function we can call, CPython shouldn't try to do more than report the error it gets from libedit.

I don't disagree but isn't libedit an essentially opaque package from some BSD distribution where I would have to chase a patch into it for who-knows-how-long, and then spend potentially years waiting for people to update their local copies of it that they build CPython with?

(2.)

Hmm. Perhaps history_truncate_file is incompatible with the _HiStOrY_V2_ format? AFAIK libedit provides its own version; maybe it's not called? Or not called correctly?

I was initially under impression that history_truncate_file was a function in CPython since it wasn't prefixed with rl_ or readline_ or something like that, but now that I see it's not (I guess it's too ancient for consistent namespacing) I'm really confused. Maybe it's a bug in libedit after all?..

(3.)

As you say, I'm afraid _is_using_libedit is here to stay. Sorry for the old mistake. Backporting it to 3.12 would speed up the transition to readline.backend by as single year. In the long run, that's not much. (Not to mention we could only backport it to future 3.12 releases; having a feature that works in 3.12.5 but not 3.12.4 is its own kind of pain.)

Yeah, such things happen. I acknowledge both of these things. I have to work around already around some issues that only affect 3.10.0, 3.10.1, 3.10.2 so I know the pain...

encukou added a commit to encukou/cpython that referenced this issue Jul 3, 2024
encukou added a commit to encukou/cpython that referenced this issue Jul 3, 2024
… formats.

This is not something we can do too much about, without help from the
underlying libraries.
@encukou
Copy link
Member

encukou commented Jul 3, 2024

We're missing a test for set_history_length; I'll add it.
If the test passes CPython CI and is merged, it should also warn any redistributors if the configuration isn't working. Especially for ones who patch Python, it's probably the best course of action.

I also sent a PR for the docs.
I'm not sure what kind of support we can give here, but I'm afraid transparently converting undocumented file formats isn't it.


Isn't libedit an essentially opaque package from some BSD distribution where I would have to chase a patch into it for who-knows-how-long, and then spend potentially years waiting for people to update their local copies of it that they build CPython with?

That's ... a reason for you (or your redistributor) to avoid that library, or patch it, rather than fix its bugs elsewhere. (But note that I'm not certain if the bug is there; it could be in CPython or in build settings.)

I was initially under impression that history_truncate_file was a function in CPython

Yeah, it seems GNU history is a separate library. That itself might cause some of the trouble.

@whitequark
Copy link
Author

If the test passes CPython CI and is merged, it should also warn any redistributors if the configuration isn't working. Especially for ones who patch Python, it's probably the best course of action.

Oh, that's a good call. I think that would be enough from my perspective.

@whitequark
Copy link
Author

I also sent a PR for the docs.

Thanks for the two PRs!

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 16, 2024
…onGH-121326)

(cherry picked from commit 263c7e6)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 16, 2024
…onGH-121326)

(cherry picked from commit 263c7e6)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
encukou added a commit that referenced this issue Jul 16, 2024
…121326) (GH-121856)


(cherry picked from commit 263c7e6)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
encukou added a commit that referenced this issue Jul 16, 2024
…121326) (GH-121857)

(cherry picked from commit 263c7e6)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
@encukou
Copy link
Member

encukou commented Jul 18, 2024

Reported to python-build-standalone: indygreg/python-build-standalone#281

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-repl Related to the interactive shell type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

3 participants