improvements for linecache #46049

Closed
umaxx mannequin opened this issue Dec 30, 2007 · 7 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@umaxx
Mannequin

umaxx mannequin commented Dec 30, 2007

BPO 1708
Nosy @pitrou, @bitdancer
Files
  • linecache.py.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2010-10-12.15:16:11.722>
    created_at = <Date 2007-12-30.14:18:38.079>
    labels = ['type-bug', 'library']
    title = 'improvements for linecache'
    updated_at = <Date 2010-10-12.15:16:11.651>
    user = 'https://bugs.python.org/umaxx'

    bugs.python.org fields:

    activity = <Date 2010-10-12.15:16:11.651>
    actor = 'r.david.murray'
    assignee = 'none'
    closed = True
    closed_date = <Date 2010-10-12.15:16:11.722>
    closer = 'r.david.murray'
    components = ['Library (Lib)']
    creation = <Date 2007-12-30.14:18:38.079>
    creator = 'umaxx'
    dependencies = []
    files = ['9036']
    hgrepos = []
    issue_num = 1708
    keywords = ['patch']
    message_count = 7.0
    messages = ['59041', '59106', '79284', '79288', '116823', '118174', '118428']
    nosy_count = 3.0
    nosy_names = ['pitrou', 'umaxx', 'r.david.murray']
    pr_nums = []
    priority = 'low'
    resolution = 'later'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue1708'
    versions = ['Python 3.0']

    @umaxx
    Mannequin Author

    umaxx mannequin commented Dec 30, 2007

    here comes a simple patch for the linecache core module, which does the
    following:

    • remove double comment
    • instead of adding all lines with readlines() to the cache, just add
      seek points for every line
    • return lines from cached seek points instead of directly from the dict cache
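
The seek-point approach listed above can be sketched roughly as follows. This is a minimal illustration, not the attached patch: the helper names `build_seek_points` and `get_line` are hypothetical, the file is read in binary mode so that offsets are real byte positions, and UTF-8 content is assumed.

```python
import os
import tempfile


def build_seek_points(path):
    """Return the byte offset of the start of every line (hypothetical helper)."""
    offsets = []
    position = 0
    # Binary mode: no newline translation, so len(line) is a true byte count.
    with open(path, "rb") as fp:
        for line in fp:
            offsets.append(position)
            position += len(line)
    return offsets


def get_line(path, offsets, lineno):
    """Return line `lineno` (1-based), or '' if it is out of range."""
    if not 1 <= lineno <= len(offsets):
        return ""
    # One extra open() per lookup: the cost named under "disadvantages".
    with open(path, "rb") as fp:
        fp.seek(offsets[lineno - 1])
        return fp.readline().decode("utf-8")  # assumption: UTF-8 content


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"first\nsecond\nthird\n")
    points = build_seek_points(tmp.name)
    print(points)
    print(repr(get_line(tmp.name, points, 2)))
    os.remove(tmp.name)
```

Only one integer per line is kept in memory, which is why this kind of cache scales to very large files, at the price of reopening the file on each lookup.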

    advantages of this patch:

    • reading lines from very big files (>1GB) is no problem anymore
    • linecache can handle a large number of large files now
    • updatecache() is faster now because "for line in fp:" is faster than
      readlines()

    disadvantages:

    • reading a single line from cache will be a little slower than
      before because of the extra open() call on the file

    summary:

    • this diff presents a different caching approach which is able to
      handle a lot of large files too

    __future__-work:

    • the code is ugly and unstructured, someone needs to beautify it
    • an extra function: get_list_of_lines_from_list_of_linenumbers() would
      be nice to have
    • test cases for cache consistency would be nice to have
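
As a rough idea of what the wished-for `get_list_of_lines_from_list_of_linenumbers()` could look like, here is a sketch built on the existing `linecache.getline()`. The signature is an assumption; the issue only names the function and does not define it.

```python
import linecache


def get_list_of_lines_from_list_of_linenumbers(filename, linenumbers):
    """Return the requested lines in the order asked for (assumed signature).

    linecache.getline() returns '' for line numbers that do not exist,
    so out-of-range requests yield empty strings instead of raising.
    """
    return [linecache.getline(filename, lineno) for lineno in linenumbers]
```

Note that this version still goes through the regular linecache, so the whole file is read into memory on first access.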

    @umaxx umaxx mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Dec 30, 2007
    @gvanrossum
    Member

    I'll look at this when I have time. If you find someone else interested
    in reviewing, please give them the patch!

    @gvanrossum gvanrossum self-assigned this Jan 2, 2008
    @gvanrossum gvanrossum removed their assignment Jan 6, 2009
    @pitrou
    Member

    pitrou commented Jan 6, 2009

    Looking at the patch, the recorded seek points will probably be wrong if
    some newlines were translated (e.g. '\r\n' -> '\n') when reading the file.

    I'm also not sure what the use case for very big files is. linecache
    is primarily used for printing tracebacks, the API isn't really
    general-purpose.
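
The newline-translation concern can be illustrated with a small experiment. It assumes, for the sake of the example, that seek points are computed by summing the lengths of lines read in text mode; the attached patch is not reproduced here and may compute them differently. Both helper names are hypothetical.

```python
import os
import tempfile


def offsets_from_text_lengths(path):
    """Naive seek points: add up the lengths of lines read in text mode."""
    offsets, position = [], 0
    # Default text mode uses universal newlines, so '\r\n' arrives as '\n'.
    with open(path, "r", encoding="ascii") as fp:
        for line in fp:
            offsets.append(position)
            position += len(line)
    return offsets


def offsets_from_bytes(path):
    """True byte offsets, taken from the untranslated binary stream."""
    offsets, position = [], 0
    with open(path, "rb") as fp:
        for line in fp:
            offsets.append(position)
            position += len(line)
    return offsets


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"a\r\nbb\r\nccc\r\n")
    print(offsets_from_text_lengths(tmp.name))  # [0, 2, 5]
    print(offsets_from_bytes(tmp.name))         # [0, 3, 7]
    os.remove(tmp.name)
```

Each translated line is one character shorter than it is on disk, so the naive offsets drift further from the real positions with every line.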

    @umaxx
    Mannequin Author

    umaxx mannequin commented Jan 6, 2009

    > Looking at the patch, the recorded seek points will probably be wrong if
    > some newlines were translated (e.g. '\r\n' -> '\n') when reading the file.

    ack, this could be a problem.

    > I'm also not sure what the use case for very big files is.

    this is easy to answer: i used it, for example, for parsing big (still
    growing) log files from mail servers: parsing the whole file the first
    time, and then later starting from line xyz+1 (xyz being the last line
    recorded after the first pass) *without* parsing the whole file
    again. this is especially useful for growing log files >1GB

    just try to get line number 1234567 from a 2.3GB log file with the
    current linecache implementation :)
    the main idea behind the patch is to cache the seek points to save a lot
    of time on big files.
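
The growing-log use case described above can be sketched without linecache at all, by remembering the byte offset reached on the previous pass. This is only an illustration under stated assumptions: `read_new_lines` is a hypothetical helper, the log is assumed to be append-only, and undecodable bytes are replaced rather than rejected.

```python
def read_new_lines(path, start_offset=0):
    """Return (lines, new_offset) for complete lines found after start_offset.

    Assumes the file only ever grows by appending; a rotated or truncated
    log would need extra handling that is not shown here.
    """
    lines = []
    offset = start_offset
    with open(path, "rb") as fp:
        fp.seek(start_offset)
        for raw in fp:
            if not raw.endswith(b"\n"):
                # Partial last line still being written: leave it for next time.
                break
            lines.append(raw.decode("utf-8", errors="replace"))
            offset += len(raw)
    return lines, offset
```

A caller would store the returned offset between runs and pass it back in, so each pass only touches the bytes appended since the previous one.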

    > linecache is primarily used for printing tracebacks, the API
    > isn't really general-purpose.

    i know :)

    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Sep 18, 2010

    @umaxx are you interested in taking this forward?

    @umaxx
    Mannequin Author

    umaxx mannequin commented Oct 8, 2010

    @breamoreboy: what do you mean by taking this forward?

    The patch is there. It has been there for three years now, and no one else seems to be interested.

    I personally do not have any interest in this anymore, as I have not used Python for this kind of work for a long time, so I do not care whether Python's linecache is improved or not.

    IMHO, things like a slow linecache could be a reason for people to switch to languages with faster string operations and caches, like Perl.

    If you like, just close the bug report or commit the patch or whatever - I do not care anymore.

    @bitdancer
    Member

    I am indeed going to close this. The patch isn't complete, since there's the line ending issue Antoine pointed out, which implies that there are also some missing tests.

    I doubt that linecache performance is something that affects very many people, but if someday someone wants to pick this up and finish it, it sounds like there's no objection in principle to the change.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022