Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

foreign-platform newline support #35434

Closed
jackjansen opened this issue Oct 31, 2001 · 13 comments
Closed

foreign-platform newline support #35434

jackjansen opened this issue Oct 31, 2001 · 13 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@jackjansen
Copy link
Member

BPO 476814
Nosy @gvanrossum, @tim-one, @loewis, @warsaw, @jackjansen
Files
  • @univnewlines-20020414: Patch version 7
  • @univnewlines-doc-20020414: Docs v2
  • test_univnewlines.py: test script v5, 14-apr-02
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/warsaw'
    closed_at = <Date 2002-05-22.20:30:16.000>
    created_at = <Date 2001-10-31.16:41:38.000>
    labels = ['interpreter-core']
    title = 'foreign-platform newline support'
    updated_at = <Date 2002-05-22.20:30:16.000>
    user = 'https://github.com/jackjansen'

    bugs.python.org fields:

    activity = <Date 2002-05-22.20:30:16.000>
    actor = 'barry'
    assignee = 'barry'
    closed = True
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2001-10-31.16:41:38.000>
    creator = 'jackjansen'
    dependencies = []
    files = ['3725', '3726', '3727']
    hgrepos = []
    issue_num = 476814
    keywords = ['patch']
    message_count = 13.0
    messages = ['38047', '38048', '38049', '38050', '38051', '38052', '38053', '38054', '38055', '38056', '38057', '38058', '38059']
    nosy_count = 5.0
    nosy_names = ['gvanrossum', 'tim.peters', 'loewis', 'barry', 'jackjansen']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue476814'
    versions = []

    @jackjansen
    Copy link
    Member Author

    This patch enables Python to interpret all known
    newline conventions,
    CR, LF or CRLF, on all platforms.

    This support is enabled by configuring with
    --with-universal-newlines
    (so by default it is off, and everything should behave
    as usual).

    With universal newline support enabled two things
    happen:

    • When importing or otherwise parsing .py files any
      newline convention
      is accepted.
    • Python code can pass a new "t" mode parameter to
      open() which
      reads files with any newline convention. "t" cannot
      be combined with
      any other mode flags like "w" or "+", for obvious
      reasons.

    File objects have a new attribute "newlines" which
    contains the type of
    newlines encountered in the file (or None when no
    newline has been seen,
    or "mixed" if there were various types of newlines).

    Also included is a test script which tests both file
    I/O and parsing.

    @jackjansen jackjansen added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Oct 31, 2001
    @jackjansen jackjansen added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Oct 31, 2001
    @gvanrossum
    Copy link
    Member

    Logged In: YES
    user_id=6380

    Tim, can you review this or pass it on to someone else who
    has time?

    Jack developed this patch after a discussion in which I was
    involved in some of the design, but I won't have time to
    look at it until December.

    @tim-one
    Copy link
    Member

    tim-one commented Nov 5, 2001

    Logged In: YES
    user_id=31435

    It would be better if get_line just called
    Py_UniversalNewlineFgets (when appropriate) instead of
    duplicating its logic inline.

    Py_UniversalNewlineFgets and Py_UniversalNewlineFread
    should deal with releasing the global lock themselves --
    the correct granularity for lock release/reacquire is
    around the C-level input routines (esp. for fread).

    The new routines never check for I/O errors! Why not? It
    seems essential.

    The new Fgets checks for EOF at the end of the loop instead
    of the top. This is surprising, and I stared a long time
    in vain trying to guess why. Setting

    newlinetypes |= NEWLINE_CR;

    immediately after seeing an '\r' would be as fast (instead
    of waiting to see EOF and then inferring the prior
    existence of '\r' indirectly from the state of the
    skipnextlf flag).

    Speaking of which <wink>, the fobj tests in the inner loop
    waste cycles. Set the local flag vrbls whether or not fobj
    is NULL. When you're *out* of the inner loop you can
    simply decline to store the new masks when fobj is NULL
    (and you're already doing the latter anyway). A test and
    branch inside the loop is much more expensive than or'ing
    in a flag bit inside the loop, ditto harder to understand.

    Floating the univ_newline test out of the loop (and
    duplicating the loop body, one way for univ_newline true
    and the other for it false) would also save a test and
    branch on every character.

    Doing fread one character at a time is very inefficient.
    Since you know you need to obtain n characters in the end,
    and that these transformations require reading at least n
    characters, you could very profitably read n characters in
    one gulp at the start, then switch to k at a time where k
    is the number of \r\n pairs seen since the last fread
    call. This is easier to code than it sounds <wink>.

    It would be fine by me if you included (and initialized)
    the new file-object fields all the time, whether or not
    universal newlines are configured. I'd rather waste a few
    bytes in a file object than see #ifdefs spread thru the
    code.

    I'll be damned if I can think of a quick way to do this
    stuff on Windows -- native Windows fgets() is still the
    only Windows handle we have on avoiding crushing thread
    overhead inside MS's C library. I'll think some more about
    it (the thrust still being to eliminate the 't' mode flag,
    as whined about <wink> on Python-Dev).

    @jackjansen
    Copy link
    Member Author

    Logged In: YES
    user_id=45365

    Here's a new version of the patch. To address your issues
    one by one:

    • get_line and Py_UniversalNewlineFgets are too difficult to
      integrate, at leat,
      I don't see how I could do it. The storage management of
      get_line gets in the way.

    • The global lock comment I don't understand. The
      Universal... routines are
      replacements for fgets() and fread(), so have nothing to do
      with the interpreter lock.

    • The logic of all three routines (get_line too) has changed
      and I've put comments in.
      I hope this addresses some of the points.

    • If universal_newline is false for a certain PyFileObject
      we now immedeately take
      a quick exit via fgets() or fread().

    There's also a new test script, that tests some more border
    cases (like lines longer
    than 100 characters, and a lone CR just before end of file).

    @tim-one
    Copy link
    Member

    tim-one commented Dec 13, 2001

    Logged In: YES
    user_id=31435

    Back to Jack -- and sorry for sitting on it so long.
    Clearly this isn't making it into 2.2 in the core. As I
    said on Python-Dev, I believe this needs a PEP: the design
    decisions are debatable, so *should* be debated outside the
    Mac community too. Note, though, that I can't stop you
    from adding it to the 2.2 Mac distribution (if you want it
    badly enough there).

    If a PEP won't be written, I suggest finding someone else
    to review it again; maybe Guido. Note that the patch needs
    doc changes too. The patch to regrtest.py doesn't belong
    here (I assume it just slipped in). There seems a lot of
    code in support of the f_newlinetypes member, and the value
    of that member isn't clear -- I can't imagine a good use
    for it (maybe it's a Mac thing?). The implementation of
    Py_UniversalNewlineFread appears incorrect to me: it reads
    n bytes *every* time around the outer loop, no matter how
    few characters are still required, and n doesn't change
    inside the loop. The business about the GIL may be due to
    the lack of docs: are, or are not, people supposed to
    release the GIL themselves around calls to these guys?
    It's not documented, and it appears your intent differed
    from my guess. Finally, it would be better to call ferror
    () after calling fread() instead of before it <wink>.

    @jackjansen
    Copy link
    Member Author

    Logged In: YES
    user_id=45365

    This version of the patch addresses the bug in Py_UniversalNewlineFread and fixes up some minor details. Tim's other issues are addressed (at least: I think they are:-) in a forthcoming PEP.

    @jackjansen
    Copy link
    Member Author

    Logged In: YES
    user_id=45365

    A new version of the patch. Main differences are that U is now the mode character to trigger universal newline input and --with-universal-newlines is default on.

    @gvanrossum
    Copy link
    Member

    Logged In: YES
    user_id=6380

    Thanks! But there's no documentation. Could I twist your arm
    for a separate doc patch?

    I'm tempted to give this a +1, but I'd like to hear from MvL
    and MAL to see if they foresee any interaction with their
    PEP-262 implemetation.

    @jackjansen
    Copy link
    Member Author

    Logged In: YES
    user_id=45365

    New doc patch, and new version of the patch that mainly allows the U to be specified (no-op) in non-univ-newline-builds.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Apr 11, 2002

    Logged In: YES
    user_id=21627

    What is the rationale for making this a compile-time option?
    It seems to complicate things, with no apparent advantage.

    If this is for backwards compatibility, don't make it an
    option: nobody will rebuild Python just to work around a
    compatibility problem.

    Apart from that, the patch looks goo.d

    @jackjansen
    Copy link
    Member Author

    Logged In: YES
    user_id=45365

    A final tweaks: return a tuple of newline values in stead of 'mixed'.

    @jackjansen
    Copy link
    Member Author

    Logged In: YES
    user_id=45365

    Barry,
    I've checked in the code fixes, but I don't dare check in the documentation fixes myself, as I have no TeX and hence no way to test them. Could you do this, please, and also check that I'm following all the relevant guidelines?
    Thanks!

    @warsaw
    Copy link
    Member

    warsaw commented May 22, 2002

    Logged In: YES
    user_id=12800

    I'll be committing the documentation changes momentarily.
    You were pretty close, and fortunately I had Dr. Doco
    sitting next to me to clean up a few nits.
    I'm also going to check in a change to the builtin `file'
    docstring.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants