source files using encoding ./. universal newlines #38984

Closed
atsuoishimoto mannequin opened this issue Jul 31, 2003 · 19 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@atsuoishimoto
Mannequin

atsuoishimoto mannequin commented Jul 31, 2003

BPO 780730
Nosy @malemburg, @loewis, @doerwalter, @jackjansen, @birkenfeld, @atsuoishimoto
Files
  • test.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/malemburg'
    closed_at = <Date 2006-02-21.09:45:42.000>
    created_at = <Date 2003-07-31.09:16:56.000>
    labels = ['interpreter-core']
    title = 'source files using encoding ./. universal newlines'
    updated_at = <Date 2006-02-21.09:45:42.000>
    user = 'https://github.com/atsuoishimoto'

    bugs.python.org fields:

    activity = <Date 2006-02-21.09:45:42.000>
    actor = 'lemburg'
    assignee = 'lemburg'
    closed = True
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2003-07-31.09:16:56.000>
    creator = 'ishimoto'
    dependencies = []
    files = ['990']
    hgrepos = []
    issue_num = 780730
    keywords = []
    message_count = 19.0
    messages = ['17521', '17522', '17523', '17524', '17525', '17526', '17527', '17528', '17529', '17530', '17531', '17532', '17533', '17534', '17535', '17536', '17537', '17538', '17539']
    nosy_count = 8.0
    nosy_names = ['lemburg', 'loewis', 'doerwalter', 'nnorwitz', 'jackjansen', 'georg.brandl', 'ishimoto', 'johnjsmith']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue780730'
    versions = ['Python 2.5']

    @atsuoishimoto
    Mannequin Author

    atsuoishimoto mannequin commented Jul 31, 2003

    Universal Newline Support doesn't work for source files
    that contain an encoding definition.

    @atsuoishimoto atsuoishimoto mannequin closed this as completed Jul 31, 2003
    @atsuoishimoto atsuoishimoto mannequin assigned malemburg Jul 31, 2003
    @atsuoishimoto atsuoishimoto mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Jul 31, 2003
    @loewis
    Mannequin

    loewis mannequin commented Aug 1, 2003

    Logged In: YES
    user_id=21627

    It's not clear to me what the problem is: Source code files
    and universal newline support have nothing to do with each
    other.

    Can you attach a small example that demonstrates the problem?

    @atsuoishimoto
    Mannequin Author

    atsuoishimoto mannequin commented Aug 1, 2003

    Logged In: YES
    user_id=463672

    The attached file has an encoding definition and its newline is '\r'.
    Executing this script on Windows 2000, I got the following error,
    but it runs fine if I remove the encoding definition.

    C:\Python23>python .\test.py
    File ".\test.py", line 2
    a() print 'hi'
    ^
    SyntaxError: invalid syntax
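
One way to picture the symptom: if the code reading the source recognises only '\n' as a line terminator, statements separated by '\r' run together on a single line, which would match the `a() print 'hi'` line in the error above. The following is a minimal Python 3 sketch of that effect using `io.TextIOWrapper`. The sample source bytes and the `read_lines` helper are hypothetical, not the attached test.py or the interpreter's actual reader.

```python
import io

# Hypothetical source bytes: an encoding declaration plus '\r' line endings.
SOURCE = b"# -*- coding: iso-8859-1 -*-\rdef a(): pass\ra()\rprint('hi')\r"


def read_lines(data, newline):
    """Decode data and split it into lines under the given newline policy."""
    stream = io.TextIOWrapper(
        io.BytesIO(data), encoding="iso-8859-1", newline=newline
    )
    return stream.readlines()


# newline="\n": only '\n' ends a line, so the whole file is one "line".
only_lf = read_lines(SOURCE, "\n")

# newline=None: universal newlines; '\r' is recognised and translated to '\n'.
universal = read_lines(SOURCE, None)

print(len(only_lf), len(universal))
```

With the '\n'-only policy the reader hands back the whole file as one line, while the universal-newline policy yields one line per statement.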

    @loewis
    Mannequin

    loewis mannequin commented Aug 1, 2003

    Logged In: YES
    user_id=21627

    I see. This is not about PEP-278, though, as you are not
    calling any open() function, and passing no 'U' argument to
    it - cross-platform newlines should work independent of that
    PEP.

    @jackjansen
    Member

    Logged In: YES
    user_id=45365

    The submitter isn't calling open explicitly, but PEP-278 is also about
    the implicit opening of sourcefiles by the interpreter.

    My guess (but I don't know the PEP-263 implementation at all) is
    that when you specify an encoding the normal code that the
    interpreter uses to read sourcefiles is bypassed, and replaced by
    something that does the encoding magic. Apparently this code
    does not do universal newline conversion, which it should.

    @loewis
    Mannequin

    loewis mannequin commented Aug 3, 2003

    Logged In: YES
    user_id=21627

    The only way to implement that would be to add 'U' support
    for all codecs, and open the codec with the 'U' flag. As
    this will also affect codecs not part of Python, we cannot
    fix this bug in Python 2.3, but have to defer a solution to
    Python 2.4.

    @malemburg
    Member

    Logged In: YES
    user_id=38388

    I'm not sure this is correct: unless the codecs implement
    their own .readline() implementation, the one in codecs.py
    is used and that simply delegates the readline request to
    the underlying stream object.

    Now, since the stream object in the source code reader is
    a plain Python file object, currently opened in "rb" mode,
    changing the mode to "rbU" should be enough to get
    universal readline support for all such codecs.

    The relevant code is in Parser/tokenizer.c:fp_setreadl():

    static int
    fp_setreadl(struct tok_state *tok, const char* enc)
    {
        PyObject *reader, *stream, *readline;

        /* XXX: constify filename argument. */
        stream = PyFile_FromFile(tok->fp, (char*)tok->filename, "rb", NULL);
        if (stream == NULL)
            return 0;

        reader = PyCodec_StreamReader(enc, stream, NULL);
        Py_DECREF(stream);
        if (reader == NULL)
            return 0;

        readline = PyObject_GetAttrString(reader, "readline");
        Py_DECREF(reader);
        if (readline == NULL)
            return 0;

        tok->decoding_readline = readline;
        return 1;
    }
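
The delegation argument can be sketched in Python 3 under some loud assumptions: `DelegatingReader` and `UniversalNewlineBytes` below are hypothetical classes written for illustration only. They mimic the design described in this comment (a reader whose readline() simply asks the underlying stream for a line); they are not the code in codecs.py or fileobject.c.

```python
import io


class DelegatingReader:
    """Hypothetical reader: readline() delegates to the byte stream it
    wraps and only decodes the result."""

    def __init__(self, stream, encoding):
        self.stream = stream
        self.encoding = encoding

    def readline(self):
        return self.stream.readline().decode(self.encoding)


class UniversalNewlineBytes:
    """Hypothetical byte stream that turns CRLF and lone CR into LF
    before handing out lines (done eagerly, for simplicity)."""

    def __init__(self, raw):
        data = raw.read().replace(b"\r\n", b"\n").replace(b"\r", b"\n")
        self._buffer = io.BytesIO(data)

    def readline(self):
        return self._buffer.readline()


DATA = b"# -*- coding: iso-8859-1 -*-\rprint(17)\rprint(23)\r"

# Plain byte stream: readline() stops only at LF, so CR is not a terminator.
plain = DelegatingReader(io.BytesIO(DATA), "iso-8859-1")

# Same reader, but the stream underneath knows about universal newlines.
translated = DelegatingReader(
    UniversalNewlineBytes(io.BytesIO(DATA)), "iso-8859-1"
)

first_plain = plain.readline()            # the whole file in one "line"
first_translated = translated.readline()  # just the encoding declaration
```

The reader class is identical in both cases; only the stream it is given differs, which is the point being made about changing the open mode rather than the codecs.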

    @loewis
    Mannequin

    loewis mannequin commented Aug 3, 2003

    Logged In: YES
    user_id=21627

    That would work indeed. The question is whether we can
    impose support for universal newlines on all codecs "out
    there", for Python 2.3.1, when 2.3 makes no such requirement.

    @malemburg
    Member

    Logged In: YES
    user_id=38388

    Uhm, we don't impose Universal readline support on the codecs.
    They would just get a stream object that happens to know
    about universal newlines and work with it. That's completely
    in line with the codec spec.

    I'm +1 on adding this to 2.3.1.

    @jackjansen
    Member

    Logged In: YES
    user_id=45365

    There's no such thing as "rbU", I think, but simply "rU" should
    work. As far as I know the only difference between "r" and "rb" is
    newline conversion, right? If there are C libraries that do more,
    then we should implement "rbU".

    About 2.3.1 compatibility: technically we could break workarounds
    people have done themselves, but I think the chances are slim.

    @malemburg
    Member

    Logged In: YES
    user_id=38388

    Jack, I was just looking at the code I posted and the one
    in fileobject.c. The latter enables universal newline support
    whenever it sees a 'U' in the mode string, so I thought that
    adding a 'U' to the mode would be enough.

    The only system where 'b' does make a difference that I'm
    aware of is Windows, so you may want to check whether it
    makes a difference there.

    @johnjsmith
    Mannequin

    johnjsmith mannequin commented Aug 4, 2003

    Logged In: YES
    user_id=830565

    In MS Windows, a '\x1a' (Ctrl-Z) in a file will be treated
    as EOF, unless the file is opened with 'rb'.

    @malemburg
    Member

    Logged In: YES
    user_id=38388

    Thanks John.

    Not sure whether any of the codecs would actually use 0x1A,
    but using "rbU" sounds like the safer approach then.

    @jackjansen
    Member

    Logged In: YES
    user_id=45365

    You misunderstand what I tried to say (or I mis-said it:-): there is
    no such thing as mode "rbU", check the code in fileobject.c.

    There is "r" == "rt" for text mode, "rb" for binary mode,
    "U"=="rU" for universal newline textmode. With "rU" the
    underlying file is opened in binary mode, so I don't think we'll
    have the control-Z problem.
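
For readers on current Python: the 'U' mode flag discussed here no longer exists (it was removed in Python 3.11), and the `newline` argument of `open()` is its nearest counterpart. As far as I understand it, Python 3 does its own newline handling on top of a binary file rather than relying on the C library's text mode. The sketch below, with made-up file content, shows universal newline reading alongside a Ctrl-Z byte that survives in the middle of a line.

```python
import os
import tempfile

# Hypothetical file content: '\r' line endings plus a Ctrl-Z byte (0x1a).
CONTENT = b"first\rsec\x1aond\rthird\r"

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(CONTENT)

    # Binary mode: the bytes come back untouched.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode with newline=None: universal newlines, translated to '\n'.
    with open(path, "r", encoding="iso-8859-1", newline=None) as f:
        lines = f.readlines()
finally:
    os.remove(path)

print(raw)
print(lines)
```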

    @nnorwitz
    Mannequin

    nnorwitz mannequin commented Oct 2, 2005

    Logged In: YES
    user_id=33168

    I don't see a clear resolution here. Is there something we
    can/should do to fix this problem in 2.5?

    @doerwalter
    Contributor

    Logged In: YES
    user_id=89016

    The changes to the codecs done in Python 2.4 added support
    for universal newlines:

    Python 2.4.1 (#2, Mar 31 2005, 00:05:10)
    [GCC 3.3 20030304 (Apple Computer, Inc. build 1666)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> open("foo.py", "wb").write("# -*- coding: iso-8859-1 -*-\rprint 17\rprint 23\r")
    >>> import foo
    17
    23
    >>>
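
A rough Python 3 counterpart of this check, offered as a sketch rather than a statement about how the fix was implemented: it passes source bytes with an encoding declaration and bare '\r' line endings to the built-in `compile()` and executes the result. The `values` and `name` variables are invented for the example; the 0xe9 byte is there only to show that the ISO-8859-1 declaration is honoured as well.

```python
# Source bytes with an encoding declaration and bare '\r' line endings.
# 0xe9 is 'é' in ISO-8859-1, so it also exercises the declared encoding.
SOURCE = (
    b"# -*- coding: iso-8859-1 -*-\r"
    b"values = [17]\r"
    b"values.append(23)\r"
    b"name = 'caf\xe9'\r"
)

namespace = {}
exec(compile(SOURCE, "foo.py", "exec"), namespace)

print(namespace["values"], namespace["name"])
```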

    @birkenfeld
    Member

    Logged In: YES
    user_id=849994

    So this is resolved now?

    @doerwalter
    Contributor

    Logged In: YES
    user_id=89016

    It looks that way to me.

    Any comments from the OP?

    @malemburg
    Member

    Logged In: YES
    user_id=38388

    Closing the bug. Thanks Walter.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022