Syntax error on large file with MBCS encoding #41697

Closed
tnleeuw mannequin opened this issue Mar 14, 2005 · 14 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@tnleeuw
Mannequin

tnleeuw mannequin commented Mar 14, 2005

BPO 1163244
Nosy @tim-one, @mhammond, @doerwalter
Files
  • 00020905-0000-0000-C000-000000000046x0x8x1.zip: zipped Python source that gives a compilation error with 2.4.1rc1, but not with 2.3.5
  • foo2.py: Simple repro that does no imports
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/mhammond'
    closed_at = <Date 2005-11-11.14:56:31.000>
    created_at = <Date 2005-03-14.20:20:03.000>
    labels = ['interpreter-core']
    title = 'Syntax error on large file with MBCS encoding'
    updated_at = <Date 2005-11-11.14:56:31.000>
    user = 'https://bugs.python.org/tnleeuw'

    bugs.python.org fields:

    activity = <Date 2005-11-11.14:56:31.000>
    actor = 'tim.peters'
    assignee = 'mhammond'
    closed = True
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2005-03-14.20:20:03.000>
    creator = 'tnleeuw'
    dependencies = []
    files = ['1627', '1628']
    hgrepos = []
    issue_num = 1163244
    keywords = []
    message_count = 14.0
    messages = ['24589', '24590', '24591', '24592', '24593', '24594', '24595', '24596', '24597', '24598', '24599', '24600', '24601', '24602']
    nosy_count = 9.0
    nosy_names = ['tim.peters', 'mhammond', 'doerwalter', 'nikis', 'tzot', 'jkew', 'sdahlbac', 'tnleeuw', 'tilinna']
    pr_nums = []
    priority = 'high'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue1163244'
    versions = ['Python 2.4']

    @tnleeuw
    Mannequin Author

    tnleeuw mannequin commented Mar 14, 2005

    Large files generated by make-py.py from the win32all
    extensions cannot be compiled by Python2.4.1rc1 - they
    give a syntax error.

    This is a regression from 2.3.5

    (With Python2.4, the interpreter crashes. That is fixed
    now.)

    After removing the mbcs encoding line from the top of the
    file, compilation succeeds.

    File should be attached, as zip-file. Probably requires
    win32all extensions to be installed to be compiled /
    imported (generated using build 203 of the win32all
    extensions).

    @tnleeuw tnleeuw mannequin closed this as completed Mar 14, 2005
    @tnleeuw tnleeuw mannequin assigned mhammond Mar 14, 2005
    @tnleeuw tnleeuw mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Mar 14, 2005
    @tzot
    Mannequin

    tzot mannequin commented Mar 20, 2005

    Logged In: YES
    user_id=539787

    Useful pointers: in Python-dev, this has been characterised
    as related to pywin32 bug 1085454. Also related to
    www.python.org/sf/1101726 and www.python.org/sf/1089395.

    @mhammond
    Contributor

    Logged In: YES
    user_id=14198

    I believe this is a different bug than the recent
    "long-lines" errors (see below). I can reproduce this with
    a file that uses neither long lines, nor any pywin32
    extensions (2.4 branch, trunk)

    A Python source file containing:
    -- start snippet --
    # -*- coding: mbcs -*-
    <1532 characters of code or comments>
    <cr/lf newline>
    x = {}
    -- end snippet --

    Will yield a SyntaxError when attempting to import the
    module. Running the module as a script does not provoke the
    error.

    To reproduce, there must be exactly 1532 characters where
    specified (see the attached file for a demo). Adding or
    removing even a single character will prevent the error. It
    is possible to replace characters with any others, including
    valid code, and still see the error - however, the number of
    characters must remain the same. cr/lf pairs can also be
    replaced with any other 2 characters. There are other
    "block sizes" that will provoke the error, but this is the
    only one I have nailed.

    Apart from the "block" of 1532 characters, the coding line
    and the blank line before the dict assignment also appear
    critical. Unlike the other characters in the block, this
    last cr/lf pair can not be replaced with comments. I can't
    provoke the error with other encodings (note there are no
    encoded characters in the sample - it is trivial).
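The layout described above can be regenerated with a short script. This is only a sketch under stated assumptions: `filler_block` and `build_repro_source` are hypothetical helper names, the filler is assumed to count its own CR/LF pairs toward the 1532 characters, and the script targets a current Python 3 rather than the 2.4 interpreter under test.

```python
def filler_block(size, width=64):
    """Return exactly `size` bytes of comment lines, each ending in CR/LF."""
    if size < 3 or width < 6:
        raise ValueError("size must be >= 3 and width >= 6")
    lines = []
    remaining = size
    while remaining:
        n = min(width, remaining)
        # Never leave a 1- or 2-byte tail: the shortest line is "#\r\n".
        if 0 < remaining - n < 3:
            n -= 3
        lines.append(b"#" + b"x" * (n - 3) + b"\r\n")
        remaining -= n
    return b"".join(lines)


def build_repro_source(block_size=1532, encoding="mbcs"):
    """Coding line, filler block, blank CR/LF line, then the dict assignment."""
    header = ("# -*- coding: %s -*-\r\n" % encoding).encode("ascii")
    return header + filler_block(block_size) + b"\r\n" + b"x = {}\r\n"


if __name__ == "__main__":
    # Binary mode keeps the CR/LF pairs exactly as generated.
    with open("foo2_generated.py", "wb") as f:
        f.write(build_repro_source())
```

Varying `block_size` by one in either direction gives a quick way to probe the "exactly 1532" observation; the attached foo2.py remains the authoritative test case.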

    To reproduce, save the attached file on Windows and execute:
    > python -c "import foo2"
    Traceback (most recent call last):
      File "<string>", line 1, in ?
      File "foo2.py", line 24
    x = {}
        ^
    SyntaxError: invalid syntax

    Note that Python 2.3 and earlier all work. Also note that
    "python foo2.py" also works. The code is clearly valid.

    Haven't tried to repro on Linux (mbcs isn't available there,
    and I can't get a test case that doesn't use it)
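The import-versus-script comparison reported above can be automated. A minimal sketch, assuming a current Python 3 drives the check through `sys.executable`; `try_both_ways` is a hypothetical name, and the test source uses an ascii coding line so it does not depend on mbcs being available.

```python
import os
import subprocess
import sys
import tempfile


def try_both_ways(source_bytes, module_name="foo2"):
    """Return (import_returncode, script_returncode) for the given source."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, module_name + ".py")
        with open(path, "wb") as f:  # binary mode: no newline translation
            f.write(source_bytes)
        # Equivalent of: python -c "import foo2", run from the file's directory
        imported = subprocess.run(
            [sys.executable, "-c", "import " + module_name],
            cwd=tmp, capture_output=True)
        # Equivalent of: python foo2.py
        scripted = subprocess.run(
            [sys.executable, path], cwd=tmp, capture_output=True)
        return imported.returncode, scripted.returncode


if __name__ == "__main__":
    print(try_both_ways(b"# -*- coding: ascii -*-\r\n\r\nx = {}\r\n"))
```

On an affected interpreter the first return code would be expected to be non-zero while the second stays 0; on a fixed one both should be 0.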

    Other pointers/notes: pywin32 bug 1085454 is related to
    long lines; by all accounts that underlying error has been
    fixed, but I can't verify this as pywin32 no longer generates
    insanely long lines. I can confirm Python bugs
    1101726/1089395 still crash Python 2.3+. I believe all
    (including this) are discrete bugs.

    [foo2.py is my attachment - ya gotta love sourceforge :)]

    @tzot
    Mannequin

    tzot mannequin commented Mar 21, 2005

    Logged In: YES
    user_id=539787

    Could be irrelevant but... are the other block sizes close
    to n*512 (eg 1536 is 3*512) marks?
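The n*512 question is simple arithmetic and can be checked for any candidate block size. A small sketch; `nearest_multiple_offset` is a hypothetical helper, and 512 is just the chunk size suggested in this comment, not a confirmed buffer size.

```python
def nearest_multiple_offset(size, chunk=512):
    """Return (nearest multiple of chunk, signed distance of size from it)."""
    nearest = ((size + chunk // 2) // chunk) * chunk
    return nearest, size - nearest


if __name__ == "__main__":
    # The one block size nailed down so far: 1532 = 3*512 - 4
    print(nearest_multiple_offset(1532))
```

A small offset for every failing block size would support the chunk-boundary idea; it would not by itself identify the cause.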

    @tilinna
    Mannequin

    tilinna mannequin commented Apr 9, 2005

    Logged In: YES
    user_id=1074183

    Seems that the connection to n*512 blocks is very likely,
    and it's not just MBCS-related. I managed to reproduce this
    with a file that contains an ascii coding declaration, a
    section of close to 1024 bytes, an extra crlf and a comment;
    it raises a SyntaxError in Py2.4.1.

    Could this be linked to the new codec buffering code? See:
    www.python.org/sf/1178484
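To make the buffering hypothesis concrete, here is one way a chunked reader could become sensitive to exact byte offsets. This is purely an illustration and not the CPython codec code: `naive_chunked_lines` is a hypothetical function that splits each 512-byte chunk into lines and wrongly treats a trailing lone CR as a finished line, so a CR/LF pair straddling a chunk boundary yields a spurious extra line.

```python
def naive_chunked_lines(data, chunk_size=512):
    """Split bytes into lines chunk by chunk (deliberately flawed)."""
    lines, pending = [], b""
    for start in range(0, len(data), chunk_size):
        pieces = (pending + data[start:start + chunk_size]).splitlines(True)
        pending = b""
        # Flaw being illustrated: a piece ending in a lone "\r" is accepted
        # as complete, although the matching "\n" may open the next chunk.
        if pieces and not pieces[-1].endswith((b"\r", b"\n")):
            pending = pieces.pop()
        lines.extend(pieces)
    if pending:
        lines.append(pending)
    return lines


if __name__ == "__main__":
    straddling = b"a" * 511 + b"\r\n" + b"x = {}\r\n"  # CR is byte 512
    shifted = b"a" * 510 + b"\r\n" + b"x = {}\r\n"     # one byte shorter
    print(len(naive_chunked_lines(straddling)), len(naive_chunked_lines(shifted)))
```

Shifting the data by a single byte makes the extra line disappear, which resembles the "adding or removing even a single character" behaviour reported earlier in this thread. Whether the real bug works this way is not established here.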

    @doerwalter
    Contributor

    Logged In: YES
    user_id=89016

    Importing foo2.py on Linux (with the current CVS HEAD
    version of Python) gives me a segmentation fault with the
    following stacktrace:
    0x080606cc in instance_repr (inst=0xb7c158bc) at
    Objects/classobject.c:880
    880 classname = inst->in_class->cl_name;
    (gdb) bt
    #0 0x080606cc in instance_repr (inst=0xb7c158bc) at
    Objects/classobject.c:880
    #1 0x08082235 in PyObject_Repr (v=0xb7c158bc) at
    Objects/object.c:308
    #2 0x080f3ccd in err_input (err=0xbfffe000) at
    Python/pythonrun.c:1478
    #3 0x080f3956 in PyParser_SimpleParseFileFlags
    (fp=0x818d6e0, filename=0xbfffe530 "foo2.py", start=257,
    flags=0)
    at Python/pythonrun.c:1348
    #4 0x080f3982 in PyParser_SimpleParseFile (fp=0x818d6e0,
    filename=0xbfffe530 "foo2.py", start=257)
    at Python/pythonrun.c:1355
    #5 0x080e6fef in parse_source_module (pathname=0xbfffe530
    "foo2.py", fp=0x818d6e0) at Python/import.c:761
    #6 0x080e72db in load_source_module (name=0xbfffe9d0
    "foo2", pathname=0xbfffe530 "foo2.py", fp=0x818d6e0)
    at Python/import.c:885
    #7 0x080e86b4 in load_module (name=0xbfffe9d0 "foo2",
    fp=0x818d6e0, buf=0xbfffe530 "foo2.py", type=1, loader=0x0)
    at Python/import.c:1656
    #8 0x080e9d52 in import_submodule (mod=0x8145768,
    subname=0xbfffe9d0 "foo2", fullname=0xbfffe9d0 "foo2")
    at Python/import.c:2250
    #9 0x080e9511 in load_next (mod=0x8145768,
    altmod=0x8145768, p_name=0xbfffedf0, buf=0xbfffe9d0 "foo2",
    p_buflen=0xbfffe9cc)
    at Python/import.c:2070
    #10 0x080e8e5e in import_module_ex (name=0x0,
    globals=0xb7d62e94, locals=0xb7d62e94, fromlist=0x8145768)
    at Python/import.c:1905
    #11 0x080e914b in PyImport_ImportModuleEx (name=0xb7cd8824
    "foo2", globals=0xb7d62e94, locals=0xb7d62e94,
    fromlist=0x8145768) at Python/import.c:1946
    #12 0x080b5c87 in builtin___import__ (self=0x0,
    args=0xb7d1e634) at Python/bltinmodule.c:45
    #13 0x0811d32e in PyCFunction_Call (func=0xb7d523ec,
    arg=0xb7d1e634, kw=0x0) at Objects/methodobject.c:73
    #14 0x0805d188 in PyObject_Call (func=0xb7d523ec,
    arg=0xb7d1e634, kw=0x0) at Objects/abstract.c:1757
    #15 0x080ca79d in PyEval_CallObjectWithKeywords
    (func=0xb7d523ec, arg=0xb7d1e634, kw=0x0) at Python/ceval.c:3425
    #16 0x080c6719 in PyEval_EvalFrame (f=0x816dd7c) at
    Python/ceval.c:2026
    #17 0x080c8fdd in PyEval_EvalCodeEx (co=0xb7cf1ef0,
    globals=0xb7d62e94, locals=0xb7d62e94, args=0x0, argcount=0,
    kws=0x0,
    kwcount=0, defs=0x0, defcount=0, closure=0x0) at
    Python/ceval.c:2736
    #18 0x080bffb0 in PyEval_EvalCode (co=0xb7cf1ef0,
    globals=0xb7d62e94, locals=0xb7d62e94) at Python/ceval.c:490
    #19 0x080f361d in run_node (n=0xb7d122d0, filename=0x8123ba3
    "<stdin>", globals=0xb7d62e94, locals=0xb7d62e94,
    flags=0xbffff584) at Python/pythonrun.c:1265
    #20 0x080f1f58 in PyRun_InteractiveOneFlags (fp=0xb7e94720,
    filename=0x8123ba3 "<stdin>", flags=0xbffff584)
    at Python/pythonrun.c:762
    #21 0x080f1c93 in PyRun_InteractiveLoopFlags (fp=0xb7e94720,
    filename=0x8123ba3 "<stdin>", flags=0xbffff584)
    at Python/pythonrun.c:695
    #22 0x080f1af6 in PyRun_AnyFileExFlags (fp=0xb7e94720,
    filename=0x8123ba3 "<stdin>", closeit=0, flags=0xbffff584)
    at Python/pythonrun.c:658
    #23 0x08055e45 in Py_Main (argc=1, argv=0xbffff634) at
    Modules/main.c:484
    #24 0x08055366 in main (argc=1, argv=0xbffff634) at
    Modules/python.c:23

    The value object in err_input() (in the E_DECODE case) seems
    to be bogus (it gives me a refcount of -606348325).

    @nikis
    Mannequin

    nikis mannequin commented Jun 2, 2005

    Logged In: YES
    user_id=27708

    I have a reproducible test case with encoding cp1251.
    The file is 1594 bytes long; how do I attach it?

    @sdahlbac
    Mannequin

    sdahlbac mannequin commented Jul 21, 2005

    Logged In: YES
    user_id=750513

    For what it's worth:

    I have two files (that I unfortunately cannot attach) which
    work fine on 2.3 but now produce spurious syntax errors on
    2.4.1 when they contain

    # -*- coding: ascii -*-

    If I change that to something that does not match the coding
    regex, I do not get any syntax error.
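Whether a given comment line "matches the coding regex" can be checked outside the interpreter. A sketch assuming the pattern `coding[:=]\s*([-\w.]+)` from the original PEP 263 text; the interpreter implements its own check in C, so this is only an approximation, and `declared_encoding` is a hypothetical helper name.

```python
import re

# Pattern as given in the original PEP 263 text (assumption: the tokenizer
# of the affected releases behaves equivalently).
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(line):
    """Return the encoding named by a comment line, or None."""
    if not line.lstrip().startswith("#"):
        return None
    match = CODING_RE.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    for line in ("# -*- coding: ascii -*-", "# coding ascii", "x = 1"):
        print(repr(line), declared_encoding(line))
```

Dropping the colon, as in the second line, is one example of a change that no longer matches. PEP 263 only honours such a comment on the first or second line of the file.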

    (winxp)

    @jkew
    Mannequin

    jkew mannequin commented Aug 4, 2005

    Logged In: YES
    user_id=598066

    Is pywin32 bug 1166627 relevant/related?

    @tzot
    Mannequin

    tzot mannequin commented Aug 4, 2005

    Logged In: YES
    user_id=539787

    Are you sure about the bug number? pywin32 seems not to have
    such a bug.

    @jkew
    Mannequin

    jkew mannequin commented Aug 4, 2005

    Logged In: YES
    user_id=598066

    http://sourceforge.net/tracker/?func=detail&aid=1166627&group_id=78018&atid=551954

    @tim-one
    Member

    tim-one commented Nov 11, 2005

    Logged In: YES
    user_id=31435

    Is this still an issue in 2.4.2? I downloaded the zip file, and
    didn't have any problem importing the .py file on Windows
    using 2.4.2. A number of problems with encoding directives
    were fixed in 2.4.2, so I doubt that's an accident ;-)

    @mhammond
    Contributor

    Logged In: YES
    user_id=14198

    Thanks Tim! I can confirm that I can no longer reproduce it
    with the svn release24-maint branch - so I'm going out on a
    limb and closing it. I haven't tested Linux, so it would be
    great if some others could also confirm it is fixed (or reopen
    it if not).

    @tim-one
    Member

    tim-one commented Nov 11, 2005

    Logged In: YES
    user_id=31435

    [Mark]

    > I can confirm that I can no longer reproduce it
    > with the svn release24-maint branch

    Did you know 2.4.2 final was released? That happened
    September 28. So if someone has this problem, ask them to
    try the released 2.4.2 (no need to muck with
    release24-maint).

    Leaving this closed, but assigned to Mark just so he'll get
    this note.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022