
IDLE is unable to open any .py files #104719

Closed
AlexWaygood opened this issue May 21, 2023 · 30 comments
Labels: 3.12 (bugs and security fixes), stdlib (Python modules in the Lib dir), topic-IDLE, type-bug (an unexpected behavior, bug, or error)

Comments

@AlexWaygood commented May 21, 2023

With a fresh CPython build (be0c106), IDLE is unable to open any .py files.

To reproduce:

  1. Create an empty .py file with the name repro.py
  2. Run python -m idlelib repro.py

IDLE still seems able to create new .py files and save them; it just can't open pre-existing .py files right now.

Traceback observed

C:\Users\alexw\coding\cpython>python -m idlelib repro.py
Running Debug|x64 interpreter...
Traceback (most recent call last):
  File "C:\Users\alexw\coding\cpython\Lib\runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alexw\coding\cpython\Lib\runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "C:\Users\alexw\coding\cpython\Lib\idlelib\__main__.py", line 7, in <module>
    idlelib.pyshell.main()
  File "C:\Users\alexw\coding\cpython\Lib\idlelib\pyshell.py", line 1640, in main
    if flist.open(filename) is None:
       ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alexw\coding\cpython\Lib\idlelib\filelist.py", line 37, in open
    edit = self.EditorWindow(self, filename, key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alexw\coding\cpython\Lib\idlelib\pyshell.py", line 135, in __init__
    EditorWindow.__init__(self, *args)
  File "C:\Users\alexw\coding\cpython\Lib\idlelib\editor.py", line 289, in __init__
    self.set_indentation_params(is_py_src)
  File "C:\Users\alexw\coding\cpython\Lib\idlelib\editor.py", line 1327, in set_indentation_params
    i = self.guess_indent()
        ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alexw\coding\cpython\Lib\idlelib\editor.py", line 1574, in guess_indent
    opener, indented = IndentSearcher(self.text, self.tabwidth).run()
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alexw\coding\cpython\Lib\idlelib\editor.py", line 1646, in run
    save_tabsize = tokenize.tabsize
                   ^^^^^^^^^^^^^^^^
AttributeError: module 'tokenize' has no attribute 'tabsize'
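
On an affected build, the failing attribute access can be reproduced directly, without involving IDLE at all (a minimal check, for illustration):

import tokenize
tokenize.tabsize  # AttributeError: module 'tokenize' has no attribute 'tabsize'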

Environment

(Given the cause of the bug, the environment details shouldn't really be relevant; but they're included here anyway, for completeness.)

Python 3.12.0a7+ (heads/main:be0c106789, May 21 2023, 12:00:27) [MSC v.1932 64 bit (AMD64)] on win32

Reproduces on both debug and non-debug builds, FWIW.

Linked PRs

@AlexWaygood commented May 21, 2023

Bisects to 6715f91 (unsurprisingly 😄)

Cc. @mgmacias95, @pablogsal, @lysnikolaou, @isidentical

@pablogsal commented:

Will check this today

@pablogsal commented May 21, 2023

Looks like IDLE is using a bunch of undocumented attributes of the tokenize module that are no longer used internally. The easiest course of action is probably just to restore these, but note that there will be no guarantee that whatever IDLE is doing with them will continue working.

I ran the buildbots and the CI before merging, so another question is why the CI didn't catch this.

@AlexWaygood commented May 21, 2023

> I ran the buildbots and the CI before merging, so another question is why the CI didn't catch this.

Do any of the buildbots run the test suite with -u gui (or -u all), I wonder? A bunch of IDLE's tests actually open and close GUI windows, so they're not run unless you run the test suite with that option.

@AlexWaygood commented May 21, 2023

> I ran the buildbots and the CI before merging, so another question is why the CI didn't catch this.

> Do any of the buildbots run the test suite with -u gui (or -u all), I wonder? A bunch of IDLE's tests actually open and close GUI windows, so they're not run unless you run the test suite with that option.

That doesn't seem relevant; python -m test test_idle -u gui passes on main, on both debug and non-debug builds. So it does seem like there's some missing test coverage here.

@pablogsal commented:

I checked a bit more into how the eliminated constant is being used, and it seems that IDLE is monkeypatching the value to modify the behaviour of the now-deleted private _tokenize function, to adapt something related to tab size.

Unfortunately, I don't think this is going to work, and it was outside the API contract anyway, so I think the best course of action here is to eliminate that code from idlelib, as restoring the constant will not restore whatever was happening before.
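
(For context, the stash/patch/restore dance in question looks roughly like this. run_with_tabsize is a hypothetical helper distilled from IndentSearcher.run in idlelib/editor.py, and it only runs on 3.11 and earlier, where tokenize.tabsize still exists:)

import tokenize

def run_with_tabsize(tabwidth, scan):
    save_tabsize = tokenize.tabsize      # stash the module-level constant
    tokenize.tabsize = tabwidth          # monkeypatch it while scanning a file
    try:
        return scan()                    # run the Python-coded tokenizer here
    finally:
        tokenize.tabsize = save_tabsize  # always restore the original value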

@AlexWaygood commented May 21, 2023

FWIW, I can find other places in prominent open-source projects where these undocumented constants (which have now been removed) are used:

So it might be worth adding them back anyway, to limit the disruption caused by this change.

@pablogsal commented May 21, 2023

> FWIW, I can find other places in prominent open-source projects where these undocumented constants (which have now been removed) are used:

> So it might be worth adding them back anyway, to limit the disruption caused by this change.

Most of the projects you mention are using the names of the tokens that are still exposed (as in tokenize.COMMENT), so that's not a problem, and the other project seems to use a vendored copy of IDLE of some kind, so the same caveats apply.

@pablogsal commented:

Oh, actually I misread: they are using the regular expressions.

Hmmm, I still think I'd prefer not to restore those, because they were never part of the public API, and I don't want to give any guarantees that they will stay up to date and keep doing whatever they were doing before.

@AlexWaygood commented May 21, 2023

To be clear, I fully agree that these projects should not have been using undocumented constants from the tokenize module, and that no compatibility guarantees apply to undocumented module constants. However, I feel it would be kinder (and would result in a less disruptive transition to 3.12) if we were to add these constants back for now and issue a deprecation warning when they're accessed by users (we could do this via a module-level __getattr__ function).
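
(A minimal sketch of that approach, assuming we keep a table of the removed names inside Lib/tokenize.py; the table contents and warning message here are illustrative, not an actual patch:)

import warnings

_DEPRECATED_NAMES = {"tabsize": 8}  # plus the removed regex constants

def __getattr__(name):
    # Called by Python (PEP 562) when a module attribute isn't found normally.
    if name in _DEPRECATED_NAMES:
        warnings.warn(f"tokenize.{name} is undocumented and will be removed",
                      DeprecationWarning, stacklevel=2)
        return _DEPRECATED_NAMES[name]
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")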

@AlexWaygood commented:

> Oh, actually I misread: they are using the regular expressions.

Yes, to be clear:

@pablogsal commented:

> However, I feel it would be kinder (and would result in a less disruptive transition to 3.12) if we were to add these constants back for now and issue a deprecation warning when they're accessed by users (we could do this via a module-level __getattr__ function).

This is probably what we will end up doing, because it makes sense to reduce breakage in general. But I am somewhat torn here: I don't want to normalise adding deprecation warnings for APIs and constants that were never public, and it also makes me a bit uncomfortable because, for instance, even if we add back tabsize, changing it will have no effect, so projects that expect the monkeypatching to work will still be broken anyway.

Furthermore, as these constants were never part of the public interface, they have no tests or any other way to validate that they work and that nobody breaks them by accident in the future. This makes me very uncomfortable giving any sort of guarantee, even an informal one.


@pablogsal commented:

CC: @Yhg1s thoughts?

@AlexWaygood commented May 21, 2023

Yeah, I agree it doesn't make much sense to add tabsize back. The regular expressions seem like they're in a slightly different camp, though.

(Here are all the names that aren't in tokenize.__all__, could be accessed from tokenize on 3.11, don't start with a leading underscore, and don't exist on main. A rough sketch of how such a list can be generated follows the dump.)

Ignore='[ \\f\\t]*(\\\\\\r?\\n[ \\f\\t]*)*(#[^\\r\\n]*)?'

Funny='(\\r?\\n|(\\~|\\}|\\|=|\\||\\{|\\^=|\\^|\\]|\\[|@=|@|>>=|>>|>=|>|==|=|<=|<<=|<<|<|;|:=|:|/=|//=|//|/|\\.\\.\\.|\\.|\\->|\\-=|\\-|,|\\+=|\\+|\\*=|\\*\\*=|\\*\\*|\\*|\\)|\\(|\\&=|\\&|%=|%|!=))'

Binnumber='0[bB](?:_?[01])+'

Single="[^'\\\\]*(?:\\\\.[^'\\\\]*)*'"

Whitespace='[ \\f\\t]*'

Double3='[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""'

Octnumber='0[oO](?:_?[0-7])+'

any=<function any at 0x0000025619C20180>

maybe=<function maybe at 0x0000025619C20220>

StringPrefix='(|u|RB|rf|Rb|rb|r|U|B|bR|RF|br|F|Fr|b|Br|BR|fr|rB|rF|Rf|R|fR|FR|f)'

Decnumber='(?:0(?:_?0)*|[1-9](?:_?[0-9])*)'

single_quoted={'RB"', "U'", "FR'", 'RF"', "F'", 'rf"', 'br"', "RF'", 'Rf"', 'Br"', "fR'", 'u"', "rf'", 'fR"', "Br'", 'f"', 'R"', "rb'", 'B"', "Fr'", "BR'", 'rb"', "Rf'", "rB'", 'Rb"', 'bR"', "br'", 'F"', 'b"', 'BR"', "RB'", 'U"', "rF'", "b'", 'fr"', "r'", "bR'", 'FR"', "f'", 'rB"', "'", 'rF"', "B'", "fr'", "Rb'", "R'", 'r"', "u'", '"', 'Fr"'}

Floatnumber='(([0-9](?:_?[0-9])*\\.(?:[0-9](?:_?[0-9])*)?|\\.[0-9](?:_?[0-9])*)([eE][-+]?[0-9](?:_?[0-9])*)?|[0-9](?:_?[0-9])*[eE][-+]?[0-9](?:_?[0-9])*)'

Single3="[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''"

Number='(([0-9](?:_?[0-9])*[jJ]|(([0-9](?:_?[0-9])*\\.(?:[0-9](?:_?[0-9])*)?|\\.[0-9](?:_?[0-9])*)([eE][-+]?[0-9](?:_?[0-9])*)?|[0-9](?:_?[0-9])*[eE][-+]?[0-9](?:_?[0-9])*)[jJ])|(([0-9](?:_?[0-9])*\\.(?:[0-9](?:_?[0-9])*)?|\\.[0-9](?:_?[0-9])*)([eE][-+]?[0-9](?:_?[0-9])*)?|[0-9](?:_?[0-9])*[eE][-+]?[0-9](?:_?[0-9])*)|(0[xX](?:_?[0-9a-fA-F])+|0[bB](?:_?[01])+|0[oO](?:_?[0-7])+|(?:0(?:_?0)*|[1-9](?:_?[0-9])*)))'

Name='\\w+'

Double='[^"\\\\]*(?:\\\\.[^"\\\\]*)*"'

Token='[ \\f\\t]*(\\\\\\r?\\n[ \\f\\t]*)*(#[^\\r\\n]*)?((([0-9](?:_?[0-9])*[jJ]|(([0-9](?:_?[0-9])*\\.(?:[0-9](?:_?[0-9])*)?|\\.[0-9](?:_?[0-9])*)([eE][-+]?[0-9](?:_?[0-9])*)?|[0-9](?:_?[0-9])*[eE][-+]?[0-9](?:_?[0-9])*)[jJ])|(([0-9](?:_?[0-9])*\\.(?:[0-9](?:_?[0-9])*)?|\\.[0-9](?:_?[0-9])*)([eE][-+]?[0-9](?:_?[0-9])*)?|[0-9](?:_?[0-9])*[eE][-+]?[0-9](?:_?[0-9])*)|(0[xX](?:_?[0-9a-fA-F])+|0[bB](?:_?[01])+|0[oO](?:_?[0-7])+|(?:0(?:_?0)*|[1-9](?:_?[0-9])*)))|(\\r?\\n|(\\~|\\}|\\|=|\\||\\{|\\^=|\\^|\\]|\\[|@=|@|>>=|>>|>=|>|==|=|<=|<<=|<<|<|;|:=|:|/=|//=|//|/|\\.\\.\\.|\\.|\\->|\\-=|\\-|,|\\+=|\\+|\\*=|\\*\\*=|\\*\\*|\\*|\\)|\\(|\\&=|\\&|%=|%|!=))|((|u|RB|rf|Rb|rb|r|U|B|bR|RF|br|F|Fr|b|Br|BR|fr|rB|rF|Rf|R|fR|FR|f)\'[^\\n\'\\\\]*(?:\\\\.[^\\n\'\\\\]*)*\'|(|u|RB|rf|Rb|rb|r|U|B|bR|RF|br|F|Fr|b|Br|BR|fr|rB|rF|Rf|R|fR|FR|f)"[^\\n"\\\\]*(?:\\\\.[^\\n"\\\\]*)*")|\\w+)'

Intnumber='(0[xX](?:_?[0-9a-fA-F])+|0[bB](?:_?[01])+|0[oO](?:_?[0-7])+|(?:0(?:_?0)*|[1-9](?:_?[0-9])*))'

PseudoExtras='(\\\\\\r?\\n|\\Z|#[^\\r\\n]*|((|u|RB|rf|Rb|rb|r|U|B|bR|RF|br|F|Fr|b|Br|BR|fr|rB|rF|Rf|R|fR|FR|f)\'\'\'|(|u|RB|rf|Rb|rb|r|U|B|bR|RF|br|F|Fr|b|Br|BR|fr|rB|rF|Rf|R|fR|FR|f)"""))'

Pointfloat='([0-9](?:_?[0-9])*\\.(?:[0-9](?:_?[0-9])*)?|\\.[0-9](?:_?[0-9])*)([eE][-+]?[0-9](?:_?[0-9])*)?'

group=<function group at 0x0000025619BE6F20>

triple_quoted={'U"""', 'BR"""', "B'''", "b'''", "fr'''", "R'''", 'rB"""', 'b"""', "Fr'''", "f'''", "rb'''", 'fr"""', "RF'''", 'rF"""', "Br'''", 'rf"""', "FR'''", "fR'''", 'RF"""', "Rb'''", "BR'''", "bR'''", 'u"""', 'F"""', "Rf'''", 'FR"""', "br'''", "rf'''", "r'''", 'R"""', 'fR"""', "'''", "rF'''", 'RB"""', 'bR"""', 'Rf"""', "u'''", 'Rb"""', "F'''", "U'''", '"""', 'r"""', 'Br"""', "RB'''", 'Fr"""', 'B"""', "rB'''", 'rb"""', 'br"""', 'f"""'}

ContStr='((|u|RB|rf|Rb|rb|r|U|B|bR|RF|br|F|Fr|b|Br|BR|fr|rB|rF|Rf|R|fR|FR|f)\'[^\\n\'\\\\]*(?:\\\\.[^\\n\'\\\\]*)*(\'|\\\\\\r?\\n)|(|u|RB|rf|Rb|rb|r|U|B|bR|RF|br|F|Fr|b|Br|BR|fr|rB|rF|Rf|R|fR|FR|f)"[^\\n"\\\\]*(?:\\\\.[^\\n"\\\\]*)*("|\\\\\\r?\\n))'

Imagnumber='([0-9](?:_?[0-9])*[jJ]|(([0-9](?:_?[0-9])*\\.(?:[0-9](?:_?[0-9])*)?|\\.[0-9](?:_?[0-9])*)([eE][-+]?[0-9](?:_?[0-9])*)?|[0-9](?:_?[0-9])*[eE][-+]?[0-9](?:_?[0-9])*)[jJ])'

String='((|u|RB|rf|Rb|rb|r|U|B|bR|RF|br|F|Fr|b|Br|BR|fr|rB|rF|Rf|R|fR|FR|f)\'[^\\n\'\\\\]*(?:\\\\.[^\\n\'\\\\]*)*\'|(|u|RB|rf|Rb|rb|r|U|B|bR|RF|br|F|Fr|b|Br|BR|fr|rB|rF|Rf|R|fR|FR|f)"[^\\n"\\\\]*(?:\\\\.[^\\n"\\\\]*)*")'

tabsize=8

endpats={"'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", '"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", '"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "u'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'u"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "u'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'u"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "RB'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'RB"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "RB'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'RB"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "rf'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'rf"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "rf'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'rf"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "Rb'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'Rb"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "Rb'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'Rb"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "rb'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'rb"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "rb'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'rb"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "r'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'r"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "r'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'r"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "U'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'U"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "U'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'U"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "B'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'B"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "B'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'B"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "bR'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'bR"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "bR'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'bR"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "RF'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'RF"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "RF'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'RF"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "br'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'br"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "br'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'br"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "F'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'F"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "F'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'F"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "Fr'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'Fr"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "Fr'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'Fr"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "b'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'b"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "b'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'b"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "Br'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'Br"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "Br'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'Br"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "BR'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'BR"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "BR'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'BR"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "fr'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'fr"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "fr'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'fr"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "rB'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'rB"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "rB'''": 
"[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'rB"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "rF'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'rF"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "rF'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'rF"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "Rf'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'Rf"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "Rf'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'Rf"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "R'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'R"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "R'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'R"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "fR'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'fR"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "fR'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'fR"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "FR'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'FR"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "FR'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'FR"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""', "f'": "[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", 'f"': '[^"\\\\]*(?:\\\\.[^"\\\\]*)*"', "f'''": "[^'\\\\]*(?:(?:\\\\.|'(?!''))[^'\\\\]*)*'''", 'f"""': '[^"\\\\]*(?:(?:\\\\.|"(?!""))[^"\\\\]*)*"""'}

PseudoToken='[ \\f\\t]*((\\\\\\r?\\n|\\Z|#[^\\r\\n]*|((|u|RB|rf|Rb|rb|r|U|B|bR|RF|br|F|Fr|b|Br|BR|fr|rB|rF|Rf|R|fR|FR|f)\'\'\'|(|u|RB|rf|Rb|rb|r|U|B|bR|RF|br|F|Fr|b|Br|BR|fr|rB|rF|Rf|R|fR|FR|f)"""))|(([0-9](?:_?[0-9])*[jJ]|(([0-9](?:_?[0-9])*\\.(?:[0-9](?:_?[0-9])*)?|\\.[0-9](?:_?[0-9])*)([eE][-+]?[0-9](?:_?[0-9])*)?|[0-9](?:_?[0-9])*[eE][-+]?[0-9](?:_?[0-9])*)[jJ])|(([0-9](?:_?[0-9])*\\.(?:[0-9](?:_?[0-9])*)?|\\.[0-9](?:_?[0-9])*)([eE][-+]?[0-9](?:_?[0-9])*)?|[0-9](?:_?[0-9])*[eE][-+]?[0-9](?:_?[0-9])*)|(0[xX](?:_?[0-9a-fA-F])+|0[bB](?:_?[01])+|0[oO](?:_?[0-7])+|(?:0(?:_?0)*|[1-9](?:_?[0-9])*)))|(\\r?\\n|(\\~|\\}|\\|=|\\||\\{|\\^=|\\^|\\]|\\[|@=|@|>>=|>>|>=|>|==|=|<=|<<=|<<|<|;|:=|:|/=|//=|//|/|\\.\\.\\.|\\.|\\->|\\-=|\\-|,|\\+=|\\+|\\*=|\\*\\*=|\\*\\*|\\*|\\)|\\(|\\&=|\\&|%=|%|!=))|((|u|RB|rf|Rb|rb|r|U|B|bR|RF|br|F|Fr|b|Br|BR|fr|rB|rF|Rf|R|fR|FR|f)\'[^\\n\'\\\\]*(?:\\\\.[^\\n\'\\\\]*)*(\'|\\\\\\r?\\n)|(|u|RB|rf|Rb|rb|r|U|B|bR|RF|br|F|Fr|b|Br|BR|fr|rB|rF|Rf|R|fR|FR|f)"[^\\n"\\\\]*(?:\\\\.[^\\n"\\\\]*)*("|\\\\\\r?\\n))|\\w+)'

Special='(\\~|\\}|\\|=|\\||\\{|\\^=|\\^|\\]|\\[|@=|@|>>=|>>|>=|>|==|=|<=|<<=|<<|<|;|:=|:|/=|//=|//|/|\\.\\.\\.|\\.|\\->|\\-=|\\-|,|\\+=|\\+|\\*=|\\*\\*=|\\*\\*|\\*|\\)|\\(|\\&=|\\&|%=|%|!=)'

Triple='((|u|RB|rf|Rb|rb|r|U|B|bR|RF|br|F|Fr|b|Br|BR|fr|rB|rF|Rf|R|fR|FR|f)\'\'\'|(|u|RB|rf|Rb|rb|r|U|B|bR|RF|br|F|Fr|b|Br|BR|fr|rB|rF|Rf|R|fR|FR|f)""")'

Hexnumber='0[xX](?:_?[0-9a-fA-F])+'

Exponent='[eE][-+]?[0-9](?:_?[0-9])*'

Expfloat='[0-9](?:_?[0-9])*[eE][-+]?[0-9](?:_?[0-9])*'

PlainToken='((([0-9](?:_?[0-9])*[jJ]|(([0-9](?:_?[0-9])*\\.(?:[0-9](?:_?[0-9])*)?|\\.[0-9](?:_?[0-9])*)([eE][-+]?[0-9](?:_?[0-9])*)?|[0-9](?:_?[0-9])*[eE][-+]?[0-9](?:_?[0-9])*)[jJ])|(([0-9](?:_?[0-9])*\\.(?:[0-9](?:_?[0-9])*)?|\\.[0-9](?:_?[0-9])*)([eE][-+]?[0-9](?:_?[0-9])*)?|[0-9](?:_?[0-9])*[eE][-+]?[0-9](?:_?[0-9])*)|(0[xX](?:_?[0-9a-fA-F])+|0[bB](?:_?[01])+|0[oO](?:_?[0-7])+|(?:0(?:_?0)*|[1-9](?:_?[0-9])*)))|(\\r?\\n|(\\~|\\}|\\|=|\\||\\{|\\^=|\\^|\\]|\\[|@=|@|>>=|>>|>=|>|==|=|<=|<<=|<<|<|;|:=|:|/=|//=|//|/|\\.\\.\\.|\\.|\\->|\\-=|\\-|,|\\+=|\\+|\\*=|\\*\\*=|\\*\\*|\\*|\\)|\\(|\\&=|\\&|%=|%|!=))|((|u|RB|rf|Rb|rb|r|U|B|bR|RF|br|F|Fr|b|Br|BR|fr|rB|rF|Rf|R|fR|FR|f)\'[^\\n\'\\\\]*(?:\\\\.[^\\n\'\\\\]*)*\'|(|u|RB|rf|Rb|rb|r|U|B|bR|RF|br|F|Fr|b|Br|BR|fr|rB|rF|Rf|R|fR|FR|f)"[^\\n"\\\\]*(?:\\\\.[^\\n"\\\\]*)*")|\\w+)'

Comment='#[^\\r\\n]*'
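
(For reproducibility, a rough sketch of how such a list can be generated under 3.11; this applies only the first two filters -- comparing against main to drop names that still exist there is a separate step:)

import tokenize  # run under Python 3.11

for name in sorted(dir(tokenize)):
    # Skip private names and everything deliberately exported via __all__.
    if name.startswith('_') or name in tokenize.__all__:
        continue
    print(f"{name}={getattr(tokenize, name)!r}")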

@hugovk commented May 21, 2023

> (Here are all the names that aren't in tokenize.__all__, could be accessed from tokenize on 3.11, don't start with a leading underscore, and don't exist on main.)

Searching the top 5k PyPI projects (for direct uses only, like tokenize.Ignore) doesn't find very many:

Found 28 matching lines in 7 projects
python3 ~/github/misc/cpython/search_pypi_top.py -q . "tokenize\.(any|Binnumber|Comment|ContStr|Decnumber|Double|Double3|endpats|Expfloat|Exponent|Floatnumber|Funny|group|Hexnumber|Ignore|Imagnumber|Intnumber|maybe|Name|Number|Octnumber|PlainToken|Pointfloat|PseudoExtras|PseudoToken|Single|single_quoted|Single3|Special|String|StringPrefix|tabsize|Token|Triple|triple_quoted|Whitespace)\b"
./vowpalwabbit-9.8.0.tar.gz: vowpalwabbit-9.8.0/ext_libs/rapidjson/thirdparty/gtest/googlemock/scripts/generator/cpp/ast.py: type_name.append(tokenize.Token(tokenize.SYNTAX, ' ', 0, 0))
./vowpalwabbit-9.8.0.tar.gz: vowpalwabbit-9.8.0/ext_libs/rapidjson/thirdparty/gtest/googlemock/scripts/generator/cpp/ast.py: last_token = tokenize.Token(tokenize.SYNTAX, ';', 0, 0)
./vowpalwabbit-9.8.0.tar.gz: vowpalwabbit-9.8.0/ext_libs/rapidjson/thirdparty/gtest/googlemock/scripts/generator/cpp/ast.py: name = tokenize.Token(tokenize.NAME, 'operator[]',
./vowpalwabbit-9.8.0.tar.gz: vowpalwabbit-9.8.0/ext_libs/rapidjson/thirdparty/gtest/googlemock/scripts/generator/cpp/ast.py: seq_copy.append(tokenize.Token(tokenize.SYNTAX, '', 0, 0))
./vowpalwabbit-9.8.0.tar.gz: vowpalwabbit-9.8.0/ext_libs/rapidjson/thirdparty/gtest/googlemock/scripts/generator/cpp/ast.py: struct = tokenize.Token(tokenize.NAME, 'struct',
./vowpalwabbit-9.8.0.tar.gz: vowpalwabbit-9.8.0/ext_libs/rapidjson/thirdparty/gtest/googlemock/scripts/generator/cpp/ast.py: if tokens and isinstance(tokens[0], tokenize.Token):
./vowpalwabbit-9.8.0.tar.gz: vowpalwabbit-9.8.0/ext_libs/rapidjson/thirdparty/gtest/googlemock/scripts/generator/cpp/ast.py: internal_token = tokenize.Token(_INTERNAL_TOKEN, _NAMESPACE_POP,
./onnxsim-0.4.26.tar.gz: onnxsim-0.4.26/third_party/onnx-optimizer/third_party/protobuf/third_party/googletest/googlemock/scripts/generator/cpp/ast.py: type_name.append(tokenize.Token(tokenize.SYNTAX, ' ', 0, 0))
./onnxsim-0.4.26.tar.gz: onnxsim-0.4.26/third_party/onnx-optimizer/third_party/protobuf/third_party/googletest/googlemock/scripts/generator/cpp/ast.py: last_token = tokenize.Token(tokenize.SYNTAX, ';', 0, 0)
./onnxsim-0.4.26.tar.gz: onnxsim-0.4.26/third_party/onnx-optimizer/third_party/protobuf/third_party/googletest/googlemock/scripts/generator/cpp/ast.py: name = tokenize.Token(tokenize.NAME, 'operator[]',
./onnxsim-0.4.26.tar.gz: onnxsim-0.4.26/third_party/onnx-optimizer/third_party/protobuf/third_party/googletest/googlemock/scripts/generator/cpp/ast.py: seq_copy.append(tokenize.Token(tokenize.SYNTAX, '', 0, 0))
./onnxsim-0.4.26.tar.gz: onnxsim-0.4.26/third_party/onnx-optimizer/third_party/protobuf/third_party/googletest/googlemock/scripts/generator/cpp/ast.py: struct = tokenize.Token(tokenize.NAME, 'struct',
./onnxsim-0.4.26.tar.gz: onnxsim-0.4.26/third_party/onnx-optimizer/third_party/protobuf/third_party/googletest/googlemock/scripts/generator/cpp/ast.py: if tokens and isinstance(tokens[0], tokenize.Token):
./onnxsim-0.4.26.tar.gz: onnxsim-0.4.26/third_party/onnx-optimizer/third_party/protobuf/third_party/googletest/googlemock/scripts/generator/cpp/ast.py: internal_token = tokenize.Token(_INTERNAL_TOKEN, _NAMESPACE_POP,
./onnxoptimizer-0.3.13.tar.gz: onnxoptimizer-0.3.13/third_party/protobuf/third_party/googletest/googlemock/scripts/generator/cpp/ast.py: type_name.append(tokenize.Token(tokenize.SYNTAX, ' ', 0, 0))
./onnxoptimizer-0.3.13.tar.gz: onnxoptimizer-0.3.13/third_party/protobuf/third_party/googletest/googlemock/scripts/generator/cpp/ast.py: last_token = tokenize.Token(tokenize.SYNTAX, ';', 0, 0)
./onnxoptimizer-0.3.13.tar.gz: onnxoptimizer-0.3.13/third_party/protobuf/third_party/googletest/googlemock/scripts/generator/cpp/ast.py: name = tokenize.Token(tokenize.NAME, 'operator[]',
./onnxoptimizer-0.3.13.tar.gz: onnxoptimizer-0.3.13/third_party/protobuf/third_party/googletest/googlemock/scripts/generator/cpp/ast.py: seq_copy.append(tokenize.Token(tokenize.SYNTAX, '', 0, 0))
./onnxoptimizer-0.3.13.tar.gz: onnxoptimizer-0.3.13/third_party/protobuf/third_party/googletest/googlemock/scripts/generator/cpp/ast.py: struct = tokenize.Token(tokenize.NAME, 'struct',
./onnxoptimizer-0.3.13.tar.gz: onnxoptimizer-0.3.13/third_party/protobuf/third_party/googletest/googlemock/scripts/generator/cpp/ast.py: if tokens and isinstance(tokens[0], tokenize.Token):
./onnxoptimizer-0.3.13.tar.gz: onnxoptimizer-0.3.13/third_party/protobuf/third_party/googletest/googlemock/scripts/generator/cpp/ast.py: internal_token = tokenize.Token(_INTERNAL_TOKEN, _NAMESPACE_POP,
./os_sys-2.1.4.tar.gz: os_sys-2.1.4/edit/editor.py: save_tabsize = tokenize.tabsize
./os_sys-2.1.4.tar.gz: os_sys-2.1.4/edit/editor.py: tokenize.tabsize = self.tabwidth
./os_sys-2.1.4.tar.gz: os_sys-2.1.4/edit/editor.py: tokenize.tabsize = save_tabsize
./pandas-2.0.1.tar.gz: pandas-2.0.1/pandas/_config/config.py: if not re.match("^" + tokenize.Name + "$", k):
./recordlinkage-0.15.tar.gz: recordlinkage-0.15/recordlinkage/config.py: if not bool(re.match('^' + tokenize.Name + '$', k)):
./libcst-0.4.9.tar.gz: libcst-0.4.9/libcst/_parser/base_parser.py: # - Supports our custom Token class, instead of `parso.python.tokenize.Token`.
./libcst-0.4.9.tar.gz: libcst-0.4.9/libcst/_parser/types/token.py: Token = tokenize.Token

Time: 0:00:22.471982
Found 28 matching lines in 7 projects
[Screenshot with highlighted matches]

@terryjreedy added the stdlib (Python modules in the Lib dir) label and removed the topic-IDLE label on May 21, 2023
@AlexWaygood commented May 21, 2023

@terryjreedy, I think the topic-IDLE label makes sense here, no? The potential release blocker, to my mind, is that IDLE is broken on main. From what @pablogsal says in #104719 (comment), even if we add the tokenize.tabsize constant back, IDLE will still be broken. As such, IDLE code will probably have to be changed in order to fix this (unless 6715f91 is reverted).

@terryjreedy commented:

The 101 lines, lines 59 to 159, were deleted. These are all PUBLIC names, with no indication in the module itself that they are private in any sense. Reading the corresponding doc is optional. To me, removing them without deprecation is a gross violation of back-compatibility rules.

I am investigating the IDLE issue.

@pablogsal commented May 21, 2023

> These are all PUBLIC names, with no indication in the module itself that they are private in any sense. Reading the corresponding doc is optional. To me, removing them without deprecation is a gross violation of back-compatibility rules.

They are not documented in the docs nor exposed in __all__, so I don't think it's that clear that they are public, or what they do. I agree with you that this requires discussion, but I would not say that this is a "gross violation of back-compatibility rules". For instance, let's say we restore tabsize, which has the value 8. IDLE is monkeypatching the constant, so what guarantees are there that whatever it is achieving with this will keep working? None of this is tested or documented, so it is impossible for us to guarantee that whatever was working will keep working.

> Reading the corresponding doc is optional.

I disagree (though I understand your position): the docs and __all__ indicate what is supported and what is not. How else should someone know what a function does and what the guarantees are?

In any case, as I mentioned, I think the correct thing here is to restore all these constants, because it doesn't add much maintenance burden and it minimizes breakage, so it's an easy fix. @mgmacias95 will make a PR.

@pablogsal commented:

For now, let's restore the constants to keep projects and IDLE working, but let's keep the discussion open.

@pablogsal commented May 21, 2023

I think IDLE should be fine even without the tabsize monkeypatching. This was the value of tabsize in the tokenize module:

https://github.com/python/cpython/blob/abb32de8c4e9541fbd0c6b14dc937193078e6955/Lib/tokenize.py#LL159C9-L159C9

And this is what IDLE uses (note the comment):

# tabwidth is the display width of a literal tab character.
# CAUTION: telling Tk to use anything other than its default
# tab setting causes it to use an entirely different tabbing algorithm,
# treating tab stops as fixed distances from the left margin.
# Nobody expects this, so for now tabwidth should never be changed.
self.tabwidth = 8 # must remain 8 until Tk is fixed.

There is also this commented-out code:

cpython/Lib/idlelib/editor.py, lines 1538 to 1544 at abb32de:

# XXX this isn't bound to anything -- see tabwidth comments
## def change_tabwidth_event(self, event):
## new = self._asktabwidth()
## if new != self.tabwidth:
## self.tabwidth = new
## self.set_indentation_params(0, guess=0)
## return "break"

So this was monkeypatching 8 with 8, and IDLE doesn't customize this value, so this code

cpython/Lib/idlelib/editor.py, lines 1646 to 1647 at abb32de:

save_tabsize = tokenize.tabsize
tokenize.tabsize = self.tabwidth

should keep working with the new changes, because in reality it does not do anything (and the tokenize interface should be preserved, as the tests pass).

@terryjreedy commented:

I recompiled. Upon opening a non-blank file, I got the traceback and a half-opened blank editor window that would not close.

tabsize is the only 'unapproved' name in tokenize that IDLE looks at. I will immediately remove the monkeypatching. It was added in 2002. change_tabwidth_event was commented out in 2005, making the monkeypatch pretty much obsolete. Since at least 2.7, indent tabs take up 8 spaces. Allowing the Python-coded tokenizer to interpret tabs differently from the C tokenizer makes no sense. I would mark tabsize as deprecated now and remove it as soon as the SC allows.

I think the syntax-element REs should be moved to an RE submodule, with explanations and tests. idlelib.colorizer has a similar but not totally identical collection. I have no idea why the near-duplication exists, what explains the differences, or which versions are better in whatever sense.

@pablogsal commented May 21, 2023

> I recompiled. Upon opening a non-blank file, I got the traceback and a half-opened blank editor window that would not close.

Is this before ffe47cb or after? I can successfully execute ./python -m idlelib Lib/abc.py and it seems to be working.

@AlexWaygood commented:

> I recompiled. Upon opening a non-blank file, I got the traceback and a half-opened blank editor window that would not close.

> Is this before ffe47cb or after? I can successfully execute ./python -m idlelib Lib/abc.py and it seems to be working.

IDLE also appears fixed for me on main, following ffe47cb.

terryjreedy added a commit to terryjreedy/cpython that referenced this issue May 21, 2023
@terryjreedy commented May 21, 2023

IDLE is now working. I made a patch on the tokenpatch branch, but had the problem described on Discord 'off-topic'.

EDIT: CAM G helped fix it up.

terryjreedy added a commit to terryjreedy/cpython that referenced this issue May 21, 2023
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 21, 2023
…nGH-104726)

(cherry picked from commit 0c5e79b)

Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
terryjreedy added a commit that referenced this issue May 21, 2023
…04726) (#104727)

gh-104719: IDLE - delete useless monkeypatch of tokenize (GH-104726)
(cherry picked from commit 0c5e79b)

Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
@AlexWaygood commented:

Reopening, as there are still two things left to do here:

  1. Add a regression test for the IDLE issue. As @pablogsal noted in #104719 (comment), it's a little concerning that this regression didn't cause any test failures.
  2. We're still waiting on @Yhg1s's feedback regarding what we should do with the undocumented constants in the tokenize module -- whether we should expose them as public API, or add a deprecation warning and remove them in 3.14.

@AlexWaygood reopened this May 21, 2023
terryjreedy added a commit to terryjreedy/cpython that referenced this issue May 22, 2023
This class contains all editor references to tokenize module.
@terryjreedy commented:

I decided that it would, in general, be a good idea to have test coverage of references to non-idlelib modules. I have a toktest branch that adds coverage of the class containing the failure and of all tokenize references in editor.py. I will make a PR tomorrow, with or without coverage of the tokenize references in two other idlelib modules.
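
(The general shape of such an existence test, as a sketch; the names checked here are illustrative, and the real test on the branch covers the actual references in editor.py:)

import tokenize
import unittest

class TokenizeNamesExist(unittest.TestCase):
    """Fail loudly if a tokenize name that idlelib relies on disappears."""
    names = ('generate_tokens', 'TokenError', 'NAME', 'OP')

    def test_names_exist(self):
        for name in self.names:
            with self.subTest(name=name):
                self.assertTrue(hasattr(tokenize, name))

if __name__ == '__main__':
    unittest.main()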

@Yhg1s commented May 22, 2023

FWIW, whether to deprecate or document the unused globals isn't really an RM question, but considering the difficulty of emitting warnings for them, I'd rather not do that right now (for 3.13 it's fine, though). Documenting them as for internal use only and/or deprecated can be done after b1.

terryjreedy added a commit that referenced this issue May 24, 2023
Class editor.IndentSearcher contains all editor references to tokenize module.
Module io tokenize reference cover those other modules.

Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 24, 2023
…ythonGH-104767)

Class editor.IndentSearcher contains all editor references to tokenize module.
Module io tokenize reference cover those other modules.

(cherry picked from commit e561c09)

Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
terryjreedy added a commit that referenced this issue May 24, 2023
…H-104767) (#104844)

gh-104719: IDLE - test existence of all tokenize references. (GH-104767)

Class editor.IndentSearcher contains all editor references to tokenize module.
Module io tokenize reference cover those other modules.

(cherry picked from commit e561c09)

Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
terryjreedy added a commit that referenced this issue May 24, 2023
…H-104767) (#104845)

gh-104719: IDLE - test existence of all tokenize references. (GH-104767)

Class editor.IndentSearcher contains all editor references to tokenize module.
Module io tokenize reference cover those other modules.

(cherry picked from commit e561c09)

Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
@terryjreedy commented:

IDLE now has a regression test that will fail if any of the tokenize names it uses change. I suggest we close this, and that someone open a new issue about the now-obsolete constants.

@AlexWaygood commented:

> IDLE now has a regression test that will fail if any of the tokenize names it uses change. I suggest we close this, and that someone open a new issue about the now-obsolete constants.

That sounds good to me. I'll try to open an issue later today.

Thanks @terryjreedy, @pablogsal and @mgmacias95!
