IDLE: Fix pyparse.find_good_parse_start #77170
The call to find_good_parse_start:

    bod = y.find_good_parse_start(self.context_use_ps1,
            self._build_char_in_string_func(startatindex))

sends 3 parameters. And in pyparse.find_good_parse_start(), the signature allows 3. However, the signature is:

    def find_good_parse_start(self, is_char_in_string=None,
                              _synchre=_synchre):

This means that is_char_in_string is bound to self.context_use_ps1 and _synchre is bound to the function returned by self._build_char_in_string_func(startatindex), so the first check in the method tests the wrong value:

    if not is_char_in_string:
        # no clue -- make the caller pass everything
        return None

Here's the commit that changed the signature: |
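A minimal sketch of the binding problem described in the comment above, using a stand-in class rather than the real idlelib.pyparse.Parser. The name FakeParser, the placeholder _synchre value, the dummy in_string function and the placeholder return value 0 are all assumptions made for illustration; only the signature and the early return mirror the code quoted in the comment.

```python
_synchre = "default-synchre"  # placeholder for the compiled regex search


class FakeParser:
    """Stand-in with the same signature as the method quoted above."""

    def find_good_parse_start(self, is_char_in_string=None,
                              _synchre=_synchre):
        # Record what each parameter was actually bound to.
        self.seen = (is_char_in_string, _synchre)
        if not is_char_in_string:
            # no clue -- make the caller pass everything
            return None
        return 0  # placeholder for "a start position was found"


def in_string(index):
    # Hypothetical stand-in for _build_char_in_string_func(startatindex).
    return False


parser = FakeParser()
context_use_ps1 = False  # the 'if not use_ps1' branch, where the call lives

# Buggy call: the flag lands in is_char_in_string, the function in _synchre.
buggy_result = parser.find_good_parse_start(context_use_ps1, in_string)
buggy_seen = parser.seen

# Corrected call: only the in-string function is passed.
fixed_result = parser.find_good_parse_start(in_string)
fixed_seen = parser.seen
```

With the flag false, the buggy call always takes the early return, which is why the extra argument never raised an error.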
To be clear, the signature that got changed in 2005 is the signature for find_good_parse_start ('fgps'), which was previously

    def find_good_parse_start(self, use_ps1, is_char_in_string=None,
                              _synchre=_synchre)

When the use_ps1 parameter was removed, the 'if use_ps1' code was moved to the else: branch of the new 'if not use_ps1: ... else:' in the editor method, but the call in question, moved into the 'if not use_ps1' branch, was not changed. The immediate fix is to remove the extra argument. The similar call in the then new hyperparser module is correct.

    bod = parser.find_good_parse_start(
            editwin._build_char_in_string_func(startatindex))

The erroneous call has not been detected in execution because of this bug:

Both calls to fgps (editor and hyperparser) pass _build_char_in_string_func(initial_start), which unconditionally returns a function. So I think we can delete '=None' and the early return, rather than changing the early return condition. |
If fgps never returns 0, then returning 0 instead of None would allow the following to be simplified:

    if bod is not None or startat == 1:
        break
    parser.set_lo(bod or 0)

If it can (or should) ever return 0, separate from None, I would like to see a test case for that. We could then think about whether or not the loop should break on 0 as well as None.

Perhaps separate issue: the 'if use_ps1' statements in editor and hyperparser, and a couple of lines before each, are nearly identical, and could be factored into a separate editor method that returns a parser instance ready for analysis. It could then be tested in isolation. Both blocks have an explicit set_lo(0) call, which does nothing, and could be removed. |
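To illustrate why 0 and None matter separately in the loop quoted above, here is a rough sketch. The function choose_parse_start, its find_start callable and the context sizes are hypothetical stand-ins, not the editor's actual code; find_start plays the role of set_code() followed by find_good_parse_start().

```python
def choose_parse_start(line_number, context_sizes, find_start):
    """Widen the context until find_start gives an answer.

    find_start(startat) is a hypothetical callable that returns an
    int offset or None.
    """
    bod = None
    startat = 1
    for context in context_sizes:
        startat = max(line_number - context, 1)
        bod = find_start(startat)
        # Same condition as the snippet quoted in the comment.
        if bod is not None or startat == 1:
            break
    # 'bod or 0' maps both None and 0 to 0.
    return startat, bod or 0


# A finder that returns 0 stops the loop at the first, smallest context.
zero_result = choose_parse_start(500, [50, 500, 5000], lambda startat: 0)

# A finder that always returns None widens until startat reaches line 1.
none_result = choose_parse_start(500, [50, 500, 5000], lambda startat: None)
```

In this sketch both runs end with an offset of 0, but they stop at different starting lines, which is the distinction a test case for a genuine 0 return would need to pin down.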
Since _synchre is never passed, it should not be a parameter either. I think we should either limit this to fixing the call, with no unit test added, or expand to 'fix find_good_parse_start and buggy call', with revised tests. It might be interesting to verify that the time it takes to find a better start point is less than the time saved in later analysis. |
I didn't incorporate all the suggestions into the first PR for this so that it could focus on the parameter change and the tests. Oddly enough, I don't know how to show in the tests that the bug fix makes any difference. Finding a "good parse start" is more about parsing faster, but, as the tests show from running all the tests with the
I'd like to address that on another PR that focuses more on refactoring pyparse than this one.
It can return 0, separately from None, as some of the tests on this PR show.
I agree. The duplicated code is bugging me. :-) |
I agree with limiting the scope to the None bug and the faulty call. However, we should think of the None fix as primary, and the new test thereof as the primary test. Fixing the None check exposes the call bug, which we also fix. I changed the title here and on the PR.

As you noted, the new editor TestCase is not directly relevant to testing the double fix except to show that there is no change in indent. The way to do that is to pass None instead of the in-string function. I did that (temporarily!) and the test passes, meaning that the indents are the same. (I do however think some of them are dubious, and I want to mark those cases.)

We could have made editor tests that initially failed by exposing bod as an instance attribute (as we have done before), and including a longer test case for which bod is a positive int. However, bod should remain local. As an alternative, for experimentation, I added print(bod). The values for the patch are 0, None, None, None, None, None, 0, 0, None. I added ' a\n' to 'Block opener - indents +1 level.' and changed the mark, and the 5th 'None' became '4'.

The fact that passing None and _build_char_in_string_func(startatindex) result in the same indents raises the question of whether the call has any benefit in reducing net time after the followup call to get_continuation_type(). Maybe tomorrow I will try to write a good timeit test.

In the meanwhile, to get some idea of how well find_good_parse_start finds good parse starts, I restarted IDLE in a console with the print still added, loaded editor.py, and hit RETURN followed by UNDO in various places. The first non-zero bod, 812, comes with the cursor at the end of 'def _sphinx_version():'; 812 is probably the beginning of the line. After "if __name__ == '__main__':" near the end, 1416. After the final "run(_editor_window)", 1654. The highest value I got in about 10 tries past the middle was 1931. To me, this is pathetically bad.

I tried turning on CodeContext and got the same results where I tested. bod should just be the beginning of the last context line. I am not optimistic about timing results. |
It seems that the purpose of the parsing is to apply the translate, etc., to as few lines as possible. So, it tries to make sure it includes the openers (':' ending lines) and closers (return, pass, etc.) and the beginning of the brackets and continuation lines. The big thing is that it wants to make sure it's not in a string or comment. So, I think the program almost overcompensates for the idea of a 'large string'. It is very complex and very hard to figure out exactly what it is trying to accomplish, even with the comments. Maybe modern computing power (compared to 2000) has made it such that translating a whole source file is quick enough to not need fancy parsing. :-) |
After updating the patch, I noticed that deletion of self.context_use_ps1 from the editor call to y.find_good_parse_start was no longer part of the diff. This is because 3.8 changeset 6bdc4de for bpo-35610 (backported only to 3.7) already did so. The remaining substantive change other than new tests is the removal of the unused default value None in the signature of the function and unused code triggered by 'if not arg'. I don't remember how well I tested that never exiting find...start early left indentations unchanged. This would require comparing behavior to 3.6. It would make more sense to check that the behavior and new tests conform to PEP-8. |
Cheryl, sorry to take so long. I will look at and probably remove the _synchre parameter, on this issue. Feel free to pursue any of the other possible followups. |
Tim, idlelib.pyparse has this definition:

    # Find what looks like the start of a popular statement.
    _synchre = re.compile(r"""
        ^
        [ \t]*
        (?: while
        |   else
        |   def
        |   return
        |   assert
        |   break
        |   class
        |   continue
        |   elif
        |   try
        |   except
        |   raise
        |   import
        |   yield
        )
        \b
    """, re.VERBOSE | re.MULTILINE).search

You are credited with adding 'yield' to David Scherer's original list. Do you know if there is any reason to not add 'if', 'for', and now 'with'? |
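To make the question about 'if', 'for' and 'with' concrete, the sketch below runs the _synchre pattern quoted in the last comment against a small sample. The sample text is made up for illustration; the pattern is copied from the comment, with layout whitespace that re.VERBOSE ignores.

```python
import re

# Pattern as quoted in the thread.
_synchre = re.compile(r"""
    ^
    [ \t]*
    (?: while
    |   else
    |   def
    |   return
    |   assert
    |   break
    |   class
    |   continue
    |   elif
    |   try
    |   except
    |   raise
    |   import
    |   yield
    )
    \b
""", re.VERBOSE | re.MULTILINE).search

# Hypothetical sample: lines opening with 'if', 'for' and 'with'
# come before the first keyword that is in the list.
sample = (
    "if ready:\n"
    "    for item in items:\n"
    "        with lock:\n"
    "            total += item\n"
    "    return total\n"
)

match = _synchre(sample)
# The line on which the first synchronization point was found.
matched_line = sample[match.start():].splitlines()[0]

# Text made only of 'if', 'for' and 'with' lines yields no match at all.
no_match = _synchre("if x:\nfor y in z:\nwith f:\n")
```

Here the search skips the 'if', 'for' and 'with' lines and first synchronizes on the 'return' line, which is the behavior the question asks about changing.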