Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of getPhases #272

Closed
wants to merge 2 commits into from
Closed

Conversation

gsnedders
Copy link
Member

With this we no longer include "process the token using the rules for" phases in the debug log.

This also needs perf review given it touches mainLoop.

(If you want to view the diff, append ?w=1 to the URL to ignore whitespace, otherwise the reindenting of the phases just dominates.)

@codecov-io
Copy link

codecov-io commented Jul 13, 2016

Current coverage is 90.33%

Merging #272 into master will decrease coverage by 0.10%

@@             master       #272   diff @@
==========================================
  Files            51         51          
  Lines          6926       6904    -22   
  Methods           0          0          
  Messages          0          0          
  Branches       1332       1329     -3   
==========================================
- Hits           6264       6237    -27   
- Misses          501        506     +5   
  Partials        161        161          

Powered by Codecov. Last updated by 945911b...a120557

@willkg
Copy link
Contributor

willkg commented Oct 3, 2017

@gsnedders Is this important? There's no explanatory issue, so I'm not sure what the value is here and whether we should spend time on it for 1.0 or not.

@gsnedders
Copy link
Member Author

@willkg Reduction in complexity and potentially performance gains by reducing indirection.

@willkg
Copy link
Contributor

willkg commented Oct 5, 2017

Ok. I'm going to push this off until after 1.0, then.

@codecov-commenter
Copy link

codecov-commenter commented Jun 24, 2020

Codecov Report

Merging #272 into master will decrease coverage by 0.03%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #272      +/-   ##
==========================================
- Coverage   91.07%   91.03%   -0.04%     
==========================================
  Files          50       50              
  Lines        7044     7016      -28     
  Branches     1341     1337       -4     
==========================================
- Hits         6415     6387      -28     
  Misses        475      475              
  Partials      154      154              
Impacted Files Coverage Δ
html5lib/_utils.py 81.25% <ø> (-1.71%) ⬇️
html5lib/html5parser.py 98.41% <ø> (-0.03%) ⬇️
html5lib/tests/test_parser2.py 100.00% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f4646e6...db7310d. Read the comment docs.

This added a fair bit of complexity, and notable made the Phase classes
dynamically generated.

However, by doing this, we no longer include "process the
token using the rules for" phases in the debug log.
Copy link
Contributor

@jayaddison jayaddison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB: I've no write or commit access here, but to me these changes look good.

From local checkout & review:

  • Tests pass
  • Performance impact (for parser benchmarks, since that's what this code affects): neutral (cpython 3.9.1)

Although there doesn't seem to be a performance improvement from this, in my opinion it does work towards making the code simpler and opens up future refactoring and cleanup work that is likely to improve performance in more significant ways.

@@ -201,6 +188,9 @@ def mainLoop(self):
DoctypeToken = tokenTypes["Doctype"]
ParseErrorToken = tokenTypes["ParseError"]

type_names = {value: key for key, value in tokenTypes.items()}
debug = self.debug
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gsnedders Is there a reason to bring tokenTypes and self.debug into local method variables here?

At L225 we could compare using if self.debug and similarly at L226 we could use info = {"type": tokenTypes[type]}.

(am guessing it's possibly an artifact of some IDE-based refactoring? Hopefully a straightforward cleanup either way)

else:
return type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice :) I get some minor worries about difficult-to-debug future issues when built-in Python keywords like type get used as variables / returned as values. Removing this is a nice improvement 👍

@@ -68,7 +68,6 @@ def test_debug_log():
('dataState', 'InBodyPhase', 'InBodyPhase', 'processStartTag', {'name': 'p', 'type': 'StartTag'}),
('dataState', 'InBodyPhase', 'InBodyPhase', 'processCharacters', {'type': 'Characters'}),
('dataState', 'InBodyPhase', 'InBodyPhase', 'processStartTag', {'name': 'script', 'type': 'StartTag'}),
('dataState', 'InBodyPhase', 'InHeadPhase', 'processStartTag', {'name': 'script', 'type': 'StartTag'}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! How did you discover this duplicate entry, out of interest?

# pylint:enable=unused-argument


_phases = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core of the improvement - looks good generally. What do you think about going one step further and moving this to the initialization of the HTMLParser's self.phases at L121?

@ambv ambv mentioned this pull request Mar 3, 2023
@ambv
Copy link
Member

ambv commented Feb 21, 2024

This was merged as #567.

@ambv ambv closed this Feb 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants