HTMLParser.handle_data may be invoked although HTMLParser.reset was invoked #70398

Open
Hibou57 mannequin opened this issue Jan 26, 2016 · 7 comments
Labels
stdlib Python modules in the Lib dir

Comments

@Hibou57
Mannequin

Hibou57 mannequin commented Jan 26, 2016

BPO 26210
Nosy @ezio-melotti, @zhangyangyu

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2016-01-26.21:10:08.489>
labels = ['library']
title = '`HTMLParser.handle_data` may be invoked although `HTMLParser.reset` was invoked'
updated_at = <Date 2016-01-27.10:11:03.829>
user = 'https://bugs.python.org/Hibou57'

bugs.python.org fields:

activity = <Date 2016-01-27.10:11:03.829>
actor = 'Hibou57'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2016-01-26.21:10:08.489>
creator = 'Hibou57'
dependencies = []
files = []
hgrepos = []
issue_num = 26210
keywords = []
message_count = 7.0
messages = ['258973', '258992', '259000', '259002', '259003', '259009', '259011']
nosy_count = 3.0
nosy_names = ['ezio.melotti', 'Hibou57', 'xiang.zhang']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue26210'
versions = ['Python 3.5']

@Hibou57
Mannequin Author

Hibou57 mannequin commented Jan 26, 2016

HTMLParser.handle_data may be invoked although HTMLParser.reset was invoked. This occurs at least when HTMLParser.reset was invoked during HTMLParser.handle_endtag.

According to the documentation, HTMLParser.reset discards all data, so it should stop the parser immediately.
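A minimal sketch of the situation reported here, assuming the default convert_charrefs=True. The class name ResetInEndTag and the helper data_after_reset are hypothetical names chosen for illustration; only HTMLParser, feed, reset, handle_endtag and handle_data come from the standard library.

```python
from html.parser import HTMLParser


class ResetInEndTag(HTMLParser):
    """Hypothetical subclass: calls reset() from inside handle_endtag."""

    def __init__(self):
        super().__init__()
        self.reset_called = False
        self.data_after_reset = []

    def handle_endtag(self, tag):
        # Discard the parser state in the middle of a feed() call.
        self.reset()
        self.reset_called = True

    def handle_data(self, data):
        # Record any text that is still delivered after reset().
        if self.reset_called:
            self.data_after_reset.append(data)


def data_after_reset(html):
    """Feed html once and return the text chunks seen after reset()."""
    parser = ResetInEndTag()
    parser.feed(html)
    return parser.data_after_reset
```

With an input such as "<p>one</p>two", the trailing text is still passed to handle_data after the reset, which is the behaviour this report describes. Inputs with further tags after the reset may fail in other ways, depending on the Python version.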

As an aside, it is strange that HTMLParser.reset is invoked during object creation: it calls a method on an object that may not be fully initialized yet, which matters for derived classes.
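A short sketch of why the call to reset during construction matters for derived classes. Fragile, Robust and construct_fragile are hypothetical names; the only assumption about the standard library is that HTMLParser.__init__ calls self.reset().

```python
from html.parser import HTMLParser


class Fragile(HTMLParser):
    """Overrides reset() but creates its state after super().__init__()."""

    def __init__(self):
        super().__init__()      # calls self.reset() before self.items exists
        self.items = []

    def reset(self):
        super().reset()
        self.items.clear()      # fails while __init__ is still running


class Robust(HTMLParser):
    """Same idea, but the state exists before the base class calls reset()."""

    def __init__(self):
        self.items = []
        super().__init__()

    def reset(self):
        super().reset()
        self.items.clear()


def construct_fragile():
    """Return the name of the exception raised by Fragile(), or None."""
    try:
        Fragile()
    except AttributeError as exc:
        return type(exc).__name__
    return None
```

Creating the subclass attributes before calling super().__init__() is one way to sidestep the problem; it does not change the fact that reset runs on a partially constructed object.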

@Hibou57 Hibou57 mannequin added the stdlib Python modules in the Lib dir label Jan 26, 2016
@zhangyangyu
Member

reset just sets some attributes back to their initial state; it does not control the parsing process. Reading the goahead function, even if reset is called in handle_endtag and all data is discarded, it is still possible for the parsing loop to move forward.

@Hibou57
Mannequin Author

Hibou57 mannequin commented Jan 27, 2016

The documentation says:

Reset the instance. Loses all unprocessed data.

How can parsing go ahead when all unprocessed data is lost? The phrase “Loses all unprocessed data” is what made me believe that reset stops the parser.

Maybe the documentation is unclear.

By the way, if reset does not stop the parser, then a stop method is missing. I searched for one, and as there was nothing else and I could not imagine that the parser cannot be stopped, I thought reset was the way to stop it.

@zhangyangyu
Member

Actually it does move forward: goahead first stores a "copy" of the initial self.rawdata in a local variable and uses that to control the flow. If you change self.rawdata while parsing, for example by calling reset, goahead does not notice it. But the parse_* methods do, so the two views of the data conflict.

I think it's not proper to change self.rawdata when parsing. You can easily get various errors by doing that.
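The mechanism described above can be modelled with a toy class. This is a deliberately simplified stand-in, not the actual html.parser code: ToyParser and run_toy are hypothetical, and "!" is an arbitrary marker that triggers a reset in the middle of the loop.

```python
class ToyParser:
    """Simplified model of a parser whose main loop keeps a local reference."""

    def __init__(self):
        self.rawdata = ""
        self.handled = []

    def reset(self):
        # Rebinds the attribute; any local reference taken earlier is unaffected.
        self.rawdata = ""

    def feed(self, data):
        self.rawdata += data
        self.goahead()

    def goahead(self):
        rawdata = self.rawdata          # local reference, taken once
        for ch in rawdata:
            if ch == "!":
                self.reset()            # simulates reset() called from a handler
            # Record the character and what self.rawdata looks like right now.
            self.handled.append((ch, self.rawdata))
        self.rawdata = ""


def run_toy(text):
    """Feed text to a fresh ToyParser and return the recorded pairs."""
    parser = ToyParser()
    parser.feed(text)
    return parser.handled
```

In this model the loop keeps iterating over its local string after the reset, while any code that reads self.rawdata sees an empty string. That mismatch is the kind of conflict described for goahead and the parse_* methods.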

@Hibou57
Mannequin Author

Hibou57 mannequin commented Jan 27, 2016

Thanks Xiang, for the clear explanations.

So an error should be raised when reset is invoked at a point where it should not be. That leaves the question of how to stop the parser: should an exception be raised and caught at an outer invocation level? Something like raising StopIteration? (I don't enjoy using exceptions for flow control, but that seems to be the Python way, cheese).

@zhangyangyu
Member

Hmm, I don't know whether I am right or not. Let's wait for a core member to clarify. If I am wrong, I am quite sorry.

I don't think invoking reset while parsing should raise an error (and I don't know how to achieve that). When to invoke a subroutine is determined by the programmer. You can always put a well-written subroutine in the wrong place and cause an error. And I don't see how to stop the process either.

@Hibou57
Mannequin Author

Hibou57 mannequin commented Jan 27, 2016

And I don't see how to stop the process either.

I just did it with raise StopIteration, caught at the proper place (in the procedure that invokes feed and close), and it seems to be fine: I see no more strange behaviour. At least, I cannot see a cleaner way.

Now reset is invoked only after parsing has ended (so that the parser is ready for a next round).
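A minimal sketch of this workaround. It swaps StopIteration for a dedicated exception class, StopParsing, which is a hypothetical name and not part of html.parser; FirstParagraph and first_paragraph are likewise illustrative.

```python
from html.parser import HTMLParser


class StopParsing(Exception):
    """Hypothetical signal used to abandon parsing from inside a handler."""


class FirstParagraph(HTMLParser):
    """Collects the text of the first <p> element, then stops."""

    def __init__(self):
        self.chunks = []
        self.in_p = False
        super().__init__()

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True

    def handle_data(self, data):
        if self.in_p:
            self.chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "p":
            raise StopParsing   # unwinds out of feed()


def first_paragraph(html):
    """Return the text of the first paragraph in html."""
    parser = FirstParagraph()
    try:
        parser.feed(html)
        parser.close()
    except StopParsing:
        pass                    # parsing was stopped on purpose
    finally:
        parser.reset()          # only after parsing has ended
    return "".join(parser.chunks)
```

The exception is caught in the function that calls feed and close, and reset is called only once parsing is over, as described in the comment. A dedicated exception class is used here so that an unrelated StopIteration is not swallowed by accident; raising StopIteration itself should work the same way in this non-generator code.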

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022