Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTMLParser: undocumented not implemented method #76025

Closed
srittau mannequin opened this issue Oct 23, 2017 · 10 comments
Closed

HTMLParser: undocumented not implemented method #76025

srittau mannequin opened this issue Oct 23, 2017 · 10 comments
Assignees
Labels
3.10 stdlib type-bug

Comments

@srittau
Copy link
Mannequin

@srittau srittau mannequin commented Oct 23, 2017

BPO 31844
Nosy @srittau, @ezio-melotti, @berkerpeksag, @csabella
PRs
  • #8562
  • #21504
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ezio-melotti'
    closed_at = <Date 2020-07-16.06:39:25.857>
    created_at = <Date 2017-10-23.08:27:23.318>
    labels = ['type-bug', 'library', '3.10']
    title = 'HTMLParser: undocumented not implemented method'
    updated_at = <Date 2020-07-16.06:39:25.857>
    user = 'https://github.com/srittau'

    bugs.python.org fields:

    activity = <Date 2020-07-16.06:39:25.857>
    actor = 'berker.peksag'
    assignee = 'ezio.melotti'
    closed = True
    closed_date = <Date 2020-07-16.06:39:25.857>
    closer = 'berker.peksag'
    components = ['Library (Lib)']
    creation = <Date 2017-10-23.08:27:23.318>
    creator = 'srittau'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 31844
    keywords = ['patch']
    message_count = 10.0
    messages = ['304782', '304783', '305303', '305306', '322662', '322672', '323968', '371500', '373745', '373746']
    nosy_count = 5.0
    nosy_names = ['srittau', 'ezio.melotti', 'berker.peksag', 'cheryl.sabella', 'William Ayd']
    pr_nums = ['8562', '21504']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue31844'
    versions = ['Python 3.10']

    @srittau
    Copy link
    Mannequin Author

    @srittau srittau mannequin commented Oct 23, 2017

    HTMLParser derives from _markupbase.ParserBase, which has the following method:

    class HTMLParser:
    ...
    
        def error(self, message):
            raise NotImplementedError(
                "subclasses of ParserBase must override error()")

    HTMLParser does not implement this method and the documentation for HTMLParser (https://docs.python.org/3.6/library/html.parser.html) does not mention that its sub-classes need to override it.

    I am not sure whether this is a documentation omission, whether HTMLParser should provide an (empty?) implementation, or whether ParserBase should not raise a NotImplementedError (to make linters happy).

    @srittau
    Copy link
    Mannequin Author

    @srittau srittau mannequin commented Oct 23, 2017

    The quoted code above should have used ParserBase:

    class ParserBase:
    ...
    
        def error(self, message):
            raise NotImplementedError(
                "subclasses of ParserBase must override error()")

    @WilliamAyd
    Copy link
    Mannequin

    @WilliamAyd WilliamAyd mannequin commented Oct 31, 2017

    Would we be open to setting the meta class of the ParserBase to ABCMeta and setting error as an abstract method? That at the very least would make the expectation clearer for subclasses.

    I haven’t contributed to Python before but am open to this as a first attempt if the direction makes sense.

    @WilliamAyd
    Copy link
    Mannequin

    @WilliamAyd WilliamAyd mannequin commented Oct 31, 2017

    And assuming that subclass requirement is intentional we could add an optional keyword argument to the HTMLParser that indicates what to do with errors, much like how encoding issues are handled within codecs. For backwards compatibility it can default to ignore, but fail and warn could be two alternate approaches that the error method could account for

    @serhiy-storchaka serhiy-storchaka added stdlib type-bug labels Oct 31, 2017
    @berkerpeksag
    Copy link
    Member

    @berkerpeksag berkerpeksag commented Jul 30, 2018

    HTMLParser.error() method was deprecated in Python 3.4 (88ebfb1#diff-1a7486df8279dbac7f20abd487947845R157) and removed in Python 3.5 (73a4359#diff-1a7486df8279dbac7f20abd487947845L171)

    _markupbase is a private and undocumented module and its only user is HTMLParser (sgmllib was removed from the stdlib in 2008) Since we already have removed HTMLParser.error(), I think we can just remove _markupbase.ParserBase.error() without a deprecation period.

    @srittau
    Copy link
    Mannequin Author

    @srittau srittau mannequin commented Jul 30, 2018

    Good call. Maybe it's actually time to retire _markupbase and merge ParserBase into HTMLParser.

    @berkerpeksag
    Copy link
    Member

    @berkerpeksag berkerpeksag commented Aug 23, 2018

    After triaging bpo-34480, I realized that we can't simply remove the error() method because the _markupbase.ParserBase() class still uses it. I've just closed PR 8562.

    @ezio-melotti ezio-melotti self-assigned this Aug 25, 2018
    @csabella
    Copy link
    Contributor

    @csabella csabella commented Jun 14, 2020

    @berker.peksag's last comment was he closed the PR on 23 August 2018. However, he reopened it on 6 January 2020 as @ezio.melotti mentioned that they are both needed.

    The PR for this issue is waiting to be re-reviewed by Ezio.

    @csabella csabella added 3.10 and removed 3.8 labels Jun 14, 2020
    @berkerpeksag
    Copy link
    Member

    @berkerpeksag berkerpeksag commented Jul 16, 2020

    New changeset e34bbfd by Berker Peksag in branch 'master':
    bpo-31844: Remove _markupbase.ParserBase.error() (GH-8562)
    e34bbfd

    @berkerpeksag
    Copy link
    Member

    @berkerpeksag berkerpeksag commented Jul 16, 2020

    New changeset d4d127f by Berker Peksag in branch 'master':
    bpo-31844: Move whatsnew note to 3.10.rst (GH-21504)
    d4d127f

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 stdlib type-bug
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants