Request: getpos() for sgmllib #39602

Closed
d98dzone mannequin opened this issue Nov 25, 2003 · 6 comments
Labels
easy stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@d98dzone

d98dzone mannequin commented Nov 25, 2003

BPO 849097
Nosy @devdanzin
Files
  • diff.txt: Unix diff of the updated version against the CVS version (1.46)
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2010-08-22.10:44:01.773>
    created_at = <Date 2003-11-25.16:47:35.000>
    labels = ['easy', 'type-feature', 'library']
    title = 'Request: getpos() for sgmllib'
    updated_at = <Date 2010-08-22.10:44:01.771>
    user = 'https://bugs.python.org/d98dzone'

    bugs.python.org fields:

    activity = <Date 2010-08-22.10:44:01.771>
    actor = 'BreamoreBoy'
    assignee = 'none'
    closed = True
    closed_date = <Date 2010-08-22.10:44:01.773>
    closer = 'BreamoreBoy'
    components = ['Library (Lib)']
    creation = <Date 2003-11-25.16:47:35.000>
    creator = 'd98dzone'
    dependencies = []
    files = ['1117']
    hgrepos = []
    issue_num = 849097
    keywords = ['patch', 'easy']
    message_count = 6.0
    messages = ['19144', '19145', '19146', '81883', '114300', '114669']
    nosy_count = 4.0
    nosy_names = ['nnorwitz', 'd98dzone', 'ajaksu2', 'BreamoreBoy']
    pr_nums = []
    priority = 'normal'
    resolution = 'out of date'
    stage = 'test needed'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue849097'
    versions = ['Python 3.2']

    @d98dzone

    d98dzone mannequin commented Nov 25, 2003

    While working on my master's thesis I discovered the need
    for a working getpos() in sgmllib.py. As it stands, you can
    call it, since it is inherited from markupbase.py, but it
    always returns (1, 0) because the position is never updated.
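
To illustrate why the answer never changes: getpos() only reports a line counter and a column counter, and those counters move only when updatepos() is called for each consumed span. The sketch below is a standalone re-creation of that bookkeeping, written for illustration; the class name `PositionTracker` is hypothetical and this is not the library's own code.

```python
class PositionTracker:
    """Hypothetical stand-in for the line/offset bookkeeping behind getpos()."""

    def __init__(self, rawdata):
        self.rawdata = rawdata
        self.lineno = 1   # lines are counted from 1
        self.offset = 0   # columns are counted from 0

    def getpos(self):
        """Return the current (line, column) pair."""
        return self.lineno, self.offset

    def updatepos(self, i, j):
        """Advance the position past rawdata[i:j] and return j."""
        if i >= j:
            return j
        nlines = self.rawdata.count("\n", i, j)
        if nlines:
            self.lineno += nlines
            last_newline = self.rawdata.rindex("\n", i, j)
            self.offset = j - (last_newline + 1)
        else:
            self.offset += j - i
        return j


tracker = PositionTracker("<p>\nhello</p>")
before = tracker.getpos()       # nothing consumed yet
tracker.updatepos(0, 3)         # consume "<p>"
after_tag = tracker.getpos()
tracker.updatepos(3, 9)         # consume "\nhello"
after_text = tracker.getpos()
print(before, after_tag, after_text)
```

If a parser never calls updatepos(), every getpos() call returns the initial (1, 0), which matches the behaviour described above.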

    To fix this, the goahead function needs to change.
    This is my own implementation of that change, partly
    influenced by the "sister" goahead function in
    HTMLParser.py:


    def goahead(self, end):
        rawdata = self.rawdata
        i = 0
        k = 0
        n = len(rawdata)
        tmp = 0
        while i < n:
            if self.nomoretags:
                self.handle_data(rawdata[i:n])
                i = n
                break
            match = interesting.search(rawdata, i)
            if match: j = match.start()
            else: j = n
            if i < j:
                self.handle_data(rawdata[i:j])
                tmp = self.updatepos(i, j)
            i = j
            if i == n: break
            startswith = rawdata.startswith
            if rawdata[i] == '<':
                if starttagopen.match(rawdata, i):
                    if self.literal:
                        self.handle_data(rawdata[i])
                        tmp = self.updatepos(i, i+1)
                        i = i+1
                        continue
                    k = self.parse_starttag(i)
                    if k < 0: break
                    tmp = self.updatepos(i, k)
                    i = k
                    continue
                if rawdata.startswith("</", i):
                    k = self.parse_endtag(i)
                    if k < 0: break
                    tmp = self.updatepos(i, k)
                    i = k
                    self.literal = 0
                    continue
                if self.literal:
                    if n > (i + 1):
                        self.handle_data("<")
                        # Record the consumed "<" before advancing i.
                        tmp = self.updatepos(i, i+1)
                        i = i+1
                    else:
                        # incomplete
                        break
                    continue
                if rawdata.startswith("<!--", i):
                    # Strictly speaking, a comment is --.*--
                    # within a declaration tag <!...>.
                    # This should be removed,
                    # and comments handled only in parse_declaration.
                    k = self.parse_comment(i)
                    if k < 0: break
                    tmp = self.updatepos(i, k)
                    i = k
                    continue
                if rawdata.startswith("<?", i):
                    k = self.parse_pi(i)
                    if k < 0: break
                    # parse_pi returns a length, so the span ends at i+k.
                    tmp = self.updatepos(i, i+k)
                    i = i+k
                    continue
                if rawdata.startswith("<!", i):
                    # This is some sort of declaration; in "HTML as
                    # deployed," this should only be the document type
                    # declaration ("<!DOCTYPE html...>").
                    k = self.parse_declaration(i)
                    if k < 0: break
                    tmp = self.updatepos(i, k)
                    i = k
                    continue
                tmp = self.updatepos(i, k)
            elif rawdata[i] == '&':
                if self.literal:
                    self.handle_data(rawdata[i])
                    #tmp = self.updatepos(i,i+1)#added
                    i = i+1
                    continue
                match = charref.match(rawdata, i)
                if match:
                    name = match.group()[2:-1]
                    self.handle_charref(name)
                    k = match.end()
                    if not startswith(';', k-1):
                        k = k - 1
                    tmp = self.updatepos(i, k)
                    i = k
                    continue
                match = entityref.match(rawdata, i)
                if match:
                    name = match.group(1)
                    self.handle_entityref(name)
                    k = match.end()
                    if not startswith(';', k-1):
                        k = k - 1
                    tmp = self.updatepos(i, k)
                    i = k
                    continue
            else:
                self.error('neither < nor & ??')
            # We get here only if incomplete matches but
            # nothing else
            match = incomplete.match(rawdata, i)
            if not match:
                self.handle_data(rawdata[i])
                i = i+1
                continue
            j = match.end(0)
            if j == n:
                break # Really incomplete
            self.handle_data(rawdata[i:j])
            i = j
        # end while
        if end and i < n:
            self.handle_data(rawdata[i:n])
            tmp = self.updatepos(i, n)
            i = n
        self.rawdata = rawdata[i:]
        # XXX if end: check for empty stack

    # Extensions for the DOCTYPE scanner:
    _decl_otherchars = '='
    

    The major difference is the added updatepos calls. It
    seems to work fine, or at least it has worked fine for
    me so far.
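
The pattern the change relies on is "account for the span [i, k), then advance i to k". The following is a much-simplified sketch of that loop shape, not sgmllib's actual scanner: the `TOKEN` regex and the `scan` function are hypothetical, and they split input only into tags and runs of text.

```python
import re

# Hypothetical tokenizer: either a complete tag or a run of non-"<" text.
TOKEN = re.compile(r"<[^>]*>|[^<]+")


def scan(rawdata):
    """Return (token, (line, column)) pairs, tracking position per token."""
    lineno, offset = 1, 0
    events = []
    i = 0
    while i < len(rawdata):
        match = TOKEN.match(rawdata, i)
        # A lone "<" with no closing ">" is consumed as a single character.
        k = match.end() if match else i + 1
        events.append((rawdata[i:k], (lineno, offset)))
        # Update the position for the span [i, k) *before* advancing i.
        nlines = rawdata.count("\n", i, k)
        if nlines:
            lineno += nlines
            offset = k - (rawdata.rindex("\n", i, k) + 1)
        else:
            offset += k - i
        i = k
    return events


events = scan("<a>\nx<b>")
print(events)
```

If the position update used stale indices, or ran after `i` had already moved, the span would be empty or wrong and the reported column would drift. That is why each branch pairs its update with the same `i` and `k` it is about to consume.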

    @d98dzone d98dzone mannequin added stdlib Python modules in the Lib dir labels Nov 25, 2003
    @nnorwitz

    nnorwitz mannequin commented Nov 25, 2003

    Can you please post a context diff against the version in
    CVS as an attachment? Formatting is not preserved when
    viewing through SF. Thanks.

    @d98dzone

    d98dzone mannequin commented Dec 2, 2003

    Added an attachment with the difference from the current file
    version. It has three parts. The first is just updatepos
    inserted at the correct places in the function goahead. The
    second is the part of goahead that handles the & characters;
    I had a hard time making it work with the current model and
    changed it to a version inspired by the same part of the
    goahead function in HTMLParser.py. The last is the printouts
    in the test function to check whether the function performs OK.

    @akuchling akuchling added type-feature A feature request or enhancement labels Feb 19, 2008
    @devdanzin

    devdanzin mannequin commented Feb 13, 2009

    Closed bpo-868908 as a duplicate of this one.

    @devdanzin devdanzin mannequin added easy labels Apr 22, 2009
    @BreamoreBoy

    BreamoreBoy mannequin commented Aug 18, 2010

    Anyone interested in this? I found the patch unreadable but YMMV.

    @BreamoreBoy

    BreamoreBoy mannequin commented Aug 22, 2010

    sgmllib has been deprecated since 2.6 and has been removed from py3k.
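
For readers on Python 3, where sgmllib is gone, the "sister" parser mentioned earlier in this thread lives on as `html.parser.HTMLParser`, and its getpos() does track the position. A minimal sketch, assuming a hypothetical subclass named `TagLocator` that records where each start tag begins:

```python
from html.parser import HTMLParser


class TagLocator(HTMLParser):
    """Hypothetical helper: remember the (line, column) of every start tag."""

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        # getpos() reports the position of the tag currently being handled.
        self.positions.append((tag, self.getpos()))


locator = TagLocator()
locator.feed("<html>\n  <body>\n<p>text</p>")
locator.close()
print(locator.positions)
```

Lines are numbered from 1 and columns from 0, so a tag at the very start of the input is reported at (1, 0).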

    @BreamoreBoy BreamoreBoy mannequin closed this as completed Aug 22, 2010
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022