Many HTML entities don't work #43

Closed
ByCzech opened this issue Jul 3, 2019 · 5 comments

ByCzech commented Jul 3, 2019

Many HTML entities, e.g. &hellip;, cause a KeyError:

Try:
kajiki.XMLTemplate(u'<div>&copy; &hellip;</div>')

Result:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    Template = kajiki.XMLTemplate(u'<div>&copy; &hellip;</div>')
  File "/usr/lib/python2.7/dist-packages/kajiki/xml_template.py", line 52, in XMLTemplate
    doc = _Parser(filename, source).parse()
  File "/usr/lib/python2.7/dist-packages/kajiki/xml_template.py", line 629, in parse
    parser.parse(source)
  File "/usr/lib/python2.7/xml/sax/expatreader.py", line 110, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.7/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.7/xml/sax/expatreader.py", line 213, in feed
    self._parser.Parse(data, isFinal)
  File "/usr/lib/python2.7/xml/sax/expatreader.py", line 416, in skipped_entity_handler
    self._cont_handler.skippedEntity(name)
  File "/usr/lib/python2.7/dist-packages/kajiki/xml_template.py", line 675, in skippedEntity
    return self.characters(html5[name])
KeyError: u'hellip'

@philippkraft

The same is true for Greek letters and math symbols.

@philippkraft

The problem seems to be that the entity translation dict kajiki.entities.html5 contains some keys both with and without a trailing semicolon, while most entities are stored only with a trailing semicolon. The code referenced in @ByCzech's traceback looks up the key without a trailing semicolon. A cheap fix would be to extend the dict so that every key ending in a semicolon also gets a copy of its value under the same key without the semicolon.
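The mismatch can be checked against the standard-library table html.entities.html5. This is a minimal sketch that assumes kajiki.entities.html5 mirrors that table; the names copy_keys and hellip_keys are only illustrative.

```python
from html.entities import html5

# Legacy entities such as &copy; are stored both with and without the semicolon.
copy_keys = ("copy" in html5, "copy;" in html5)

# Most entities, including &hellip;, are stored only with the trailing semicolon.
hellip_keys = ("hellip" in html5, "hellip;" in html5)

print(copy_keys)    # (True, True)
print(hellip_keys)  # (False, True)
```

Because the SAX handler receives the bare name (hellip), the lookup only succeeds for the legacy entities that also have a semicolon-free key.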

The following monkey patch works for me:

import kajiki

# Add a semicolon-free alias for every entity that is only stored with a trailing ';'.
kajiki.entities.html5.update(
    {k[:-1]: v
     for k, v in kajiki.entities.html5.items()
     if k.endswith(';') and k[:-1] not in kajiki.entities.html5
})

from kajiki import XMLTemplate
t = XMLTemplate('<xml>&hellip;</xml>')
print(t().render())

If this quick fix is OK for now, I can of course provide a PR.

@amol-
Collaborator

amol- commented Nov 25, 2019

35e916d should provide a reasonable solution to the problem without having to maintain our own delta from https://github.com/python/cpython/blob/master/Lib/html/entities.py#L264
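For readers who want to see the general idea of resolving entities against the unmodified CPython table, here is a minimal sketch. It is not necessarily what commit 35e916d does; resolve_entity is a hypothetical helper, and the sketch assumes the parser passes the entity name without "&" and ";".

```python
from html.entities import html5


def resolve_entity(name):
    """Look up an entity name as reported by the SAX parser (no '&', no ';')."""
    try:
        # Legacy names such as 'copy' exist without the semicolon.
        return html5[name]
    except KeyError:
        # Everything else is only stored as 'name;'.
        return html5[name + ";"]


print(resolve_entity("copy"))    # ©
print(resolve_entity("hellip"))  # …
```

A lookup with a fallback like this leaves the table itself untouched, so no extra keys have to be kept in sync with upstream.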

@amol- amol- closed this as completed Nov 25, 2019
@philippkraft

Thank you for the fast solution. For now I am going with the following. Do you think the version estimate for when the fix lands in a PyPI release is OK?

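# Compare numerically: as plain strings, '0.10' would sort before '0.9'.
version = tuple(int(part) for part in kajiki.__version__.split('.')[:2])
if version < (0, 9):
    kajiki.entities.html5.update(
        {k[:-1]: v
         for k, v in kajiki.entities.html5.items()
         if k.endswith(';') and k[:-1] not in kajiki.entities.html5
    })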

@amol-
Collaborator

amol- commented Nov 26, 2019

Released 0.8.2 https://pypi.org/project/Kajiki/0.8.2/

@amol- amol- reopened this Nov 26, 2019
@amol- amol- closed this as completed Nov 26, 2019