svg double hyphen in plot title -- #1967

Merged
merged 1 commit into from May 3, 2013

Projects

None yet

3 participants

@cirosantilli

a double hyphen in plot title breaks the svg output because the double hyphen -- gets included in a xml comment, breaking it

minimal example:

import matplotlib.pyplot as plt
plt.plot([0,1])
plt.title('--')
plt.savefig( 'a.svg', format='svg' )
plt.savefig( 'a.png', format='png' )

now try and open svg output it with firefox and it will point at the error line, 479 in version 1.2.1. which contains:

<!-- -- -->

but xml comments cannot contain -- http://stackoverflow.com/questions/10842131/xml-comments-and

I could not find in the stdlib a standard way to escape/unescape double hyphens, but if anyone does, we should use it.

if an standard way does not exist, my proposed solution is: replace -- with something else in the svg comments. I suggest

  • replace all occurrences of -- with -.-, where . is any literal character
  • replace all occurrences of -.- with -..-
  • -..- with -...-

and that this be done in an overlapping manner:

  • --- becomes -.-.-

This can be achieved with the following class:

import re

class XmlCommentEscaper:
    """Escapes and unescapes ``--`` in xml comments so that they are not broken."""

    def __init__(self):
        self.escape_xml_comment = re.compile( r'-(\.*)(?=-)')

    def escape(self,comment):
        """
        >>> XmlCommentEscaper().escape('--')
        '-.-'
        >>> XmlCommentEscaper().escape('-- --')
        '-.- -.-'

        multiple:

        >>> XmlCommentEscaper().escape('-.-')
        '-..-'
        >>> XmlCommentEscaper().escape('-..-')
        '-...-'
        >>> XmlCommentEscaper().escape('-...-')
        '-....-'

        overlap:

        >>> XmlCommentEscaper().escape('---')
        '-.-.-'
        >>> XmlCommentEscaper().escape('----')
        '-.-.-.-'
        >>> XmlCommentEscaper().escape('-.--')
        '-..-.-'

        non matches:

        >>> XmlCommentEscaper().escape('-.')
        '-.'
        >>> XmlCommentEscaper().escape('.-')
        '.-'
        """
        return self.escape_xml_comment.sub( lambda m: '-' + m.group(1) + '.', comment )

    def unescape(self,comment):
        """
        >>> XmlCommentEscaper().unescape('-.-')
        '--'
        >>> XmlCommentEscaper().unescape('-.- -.-')
        '-- --'

        multiple:

        >>> XmlCommentEscaper().unescape('-..-')
        '-.-'
        >>> XmlCommentEscaper().unescape('-...-')
        '-..-'
        >>> XmlCommentEscaper().unescape('-....-')
        '-...-'

        overlap:

        >>> XmlCommentEscaper().unescape('-.-.-')
        '---'
        >>> XmlCommentEscaper().unescape('-.-.-.-')
        '----'
        >>> XmlCommentEscaper().unescape('-..-.-')
        '-.--'

        non matches:

        >>> XmlCommentEscaper().unescape('--')
        '--'
        >>> XmlCommentEscaper().unescape('-.')
        '-.'
        >>> XmlCommentEscaper().unescape('.-')
        '.-'
        """
        return self.escape_xml_comment.sub( lambda m: '-' + m.group(1)[1:], comment )

if __name__ == "__main__":
    import doctest
    doctest.testmod()

I just don't know exactly where to plug this, but it should take 5 mins for someone who knows the svg savefig architecture.

@mdboom
Member
mdboom commented May 3, 2013

I think as long as we find a way to escape it, we don't need to worry about unescaping (since matplotlib doesn't read XML/SVG files at all). I'll work up a PR here shortly.

@mdboom mdboom was assigned May 3, 2013
@mdboom
Member
mdboom commented May 3, 2013

A solution is attached. @cirosantilli, could you please confirm it works for you?

FWIW, I took the approach from docbook, which is to convert -- to - -.

@cirosantilli

I'm sorry but I haven't yet been able to install matplotlib from source to test it out, I'll try to do that

for the time being, I see that you approach is based on s = s.replace(u"--", u"- -"), but I think this may fail because of overlapping:

>>> "---".replace("--","- -")
'- --'

so the comment would still broken (please check). I think the way to go is still with re + sub:

escape_xml_comment = re.compile( r'-(?=-)')
return escape_xml_comment.sub( '- ', s )

which works because of the lookahead: "---" --> "- - -"

as for unscapping, I agree that it is not directly necessary for matplotlib, I just feel it is a good practice to give unscape to escapes, since one day some user will want to use the comment content grammatically and will have to write this himself. Of course, not sure it is worth the extra code maintenance...

@mdboom
Member
mdboom commented May 3, 2013

Ah, thanks. Good catch. I will update this to use the regex. I'm not too concerned about the unescaping at this point -- comments shouldn't ever really by used grammatically anyway...

@cirosantilli

haha sorry I meant programatically, not gramatically =)

@pelson pelson commented on an outdated diff May 3, 2013
lib/matplotlib/backends/backend_svg.py
@@ -71,6 +71,11 @@ def escape_cdata(s):
s = s.replace(u">", u"&gt;")
return s
+def escape_comment(s):
+ escape_xml_comment = re.compile( r'-(?=-)')
@pelson
pelson May 3, 2013 Member

Perhaps compile this re outside of the function?

@pelson pelson commented on an outdated diff May 3, 2013
lib/matplotlib/backends/backend_svg.py
@@ -71,6 +71,11 @@ def escape_cdata(s):
s = s.replace(u">", u"&gt;")
return s
+def escape_comment(s):
+ escape_xml_comment = re.compile( r'-(?=-)')
+ s = escape_cdata(s)
+ return escape_xml_comment.sub( '- ', s )
@pelson
pelson May 3, 2013 Member

A couple of extra whitespaces could do with being removed from this line and the ones above.

@pelson
Member
pelson commented May 3, 2013

👍 - LGTM

@mdboom mdboom merged commit aca3fea into matplotlib:v1.2.x May 3, 2013
@mdboom mdboom deleted the mdboom:svg-comment-escape branch Aug 7, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment