BUG HtmlEditorField doesn't save HTML fragments in HTMLValue correctly #793

Merged
merged 1 commit into from Sep 20, 2012

Projects

None yet

5 participants

@halkyon
Member
halkyon commented Sep 14, 2012

The issue was raised in #7628 - http://open.silverstripe.org/ticket/7628, where an anchor tag was being changed from
<a name="anchor"></a> to <a name="anchor"/> by SS_HTMLValue, when
HtmlEditorField::saveInto() parses the HTML fragments.

This is because SS_HTMLValue uses DOMDocument::saveXML() which is fine
for saving an XML document, but not suitable for HTML. This fixes that
to use DOMDocument::saveHTML() instead.

@chillu chillu and 1 other commented on an outdated diff Sep 18, 2012
tests/forms/HtmlEditorFieldTest.php
$editor->setValue('<img src="assets/example.jpg" alt="foo" title="bar" />');
$editor->saveInto($obj);
- $xml = new SimpleXMLElement($obj->Content);
- $this->assertNotNull('foo', $xml['alt'], 'Alt tags are preserved.');
- $this->assertNotNull('bar', $xml['title'], 'Title tags are preserved.');
+ $parser = new CSSContentParser($obj->Content);
+ $xml = $parser->getByXpath('//img');
+ $this->assertNotNull('foo', $xml[0]['alt'], 'Alt tags are preserved.');
@chillu
chillu Sep 18, 2012 Member

Presumably you wanted to write assertNotEquals here?

@halkyon
halkyon Sep 18, 2012 Member

Umm, no? The original assertion was assertNotNull, I'm just making the test work again with CSSContentParser which tidies up the HTML so it can be parsed again in the test.

EDIT: Yeah the original test wasn't very good (that's been like that for years now from the original HtmlEditorFieldTest cases...), it doesn't make sense, and it's not actually testing the cases correctly. I'll fix that too.

@chillu
Member
chillu commented Sep 18, 2012

The DOMNode arg to saveHTML() was added in PHP 5.3.6 only: http://de3.php.net/manual/en/domdocument.savehtml.php

Our server requirements state 5.3.2 as a minimum: http://doc.silverstripe.org/framework/en/trunk/installation/server-requirements

Ideas?

@sminnee
Member
sminnee commented Sep 18, 2012

On earlier versions, do a regex-replacement of the content that gets output without an arg?

@halkyon
Member
halkyon commented Sep 18, 2012

Hrm... silly version differences of PHP ;-) I'll look into some workaround for < 5.3.6.

@halkyon halkyon BUG HtmlEditorField doesn't save HTML fragments in HTMLValue correctly
The issue was raised in #7628, where an anchor tag was being changed from
<a name="anchor"></a> to <a name="anchor"/> by SS_HTMLValue, when
HtmlEditorField::saveInto() parses the HTML fragments.

This is because SS_HTMLValue uses DOMDocument::saveXML(), which is fine
for saving an XML document, but not suitable for HTML. This fix changes
that to use DOMDocument::saveHTML() instead.
Note that we can't use the parameter to saveHTML() for selecting a single
node only, as that's only supported in PHP 5.3.6+, SilverStripe 3.0 supports
PHP 5.3.2 as a minimum. The workaround for this shortcoming is to replace
unncessary output by DOMDocument with a regular expression.
26d70d6
@halkyon
Member
halkyon commented Sep 18, 2012

I've fixed this up so it doesn't rely on the argument to saveHTML(), instead it just removes all the stuff DOMDocument adds with a regex.

Tests fixed up, and passing (PGSQL build on Travis seems to be broken, but unrelated to this.)

Could someone please review again?

@chillu chillu merged commit e44a355 into silverstripe:3.0 Sep 20, 2012

1 check failed

default The Travis build failed
Details
@mtils
mtils commented Nov 7, 2012

I fixed the problem with a regex (HTMLValue: Line 33:

return preg_replace (
        array (
            '/^\s*<body[^>]*>/i',
            '/<\/body[^>]*>\s*$/i',
            '/<\s*a(.*)\/>/i'
        ),
        array(
            ',
            '',
            '<a $1></a>'
        ),
        $this->getDocument()->saveXML($this->getDocument()->documentElement->lastChild)
    );

DomDocument::saveHTML() does not work!. It will destroy all your bbcoded internal links of silverstripe. ([= will be %5D=)
I hope this will help. I can't see why halkyons fix will fix the inline closed a tags.
(The comment function here destroys my code. But the only changes are the third addition in array 1 array 2)

@sminnee
Member
sminnee commented Nov 7, 2012

@mtils - the comment function uses Markdown; if you indent the code it will work.

The clobbering of shortcodes is concerning; although frankly I would probably use regexes to fix that (or find a DomDocument config option) rather than reverting to saveXML. Your 3 regexes are likely just the tip of the iceberg, and trying to reliably parse HTML with regexes is one of those things that seems easy until you need to make it work 100% of the time. ;-)

Given that Sean's fix hasn't horribly broken everything, I assume that the %5D replacement doesn't happen all the time. @halkyon, @chillu - any thoughts / confirmation?

@mtils what is your specific environment? Perhaps it would be better to file a ticket on open.silverstripe.org rather than have this discussion on a closed pull request... ;-)

@mtils
mtils commented Nov 7, 2012

@sminnee Thanks for the Markdown Hint.
Regex and HTML really doesn't work often. But it is a much bigger change to choose saveHTML instead of saveXML. I preferred the regex version because its not that intrusive. Theres also a regex already there for body. This could be done much easier via xpath on the DomDocument.

To be honest, I also couldnt believe that this url-encoding error did not hit the other commenters. But I had it on 2 machines: Silverstripe 3.0.2 Live on php 5.3.1 on a 5 bugs hosting server and locally on php 5.3.17 (both apaches on Linux).

Filing a bug: Yes, would be better. I will wait if the others wil confirm the url-encoding error and file a bug if i am not the only one with this error.

@halkyon
Member
halkyon commented Nov 7, 2012

You can't use saveXML, because it breaks other assumptions like self-closing anchors. saveXML will produce invalid HTML, and we can't have that. Besides, saveXML is for saving XML not HTML.

@halkyon
Member
halkyon commented Nov 7, 2012

By the way, I fixed shortcodes here: ac48950#model/HTMLValue.php

@mtils
mtils commented Nov 7, 2012

@halkyon Oh yes I didn't see that you fixed it. The closed anchors are fixed through the regex I added in my first comment. But I have to try your fix. Perhaps it works better.

@halkyon
Member
halkyon commented Nov 7, 2012

By the way, the fix made its way into 3.0.3-rc1, soon to be released as stable :)

@halkyon
Member
halkyon commented Nov 7, 2012

Yeah, I see your replacement, but the problem is there's more cases where self-closing occurs when it shouldn't, it can happen with any arbitrary element, not just anchors, and this can lead to invalid HTML. I'd much prefer we use saveHTML and then fix the issue with shortcodes. Besides, [ and ] are special characters in HTML, so the fact they're replaced with entities is actually correct according to spec, but of course we need them for shortcodes in the href, so replacing them back seems good enough.

@mtils
mtils commented Nov 7, 2012

@halkyon Yes your right. There could be any empty tag in it. In my opinion the browsers should accept a <a name="x" /> as well as <div style="clear: both;" /> ;-) So I hope to see 3.0.3 soon 👍

@janaka360

This is not working on IE7 and IE8 ( Windows7, SS 3.1 Beta )

Im in a middle of project where i have to test it on IE 7,8,9,10 , due to self closing on article and img elements CSS is got broken, now it seems i didn't care about IE :| :| :|
do u have any idea how to fix this ?

@chillu
Member
chillu commented Jul 3, 2013

@janaka360 Can you please paste some markup before and after save? <article> is not supported in 3.1 TinyMCE out of the box since its HTML5, but we have an extension for that: https://github.com/silverstripe/silverstripe-html5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment