Auto-strip empty <p /> tags #31
Comments
Regarding
where |
OK, so I think I've worked out a nice custom Rule class that I'm happy with and that works for this purpose:

```php
<?php
/**
 * @file
 * Contains \EmptyRule
 */

use Facebook\InstantArticles\Elements\Footer;
use Facebook\InstantArticles\Elements\Header;
use Facebook\InstantArticles\Elements\InstantArticle;
use Facebook\InstantArticles\Elements\TextContainer;
use Facebook\InstantArticles\Transformer\Rules\ConfigurationSelectorRule;

/**
 * Matches empty nodes given a selector. Here empty nodes are defined as nodes
 * in a DOMDocument that have no children, or have only text children containing
 * whitespace characters.
 */
class EmptyRule extends ConfigurationSelectorRule {

  public function __construct() {
  }

  public static function create() {
    return new EmptyRule();
  }

  public static function createFrom($configuration) {
    return self::create()->withSelector($configuration['selector']);
  }

  public function getContextClass() {
    return array(
      InstantArticle::getClassName(),
      Header::getClassName(),
      Footer::getClassName(),
      TextContainer::getClassName(),
    );
  }

  public function matchesContext($context) {
    return TRUE;
  }

  /**
   * @param \DOMNode $node
   * @return bool
   */
  public function matchesNode($node) {
    // We're only interested in elements here, testing if they are empty.
    if ($node->nodeType !== XML_ELEMENT_NODE) {
      return FALSE;
    }
    // Limit by the selector passed in the configuration.
    if (!parent::matchesNode($node)) {
      return FALSE;
    }
    // Match iff the node has no children, or all children are empty text
    // nodes.
    if ($node->hasChildNodes()) {
      /* @var \DOMNode $child */
      foreach ($node->childNodes as $child) {
        if ($child->nodeName !== '#text') {
          return FALSE;
        }
        else {
          // Also trim the UTF-8 bytes of a non-breaking space (&nbsp;).
          // @see https://stackoverflow.com/a/27990195/142145
          $trimmed = trim($child->nodeValue, " \t\n\r\0\x0B\xC2\xA0");
          // Compare against '' rather than using empty(), which would also
          // treat the text "0" as empty.
          if ($trimmed !== '') {
            return FALSE;
          }
        }
      }
    }
    return TRUE;
  }

  public function apply($transformer, $context, $element) {
    // Return the context untouched: the matched node adds nothing to it.
    return $context;
  }

}
```

Then, assuming you've got a Transformer in $transformer:

```php
$transformer->addRule(
  EmptyRule::createFrom(array(
    'class' => 'EmptyRule',
    'selector' => '//p|//div|//span',
  ))
);
```

The method

Working for me so far. If you guys are into this, I'd be up for re-jigging this into a proper pull request.
Hey @m4olivei, thanks for digging into this! It's cool that this solved your cases. I just want to warn you to be careful with this usage: this way, the empty rule will need a selector that selects only empty #text content. I'm working on a solution right now and will have a pull request soon.
This way we won't:
@everton-rosario yeah, you're right. It's only working for me b/c I have it as the last rule added to the Transformer, which means it's the first rule that gets checked against all nodes. Sounds like a neat solution, and I look forward to testing it. Thanks for looking into this as well!
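To illustrate the ordering point in the comment above, here is a minimal sketch. It assumes, as described in this thread, that the rule added last is the first one checked against each node. `ToyTransformer`, `addRule()` as written here, and `firstMatch()` are hypothetical stand-ins for illustration only, not the SDK's Transformer API.

```php
<?php
// Toy stand-in, NOT the SDK's Transformer: it only illustrates
// "the last rule added is the first rule checked".
final class ToyTransformer
{
    /** @var array[] Each entry holds a rule name and a matcher callable. */
    private $rules = [];

    public function addRule(string $name, callable $matches): void
    {
        $this->rules[] = ['name' => $name, 'matches' => $matches];
    }

    /** Returns the name of the first matching rule, checking newest first. */
    public function firstMatch(DOMNode $node): ?string
    {
        foreach (array_reverse($this->rules) as $rule) {
            if ($rule['matches']($node)) {
                return $rule['name'];
            }
        }
        return null;
    }
}

$document = new DOMDocument();
$document->loadXML('<body><p> </p><p>Hello</p></body>');
$paragraphs = $document->getElementsByTagName('p');

$transformer = new ToyTransformer();
// A general paragraph rule, added first.
$transformer->addRule('paragraph', function (DOMNode $node): bool {
    return $node->nodeName === 'p';
});
// The "empty" rule, added last, so it gets the first look at every node.
$transformer->addRule('empty', function (DOMNode $node): bool {
    return $node->nodeName === 'p' && trim($node->textContent) === '';
});

$emptyWinner = $transformer->firstMatch($paragraphs->item(0));
$textWinner = $transformer->firstMatch($paragraphs->item(1));
echo $emptyWinner, ' ', $textWinner, "\n"; // prints "empty paragraph"
```

If the two `addRule()` calls were swapped, the general paragraph rule would claim the whitespace-only paragraph first, which is why the approach above depends on registration order.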
Any update on this, @everton-rosario?
A lot of our content has <p> </p> in the content. This is kind of a bad habit of editors to introduce vertical space in an article, but I digress. Also, I've found that loading up HTML to pass to the transformer can also introduce empty <p></p> tags. For example:
In the output you get:
Of course, this is bad source HTML: headings are not allowed as children of <p>, so we'll look at fixing our source HTML. But here an empty paragraph is allowed to pass through to Facebook, where there is no reason for it in the context of FBIA.
Is there a way that the SDK could strip out empty <p> tags? Or is there a Transformer rule that you can write to strip out empty <p> tags (I've tried this to no avail..)?
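One possible workaround for the question above is to clean the DOMDocument before it ever reaches the Transformer. The sketch below uses only PHP's DOM extension. The helper name `stripEmptyParagraphs()` is hypothetical and not part of the SDK, and "empty" is assumed to mean no child nodes, or only text children made of whitespace or non-breaking spaces.

```php
<?php
/**
 * Hypothetical helper (not part of the SDK): removes <p> elements that have
 * no children, or only whitespace / non-breaking-space text children.
 *
 * @return int Number of paragraphs removed.
 */
function stripEmptyParagraphs(DOMDocument $document): int
{
    $xpath = new DOMXPath($document);
    // Copy the matches into an array so removals cannot disturb iteration.
    $paragraphs = iterator_to_array($xpath->query('//p'), false);
    $removed = 0;

    foreach ($paragraphs as $paragraph) {
        $isEmpty = true;
        foreach ($paragraph->childNodes as $child) {
            // Any non-text child (element, comment, ...) counts as content.
            if ($child->nodeType !== XML_TEXT_NODE) {
                $isEmpty = false;
                break;
            }
            // "\xC2\xA0" are the UTF-8 bytes of a non-breaking space.
            if (trim($child->nodeValue, " \t\n\r\0\x0B\xC2\xA0") !== '') {
                $isEmpty = false;
                break;
            }
        }
        if ($isEmpty && $paragraph->parentNode !== null) {
            $paragraph->parentNode->removeChild($paragraph);
            $removed++;
        }
    }

    return $removed;
}

// Usage: three of the four paragraphs below carry no visible content.
libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML(
    '<html><body><p> </p><p>&nbsp;</p><p>Hello</p><p></p></body></html>'
);
$removed = stripEmptyParagraphs($document);
echo $removed, "\n"; // prints 3
```

This trades flexibility for simplicity: it runs once over the whole document and ignores rule ordering, but it only handles <p> and would need a wider XPath expression to cover other elements.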