-
-
Notifications
You must be signed in to change notification settings - Fork 472
Description
Hi, this issue description is long, as I will describe it step by step.
php-html-parser/src/PHPHtmlParser/Dom/AbstractNode.php
Lines 71 to 78 in 12b94f6
| /** | |
| * Creates a unique id for this node. | |
| */ | |
| public function __construct() | |
| { | |
| $this->id = self::$count; | |
| self::$count++; | |
| } |
This part explains well how node ids are generated and what they look like. It's a running number starting from 0.
AbstractNode::$count is increased every time a new node is created, so every node should get a unique id, right?
Well, not always.
php-html-parser/src/PHPHtmlParser/Dom/AbstractNode.php
Lines 139 to 147 in 12b94f6
| /** | |
| * Reset node counter | |
| * | |
| * @return void | |
| */ | |
| public static function resetCount() | |
| { | |
| self::$count = 0; | |
| } |
I don't know the reason why the counter should be reset, but here we have a method that does it.
php-html-parser/src/PHPHtmlParser/Dom.php
Lines 147 to 160 in 12b94f6
| public function load(string $str, array $options = []): Dom | |
| { | |
| AbstractNode::resetCount(); | |
| // check if it's a file | |
| if (strpos($str, "\n") === false && is_file($str)) { | |
| return $this->loadFromFile($str, $options); | |
| } | |
| // check if it's a url | |
| if (preg_match("/^https?:\/\//i", $str)) { | |
| return $this->loadFromUrl($str, $options); | |
| } | |
| return $this->loadStr($str, $options); | |
| } |
According to my IDE, this is the only place where
AbstractNode::resetCount() is called. I still don't see, why it's called. And see: Dom::load() is one kind of a wrapper: it inspects the value of $str and calls one of these other methods:
loadFromFile(),loadFromUrl(),- or
loadStr()
None of those methods callAbstractNode::resetCount(), and all of those methods are public, so it's easy to bypassresetCount()by just avoiding theload()method completely, and calling one of the specificloadFrom*()methods mentioned above.
Finally: what is the problem with AbstractNode::resetCount()?
Well, it does bad things!
Lets take an example where we want to insert one new element into a dom:
// Create a dom which will be used as a parent/container for a paragraph
$dom1 = new \PHPHtmlParser\Dom;
$dom1->load('<div>A container div</div>'); // Resets the counter (doesn't matter here as the counter was 0 even without resetting)
$div = $dom1->firstChild();
echo 'Div id: '. $div->id().'<br>'; // Outputs 1
// Create a paragraph outside of the first dom
$dom2 = new \PHPHtmlParser\Dom;
$dom2->load('<p>Our new paragraph.</p>'); // Resets the counter
$paragraph = $dom2->firstChild();
echo 'Paragraph id: '. $paragraph->id(); // Outputs 1(Just a note before continuing the example: when $dom1/$dom2 is loaded with HTML content, it will have actually two nodes instead of just one: both doms will have an invisible root element which will have an id of 0. This is why the div/p element will have an id of 1 instead of 0.)
But let's continue with the example:
$div->addChild($paragraph);This will throw a CircularException, because take a look at the InnerNode::addChild() method:
php-html-parser/src/PHPHtmlParser/Dom/InnerNode.php
Lines 110 to 122 in 12b94f6
| public function addChild(AbstractNode $child, int $before = -1): bool | |
| { | |
| $key = null; | |
| // check integrity | |
| if ($this->isAncestor($child->id())) { | |
| throw new CircularException('Can not add child. It is my ancestor.'); | |
| } | |
| // check if child is itself | |
| if ($child->id() == $this->id) { | |
| throw new CircularException('Can not set itself as a child.'); | |
| } |
The last check will make it to: CRASH! BOOM! "The numbers are BAD!!" (says Hurley from the most legendary TV show, Lost.)