Skip to content

AbstractNode::resetCount() breaks node id uniqueness #203

@Taitava

Description

@Taitava

Hi, this issue description is long, as I will describe it step by step.

/**
* Creates a unique id for this node.
*/
public function __construct()
{
$this->id = self::$count;
self::$count++;
}

This part explains well how node ids are generated and what they look like. It's a running number starting from 0. AbstractNode::$count is increased every time a new node is created, so every node should get a unique id, right?

Well, not always.

/**
* Reset node counter
*
* @return void
*/
public static function resetCount()
{
self::$count = 0;
}

I don't know the reason why the counter should be reset, but here we have a method that does it.

public function load(string $str, array $options = []): Dom
{
AbstractNode::resetCount();
// check if it's a file
if (strpos($str, "\n") === false && is_file($str)) {
return $this->loadFromFile($str, $options);
}
// check if it's a url
if (preg_match("/^https?:\/\//i", $str)) {
return $this->loadFromUrl($str, $options);
}
return $this->loadStr($str, $options);
}

According to my IDE, this is the only place where AbstractNode::resetCount() is called. I still don't see, why it's called. And see: Dom::load() is one kind of a wrapper: it inspects the value of $str and calls one of these other methods:

  • loadFromFile(),
  • loadFromUrl(),
  • or loadStr()
    None of those methods call AbstractNode::resetCount(), and all of those methods are public, so it's easy to bypass resetCount() by just avoiding the load() method completely, and calling one of the specific loadFrom*() methods mentioned above.

Finally: what is the problem with AbstractNode::resetCount()?
Well, it does bad things!
Lets take an example where we want to insert one new element into a dom:

// Create a dom which will be used as a parent/container for a paragraph
$dom1 = new \PHPHtmlParser\Dom;
$dom1->load('<div>A container div</div>'); // Resets the counter (doesn't matter here as the counter was 0 even without resetting)
$div = $dom1->firstChild();
echo 'Div id: '. $div->id().'<br>'; // Outputs 1

// Create a paragraph outside of the first dom
$dom2 = new \PHPHtmlParser\Dom;
$dom2->load('<p>Our new paragraph.</p>'); // Resets the counter
$paragraph = $dom2->firstChild();
echo 'Paragraph id: '. $paragraph->id(); // Outputs 1

(Just a note before continuing the example: when $dom1/$dom2 is loaded with HTML content, it will have actually two nodes instead of just one: both doms will have an invisible root element which will have an id of 0. This is why the div/p element will have an id of 1 instead of 0.)

But let's continue with the example:

$div->addChild($paragraph);

This will throw a CircularException, because take a look at the InnerNode::addChild() method:

public function addChild(AbstractNode $child, int $before = -1): bool
{
$key = null;
// check integrity
if ($this->isAncestor($child->id())) {
throw new CircularException('Can not add child. It is my ancestor.');
}
// check if child is itself
if ($child->id() == $this->id) {
throw new CircularException('Can not set itself as a child.');
}

The last check will make it to: CRASH! BOOM! "The numbers are BAD!!" (says Hurley from the most legendary TV show, Lost.)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions