[DomCrawler] Fixed filterXPath() chaining #10207

robbertkl · 2014-02-05T00:56:16Z

| Q | A |
| --- | --- |
| Bug fix? | yes |
| New feature? | no |
| BC breaks? | debatable (see below) |
| Deprecations? | no |
| Tests pass? | yes |
| Fixed tickets | #10206 |
| License | MIT |
| Doc PR | |

As @stof mentions in #10206, each node in the Crawler can belong to a different \DOMDocument. Therefore, I've made each node do its own XPath, relative to itself, and add all the results to a new Crawler. This way, all resulting nodes are still part of their original \DOMDocument and thus can reach all of their parent nodes.
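The per-node approach can be sketched with plain DOM classes. This is a minimal illustration under stated assumptions, not the Crawler code from the patch: the helper name `filterRelative()` and the two sample documents are hypothetical and exist only for this example.

```php
<?php
// Hypothetical helper (not part of DomCrawler): run an XPath expression
// relative to each node, inside that node's own document.
function filterRelative(array $nodes, string $expression): array
{
    $matches = [];
    foreach ($nodes as $node) {
        // One DOMXPath per node, bound to the document the node belongs to.
        $xpath = new DOMXPath($node->ownerDocument);
        foreach ($xpath->query($expression, $node) as $match) {
            $matches[] = $match;
        }
    }

    return $matches;
}

// Two nodes that live in two different documents.
$first = new DOMDocument();
$first->loadXML('<root><item id="1"/></root>');
$second = new DOMDocument();
$second->loadXML('<root><group><item id="2"/></group></root>');

$items = filterRelative(
    [$first->documentElement, $second->documentElement],
    'descendant::item'
);

// Each match is still attached to its original document and parent.
echo $items[0]->parentNode->nodeName, "\n"; // root
echo $items[1]->parentNode->nodeName, "\n"; // group
```

Because nothing is imported into a new document, every match keeps its original ancestors, whichever document it came from.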

No current tests break on this change. I've added a new test for this case, by checking if the number of parents is correct after obtaining a node through chaining of filterXPath().
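The kind of check described here can be illustrated without the Crawler. The sketch below is a simplified model, not the test added in this PR: `countAncestors()` is a hypothetical helper, and the copy step only imitates the `_root` wrapper approach mentioned in the discussion.

```php
<?php
// Hypothetical helper: count the element ancestors of a node.
function countAncestors(DOMNode $node): int
{
    $count = 0;
    for ($parent = $node->parentNode; $parent instanceof DOMElement; $parent = $parent->parentNode) {
        $count++;
    }

    return $count;
}

$document = new DOMDocument();
$document->loadXML('<html><body><div><p>text</p></div></body></html>');
$paragraph = $document->getElementsByTagName('p')->item(0);

// Assumed model of the copy-based behavior: the node is imported into a
// fresh document under a "_root" wrapper, so its real ancestors are lost.
$copyDocument = new DOMDocument('1.0', 'UTF-8');
$root = $copyDocument->appendChild($copyDocument->createElement('_root'));
$copy = $root->appendChild($copyDocument->importNode($paragraph, true));

echo countAncestors($paragraph), "\n"; // 3 (div, body, html)
echo countAncestors($copy), "\n";      // 1 (_root only)
```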

Now for BC: I can think of a number of cases where this change would give a different result. However, it's debatable/unclear if:

- the old behavior was a bug in the first place (which would validate this change), or
- the old behavior was intended (which would make this a BC breaking change)

As an example, consider the following HTML:

<div name="a"><div name="b"><div name="c"></div></div></div>

What would happen if we run this:

echo $crawler->filterXPath('//div')->filterXPath('div')->filterXPath('div')->attr('name');

Aside from breaking reachability of the parent nodes by chaining, with the original code it would echo 'a'.
With this patch it would echo 'c', which, to me, makes more sense.
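The difference can be reproduced with plain DOM classes. Both functions below are simplified models written for this comparison, not the actual Symfony source: `filterByCopy()` assumes the old strategy imports all nodes under a `_root` wrapper in a new document, and `filterPerNode()` assumes the patched strategy queries relative to each node.

```php
<?php
// Assumed model of the copy-based strategy: import every node under a
// "_root" wrapper and evaluate the expression relative to that wrapper.
function filterByCopy(array $nodes, string $expression): array
{
    $document = new DOMDocument('1.0', 'UTF-8');
    $root = $document->appendChild($document->createElement('_root'));
    foreach ($nodes as $node) {
        $root->appendChild($document->importNode($node, true));
    }
    $xpath = new DOMXPath($document);

    return iterator_to_array($xpath->query($expression, $root), false);
}

// Assumed model of the patched strategy: query relative to each node,
// inside the document that node already belongs to.
function filterPerNode(array $nodes, string $expression): array
{
    $matches = [];
    foreach ($nodes as $node) {
        $xpath = new DOMXPath($node->ownerDocument);
        foreach ($xpath->query($expression, $node) as $match) {
            $matches[] = $match;
        }
    }

    return $matches;
}

$document = new DOMDocument();
$document->loadXML('<div name="a"><div name="b"><div name="c"></div></div></div>');
$xpath = new DOMXPath($document);
$divs = iterator_to_array($xpath->query('//div'), false); // a, b, c

$copied = filterByCopy(filterByCopy($divs, 'div'), 'div');
$relative = filterPerNode(filterPerNode($divs, 'div'), 'div');

echo $copied[0]->getAttribute('name'), "\n";   // a
echo $relative[0]->getAttribute('name'), "\n"; // c
```

In the copy model, every selected node becomes a direct child of `_root`, so `div` keeps matching all three copies. In the per-node model, each step descends exactly one level.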

stof · 2014-02-05T01:35:14Z

Well, to fix the issue fully, this is not the only place that needs to be changed.

robbertkl · 2014-02-05T10:52:25Z

I've found another reference to the magic "_root" node (in parents()), which I've removed.

@stof Is that what you mean? If not, could you please give me a hint of the other place(s) you're referring to? Thanks!

stof · 2014-02-05T11:19:44Z

@robbertkl there are also such cases in the handling of forms in Form::initialize

robbertkl · 2014-02-05T11:53:27Z

@stof Thanks, I see what you mean now.

I've added changes in Form::initialize() and in Field\FormField::__construct() as well. Tests still pass.

stof · 2014-02-05T13:18:12Z

👍

stof · 2014-02-05T13:34:22Z

@robbertkl could you rebase your work? It conflicts with the merge of #10205.

robbertkl · 2014-02-05T16:31:54Z

Done!

fabpot · 2014-02-05T16:38:17Z

Thanks for fixing this bug @robbertkl.

@stof

This PR was merged into the 2.3 branch.

Commits
-------

43a7716 [DomCrawler] Fixed filterXPath() chaining

tommygnr · 2014-02-16T16:48:33Z

This commit has broken a functional test in one of my applications. I am working on putting together a simple example to reproduce it.

fabpot · 2014-02-18T16:26:11Z

Reverted, as it causes a regression, as reported in #10260.

* 2.3:
  Revert "bug #10207 [DomCrawler] Fixed filterXPath() chaining (robbertkl)"
  Bypass sigchild detection if phpinfo is not available

Conflicts:
  src/Symfony/Component/DomCrawler/Crawler.php

…nt DOM nodes (stof, robbertkl)

This PR was merged into the 2.3 branch.

Discussion
----------

[DomCrawler] Fixed filterXPath() chaining loosing the parent DOM nodes

| Q | A |
| --- | --- |
| Bug fix? | yes |
| New feature? | no |
| BC breaks? | no |
| Deprecations? | no |
| Tests pass? | yes |
| Fixed tickets | #10206 |
| License | MIT |
| Doc PR | n/a |

This is a fixed version of #10207, preserving BC for XPath queries. It is the rebased version of #10935, targeting 2.3.

The example given in #10260 when reporting the regression in the previous attempt is covered by the new tests added in the first commit of the PR. I also added many tests ensuring that the behavior is the same as in the current implementation.

Commits
-------

80438c2 Fixed the XPath filtering to have the same behavior than Symfony 2.4
711ac32 [DomCrawler] Fixed filterXPath() chaining
8f706c9 [DomCrawler] Added more tests for the XPath filtering

robbertkl mentioned this pull request Feb 5, 2014

DomCrawler\Crawler::filterXPath() should not create new \DOMDocuments #10206

Closed

[DomCrawler] Fixed filterXPath() chaining

43a7716

fabpot merged commit 43a7716 into symfony:2.3 Feb 5, 2014

robbertkl deleted the ticket_10206 branch February 5, 2014 18:12

fabpot added a commit that referenced this pull request Feb 18, 2014

Revert "bug #10207 [DomCrawler] Fixed filterXPath() chaining (robbert…

8f37921

…kl)" This reverts commit c11c588, reversing changes made to e453c45.

fabpot mentioned this pull request Feb 18, 2014

Dom crawler context bug #10260

Closed

fabpot added a commit that referenced this pull request Feb 18, 2014

Merge branch '2.4'

838dc7e

* 2.4:
  Revert "bug #10207 [DomCrawler] Fixed filterXPath() chaining (robbertkl)"
  Bypass sigchild detection if phpinfo is not available

This was referenced May 18, 2014

[DomCrawler] Fixed filterXPath() chaining loosing the parent DOM nodes #10935

Closed

[DomCrawler] Fixed filterXPath() chaining loosing the parent DOM nodes #10958

Merged
