Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DomCrawler component Crawler::innerText() doesn't get text after a child node #48682

Closed
otsch opened this issue Dec 16, 2022 · 0 comments · Fixed by #48940
Closed

DomCrawler component Crawler::innerText() doesn't get text after a child node #48682

otsch opened this issue Dec 16, 2022 · 0 comments · Fixed by #48940

Comments

@otsch
Copy link
Contributor

otsch commented Dec 16, 2022

Symfony version(s) affected

5.4

Description

The test case for this method basically is:

<div id="complex-element">
    Parent text
    <span>Child text</span>
</div>

Getting $crawler->text() then returns Parent text Child text and $crawler->innerText() returns Parent text. So far, so good.

If I don't misunderstand anything, in case of:

<div id="complex-element">
    Parent text
    <span>Child text</span>
    More parent text
</div>

the return value of $crawler->innerText() should be Parent text More parent text. But it's still just Parent text.

I'll provide a pull request to fix this in a minute.

How to reproduce

In the Tests/AbstractCrawlerTest.php test file, adapt the HTML in the createTestCrawler() method from:

<div id="complex-element">
    Parent text
    <span>Child text</span>
</div>

to

<div id="complex-element">
    Parent text
    <span>Child text</span>
    More parent text
</div>

And the testInnerText() method accordingly to:

public function testInnerText()
{
    self::assertCount(1, $crawler = $this->createTestCrawler()->filterXPath('//*[@id="complex-element"]'));

    self::assertSame('Parent text Child text More parent text', $crawler->text());
    self::assertSame('Parent text More parent text', $crawler->innerText());
}

Possible Solution

#48684

Additional Context

No response

@otsch otsch added the Bug label Dec 16, 2022
otsch added a commit to otsch/dom-crawler that referenced this issue Dec 16, 2022
Fixes the issue described in
symfony/symfony#48682

Further align the signature of the innerText() method with the text()
method by adding optional default value and normalizeWhitespace
arguments and extract normalizing whitespace to a shared method.
otsch added a commit to otsch/symfony that referenced this issue Dec 16, 2022
Fixes the issue described in
symfony#48682

Further align the signature of the innerText() method with the text()
method by adding optional default value and normalizeWhitespace
arguments and extract normalizing whitespace to a shared method.
otsch added a commit to otsch/symfony that referenced this issue Dec 16, 2022
Fixes the issue described in
symfony#48682
otsch added a commit to otsch/symfony that referenced this issue Dec 20, 2022
Fixes the issue described in
symfony#48682
otsch added a commit to otsch/symfony that referenced this issue Jan 9, 2023
Fixes the issue described in
symfony#48682
otsch added a commit to otsch/symfony that referenced this issue Jan 9, 2023
Fixes the issue described in
symfony#48682
otsch added a commit to otsch/symfony that referenced this issue Jan 10, 2023
Fixes the issue described in
symfony#48682
@fabpot fabpot closed this as completed Jan 11, 2023
fabpot added a commit that referenced this issue Jan 11, 2023
…rawler::innerText()` and make it return the first non-empty text (otsch)

This PR was merged into the 6.3 branch.

Discussion
----------

[DomCrawler] Add argument `$normalizeWhitespace` to `Crawler::innerText()` and make it return the first non-empty text

This is a new PR instead of #48684 with target branch 6.3 as requested.

| Q             | A
| ------------- | ---
| Branch?       | 6.3
| Bug fix?      | yes
| New feature?  | no
| Deprecations? | no
| Tickets       | Fix #48682
| License       | MIT

Commits
-------

bb0c214 [DomCrawler] Add argument `$normalizeWhitespace` to `Crawler::innerText()` and make it return the first non-empty text
fabpot added a commit that referenced this issue Aug 1, 2023
…ty string (NanoSector)

This PR was submitted for the 6.4 branch but it was merged into the 6.3 branch instead.

Discussion
----------

[Crawler] Fix regression where cdata nodes will return empty string

| Q             | A
| ------------- | ---
| Branch?       | 6.3
| Bug fix?      | yes
| New feature?  | no
| Deprecations? | no
| Tickets       | N/A
| License       | MIT
| Doc PR        | N/A
<!--
Replace this notice by a short README for your feature/bugfix.
This will help reviewers and should be a good start for the documentation.

Additionally (see https://symfony.com/releases):
 - Always add tests and ensure they pass.
 - Bug fixes must be submitted against the lowest maintained branch where they apply
   (lowest branches are regularly merged to upper ones so they get the fixes too).
 - Features and deprecations must be submitted against the latest branch.
 - For new features, provide some code snippets to help understand usage.
 - Changelog entry should follow https://symfony.com/doc/current/contributing/code/conventions.html#writing-a-changelog-entry
 - Never break backward compatibility (see https://symfony.com/bc).
-->
Fixes a regression caused by the fix to #48682 where `<script>` tags would return empty string on the innerContent call, where in 6.2 this did work.

Attached is a zipped PHPUnit test case used to verify the regression on 6.2 with a screenshot of its result.

[CrawlerInnerTextTest.zip](https://github.com/symfony/symfony/files/10567760/CrawlerInnerTextTest.zip)
![image](https://user-images.githubusercontent.com/1280380/216297235-1eda66d3-6768-4e49-9812-2857cff67086.png)

Commits
-------

23c1dda [Crawler] Fix regression where cdata nodes will return empty string
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants