Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DomCrawler] return empty string on `Crawler::text()` and `Crawler::html()` instead of an exception #28581

Merged

Conversation

Projects
None yet
7 participants
@respinoza
Copy link
Contributor

commented Sep 24, 2018

Q A
Branch? master
Bug fix? no
New feature? yes
BC breaks? yes
Deprecations? no
Tests pass? yes
Fixed tickets #28313
License MIT
Doc PR symfony/symfony-docs#...

This changes the behavior when using text and html methods from the Crawler by returning null instead of throwing an exception.

@respinoza

This comment has been minimized.

Copy link
Contributor Author

commented Sep 24, 2018

I am not sure if the BC is allowed or if we should add an argument to control the behavior and deprecate (but this kinda defeats this change as goal seems to make this more developer friendly)

@nicolas-grekas nicolas-grekas changed the title [DomCrawler] Return `null` on `text` and `html` methods when empty in… [DomCrawler] Return `null` on `text` and `html` methods when empty instead of an exception Sep 24, 2018

@nicolas-grekas

This comment has been minimized.

Copy link
Member

commented Sep 24, 2018

We shouldn't break BC, so a constructor argument to enable this behavior would be needed.
Then there should be a similar argument to BrowserKit\Client so that this could be also enabled there. Before doing so, do we all agree returning null is the best for DX and we should move out of the exception?

@nicolas-grekas nicolas-grekas added this to the next milestone Sep 24, 2018

@respinoza respinoza force-pushed the respinoza:domcrawler-null-on-empty-text-html branch 2 times, most recently from 19c9818 to fc5effd Sep 24, 2018

@respinoza

This comment has been minimized.

Copy link
Contributor Author

commented Sep 24, 2018

Here is a better proposal @nicolas-grekas to avoid BC as you suggested (way better), not sure about the new argument name.

Now, for the BrowserKit, new PR or two commits same PR ?

@stof

This comment has been minimized.

Copy link
Member

commented Sep 24, 2018

We still have a bunch of other methods which fail for an empty node list.

And btw, making return types nullable will not save you for doing a check to handle that case. Instead of checking whether the crawler is empty or no, you will have to check whether the value is null.

@stof

This comment has been minimized.

Copy link
Member

commented Sep 24, 2018

and regarding BC, this is still changing the contract of the class. Some code receiving a Crawler as argument will not control this boolean, and so would have to be updated to support the new signature as well. So this approach is still a BC break.

@respinoza

This comment has been minimized.

Copy link
Contributor Author

commented Sep 24, 2018

@stof looking at the original issue, it could also be empty, let's say an empty string, in this way it keeps the signature.
Do you have any suggestions? Or do you propose to drop this change?

@stof

This comment has been minimized.

Copy link
Member

commented Sep 24, 2018

Well, returning the empty string is possible for these 2 methods. But then, we still cannot remove the exception from other methods (the ones returning an object for instance)

@respinoza respinoza force-pushed the respinoza:domcrawler-null-on-empty-text-html branch 2 times, most recently from 571d6bc to e0d0842 Sep 24, 2018

@respinoza

This comment has been minimized.

Copy link
Contributor Author

commented Sep 24, 2018

A test fails but seems to be a travis problem with composer?

@nicolas-grekas
Copy link
Member

left a comment

I think I'm fine with this PR (just one minor comment)
->text() & ->html() not throwing is good for DX (but throwing for the other methods is OK also)

*
* @var bool
*/
private $emptyValue;

This comment has been minimized.

Copy link
@nicolas-grekas

nicolas-grekas Sep 25, 2018

Member

the name is strange (but I don't have a better suggestion)

@nicolas-grekas

This comment has been minimized.

Copy link
Member

commented Sep 25, 2018

Now, for the BrowserKit, new PR or two commits same PR ?

@respinoza how would you leverage this new argument? Do you create your crawlers manually?

@jakzal

This comment has been minimized.

Copy link
Member

commented Sep 25, 2018

With text() and html() methods returning an empty string how can I tell the difference between an empty text and element not found?

@respinoza

This comment has been minimized.

Copy link
Contributor Author

commented Sep 25, 2018

@jakzal the count() method still return the matches, so 0 for not found.

@respinoza

This comment has been minimized.

Copy link
Contributor Author

commented Sep 25, 2018

@nicolas-grekas we will have to pass it to the BrowserKit Client constructor and then pass it around in https://github.com/symfony/browser-kit/blob/68f233bc3a0fb6b64f48ff6318038087b49a1639/Client.php#L516`

Currently there is no way to inject a DomCrawler instance. This can also be solved by improving the DomCrawler/BrowserKit as follows

  • Add method to DomCrawler to create a new instance (similar like the current createSubCrawlermethod)
  • Add setter to pass the DomCrawler to the BrowserKit client
  • Get a fresh instance if we set a custom DomCrawler instance or create a new like we do right now
@nicolas-grekas

This comment has been minimized.

Copy link
Member

commented Sep 25, 2018

Currently there is no way to inject a DomCrawler instance

so let's allow it via a new constructor arg to BrowserKit's Client?
should be in the same PR to make it complete (including the DI configuration)

@nicolas-grekas

This comment has been minimized.

Copy link
Member

commented Sep 25, 2018

so let's allow it via a new constructor arg to BrowserKit's Client?

hum, looks like a crawler factory would be needed, so now I get your proposal.
I'd still prefer passing this factory as a constructor arg instead of a setter.

@respinoza

This comment has been minimized.

Copy link
Contributor Author

commented Sep 25, 2018

@nicolas-grekas having a factory will be way better. I will get on that.

@stof

This comment has been minimized.

Copy link
Member

commented Sep 25, 2018

I'm not sure passing a Crawler factory just to control this setting (which looks transitional) makes sense.

Injecting a Crawler indeed does not make sense, as the Crawler holds some content, so it does not make sense to inject one that the Client would not know what is already in it.

@respinoza

This comment has been minimized.

Copy link
Contributor Author

commented Sep 25, 2018

@stof the current code in BrowserKit creates a new instance or null if the class doesn't exist. The factory will be called instead, behavior is the same (new instance on method call), we will just show to return a configured instance.
Sub crawlers are created as usual.
We won't reuse the crawler.
I don't see a problem

@stof

This comment has been minimized.

Copy link
Member

commented Sep 25, 2018

Well, the question is whether we need to create a concept of factory when the only meaningful configuration to be customized by the extension point is this boolean.

Currently, this PR makes that setting a permanent option. It does not deprecate the old behavior. Is it intended ? Variable signatures based on an option are harder to use (as any code not creating the crawler could not rely on whether an exception could be thrown or no for empty crawlers, and so should keep handling the case throwing exceptions too, defeating this PR).
If this becomes a deprecated API with a migration path, making people replace the whole instantiation of the crawler is too much:

  • it makes the migration harder as they have to duplicate the core logic in 99% of cases
  • it introduces an extension point that we then have to maintain even if we don't need it for the BC layer. We cannot remove it in 5.0 (as we cannot deprecate it in 4.x if using it is mandatory to avoid another deprecation).

Thus, if the goal is to deprecate the old behavior, I think a constructor argument is the wrong way to opt-in for the new behavior: the code needing to deal with the new behavior (and so knowing whether it can opt-in or no) may not be the same than the code instantiating the crawler (and when using BrowserKit, that would mean the code instantiating the client, to configure it to configure the crawler creation...).

@respinoza

This comment has been minimized.

Copy link
Contributor Author

commented Sep 25, 2018

I don't believe we want to get rid of the old behavior so there won't be any deprecation.

I agree with your point on the variable signature. We could always have two new methods and get rid of the parameter.

(No idea how to make those methods)

@jakzal

This comment has been minimized.

Copy link
Member

commented Sep 26, 2018

I really don't like having these two distinct behaviours in one class.

I suggest we at least try to consider other ideas before proceeding this way.

Some (imperfect) ideas below.

  1. Accept an optional default in the text/html methods text($default = null). Drawback: Bc break

  2. Add new methods, like textOrDefault($default). Drawback: yet another method on an already bloated interface.

  3. Crawler has lots of methods, but perhaps a decorator would be still a better option than a flag? Since we don't have an interface, we'd also need to extend the Crawler unfortunately:

class GracefulCrawler extends Crawler
{
     // ...

     public function text()
     {
        try {
            $this->crawler->text()
        } catch (\InvalidArgumentException $e) {
            return '';
        }
    }
}
@stof

This comment has been minimized.

Copy link
Member

commented Sep 26, 2018

@jakzal creating a decorator will be a pain:

  • adding any method is a BC break, unless we also provide a base class delegating all methods in DomCrawler itself and consider that a decorator must extend our own base class to be subject to BC. Otherwise, the newly introduced method would get called on the parent class rather than on the inner object
  • things become even more complex due to returning sub-crawlers from many methods
  • changing the signature in a decorator class is still breaking the contract for any code consuming a Crawler.

At that point, it is much better to have this text extraction utility in your own code rather than in a crawler decorator.

And regarding the discussion about BC, a configurable signature is a bigger BC break than a variable signature based on a new optional argument. A variable signature creates a BC break for child class (and the migration path is clear, with the possibility to trigger a deprecation). A configurable signature creates a BC break for consuming code (and with no way to warn them) due to the signature change (as they might not be the one controlling the contrustor-based opt-in).

@nicolas-grekas

This comment has been minimized.

Copy link
Member

commented Sep 26, 2018

Option 1. is possible, using func_get_arg() to prevent any BC break. We already did so in other places.

@respinoza respinoza force-pushed the respinoza:domcrawler-null-on-empty-text-html branch 2 times, most recently from a6858eb to b3505ef Sep 29, 2018

@respinoza

This comment has been minimized.

Copy link
Contributor Author

commented Sep 29, 2018

I added the argument to the function through func_args as discussed for option one.
Looking at the related issues, I guess it will be a good idea to extend this to the attr() method?
That one already has an argument, should still do a func_args and check for the second one?

Also again not sure about the parameter name (naming things is hard), ideas welcome 😅

@respinoza respinoza changed the title [DomCrawler] Return `null` on `text` and `html` methods when empty instead of an exception [DomCrawler] return empty string on `Crawler::text()` and `Crawler::html()` instead of an exception Sep 29, 2018

@ro0NL

This comment has been minimized.

Copy link
Contributor

commented Sep 29, 2018

@respinoza what about $throw = true :) then we can also apply it on parents() etc. maybe, returning an empty list in case throw=false.

@amer-flix

This comment has been minimized.

Copy link

commented Sep 29, 2018

Passing true for every method is also a hectic approach. It would have been better if we could somehow transfer this approach from the Crawler constructor.

@jakzal

This comment has been minimized.

Copy link
Member

commented Sep 29, 2018

A default value, rather than a boolean flag, would be more flexible.

@respinoza

This comment has been minimized.

Copy link
Contributor Author

commented Sep 29, 2018

I would say no to a default value as we don't want to change the signature (return single), we could check it is a string but in practice is someone going to have another value than empty string?

Maybe the idea of having new methods to handle this might be better.

Any more ideas?

@nicolas-grekas

This comment has been minimized.

Copy link
Member

commented Sep 29, 2018

Yes, two :)

  1. name the new argument "$default" and return it if provided (and throw when not provided)
  2. do nothing and keep things as is

@respinoza respinoza force-pushed the respinoza:domcrawler-null-on-empty-text-html branch from b3505ef to 3c399cf Dec 16, 2018

@respinoza

This comment has been minimized.

Copy link
Contributor Author

commented Dec 16, 2018

@nicolas-grekas sorry for the delay, changes have been done and hopefully it can be included for 4.3

[DomCrawler] Added ability to return a default value in `text()` and …
…`html()` instead of throwing an exception when node is empty.

@nicolas-grekas nicolas-grekas force-pushed the respinoza:domcrawler-null-on-empty-text-html branch from 3c399cf to 6a4ce38 Dec 17, 2018

@nicolas-grekas

This comment has been minimized.

Copy link
Member

commented Dec 17, 2018

Thank you @respinoza.

@nicolas-grekas nicolas-grekas merged commit 6a4ce38 into symfony:master Dec 17, 2018

1 of 3 checks passed

continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
fabbot.io Your code looks good.
Details

nicolas-grekas added a commit that referenced this pull request Dec 17, 2018

feature #28581 [DomCrawler] return empty string on `Crawler::text()` …
…and `Crawler::html()` instead of an exception (respinoza)

This PR was merged into the 4.3-dev branch.

Discussion
----------

[DomCrawler] return empty string on `Crawler::text()` and `Crawler::html()` instead of an exception

…stead of an exception

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | yes
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #28313
| License       | MIT
| Doc PR        | symfony/symfony-docs#... <!-- required for new features -->

This changes the behavior when using text and html methods from the Crawler by returning null instead of throwing an exception.

Commits
-------

6a4ce38 [DomCrawler] Added ability to return a default value in `text()` and `html()` instead of throwing an exception when node is empty.

javiereguiluz added a commit to symfony/symfony-docs that referenced this pull request Jan 22, 2019

minor #10898 Documented the default values of text() and html() (javi…
…ereguiluz)

This PR was merged into the master branch.

Discussion
----------

Documented the default values of text() and html()

Documents symfony/symfony#28581

See also https://symfony.com/blog/new-in-symfony-4-3-domcrawler-improvements

Commits
-------

851956d Documented the default values of text() and html()

@nicolas-grekas nicolas-grekas modified the milestones: next, 4.3 Apr 30, 2019

@fabpot fabpot referenced this pull request May 9, 2019

Merged

Release v4.3.0-BETA1 #31435

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.