Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split SimplePie\File class into HTTP client and response #774

Merged
merged 27 commits into from May 21, 2023

Conversation

Art4
Copy link
Contributor

@Art4 Art4 commented Jan 24, 2023

Hi 👋

At the moment we have SimplePie\File::__construct(), where a request is handled using curl, fsockopen or file_get_contents. The response is saved in the class properties like body, headers, and so on.

This PR is a first step to allow other HTTP libraries in SimplePie. So this PR splits the File class into 2 parts:

  1. HTTP client with a request() method. The client internally calls the File class like before.
  2. The File class now implements a Response interface, that allows getting all the data like before through methods.

The concept and interfaces are interoperable with PSR-18 and PSR-7. The next step will be an implementation of SimplePie\HTTP\Client that works with a PSR-18 client under the hood.

refs #520

@Art4 Art4 mentioned this pull request Jan 24, 2023
48 tasks
@mblaney
Copy link
Member

mblaney commented Jan 25, 2023

looks good @Art4, before I merge can someone confirm that the tests are running? I'm seeing all passed but when I click through there are no details.

@Art4
Copy link
Contributor Author

Art4 commented Jan 25, 2023

@mblaney Github seems to have some problems at the moment. Github Actions is experiencing degraded performance. See https://www.githubstatus.com/

grafik

src/File.php Outdated Show resolved Hide resolved
@Art4 Art4 marked this pull request as draft January 26, 2023 08:43
@Art4 Art4 marked this pull request as ready for review January 26, 2023 09:41
@jtojnar
Copy link
Contributor

jtojnar commented Jan 26, 2023

I will try to review it later today.

@Art4
Copy link
Contributor Author

Art4 commented Jan 26, 2023

Thank you, I appreciate it. I just found another error and stabilized it with a test.

src/File.php Outdated Show resolved Hide resolved
src/HTTP/Client.php Outdated Show resolved Hide resolved
src/HTTP/Client.php Outdated Show resolved Hide resolved
src/HTTP/Client.php Outdated Show resolved Hide resolved
src/HTTP/Client.php Outdated Show resolved Hide resolved
@Art4
Copy link
Contributor Author

Art4 commented Feb 16, 2023

Yes, it is a reasonable approach if we do not want to introduce dependencies but it is not clear to me that this is a constraint that we have to uphold or that we should. Maintaining the Symfony promise sounds sufficientl reasonable to me and I believe we could have do that with much less code to maintain.

Thank you for your comment. As I explained in #782 this PR was made under the assumption that new external dependencies will be a BC break (not allowed in 1.x, but allowed in 2.x), so I'm with you that directly require PSR-17/PSR-18 implementations would require much less code. This decision is up to @mblaney and the team and would impact #777.

Duplicating and pruning is just duplicating with an extra step. The maintenance concerns still apply.
[...]
But SimplePie also contains a client (File). If we implemented the interfaces for it, we could mostly just accept PSR-18 interfaces for it without any extra layers.

I'm not sure if I unterstand you correctly. (I do not want to exclude the possibility that I fail in understanding because of the language barrier.) Do you want to implement (some of) the PSR-7/PSR-18 interfaces directly in/with SimplePie\File? Why?

I've implemented PSR-7, PSR-17 and PSR-18 already a few times in different projects. The last time was for Requests, which then resulted in its own project.

Implementing these PSRs in SimplePie would require us to implement 12 interfaces with 57 methods.

PSR Interface Methods
PSR-7 MessageInterface 11
  RequestInterface 6
  ResponseInterface 3
  StreamInterface 15
  UriInterface 16
PSR-17 RequestFactoryInterface 1
  StreamFactoryInterface 3
  UriFactoryInterface 1
PSR-18 ClientInterface 1
  ClientExceptionInterface 0
  RequestExceptionInterface 0
  NetworkExceptionInterface 0
Summary 12 57

For the interop layer in SimplePie I condensed them to 3 interfaces with 9 methods. And they will be removed again in SimplePie 2.

Namespace Interface Methods
SimplePie\Exception HttpException 0
SimplePie\HTTP Client 1
  Response 8
Summary 3 9

I cannot understand your point why maintaining this implementation concerns you more than maintaining a PSR implementation?

We cannot really mark these interfaces like HTTP\Client from #774 internal since projects would be meant to implement them.

Yes, you are right. 👍 I just realized that we cannot accept the internal SimplePie\HTTP\Client as parameter in Locator::__construct() or Sanitize::set_http_client(). Instead we must accept PSR-18 client and PSR-17 factories like in #777. I would keep this wrong behavior for now to keep this PR small but will fix this in #777. Update: This has been fixed in #777.

Sure but after removal of FileClient we would have two options:

  1. Keep the Client interface forever with a single Psr18Client, thus having an extra useless layer
  2. Deprecate Client interface and the extra classes and have SimplePie accept Psr\Http\Client\ClientInterface directly.

From what you posted, I understand you want the latter.

No, I prefer option 3 (for SimplePie 2):

  1. Remove Client interface and the extra classes and have SimplePie using Psr\Http\Client\ClientInterface directly.

That's why I want to keep SimplePie\HTTP\Client internal. This code is only a Forward compatability layer to allow a smoother upgrade. I want to delete it in SimplePie 2 again.

Update: (Sorry, just noted your follow up comment.)

Reading the comments bit more, I now realize the confusion might have come from my assumption that there would be a deprecation period for the interop layers. If Scenario A removes them without prior deprecation, the extra code might be worth it for a slightly cleaner implementation on SimplePie side before the removal (no need for ugly ifs throughout the codebase).

Yes, that is my intention. Nearly all code I write for the forward compatibility layer will be removed again in SimplePie 2.

But the downside is that consumers would have to migrate twice (custom FilePsr18Client → native PSR Client) so not sure that BC perspective can really justify that.

No, the migration should be directly custom File → native PSR Client. The Psr18Client must be completely internal.

* @license http://www.opensource.org/licenses/bsd-license.php BSD License
*/

namespace SimplePie\Exception;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts on having it under SimplePie\HTTP? That way we can remove just a single namespace later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no strong opinions about the place for exceptions. We will potentially get more Exceptions while fixing #755.

src/HTTP/Response.php Outdated Show resolved Hide resolved
@Art4
Copy link
Contributor Author

Art4 commented Mar 14, 2023

@mblaney Is there any chance that this will be merged soon?

@jtojnar
Copy link
Contributor

jtojnar commented Mar 17, 2023

I am still reviewing this.

Since my head is too small to look at everything at once, I am extracting some of the unrelated cleanups into separate pull requests and rebased and squashed the fixes into the original commits. I have also added some notes about the changes into the commit messages:

master...jtojnar:simplepie:split-file-into-client-and-response

I have also switched to SPDX syntax since the long comments make it difficult to fit multiple files on the screen at the same time.

Copy link
Contributor

@jtojnar jtojnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took so long, I have finally managed to review this in detail it deserves.

You did a great job with this and convinced me that this approach is a reasonable interim solution.

While reviewing, I did some rebase surgery to make it easier to handle with my attention span, and while at it fixed some issues. You can see that in #815.

Comment on lines +344 to +345
* While header names are not case-sensitive, get_headers() will preserve the
* exact case in which headers were originally specified.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not true for the FileClient. I do not feel like it is important for now so I have removed it from #815.

declare(strict_types=1);
/**
* SimplePie
*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have credited PHP Framework Interoperability Group in #815.

if (strpos($header, ',') === false) {
$header = [$header];
} else {
$header_lines = explode(',', $header);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry that this might disrupt some headers. Instead, in #815 I have decided to add support for HTTP\Parser to produce PSR-7 compatible headers, use $parsed_headers internally in File as you did initially, and recompute $parsed_headers when $headers change (which should not really happen outside of tests but better safe than sorry).

* @see http://tools.ietf.org/html/rfc3986#section-4.1
* @return string the original (first requested) URL before following redirects (expect 301)
*/
public function get_permanent_uri(): string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not wholly convinced the ResponseInterface-like object should contain RequestInterface-specific properties like URI but I guess it is fine for now.

@@ -103,7 +113,7 @@ public function find($type = \SimplePie\SimplePie::LOCATOR_ALL, &$working = null
return $this->file;
}

if ($this->file->method & \SimplePie\SimplePie::FILE_SOURCE_REMOTE) {
if (preg_match('/^http(s)?:\/\//i', $this->file->get_requested_uri())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have extracted the regex to Misc::is_remote_uri() in #815.

/**
* Get a HTTP client
*/
private function get_http_client(): Client
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be worth it to add support for singletons maintained by Registry. Or maybe we could just create ClientFactory and get it in constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not worth it. This method is private and will be removed in v2 again. Moving this to Registry will lead to more BC issues.

Comment on lines +1785 to +1787
$orig_timeout = $this->timeout;
$this->timeout = 1;
$file = $this->get_http_client()->request(Client::METHOD_GET, $this->feed_url, $headers);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is weird side-channel signalling. It might be nicer to pass $this->timeout/10 (or 1 for whatever reason you have decided to change it) as an argument to the get_http_client() factory. Went with that in #815

Copy link
Contributor Author

@Art4 Art4 Mar 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This request will check if the cache has been changed. The timeout was to 1 sec. I assume that the intention was to not wait to long for a response for this check.

It then was changed to $this->timeout/10 in fe7e333, but this could lead to a float instead of int, if e.g. $this->timeout is set to 5. That's why I changed it back to 1 sec. Alternative we could use ceil() but this will be in most cases 1 or (if $this->timeout = 11-20) 2. So I decided to use 1 directly.

About the side-channel signalling I decided to use this instead of a new argument because the timeout is moved to PSR-18 HTTP client and $this->timeout will be deprecated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the context. Peeking further into history, it looks like it was 1 since the introduction of timeouts along with SimplePie_File in 5bf1814.

But I think it is still useful to keep it proportional to the master timeout to allow people with slow connection to increase it. Even if we assume majority of cached resources will not get outdated, waiting for server to reply HTTP 304 Not Modified can still take time (domain name resolution can be particularly slow). And if it turns out that the cached version is outdated, the web server will try to send an updated version. Having too low timeout can mean the request will be terminated, only to be started again with the original timeout.

So I do not actually think reducing the timeout makes much sense at all. The only explanation I can come up with is that it prevents essentially doubling the timeout when fetching a cached resource from an unresponsive server without force_cache_fallback due to re-running the failed request.
I have split the change to a separate commit in #815, including the explanation.

try {
$response = $this->get_http_client()->request(Client::METHOD_GET, $href, $headers);
} catch (HttpException $th) {
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This terminates the loop early, skipping $done[] = $href;. Fortunately, the success case is short so should be fine to move it inside the try block. Fixed in #815

try {
$response = $this->get_http_client()->request(Client::METHOD_GET, $value, $headers);
} catch (HttpException $th) {
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, this skips unset($array[$key]);. Fixed in #815

@Art4
Copy link
Contributor Author

Art4 commented Mar 20, 2023

@mblaney There are a couple of RRs that depend on this PR and #777 and it is a bit frustrating to see other PRs merged and resolve conflicts without getting feedback. Could you please review this PR and #777 first before other things?

* @see http://tools.ietf.org/html/rfc3986#section-4.1
* @return string the final requested url after following redirects
*/
public function get_requested_uri(): string;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also in #815, I renamed this to get_final_requested_uri so it is clearer what this does.

@jtojnar
Copy link
Contributor

jtojnar commented Mar 20, 2023

I propose we go with #815 instead. This change set is complex enough to warrant not squashing everything into a single megacommit, where refactorings like the switch to Response methods make it hard to reason about functional changes like the actual porting to FileClient. Hopefully, we (or our descendants) can have a clean git blame in the future.

@mblaney
Copy link
Member

mblaney commented Apr 2, 2023

@mblaney There are a couple of RRs that depend on this PR and #777 and it is a bit frustrating to see other PRs merged and resolve conflicts without getting feedback. Could you please review this PR and #777 first before other things?

sorry @Art4 you're right, this PR should've been dealt with before merging smaller patches. I'm happy with it so didn't need to comment, but was waiting for approval from @jtojnar before merging. I'll wait for the two of you to decide how to continue before merging anything else.

@Art4
Copy link
Contributor Author

Art4 commented May 17, 2023

sorry @Art4 you're right, this PR should've been dealt with before merging smaller patches. I'm happy with it so didn't need to comment, but was waiting for approval from @jtojnar before merging. I'll wait for the two of you to decide how to continue before merging anything else.

@mblaney I just wanted to kindly let you know that I have already shared all my thoughts and comments in this and other PRs/issues. I believe I have covered everything that needs to be said. I think it's now time for you to make a decision. Once you have made a decision, I can move forward with the next steps in #731. Thank you so much for your time and consideration.

@jtojnar
Copy link
Contributor

jtojnar commented May 18, 2023

@Art4 Just to be clear, are you okay with merging #815 in this PR’s stead? Or do you want to address the comments mentioned in above (e.g.#774 (comment)) in this PR?

@mblaney
Copy link
Member

mblaney commented May 18, 2023

thanks for the update @Art4, as the person driving this project I'm happy for you to make the decisions! :-)

I'm not sure a modified PR was necessary, but happy to merge #815 instead since that has a few improvements.

@Art4
Copy link
Contributor Author

Art4 commented May 19, 2023

@jtojnar I think I have addressed all the comments relevant to this PR. Some issues in this PR are addressed in #777, because they can't be fixed here. All open comments are either "not important for now", "fine for now" or you have already suggested improvements in #815. As long as this is not explicitly requested, I don't see the need to replicate the improvements here, but would rather discuss them in #815 (and yes, I still have some thoughts on them). The discussion is otherwise even more scattered than it already is because of the many open issues.

@mblaney This PR is the base of or mentioned by #777, #813, #815, #812 and #810. These issues are partly the basis for further issues like #826 and #833. Closing this PR will be quite confusing in retrospect.

So if you are happy with this PR for the most part, I recommend that we merge this PR as a base and discuss the other changes in the other PRs and issues.

I would also like to see a decision from you in #782 because it has far reaching meanings on issues like #777. And I'd appreciate any thoughts or reactions from you to #731.

@jtojnar
Copy link
Contributor

jtojnar commented May 19, 2023

@mblaney This PR is the base of or mentioned by #777, #813, #815, #812 and #810. These issues are partly the basis for further issues like #826 and #833. Closing this PR will be quite confusing in retrospect.

I would argue that having a clean commit history with the details in commit messages (as in #815) is more important than GitHub issues, as it facilitates git archaeology. Here, we at most get a single big squashed commit without any rationale for the changes, which will be more confusing when running git blame.

Additionally, the git repo is a more permanent record – it already survived move from the previous hosting provider while the issues did not.

You could also the contents of this PR with commits from #815 using the following:

git fetch git@github.com:jtojnar/simplepie.git split-file-into-client-and-response
git reset --hard FETCH_HEAD

That would allow us to merge this PR (#774) with a cleaner and more descriptive commit history.

(And then the same thing with #777 and #826).

@Art4
Copy link
Contributor Author

Art4 commented May 19, 2023

And here we are again at the point where a decision has to be made. @mblaney, please take over. 🙂

@mblaney
Copy link
Member

mblaney commented May 21, 2023

Thanks @Art4 and @jtojnar for the updates. I appreciate yours thoughts on commit history @jtojnar, but I would prefer authors of a PR completed their work, so will retract my earlier comment and now merge #774. I hope that's ok, if you want to open a new PR or modify #815 for any of the changes you've made it would be good to discuss them separately.

@mblaney mblaney merged commit dba1fd7 into simplepie:master May 21, 2023
8 checks passed
@jtojnar
Copy link
Contributor

jtojnar commented May 21, 2023

@mblaney

I would prefer authors of a PR completed their work

That would be nice but the quality of the software should not suffer for it. Building software is a collaborative effort anyway, so it should not be much of a problem, as long as authorship is properly credited.

Unfortunately, as it is now the history is a complete mess making any debugging using git blame and git bisect much more annoying than it needs to be. This is not much of a problem now but will be in the future, once we are trying to find out about long forgotten design decisions and get commits like 5bf1814

And even worse, there are also correctness errors in this PR (e.g. #774 (comment)). IMO the master branch should not be merged into known broken state with the plan to fix it later. That again will break git bisect.

Edit: I have updated #815. Not much we can do about deeper history but at least blame of relevant changes will end there.

@Art4 Art4 deleted the split-file-into-client-and-response branch May 21, 2023 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants