-
-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ability to process prefetched content #274
Conversation
Could you please add some context – what is the purpose of this? If it is for tests, would not it be cleaner to mock |
@jtojnar I worked a few weeks ago on a WebScrapBook server implementation in wallabag, it lets me save actual page content from my browser using the extension WebScrapBook which is useful for pages with javascript rendering or bot protection. This implies to send content to Graby without it to make a HTTP call, thus adding support for "prefetched content". |
7a9b7a2
to
8741413
Compare
src/Graby.php
Outdated
@@ -167,11 +167,11 @@ public function getConfig(string $key) | |||
* | |||
* @return array With keys html, title, url & summary | |||
*/ | |||
public function fetchContent(string $url): array | |||
public function fetchContent(string $url, string $prefetchedContent = null): array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing a content to a method called fetchContent
still feels a bit iffy to me, I would prefer a separate cleanContent
method. But then, the code has been a pile of megafunctions playing fast and loose with SRP since it was called Full-Text RSS, and I do not see a way to separate it out cleanly without a huge refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separating fetch and extraction logics from doFetchContent()
may be difficult because of the multipage fetcher.
Another approach instead of passing content in the fetchContent()
could be to have a separate setter like setContentAsPrefetched($content)
that we would call before calling fetchContent()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion @Kdecherf with setContentAsPrefetched
. I prefer it. Can you make the change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@j0k3r done, I think we're good to make another review
src/Graby.php
Outdated
@@ -322,7 +338,7 @@ private function doFetchContent(string $url): array | |||
|
|||
// check site config for single page URL - fetch it if found | |||
$isSinglePage = false; | |||
if ($this->config['singlepage'] && ($singlePageResponse = $this->getSinglePage($html, $effectiveUrl))) { | |||
if ($this->config['singlepage'] && null === $prefetchedContent && ($singlePageResponse = $this->getSinglePage($html, $effectiveUrl))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would not it be reasonable to pass content from web browser but still want Graby to collect multi-page data? Wallabag can always set singlepage = false
and multipage = false
to prevent this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that I don't see how the http fetch()
call occurring on lines 407 and 757 could work when you provide a prefetched content.
We could consider that the caller (wallabag in my case) should set singlepage=false and multipage=false when providing prefetched content, however there is no way to set these variables as "oneshot operations" afaik.
062b1e6
to
ff7a20d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, could you add an entry in the readme about that new behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go squash! 👍
7b90d91
to
532234c
Compare
fetchContent() now accepts an optional parameter, prefetchedContent, which can contain the content of a page that was fetched before calling Graby. If we take the example of Wallabag it gives the ability of sending the content of a page (through a browser extension for example) without making network calls to fetch the page. Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
532234c
to
31135a7
Compare
fetchContent() now accepts an optional parameter, prefetchedContent, which
can contain the content of a page that was fetched outside of Graby.
If we take the example of Wallabag it gives the ability to send the
content of a page (through a browser extension for example) without
making network calls to fetch the actual page.