Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to process prefetched content #274

Merged
merged 1 commit into from
Feb 3, 2022

Conversation

Kdecherf
Copy link
Collaborator

@Kdecherf Kdecherf commented Dec 4, 2021

fetchContent() now accepts an optional parameter, prefetchedContent, which
can contain the content of a page that was fetched outside of Graby.

If we take the example of Wallabag it gives the ability to send the
content of a page (through a browser extension for example) without
making network calls to fetch the actual page.

@coveralls
Copy link

coveralls commented Dec 4, 2021

Coverage Status

Coverage increased (+0.03%) to 95.102% when pulling 31135a7 on Kdecherf:feature/offline-mode into fda6724 on j0k3r:master.

@jtojnar
Copy link
Collaborator

jtojnar commented Dec 4, 2021

Could you please add some context – what is the purpose of this? If it is for tests, would not it be cleaner to mock httpClient?

@Kdecherf
Copy link
Collaborator Author

Kdecherf commented Dec 4, 2021

@jtojnar I worked a few weeks ago on a WebScrapBook server implementation in wallabag, it lets me save actual page content from my browser using the extension WebScrapBook which is useful for pages with javascript rendering or bot protection.

This implies to send content to Graby without it to make a HTTP call, thus adding support for "prefetched content".

@Kdecherf Kdecherf force-pushed the feature/offline-mode branch 2 times, most recently from 7a9b7a2 to 8741413 Compare January 15, 2022 18:28
@Kdecherf Kdecherf marked this pull request as ready for review January 15, 2022 18:36
@Kdecherf
Copy link
Collaborator Author

This PR is ready for review I think, poke @jtojnar @j0k3r

src/Graby.php Outdated
@@ -167,11 +167,11 @@ public function getConfig(string $key)
*
* @return array With keys html, title, url & summary
*/
public function fetchContent(string $url): array
public function fetchContent(string $url, string $prefetchedContent = null): array
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing a content to a method called fetchContent still feels a bit iffy to me, I would prefer a separate cleanContent method. But then, the code has been a pile of megafunctions playing fast and loose with SRP since it was called Full-Text RSS, and I do not see a way to separate it out cleanly without a huge refactoring.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separating fetch and extraction logics from doFetchContent() may be difficult because of the multipage fetcher.

Another approach instead of passing content in the fetchContent() could be to have a separate setter like setContentAsPrefetched($content) that we would call before calling fetchContent().

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion @Kdecherf with setContentAsPrefetched. I prefer it. Can you make the change?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@j0k3r done, I think we're good to make another review

src/Graby.php Outdated
@@ -322,7 +338,7 @@ private function doFetchContent(string $url): array

// check site config for single page URL - fetch it if found
$isSinglePage = false;
if ($this->config['singlepage'] && ($singlePageResponse = $this->getSinglePage($html, $effectiveUrl))) {
if ($this->config['singlepage'] && null === $prefetchedContent && ($singlePageResponse = $this->getSinglePage($html, $effectiveUrl))) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would not it be reasonable to pass content from web browser but still want Graby to collect multi-page data? Wallabag can always set singlepage = false and multipage = false to prevent this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that I don't see how the http fetch() call occurring on lines 407 and 757 could work when you provide a prefetched content.

We could consider that the caller (wallabag in my case) should set singlepage=false and multipage=false when providing prefetched content, however there is no way to set these variables as "oneshot operations" afaik.

Copy link
Owner

@j0k3r j0k3r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, could you add an entry in the readme about that new behavior?

@Kdecherf
Copy link
Collaborator Author

Kdecherf commented Feb 1, 2022

Looks good to me, could you add an entry in the readme about that new behavior?

@j0k3r 7aa0592 do you think it's enough?

@Kdecherf Kdecherf requested a review from j0k3r February 1, 2022 19:35
README.md Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
Copy link
Owner

@j0k3r j0k3r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go squash! 👍

fetchContent() now accepts an optional parameter, prefetchedContent, which
can contain the content of a page that was fetched before calling Graby.

If we take the example of Wallabag it gives the ability of sending the
content of a page (through a browser extension for example) without
making network calls to fetch the page.

Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants