Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pagination Detecting Incorrectly #758

Closed
petewilcock opened this issue Jan 31, 2021 · 17 comments
Closed

Pagination Detecting Incorrectly #758

petewilcock opened this issue Jan 31, 2021 · 17 comments
Assignees
Labels

Comments

@petewilcock
Copy link

petewilcock commented Jan 31, 2021

Just noticed this:

For post pagination, live site is using siteurl.com/page/1 but isn't being detected.

What it is detecting though is siteurl.com/blog/page/1 which doesn't exist and 404s on crawl.

@petewilcock
Copy link
Author

On inspection it looks like this well-meaning PR from @john-shaffer introduced an assumption of post pagination pages being prefixed with /blog:

https://github.com/leonstafford/wp2static/pull/574/files

Manually removed this and recompiled, which fixes the issue for my case.

@john-shaffer
Copy link
Contributor

Apparently some sites have posts paginated under /blog/, while in others it's under /. I can't find any useful documentation about how to determine the prefix.

@leonstafford
Copy link
Contributor

That should be sorted in https://github.com/leonstafford/static-html-output/ it's a bit of a multi-step query with some extra logic thrown in to cover the pagination_base or such.

@leonstafford
Copy link
Contributor

Looking at this code again today while doing some test tidy up and it does stand out as weird to have the /blog/ hardcoded, so will see if I can fix it up

@leonstafford
Copy link
Contributor

get_post_type_archive_link is probably the missing piece within DetectPostsPaginationURLs which I'll add in to check while building those URLs.

@leonstafford leonstafford self-assigned this Feb 26, 2021
@leonstafford
Copy link
Contributor

last commit didn't quite hit it:

/http:/localhost:4683page/1/

should have it sorted soon, now that I'm doing some testing within real site

@leonstafford
Copy link
Contributor

leonstafford commented Feb 26, 2021

To simulate the custom "Blog" or other name, that you encountered @john-shaffer, I reproduced, by creating 2 x new Pages, "Homepage" and "Blog". I then set these to Homepage and Posts page, respectively in Settings > Reading.

Then, we see the correct URL detected, in this case: /blog/page/1/.

I just need to fix the bad URL shown above, which is when get_post_type_archive_link() returns false (or maybe when it returns just the site url, I need another test for both of these conditions).

@leonstafford
Copy link
Contributor

Now looking better (missed a trailingslashit()).

Without a Posts page set in WP Settings, detects:

/page/1/

With Posts page set to /blog/, we get: /blog/page/1/

@leonstafford
Copy link
Contributor

ugh, taking my time with this one :P

was misunderstanding what I'm getting back from that fn. may opt for the one which gives either the posts url or false, as demo'd here in wp_cli's shell:

# With "sample-page" set as posts page:

wp> get_permalink( get_option( 'page_for_posts' ) )
=> string(34) "http://localhost:4683/sample-page/"

# I reset it back to no Posts page:

wp> 
/usr/html # wp shell
wp> get_permalink( get_option( 'page_for_posts' ) )
=> bool(false)
wp> get_post_type_archive_link( 'post' );
=> string(21) "http://localhost:4683"
wp> 

@john-shaffer
Copy link
Contributor

Both work fine for me, but I would suggest get_post_type_archive_link because it uses the post_type_archive_link filter.

@leonstafford
Copy link
Contributor

Sweet, will try combining the two, as I like the false response from the get_option() call when not set, then use the get_post_type_archive_link for most accurate URL

@leonstafford
Copy link
Contributor

(without the get_permalink, the get_option() call will return "0", which is still enough to test the condition)

@leonstafford
Copy link
Contributor

Put up some work in progress, currently at:

1) WP2Static\DetectPostsPaginationURLsTest::testDetectWithPostsPage
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
 Array (
-    0 => '/blog/page/1/'
-    1 => '/blog/page/2/'
-    2 => '/blog/page/3/'
-    3 => '/blog/page/4/'
-    4 => '/blog/page/5/'
+    0 => '/blog//page/1/'
+    1 => '/page/2/'
+    2 => '/page/3/'
+    3 => '/page/4/'
+    4 => '/page/5/'
 )

/Users/leon/wp2static/tests/unit/DetectPostsPaginationURLsTest.php:318

@leonstafford
Copy link
Contributor

Fixed those disparate URLs by correcting the number of expected method calls in the unit tests.

Lastly, for tests, need to fix the extra trailing slash on the Posts Page segment. Then check again in real site.

@leonstafford
Copy link
Contributor

Without Posts page defined:

[2021-02-27T00:17:43+00:00] Detection complete. 627 URLs added to Crawl Queue.
/usr/html # wp wp2static crawl_queue list | grep page
/author/admin/page/1/
/category/uncategorized/page/1/
/homepage/
/sample-page/
/wp-includes/blocks/nextpage/block.json
page/1/

With Posts Page defined:

[2021-02-27T00:18:50+00:00] Detection complete. 626 URLs added to Crawl Queue.
/usr/html # wp wp2static crawl_queue list | grep page
/author/admin/page/1/
/blog/page/1/
/category/uncategorized/page/1/
/sample-page/
/wp-includes/blocks/nextpage/block.json

@leonstafford
Copy link
Contributor

So... one of the URLs in the tests isn't reflective of real WP case, will need to sort that out to produce same results as we're seeing in the WP site, then can fix failing tests and hopefully be done with it :D

@leonstafford
Copy link
Contributor

With Posts Page:

[2021-02-27T01:44:02+00:00] Starting to detect WordPress site URLs.
[2021-02-27T01:44:02+00:00] Detection complete. 626 URLs added to Crawl Queue.
/usr/html # wp wp2static crawl_queue list | grep page
/author/admin/page/1/
/blog/page/1/
/category/uncategorized/page/1/
/sample-page/
/wp-includes/blocks/nextpage/block.json

Without Posts Page:

[2021-02-27T01:44:54+00:00] Starting to detect WordPress site URLs.
[2021-02-27T01:44:54+00:00] Detection complete. 627 URLs added to Crawl Queue.
/usr/html # wp wp2static crawl_queue list | grep page
/author/admin/page/1/
/category/uncategorized/page/1/
/homepage/
/page/1/
/sample-page/
/wp-includes/blocks/nextpage/block.json

Looking good!

The incorrect URL in the test case was a pagination base with a leading / slash, which isn't what WP returns.

Not a very sexy exercise to go through this, but good practice for me! Thanks for reporting and inputs y'all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants