Discussion pagination is broken for crawlers (Google) #1310

Closed
gwillem opened this issue Dec 18, 2017 · 9 comments

@gwillem
Contributor

gwillem commented Dec 18, 2017

Explanation

The noscript "Next Page" URL is invalid, which prevents crawlers from indexing follow-up posts.

Proof

$ curl -s 'https://discuss.flarum.org/d/7585-freeflarum-com-now-open-for-beta-access' | grep 'Next Page'
            <a href="https://discuss.flarum.org/d/7585?id=7585-freeflarum-com-now-open-for-beta-access&amp;page=2">Next Page &raquo;</a>
$ curl -s 'https://discuss.flarum.org/d/7585?id=7585-freeflarum-com-now-open-for-beta-access&amp;page=2' | grep 'Next Page'
            <a href="https://discuss.flarum.org/d/7585?id=7585&amp;amp%3Bpage=2&amp;page=2">Next Page &raquo;</a>
$ curl -s 'https://discuss.flarum.org/d/7585?id=7585&amp;amp%3Bpage=2&amp;page=2' | grep 'Next Page'
            <a href="https://discuss.flarum.org/d/7585?id=7585&amp;amp%3Bamp%3Bpage=2&amp;amp%3Bpage=2&amp;page=2">Next Page &raquo;</a>

Technical details

  • Version of Flarum: beta 7

Likely solution

The & character appears to be erroneously HTML-escaped. See https://github.com/flarum/core/blob/3dcfe32b27f27f896b3aa22ec4085b6fcac005b9/views/frontend/content/discussion.blade.php#L27

@gwillem gwillem changed the title from "Noscript "next page" url is invalid." to "Discussion pagination is broken for crawlers (Google)" Dec 18, 2017
@tobyzerner tobyzerner added this to the 0.1 milestone Dec 25, 2017
@franzliedke
Contributor

franzliedke commented Jan 10, 2018

I disagree.

Instead of curl -s 'https://discuss.flarum.org/d/7585?id=7585-freeflarum-com-now-open-for-beta-access&amp;page=2' | grep 'Next Page', you should run curl -s 'https://discuss.flarum.org/d/7585?id=7585-freeflarum-com-now-open-for-beta-access&page=2' | grep 'Next Page'.

The result is then correct.

That's what any browser would do, too.
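The difference between the two curl commands can be reproduced with a minimal Python sketch. It assumes nothing about Flarum's internals: it only decodes the href attribute the way an HTML parser would, then compares the query parameters a server would see in each case. The last step is a guess at how the stray key ends up re-encoded in the next link; the variable names are illustrative only.

```python
from html import escape, unescape
from urllib.parse import parse_qsl, urlencode, urlsplit

# The href attribute exactly as it appears in the page source.
raw_href = (
    "https://discuss.flarum.org/d/7585"
    "?id=7585-freeflarum-com-now-open-for-beta-access&amp;page=2"
)

# A browser or crawler decodes HTML entities before requesting the URL.
decoded_href = unescape(raw_href)
decoded_params = parse_qsl(urlsplit(decoded_href).query, separator="&")
print(decoded_params)  # keys: "id" and "page"

# Requesting the raw attribute text instead yields a bogus "amp;page" key.
raw_params = parse_qsl(urlsplit(raw_href).query, separator="&")
print(raw_params)  # keys: "id" and "amp;page"

# Assumption: the server echoes unknown parameters into the next link,
# URL-encodes them, and HTML-escapes the result for the attribute.
stray = [(k, v) for k, v in raw_params if k != "id"]
reescaped = escape(urlencode(stray + [("page", "2")]))
print(reescaped)
```

The re-escaped string matches the tail of the second "Next Page" link in the original report, which suggests the ever-growing URLs came from requesting the undecoded attribute value rather than from the escaping itself.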

@franzliedke
Contributor

That id parameter looks superfluous, though.

@franzliedke
Contributor

That will be fixed by improving the URL generator to append additional parameters to the query string.
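As a rough illustration of what "appending additional parameters to the query string" could look like, here is a small Python sketch. It is not Flarum's URL generator (which is PHP); `append_query` is a hypothetical helper, and the example simply shows a next-page link that keeps the slug in the path and adds only `page`, so no extra `id` parameter is needed.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def append_query(url: str, **params: str) -> str:
    """Return url with params merged into its existing query string.

    Hypothetical helper for illustration; not part of Flarum.
    """
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, separator="&"))
    query.update(params)  # new values override existing keys
    return urlunsplit(parts._replace(query=urlencode(query)))


next_page = append_query(
    "https://discuss.flarum.org/d/7585-freeflarum-com-now-open-for-beta-access",
    page="2",
)
print(next_page)
```

When the result is written into an HTML attribute it still has to be HTML-escaped, so a URL with more than one parameter would again contain `&amp;` in the page source.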

@franzliedke franzliedke removed this from the 0.1 milestone Jan 10, 2018
@gwillem
Contributor Author

gwillem commented Jan 10, 2018

Franz, sorry for the erroneous report. However, there doesn't seem to be a single "Next Page" page indexed by Google, so something else must be going wrong:

https://www.google.com/search?q=site:discuss.flarum.org+inurl:%22page%3D%22

@franzliedke
Contributor

That might mean that Google has determined the page parameter does not have any effect on page contents, probably because of the bug you mentioned here.

That is now fixed. Once it is rolled out, we should check this in Google's Webmaster Console.
@tobscure Do you have access to that?

@gwillem
Contributor Author

gwillem commented Jan 10, 2018

@franzliedke Could be. However, curl 'https://discuss.flarum.org/d/7585?id=7585-freeflarum-com-now-open-for-beta-access&page=2' does indeed yield the page 2 posts. Oh well, Google's ways are inscrutable.

@tobyzerner
Contributor

tobyzerner commented Jan 11, 2018

@franzliedke I've deployed the latest changes to discuss.flarum.org. I've also granted your Gmail address access to Google Analytics for discuss.flarum.org. Hopefully this lets you add discuss.flarum.org to Google Webmaster Console using GA as the verification method. Please let me know if it doesn't work.

@franzliedke
Contributor

I have access. 😁

I already tweaked some settings, e.g. concerning the page parameter. Let's see if this changes anything.

Interestingly, this graph shows the indexing status over the last year:
[Image: Search indexing status for discuss.flarum.org]

What happened at the end of last April? 😱

@gwillem
Contributor Author

gwillem commented Jan 11, 2018

Possibly penalized for duplicate content.
