Linkedin, Wayback Machine, and W3C Validator issues - no indexing #1639
Comments
Oh, and this is also true for the Wayback Machine ... it refuses to index Grav-controlled pages but indexes the static version of pages just fine.
Quite odd, I will investigate this!
I'm pretty sure this is related to LinkedIn caching. For example, the getgrav.org site does show a preview when shared on LinkedIn, but my local machine will not. Everything I've found seems to indicate that LinkedIn caches and stores that cache for 7 days, so no matter what you do after that to resolve an issue, it won't update. It doesn't seem that LinkedIn has an Open Graph validator, but according to this, the page is good: http://opengraphcheck.com/result.php?url=http%3A%2F%2Fideum.com%2Fnews#.Wa9MldOGPUY I'm not sure what it could be. Maybe try putting a URL parameter on the end.
I too looked up LinkedIn caching; before posting about this I had already waited the 7 days, so I just don't see how that can be the issue. I also tried the recommended addition of a URL param, but that didn't work.

Further testing: the Wayback Machine instantly accepted the static page, as did LinkedIn. So that page, which is outside of Grav's control, though identical to an existing Grav-controlled page (news), was indexed by both services. I tested numerous Grav-controlled pages that we had adjusted the slugs for (so as far as LinkedIn was concerned they were effectively new pages) and still no joy. So, again, I don't quite see how LinkedIn caching is the issue.

I do really appreciate your looking into this. It is indeed very odd.
Oh, I always test my OG metadata via Facebook's debugger ;)
Is it possible that you have some security setting, .htaccess rule, or robots.txt file in place that could be stopping it? I'm not able to test your site via https://validator.w3.org; I get an IO error for some reason.
I don't have any agent blocking in my .htaccess, nor do I have any denials in my robots.txt or Apache conf files. I have run files and images through Google's robots.txt checker and all are allowed. I have set the permissions exactly as they appear in the Grav docs (see my original post). I will, in good faith, walk line by line through the .htaccess and see if I can turn up anything, but remember: all the faux files and folders I created would fall under any .htaccess prohibitions, and I have no Grav-specific prohibitions anywhere. It is a real head-scratcher.
I do feel like it has something to do with the fact that the W3C checker can't reach your site. When that is working, I think LinkedIn will start working too.
Also, you might want to check with your hosting provider, as .htaccess rules can be in place from their webserver setup and just 'extended' by your local .htaccess.
Agreed. But with regard to the second suggestion: we run a clean AWS instance that we create, spin up, and provision; no web-host nonsense ;)
I tested in Screaming Frog and I got a 200 response code for all my pages and images.
It is definitely not the LinkedIn cache. I made a new DNS A record, which would mean a new (unshared) URL, pointed it at a copy of the Grav site with a clean and clear .htaccess file, and changed some slugs, and still nothing. I am giving up and will be making a static version of the site with the original .htaccess I was using for the Grav version of the site. If that works, I will remove Grav from the production server and just use it locally, or jettison Grav altogether, though that defeats the purpose of having a CMS, which was for others to be able to edit online. Thanks for your help.
I still don't think it's actually a Grav issue itself. Getgrav.org is running on an Ubuntu 14.04 Linode VPS with serverpilot.io managing nginx/Apache/PHP, and LinkedIn is picking up the standard OG metadata we added with Grav's built-in metadata support. Maybe it's a plugin that's affecting things, but it's more likely some server configuration that is blocking both the W3C validator and LinkedIn. LinkedIn does seem to be pickier about things than Facebook/Twitter, but still, it should work like it does with getgrav.org. I mean, even if there are no OG meta tags, LinkedIn should be able to give a basic preview of the site; it's like it can't even reach it, or is blocked.
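For context, the built-in metadata support mentioned above is configured per page in the frontmatter; keys namespaced with og: are rendered as <meta property="og:..."> tags. A minimal sketch, with placeholder values rather than anything taken from this thread:

```yaml
---
title: News
metadata:
    'og:title': 'News'                                   # rendered as <meta property="og:title" content="News">
    'og:description': 'Latest news and project updates'  # placeholder description
    'og:image': 'http://example.com/images/news.jpg'     # placeholder absolute image URL
---
```

If tags like these do appear in the rendered source (as the opengraphcheck result above suggests), the sharing failure is more likely a server-level problem than a metadata one.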
I am not certain it is Grav itself either, but if not, it is almost certainly an interaction between something specific to Grav and a very vanilla Ubuntu server setup. I just tested a clean WordPress install and a clean Grav install. The former worked fine, the latter did not. So, it may not be Grav itself, but Grav is somehow part of the issue. I (and probably you) don't have time to hunt down the precise Grav/server interaction that is causing the problem. :) If I do find the issue I will let you know.
The Blackhole plugin would not create a full static site (it stopped at about 10 pages). So ... the non-indexing issue seems to be an interaction between Grav and AWS servers. Still trying to track down the precise interaction creating the issue. No need to respond unless you have a suggestion; I am just trying to record all my tests here.
Disabled all plugins ... no joy.
Created new instance of site on new server. Pointed subdomain to it.
And I can reach '/user/pages/01.home' by providing that path (no IO error).
So, after extensive testing, every folder and file is reachable (no IO error) except when the URL is used rather than the server paths; that is, the moment '/user/pages/01.home' becomes '/home' or '/'. Yet every other site on that testing server works via URL or server path. This seems related to: https://discourse.getgrav.org/t/problem-with-https-validator-w3-org-nu/4176
Could you try disabling the debugger? https://learn.getgrav.org/basics/grav-configuration#debugger
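For anyone following along, the debugger option linked above lives in Grav's user/config/system.yaml. Assuming that is the setting being referred to, disabling it would look like this (a sketch of only the relevant keys):

```yaml
# user/config/system.yaml (excerpt) -- assumed location of the setting linked above
debugger:
  enabled: false   # turn off the Grav debug bar
```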
I have solved the issue, but still don't quite get the why of it or the scope.
It would be good to know if it's related to the
Yes, if I understood what you wanted of me.
FYI, Grav threw an error when I turned on Gzip compression, but there was still a small save button in the upper right corner, so I clicked through the error and after that all was well.

Update: the fix is on production and working great. W3C Validator, LinkedIn, and the Wayback Machine are all happy.
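The comment above doesn't name the exact option, but assuming the Gzip compression being toggled is Grav's cache.gzip setting in user/config/system.yaml, the change would be roughly:

```yaml
# user/config/system.yaml (excerpt) -- assumed key, not confirmed in the thread
cache:
  gzip: true   # gzip-compress page output before it is sent to the client
```

Treat the key name as an assumption; the thread itself only confirms that turning on Gzip compression made the W3C Validator, LinkedIn, and the Wayback Machine all work.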
Thanks, @rhukster! That should save someone else a headache.
I'm going to close this issue, but have marked it for documentation.
I am having the same issue as referenced in the Grav Community Forum archive:
https://discourse.getgrav.org/t/linkedin-url-sharing-not-working/1818
So that you need not refer to it, here is the content of that post:
We are using OG metadata on our pages for social sharing, and Facebook and Twitter are working OK. We have a problem with LinkedIn sharing: it does not pull content from the defined metadata or from the page itself.
We checked a combination of metadata by adding ?1 (?2, etc.) to the end of the URL to avoid the LinkedIn cache issue, but there was no progress. Then we saved one of the pages' source code as an HTML file and put it back on the server, and it worked! LinkedIn sharing was reading the OG metadata and it was OK (as it was for the other social networks).
Does anyone have any idea what we should do to enable LinkedIn URL sharing with Grav CMS?
Best regards,
Vladimir
Like Vladimir (above), I can't share pages on LinkedIn. I know it is not an Apache conf file issue, an .htaccess issue, or a bad-code issue, because I can create a static version of the page and it shares just fine.
I do not have caching or compression enabled.
To test, I can type the following URLs into LinkedIn and see if a page preview appears:
The first URL is a Grav-controlled URL; the latter is a static HTML reproduction of the news page from Grav, which is at the site root.
The server is a typical Ubuntu LAMP-style setup (without the MySQL) hosted on AWS. I regularly check permissions and make sure these bases are covered:
Any help appreciated.
Thanks!