Status: Googlebot blocked by robots.txt #3067

Closed · the-s-a-m opened this issue Nov 15, 2020 · 8 comments

@the-s-a-m

The following resources cannot be loaded by Googlebot because they are blocked by the default robots.txt:

/system/assets/jquery/jquery-2.x.min.js
/user/plugins/form/assets/form-styles.css
/user/plugins/login/css/login.css

Thank you for fixing this error in the default configuration.

@NicoHood (Contributor)

The current robots.txt is:

User-agent: *
Disallow: /backup/
Disallow: /bin/
Disallow: /cache/
Disallow: /grav/
Disallow: /logs/
Disallow: /system/
Disallow: /vendor/
Disallow: /user/
Allow: /user/pages/
Allow: /user/themes/
Allow: /user/images/
Allow: /

Maybe it would make sense to add Allow: /user/plugins/ to the list?
Why is it important to load the listed resources? Does Google even care about JavaScript and CSS?

@the-s-a-m (Author) commented Nov 30, 2020

Google told me that the access was not possible, so I changed it to:

User-agent: *
Allow: /user/pages/
Allow: /user/themes/
Allow: /user/images/
Allow: /user/plugins/
Allow: /system/assets/jquery/jquery-2.x.min.js
Disallow: /backup/
Disallow: /bin/
Disallow: /cache/
Disallow: /grav/
Disallow: /logs/
Disallow: /system/
Disallow: /vendor/
Disallow: /user/
Allow: /

and now it is working.

@rhukster (Member)

Fixed by bc22c8d

@NicoHood (Contributor) commented Jan 1, 2021

It seems that the issue is not fixed for the built-in jQuery:
/system/assets/jquery/jquery-2.x.min.js

The reason why this happens is unclear to me. It might be that Allow: *.js$ does not match the script.min.js naming scheme (at least from my observations, as other .js scripts work). Searching around on Google, I found a post that suggests the following:

User-Agent: Googlebot
Allow: .css
Allow: .js

This also allows query parameters that are often used for cache busting: https://example.com/deep/style.css?something=1. And yes, User-Agent: Googlebot is important to use here! Another option is to explicitly allow the jQuery path.
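
One caveat (a sketch based on Google's documented group-selection rules, not something tested in this thread): a crawler obeys only the most specific matching group, so a dedicated User-Agent: Googlebot group replaces the User-agent: * group for Googlebot entirely. Any disallows that should still apply to Googlebot would have to be repeated inside it, for example:

User-Agent: Googlebot
Allow: .css
Allow: .js
# Disallows from the * group no longer apply to Googlebot,
# so repeat the ones that should still hold, e.g.:
Disallow: /backup/
Disallow: /logs/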

@NicoHood (Contributor) commented Jan 2, 2021

It turns out Google has intermittent problems loading some assets. I am not yet sure whether this is related to Grav; I assume not. However, the built-in jQuery must be allowed with an additional rule, since Disallow: /system/ blocks it (I've tested that with Google's robots.txt tool).

Allow: *.css$
Allow: *.js$
Allow: /system/*.js$
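
For reference, a sketch of how these rules might slot into the default Grav robots.txt quoted above (assuming they are simply appended to the existing User-agent: * group; the actual commit may differ):

User-agent: *
Disallow: /backup/
Disallow: /bin/
Disallow: /cache/
Disallow: /grav/
Disallow: /logs/
Disallow: /system/
Disallow: /vendor/
Disallow: /user/
Allow: /user/pages/
Allow: /user/themes/
Allow: /user/images/
Allow: *.css$
Allow: *.js$
Allow: /system/*.js$
Allow: /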

@rhukster (Member) commented Jan 4, 2021

That's pretty odd, as the allows come after the disallows. But if it works in your testing, then I'm good with it. Thanks.

@NicoHood (Contributor) commented Jan 5, 2021

I've read that the length of the rule matters for Googlebot: the longest matching pattern wins. That makes total sense and explains why Allow: /system/*.js$ is required.
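
As a worked example (assuming Google's documented longest-match precedence, where the most specific matching pattern wins and Allow wins ties), consider the path /system/assets/jquery/jquery-2.x.min.js:

Disallow: /system/      # matches, pattern length 8
Allow: *.js$            # matches, pattern length 5 -> loses to the Disallow
Allow: /system/*.js$    # matches, pattern length 13 -> most specific, wins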

I also found out why the other resources were unable to load: Google stops loading resources if there are too many. It seems like a long-standing crawling bug; I could work around it by enabling the asset manager. See https://support.google.com/webmasters/thread/4425254?hl=en
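
For anyone hitting the same limit, a minimal sketch of the corresponding override in user/config/system.yaml (assuming Grav's standard asset-manager options; pipelining concatenates all CSS and JS into single compiled files, so Googlebot has far fewer resources to fetch):

assets:
  css_pipeline: true   # combine all CSS assets into one compiled file
  js_pipeline: true    # combine all JS assets into one compiled file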

PR: #3129

@rhukster (Member) commented Jan 7, 2021

I had already committed this to the 1.7 branch: 14df5a6
