robots.txt doesn't behave as expected #1948
Comments
Hmmm … 🤔 I have no idea why we've set it that way. Maybe @rvagg remembers?
nope! And the use of /dist/ instead of /download/ also rattles my bones; I desperately want /dist/ to be deprecated. I think robots.txt is up for complete revision: someone suggest a new one that makes sense and make it so. https://github.com/nodejs/nodejs.org/blob/master/static/robots.txt

We had a discussion recently about the best entrypoint to the docs and I think we had differing opinions. Some people like /docs/, some go through /api/ (I do). The docs themselves suggest that going through /docs/latest-*/api/ is the "official" way.
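For the sake of discussion, a minimal sketch of what a revised robots.txt could look like, assuming the intent is to stop disallowing /docs/ and instead steer crawlers away from the legacy /dist/ tree (the Sitemap URL is a placeholder, not an existing file):

```
# Hypothetical revision, not the current file: leave the docs crawlable
# and steer crawlers away from the legacy /dist/ tree.
User-agent: *
Disallow: /dist/

Sitemap: https://nodejs.org/sitemap.xml
```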
Hi, I believe the correct steps to ensure proper indexation are:
With your permission, I could make a proposal for an XML sitemap based on the output of one of the third-party tools Google suggests at https://support.google.com/webmasters/answer/183668?hl=en (although most of them seem paid or dead). Please let me know, I'd be happy to help.
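In case it helps the proposal along, the sitemap protocol itself is small enough to hand-write; an entry has roughly this shape (the URL and date below are placeholders, not generated from the site):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal sitemap sketch: one <url> element per page that should be indexed;
     the loc and lastmod values are placeholders. -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://nodejs.org/en/docs/</loc>
    <lastmod>2020-01-01</lastmod>
  </url>
</urlset>
```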
@carlos-ds Thank you for the help and initiative. It'd be great if you could investigate it and create a PR.
Ok, thanks @alexandrtovmach. I suggest the following approach:
Does that sound like a good approach for this issue? I'd appreciate your feedback.
@richardlau I was rechecking this issue and nodejs.org/robots.txt gives a 404, even though it's referenced here (https://github.com/nodejs/nodejs.org/blob/main/public/robots.txt). Is this an nginx bug? Opening nodejs.org/manifest.json works (https://github.com/nodejs/nodejs.org/blob/main/public/manifest.json).
@ovflowd It's currently aliased -- I presume it has moved as part of the Next.js rewrite? |
True, I moved it from the static folder to the root of the public folder. Can we maybe remove those aliases from there?
I made an update in this PR (nodejs/build#3139). I still believe we could try that PR out, to see if everything is ✅.
We can also, for now, make a hotfix to the nginx config and remove those aliases. Either way, I feel confident enough that the new nginx config is working. You can create a temporary file and use …
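For readers without access to the build repo, the alias being discussed is roughly the following shape; the file path is an assumption about the old layout, not the actual production config:

```nginx
# Hypothetical sketch of the old-style alias that could now be dropped,
# since the Next.js build serves everything under public/ from the site root.
location = /robots.txt {
    alias /home/nodejs/nodejs.org/static/robots.txt;
}
```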
Closing as fixed. |
Our `robots.txt` file currently contains this:

I'm not sure of the reason for disallowing `/docs/`, but whatever the case, I don't think it has the intended effect. Instead of removing it from Google, it just seems to remove Google's ability to show any meaningful content related to the link, but it still links to pages under `/docs/`.

Example: A search for "node.js util.inherits" shows this:

If you follow the "Learn why" link, you're told that:
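For context on the behaviour described above: a `Disallow` rule in robots.txt only blocks crawling, not indexing, so Google can still list a `/docs/` URL it discovers through links; it just can't fetch the page to build a snippet, which is why the result shows no meaningful content. The rule in question has roughly this shape (illustrative, not the literal contents of the file):

```
# Illustrative sketch, not the literal contents of nodejs.org/robots.txt
User-agent: *
Disallow: /docs/
```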