"robots.txt" not working #5585

Closed
lcgkm opened this issue Oct 23, 2018 · 4 comments

lcgkm commented Oct 23, 2018

We found a "robots.txt" file in Vault: https://vault.example.net/ui/robots.txt

File contents:
http://www.robotstxt.org
User-agent: *
Disallow: *

But it has no effect, because a request for the URI "/robots.txt" returns a 404 error.
If "/robots.txt" returned a 3xx status code with the location "/ui/robots.txt" (or if a "robots.txt" file existed at the root, "/"), it would work.
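
To make the redirect option concrete, here is a minimal sketch of a standalone Python server (not Vault's code) that answers "/robots.txt" with a 302 pointing at "/ui/robots.txt" and returns 404 for everything else. The names `redirect_target`, `RobotsRedirectHandler`, and `serve`, and the port, are assumptions chosen for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Where the file is served today, according to the report above.
UI_ROBOTS_PATH = "/ui/robots.txt"


def redirect_target(path):
    """Return the redirect location for a request path, or None if none applies."""
    return UI_ROBOTS_PATH if path == "/robots.txt" else None


class RobotsRedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: redirects /robots.txt, answers 404 otherwise."""

    def do_GET(self):
        target = redirect_target(self.path)
        if target is None:
            self.send_error(404)
            return
        self.send_response(302)
        self.send_header("Location", target)
        self.end_headers()


def serve(port=8080):
    """Run the sketch locally; the port is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), RobotsRedirectHandler).serve_forever()
```

Whether a given crawler follows such a redirect depends on the crawler, so placing the file at the root is the more direct of the two options.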

meirish (Contributor) commented Oct 29, 2018

Hi @lcgkm! I think this was an oversight on our part, since the file is part of the defaults in the UI framework we use. Can you describe your use case a bit more? I'm more inclined to remove it entirely, since if we added a redirect it would only be present when the UI was enabled.

lcgkm (Author) commented Oct 30, 2018

How to create a /robots.txt file
Where to put it
The short answer: in the top-level directory of your web server.

The longer answer:

When a robot looks for the "/robots.txt" file for a URL, it strips the path component from the URL (everything from the first single slash), and puts "/robots.txt" in its place.

For example, for "http://www.example.com/shop/index.html", it will remove the "/shop/index.html", replace it with "/robots.txt", and end up with "http://www.example.com/robots.txt".

So, as a web site owner you need to put it in the right place on your web server for that resulting URL to work. Usually that is the same place where you put your web site's main "index.html" welcome page. Where exactly that is, and how to put the file there, depends on your web server software.

Remember to use all lower case for the filename: "robots.txt", not "Robots.TXT".

Reference:
http://www.robotstxt.org/robotstxt.html

So a search engine such as Google will check https://vault.example.net/robots.txt,
NOT https://vault.example.net/ui/robots.txt.
We don't need to add a redirect, but we do need to put "robots.txt" in the right place.
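
As a rough illustration of the lookup rule quoted above (not taken from any crawler's source), the URL derivation can be sketched with Python's standard `urllib.parse`. The function name `robots_txt_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL a crawler would request for page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/shop/index.html"))
# prints http://www.example.com/robots.txt
print(robots_txt_url("https://vault.example.net/ui/robots.txt"))
# prints https://vault.example.net/robots.txt
```

The second call shows why the file under "/ui/" is never consulted: any URL on the host maps back to the root-level "/robots.txt".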

meirish (Contributor) commented Oct 30, 2018

Yep, I’m familiar with robots.txt. The file is part of the UI code, though (at least for now). Exposing Vault publicly is not generally recommended, so I was asking more about why you’re doing that (if that’s what’s happening) so that we can solve the issue for you rather than jumping to the implementation. Even with no robots.txt, a crawler wouldn’t be authorized, and there’s no sitemap, so it wouldn’t know of other endpoints to visit.

lcgkm (Author) commented Oct 30, 2018

Exposing Vault publicly is not generally recommended

Yes, I totally agree with you. It's just a hypothetical: suppose someone made a mistake and, as a result, Vault was exposed to a public network. (This is not a present or real situation.)
