the page at docs/search_results.html should be more hidden (exclude it from search results; add to a robots.txt) #1027

Closed
ebbeck opened this issue Mar 11, 2016 · 6 comments

Comments

@ebbeck ebbeck commented Mar 11, 2016

https://schema.org/docs/search_results.html

this leads to a blank page. Is this a valid schema tag?

Contributor

@Dataliberate Dataliberate commented Mar 11, 2016

Have you tried searching for something when you arrive at that page?

~Richard

On 11 Mar 2016, at 17:49, Beck Cronin-Dixon notifications@github.com wrote:

https://schema.org/docs/search_results.html

this leads to a blank page. Is this a valid schema tag?



Contributor

@danbri danbri commented Mar 16, 2016

Hmm, thanks @ebbeck - you found a bug in the structure of our site, I think.

It looks like the file at docs/search_results.html is not meant for people to find. Instead it is a template used in the search box at the top of all pages. I'll update the title of this issue to track the underlying problem.

@danbri changed the title from "search results page is empty on schema.org" to "the page at docs/search_results.html should be more hidden (exclude it from search results; add to a robots.txt)" on Mar 16, 2016
@danbri closed this in 0232e7a on Aug 19, 2016
Contributor

@danbri danbri commented Aug 19, 2016

http://webschemas.org/robots.txt

Will go out with next release to schema.org.

danbri added a commit that referenced this issue Aug 19, 2016
@Aaranged Aaranged commented Mar 2, 2017

@danbri The content of http://webschemas.org/robots.txt as currently coded instructs the search engines not to index any content on webschemas.org.

The correct markup to exclude only the search results page is:
User-agent: *
Disallow: /docs/search_results.html
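
To make the difference between the two rule sets concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The block-all text (`Disallow: /`) is an assumption about what the draft site currently serves, and the `/Person` URL and the `allowed` helper are illustrative names only, not part of the project.

```python
from urllib.robotparser import RobotFileParser

# Assumption: a block-all robots.txt of this shape is what the draft site serves.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# The narrower rule suggested above: hide only the search results template.
BLOCK_SEARCH_ONLY = "User-agent: *\nDisallow: /docs/search_results.html\n"


def allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may crawl `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://webschemas.org/Person",  # illustrative ordinary page
        "http://webschemas.org/docs/search_results.html",
    ):
        print(url, allowed(BLOCK_ALL, url), allowed(BLOCK_SEARCH_ONLY, url))
```

Under the block-all rules every URL is refused, while the narrower rule refuses only the search results page.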

Contributor

@danbri danbri commented Mar 2, 2017

Thanks @Aaranged - eagle-eyed as ever. In this case @RichardWallis and I decided it was best not to confuse things by having the webschemas draft site show up in search results. We serve a different robots.txt depending on whether the site is running in "official" mode or webschemas-etc mode: https://github.com/schemaorg/schemaorg/blob/sdo-callisto/docs/robots-blockall.txt vs https://github.com/schemaorg/schemaorg/blob/sdo-callisto/docs/robots.txt

The gory App Engine details are in the corresponding *.yaml files. Amongst other things, the official site version should serve a simple sitemap...
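
As a rough illustration of the per-mode choice described above (not the actual mechanism, which lives in the App Engine *.yaml files), the selection could be sketched like this. The function name `robots_file_for` and the `official` flag are hypothetical, and the mapping of modes to files is inferred from this thread rather than read from the configuration.

```python
from pathlib import PurePosixPath

# Both file names come from the repository links above; everything else is illustrative.
DOCS_DIR = PurePosixPath("docs")


def robots_file_for(official: bool) -> PurePosixPath:
    """Pick the robots file to serve as /robots.txt for the given site mode.

    Assumption: the official site serves docs/robots.txt, and draft sites
    such as webschemas.org serve docs/robots-blockall.txt.
    """
    name = "robots.txt" if official else "robots-blockall.txt"
    return DOCS_DIR / name


if __name__ == "__main__":
    print(robots_file_for(official=True))
    print(robots_file_for(official=False))
```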

@AymenLoukil AymenLoukil commented Mar 3, 2017

Hello all,

The recommended way to keep a page out of the index is a robots meta tag.

We should add <meta name="robots" content="noindex"> to the <head> of https://github.com/schemaorg/schemaorg/blob/sdo-callisto/docs/search_results.html

Difference between the two methods:
robots.txt: please don't crawl this page or folder, but you may continue to show it in your index.
Robots meta tag: you may crawl this page, but you may not keep it in your index.
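
For anyone who wants to check whether a page carries the tag, here is a small sketch using Python's standard `html.parser`. The class and function names (`RobotsMetaParser`, `has_noindex`) are made up for this example, and it only inspects `<meta name="robots">`, not bot-specific tags or HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        # The content attribute is a comma-separated list, e.g. "noindex, nofollow".
        content = attr.get("content") or ""
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with `noindex`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head></html>'
    print(has_noindex(page))
```

Note that a crawler can only see this tag if it is allowed to fetch the page, so the page should not also be disallowed in robots.txt if the meta tag is the chosen method.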

5 participants