the page at docs/search_results.html should be more hidden (exclude it from search results; add to a robots.txt) #1027

Closed
ebbeck opened this Issue Mar 11, 2016 · 6 comments

5 participants

ebbeck commented Mar 11, 2016

https://schema.org/docs/search_results.html

This leads to a blank page. Is this a valid schema tag?

Contributor

Dataliberate commented Mar 11, 2016

Have you tried searching for something when you arrive at that page?

~Richard

Contributor

danbri commented Mar 16, 2016

Hmm, thanks @ebbeck, you found a bug in the structure of our site, I think.

It looks like the file at docs/search_results.html is not meant for people to find directly. Instead, it is a template used by the search box at the top of every page. I'll update the title of this issue to track the underlying problem.

@danbri danbri changed the title from search results page is empty on schema.org to the page at docs/search_results.html should be more hidden (exclude it from search results; add to a robots.txt) Mar 16, 2016

@danbri danbri closed this in 0232e7a Aug 19, 2016

Contributor

danbri commented Aug 19, 2016

http://webschemas.org/robots.txt

This will go out to schema.org with the next release.

danbri added a commit that referenced this issue Aug 19, 2016

Aaranged commented Mar 2, 2017

@danbri As currently written, http://webschemas.org/robots.txt instructs search engines not to crawl any content on webschemas.org.

The correct rules to exclude only the search results page are:

    User-agent: *
    Disallow: /docs/search_results.html
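One way to sanity-check rules like these before deploying them is Python's standard-library `urllib.robotparser`. This is a minimal sketch, assuming the two lines above are the whole file; the sample URLs other than the search results page are illustrative only. `urllib.robotparser` does simple prefix matching, which is enough for a plain path rule like this one.

```python
from urllib.robotparser import RobotFileParser

# The rules proposed above, as they would appear in robots.txt.
PROPOSED_RULES = """\
User-agent: *
Disallow: /docs/search_results.html
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text in memory, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(PROPOSED_RULES)
    # Both search_results URLs should be blocked (the rule is a path prefix,
    # so query strings are covered too); /Person should stay crawlable.
    for url in (
        "https://schema.org/docs/search_results.html",
        "https://schema.org/docs/search_results.html?q=Person",
        "https://schema.org/Person",
    ):
        verdict = "allowed" if rp.can_fetch("Googlebot", url) else "blocked"
        print(f"{verdict:8} {url}")
```

Because the rule sits under `User-agent: *`, it applies to any crawler name passed to `can_fetch`, not just `Googlebot`.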

Contributor

danbri commented Mar 2, 2017

Thanks @Aaranged, eagle-eyed as ever. In this case @RichardWallis and I decided it was best not to confuse things by letting the webschemas draft site show up in search results. Depending on whether the site is running in "official" mode or in webschemas (or other non-official) mode, we serve a different robots.txt: https://github.com/schemaorg/schemaorg/blob/sdo-callisto/docs/robots-blockall.txt vs https://github.com/schemaorg/schemaorg/blob/sdo-callisto/docs/robots.txt

The gory App Engine details are in the corresponding *.yaml files. Amongst other things, the official site version should serve a simple sitemap...
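The mode switch described above amounts to choosing one of two robots.txt bodies per deployment. The real selection lives in the App Engine *.yaml files, which are not reproduced here, so this is only a hypothetical Python equivalent. The mode names `"official"` and `"draft"`, the function names, and the file bodies are all assumptions; in particular, the `"draft"` body assumes robots-blockall.txt disallows everything, as its name suggests.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: illustrative file bodies. The real files are docs/robots.txt and
# docs/robots-blockall.txt in the schemaorg repository; their exact contents
# are not quoted in this thread.
ROBOTS_BY_MODE = {
    "official": "User-agent: *\nDisallow: /docs/search_results.html\n",
    "draft": "User-agent: *\nDisallow: /\n",  # block-all, as the file name suggests
}


def robots_txt_for(mode: str) -> str:
    """Return the robots.txt body to serve for a (hypothetical) deployment mode."""
    # Fail closed: an unknown mode gets the block-all body, so a misconfigured
    # draft deployment is not crawled by accident.
    return ROBOTS_BY_MODE.get(mode, ROBOTS_BY_MODE["draft"])


def is_crawlable(mode: str, path: str) -> bool:
    """Check a path against the robots.txt that the given mode would serve."""
    parser = RobotFileParser()
    parser.parse(robots_txt_for(mode).splitlines())
    return parser.can_fetch("*", path)


if __name__ == "__main__":
    for mode in ("official", "draft"):
        print(mode, is_crawlable(mode, "/Person"))
```

Falling back to the block-all body for an unrecognised mode is a design choice of this sketch, not something the thread specifies.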

AymenLoukil commented Mar 3, 2017

Hello all,

The recommended way to keep a page out of search engine indexes is a robots meta tag.

We should add <meta name="robots" content="noindex"> to the <head> of https://github.com/schemaorg/schemaorg/blob/sdo-callisto/docs/search_results.html
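To confirm that such a tag actually ends up in the served page, a small check with Python's standard-library `html.parser` is enough. This is a minimal sketch; the class and function names are hypothetical, it only looks at `<meta name="robots">` (not crawler-specific names), and it does not verify that the tag sits inside `<head>`.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from every <meta name="robots" content="..."> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()  # lower-cased directive names, e.g. {"noindex"}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # content is a comma-separated list, e.g. "noindex, nofollow".
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page carries a robots noindex meta tag."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" in finder.directives
```

For example, `has_noindex(open("docs/search_results.html").read())` would check a local copy of the template (the path is assumed to be relative to a repository checkout). Self-closing `<meta ... />` tags are handled too, because `HTMLParser` routes them through `handle_starttag`.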

The difference between the two methods:
robots.txt: "Please don't crawl this page or folder", but the search engine may still show the URL in its index.
Robots meta tag: "You may visit this page, but you must not index it." It applies only to the page that contains it, not to a whole folder.
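A practical consequence of this difference is that combining both mechanisms on the same URL defeats the meta tag: a crawler only sees `noindex` on a page it is allowed to fetch, so a robots.txt `Disallow` for that page hides the tag. The following is a simplified, hypothetical model in Python; the function name and outcome strings are illustrative, and real search engines weigh more signals than this.

```python
from urllib.robotparser import RobotFileParser


def likely_outcome(robots_txt: str, url: str, page_has_noindex: bool) -> str:
    """Rough model of how robots.txt and a noindex meta tag combine for one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch("*", url):
        # The crawler never downloads the page, so it never sees any meta tag;
        # the bare URL can still be listed if other pages link to it.
        return "not crawled; may still be indexed from links"
    if page_has_noindex:
        return "crawled; dropped from the index"
    return "crawled; may be indexed"


if __name__ == "__main__":
    page = "https://schema.org/docs/search_results.html"
    print(likely_outcome("User-agent: *\nDisallow: /docs/search_results.html\n", page, True))
    print(likely_outcome("User-agent: *\nDisallow:\n", page, True))
```

An empty `Disallow:` line allows everything, so the second call models the meta-tag-only approach suggested above.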
