Remove old URLs from Google, Bing, etc. #96

Open
alanorth opened this issue May 13, 2015 · 5 comments

@alanorth
Member

Our current nginx configuration bends over backwards to serve https://dspace.ilri.org and https://mahider.ilri.org.

We only need to handle those links for as long as they keep coming, so we should be proactive and de-register links containing those domains from search indexes (i.e. in Google Webmaster Tools).

@alanorth alanorth self-assigned this May 13, 2015
@alanorth
Member Author

I verified ownership of dspace.ilri.org and mahider.ilri.org and submitted a change of address request pointing them to cgspace.cgiar.org. The old URLs currently have nearly 10,000 links in Google's index!

[Screenshot, 2015-05-14 23:56:15]

This works well because the handles are unchanged; only the domain name now redirects to CGSpace.

@alanorth
Member Author

Still waiting for these URLs to disappear. inurl:dspace.ilri.org shows 5,300 results right now.

Maybe I should also add the X-Robots-Tag: none HTTP Header...

@alanorth
Member Author

Added to nginx vhost template: ilri/rmg-ansible-public@15a2bd1

@alanorth
Member Author

alanorth commented Oct 2, 2015

Currently there are 1,600 items in Google results for dspace.ilri.org.

[Screenshot: selection_018]

I need to revisit Google Webmaster Tools to see if there's anything else I can do to remove these.

@alanorth
Member Author

alanorth commented Oct 2, 2015

The redirect to cgspace.cgiar.org has been in place since May, so I'm not sure why these links are still in the index. I have added the CGSpace sitemap to dspace.ilri.org's console to see if Google will learn that these have moved.

Currently requests to dspace.ilri.org are handled like this:

$ http --print Hh https://dspace.ilri.org/handle/10568/33832
GET /handle/10568/33832 HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: dspace.ilri.org
User-Agent: HTTPie/0.9.2

HTTP/1.1 301 Moved Permanently
Connection: keep-alive
Content-Length: 178
Content-Type: text/html
Date: Fri, 02 Oct 2015 08:34:03 GMT
Location: https://cgspace.cgiar.org/handle/10568/33832
Server: nginx
X-Robots-Tag: none
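
For spot-checking more than one old URL, the response above could be validated with a small script. This is only a sketch, not something from the actual setup: the function name `check_redirect` and its parameters are hypothetical, and it assumes the status code and headers have already been fetched by an HTTP client that does not follow redirects. It checks for a 301, a Location on the new host with the same handle path, and an X-Robots-Tag that contains `none` or `noindex`.

```python
from urllib.parse import urlsplit


def check_redirect(status, headers, old_url, new_host):
    """Return a list of problems with a redirect response (empty list = OK).

    Hypothetical helper: `status` is the numeric HTTP status, `headers` is a
    dict of response headers, `old_url` is the URL that was requested and
    `new_host` is the host the request should be redirected to.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    normalised = {name.lower(): value for name, value in headers.items()}
    problems = []

    # The old domains should answer with a permanent redirect.
    if status != 301:
        problems.append(f"expected status 301, got {status}")

    # The handle path must survive the redirect; only the host changes.
    old = urlsplit(old_url)
    location = urlsplit(normalised.get("location", ""))
    if location.hostname != new_host:
        problems.append(f"Location host is {location.hostname!r}, expected {new_host!r}")
    if location.path != old.path:
        problems.append(f"Location path is {location.path!r}, expected {old.path!r}")

    # "none" is shorthand for "noindex, nofollow".
    directives = {d.strip().lower() for d in normalised.get("x-robots-tag", "").split(",")}
    if not directives & {"none", "noindex"}:
        problems.append("X-Robots-Tag does not ask crawlers to drop the URL")

    return problems


if __name__ == "__main__":
    # Values copied from the response shown above.
    sample_headers = {
        "Location": "https://cgspace.cgiar.org/handle/10568/33832",
        "X-Robots-Tag": "none",
    }
    print(check_redirect(
        301,
        sample_headers,
        "https://dspace.ilri.org/handle/10568/33832",
        "cgspace.cgiar.org",
    ))
```

Keeping the check as a pure function over a status code and a header dict means it can be run against saved responses as well as live ones.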
