
Check visitors for the google bots' user agent strings #2

Closed
dfyx opened this issue Mar 9, 2015 · 9 comments

Comments

@dfyx

dfyx commented Mar 9, 2015

Instead of constantly googling (which uses a lot of resources and gives you a delayed result), you could just check all visitors for the user agent strings Google uses: https://support.google.com/webmasters/answer/1061943?hl=en

Once you see a bot, you could probably still search for the page just to confirm that it made it into the index (I'd guess it takes a few hours, but I'm not sure).
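
The trigger described above could be sketched roughly as follows. This is a hypothetical illustration, not code from this project; the substrings are taken from Google's published crawler user-agent list, and as noted later in the thread, a matching user agent is only a hint, since the header is trivially spoofed.

```python
import re

# Matches the tokens Google documents for its crawlers (e.g. "Googlebot",
# "Googlebot-Image"). A match means "maybe a Google crawler", nothing more.
GOOGLE_BOT_PATTERN = re.compile(r"Googlebot", re.IGNORECASE)

def looks_like_googlebot(user_agent: str) -> bool:
    """Return True if the User-Agent header resembles a Google crawler."""
    return bool(GOOGLE_BOT_PATTERN.search(user_agent or ""))
```

A site could call this on each request and only start polling the search index after the first match, instead of querying on a fixed schedule.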

@AyrA

AyrA commented Mar 9, 2015

Please don't do that. I use the Googlebot user agent all the time. It allows me to view some content on forums for which other users would need to register. Either use a reverse DNS lookup or an unofficial IP list. If I visited the site with my browser, it would get deleted.
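
The reverse DNS approach mentioned here is the verification method Google itself documents: do a reverse lookup on the visitor's IP, check that the hostname ends in googlebot.com or google.com, then do a forward lookup on that hostname and confirm it resolves back to the same IP. A minimal sketch using only the standard library:

```python
import socket

def is_verified_googlebot(ip: str) -> bool:
    """Forward-confirmed reverse DNS check for Googlebot.

    1. Reverse-resolve the IP to a hostname (PTR record).
    2. Require the hostname to end in googlebot.com or google.com.
    3. Forward-resolve that hostname and require it to map back to the IP.
    A spoofed User-Agent fails step 2 or 3, since the attacker does not
    control Google's DNS zones.
    """
    try:
        host, _, _ = socket.gethostbyaddr(ip)        # reverse lookup
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        addrs = socket.gethostbyname_ex(host)[2]     # forward lookup
        return ip in addrs
    except (socket.herror, socket.gaierror):
        return False
```

This costs two DNS queries per candidate visitor, so in practice one would only run it for requests whose user agent already claims to be Googlebot.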

@dfyx
Author

dfyx commented Mar 9, 2015

Obviously don't use the user agent as the only indicator. But it could tell you when to check the index.

@fabiosussetto

visited by googlebot !== indexed in google

@ErtugKaya

@dfyx meant that unless the page is visited by a Google bot, it cannot be indexed by Google. Not vice versa.

Checking whether the page is indexed should start only after a Google bot visits it. That makes sense to me.

@fabiosussetto

Got it, thanks.

@AyrA

AyrA commented Mar 10, 2015

@ErtugKaya exactly. The page needs to be visited at least once by a crawler to be indexed. After that it gets tricky, because a crawler can also feed other search engines' indexes: you might see a visit from search engine A, but your site might also turn up in search engine B.

Microsoft did that at least once

@remram44

So this is just an optimization? Once the site has been visited, it might take a while to appear in the index, so you'll do many queries anyway.

@AyrA

AyrA commented Mar 11, 2015

If you query search engines, you can do this once every 12 hours. Also, most browsers send the Referer header to your site. If a referrer points to a search engine, run the query for your site, and if it appears, delete the site.
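
The referrer trigger could look something like this. A hypothetical sketch, not part of the project: the host list is an assumption, and the Referer header is optional and spoofable, so this only decides *when* to run the index query, not whether the site is indexed.

```python
from urllib.parse import urlparse

# Hosts whose visits suggest the visitor arrived from a search results page.
# Country-specific domains (google.de, etc.) would need more entries.
SEARCH_ENGINE_DOMAINS = ("google.com", "bing.com", "duckduckgo.com")

def came_from_search_engine(referer: str) -> bool:
    """Return True if the Referer header points at a known search engine."""
    host = urlparse(referer or "").hostname or ""
    return any(host == d or host.endswith("." + d)
               for d in SEARCH_ENGINE_DOMAINS)
```

On a match, the site would fire its "are we indexed yet?" query immediately instead of waiting for the next 12-hour poll.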

@mroth
Owner

mroth commented Mar 13, 2015

This would be an interesting technical optimization -- but in this particular case the choice to Google constantly was an intentional conceptual decision, not a technical one. (i.e. part of the crux of what made the idea enjoyable for me was the "a website that googles itself constantly" humor)

If someone would like to build this and others will find it useful, please open a Pull Request and I will either merge it to an alternate branch or link to the fork in the README.

@mroth mroth closed this as completed Mar 13, 2015