I'm not sure this is the right place or way to submit my idea. If I've made a mistake, please point me in the right direction.
Problem
It's difficult to allow search engines and only search engines to crawl your site:
User-agent sniffing doesn't work because user-agent strings are trivially faked
IP allow-listing doesn't really work either, because crawler IPs change over time and the lists are hard to maintain
In my case, we're using Cloudflare — one of the leading cloud cybersecurity providers — and we have to rely on bot scores or verified bot lists to decide whether we let traffic in or out. I think this is a poor solution because:
The score is subjective — it's not entirely clear how it's made up
The outcome is unknown — you can't really know which bots are going to be allowed
Solution
There could be a simple Bot-Secret header that bots add to their requests, so web servers can decide whether to allow them. For example:
I verify my domain example.com in Google Search Console (GSC)
I generate a bot secret in the GSC admin, e.g. q02u6O6H9vVtxpIscXUNTLT7AqHJeTed
From now on, Googlebot should make requests to example.com with the following header attached:
Bot-Secret: q02u6O6H9vVtxpIscXUNTLT7AqHJeTed
In Cloudflare, I set up a WAF rule that allows requests whose Bot-Secret header equals q02u6O6H9vVtxpIscXUNTLT7AqHJeTed
This way, I can know that Googlebot and only Googlebot is allowed past the WAF because only it has that secret.
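To make the idea concrete, here is a minimal sketch of what the same check could look like if it were enforced in your own server middleware instead of (or in addition to) a WAF rule. The header name Bot-Secret and the secret value are the ones from the example above; everything else (handler names, port) is illustrative, not part of the proposal.

```go
package main

import (
	"crypto/subtle"
	"log"
	"net/http"
)

// Hypothetical secret issued to Googlebot via Search Console,
// taken from the example above.
const googlebotSecret = "q02u6O6H9vVtxpIscXUNTLT7AqHJeTed"

// requireBotSecret lets a request through only if its Bot-Secret
// header matches the issued secret; everything else gets a 403.
func requireBotSecret(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := r.Header.Get("Bot-Secret")
		// Constant-time comparison so the check doesn't leak the secret.
		if subtle.ConstantTimeCompare([]byte(got), []byte(googlebotSecret)) != 1 {
			http.Error(w, "Forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("crawlable content"))
	})
	log.Fatal(http.ListenAndServe(":8080", requireBotSecret(mux)))
}
```

Since the secret travels in a plain request header, this only makes sense over HTTPS, and the admin consoles issuing the secrets would presumably need a way to rotate them.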
If this turns into an agreed-upon standard and is supported by crawlers and cloud providers, I could:
Generate bot secrets in all search console admins and the like
List them in my cloud provider configuration or my own server middleware (see the sketch after this list)
Be sure that exactly these bots, and nobody else, are being let through
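As a sketch of that last point: the single-secret check above could be generalized to a small table of issued secrets, one per crawler, so each request can be attributed to a specific verified bot. Only the Googlebot value comes from the earlier example; the Bingbot entry is invented for illustration. This fragment would replace the single-constant check in the sketch above.

```go
package main

import "net/http"

// Hypothetical table of secrets issued in each crawler's admin console.
// Only the Googlebot value comes from the example above; the Bingbot
// entry is made up.
var botSecrets = map[string]string{
	"q02u6O6H9vVtxpIscXUNTLT7AqHJeTed":              "Googlebot",
	"hypothetical-secret-from-bing-webmaster-tools": "Bingbot",
}

// verifiedBot returns the name of the crawler a request belongs to,
// or "" if the Bot-Secret header is missing or unknown.
func verifiedBot(r *http.Request) string {
	return botSecrets[r.Header.Get("Bot-Secret")]
}
```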