Bot verification using shared secret header #2711

Closed

hdodov opened this issue Jan 19, 2024 · 1 comment

Comments

hdodov commented Jan 19, 2024

I'm not sure this is the right place or way to submit my idea. If I've made a mistake, please point me in the right direction.

Problem

It's difficult to allow search engines and only search engines to crawl your site:

  • User-agent sniffing doesn't work because the User-Agent header can be faked
  • IP allowlisting doesn't really work because crawler IPs may change over time, and maintaining IP lists is tedious

In my case, we're using Cloudflare — one of the leading cloud cybersecurity providers — and we have to rely on bot scores or verified bot lists to decide whether we let traffic in or out. I think this is a poor solution because:

  • The score is opaque: it's not entirely clear how it's computed
  • The outcome is unpredictable: you can't know in advance which bots will be allowed

Solution

There could be a simple Bot-Secret header that bots attach to their requests, so web servers can decide whether to allow them. For example:

  1. I verify my domain example.com in Google Search Console (GSC)

  2. I generate a bot secret in the GSC admin, e.g. q02u6O6H9vVtxpIscXUNTLT7AqHJeTed

  3. From now on, Googlebot should make requests to example.com with the following header attached:

    Bot-Secret: q02u6O6H9vVtxpIscXUNTLT7AqHJeTed
  4. In Cloudflare, I set up a WAF rule that allows requests whose Bot-Secret header equals q02u6O6H9vVtxpIscXUNTLT7AqHJeTed

This way, I can know that Googlebot and only Googlebot is allowed past the WAF because only it has that secret.
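
To make the idea concrete, here's a minimal sketch of the server-side check, written as Go standard-library middleware. The Bot-Secret header name and the secret value are the ones proposed above; everything else (the handler, the port) is illustrative:

    package main

    import (
        "crypto/subtle"
        "fmt"
        "net/http"
    )

    // The example secret from step 2; in practice it would be generated in
    // the search console and stored in server configuration.
    const botSecret = "q02u6O6H9vVtxpIscXUNTLT7AqHJeTed"

    // requireBotSecret rejects any request whose Bot-Secret header does not
    // match the shared secret, comparing in constant time to avoid leaking
    // information through response timing.
    func requireBotSecret(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            got := []byte(r.Header.Get("Bot-Secret"))
            if subtle.ConstantTimeCompare(got, []byte(botSecret)) != 1 {
                http.Error(w, "forbidden", http.StatusForbidden)
                return
            }
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        mux := http.NewServeMux()
        mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintln(w, "hello, verified crawler")
        })
        http.ListenAndServe(":8080", requireBotSecret(mux))
    }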


If this turns into an agreed-upon standard and is supported by crawlers and cloud providers, I could:

  1. Generate bot secrets in all the search console admins and the like
  2. List them in my cloud provider configuration or my own server middleware (sketched below)
  3. Be sure that exactly these bots, and nothing else, are allowed through
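
A rough sketch of what step 2 might look like in that middleware, with one entry per crawler. Both the secrets and the crawler names below are hypothetical placeholders:

    package main

    import "net/http"

    // knownBotSecrets maps each configured secret to the crawler it was
    // issued for. A production version would compare secrets in constant
    // time rather than using a plain map lookup.
    var knownBotSecrets = map[string]string{
        "q02u6O6H9vVtxpIscXUNTLT7AqHJeTed": "Googlebot",
        "JY3sN0b7Qf1mWr8kZxLcP5vTeGhUaD2i": "Bingbot",
    }

    // identifyBot reports which crawler, if any, presented a known secret.
    func identifyBot(r *http.Request) (string, bool) {
        name, ok := knownBotSecrets[r.Header.Get("Bot-Secret")]
        return name, ok
    }
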
reschke (Contributor) commented Jan 19, 2024

The git repo is for tracking issues in specs we work on (or have worked on).

For discussions, please use the WG's mailing list: https://lists.w3.org/Archives/Public/ietf-http-wg/

(That said: why not simply use HTTP auth for this?)
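
For comparison, a sketch of that alternative: the same shared secret carried in the existing Authorization header via HTTP Basic authentication, so no new header field is needed. The secret is the example value from above; the realm is illustrative:

    package main

    import (
        "crypto/subtle"
        "net/http"
    )

    const sharedSecret = "q02u6O6H9vVtxpIscXUNTLT7AqHJeTed"

    // requireBasicAuth performs the same check using HTTP Basic auth: the
    // crawler sends "Authorization: Basic base64(user:secret)" instead of
    // a new Bot-Secret field.
    func requireBasicAuth(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            _, pass, ok := r.BasicAuth()
            if !ok || subtle.ConstantTimeCompare([]byte(pass), []byte(sharedSecret)) != 1 {
                w.Header().Set("WWW-Authenticate", `Basic realm="crawlers"`)
                http.Error(w, "unauthorized", http.StatusUnauthorized)
                return
            }
            next.ServeHTTP(w, r)
        })
    }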

mnot closed this as not planned on Jan 21, 2024