Coolguyzone/chore/ai rules robots#17796
Allow: /
Content-Signal: ai-train=yes, search=yes, ai-input=yes

User-agent: Claude-Web
| User-agent: Claude-Web | |
| User-agent: ClaudeBot |
User-agent: Bytespider
Allow: /
Content-Signal: ai-train=yes, search=yes, ai-input=yes

User-agent: CCBot
Allow: /
Content-Signal: ai-train=yes, search=yes, ai-input=yes
Do we want to explicitly allow trainers to train on our docs? I kind of feel like no.
I had the same feeling, but then I thought: if they don't train on our current docs, LLMs will keep showing people old ways of using Sentry, which might not be ideal 😅
Ah, good point. I guess "train" covers a wide range of things, and it's too early in content-signal land to be explicit. It sounds like content-signal is still experimental and not adopted anyway, so I'm not sure it'll actually prevent any bot from taking an action.
Yeah, I had this discussion with Matt. It feels like our direction so far has been to optimize the docs for LLM usage, so training seemed like the right call. I'm open to hearing any objections though.
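For reference, the trade-off in this thread comes down to a single value in the Content-Signal line. Here is a minimal sketch, assuming the three keys used in this diff (`ai-train`, `search`, `ai-input`) with `yes`/`no` values. `formatContentSignal` and `ContentSignals` are hypothetical names for illustration, not code from this PR, and as noted above the signal is advisory and may not be honored by any given bot.

```typescript
// Hypothetical helper, not part of this PR: renders a Content-Signal line
// from three yes/no preferences, using the same keys as the diff.
type ContentSignals = {
  aiTrain: boolean;
  search: boolean;
  aiInput: boolean;
};

function formatContentSignal(signals: ContentSignals): string {
  // Map each boolean preference to the yes/no form used in robots.txt.
  const yesNo = (allowed: boolean): string => (allowed ? "yes" : "no");
  const parts = [
    `ai-train=${yesNo(signals.aiTrain)}`,
    `search=${yesNo(signals.search)}`,
    `ai-input=${yesNo(signals.aiInput)}`,
  ];
  return `Content-Signal: ${parts.join(", ")}`;
}
```

Opting out of training later would then mean passing `aiTrain: false` and leaving the other two values alone.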
Sitemap: ${isDeveloperDocs ? 'https://develop.sentry.dev/sitemap.xml' : 'https://docs.sentry.io/sitemap.xml'}
Sitemap: ${sitemap}

User-agent: *
@coolguyzone since we're wildcard-allowing every bot, I'm not sure it makes sense to also list specific bots that get the same carte blanche. Besides the trainers, whose permissions we might want to change, I wonder if we should remove the explicit per-bot entries? Otherwise, we're still missing some, like:
- ChatGPT-User
- PerplexityBot
- Perplexity-User
- Meta-ExternalAgent
- cohere-ai
- Diffbot
And that list will continue to get longer/need updating as more bots are created.
Yeah, in the future it might make sense if we are adding different rules for different bots but you're right, for now it makes sense just to collapse everything to a wildcard.
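A rough sketch of what the collapsed version could look like, assuming a single wildcard group that carries the same Allow and Content-Signal lines as the per-bot groups in the diff. `buildRobotsTxt` and its `sitemap` parameter are illustrative names, not the actual handler in this PR.

```typescript
// Sketch only: buildRobotsTxt is a hypothetical function name, and the
// rule set below is an assumption based on the lines quoted in this thread.
function buildRobotsTxt(sitemap: string): string {
  const lines = [
    // One wildcard group replaces the individual per-bot groups.
    "User-agent: *",
    "Allow: /",
    "Content-Signal: ai-train=yes, search=yes, ai-input=yes",
    "",
    `Sitemap: ${sitemap}`,
  ];
  // End the file with a trailing newline.
  return lines.join("\n") + "\n";
}
```

If different bots need different rules later, per-bot groups can be added alongside the wildcard group without changing the rest of the file.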
DESCRIBE YOUR PR
This PR addresses some of the issues found here: https://www.mintlify.com/score/sentry
These updates to robots.txt add AI bot rules and content signals to help agents better navigate our docs.
More on content signals: https://contentsignals.org/
Some details of this change:
(previously it had no rule at all, which is technically ambiguous)
handler, which is fine since it's a build-time env var
LEGAL BOILERPLATE
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.