
feat: exclude bot traffic from egress quotas #381

Merged
juliangruber merged 9 commits into main from detect_known_bots on Oct 30, 2025

@bajtos
Contributor

bajtos commented Oct 23, 2025

Record `bot_name` in retrieval logs. When the request includes a valid `Authorization` header with an access token for a known bot, store the bot name in the retrieval logs table.

Modify the code updating egress quotas to ignore requests from bots.
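The two changes above can be sketched as follows. This is a minimal illustration, not the merged code: it assumes the bot name has already been resolved from the `Authorization` header (`undefined` for regular clients). The helper names `buildRetrievalLogRow` and `applyEgressToQuota` and the in-memory `Map` standing in for the quota storage are hypothetical; the column names are borrowed from the `retrieval_logs` query in the tests.

```js
// Hypothetical helpers illustrating the PR description; not the merged code.

/** Build a retrieval log row; non-bot requests store bot_name as null. */
function buildRetrievalLogRow({ dataSetId, responseStatus, egressBytes, botName }) {
  return {
    data_set_id: dataSetId,
    response_status: responseStatus,
    egress_bytes: egressBytes,
    bot_name: botName ?? null, // undefined (no known bot) becomes null
  }
}

/**
 * Add a row's egress to the per-data-set usage total, unless the row was
 * attributed to a bot. Returns the (possibly unchanged) usage.
 * The Map is an in-memory stand-in for the real quota storage.
 */
function applyEgressToQuota(usageByDataSet, row) {
  const current = usageByDataSet.get(row.data_set_id) ?? 0
  if (row.bot_name !== null) return current // bot traffic is not counted
  const updated = current + (row.egress_bytes ?? 0)
  usageByDataSet.set(row.data_set_id, updated)
  return updated
}

const usage = new Map()
applyEgressToQuota(usage, buildRetrievalLogRow({ dataSetId: '1', responseStatus: 200, egressBytes: 100 }))
applyEgressToQuota(usage, buildRetrievalLogRow({ dataSetId: '1', responseStatus: 200, egressBytes: 50, botName: 'bot1' }))
console.log(usage.get('1')) // 100
```

The bot's 50 bytes are still logged with `bot_name = 'bot1'`, so they remain visible for analytics; only the quota accounting skips them.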

Links:


Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
bajtos changed the title "feat: exclude bots from egress quotas" to "feat: exclude bot traffic from egress quotas" on Oct 23, 2025
```js
const DNS_ROOT = '.filbeam.io'
const TEST_WALLET = 'abc123'
const TEST_CID = 'baga123'
const BOT_TOKENS = 'bot1_secret'
```
Member

Instead of a custom string protocol, can we turn this into JSON, e.g. `const BOT_TOKENS = JSON.stringify({ bot1: secret })`? This way we don't need to trust the bot to provide a stable name: we own the name, and they own only the secret.
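A minimal sketch of how the suggested `{ botName: secret }` layout could be consumed. The helper `findBotByToken` is hypothetical; the bot name comes from the JSON key we control, never from anything the bot sends.

```js
// Sketch of the reviewer's suggestion: BOT_TOKENS holds a JSON object that
// maps a bot name we choose to the secret issued for that bot.
// findBotByToken is a hypothetical helper, not part of the PR.

/** Return the name of the bot whose secret equals `token`, or undefined. */
function findBotByToken(botTokensJson, token) {
  const tokensByBot = JSON.parse(botTokensJson) // { [botName]: secret }
  for (const [botName, secret] of Object.entries(tokensByBot)) {
    if (secret === token) return botName
  }
  return undefined
}

const BOT_TOKENS = JSON.stringify({ bot1: 'bot1_secret' })
console.log(findBotByToken(BOT_TOKENS, 'bot1_secret')) // bot1
console.log(findBotByToken(BOT_TOKENS, 'unknown')) // undefined
```

Lookup is a linear scan over the entries, which is negligible for a handful of known bots.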

Contributor

+1 to what @juliangruber has proposed

Contributor Author

FWIW, I want us to manage and own the secrets, so the mapping would be `{secret: botName}`.

The downside is more complexity in maintaining that object in the JSON string secret.

Since you both prefer the JSON approach, I will do what you asked for.
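For comparison, a sketch of the `{secret: botName}` layout mentioned above, where a token check becomes a single lookup. The helper `parseBotNamesBySecret` is hypothetical. Loading the object into a `Map` is a deliberate choice: a plain-object lookup such as `obj[token]` would return inherited properties for tokens like `"constructor"`.

```js
// Sketch of the alternative layout: BOT_TOKENS maps each secret to a bot name.
// parseBotNamesBySecret is a hypothetical helper, not part of the PR.

function parseBotNamesBySecret(botTokensJson) {
  // A Map only contains the JSON's own keys, so inherited names such as
  // "constructor" do not produce accidental matches.
  return new Map(Object.entries(JSON.parse(botTokensJson)))
}

const botNamesBySecret = parseBotNamesBySecret(
  JSON.stringify({ bot1_secret: 'bot1', bot2_secret: 'bot2' }),
)
console.log(botNamesBySecret.get('bot2_secret')) // bot2
console.log(botNamesBySecret.get('constructor')) // undefined
```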

Member

The mapping could be either `{secret: botName}` or `{botName: secret}`; that doesn't change whether we manage and own the secrets. And we don't expect a bot to have multiple keys.
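To illustrate that the two layouts are interchangeable when each bot has exactly one key, here is a sketch of a hypothetical `invertBotTokens` helper that turns `{botName: secret}` into a secret-to-name lookup and rejects configurations in which two bots share a secret.

```js
// Sketch: {botName: secret} and {secret: botName} carry the same information
// as long as each bot has one secret and no secret is shared.
// invertBotTokens is a hypothetical helper, not part of the PR.

function invertBotTokens(tokensByBot) {
  const botBySecret = new Map()
  for (const [botName, secret] of Object.entries(tokensByBot)) {
    if (botBySecret.has(secret)) {
      throw new Error(`Secret shared by ${botBySecret.get(secret)} and ${botName}`)
    }
    botBySecret.set(secret, botName)
  }
  return botBySecret
}

const botBySecret = invertBotTokens({ bot1: 'bot1_secret', bot2: 'bot2_secret' })
console.log(botBySecret.get('bot1_secret')) // bot1
```

The `{botName: secret}` layout holds at most one secret per bot by construction, while `{secret: botName}` could list several secrets for the same bot.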

```js
 * @returns {string | undefined} Bot name or the access token
 */
export function checkBotAuthorization(request, { BOT_TOKENS }) {
  const allowedTokens = BOT_TOKENS.split(',').map((t) => t.trim())
```
Contributor

I agree with @juliangruber here: using JSON instead of a custom string format would be nicer.
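A sketch of `checkBotAuthorization` rewritten for the JSON format discussed in this thread. It assumes the token is sent as `Authorization: Bearer <token>` and that `BOT_TOKENS` maps bot names to secrets; the PR text only says "an access token", so the header scheme is an assumption, and the merged implementation may differ. It runs on any runtime with the Fetch API `Request` global (Node.js 18+, Cloudflare Workers).

```js
// Sketch only; assumes the "Bearer <token>" scheme and a JSON BOT_TOKENS
// of the form {"botName": "secret"}. Not the merged implementation.

/**
 * @param {Request} request
 * @param {{ BOT_TOKENS: string }} env
 * @returns {string | undefined} Name of the authenticated bot, if any
 */
function checkBotAuthorization(request, { BOT_TOKENS }) {
  const header = request.headers.get('Authorization') // null if absent
  if (!header) return undefined

  // Anything other than "Bearer <token>" is treated as a non-bot request.
  const match = /^Bearer\s+(\S+)$/i.exec(header)
  if (!match) return undefined

  const tokensByBot = JSON.parse(BOT_TOKENS) // { [botName]: secret }
  const entry = Object.entries(tokensByBot).find(([, secret]) => secret === match[1])
  return entry?.[0] // the bot name, or undefined when no secret matches
}

const env = { BOT_TOKENS: JSON.stringify({ bot1: 'bot1_secret' }) }
const req = new Request('https://example.com/', {
  headers: { Authorization: 'Bearer bot1_secret' },
})
console.log(checkBotAuthorization(req, env)) // bot1
```

`JSON.parse` throws on a malformed `BOT_TOKENS`, which surfaces configuration mistakes instead of silently treating every request as non-bot.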

Contributor

pyropy left a comment

Looking good so far 👍🏻

@bajtos
Contributor Author

bajtos commented Oct 24, 2025

Hey, I won't be able to finish this work before my vacation. Feel free to take it over from me if you need this before I return.


```diff
 const result = await env.DB.prepare(
-  'SELECT * FROM retrieval_logs WHERE data_set_id = ? AND response_status = 404 and CACHE_MISS IS NULL and egress_bytes IS NULL',
+  'SELECT * FROM retrieval_logs WHERE data_set_id = ?',
```
Member

juliangruber marked this pull request as ready for review on October 27, 2025 00:11
juliangruber requested a review from pyropy on October 27, 2025 00:11
juliangruber merged commit a649831 into main on Oct 30, 2025
8 checks passed
juliangruber deleted the detect_known_bots branch on October 30, 2025 10:50

3 participants