feat: exclude bot traffic from egress quotas #381
Record `bot_name` in retrieval logs.

When the request includes a valid Authorization header with an access token for a known bot, store the bot name in the retrieval logs table. Modify the code updating egress quotas to ignore requests from bots.

Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
piece-retriever/test/request.test.js
const DNS_ROOT = '.filbeam.io'
const TEST_WALLET = 'abc123'
const TEST_CID = 'baga123'
const BOT_TOKENS = 'bot1_secret'
Instead of a custom string protocol, can we turn this into JSON: `const BOT_TOKENS = JSON.stringify({ bot1: secret })`? This way we don't need to trust the bot to provide a stable name: we own the name, they own only the secret.
FWIW, I want us to manage and own the secrets. So the mapping would be {secret: botName}.
The downside is more complexity in maintaining that object in the JSON string secret.
Since you both prefer the JSON approach, I will do what you asked for.
The mapping could be either `{secret: botName}` or `{botName: secret}`; that doesn't change whether we manage and own the secrets. And we don't expect a bot to have multiple keys.
piece-retriever/lib/request.js
 * @returns {string | undefined} Bot name or the access token
 */
export function checkBotAuthorization(request, { BOT_TOKENS }) {
  const allowedTokens = BOT_TOKENS.split(',').map((t) => t.trim())
I agree with @juliangruber here: using JSON instead of a custom string format would be nicer.
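For illustration, the JSON-based lookup discussed in this thread might look roughly like the sketch below. It is not the code that was merged. It assumes the `{botName: secret}` orientation (the thread leaves the orientation open) and that bots send the token as `Authorization: Bearer <secret>`; the header scheme is an assumption, not something this PR states.

```javascript
// Sketch only. Assumptions (not confirmed by the PR):
// - BOT_TOKENS is a JSON string mapping bot name -> secret
// - bots authenticate with `Authorization: Bearer <secret>`
function checkBotAuthorization(request, { BOT_TOKENS }) {
  const header = request.headers.get('authorization')
  if (!header) return undefined

  // Split "Bearer <token>" into its scheme and token parts.
  const [scheme, token] = header.trim().split(/\s+/)
  if (!token || scheme.toLowerCase() !== 'bearer') return undefined

  // We own the bot names (keys); the bot only presents the secret (value).
  const tokensByBot = JSON.parse(BOT_TOKENS)
  for (const [botName, secret] of Object.entries(tokensByBot)) {
    if (secret === token) return botName
  }
  return undefined
}
```

With this shape, the test constant would become `JSON.stringify({ bot1: 'bot1_secret' })`, and the function returns the name we assigned rather than anything supplied by the bot.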
Hey, I won't be able to finish this work before my vacation. Feel free to take it over from me if you need this before I return.
  const result = await env.DB.prepare(
-   'SELECT * FROM retrieval_logs WHERE data_set_id = ? AND response_status = 404 and CACHE_MISS IS NULL and egress_bytes IS NULL',
+   'SELECT * FROM retrieval_logs WHERE data_set_id = ?',
Links: