
feat: add receipts to the EgressTrackingService on retrieval#258

Merged
volmedo merged 7 commits into vic/feat/egress-tracking-service from vic/feat/retrieval-egress-tracking
Oct 1, 2025

Conversation

Contributor

@volmedo volmedo commented Sep 25, 2025

Ref: #174

Depends on: #245 and #246

Attach a ReceiptLogger to the space/content/retrieve handler to collect retrieval receipts and add them to the EgressTrackingService to be stored, batched and sent in space/egress/track invocations.

@volmedo volmedo requested review from alanshaw and frrist September 25, 2025 11:29
@volmedo volmedo self-assigned this Sep 25, 2025
Member

@frrist frrist left a comment


I'd like to see us move the record-keeping of receipts closer to where they are produced. WDYT?

```go
}

func withReceiptLogger(ets *egresstracking.EgressTrackingService) ucanretrieval.Option {
	return ucanretrieval.WithReceiptLogger(func(ctx context.Context, rcpt receipt.AnyReceipt, inv invocation.Invocation) {
```
Member


While this option is convenient I am unsure this is the right usage for it. Would we be open to performing this record keeping in the ucan handler instead? i.e. https://github.com/storacha/piri/blob/main/pkg/service/retrieval/ucan/space_content_retrieve.go#L32

This would provide us more control of which receipts get added to the batch. Based on my read of things right now, we'll be appending all receipts produced for a retrieval, which includes failures?

Contributor Author


I agree it is a bit weird this logic is so far from the actual handler. However, the issue is that receipts are not issued in the handlers, but in the ucanto server, i.e. handlers don't have access to them.

I can assure you this is exactly the right usage for WithReceiptLogger because I implemented it specifically for this 😄. I agree that appending failure receipts might not be great. We can probably pass more information to the receipt logging function from the ucanto server so that it can make better decisions. I'll revisit the current implementation.


Contributor Author


oh, that is the result, which is part of the receipt. But the receipt contains some other stuff required for the UCAN flow (see https://github.com/storacha/go-ucanto/blob/main/core/receipt/datamodel/receipt.ipldsch).

The handler handles the request and produces a result, and the ucanto server takes care of issuing a proper receipt: adding the other bits, doing the signing, etc. If we want to get the receipt as it is sent to the client, we need to hijack the server process to fetch it, and that is what I did here by letting the user pass a callback that the server invokes right before sending the receipt back to the client.

Comment on lines +76 to +80
```go
		if err := ets.AddReceipt(context.Background(), retrievalRcpt); err != nil {
			log.Errorw("adding receipt to egress tracking service", "error", err)
			return
		}
	}()
```
Member


I liked this better when we were passing the context from the caller here 😄 can we add it back?

Contributor Author


I know, right? The issue is that the request context is cancelled once the request is fully handled. Since this is done asynchronously near the end of the request flow, if adding the receipt causes a new egress track task to be added to the job queue, the context it gets is likely to be cancelled already when the worker gets to it. During my testing I got "context cancelled" errors from the tasks in the queue.

All in all, I think it makes sense for this not to be tied to the request context, as it is a somewhat independent action. Also, since the task will be retried a number of times and then dropped, I didn't see much value in setting a timeout. WDYT?

Member


My concern is that spawning a goroutine per request with context.Background() effectively turns the Go scheduler into an unbounded queue. Under load this can create memory pressure, especially since there’s no back-pressure on the caller and AddReceipt does disk I/O.

Because net/http already runs each request in its own goroutine, I think it’s reasonable to block on recording the receipt. If we serve the data without recording it, the operator doesn’t get compensated, so it may be better not to serve it at all. If AddReceipt ever proves too heavy, the safer alternative would be a bounded worker pool.

Not blocking, just leaving this note for future reference.

Contributor Author


I agree with your concerns. A bounded pool or any other rate limiting mechanism would be the way to ensure there is some back-pressure on clients, but happy to remove the goroutine in the meantime.

About making the request fail if adding the receipt fails, there is a problem with the current implementation. AddReceipt enqueues a task in a jobqueue if there is a new batch to send, so this is happening asynchronously. If we think the client should not get its content delivered if recording the receipt fails (which is a fair point), then we shouldn't use the jobqueue and just implement a synchronous version of the task.

I created #263 to help keep this in mind.

Member


For now let's remove the goroutine.

As far as receipt recording failures go:

  • If the node is unable to Append a receipt to the store (due to lack of space, failure to rotate, etc.), that's when I'd want to abort serving the request, since the node would be unable to record the content it served for payment.
  • Failing to report a batch to the egress service is also not ideal, but I view it as a less critical error, since a record of the content served will still exist on the filesystem and manual intervention could be taken to "re-report" it for payment.


@frrist frrist added this to the v0.0.15 milestone Sep 30, 2025
@volmedo
Contributor Author

volmedo commented Oct 1, 2025

merging this into #246 before everything gets merged into #245. Happy to continue the discussion there.

@volmedo volmedo merged commit d9298af into vic/feat/egress-tracking-service Oct 1, 2025
9 of 12 checks passed
@volmedo volmedo deleted the vic/feat/retrieval-egress-tracking branch October 1, 2025 14:42
volmedo added a commit that referenced this pull request Oct 3, 2025
Ref: #174 

Implement an `EgressBatchStore` that allows appending
`space/content/retrieve` receipts to a batch. When the batch is above
the max, it is flushed automatically.

Things that are missing and will be implemented in follow up PRs:
- sending the batch to the egress tracking service in a
`space/egress/track` invocation
- exposing an `http.FileSystem` and adding the corresponding
`http.FileServer` endpoint

Once these are done, I'll hook it up in Alan's `space/content/retrieve`
handler.

---------

Co-authored-by: ash <alan@storacha.network>
