docs/design: Storage Node Graceful Exit #2734
Conversation
field node_id blob
field path blob
field peice_info blob
Nitpick: `peice` should be `piece`. Also, instead of `dt`, spell out `date`.
Changed `_dt` to `_at` to be consistent.
rpc Initiate(stream InitiateRequest) returns (stream InitiateResponse) {}
}

message InitiateRequest {
This is not a comment, but a thought... when I first read `InitiateRequest` I didn't think it was referring to graceful exit. What do you think of naming it `GracefulExitInitiateRequest`?
Updated to `GracefulExit` with `rpc InitiateExit(InitiateExitRequest) returns (InitiateExitResponse)`.
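For illustration only, here is a minimal Go sketch of what the satellite-side handler for the renamed unary RPC could look like. The request/response types mirror the updated proto above, but the `DB` interface and `SetExitInitiated` method are placeholder assumptions, not the actual satellite API.

```go
// Sketch only: a possible satellite-side handler for the renamed unary RPC.
package gracefulexit

import (
	"context"
	"time"
)

// InitiateExitRequest / InitiateExitResponse stand in for the generated
// protobuf types of the updated unary InitiateExit RPC.
type InitiateExitRequest struct{}
type InitiateExitResponse struct{ ExitInitiatedAt time.Time }

// DB is a placeholder for whatever satellite interface ends up recording
// nodes.exit_initiated (an assumption for illustration only).
type DB interface {
	SetExitInitiated(ctx context.Context, nodeID string, at time.Time) error
}

// Endpoint is a sketch of the satellite-side GracefulExit endpoint.
type Endpoint struct {
	db DB
}

// InitiateExit records the exit start time for the calling node and echoes it
// back. In the real service the node ID would come from the peer identity.
func (e *Endpoint) InitiateExit(ctx context.Context, nodeID string, _ *InitiateExitRequest) (*InitiateExitResponse, error) {
	now := time.Now().UTC()
	if err := e.db.SetExitInitiated(ctx, nodeID, now); err != nil {
		return nil, err
	}
	return &InitiateExitResponse{ExitInitiatedAt: now}, nil
}
```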
```
field completed_dt timestamp ( updateable )
)
```
- Add `PieceAction` field to `cache.FindStorageNodesRequest`. Update `cache.FindStorageNodesWithPreferences` to ignore exiting nodes for uploads and repairs.
Instead of a piece action, can you query for nodes that have `exit_initiated_dt` null? That way there is no need to add a new piece action.
The new `PieceAction` field indeed seems unnecessary. The caller can take advantage of the existing `ExcludedNodes` field to list the current nodes from the pointer.
Agreed. Not sure where I was going with this one. Document updated.
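To make the suggestion concrete, a rough Go sketch of how the caller could rely on `ExcludedNodes` instead of a new `PieceAction`; the request and piece types here are simplified stand-ins for the real overlay/metainfo types, not the actual API.

```go
// Sketch only: caller-side selection that excludes the segment's current nodes.
package exitselection

import "context"

// RemotePiece and FindStorageNodesRequest are simplified stand-ins for the
// real metainfo/overlay types; only the fields relevant here are shown.
type RemotePiece struct{ NodeID string }

type FindStorageNodesRequest struct {
	RequestedCount int
	ExcludedNodes  []string
}

type Overlay interface {
	FindStorageNodes(ctx context.Context, req FindStorageNodesRequest) ([]string, error)
}

// findNewHomes asks the overlay for replacement nodes while excluding every
// node that already holds a piece of the segment, so no new PieceAction is needed.
func findNewHomes(ctx context.Context, overlay Overlay, pieces []RemotePiece, count int) ([]string, error) {
	excluded := make([]string, 0, len(pieces))
	for _, p := range pieces {
		excluded = append(excluded, p.NodeID)
	}
	return overlay.FindStorageNodes(ctx, FindStorageNodesRequest{
		RequestedCount: count,
		ExcludedNodes:  excluded,
	})
}
```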
- Pushes the pieces to the storage node identified in the order using `ecclient`
- Sends the signed new node response to the satellite via a new `CommitPiece` method (uses `metainfo.UpdatePieces`).
- Updates `bytes_deleted` with the number of bytes deleted and sets `completed_dt` to current time
- Execution intervals and batch sizes should be configurable

## Open Questions
"What happens if a piece is deleted when the piece is in the process of being transferred from the exiting node to other node in the network?"
I believe if I understood this correctly, the exiting storage node shall only honor download request and not delete requests or new upload request, Infact exiting SN should internal be made to accept only the download and reject accepting new uploads and/or delete requests
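If that interpretation holds, the storage node could gate incoming requests with something like the sketch below; the `Action` values and the `exiting` flag are illustrative assumptions, not the actual piecestore API.

```go
// Sketch only: request gating on an exiting storage node.
package piecestore

import "errors"

// Action enumerates the request types relevant to this thread.
type Action int

const (
	ActionGet    Action = iota // downloads and audits
	ActionPut                  // new uploads
	ActionDelete               // deletes
)

// ErrExiting is returned for requests an exiting node no longer accepts.
var ErrExiting = errors.New("node is gracefully exiting: only downloads are accepted")

// allowRequest keeps serving downloads (and audits) while the graceful exit
// is in progress, but rejects new uploads and deletes.
func allowRequest(exiting bool, action Action) error {
	if exiting && action != ActionGet {
		return ErrExiting
	}
	return nil
}
```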
- When a Storage Node is exiting the network gracefully, I want the satellite to have the ability to track how much egress they used for exiting so that we do not pay them for that bandwidth.
- When a Storage Node is in the process of gracefully exiting the network, it shall continue to participate in the requested audits and uptime checks.
- When a Storage Node is in the process of gracefully exiting the network, it shall continue to honor download requests.
- The Storage Node shall keep a separate and detailed metric of network bandwidth used to serve data to clients vs. bandwidth used for graceful exit. The Storage Node shall NOT get paid for bandwidth used for graceful exit but shall get paid for the bandwidth used to serve data to clients (downloads, audits, uptime checks, etc.).
I don't think we are paying for audits, uptime checks etc...
For sure, we are not paying for uptime checks. We should double-check if we are paying for audits.
Updated to downloads and audit egress.
> For sure, we are not paying for uptime checks. We should double-check if we are paying for audits.

We are paying for audit traffic.
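To make the accounting split concrete, here is a small Go sketch of how paid vs. unpaid egress could be tallied; the bucket names and the idea of a dedicated graceful-exit counter are assumptions for illustration, not settled design.

```go
// Sketch only: keeping paid and unpaid egress in separate buckets.
package bandwidth

// Usage buckets the node's egress by purpose so paid traffic (downloads and
// audit egress) can be reported separately from the unpaid graceful-exit
// transfers discussed in this thread.
type Usage struct {
	GetBytes          int64 // client downloads: paid
	AuditBytes        int64 // audit egress: paid (per the thread above)
	GracefulExitBytes int64 // piece transfers during exit: not paid
}

// PaidEgress returns the bytes the node should be compensated for.
func (u Usage) PaidEgress() int64 { return u.GetBytes + u.AuditBytes }

// UnpaidEgress returns the bandwidth consumed by the graceful exit itself.
func (u Usage) UnpaidEgress() int64 { return u.GracefulExitBytes }
```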
- Initiates the exit by setting `nodes.exit_initiated` to current time
- ```
  service GracefulExit {
    rpc Initiate(stream InitiateRequest) returns (stream InitiateResponse) {}
  ```
Why `stream`? Do we plan to keep it open during the whole exit process?
Oversight. Fixed
There are still open questions and TBD notes, and the document is not simple enough.
- Updates `bytes_deleted` with the number of bytes deleted
- Deletes the pieces that were successfully moved
- On failure (e.g. the satellite is unavailable), successful order limits stored in `exit_orders` should be reprocessed on the next iteration
- Execution intervals and batch sizes should be configurable

## Open Questions

- What happens if we are doing a repair and a graceful exit on the same segment?
There are still a lot of critical open questions.
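As an illustration of the retry behaviour in the quoted steps above (reprocessing successful order limits from `exit_orders` when the satellite is unavailable), here is a rough Go loop; the `ExitOrders` and `Satellite` interfaces are placeholders, not the real APIs.

```go
// Sketch only: settling stored exit order limits with retry on the next interval.
package exitworker

import (
	"context"
	"log"
	"time"
)

// OrderLimit stands in for a signed, successfully fulfilled order limit
// persisted in the exit_orders table.
type OrderLimit struct{ SerialNumber string }

type ExitOrders interface {
	ListUnsettled(ctx context.Context, limit int) ([]OrderLimit, error)
	MarkSettled(ctx context.Context, limits []OrderLimit) error
}

type Satellite interface {
	SettleExitOrders(ctx context.Context, limits []OrderLimit) error
}

// runOnce processes one batch; limits that fail to settle remain in
// exit_orders and are picked up again on the next iteration, matching the
// "reprocessed on the next iteration" behaviour in the design text.
func runOnce(ctx context.Context, db ExitOrders, sat Satellite, batchSize int) {
	limits, err := db.ListUnsettled(ctx, batchSize)
	if err != nil || len(limits) == 0 {
		return
	}
	if err := sat.SettleExitOrders(ctx, limits); err != nil {
		log.Printf("satellite unavailable, retrying next iteration: %v", err)
		return
	}
	if err := db.MarkSettled(ctx, limits); err != nil {
		log.Printf("failed to mark order limits settled: %v", err)
	}
}

// Run executes runOnce on a configurable interval; interval and batch size
// are configurable per the design text.
func Run(ctx context.Context, db ExitOrders, sat Satellite, interval time.Duration, batchSize int) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			runOnce(ctx, db, sat, batchSize)
		}
	}
}
```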
field completed_exit_signature blob
)

model exit_orders {
Why do we need this table?
…r validation sequence diagram
Reworked the document into multiple documents. Closing in favor of a new pull request.
What: When a Storage Node wants to leave the network but does not want to lose their escrow, we need to have a mechanism for them to exit the network “gracefully”.
Why: Give Storage Nodes a mechanism to leave the network while receiving their escrows, and reduce repair caused by node churn on satellites.