-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: lotus-provider: SDR Sealing pipeline #11534
Conversation
5c2bb1a
to
a4435d2
Compare
6cd75fe
to
aa5d1ce
Compare
e1fc241
to
b471e16
Compare
max int | ||
} | ||
|
||
func NewPoRepTask(db *harmonydb.DB, api PoRepAPI, sp *SealPoller, sc *lpffi.SealCalls, maxPoRep int) *PoRepTask { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please comment what I need one of these for.
} | ||
|
||
func (s *SubmitPrecommitTask) CanAccept(ids []harmonytask.TaskID, engine *harmonytask.TaskEngine) (*harmonytask.TaskID, error) { | ||
id := ids[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These CanAccept()s are not picky enough to "enable everywhere and see what happens".
Can you add a bug+TODO on them?
It would be nice to default-enable most jobs in the future and have the system balance itself with CanAccept rather than hand-tuning.
We should be thinking about where on disk the various dependencies are if we want less data movement in a world with multiple sealing machines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So only tasks which care about data locality are:
- SDR, which actually doesn't care about where it is started, because it has no large input data
- SDRTrees, which needs all the data from SDR, so that one ideally would be smart about where it runs; At minimum we should make it try to run on the same node that SDR ran on in the first place if that is possible, but even that imo shouldn't be a blocker for this PR landing
- Finalize, which really needs to run on the same machine that SDRTrees ran on (it already does that)
- MoveStorage, which needs to run on a node with long-term storage, ideally "close" to the node that SDRTrees ran on
Tasks like SubmitPrecommit and others don't even touch the data files, so they don't really need to run on any specific node.
That said it would be really nice to have some standard way in CanAccept to express some priority with which the machine "wants" to run a task, ideally without any waiting being involved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
In discussing an order of work assigned, the order of the tasks put into harmonytask.New() is the order the poller assigns the work. Therefore having very-cheap tasks first is good, but having heavier tasks in a last-to-first order ensures that work later in the pipeline gets picked-up before work earlier in the pipeline.
max int | ||
} | ||
|
||
func NewTreesTask(sp *SealPoller, db *harmonydb.DB, sc *lpffi.SealCalls, maxTrees int) *TreesTask { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please comment what it does. Could these could link to a diagram?
9e49e34
to
e3d0b21
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few little things
|
||
var configNewCmd = &cli.Command{ | ||
Name: "new-cluster", | ||
Usage: "Create new coniguration for a new cluster", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When do I use this? Where do I get SP actor addresses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now the command to do that is lotus-shed miner create
, I do want to create a separate tool for managing the miner actor, so then it will become sptool create
The commands that will live in sptool
(vs curio
/ lotus-provider
) would be the ones which interact with the miner actor, so things like create
, managing balances, owner/control addresses, and other on-chain actions could be grouped in that binary.
(I just don't want to do all this in this PR, it's way too big already)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets refer to an open bug here with these goals in mind, that way it has a continue story for watchers.
} | ||
} | ||
|
||
if cctx.Bool("overwrite") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is a new cluster, how about we encourage "base"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong opinion on this one, want to handle this one given that you're also handling the other onboarding paths? (lets us not block this PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm handling this elsewhere. Just add a few comments on what you're expectations are so I can do the right thing when I integrate this code with the other migration.
|
||
var actorSetPeeridCmd = &cli.Command{ | ||
Name: "set-peer-id", | ||
Usage: "set the peer id of your miner", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the peer ID used for? Should we mention that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is market node libp2p peer ID, which is where you send deal proposals and generally talk with the miner node.
Belongs in sptool
is the future, when we build that tool (also imo this should just be automagically called by boost / market node on setup, but it's not..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This is just a basic lotus-shed tool, those are really only meant for developer use, this one I just needed for devnet boost testing)
|
||
// If 'active' is nil, initiate the HTTP request | ||
if u.active == nil { | ||
resp, err := http.Get(u.Url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we have some kind of recovery here for over-the-Internet flakiness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
range request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can swap this for a more robust http requester later. I have one in RIBS, and we can borrow at least some of it.
That said currently the lotus-shed boost proxy command won't support interrupted streams (that is a problem and it definitely needs to be fixed, but may require boost modifications), and currently there isn't a good way to start deals with data fetched straight from the internet.
I wouldn't block this PR on this for now, but it should be fixed together with improved boost<->curio adapter code.
id) | ||
if err != nil { | ||
//return xerrors.Errorf("updating storage health in DB fails with err: %v", err) | ||
if harmonydb.IsErrSerialization(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this replaced by the serialization option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I wrote this code before adding the option; Will change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is on db.Exec, not transaction; Should be done, but can be a follow up PR
Related Issues
Proposed Changes
Additional Info
This will be a longer-running branch for the SDR sealing pipeline in lotus-provider
TODO:
Checklist
Before you mark the PR ready for review, please make sure that:
<PR type>: <area>: <change being made>
fix: mempool: Introduce a cache for valid signatures
PR type
: fix, feat, build, chore, ci, docs, perf, refactor, revert, style, testarea
, e.g. api, chain, state, market, mempool, multisig, networking, paych, proving, sealing, wallet, deps