New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
standalone binary for chain export and inspection #3594
Conversation
e7a7047
to
fc46032
Compare
fc46032
to
5f00505
Compare
This appears to duplicate a lot of code that already exists for traversing and exporting. The intention of "standalone" is not that it doesn't depend on existing code, just that it can run without requiring a daemon. For #3555 the head/s must be read from the chain store. The CLI argument should be optional, defaulting to whatever the chain store has stored as the head (and later when 3555 is addressed, all of the heads). |
Not sure what happened with CI here, going to ignore it until I moved this from draft to "ready".
|
5f00505
to
6b51956
Compare
func newChainStateCollector(dserv format.DAGService) *chainStateCollector { | ||
return &chainStateCollector{ | ||
dagserv: dserv, | ||
} | ||
} | ||
|
||
type chainStateCollector struct { | ||
dagserv format.DAGService // Provides access to state tree. | ||
state []format.Node | ||
} | ||
|
||
// collectState recursively walks the state tree starting with `stateRoot` and returns it as a slice of IPLD nodes. | ||
// Calling this method does not have any side effects. | ||
func (csc *chainStateCollector) collectState(ctx context.Context, stateRoot cid.Cid) ([]format.Node, error) { | ||
dagNd, err := csc.dagserv.Get(ctx, stateRoot) | ||
if err != nil { | ||
return nil, errors.Wrapf(err, "failed to load stateroot from dagservice %s", stateRoot) | ||
} | ||
csc.addState(dagNd) | ||
seen := cid.NewSet() | ||
for _, l := range dagNd.Links() { | ||
if err := dag.Walk(ctx, csc.getLinks, l.Cid, seen.Visit); err != nil { | ||
return nil, errors.Wrapf(err, "dag service failed walking stateroot %s", stateRoot) | ||
} | ||
} | ||
return csc.state, nil | ||
|
||
} | ||
|
||
func (csc *chainStateCollector) getLinks(ctx context.Context, c cid.Cid) ([]*format.Link, error) { | ||
nd, err := csc.dagserv.Get(ctx, c) | ||
if err != nil { | ||
return nil, errors.Wrapf(err, "failed to load link from dagservice %s", c) | ||
} | ||
csc.addState(nd) | ||
return nd.Links(), nil | ||
|
||
} | ||
|
||
func (csc *chainStateCollector) addState(nd format.Node) { | ||
csc.state = append(csc.state, nd) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy-Pasta from here
https://github.com/filecoin-project/go-filecoin/blob/4f71dca6ec0a18e3b19445c39ef06033e51a859e/internal/app/go-filecoin/plumbing/cst/chain_state.go#L233-L277
I'm not happy about this code duplication, but it seems better than exporting newChainStateCollector
from chain_state.go, thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate on why importing from chain_state.go is worse? Without your context importing seems like the better choice. You could put this type in the chain package or in a new ipld
/dag
or the util package if you don't want to export free functions from plumbing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly because I didn't like the idea of exporting a free function from plumbing. I will move this into dag plumbing exposed as a RecursiveGet
method.
Codecov Report
@@ Coverage Diff @@
## master #3594 +/- ##
======================================
- Coverage 47% 47% -1%
======================================
Files 270 270
Lines 17259 17281 +22
======================================
Hits 8151 8151
- Misses 8001 8023 +22
Partials 1107 1107 |
6b51956
to
57f83ea
Compare
go.mod
Outdated
@@ -62,7 +60,6 @@ require ( | |||
github.com/libp2p/go-libp2p-circuit v0.1.3 | |||
github.com/libp2p/go-libp2p-core v0.2.4-0.20191022172111-a5bf2487c11d | |||
github.com/libp2p/go-libp2p-kad-dht v0.1.1 | |||
github.com/libp2p/go-libp2p-peer v0.2.0 | |||
github.com/libp2p/go-libp2p-peerstore v0.1.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
red in this file is from running go mod tidy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. I want this tool to do more to prevent accidental user datastore corruption.
log.Infof("opening filecoin datastore: %s", repoPath) | ||
badgerPath := filepath.Join(repoPath, "badger/") | ||
log.Infof("opening badger datastore: %s", badgerPath) | ||
badgerDS, err := badgerds.NewDatastore(badgerPath, badgerOpt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are not using the go-filecoin/repo abstraction there doesn't seem to be any protection against multiple processes obtaining a handle to a datastore wrapping the same path.
I'm a bit worried that this could lead to accidental corruption if someone runs this tool while a daemon is running.
Lowest effort solution is to include an all CAPs user facing warning somewhere.
I think "the right thing" here is to open up a go-filecoin repo abstraction and then grab the datastores with Datastore
and ChainDatastore
. Even if we're lucky and concurrent read-read and read-write works for badger right now it adds a hidden requirement to our datastore choice that could erode with time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are not using the go-filecoin/repo abstraction there doesn't seem to be any protection against multiple processes obtaining a handle to a datastore wrapping the same path.
Since we are using badger to open the repo it will fail to open if a repo lock exists, from the linked README.md:
Please note that Badger obtains a lock on the directories so multiple processes cannot open the same database at the same time.
Here is what a user will see if they attempt to run this binary on a repo that currently has a lock held by another (go-filecoin) process.
./chain-util export --repo=/home/frrist/.filecoin/ --out=/tmp/test.car
2019-11-11T11:25:10.586-0800 INFO chain-util/export export/export.go:44 opening repo: /home/frrist/.filecoin/
2019-11-11T11:25:10.586-0800 WARN chain-util chain-util/main.go:75 Cannot acquire directory lock on "/home/frrist/.filecoin/badger". Another process is using this Badger database.: resource temporarily unavailable
Also note that we are opening the repo in read-only mode:
https://github.com/filecoin-project/go-filecoin/pull/3594/files#diff-2bbea98c52d3e94072ef6b7ba1a9250dR27-R29
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's good! It moves this from a correctness to a bitrot thing. It still doesn't address this part of the comment:
Even if we're lucky and concurrent read-read and read-write works for badger right now it adds a hidden requirement to our datastore choice that could erode with time.
Can we assume that all ipfs datastore implementations MUST repo lock? If we can, great. If not the moment we pivot to another datastore or we open up user configured datastores we/users are going to need to satisfy a hidden requirement for interop with this tool which means there's some odds somebody won't satisfy this requirement and they'll be left wondering why the tool is corrupting their datastore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grabbing the filecoin repo lock enforces safety across all future datastore implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I might have misunderstood the initial comment, let me try again.
Since we are not using the go-filecoin/repo abstraction
Meaning since we are not calling the go-filecoin method OpenFSRepo, correct?
doesn't seem to be any protection against multiple processes obtaining a handle to a datastore wrapping the same path
There is protection against multiple processes getting a handle to a datastore (as I have shown this is provided by the badger datastore package), however since go-filecoin creates is own repo lock and since we are not checking for it to exists here, we are left open to possible corruption in the event that we change our datastore/repo implementation. Do I understand this correctly?
Is the ask that a check for filecoin repo lock is added? (as is done here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is all exactly what I'm thinking. If we use openfsrepo we get the guarantee of go-filecoin repo locking preventing contention with other software that opens the repo (go-filecoin, the migration tool...) regardless of what datastore we use.
Originally I wasn't sure if badger had its own protection which is why I blocked approval and why I approved when you told me that it does.
func newChainStateCollector(dserv format.DAGService) *chainStateCollector { | ||
return &chainStateCollector{ | ||
dagserv: dserv, | ||
} | ||
} | ||
|
||
type chainStateCollector struct { | ||
dagserv format.DAGService // Provides access to state tree. | ||
state []format.Node | ||
} | ||
|
||
// collectState recursively walks the state tree starting with `stateRoot` and returns it as a slice of IPLD nodes. | ||
// Calling this method does not have any side effects. | ||
func (csc *chainStateCollector) collectState(ctx context.Context, stateRoot cid.Cid) ([]format.Node, error) { | ||
dagNd, err := csc.dagserv.Get(ctx, stateRoot) | ||
if err != nil { | ||
return nil, errors.Wrapf(err, "failed to load stateroot from dagservice %s", stateRoot) | ||
} | ||
csc.addState(dagNd) | ||
seen := cid.NewSet() | ||
for _, l := range dagNd.Links() { | ||
if err := dag.Walk(ctx, csc.getLinks, l.Cid, seen.Visit); err != nil { | ||
return nil, errors.Wrapf(err, "dag service failed walking stateroot %s", stateRoot) | ||
} | ||
} | ||
return csc.state, nil | ||
|
||
} | ||
|
||
func (csc *chainStateCollector) getLinks(ctx context.Context, c cid.Cid) ([]*format.Link, error) { | ||
nd, err := csc.dagserv.Get(ctx, c) | ||
if err != nil { | ||
return nil, errors.Wrapf(err, "failed to load link from dagservice %s", c) | ||
} | ||
csc.addState(nd) | ||
return nd.Links(), nil | ||
|
||
} | ||
|
||
func (csc *chainStateCollector) addState(nd format.Node) { | ||
csc.state = append(csc.state, nd) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate on why importing from chain_state.go is worse? Without your context importing seems like the better choice. You could put this type in the chain package or in a new ipld
/dag
or the util package if you don't want to export free functions from plumbing.
return chain.Export(ctx, ce.Head, ce, msgStore, ce, ce.out) | ||
} | ||
|
||
// GetTipSet gets all the TipSet for a given TipSetKey from the ChainExporter blockstore. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: gets the TipSet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving approved but strong preference for grabbing filecoin repo lock for reasons provided.
to be called by chain_state plumbing
20bd263
to
bdaec20
Compare
Motivation
Create a standalone binary for exporting a blockchain from a go-filecoin repo to a car file.