Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

standalone binary for chain export and inspection #3594

Merged
merged 15 commits into from Nov 14, 2019
Merged

Conversation

frrist
Copy link
Member

@frrist frrist commented Oct 25, 2019

Motivation

Create a standalone binary for exporting a blockchain from a go-filecoin repo to a car file.

@anorth
Copy link
Member

anorth commented Oct 27, 2019

This appears to duplicate a lot of code that already exists for traversing and exporting. The intention of "standalone" is not that it doesn't depend on existing code, just that it can run without requiring a daemon.

For #3555 the head/s must be read from the chain store. The CLI argument should be optional, defaulting to whatever the chain store has stored as the head (and later when 3555 is addressed, all of the heads).

@frrist
Copy link
Member Author

frrist commented Oct 28, 2019

Not sure what happened with CI here, going to ignore it until I moved this from draft to "ready".

# pkg-config --cflags  -- /go/src/github.com/filecoin-project/go-filecoin/go-bls-sigs/libbls_signatures.pc
Failed to open '/go/src/github.com/filecoin-project/go-filecoin/go-bls-sigs/libbls_signatures.pc': No such file or directory
No package '/go/src/github.com/filecoin-project/go-filecoin/go-bls-sigs/libbls_signatures.pc' found
pkg-config: exit status 1
# pkg-config --cflags  -- /go/src/github.com/filecoin-project/go-filecoin/go-sectorbuilder/sector_builder_ffi.pc /go/src/github.com/filecoin-project/go-filecoin/go-sectorbuilder/sector_builder_ffi.pc
Failed to open '/go/src/github.com/filecoin-project/go-filecoin/go-sectorbuilder/sector_builder_ffi.pc': No such file or directory
No package '/go/src/github.com/filecoin-project/go-filecoin/go-sectorbuilder/sector_builder_ffi.pc' found
Failed to open '/go/src/github.com/filecoin-project/go-filecoin/go-sectorbuilder/sector_builder_ffi.pc': No such file or directory
No package '/go/src/github.com/filecoin-project/go-filecoin/go-sectorbuilder/sector_builder_ffi.pc' found
pkg-config: exit status 1
FAIL	github.com/filecoin-project/go-filecoin/actor [build failed]
FAIL	github.com/filecoin-project/go-filecoin/actor/builtin/account [build failed]
FAIL	github.com/filecoin-project/go-filecoin/actor/builtin/miner [build failed]
FAIL	github.com/filecoin-project/go-filecoin/actor/builtin/storagemarket [build failed]
FAIL	github.com/filecoin-project/go-filecoin/block [build failed]

@frrist frrist changed the title [wip] standalone binary for chain export and inspection standalone binary for chain export and inspection Oct 31, 2019
@frrist frrist marked this pull request as ready for review October 31, 2019 01:40
Comment on lines 103 to 102
func newChainStateCollector(dserv format.DAGService) *chainStateCollector {
return &chainStateCollector{
dagserv: dserv,
}
}

type chainStateCollector struct {
dagserv format.DAGService // Provides access to state tree.
state []format.Node
}

// collectState recursively walks the state tree starting with `stateRoot` and returns it as a slice of IPLD nodes.
// Calling this method does not have any side effects.
func (csc *chainStateCollector) collectState(ctx context.Context, stateRoot cid.Cid) ([]format.Node, error) {
dagNd, err := csc.dagserv.Get(ctx, stateRoot)
if err != nil {
return nil, errors.Wrapf(err, "failed to load stateroot from dagservice %s", stateRoot)
}
csc.addState(dagNd)
seen := cid.NewSet()
for _, l := range dagNd.Links() {
if err := dag.Walk(ctx, csc.getLinks, l.Cid, seen.Visit); err != nil {
return nil, errors.Wrapf(err, "dag service failed walking stateroot %s", stateRoot)
}
}
return csc.state, nil

}

func (csc *chainStateCollector) getLinks(ctx context.Context, c cid.Cid) ([]*format.Link, error) {
nd, err := csc.dagserv.Get(ctx, c)
if err != nil {
return nil, errors.Wrapf(err, "failed to load link from dagservice %s", c)
}
csc.addState(nd)
return nd.Links(), nil

}

func (csc *chainStateCollector) addState(nd format.Node) {
csc.state = append(csc.state, nd)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-Pasta from here
https://github.com/filecoin-project/go-filecoin/blob/4f71dca6ec0a18e3b19445c39ef06033e51a859e/internal/app/go-filecoin/plumbing/cst/chain_state.go#L233-L277

I'm not happy about this code duplication, but it seems better than exporting newChainStateCollector from chain_state.go, thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on why importing from chain_state.go is worse? Without your context importing seems like the better choice. You could put this type in the chain package or in a new ipld/dag or the util package if you don't want to export free functions from plumbing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly because I didn't like the idea of exporting a free function from plumbing. I will move this into dag plumbing exposed as a RecursiveGet method.

@codecov-io
Copy link

codecov-io commented Oct 31, 2019

Codecov Report

Merging #3594 into master will decrease coverage by <1%.
The diff coverage is 8%.

@@          Coverage Diff           @@
##           master   #3594   +/-   ##
======================================
- Coverage      47%     47%   -1%     
======================================
  Files         270     270           
  Lines       17259   17281   +22     
======================================
  Hits         8151    8151           
- Misses       8001    8023   +22     
  Partials     1107    1107

go.mod Outdated
@@ -62,7 +60,6 @@ require (
github.com/libp2p/go-libp2p-circuit v0.1.3
github.com/libp2p/go-libp2p-core v0.2.4-0.20191022172111-a5bf2487c11d
github.com/libp2p/go-libp2p-kad-dht v0.1.1
github.com/libp2p/go-libp2p-peer v0.2.0
github.com/libp2p/go-libp2p-peerstore v0.1.3
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

red in this file is from running go mod tidy

Copy link
Contributor

@ZenGround0 ZenGround0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. I want this tool to do more to prevent accidental user datastore corruption.

log.Infof("opening filecoin datastore: %s", repoPath)
badgerPath := filepath.Join(repoPath, "badger/")
log.Infof("opening badger datastore: %s", badgerPath)
badgerDS, err := badgerds.NewDatastore(badgerPath, badgerOpt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are not using the go-filecoin/repo abstraction there doesn't seem to be any protection against multiple processes obtaining a handle to a datastore wrapping the same path.

I'm a bit worried that this could lead to accidental corruption if someone runs this tool while a daemon is running.

Lowest effort solution is to include an all CAPs user facing warning somewhere.

I think "the right thing" here is to open up a go-filecoin repo abstraction and then grab the datastores with Datastore and ChainDatastore. Even if we're lucky and concurrent read-read and read-write works for badger right now it adds a hidden requirement to our datastore choice that could erode with time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are not using the go-filecoin/repo abstraction there doesn't seem to be any protection against multiple processes obtaining a handle to a datastore wrapping the same path.

Since we are using badger to open the repo it will fail to open if a repo lock exists, from the linked README.md:

Please note that Badger obtains a lock on the directories so multiple processes cannot open the same database at the same time.

Here is what a user will see if they attempt to run this binary on a repo that currently has a lock held by another (go-filecoin) process.

./chain-util export --repo=/home/frrist/.filecoin/ --out=/tmp/test.car
2019-11-11T11:25:10.586-0800    INFO    chain-util/export       export/export.go:44     opening repo: /home/frrist/.filecoin/
2019-11-11T11:25:10.586-0800    WARN    chain-util      chain-util/main.go:75   Cannot acquire directory lock on "/home/frrist/.filecoin/badger".  Another process is using this Badger database.: resource temporarily unavailable

Also note that we are opening the repo in read-only mode:
https://github.com/filecoin-project/go-filecoin/pull/3594/files#diff-2bbea98c52d3e94072ef6b7ba1a9250dR27-R29

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good! It moves this from a correctness to a bitrot thing. It still doesn't address this part of the comment:

Even if we're lucky and concurrent read-read and read-write works for badger right now it adds a hidden requirement to our datastore choice that could erode with time.

Can we assume that all ipfs datastore implementations MUST repo lock? If we can, great. If not the moment we pivot to another datastore or we open up user configured datastores we/users are going to need to satisfy a hidden requirement for interop with this tool which means there's some odds somebody won't satisfy this requirement and they'll be left wondering why the tool is corrupting their datastore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grabbing the filecoin repo lock enforces safety across all future datastore implementations.

Copy link
Member Author

@frrist frrist Nov 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I might have misunderstood the initial comment, let me try again.

Since we are not using the go-filecoin/repo abstraction

Meaning since we are not calling the go-filecoin method OpenFSRepo, correct?

doesn't seem to be any protection against multiple processes obtaining a handle to a datastore wrapping the same path

There is protection against multiple processes getting a handle to a datastore (as I have shown this is provided by the badger datastore package), however since go-filecoin creates is own repo lock and since we are not checking for it to exists here, we are left open to possible corruption in the event that we change our datastore/repo implementation. Do I understand this correctly?

Is the ask that a check for filecoin repo lock is added? (as is done here)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is all exactly what I'm thinking. If we use openfsrepo we get the guarantee of go-filecoin repo locking preventing contention with other software that opens the repo (go-filecoin, the migration tool...) regardless of what datastore we use.

Originally I wasn't sure if badger had its own protection which is why I blocked approval and why I approved when you told me that it does.

Comment on lines 103 to 102
func newChainStateCollector(dserv format.DAGService) *chainStateCollector {
return &chainStateCollector{
dagserv: dserv,
}
}

type chainStateCollector struct {
dagserv format.DAGService // Provides access to state tree.
state []format.Node
}

// collectState recursively walks the state tree starting with `stateRoot` and returns it as a slice of IPLD nodes.
// Calling this method does not have any side effects.
func (csc *chainStateCollector) collectState(ctx context.Context, stateRoot cid.Cid) ([]format.Node, error) {
dagNd, err := csc.dagserv.Get(ctx, stateRoot)
if err != nil {
return nil, errors.Wrapf(err, "failed to load stateroot from dagservice %s", stateRoot)
}
csc.addState(dagNd)
seen := cid.NewSet()
for _, l := range dagNd.Links() {
if err := dag.Walk(ctx, csc.getLinks, l.Cid, seen.Visit); err != nil {
return nil, errors.Wrapf(err, "dag service failed walking stateroot %s", stateRoot)
}
}
return csc.state, nil

}

func (csc *chainStateCollector) getLinks(ctx context.Context, c cid.Cid) ([]*format.Link, error) {
nd, err := csc.dagserv.Get(ctx, c)
if err != nil {
return nil, errors.Wrapf(err, "failed to load link from dagservice %s", c)
}
csc.addState(nd)
return nd.Links(), nil

}

func (csc *chainStateCollector) addState(nd format.Node) {
csc.state = append(csc.state, nd)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on why importing from chain_state.go is worse? Without your context importing seems like the better choice. You could put this type in the chain package or in a new ipld/dag or the util package if you don't want to export free functions from plumbing.

return chain.Export(ctx, ce.Head, ce, msgStore, ce, ce.out)
}

// GetTipSet gets all the TipSet for a given TipSetKey from the ChainExporter blockstore.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: gets the TipSet

@ZenGround0 ZenGround0 self-requested a review November 11, 2019 21:51
Copy link
Contributor

@ZenGround0 ZenGround0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving approved but strong preference for grabbing filecoin repo lock for reasons provided.

@frrist frrist merged commit 47a315e into master Nov 14, 2019
@frrist frrist deleted the feat/chain-export-bin branch November 14, 2019 18:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants