feat: implement dag import/export #3728

rvagg · 2021-06-28T06:15:09Z

New version of #2953
Fixes: #2745

WIP, just export implemented so far, no tests yet.

It has to lean on the new multiformats and I've pulled in the new IPLD codecs too in anticipation of #3556 eventually sorting out the slight dependency bloat introduced by this (i.e. we have both versions of the codecs in this branch for ipfs-cli).

rvagg · 2021-06-28T13:30:17Z

added ipfs dag import [path...] functionality too

still no tests yet, though it all works according to basic manual testing.

rvagg · 2021-06-29T12:55:38Z

@achingbrain can I get some advice on this please? I tried to mirror the go-ipfs approach, where export and import are "commands" rather than "core" API features, so there's more logic in here that doesn't get to defer to "core". But I think go-ipfs uses that command infrastructure for both CLI and server. Here it looks like we have two separate systems for CLI and server, and everything gets implemented in 3 places - cli & http-server, and http-client - and a lot of it defers to "core", so 4 or 5 places, with grpc probably doing another one?

Maybe I should be implementing this somewhere else and deferring the input and output streams somehow? import accepts either stdin or 1 or more files. export streams to stdout (but I wouldn't mind adding an --output which go-ipfs should have got). We probably need these things wired up through to server and even http client, but that leaves the question of where this functionality needs to live and how many places it needs to be wired up to and I'm not familiar enough with the intended architecture here.

I have 8 test fixtures totalling 2.9M (1.1M if I compress them) for this feature, but I'm noticing that ipfs-cli relies almost exclusively on mocks to do its testing, which is probably not ideal for these. So I'll need suggestions on where all that should go and examples for how to set things up for repetitive runs against reset block stores.

achingbrain · 2021-06-29T19:00:54Z

@rvagg typically you'd add functionality to ipfs-core and expose it in ipfs-cli and ipfs-http-server. I think we'd want to expose this over the HTTP API same as go-ipfs so we can test the http client against it, so it would need to go into core instead of being in the cli only.

The testing strategy - what sort of tests go where and how different components are tested - is documented here: https://github.com/ipfs/js-ipfs/blob/master/docs/DEVELOPMENT.md#testing-strategy

> I'm not familiar enough with the intended architecture here

There's a small world of documentation here: https://github.com/ipfs/js-ipfs/tree/master/docs

> I think go-ipfs uses that command infrastructure for both CLI and server

Yes, go-ipfs generates its CLI and HTTP API from the same code, whereas, for better or for worse, js-ipfs has a more manual process.

> I have 8 test fixtures totalling 2.9M (1.1M if I compress them) for this feature

Ouch, can these be generated on the fly by the tests?

rvagg · 2021-06-30T07:02:39Z

@achingbrain the other dilemma I have here is the multiformats & ipld codec stack. @ipld/car already uses multiformats and @ipld/dag-cbor itself. I also need to store the raw bytes of blocks, plus traverse their decoded forms to gather the full DAG to export. Getting { cid, bytes, links } (as per this PR as it is now) from a block is easier with the new stack than with ipld, which wants to turn a cid into a node. That is great for getting links, but the bytes get discarded along the way, so you have to go fishing for them elsewhere (or reimplement bits of ipld to do it).

So the choice is something like this:

- Code against the existing ipld stack, suffer only the multiformats and @ipld/dag-cbor bundle expansion costs, and have overly complicated code that will be awkward to undo if/when we get a full multiformats upgrade here.
- Code against the new ipld stack, pull in the full set of codecs and suffer the bundle expansion costs of all of that (not massive, but not trivial), but have slightly more straightforward code and be more future-proof if/when we get a full multiformats upgrade.

That might depend on timing for said upgrade so I'd appreciate your input on this.

Re test fixtures - I was wanting to import the ones that go-ipfs is using as they are, so we ensure both that the feature works properly and we have some level of parity with go-ipfs for it.
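To make the { cid, bytes, links } shape discussed above concrete, here is a minimal sketch using only Node built-ins. It is not the code in this PR: `findLinks` and `describeBlock` are hypothetical names, CIDs are plain strings, and the "codec" is ordinary JSON with dag-json-style links (`{ "/": "<cid>" }`) standing in for a real IPLD codec such as @ipld/dag-cbor. The point is only that the raw bytes are kept next to the links found in the decoded form.

```javascript
// Hypothetical helper: recursively collect dag-json-style links ({ "/": "<cid>" })
// from a decoded value. A real codec would hand back CID instances instead.
function findLinks (value, links = []) {
  if (Array.isArray(value)) {
    for (const item of value) findLinks(item, links)
  } else if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value)
    if (keys.length === 1 && keys[0] === '/' && typeof value['/'] === 'string') {
      links.push(value['/'])
    } else {
      for (const key of keys) findLinks(value[key], links)
    }
  }
  return links
}

// Hypothetical helper: decode a block but keep its original bytes, so an
// exporter can write the bytes out and still know which CIDs to visit next.
function describeBlock (cid, bytes) {
  const value = JSON.parse(Buffer.from(bytes).toString('utf8'))
  return { cid, bytes, links: findLinks(value) }
}
```

An exporter built on this would write `bytes` to the CAR stream unchanged and queue each entry of `links` for the next step of the walk.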

rvagg · 2021-07-02T13:24:40Z

Status: dag export and dag import are implemented in core, cli, http-client and http-server. Minimal documentation and no tests. I've tested export manually in all of the places and it works like a charm. I've only tested import on the CLI, it's a bit advanced to set up a test with HTTP and I have run out of time this week.

Still needed: unit tests in cli, http-{client,server}, and some integration tests to match go-ipfs, plus it needs some additional docs. I'd really like to get t0054 from sharness working here via JS, it's 2.9M (1.1M compressed) of fixtures, but it's got really nice CAR coverage for various edge cases.

achingbrain · 2021-07-20T11:53:33Z

@rvagg where does the BlockCount property come from? It doesn't appear to be in the output of the dag.import http command?

achingbrain · 2021-07-20T13:53:32Z

Ah, I see - it's in an un-merged PR to go-ipfs: ipfs/kubo#8237

rvagg · 2021-07-21T02:26:28Z

👌 looking really good, thanks @achingbrain
I really would like to see some test fixtures in here, though, it'd be good to figure out the best way to pull down large fixtures if it's going to be a problem for them to live in the repo. The ones @ribasushi put together for go-ipfs are pretty nice and cover the main edges of functionality for this feature.

achingbrain · 2021-07-26T09:13:20Z

Anecdotally for the fixture data the js implementation is 2-3.5x faster than go (ignoring test setup & teardown times):

$ npm run test:interface:http-go -- -t node

> ipfs@0.55.4 test:interface:http-go /Users/alex/Documents/Workspaces/ipfs/js-ipfs/packages/ipfs
> aegir test -f test/interface-http-go.js "-t" "node"

Test Node.js


  interface-ipfs-core over ipfs-http-client tests against go-ipfs
    .dag.export
      ✔ should export a car file (265ms)
      ✔ export of shuffled devnet export identical to canonical original (4412ms)
      ✔ export of shuffled testnet export identical to canonical original (32176ms)
    .dag.import
      ✔ should import a car file (488ms)
      ✔ should import a car file without pinning the roots (254ms)
      ✔ should import multiple car files (671ms)
      ✔ should import car with roots but no blocks (33776ms)
      ✔ should import lotus devnet genesis shuffled nulroot


  8 passing (1m)

vs

$ npm run test:interface:http-js -- -t node

> ipfs@0.55.4 test:interface:http-js /Users/alex/Documents/Workspaces/ipfs/js-ipfs/packages/ipfs
> aegir test -f test/interface-http-js.js "-t" "node"

Test Node.js


  interface-ipfs-core over ipfs-http-client tests against js-ipfs
    .dag.export
      ✔ should export a car file (136ms)
      ✔ export of shuffled devnet export identical to canonical original (1939ms)
      ✔ export of shuffled testnet export identical to canonical original (12438ms)
    .dag.import
      ✔ should import a car file (134ms)
      ✔ should import a car file without pinning the roots (99ms)
      ✔ should import multiple car files (197ms)
      ✔ should import car with roots but no blocks (14021ms)
      ✔ should import lotus devnet genesis shuffled nulroot


  8 passing (37s)

rvagg · 2021-07-26T10:06:00Z

@achingbrain any idea what the difference might be? could it be on the rpc side, or is this only explainable by the core impl, and perhaps block store mechanism?

Would be interesting to try this same thing from the go http api, maybe.

achingbrain · 2021-07-26T10:36:30Z

The original perf of this in js was really bad because we were doing sequential block writes in datastore-fs and also not being very clever about how we walk a DAG when pinning recursively - I changed it to do parallel writes and to not walk child nodes that we've seen before during the traversal and it got much faster.
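The two changes described above (skip child nodes already seen during the walk, and write blocks in parallel rather than one at a time) can be sketched as follows. This is an illustration under stated assumptions, not the actual datastore-fs or pinning code: `walkDag`, `putMany`, `getBlock` and `putBlock` are hypothetical names, CIDs are plain strings, and each block is assumed to expose its links as an array.

```javascript
// Breadth-first walk that visits every CID at most once.
// `getBlock` is a hypothetical async lookup returning { links: string[] }.
async function walkDag (rootCid, getBlock, seen = new Set()) {
  const order = []
  const queue = [rootCid]

  while (queue.length > 0) {
    const cid = queue.shift()
    if (seen.has(cid)) continue // shared sub-DAGs are not walked again
    seen.add(cid)

    const block = await getBlock(cid)
    order.push(cid)
    queue.push(...block.links)
  }

  return order
}

// Start all writes at once and wait for them together, instead of
// awaiting each write before starting the next one.
// `putBlock` is a hypothetical async writer.
async function putMany (blocks, putBlock) {
  await Promise.all(blocks.map(block => putBlock(block)))
}
```

In a diamond-shaped DAG (root links to a and b, both of which link to c), `c` is fetched once rather than twice. Real code would probably also cap the number of concurrent writes, which this sketch does not do.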

> Would be interesting to try this same thing from the go http api, maybe

If I understand you correctly, this is what the test runs above are doing - using the go http api and the js http api to run the same tests.

rvagg · 2021-07-26T10:52:18Z

well, 👌 , whatever's going on, this is pretty sweet, thanks again for picking it up!

rvagg · 2021-07-26T10:56:58Z

packages/ipfs-core/src/components/dag/export.js

+// blocks that we're OK with not inspecting for links
+/** @type {number[]} */
+const NO_LINKS_CODECS = [
+  dagCbor.code, // CBOR


Suggested change

dagCbor.code, // CBOR

This should be 0x51 (cbor), not 0x71 (dag-cbor) like here, but it's probably best to just remove it since we don't have an official cbor encoder (yet, no big demand atm). We need to inspect the links of dagCbor blocks, since they have links, but cbor blocks don't (or .. maybe they could do, 🤷, it's not really clear what "cbor" means and its scope, but at least in Go we're going with super-minimal cbor with no tags or extras).

achingbrain · 2021-07-26

Ah, that makes sense, I did wonder.
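For reference, the distinction in this thread can be sketched with the numeric codes from the multicodec table. Only the first entry of `NO_LINKS_CODECS` is visible in the diff above, so the list contents below (raw and json) are an assumption, and `mayContainLinks` is a hypothetical helper name.

```javascript
// Codes from the multicodec table.
const CBOR = 0x51 // plain cbor: no link semantics
const RAW = 0x55 // raw bytes: cannot contain links
const DAG_PB = 0x70 // dag-pb: has links
const DAG_CBOR = 0x71 // dag-cbor: has links (CIDs encoded with CBOR tag 42)
const JSON_CODE = 0x0200 // plain json: no link semantics

// Assumed contents: codecs whose blocks can be written to the CAR without
// being decoded and searched for links. dag-cbor must NOT be in this list.
/** @type {number[]} */
const NO_LINKS_CODECS = [RAW, JSON_CODE]

// Hypothetical helper: should the exporter decode this block and follow links?
function mayContainLinks (code) {
  return !NO_LINKS_CODECS.includes(code)
}
```

With this list, `mayContainLinks(DAG_CBOR)` is true and `mayContainLinks(RAW)` is false. Plain cbor (0x51) is left out of the list here, following the suggestion to remove the entry rather than correct its code.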

rvagg · 2021-07-26T11:01:17Z

packages/ipfs-core/src/components/dag/export.js

+  const codec = await codecs.getCodec(cid.code)
+
+  if (codec) {
+    const block = Block.createUnsafe({ bytes, cid, codec })


One difference we might find with Go is that they're going to be obsessively checking CIDs are correct for the bytes whenever they're loaded, at least in cases like this. The Unsafe is giving us a shortcut, basically saying we don't care if it doesn't match, and in fact we probably trust the underlying source of blocks to have already done that check. The new go-ipld-prime integration work is providing some opportunities for "trusted" data sources that skip these checks too so they may get slight perf improvements.

achingbrain · 2021-07-26

In this case I don't think we need to as the blockstore creates the path to the block from the multihash extracted from the CID. I suppose you could mess around with the fs/level backing store to change the contents of the buffer stored at the relevant path but I think solving that problem is outside the scope of dag export.

Re-hashing/verifying blocks on every export would be quite expensive, definitely something we should put behind a flag if people need that kind of guarantee.

go-ipfs has an ipfs repo verify command, which I guess would load every block in the repo and rehash it to ensure we've got the right bytes; there's no equivalent in js land yet.

olizilla · 2021-07-28

related: the old Datastore.HashOnRead option that is false by default, due to the perf hit and "i trust my datastore" assumption https://github.com/ipfs/go-ipfs/blob/master/docs/config.md#datastorehashonread

rvagg · 2021-07-28

Go will be getting the same treatment when the ipld-prime work makes its way through: https://github.com/ipfs/go-fetcher/blob/64b1f390e7ae13d96494f15a10e07527369a521d/impl/blockservice/fetcher.go#L45

I'm pretty sure (but far from certain) that any time you interact with an IPLD Node at the moment it does the whole hog of verification and you can't bypass it. In fact there are a lot of places where you can't avoid decoding the bytes even if you just want the bytes. Things should get a little more sophisticated as ipld-prime moves through.
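The trade-off in this thread (trust the store by default, re-hash only when asked) can be sketched with Node's built-in crypto module. This is a hypothetical illustration, not js-ipfs or go-ipfs code: `loadBlock` and its `verify` option are made-up names, and the store key is the hex SHA-256 digest of the bytes, standing in for the multihash carried inside a real CID.

```javascript
const { createHash } = require('crypto')

// SHA-256 digest of a block's bytes.
function sha256 (bytes) {
  return createHash('sha256').update(bytes).digest()
}

// Hypothetical loader over a Map keyed by hex digest.
// With verify=false (the default) the store is trusted, much like
// Block.createUnsafe or Datastore.HashOnRead=false. With verify=true the
// bytes are re-hashed on every read, which costs CPU time per block.
function loadBlock (store, key, { verify = false } = {}) {
  const bytes = store.get(key)
  if (bytes === undefined) {
    throw new Error(`block not found: ${key}`)
  }
  if (verify && sha256(bytes).toString('hex') !== key) {
    throw new Error(`hash mismatch for ${key}`)
  }
  return bytes
}
```

If someone tampers with the backing store, the default path returns the altered bytes silently, while `loadBlock(store, key, { verify: true })` throws.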

rvagg · 2021-07-27T11:29:20Z

🎉

Adds `ipfs.dag.import` and `ipfs.dag.export` commands to import/export CAR files, i.e. single-file archives that contain blocks and root CIDs.

Supersedes #2953
Fixes #2745

Co-authored-by: achingbrain <alex@achingbrain.net>
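A rough usage sketch of the two commands follows. It assumes that `ipfs.dag.export(cid)` returns an async iterable of `Uint8Array` chunks and that `ipfs.dag.import(source)` accepts an iterable of chunks and yields results shaped like `{ root: { cid } }`. Those shapes are not shown in this thread, so check them against the docs added in the PR. `exportCar` and `importCar` are hypothetical helper names, and the `ipfs` instance is passed in rather than created here.

```javascript
// Collect a streamed CAR export into one Buffer.
// Fine for small DAGs; large exports should be piped to a file instead.
async function exportCar (ipfs, cid) {
  const chunks = []
  for await (const chunk of ipfs.dag.export(cid)) {
    chunks.push(chunk)
  }
  return Buffer.concat(chunks)
}

// Import a CAR held in memory and return the root CIDs reported back.
// Assumption: each yielded result looks like { root: { cid } }.
async function importCar (ipfs, carBytes) {
  const roots = []
  for await (const result of ipfs.dag.import([carBytes])) {
    roots.push(result.root.cid)
  }
  return roots
}
```

Round-tripping would then look like `await importCar(otherNode, await exportCar(node, cid))`, where `node` and `otherNode` are two IPFS instances.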

feat: implement ipfs dag export <root>

0bb4104

rvagg changed the title ~~feat: implement ipfs dag export <root>~~ feat: implement dag import/export Jun 28, 2021

fix: types, make tests pass, don't allow partial DAG

de2ab6e

feat: implement ipfs dag import [path...]

cf772c9

rvagg force-pushed the rvagg/import-export branch from 3da1fee to cf772c9 Compare June 29, 2021 04:27

rvagg added 3 commits June 30, 2021 23:10

chore: migrate dag export from ipfs-cli to ipfs-core

d0eeaf1

feat: dag/export for http-{client,server}

8d30a68

feat: dag import in core, cli, http client and http server

ff6c992

achingbrain added 3 commits July 19, 2021 14:48

Merge remote-tracking branch 'origin/master' into rvagg/import-export

e863378

chore: fix linting and types

133e39e

chore: remove unused deps

3a0eb1e

achingbrain added 3 commits July 20, 2021 17:35

chore: add tests, back out block count until common approach is agreed with go-ipfs

58f41bf

chore: skip dag import/export tests for message port client

95c79d0

chore: add docs

d1c0bc3

achingbrain added 2 commits July 23, 2021 18:56

chore: add fixture car files and start porting sharness test cases

be3cb5c

chore: fix tests

64e8c4a

chore: add more interface tests and unit tests for cli and http api

08f5a2e

achingbrain marked this pull request as ready for review July 26, 2021 10:37

rvagg commented Jul 26, 2021

achingbrain added 5 commits July 26, 2021 12:22

chore: fix failing test

1614c8f

chore: remove dag-cbor from codecs we do not follow links for as we do follow links for dag-cbor

668d4b3

chore: output order is not guarenteed

f844ad4

chore: fix tests

dad6878

chore: revert streaming response refactor

7ba22a2

achingbrain merged commit 700765b into master Jul 27, 2021

achingbrain deleted the rvagg/import-export branch July 27, 2021 08:01
