Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pinning new cbor object doesn't appear to work #3570

Closed
ianopolous opened this issue Jan 6, 2017 · 88 comments
Closed

Pinning new cbor object doesn't appear to work #3570

ianopolous opened this issue Jan 6, 2017 · 88 comments
Labels
kind/bug A bug in existing code (including security flaws) topic/repo Topic repo
Milestone

Comments

@ianopolous
Copy link
Member

Version information:

go-ipfs version: 0.4.5-dev-4cb236c
Repo version: 4
System version: amd64/linux
Golang version: go1.7.1

Type:

Bug

Priority:

P0

Description:

Pinning a new cbor object created using block.put doesn't appear to work. To reproduce:

>> echo -e "\x4b\x67\x27\x64\x61\x79\x20\x49\x50\x46\x53\x21" | ipfs block put --format=cbor
zdpuAue4NBRG6ZH5M7aJvvdjdNbFkwZZCooKWM1m2faRAodRe
>> echo -e "\xd9\x01\x02\x58\x25\xa5\x03\x22\x12\x20\x65\x96\x50\xfc\x34\x43\xc9\x16\x42\x80\x48\xef\xc5\xba\x45\x58\xdc\x86\x35\x94\x98\x0a\x59\xf5\xcb\x3c\x4d\x84\x86\x7e\x6d\x31" | ipfs block put --format=cbor
zdpuApNFmG7PZ53BWxwix4HztiVDHomrvdJLTegycZb8YU5Qr
>> ipfs pin add -r zdpuApNFmG7PZ53BWxwix4HztiVDHomrvdJLTegycZb8YU5Qr
>> ipfs repo gc
>> ipfs block get zdpuApNFmG7PZ53BWxwix4HztiVDHomrvdJLTegycZb8YU5Qr
>> ipfs block get zdpuAue4NBRG6ZH5M7aJvvdjdNbFkwZZCooKWM1m2faRAodRe

The gc should NOT remove the two blocks added (it currently removes both). And the subsequent gets should succeed.
The first block is just a cbor byte array of 'gday IPFS!'
The second is just a cbor merkle link to /ipfs/zdpuAue4N...

N.B. I may not have the correct serialization for the merkle link, but as far as I can tell it is correct (a cbor tag of 258 for the multiaddr)

@ghost
Copy link

ghost commented Jan 7, 2017

Possibly related to #3453

@ghost ghost added topic/repo Topic repo kind/bug A bug in existing code (including security flaws) labels Jan 7, 2017
@ghost ghost added this to the ipfs 0.4.5 milestone Jan 7, 2017
@ghost
Copy link

ghost commented Jan 7, 2017

Also #3553

@ianopolous
Copy link
Member Author

@lgierth N.B. there are no errors returned here except for the final gets failing to find anything.

@kevina
Copy link
Contributor

kevina commented Jan 9, 2017

This could be a simple case of special cases for Protobuf nodes, in which case the fix should be simple. I'll look into it.

@kevina
Copy link
Contributor

kevina commented Jan 10, 2017

The pinning is failing for me. That is likely the problem. I am surprised it works for you:

panic: reflect.Set: value of type *cbor.CBORTag is not assignable to type map[interface {}]interface {}

However I am doing this with the daemon offline.

@kevina
Copy link
Contributor

kevina commented Jan 10, 2017

Okay, the cbor package is panicking when trying to decode the block in order to pin it. Here is the full backtrace:

panic: reflect.Set: value of type *cbor.CBORTag is not assignable to type map[interface {}]interface {}

goroutine 1 [running]:
panic(0xbc6500, 0xc4202e7860)
        /usr/local/go/src/runtime/panic.go:500 +0x1a1
reflect.Value.assignTo(0xbf88e0, 0xc420239fa0, 0x16, 0xcfef62, 0xb, 0xc04e60, 0x0, 0xc04e60, 0xbf88e0, 0xc420239fa0)
        /usr/local/go/src/reflect/value.go:2163 +0x35c
reflect.Value.Set(0xc04e60, 0xc42077a1d8, 0x195, 0xbf88e0, 0xc420239fa0, 0x16)
        /usr/local/go/src/reflect/value.go:1333 +0xa4
gx/ipfs/QmPL3RCWaM6s7b82LSLS1MGX2jpxPxA1v2vmgLm15b1NcW/cbor/go.(*reflectValue).SetTag(0xc420239f80, 0x102, 0x1152d40, 0xc420239fc0, 0x0, 0x0, 0xbf88e0, 0xc420239fa0, 0x0, 0x0)
        /home/kevina/gocode2/src/gx/ipfs/QmPL3RCWaM6s7b82LSLS1MGX2jpxPxA1v2vmgLm15b1NcW/cbor/go/cbor.go:1084 +0x100
gx/ipfs/QmPL3RCWaM6s7b82LSLS1MGX2jpxPxA1v2vmgLm15b1NcW/cbor/go.(*Decoder).innerDecodeC(0xc4201396b8, 0x1152d40, 0xc420239f80, 0xd9, 0x1, 0x1)
        /home/kevina/gocode2/src/gx/ipfs/QmPL3RCWaM6s7b82LSLS1MGX2jpxPxA1v2vmgLm15b1NcW/cbor/go/cbor.go:408 +0xf20
gx/ipfs/QmPL3RCWaM6s7b82LSLS1MGX2jpxPxA1v2vmgLm15b1NcW/cbor/go.(*Decoder).DecodeAny(0xc4201396b8, 0x1152d40, 0xc420239f80, 0xc42077a1d8, 0x16)
        /home/kevina/gocode2/src/gx/ipfs/QmPL3RCWaM6s7b82LSLS1MGX2jpxPxA1v2vmgLm15b1NcW/cbor/go/cbor.go:235 +0xc2
gx/ipfs/QmPL3RCWaM6s7b82LSLS1MGX2jpxPxA1v2vmgLm15b1NcW/cbor/go.(*Decoder).Decode(0xc4201396b8, 0xba7120, 0xc42077a1d8, 0x0, 0xc42004b140)
        /home/kevina/gocode2/src/gx/ipfs/QmPL3RCWaM6s7b82LSLS1MGX2jpxPxA1v2vmgLm15b1NcW/cbor/go/cbor.go:125 +0xb4
gx/ipfs/QmPL3RCWaM6s7b82LSLS1MGX2jpxPxA1v2vmgLm15b1NcW/cbor/go.Loads(0xc4200a26c0, 0x2b, 0x22b, 0xba7120, 0xc42077a1d8, 0x114b760, 0xc420239f40)
        /home/kevina/gocode2/src/gx/ipfs/QmPL3RCWaM6s7b82LSLS1MGX2jpxPxA1v2vmgLm15b1NcW/cbor/go/cbor.go:80 +0x1ef
gx/ipfs/QmbuuwTd9x4NReZ7sxtiKk7wFcfDUo54MfWBdtF5MRCPGR/go-ipld-cbor.Decode(0xc4200a26c0, 0x2b, 0x22b, 0x22b, 0x114b760, 0xc420239f40)
        /home/kevina/gocode2/src/gx/ipfs/QmbuuwTd9x4NReZ7sxtiKk7wFcfDUo54MfWBdtF5MRCPGR/go-ipld-cbor/node.go:19 +0x75
github.com/ipfs/go-ipfs/merkledag.decodeBlock(0x114b760, 0xc420239f40, 0xc42007d980, 0xc42004b020, 0x114b760, 0xc420239f40)
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/merkledag/merkledag.go:111 +0xb1
github.com/ipfs/go-ipfs/merkledag.(*dagService).Get(0xc420175000, 0x114b520, 0xc42007d980, 0xc42004b020, 0x0, 0x0, 0x0, 0x0)
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/merkledag/merkledag.go:89 +0x297
github.com/ipfs/go-ipfs/path.(*Resolver).ResolvePathComponents(0xc4202eece0, 0x114b520, 0xc4201402c0, 0xc42007cd00, 0x37, 0x37, 0xc420139a00, 0x709461, 0x0, 0xcf7106)
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/path/resolver.go:106 +0x17f
github.com/ipfs/go-ipfs/path.(*Resolver).ResolvePath(0xc4202eece0, 0x114b520, 0xc4201402c0, 0xc42007cd00, 0x37, 0xc42030c4e0, 0xc42007cd00, 0x37, 0x0)
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/path/resolver.go:84 +0x7b
github.com/ipfs/go-ipfs/core.Resolve(0x114b520, 0xc4201402c0, 0x0, 0x0, 0xc4202eece0, 0xc42007cd00, 0x37, 0x41f0d8, 0x30, 0xcb99e0, ...)
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/core/pathresolver.go:57 +0x360
github.com/ipfs/go-ipfs/core/corerepo.Pin(0xc420354180, 0x114b520, 0xc4201402c0, 0xc4202ee3a0, 0x1, 0x2, 0x1, 0x0, 0x0, 0x0, ...)
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/core/corerepo/pinning.go:35 +0x139
github.com/ipfs/go-ipfs/core/commands.glob..func72(0x1153780, 0xc420316300, 0x1152b60, 0xc420322000)
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/core/commands/pin.go:65 +0x208
github.com/ipfs/go-ipfs/commands.(*Command).Call(0x1230780, 0x1153780, 0xc420316300, 0x0, 0x0)
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/commands/command.go:116 +0x286
main.callCommand(0x114b520, 0xc4202f0840, 0x1153780, 0xc420316300, 0x1230780, 0x120f0e0, 0x0, 0x0, 0xc420139e60, 0x407ad3)
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/cmd/ipfs/main.go:349 +0x49a
main.(*cmdInvocation).Run(0xc4202f0780, 0x114b520, 0xc4202f0840, 0x1141ea0, 0xc4202ee400, 0x114b520, 0xc4202f0840)
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/cmd/ipfs/main.go:191 +0x116
main.main()
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/cmd/ipfs/main.go:156 +0x366

@ianopolous could you try this with the daemon offline and make sure the block is valid cbor object.

@ianopolous
Copy link
Member Author

Offline I get the same error. Also if I try and pin the first object I get:

Error: pin: cannot assign []byte into Kind=map Type= map[interface {}]interface {}(nil)

The first one is just a cbor byte[]. If this isn't allowed then that needs to be made clear. My reading of the docs was that any cbor is valid (and it would be a shame if not, as that will bloat the serialization with unnecessary stuff)

The second object has the following breakdown:
0xD90102 = Tag (258)
Then a cbor byte[] of 37 bytes corresponding to the multiaddr of the hash of the first blob (/ipfs/QmVBCpx91Yb5hGCqkQbWeqQ83B8r9mbN9EMr3c6y22ePKE) I think it should be a cid not a multihash, but that is what the first command is returning over the http api (which is where I generate the cbor from)
This means that, again, it isn't a map, but a tagged byte[].

@kevina
Copy link
Contributor

kevina commented Jan 10, 2017

@ianopolous thanks,I think this is a bug in the cbor library. I am afraid I am not that familiar with cbor objects so not sure how qualified I am to fix this. I will give it another shot before giving up tomorrow (Tuesday).

@ianopolous
Copy link
Member Author

@kevina I would guess the cbor library is fine, but that ipfs is assuming the root cbor object is a map, not a general cbor object.

@whyrusleeping
Copy link
Member

Yeah, I wasnt really thinking about having base 'non-map' cbor objects being handled when writing the cbor ipld stuff. We can either disallow that for now, or i can figure out how to rewrite the handling in the package to deal with those types of cbor things.

It really comes down to whats in the ipld spec, is "Foobar" a valid ipld object?

@whyrusleeping
Copy link
Member

@ianopolous I'm thinking that the objects you created are not valid ipld-cbor objects. We're thinking that only 'map type' top level objects should be allowed. (which means we need to guard against this)

@ianopolous
Copy link
Member Author

Can I ask why? As it adds a lot of unnecessary bytes to serialization, which matters in things with lots of small objects, like merkle-btrees. It is also conceptually simpler to allow any root object. I would assume in this case that you couldn't follow any ipld path through it, as it is effectively a leaf node as far as ipld is concerned? If you do decide to continue with that path I assume you will also restrict the keys in cbor maps to Strings, and if so, you should also make that clear.

@Kubuxu Kubuxu added the need/analysis Needs further analysis before proceeding label Jan 10, 2017
@ianopolous
Copy link
Member Author

I think the same case will also be encountered if you have an ipld selector path internal to an object, which ends up at a non map cbor object.

@whyrusleeping
Copy link
Member

@diasdavid @nicola @jbenet @lgierth What do you guys think here?

@whyrusleeping
Copy link
Member

@ianopolous could you give a clear description of your usecase here? It seems like youre trying to put your own format over the top of the dag-cbor format

@ianopolous
Copy link
Member Author

Our usecase is a merkle-btree with lots of small nodes forming non leaf nodes, and leaf nodes which are actual file fragments (encrypted) and thus close to the IPFS object size limit (they are currently just a cbor byte[]).

The non leaf nodes are a list of up to 16 (label, value, target) tuples where label is the key in the btree, and value and target are (optional) multihashes which end up as merkle links in the cbor. $value points to an encrypted metadata blob which in turn points to encrypted file fragments. $target points to another merkle-btree node.

We handle the navigation of the btree client side, but need to ability to just pin the root and have the whole tree pinned. The non leaf nodes have cbor merkle links to other nodes.

@kevina kevina removed their assignment Jan 10, 2017
@whyrusleeping
Copy link
Member

if the links are encrypted, theres no way that pin can handle traversing the graph

@whyrusleeping
Copy link
Member

Also, you can use the 'raw' type for data which doesnt need to be a dag-cbor formatted object.

@ianopolous
Copy link
Member Author

The files are encrypted, the links are in the clear obviously.

@whyrusleeping
Copy link
Member

Ah, okay. What does the json representation of your cbor structure look like?

@ianopolous
Copy link
Member Author

My understanding from reading all the docs and code I could get my hands on was that I could write an arbitrary cbor object (using block put) which may contain special cbor tagged merkle links, and that recursive pinning will work based on this.

@ianopolous
Copy link
Member Author

We never use JSON because it is terrible for binary data. Given a JSON that supports byte[] then any cbor structure that restricts map keys to strings is trivially mappable to JSON + byte[].

@whyrusleeping
Copy link
Member

@ianopolous No, its not arbitrary CBOR. Its ipld objects, that are CBOR encoded, see: https://github.com/ipld/specs/tree/master/ipld

Any cbor object can be represented as json though, i'm just trying to get an idea of the structure you have visually.

@nicola
Copy link
Member

nicola commented Jan 18, 2017

Conversation on mutable links here: ipld/specs#9

@nicola
Copy link
Member

nicola commented Jan 18, 2017

Let me reframe this conversation so that we have a common vocabulary on this!
There is an ipld object that can be represented via CBOR or JSON.
However, without properly parsing the representations, they are simply CBOR/JSON objects.

  • In (a), the CBOR and JSON representation are identical, which makes it easy to convert across those.
  • In (b), they are different representations, but they are the same ipld object, once parsed. The issue with this case is that converting across formats needs a special knowledge of IPLD (that the tag transforms to a new level of key-attribute). Also, creating and modifying JSON or CBOR objects in their raw format without the IPLD library might be not simple anymore.

My opinion

  • Think of IPLD representations as something that are only touchable via an IPLD library:
    some future formats might be impossible to edit in their raw representation, or in some cases it would not make sense to have a / extra key/value. (I can give an example if you need it).
  • Parsers with knowledge?: I think that the fundamental question is

    do we want anyone to properly parse raw cbor or only clients that know that that cbor should be read in the ipld way?

  • Solution (a) does not take advantage of the CBOR tagging system
    if someone is trying to read CBOR without an IPLD library, then they are probably not interested in the IPLD features, so it is fine if the tags disappear or are flattened out (as many libraries might do) with a generic CBOR converter.

Please, tell me if you see some concerns in this.

@whyrusleeping
Copy link
Member

@nicola I agree, ipld objects really should be intended to be dealt with by ipld libraries. Dealing with the tag stuff isnt that much work, and any cbor tool will convert it to a 'tag' object in json (which will properly roundtrip). As an example, @ianopolous did all the tag stuff correctly reading the spec before i even knew about it. It was then pretty simple for me to switch my code over to using the tags. @diasdavid If you really feel strongly about this, i'd love to hear a case where using the tag bites the user in a way thats really hard to deal with.

@whyrusleeping
Copy link
Member

However, i will ask that when we move forward with the tag approach, we select a value between 40 and 95 (the first unassigned range). using 258 consumes one extra byte per link and i'm all about saving bytes

@whyrusleeping
Copy link
Member

Is anyone opposed to using a tag of 42? Its in the unallocated range here: https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml and should be fair game. If no opposition is voiced, i'll move forward with that tomorrow.

Also, we need to reach consensus on the multiaddr thing...

@jbenet
Copy link
Member

jbenet commented Jan 19, 2017

On the multiaddr thing:

What would it mean if an ipld object had a merkle link with a multiaddr that was not content addressed, e.g. /ip4/8.8.8.8/tcp/8080 ? Or do you mean that all ipld paths are valid multi-addresses, but not the other way around?

The last part. i meant that "all ipld paths are valid multi-addresses, but not the other way around", and for now no mutable links. (this may relax in the future, but at this time, no mutable links).

we have a restriction that an IPLD link should be a valid CID in the resolver

Why? IPLD links should be able to be paths, like: /ipld/<cid>/a/b/c. not just a cid.

@jbenet
Copy link
Member

jbenet commented Jan 19, 2017

Though, since it looks like @whyrusleeping and @diasdavid both assumed that links were only CIDs (not sure why ;P), i'm ok to relax the spec constraints there, and say that for now it's only a CID, and not a full path. if people want to do a full path and enforce it now, speak now! (i prefer paths)

@ianopolous
Copy link
Member Author

@whyrusleeping Indeed, given a standard cbor library, it was easy to implement an ipld parser using the tags.

In my opinion, mutable (authenticated) links should be a layer up. Otherwise even a simple pinning operation could require arbitrary ipns lookups each with their own validity scheme, which would combine into something very hard to reason about. If the links are restricted to cids (or cid+path), then the resulting structures are very easy to understand and reason about.

I say this even though my use case would be a lot easier with ipns links in ipld (I have a hierarchy of write access controlled directories, each with their own ipns key). For me it would great to be able to pin the root and be done, but I can also handle this on the application level, and keep the ipld structure simple.

@whyrusleeping
Copy link
Member

Alright, going to move forward with all this tomorrow. The cid's put in the links will be prefixed with a multibase code for binary (a single zero byte) for consistency. Doing this will allow us to easily upgrade to multiaddr paths later on (if buf[0] != 0 )

@daviddias
Copy link
Member

@diasdavid If you really feel strongly about this, i'd love to hear a case where using the tag bites the user in a way thats really hard to deal with.

I don't feel strongly about it, I just raised the concern for the question made "How can we make this simple". We are going to document a lot anyway, it is just more a bunch of notes saying "use these libraries".

we have a restriction that an IPLD link should be a valid CID in the resolver
Why? IPLD links should be able to be paths, like: /ipld//a/b/c. not just a cid.

The resolver knows how to resolve paths, but I haven't made tests for links to be paths (missed that one). Nothing that can't be added :)

@whyrusleeping
Copy link
Member

go-ipfs will not support paths in links for a long time. please don't implement that, it won't interop.

@jbenet
Copy link
Member

jbenet commented Jan 19, 2017

@ianopolous did you implement path support, or just cid support?

@dignifiedquire
Copy link
Member

if people want to do a full path and enforce it now, speak now! (i prefer paths)

Please do paths, even if go doesn't support them currently it should at least be able to ignore them such that other implementations can use paths, they are one of the big features of IPLD.

@ianopolous
Copy link
Member Author

@jbenet Originally I used general multi-addresses, then changed it to cids, but it's easy to change back. It's just a parser.

@whyrusleeping
Copy link
Member

The spec for multiaddr ipld paths doesnt even exist yet. When we get to the point where we know what we're wanting to implement there then maybe we can think about implementing it

@whyrusleeping
Copy link
Member

Alright, heres my PR to the CBOR code: ipfs/go-ipld-cbor#5

I changed the cbor tag to 42, and changed links to contain a binary-multibase (the byte 0) prefixes binary CIDs. Please review.

@ianopolous
Copy link
Member Author

It's great to see the answer to the ultimate question of life, the universe and everything making it into the ipld spec. :-)

@whyrusleeping
Copy link
Member

@ianopolous I've merged the changes into master, and i've tested pinning a few different cbor objects, as well as added a test for it. Mind testing it independently and reporting back?

I'm going to optimistically close this issue (open it again if i'm wrong)

@whyrusleeping whyrusleeping removed the need/analysis Needs further analysis before proceeding label Jan 22, 2017
@ianopolous
Copy link
Member Author

Hmm, with 0.4.5-pre2-416f025 it hangs if I try and pin the first object in this issue (the cbor byte[]) through the cli, e.g.:

echo -e "\x4b\x67\x27\x64\x61\x79\x20\x49\x50\x46\x53\x21" | ipfs block put --format=cbor
ipfs pin add zdpuAue4NBRG6ZH5M7aJvvdjdNbFkwZZCooKWM1m2faRAodRe

@ianopolous
Copy link
Member Author

@whyrusleeping I also don't seem to have the permissions to reopen the issue.

@whyrusleeping whyrusleeping reopened this Jan 22, 2017
@whyrusleeping
Copy link
Member

@ianopolous hrm... doing that gets me 'merkledag not found'. Is that object a link?

@whyrusleeping
Copy link
Member

For some reason ipfs thinks that object has a link to: zdpuAsFzbmVjkDjySRJNAw2ndtAogPUVqCSZgP4MhxVRxVcE4... Unclear why. investigating.

@ianopolous
Copy link
Member Author

ianopolous commented Jan 22, 2017

It's just a cbor byte array. No links. The byte array contents are the UTF8 bytes of "g'day IPFS!"

@whyrusleeping
Copy link
Member

huh, it doesnt appear the cbor code likes roundtripping that byte array...

@whyrusleeping
Copy link
Member

@ianopolous lol, i found the issue. Using echo adds an extra newline to the end of the input, which when using 'block put', gets added as part of the block. But when the cbor library parses this block, it ignores that newline since its not part of the actual format, resulting in a different hash.

If you use printf instead of echo there, you should get the expected behaviour.

I'm not sure if i'm calling this a bug yet...

@ianopolous
Copy link
Member Author

@whyrusleeping Sorry, my bad. I can confirm that pinning a cbor byte[] works now. As well as pinning a merkle link to said byte[].
Cheers! Now to figure out why the http api is giving me multihashes not cids.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug A bug in existing code (including security flaws) topic/repo Topic repo
Projects
None yet
Development

No branches or pull requests

8 participants