Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not clear what "raw" means in various CLI flags relating to format, or how to get those behaviors #8376

Open
Tracked by #8373
warpfork opened this issue Aug 24, 2021 · 7 comments

Comments

@warpfork
Copy link
Member

warpfork commented Aug 24, 2021

(I'm going to be transcribing a bunch of notes from slack and other places, mostly as quotes. The result will not be organized. You're dealing with it the same way I'm stuck dealing with it, reader. Enjoy.)

Original description: "Bug where can only get raw bytes using the git codec but not round trip it. It should not do the first or also do the second."


Some example behavior that seems confusing:

❯ echo '{"foo" : 5}' | ipfs dag put --input-enc=raw
bafyreigg3nfoc7bnomgpdmziymlhhy3tgsistf2s36js2nwsfrumtjirba
🏠/workspace
❯ ipfs dag get --format=git-raw bafyreigg3nfoc7bnomgpdmziymlhhy3tgsistf2s36js2nwsfrumtjirba
{"foo" : 5}
🏠/workspace took 5s
❯ echo '{"foo" : 5}' | ipfs dag put --input-enc=git-raw
Error: unrecognized object type: {"foo"

There is a long internal slack thread about this: https://protocollabs.slack.com/archives/C01MFLTLKTP/p1629207570249300 That link will require login and will also break someday. ┐_(ツ)_┌

@warpfork warpfork mentioned this issue Aug 24, 2021
3 tasks
@warpfork warpfork changed the title Bug where can only get raw bytes using the git codec but not round trip it. It should not do the first or also do the second. Not clear what "raw" means in various CLI flags relating to format, or how to get those behaviors Aug 24, 2021
@warpfork
Copy link
Member Author

My interpretation of the sample behavior in the initial post, initially made a week ago in slack DM, and now replicated here for posterity, was:

yes, that made sense to me

you told dag put that you're giving it raw bytes, and to transcode it (that's probably what i would've called the 'format' flag) to dag-json immediately as it's saving it.

so the data saved is going to be the literal bytes {"/":{"bytes":"eyJmb28iIDogNX0NCg"}}, and the CID should say dag-json.

when you ask for dag get later, it's doing pretty much the opposite transcoding.

@aschmahmann asked:

ok, so that means that if what I wanted to do was "take the following dag-json bytes and store them as dag-json" I need to do either:

ipfs dag put --input-enc=dag-json --format=dag-json or ipfs block put --format=dag-json then right?

... and asked if that sounded a little nuts.

I emoji-replied "no", as in, that doesn't sound nuts.

(And to explain that a little more, I mean: while none of those flags are what I would've expected them to be called, the behavior overall doesn't seem logically inconsistent or holey, at least.)

I continued to remark:

i think this means that if the block you dag get was a dag-json object of {"foo":"bar"} and you asked for --format=raw, it would say that it failed to transcode, though.

I offered the interpretation:

so i think there's a conflation going on between "raw, which is a codec, and can only contain exactly one node which must be of kind=bytes" and "raw, a concept for these APIs which says 'please do not attempt to transcode anything'"

does that seem to usefully disambiguate the problem?

@aschmahmann :

yep, I think that's about right. We've transitioned raw in the ipfs dag commands from "a raw chunk of bytes" to "the representation of bytes via the raw codec"

Okay. So we seem to have agreed here, at that time, last week, that we have problems around clarity of expectations and the nomenclature for what "raw" ... means. That seems to be the root issue that needs resolving before taking any other decisive action.


I further offered:

maybe ipfs should add an ipfs dag transcode function which doesn't store anything, it just does the transcodes. that might make for some good example material and be generally useful. (throwing it out there, idk if that's actually a thing to bother doing today.)

I continue to not be attached to whether or not that's a good idea, or a high priority.

@warpfork
Copy link
Member Author

Here's another transcript that seems important to hoist from being ephemeral and inaccessible in slack:

@willscott had attested, in a slack:

the behavior you describe sounds correct, imo

you've taken raw bytes, and encoded them in dag-json. you got back the dag-json representation of bytes, which is as described.

you can set --format=raw i think to store that directly as those bytes

@aschmahmann asked:

Right, but what if I have some dag-json file and I just want to import it? That seems like we'd want ipfs block put --format=dag-json right? Or are you thinking people should use ipfs dag put --input-enc=dag-json --format=dag-json?

@willscott :

if you have dag-json to import as a dag, you should ipfs dag put --input-enc=dag-json

@aschmahmann :

Unless you specify --format too the data will be transcoded to dag-cbor , right?

@willscott :

i think that's default, yes

@aschmahmann

Because we're transcoding (and validating) the data the codec is required here.Until we fix ipfs block put this now means that the only way to import say dag-cose data into a go-ipfs node without having the codec installed is via car file.

@willscott

right. you can't say to just trust you that it's actually in some other format than you imported it as

At this point, because we were in the context of still trying to merge #7976, @willscott asked:

I don't know the semantics of block put fully. do we need to spill over to also modify that command as part of this merge?

And I think we reached the relevant conclusion of this thread when @aschmahmann replied:

I'm realizing that this was broken (i.e. no way to put dag-json codec data) before the IPLD prime stuff came in, so I'm not going to block merging (or the RC) on this.
We do need to fix this though.

And @willscott agreed:

i think it's probably okay for that to remain as an unsafe entry as it exists today

@warpfork
Copy link
Member Author

Continuing that DM with @aschmahmann , I also observed:

i'm a bit terrified that --format means very different things on these commands. Is the following accurate?

  • on dag put it means "transcode this before storage"
  • on dag get it means "transcode what i'm about to print to you"
  • on block put it means... "trust me, i encoded this right already, just force the cid to state this code?"

this seems dangerously confusing to me.

the odds of someone starting by reading the dag docs, and then going to block and assuming that --format is still a relatively safe/sane/checks-itself thing, and then footguns themselves (and anyone else who tries to read that probably now mis-coded block) pretty hard, is... high

@aschmahmann confirmed that this is a roughly correct interpretation of what those flags do.

@warpfork
Copy link
Member Author

Okay. I believe I've transcribed all the most relevant stuff from ephemeral locations. 😓

ISTM: the next step on this is figuring out what the ipfs commands are actually supposed to do.

I do not know how to propose any useful code changes until those expectations are made clear.

I also do not believe anything we're looking at is an IPLD layer concern at this point. We're entirely talking about CLI flags this entire time -- and legacy ones at that. I'd frankly like to back myself away from this issue.

@warpfork
Copy link
Member Author

As @aschmahmann and I continue to talk about it in Zoom right now, his updated statement is:

It's unclear if the problem here is:

  • that git-raw codec is being really weirdly lenient during decode?
  • or that the ipfs dag get subcommand is generally being weirdly lenient or is transcoding more calmly than expected?
  • or something, generally, about htis feels wrong

@aschmahmann
Copy link
Contributor

@warpfork the thread you've copied is largely unrelated to the issue described in the original post.

Poking around a bit, the reason why the git-raw codec happily interprets bytes is these lines of code https://github.com/ipfs/go-ipld-git/blob/master/marshal.go#L21-L25 which don't really have an explanation for why we encode raw bytes as a Git blob object (although I see the general idea). On the other hand decoding expects the strict Git format https://github.com/ipfs/go-ipld-git/blob/cd342110918004a57fe7c3e6d3f9fb79f65c16ce/unmarshal.go#L12.

Does lenient encoding make sense here? If so I guess we can close this issue, although I suspect strict encoding is generally what's expected from codecs.

@warpfork
Copy link
Member Author

Ah okay. If we traced the root of our questions over there, should we make an issue in that repo?

(I agree that seems at least "hmmm 🤔", but do not have an instant judgement on whether its worth taking an action on, or what that action would be. And I'm not sure who owns that choice or what the priority should be.)

Then -- should we close this issue? FWIW, I think the consistency issue with the --format flags remains terrifying, but I don't really care if this issue remains open for it either. (I'm not bullish on github issues being useful ways to track holistic design concerns like that.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants