-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Handling of Alternative Multibases #5349
Comments
If we allow CIDs to carry a multibase, setting the base should be fairly easy to do with the new |
I'm not actually sure this is user-friendly. If I run: ipfs pin f017012205be9e545b52def2d18a922146c6cdec6bcbf4007b6ee7bec3341673bbf1d8c06 zb2rhfwrxLo7wm3ZT2ishCdimPKWdRCe3KuxHFQYMR8Kyj96x And get back a bunch of different CIDs in different bases, that's going to be a bit confusing.
I agree with your analysis here. Predictability is key. Personally, I'd go with 1 and add a global flag (possibly with an environment variable as a fallback?).
Damn... We should be converting them back and forth (well, we should be using a binary RPC transport). Oh well. Regardless, we can use an environment variable, we just have to forward it as an argument. We could also start converting them back. That is, just replace every "string that is actually a CID" with a CID (and make sure that go can unmarshal everything correctly). That would actually be better from a backwards compat standpoint. How does that sound? We could also mix in a bit of 3 but I'm wary about that. It's easy for commands that take one argument and spit out one argument but not something we can do reliably for commands that take many arguments. |
Regardless of what we do, I really don't want to carry display information with CIDs themselves. |
We will have to agree to disagree. I don't think it will be confusing at all if we return the same string. I.e.
|
This will be an API change. As |
@kevina If our endgame is to move to base32 as the universal default, users should not need to think about getting "the right" encoding unless they explicitly choose to do so. If we look at this from that perspective, option 1 is not a problem, it is a feature. It may be actually a good UX idea to "normalize" to base32 by default, enforcing its universal use. If user needs to interact or get different encoding then config or CLI That being said, option 2 is also fine (but not ideal, as users may need to always pass |
In that case, I agree. I was thinking about the
Hm. Yeah, I forgot about that (technically the correct solution but a massive breaking change). |
One way to fix this is to introduce a new type, maybe called 'APICid` that has a type: var APICidBase := mbase.Base58BTC // effects the as String method, not JSON output
const APICidJSONBase := mbase.Base32
type APICid stuct {
str string // always in APICidJSONBase
}
func FromCid(*cid Cid) {}
func FromString(str string) {} // may not be necessary
func (c APICid) AsString() {
return WithBase(APICidBase)
}
func (c APICid) WithBase(enc mbase.Encoder) string {
if CidV0 string or already in enc {
return c.str
}
// convert Cid to base enc
}
func (c APICid) AsCid() *cid.Cid {...}
func (c APICid) UnmarshalJSON(b []byte) error {...}
// encodes the cid as a JSON string
func (c APICid) MarshalJSON() ([]byte, error) {
return json.Marshal(c.str)
} Then convert the type of anything that is a Cid String to APICid. We might be able to get away with just calling To change the global base the variable APICidBase can be changed. This will be a client side change so we can use a global variable now. |
Another solution is to use refmt. It has support for explicitly specifying how different types should be encoded to/from JSON. |
Perhaps. I would need to spend some time investigating that option. The idea of a new type is that it works within our existing framework.... |
Fair enough. That's actually a pretty clean way to do it.
Won't that cause problems with CIDv0? |
I am still of the option that will should just allow multibase prefixes on CidV0. @whyrusleeping was against it but I discussion feels unfinished as I do not see how it can cause harm. However, since we made that decision that multibase prefixes on CidV0 are not allowed, CidV0 will simply remain the same while CidV1 will be converted to |
The client side option didn't work out quite as planned so now I am of the opinion that it is better to do it on the server side in order to move things forward. There are now two outstanding p.r. #5777 and #5789. There are mostly identical except one does most of the conversion on the client side and the other the server side. From a comment in #5777:
For completeness there is also a third alternative solution that @magik6k left as a review comment in #5789 with regard to do the conversion within the CoreAPI which I responded to in that p.r. I created two separate p.r. so we can compare the impact of the change and make an informed decision. In both implementations I simplified things as best I could so hopefully one of them can get in A.S.A.P. and before any more command lib changes. Once this is in the next step is to provide (at minimum) a CC: @Stebalien |
So if this is badly time constrained, I'm probably fine with the current server-side approach, it's not super optimal, but gives us something to iterate on. |
@magik6k thanks, that the idea, something as non-invasive as possible so we can clean it up later. |
@Stebalien @alanshaw if it will help move things along may I offer the following compromise:
Right now we don't use true CIDs in many places in the API in Thoughts? |
IMHO it should be supported on all the HTTP endpoints that return CIDs or none. From an outsider's point of view supporting it on some endpoints but not others will seem arbitrary, will be frustrating and might cause some confusion. I didn't realise this would be quite so problematic and I don't wish to use up your time unnecessarily. I feel this has gone on for a bit too long now and I don't want the reason to be because of something I have suggested. I think your time would be better used to actually work on the switch to base32. All I really need is consensus with the go team on whether to support it or not on the HTTP endpoints and I can get ipfs/js-ipfs#1552 merged. I believe we should add support but if go-ipfs isn't able and there's no intention to do so then there's no point in js-ipfs supporting it and I will back out my changes. For the record, there are no bad feelings here, lets just get the decision made and push this forwards 😄 🚢 |
I think the JS and Go implementation are structured differently. The Go implemenation uses a strict client/server model. For
None is probably not a good option because things like paths are best done via the server side, otherwise we will have to reparse the path client side to fix the CID (at least in the I proposed this because I believe @Stebalien wanted to make it possible to switch to a more compact binary API such as using CBOR for the response rather than JSON. If a binary API is used it would make sense that the CIDs are stored as binary strings rather then encoded using a multibase. In this case the CIDs will need to be converted client side.
Your suggestion was only part of the reason I decided to to use a server side implementation and you are not the reason this has gone on for so long. |
Very good point. I agree with all the arguments for doing this server side (paths) and I also agree that we can't really do it for "real" (not just text) CIDs. Given some IPLD object, we don't want to reach in and change the version of it's CIDs (internally). This also gives us a pretty clear boundary: if we have a string CID, we convert on the server; if we have a CID "object", we convert on the client. It's a bit annoying that we don't do everything either on the server or the client but I can't see a way to do that. |
Thank you @kevina. This was a well thought out and clear analysis of the problem. |
Yes, agree. I wasn't actually including endpoints that return paths when I said it should be supported on all the HTTP endpoints that return CIDs or none. Thanks for your input @Stebalien and also @kevina for taking the time to consider this so carefully! The following is a list of endpoints that JS IPFS supports that return a CID (or a CID in a path). I've put ✅ next to the ones that will support
|
Sorry I overlooked those commands. @Stebalien do we want to do these server side?
I skipped |
So, really, I'd like to avoid changing the CID version for the However, having the options available but not applying them may be a bit confusing. I wonder if these options shouldn't, in fact, be global but should be added to every command where they apply. Thoughts?
We just need to consistently draw the line. If we format CIDs in wantlist/stat, we should also do so in the
I'm fine either way but I think this matters more for JavaScript. @alanshaw? |
@Stebalien form an implementation standpoint fixing |
So, I think we can do this by specifying the JSON encoder for the command (server-side) and using something like refmt to encode instead of Example because it's a bit confusing: package main
import (
"fmt"
"github.com/ipfs/go-cid"
"github.com/multiformats/go-multibase"
"github.com/polydawn/refmt/json"
"github.com/polydawn/refmt/obj/atlas"
)
func main() {
at := atlas.MustBuild(
atlas.BuildEntry(cid.Cid{}).Transform().TransformMarshal(atlas.MakeMarshalTransformFunc(func(c cid.Cid) (interface{}, error) {
b, _ := c.StringOfBase(multibase.Base32)
return map[string]string{"/": b}, nil
})).Complete(),
)
c, _ := cid.Parse("zb2rhkm1A8HXsfcGeuvKvAJtcksYSGspSj5udWoCZ6cpZk7Zr")
obj := map[string]interface{}{"foo": c}
marshaled, _ := json.MarshalAtlased(
json.EncodeOptions{},
obj,
at,
)
fmt.Printf("%s", marshaled)
} We can probably use the same logic client-side. However, I'm just throwing this out there so we consider it. I'll leave any decisions up to you and @alanshaw (and @lidel and anyone else needing this from the browser). |
In the JS PR the cid-base option is added per command.
We don't have a filestore API yet in HTTP, core or CLI so whatever gets decided here can be implemented when the time comes. Likewise we don't have a dag HTTP API yet (although we do have core and partial CLI) hence why I didn't list
When you say API, do you mean HTTP, CLI or core or all of them? We're trying to get away from core returning string CIDs (paths are fine) but I'm totally fine with the CLI and HTTP API to "always uses the requested base for CIDs and auto-upgrades CIDs from v0 to v1 when necessary except in the |
I think it is better as a global option because it allows you to blindly use the option in scripts without having to worry if the command supports it. You can also do something like: alias ipfs="ipfs --cid-base=base64url" if for example you want or need all output in a different multibase. I do plan to propose config options for changing the defaults, but sometimes you want to change the default just for that script or local environment. Also the config options as I envision them will have slightly different effects that the command line option. For example |
I think this is something we should consider in the future. What I want to do now is make sure anything we do now allows us to change to something like this in the future, preferable in a way that is invisible to the end-user. |
I would add |
However, if we have this list of excluded commands, we need to be clear when we don't apply these functions. |
Until we switch to something like I agree somewhat that having the server also respect the |
Let's just do the best we can. |
@alanshaw
Let me know if anything is unclear. |
Implements an option for the CLI, HTTP API and core (where appropriate) that will allow the user to pick the multibase encoding for any CIDs that are returned as strings. **NOTE** Using the CID base option in `bitswap`, `dag` and `object` APIs **WILL NOT** auto upgrade your CID to v1 if it is a v0 CID and **WILL NOT** apply the encoding you specified. This is because these APIs return IPLD objects with links and changing the version of the links would result in a different hash for the node if you were to re-add it. Also, the CID you used to retrieve the node wouldn't actually refer to the node you got back any longer. [Read this](ipfs/kubo#5349 (comment)) for further context. This PR adds a `--cid-base` option to the following **CLI commands**: * `jsipfs bitswap stat` * `jsipfs bitswap unwant` * `jsipfs bitswap wantlist` * `jsipfs block put` * `jsipfs block stat` * `jsipfs add` * `jsipfs ls` * `jsipfs object get` * `jsipfs object links` * `jsipfs object new` * `jsipfs object patch add-link` * `jsipfs object patch append-data` * `jsipfs object patch rm-link` * `jsipfs object patch set-data` * `jsipfs object put` * `jsipfs object stat` * `jsipfs pin add` * `jsipfs pin ls` * `jsipfs pin rm` * `jsipfs resolve` Note: these two MFS commands _already_ implement the `--cid-base` option: * `jsipfs files ls` * `jsipfs files stat` It also adds `?cid-base=` query option to the following **HTTP endpoints**: * `/api/v0/bitswap/wantlist` * `/api/v0/bitswap/stat` * `/api/v0/bitswap/unwant` * `/api/v0/block/put` * `/api/v0/block/stat` * `/api/v0/add` * `/api/v0/ls` * `/api/v0/object/new` * `/api/v0/object/get` * `/api/v0/object/put` * `/api/v0/object/stat` * `/api/v0/object/links` * `/api/v0/object/patch/append-data` * `/api/v0/object/patch/set-data` * `/api/v0/object/patch/add-link` * `/api/v0/object/patch/rm-link` * `/api/v0/pin/ls` * `/api/v0/pin/add` * `/api/v0/pin/rm` * `/api/v0/resolve` It adds a `cidBase` option to the following **core functions**: * `resolve` License: MIT Signed-off-by: Alan Shaw <alan.shaw@protocol.ai>
As we switch to using CidV1 base32 encoding the question is how we should handle this displaying of CID in alternative multibases.
This issue is only about the output. For the input will we already handle CIDv1 in any base supported by
go-multibase
.Here are the alternatives, as I see it, and what I estimate it will take to do it.
1. Don't Preserve the multibase, Use Settings or Defaults to determine multibase
Accept CidV1 in any multibase, but ignore the base that is used. Output the CIDs based on either a global
--cid-base
flag or a config value.This means that the string a user passes in may not be the same string that is returned. For example
This also means that #5234 (ipfs resolve should respect CID base) will not be implemented.
The most straightforward way to implement this is replacing every call of
cid.String()
with something likecid.Encode(base)
(see ipfs/go-cid#60) and make sure the base is available at the point when this method needs to be called. TheWithValue
functionally on contexts can be helpful here.As most CID's are converted to strings on the server side, rather than the client, doing a quick hack such as setting a global variable is not going to work.
Another way to implement this, at least when working with the API, is to modify our API to make CID's a special type and then set the multibase as part of the writer that formats the output for this user. This means extra CPU cycles as the CID is converted from a binary represenation to a string twice. If we go this route I recommend the API be hardcoded to use
base32
as is a more efficient encoding than our current default ofbase58btc
.2. Preserve the multibase. Children of DAGs inherit multi-base of parent when known.
When a CID is decoded preserve the multibase used with the CID so when it is converted back to a string the same multibase will be used. In addition associate the CID of any children of the DAG with the same multibase prefix. When a CID does not have a multibase associated use the settings from the command line or config option.
ipfs pin add
will then display the same base. In addition commands likeipfs ls
will use the same base to display directory entries as the hash used for the parent.Implementing this will involve associating a multibase with CID's when it is known (see ipfs/go-cid#66). It will also involve setting the multibase of children CID's in the DAG code. New CID's will also need to get the base set somewhere.
After thinking about this I think this will involve less code change than #1 as the base does not need to be set everywhere, although the changes may not be as straightforward.
I think this is the most user friendly approach.
3. Attempt to Preserve the multibase.
This solution does not store the multi-base with the CID but makes an attempt to output the same multibase on a ad-hoc bases. For the current implementation of
ipfs resolve
it turned out to be easy.ipfs pin add
is more difficult. It was in fact after looking atipfs pin add
that I decided that #2 will be required to do it properly, especially when multiple CIDs are given on the command line.Implementing this involves doing #1 and in addition detecting the multibase of any CIDs used and setting and using the correct base for the output. Things can get tricky when CID's with different mutlibases are used at once. This will involve more code that either other solution.
#5289 started effort towards this solution (see the code for
ipfs resolve
andipfs ls
)I do not like this solution as it will only work some of the time and prefer we go with #1 or #2 for more predictable output.
The text was updated successfully, but these errors were encountered: