Consider encoding: Safebase #51

Open
DonaldTsang opened this issue Mar 14, 2019 · 12 comments

Comments

@DonaldTsang

DonaldTsang commented Mar 14, 2019

https://github.com/kstenerud/safe-encoding has safe16, safe32, safe64, safe80, and safe85, designed to be safe for HTML/XML, JSON, URLs, and POSIX file names.

@DonaldTsang
Author

https://github.com/tbaumgard/hybrid64 also has a better version of "zbase64"

@DonaldTsang
Author

@Stebalien @lidel So, what alternate encodings can we bring to the table to make multibase more diverse?

@DonaldTsang
Author

I have also kept a list of other bases in kstenerud/safe-encoding#3, kstenerud/safe-encoding#5, and kstenerud/safe-encoding#6.

@Stebalien
Member

We have a very limited namespace here (there are only so many prefix characters we can use) so making multibase "more diverse" isn't really something I'd like to do (we already have several useless encodings I'd like to drop).

However, I do like those encodings. Are there any projects currently using them?

@DonaldTsang
Author

DonaldTsang commented May 14, 2019

@Stebalien Sadly, no projects using those encodings have been reported so far.

But for diversity's sake, is it possible to "hack" the namespaces by using two-byte blocks? I would hope that such safe encodings could gradually phase out RFC-like encodings for some use cases.

@Stebalien
Member

> But for diversity's sake, is it possible to "hack" the namespaces by using two-byte blocks?

We've thought about this a bit. We could, e.g., allocate the `s` prefix for "safe" encodings. I'd be fine with this from a spec standpoint, but implementing it will be a bit of work.
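To make the `s`-prefix idea concrete, here is a minimal sketch of how a decoder might dispatch on a two-character prefix: an ordinary one-character multibase code selects an encoding directly, while `s` means "read one more character to pick a safe encoding". The one-character codes below are a small subset of the real multibase table; the `s` sub-codes (`s6` for safe64, etc.) and the `parsePrefix` function are hypothetical and not allocated anywhere.

```javascript
// A small subset of the real one-character multibase codes.
const SINGLE = {
  f: "base16",
  b: "base32",
  z: "base58btc",
  m: "base64",
  u: "base64url",
};

// HYPOTHETICAL second-level table reached through the `s` prefix.
// These sub-codes are illustrative only.
const SAFE = {
  "1": "safe16",
  "3": "safe32",
  "6": "safe64",
  "8": "safe80",
  "5": "safe85",
};

const has = (table, key) => Object.prototype.hasOwnProperty.call(table, key);

// Split a multibase string into its encoding name and its payload.
function parsePrefix(text) {
  if (text.length === 0) throw new Error("empty multibase string");
  const code = text[0];
  if (code === "s") {
    // Two-character prefix: `s` followed by a sub-code.
    const sub = text[1];
    if (sub === undefined || !has(SAFE, sub)) {
      throw new Error(`unknown safe sub-code: ${sub}`);
    }
    return { encoding: SAFE[sub], payload: text.slice(2) };
  }
  if (!has(SINGLE, code)) throw new Error(`unknown multibase code: ${code}`);
  return { encoding: SINGLE[code], payload: text.slice(1) };
}
```

For example, `parsePrefix("s6abc")` yields `{ encoding: "safe64", payload: "abc" }`, while `parsePrefix("bafy")` falls through to the one-character table. Spending a single prefix character buys a whole second-level namespace; the cost is that every implementation needs the extra lookup.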

Regardless, I'm not sure how to break the adoption/implementation chicken-and-egg problem. For us, the order-preserving property alone isn't sufficient motivation to implement this across the board and switch to it, unless there's some kind of critical mass.
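"Order-preserving" here means that sorting the encoded strings gives the same order as sorting the underlying bytes. RFC 4648 base64 lacks this property because its alphabet (`A-Z`, `a-z`, `0-9`, `+`, `/`) is not in ASCII order. The sketch below packs bits identically for two alphabets and compares the results. The sorted 64-character alphabet is an illustrative assumption and is not claimed to be the exact safe64 alphabet.

```javascript
// RFC 4648 base64 alphabet (not in ASCII order).
const RFC_BASE64 =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
// Illustrative alphabet sorted by ASCII code (an assumption, not the safe64 spec).
const SORTED_64 =
  "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz";

// Encode bytes 6 bits at a time, most significant bits first, without padding.
function encode64(bytes, alphabet) {
  let out = "";
  let buffer = 0;
  let bits = 0;
  for (const byte of bytes) {
    buffer = (buffer << 8) | byte;
    bits += 8;
    while (bits >= 6) {
      bits -= 6;
      out += alphabet[(buffer >> bits) & 63];
    }
    buffer &= (1 << bits) - 1; // keep only the bits not yet emitted
  }
  if (bits > 0) {
    // Left-align the leftover bits in one final character.
    out += alphabet[(buffer << (6 - bits)) & 63];
  }
  return out;
}

// Lexicographic comparison of two byte arrays: -1, 0 or 1.
function compareBytes(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return Math.sign(a.length - b.length);
}

// Plain string comparison (code-unit order, i.e. ASCII order here).
function compareStrings(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}
```

With the single bytes `0x68` and `0xd0`, the bytes compare as `0x68 < 0xd0`. RFC base64 renders them as `"aA"` and `"0A"`, which sort the other way round, because `'0'` precedes `'a'` in ASCII. The sorted alphabet renders them as `"P-"` and `"o-"`, which keeps the original order. This sketch only checks inputs of equal length.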

@DonaldTsang
Author

@Stebalien The other option would be to create a human-oriented translation layer between the safebase hash encodings used externally and the actual RFC-encoded hashes used internally.

@Stebalien
Member

Not sure I follow.

@DonaldTsang
Author

@Stebalien External I/O and human input would use safebase. A translation layer between that input and multibase itself would then convert it to zbase32 and RFC-compliant base64, so that users never have to handle zbase32, base64, or the other encodings within multibase directly.

@DonaldTsang
Author

@lidel Sorry, this may sound weird, but seeing this through is a Christmas wish of mine.

@lidel
Member

lidel commented Jan 7, 2020

Internally, in IPFS/libp2p/IPLD, we use unwrapped, binary multihash, so base does not matter outside userland. Creating a translation layer for text representations sounds like a reinvention of CIDv1, which already allows a single multihash to be represented in multiple bases when in text form.
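To illustrate that the base only matters in text form, the same binary value can be rendered with different multibase prefixes and decoded back to identical bytes. This minimal Node.js sketch handles only `f` (base16) and `u` (base64url, unpadded), using `Buffer`'s built-in `hex` and `base64url` codecs (Node 15.7 or later). The sample bytes are arbitrary rather than a real multihash, and `toMultibase`/`fromMultibase` are hypothetical helpers, not a CID library API.

```javascript
// Multibase codes handled by this sketch, mapped to Node's Buffer codecs.
const CODECS = { f: "hex", u: "base64url" };

// Render bytes as a multibase string with the given one-character code.
function toMultibase(code, bytes) {
  const codec = CODECS[code];
  if (!codec) throw new Error(`unsupported multibase code: ${code}`);
  return code + Buffer.from(bytes).toString(codec);
}

// Parse a multibase string back into bytes, dispatching on its first character.
function fromMultibase(text) {
  const codec = CODECS[text[0]];
  if (!codec) throw new Error(`unsupported multibase code: ${text[0]}`);
  return Buffer.from(text.slice(1), codec);
}
```

For the bytes `de ad be ef`, `toMultibase("f", ...)` gives `"fdeadbeef"` and `toMultibase("u", ...)` gives `"u3q2-7w"`; both decode to the same four bytes. Switching bases changes the text, not the value.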

Looking at it from a pragmatic perspective, adding safebase support to CID means adding it not only to the list in this repo, but also to every library implementing it, such as go-cid, js-cid, etc. This is a significant effort, and given that safebases are not used in the real world, it won't happen until there are libraries, a community need, or a desired characteristic that is hard to ignore.

For now, in contexts where safehash provides real value, one can use it "in userland": extract the binary multihash from the CID (`binary_multihash = new CID('QmHash').multihash`) and do `safehash(binary_multihash)`.
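A userland approach like this also covers the translation-layer idea: the binary multihash stays internal, and conversion happens only where humans read or type hashes. The sketch below substitutes Crockford's base32 alphabet for a safe encoding; it is not claimed to be safe32. The bit packing (5 bits, most significant first, no padding) is a choice made for this sketch. `toExternal` and `fromExternal` are hypothetical stand-ins for `safehash`. Obtaining the bytes (e.g. from `new CID('QmHash').multihash`) is omitted to keep the example dependency-free.

```javascript
// Crockford's base32 alphabet, used here as a stand-in for a "safe" encoding.
const ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Hypothetical boundary function: binary multihash -> human-facing text.
function toExternal(bytes) {
  let out = "";
  let buffer = 0;
  let bits = 0;
  for (const byte of bytes) {
    buffer = (buffer << 8) | byte;
    bits += 8;
    while (bits >= 5) {
      bits -= 5;
      out += ALPHABET[(buffer >> bits) & 31];
    }
    buffer &= (1 << bits) - 1; // drop bits that were already emitted
  }
  if (bits > 0) out += ALPHABET[(buffer << (5 - bits)) & 31];
  return out;
}

// Hypothetical boundary function: human-typed text -> binary multihash.
// Accepts lowercase, ignores hyphens, and maps I/L to 1 and O to 0.
function fromExternal(text) {
  const cleaned = text
    .toUpperCase()
    .replace(/-/g, "")
    .replace(/[IL]/g, "1")
    .replace(/O/g, "0");
  const bytes = [];
  let buffer = 0;
  let bits = 0;
  for (const ch of cleaned) {
    const value = ALPHABET.indexOf(ch);
    if (value < 0) throw new Error(`invalid character: ${ch}`);
    buffer = (buffer << 5) | value;
    bits += 5;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((buffer >> bits) & 0xff);
      buffer &= (1 << bits) - 1;
    }
  }
  // Any leftover bits (< 8) are the padding added by toExternal.
  return Uint8Array.from(bytes);
}
```

For the bytes `de ad be ef`, `toExternal` returns `"VTPVXVR"`, and `fromExternal("vtpv-xvr")` recovers the original bytes. Internal code never sees the text form.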

@mikeal

mikeal commented Jan 7, 2020

> Internally, in IPFS/libp2p/IPLD, we use unwrapped, binary multihash, so base does not matter outside userland.

Mostly, yes. dag-json uses base-encoded CIDs internally in its format, but it uses a consistent base encoding and should probably be updated to not support alternative base encodings, in order to maintain consistency.

4 participants