Self-identifying base encodings
Multibase is a protocol for distinguishing base encodings and other simple string encodings, and for ensuring full compatibility with program interfaces. It answers the question:
Given data d encoded into string s, how can I tell what base d is encoded with?
Base encodings exist because transports have restrictions, use special in-band sequences, or must be human-friendly. When a system chooses a base, it is not always clear which one to use, as there are many tradeoffs in the decision. Multibase is here to save programs and programmers from worrying about which encoding is best. It solves the biggest problem: a program can use multibase to take input or produce output in whichever base is desired. The important part is that the value is self-describing, letting other programs elsewhere know what encoding it is using.
Table of Contents
- Multibase By Example
The format is:

<varint-base-encoding-code><base-encoded-data>

Where <varint-base-encoding-code> is used according to the multibase table. Note that varints (codes above 127) are not yet supported, but planned.
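A minimal Python sketch of the format, assuming (as the note above says) that every code in the current table is below 128, so the varint prefix is exactly one byte, which for the character codes is one ASCII character. The helper names add_prefix and split_prefix are hypothetical and not part of any multibase library.

```python
from __future__ import annotations

# Sketch of the multibase layout: <varint-base-encoding-code><base-encoded-data>.
# Assumption: all codes are below 128, so the varint prefix is a single byte,
# i.e. a single character for the character codes in the table.


def add_prefix(code: str, encoded: str) -> str:
    """Prepend a single-character multibase code to already-encoded data."""
    if len(code) != 1 or ord(code) > 127:
        raise ValueError("multi-byte (varint) codes are not handled in this sketch")
    return code + encoded


def split_prefix(value: str) -> tuple[str, str]:
    """Split a multibase string into (code, base-encoded data)."""
    if not value:
        raise ValueError("empty multibase string")
    return value[0], value[1:]


prefixed = add_prefix("F", "4D756C7469")
print(prefixed)                # F4D756C7469
print(split_prefix(prefixed))  # ('F', '4D756C7469')
```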
Multibase Table v1.0.0-RC (semver)
The current multibase table is here:
| encoding     | codes | description                                              |
|--------------|-------|----------------------------------------------------------|
| identity     | 0x00  | 8-bit binary (encoder and decoder keep data unmodified)  |
| base1        | 1     | unary tends to be 11111                                  |
| base2        | 0     | binary has 1 and 0                                       |
| base8        | 7     | highest char in octal                                    |
| base10       | 9     | highest char in decimal                                  |
| base16       | F, f  | highest char in hex                                      |
| base32       | B, b  | rfc4648 - no padding - highest letter                    |
| base32pad    | C, c  | rfc4648 - with padding                                   |
| base32hex    | V, v  | rfc4648 - no padding - highest char                      |
| base32hexpad | T, t  | rfc4648 - with padding                                   |
| base32z      | h     | z-base-32 - used by Tahoe-LAFS - highest letter          |
| base58flickr | Z     | highest char                                             |
| base58btc    | z     | highest char                                             |
| base64       | m     | rfc4648 - no padding                                     |
| base64pad    | M     | rfc4648 - with padding - MIME encoding                   |
| base64url    | u     | rfc4648 - no padding                                     |
| base64urlpad | U     | rfc4648 - with padding                                   |
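Because every code in the table is a single character (or the single byte 0x00), a decoder only has to look at the first character to know how to interpret the rest. Below, the table is copied into a Python dictionary; MULTIBASE_CODES and encoding_of are hypothetical names for this sketch, and the identity code is represented as the character "\x00".

```python
# Hypothetical in-memory copy of the v1.0.0-RC table: code -> encoding name.
MULTIBASE_CODES = {
    "\x00": "identity",
    "1": "base1",
    "0": "base2",
    "7": "base8",
    "9": "base10",
    "F": "base16", "f": "base16",
    "B": "base32", "b": "base32",
    "C": "base32pad", "c": "base32pad",
    "V": "base32hex", "v": "base32hex",
    "T": "base32hexpad", "t": "base32hexpad",
    "h": "base32z",
    "Z": "base58flickr",
    "z": "base58btc",
    "m": "base64",
    "M": "base64pad",
    "u": "base64url",
    "U": "base64urlpad",
}


def encoding_of(value: str) -> str:
    """Return the encoding named by the first character of a multibase string."""
    if not value:
        raise ValueError("empty multibase string")
    try:
        return MULTIBASE_CODES[value[0]]
    except KeyError:
        raise ValueError(f"unknown multibase code {value[0]!r}") from None


print(encoding_of("zYAjKoNbau5KiqmHPmSxYCvn66dA1vLmwbt"))    # base58btc
print(encoding_of("MTXVsdGliYXNlIGlzIGF3ZXNvbWUhIFxvLw=="))  # base64pad
```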
These encodings are being considered:
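| encoding   | code | description |
|------------|------|-------------|
| base128    |      |             |
| base-emoji | 😎   | base emoji  |
| base65536  | ᔰ    | base65536   |
| utf8       |      |             |
| utf16      |      |             |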
Multibase By Example
Consider the following encodings of the same binary string:
4D756C74696261736520697320617765736F6D6521205C6F2F # base16 (hex)
JV2WY5DJMJQXGZJANFZSAYLXMVZW63LFEEQFY3ZP # base32
YAjKoNbau5KiqmHPmSxYCvn66dA1vLmwbt # base58
TXVsdGliYXNlIGlzIGF3ZXNvbWUhIFxvLw== # base64
And consider the same encodings with their multibase prefix:
F4D756C74696261736520697320617765736F6D6521205C6F2F # base16 F
BJV2WY5DJMJQXGZJANFZSAYLXMVZW63LFEEQFY3ZP # base32 B
zYAjKoNbau5KiqmHPmSxYCvn66dA1vLmwbt # base58 z
MTXVsdGliYXNlIGlzIGF3ZXNvbWUhIFxvLw== # base64 M
The base prefixes used are:
F, B, z, M.
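The prefixed strings in this example can be reproduced with Python's standard base64 module for base16, base32 and base64; the input bytes are the hex string decoded, which reads "Multibase is awesome! \o/". The standard library has no base58, so b58btc_encode below is a small hand-written sketch of the usual Bitcoin-alphabet conversion (the name is hypothetical, not a library API).

```python
import base64

# Bitcoin base58 alphabet, as used by base58btc (no 0, O, I or l).
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58btc_encode(data: bytes) -> str:
    """Treat data as a big-endian integer and write it in base 58.

    Each leading zero byte becomes a leading '1' (the zero digit).
    """
    n = int.from_bytes(data, "big")
    digits = []
    while n > 0:
        n, rem = divmod(n, 58)
        digits.append(B58_ALPHABET[rem])
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(digits))


data = b"Multibase is awesome! \\o/"
prefixed = {
    "base16": "F" + base64.b16encode(data).decode("ascii"),
    # The B code is unpadded base32, so strip any '=' the stdlib adds.
    "base32": "B" + base64.b32encode(data).decode("ascii").rstrip("="),
    "base58btc": "z" + b58btc_encode(data),
    "base64pad": "M" + base64.b64encode(data).decode("ascii"),
}
for name, value in prefixed.items():
    print(f"{value}  # {name}")
```

Only the one-character prefix differs from the unprefixed encodings; the encoded bodies are identical.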
Is this a real problem?
Yes. If I give you "1214314321432165", is that decimal? Or hex? Or something else?
Why the strange selection of codes / characters?
The code values are selected so that each one is included in the alphabet of the base it represents. For example, F is the base code for base16 (hex) because F is in hex's 16-character alphabet. Note that the alphabets here are ASCII or UTF-8 compliant. We have not found a case needing something else.
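This property can be checked mechanically. The sketch below writes out a few of the alphabets by hand (the uppercase form for case-insensitive bases, and the Bitcoin alphabet for base58btc) and confirms that each code belongs to its own alphabet. ALPHABETS and CODES are illustrative names, and only a subset of the table is covered.

```python
import string

# Hand-written alphabets for a subset of the table (uppercase where case-insensitive).
ALPHABETS = {
    "base2": "01",
    "base8": string.octdigits,
    "base10": string.digits,
    "base16": "0123456789ABCDEF",
    "base32": string.ascii_uppercase + "234567",           # RFC 4648
    "base32hex": string.digits + "ABCDEFGHIJKLMNOPQRSTUV",  # RFC 4648 extended hex
    "base58btc": "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz",
    "base64": string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/",
}

# The code each of those encodings uses, taken from the multibase table.
CODES = {
    "base2": "0", "base8": "7", "base10": "9", "base16": "F",
    "base32": "B", "base32hex": "V", "base58btc": "z", "base64": "m",
}

for name, code in CODES.items():
    print(f"{name:10} code {code!r} in alphabet: {code in ALPHABETS[name]}")
```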
Why varints?
So that there is no limit on the number of encodings. Implementation note: you do not need to implement varints until the standard multibase table has more than 127 encodings.
What kind of varints?
A most-significant-bit (MSB) unsigned varint, as defined by multiformats/unsigned-varint.
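A minimal sketch of an MSB unsigned varint: the value is split into 7-bit groups, least-significant group first, and the high bit of every byte except the last is set to signal that more bytes follow. The function names are hypothetical, and the sketch does not enforce the extra rules of the multiformats/unsigned-varint spec (such as its maximum length). It also shows why no varint handling is needed yet: any code below 128 encodes to one byte equal to the code itself.

```python
from __future__ import annotations


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an MSB unsigned varint."""
    if n < 0:
        raise ValueError("unsigned varints cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = n & 0x7F          # low 7 bits of what is left
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow: set the MSB
        else:
            out.append(byte)         # last byte: MSB clear
            return bytes(out)


def decode_uvarint(buf: bytes) -> tuple[int, int]:
    """Decode a varint from the start of buf; return (value, bytes consumed)."""
    value = 0
    for i, byte in enumerate(buf):
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, i + 1
    raise ValueError("truncated varint")


print(encode_uvarint(ord("z")))  # b'z'  (codes below 128 stay one byte)
print(encode_uvarint(300))       # b'\xac\x02'
```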
Don't we have to agree on a table of base encodings?
Yes, but we already have to agree on base encodings, so this is not hard. The table even leaves some room for custom encodings.
Warning: multibase changes the first byte(s) of the value depending on the encoding, so do not expect a multibase string to be identical to the plain encoding. Remove the multibase prefix before decoding or otherwise using the value.
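A minimal Python sketch of stripping the prefix before decoding, covering only the base16, base32 and base64 rows of the table with the standard base64 module; multibase_decode, DECODERS and _pad are hypothetical names. The unpadded variants (B, b, m, u) have their '=' padding restored first, because Python's base32 and base64 decoders expect it.

```python
import base64


def _pad(s: str, block: int) -> str:
    """Re-add the '=' padding that the unpadded RFC 4648 variants omit."""
    return s + "=" * (-len(s) % block)


# Decoders for a subset of the table; any other code is rejected below.
DECODERS = {
    "F": lambda s: base64.b16decode(s),
    "f": lambda s: base64.b16decode(s, casefold=True),
    "B": lambda s: base64.b32decode(_pad(s, 8)),
    "b": lambda s: base64.b32decode(_pad(s, 8), casefold=True),
    "m": lambda s: base64.b64decode(_pad(s, 4)),
    "M": lambda s: base64.b64decode(s),
    "u": lambda s: base64.urlsafe_b64decode(_pad(s, 4)),
    "U": lambda s: base64.urlsafe_b64decode(s),
}


def multibase_decode(value: str) -> bytes:
    """Strip the one-character prefix, then decode the rest with the matching base."""
    if not value:
        raise ValueError("empty multibase string")
    code, payload = value[0], value[1:]
    if code not in DECODERS:
        raise ValueError(f"unsupported multibase code {code!r}")
    return DECODERS[code](payload)


print(multibase_decode("MTXVsdGliYXNlIGlzIGF3ZXNvbWUhIFxvLw=="))  # b'Multibase is awesome! \\o/'
```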
Contributions welcome. Please check out the issues.
Check out our contributing document for more information on how we work, and about contributing in general. Please be aware that all interactions related to multiformats are subject to the IPFS Code of Conduct.
Small note: If editing the README, please conform to the standard-readme specification.