Skip to content

Latest commit

 

History

History
66 lines (43 loc) · 3.54 KB

bcr-2020-004-bc32.md

File metadata and controls

66 lines (43 loc) · 3.54 KB

The BC32 Data Encoding Format

BCR-2020-004

© 2020 Blockchain Commons

Authors: Wolf McNally, Christopher Allen
Date: May 5, 2020


Introduction

The [Bech32] checksummed base32 format was introduced to transport segregated witness signatures for Bitcoin. It has several desirable qualities:

  • The payload and checksum data use a limited character set compatible with both URI syntax and QR code alphanumeric encoding mode.
  • The character set uses the letters A-Z in a case-modal fashion (i.e., one can use all upper case or all lower case but the algorithm rejects mixed-case.)
  • The character set is chosen to minimize ambiguity, and the ordering is chosen to minimize the number of pairs of similar characters.
  • Built-in error detection and optional error location for payloads <80 characters.
  • Because the character set used are all "word characters", most text editors will select the entire sequence of characters when it is double-clicked.

However, it also has a major drawback when considered for general data encoding:

  • The characters allowed for the human readable part ("HRP") conflict with both URI syntax and QR code alphanumeric encoding mode.
  • The HRP is not useful to identify a wide variety of general structured data types.

Proposal

This document proposes a new, more general data encoding format BC32 that is based on Bech32 but makes the following modifications:

  • The HRP and numeral 1 divider are no longer included.
  • TODO: Checksum parameters are defined to handle error correction and identification for larger payloads.

Relative Efficiency

The following table compares the relative efficiency of various binary-to-text encoding methods. base32 encoding (and BC32 excluding the checksum) is 12.5% less efficient than base64, but also 12.5% more efficient than hexadecimal. The benefit gained by using only 32 characters is seamless compatibility with both QR code alphanumeric encoding and URI syntax.

Format Number of characters Number of bits per character Efficiency compared to raw binary
raw binary 256 8.0 100%
base64 64 6.0 75%
base45 (QR Code alphanumeric) 45 5.49185 68.65%
base32, bech32, BC32 (not including checksum) 32 5.0 62.5%
hexadecimal 16 4.0 50%

✳️ Note: Although it would appear that base32 is always less efficient than base64, this is not the case when encoding a payload for transport in QR codes. Because BC32 uses a character set optimized for the more efficient QR code alphanumeric mode, a payload of 1000 random bytes results in a QR code 13% less dense when the payload is encoded with BC32, compared to the same payload encoded as Base64.

Implementations

Current implementations:

Test Vectors

Input BC32 Encoded
"Hello world" (UTF-8) == 48656c6c6f20776f726c64 fpjkcmr0ypmk7unvvsh4ra4j
d934063e82001eec0585ee41ab5d8e4b703a4be1f73aec21e143912c56 my6qv05zqq0wcpv9aeq6khvwfdcr5jlp7uawcg0pgwgjc4shjm6xu

References

Unused References