For certain Bech32 strings, deleting or inserting a single character produces a string that is still valid #51

jonathanknowles · 2019-05-28T08:49:43Z

Overview

For certain Bech32 strings, deleting or inserting just a single character can produce a string that is still valid when tested against the reference implementations.

Implementations Tested

Cases

There appear to be two main cases:

Insertion: if a valid Bech32 string has the suffix p, inserting a single q character immediately before the p will produce another valid Bech32 string.
Deletion: if a valid Bech32 string has the suffix qp, removing the q character will produce another valid Bech32 string.

These production rules can apparently be applied repeatedly, either by deleting many q characters, or by inserting many q characters, producing a valid Bech32 string after each step.

Examples

The string ii2134hk2xmat79tp is a valid Bech32 string, with:

human-readable part ii2
data part 34hk2xm
checksum at79tp

Insertion

By inserting one or more q characters immediately before the final p character, we can produce more strings that are valid Bech32 strings:

ii2134hk2xmat79tqp
ii2134hk2xmat79tqqp
ii2134hk2xmat79tqqqp
ii2134hk2xmat79tqqqqp

Deletion

Conversely, by deleting one or more q characters located immediately before a final p character, we can go the other way. The following are all valid Bech32 strings:

eyg5bsz1l2mrq5ypl40hqqqp
eyg5bsz1l2mrq5ypl40hqqp
eyg5bsz1l2mrq5ypl40hqp
eyg5bsz1l2mrq5ypl40hp

Outstanding Question

It's currently unknown to us whether this is a property of the specification itself, or merely a feature of the reference implementations that we tested.

Background

We ran a suite of property tests against a modified version of the reference implementation, designed to simulate simple user errors such as omitting a character from, or inserting an extra character into, pseudo-randomly generated Bech32 strings.

The text was updated successfully, but these errors were encountered:

sipa · 2019-05-29T06:06:32Z

Hi,

thanks for analysing this. This is not by design, but it can be explained from the specification. It's also a bit more general than what you explain (basically: take a bech32 string, xor a 1 into the last character, then push or pop as many 'q's as you like, and then xor a 1 into the last character again... should always give you a valid new bech32 string).

We should document this.

jonathanknowles · 2019-05-29T10:19:18Z

thanks for analysing this.

No worries! We discovered this in our testing, and wanted to share the finding.

harding · 2019-11-18T04:04:05Z

I can't fully replicate the claims of the OP when using BIP173 addresses for Bitcoin. Specifically, the "Cases" section seems to imply one or more qs may always be added or removed in the penultimate position of a bech32 string ending in p. However, in my testing (code below), I find that:

A q inserted in the penultimate position of a string ending with p will only result in a failure to catch the insertion with about 50% probability. (Obviously that's still bad.)
A q deleted from the penultimate position of a string ending with p will almost always be caught (in my testing, they were always caught, but I only tested thousands of combinations for this analysis). Of course, if a q is successfully inserted as per above, it can later be removed.

Here's code for checking the frequency of detecting an insertion error that uses the reference python library. It's easy to modify the regex and slice to check for deletion errors.

from hashlib import sha256
import segwit_addr
import re

## Make results completely reproducible by using a shachain initialized
## from a seed.
SEED='foo'

## Print the seed into the log
print('Seed: "{}"'.format(SEED))

## Initialize the program value from our seed
program = sha256(bytes(SEED, 'utf-8')).digest()

typos_missed=0
typos_caught=0
iterations=0
while True:
    ## Derive a new (dummy) witness program
    program = sha256(bytes(program)).digest()

    ## Create a v1 segwit address for the program.  Use v1 because its
    ## only current restrictions are a length between 2 and 40 bytes (inclusive)
    addr = segwit_addr.encode('bc', 1, program)

    ## If the address ends with 'p', attempt inserting a 'q' in the
    ## penultimate position and then see if the address decodes
    if re.search('p$', addr):
        mod_addr = addr[0:-1] + 'qp'
        if segwit_addr.decode('bc', mod_addr) == (None, None):
            typos_caught += 1
        else:
            typos_missed += 1
    else:
        continue

    iterations += 1
    if iterations % 1e4 == 0:
        print("Missed:", typos_missed / iterations)
        print("Caught:", typos_caught / iterations)
        print("p$ addresses checked:", iterations)
        break

I don't think this changes anything significantly for this issue, but I wanted to mention it as I was doing some other analysis on the frequency of uncaught deletion errors and couldn't understand why, in millions of random addresses, I hadn't yet encountered a problem with any addresses mutated from '...qp' to '...p'.

sipa · 2019-11-18T07:19:40Z

@harding Nice find. What you're seeing is the effects of the BIP173 address encoding beyond Bech32 itself.

A segwit address is encoded in Bech32 as one symbol for the witness version, plus an 8bit-to-5bit conversion of the witness program bytes. This conversion has the following properties:

Not every resulting length is legal (e.g. 32 bytes turns into 52 symbols, but 31 bytes turns into 50 symbols; anything with 51 symbols for the witness program is illegal).
Padding bits in the conversion must be 0.

When you're turning a final "qp" into "p" for an otherwise 32-byte witness program, you end up with an invalid length (turning "qqp" into "p" should sometimes work, though).

When you're turning "p" into "qp", one symbol from the checksum now becomes a data symbol. That data symbol becomes the final part of the 8-to-5 converted witness program in BIP173 addresses, so it must be correctly padded. If the original checksum bits on those places were not zero, the result will not be a valid address.

sipa · 2019-12-14T01:59:30Z

For those not following the bitcoin-dev mailinglist, I've posted a more elaborate analysis of this (and other) kinds of errors in https://gist.github.com/sipa/a9845b37c1b298a7301c33a04090b2eb.

I'll work on including the findings into BIP173, and proposing a variant of the algorithm that's resistant to all known classes of errors (with failure probability 1/2^30 or better).

roconnor-blockstream · 2020-07-16T15:38:19Z

I've created a proposal to amend BIP-173 to address this issue, and would appreciate any feedback.

See: sipa/bech32#51

jonathanknowles changed the title ~~For certain Bech32 strings, deleting or inserting just a single character will produce a string that is **still valid**.~~ For certain Bech32 strings, deleting or inserting a single character will produce a string that is still valid May 28, 2019

jonathanknowles changed the title ~~For certain Bech32 strings, deleting or inserting a single character will produce a string that is still valid~~ For certain Bech32 strings, deleting or inserting a single character produces a string that is still valid May 28, 2019

jonathanknowles mentioned this issue May 28, 2019

Bech32 decode function successfully decodes certain classes of corrupted strings. cardano-foundation/cardano-wallet#324

Closed

3 tasks

matheusd mentioned this issue Nov 26, 2019

zpay32: Fix broken last tagged field lightningnetwork/lnd#3767

Merged

ccorcoveanu mentioned this issue Apr 7, 2020

Deleting or inserting characters produce valid strings bitcoinjs/bech32#34

Closed

glozow mentioned this issue Aug 13, 2020

Add discussion question for bech32 mutability chaincodelabs/seminars#24

Closed

kayabaNerve mentioned this issue Dec 12, 2020

Bech32 checksums easily pass for mutated strings. MerosCrypto/Meros#269

Closed

sipa mentioned this issue Feb 3, 2021

Add support for Bech32m #56

Merged

sipa closed this as completed in #56 Feb 3, 2021

SebastienGllmt mentioned this issue Feb 23, 2021

CIP-0014: User-facing Asset Fingerprint cardano-foundation/CIPs#64

Merged

TheWaWaR added a commit to TheWaWaR/ckb-cli that referenced this issue Jun 22, 2021

fix: use bech32m to fix bech32 security bug

a411f2a

See: sipa/bech32#51

TheWaWaR mentioned this issue Jun 22, 2021

fix: use bech32m to fix bech32 security bug nervosnetwork/ckb-cli#404

Closed

CipherWang mentioned this issue Sep 22, 2021

RFC21: propose a new full address format nervosnetwork/rfcs#239

Merged

prettyirrelevant mentioned this issue Aug 19, 2022

add support for taproot btc rotki/rotki#4714

Merged

stl1988 mentioned this issue Dec 20, 2022

Why not bech32m for nip19? nostr-protocol/nips#102

Closed

schildbach mentioned this issue Jan 30, 2023

SegwitAddress: include length check for Taproot witness programs bitcoinj/bitcoinj#2663

Merged

BenWestgate mentioned this issue Jul 25, 2023

Source code for error correction? BlockstreamResearch/codex32#54

Open

bordalix mentioned this issue Nov 14, 2023

Ark encoding ark-network/ark#6

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

For certain Bech32 strings, deleting or inserting a single character produces a string that is still valid #51

For certain Bech32 strings, deleting or inserting a single character produces a string that is still valid #51

jonathanknowles commented May 28, 2019 •

edited

Loading

sipa commented May 29, 2019

jonathanknowles commented May 29, 2019 •

edited

Loading

harding commented Nov 18, 2019

sipa commented Nov 18, 2019

sipa commented Dec 14, 2019 •

edited

Loading

roconnor-blockstream commented Jul 16, 2020

For certain Bech32 strings, deleting or inserting a single character produces a string that is still valid #51

For certain Bech32 strings, deleting or inserting a single character produces a string that is still valid #51

Comments

jonathanknowles commented May 28, 2019 • edited Loading

Overview

Implementations Tested

Cases

Examples

Insertion

Deletion

Outstanding Question

Background

sipa commented May 29, 2019

jonathanknowles commented May 29, 2019 • edited Loading

harding commented Nov 18, 2019

sipa commented Nov 18, 2019

sipa commented Dec 14, 2019 • edited Loading

roconnor-blockstream commented Jul 16, 2020

jonathanknowles commented May 28, 2019 •

edited

Loading

jonathanknowles commented May 29, 2019 •

edited

Loading

sipa commented Dec 14, 2019 •

edited

Loading