Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For certain Bech32 strings, deleting or inserting a single character produces a string that is still valid #51

Open
jonathanknowles opened this issue May 28, 2019 · 4 comments

Comments

@jonathanknowles
Copy link

@jonathanknowles jonathanknowles commented May 28, 2019

Overview

For certain Bech32 strings, deleting or inserting just a single character can produce a string that is still valid when tested against the reference implementations.

Implementations Tested

Cases

There appear to be two main cases:

  1. Insertion: if a valid Bech32 string has the suffix p, inserting a single q character immediately before the p will produce another valid Bech32 string.

  2. Deletion: if a valid Bech32 string has the suffix qp, removing the q character will produce another valid Bech32 string.

These production rules can apparently be applied repeatedly, either by deleting many q characters, or by inserting many q characters, producing a valid Bech32 string after each step.

Examples

The string ii2134hk2xmat79tp is a valid Bech32 string, with:

  • human-readable part ii2
  • data part 34hk2xm
  • checksum at79tp

Insertion

By inserting one or more q characters immediately before the final p character, we can produce more strings that are valid Bech32 strings:

  • ii2134hk2xmat79tqp
  • ii2134hk2xmat79tqqp
  • ii2134hk2xmat79tqqqp
  • ii2134hk2xmat79tqqqqp

Deletion

Conversely, by deleting one or more q characters located immediately before a final p character, we can go the other way. The following are all valid Bech32 strings:

  • eyg5bsz1l2mrq5ypl40hqqqp
  • eyg5bsz1l2mrq5ypl40hqqp
  • eyg5bsz1l2mrq5ypl40hqp
  • eyg5bsz1l2mrq5ypl40hp

Outstanding Question

It's currently unknown to us whether this is a property of the specification itself, or merely a feature of the reference implementations that we tested.

Background

We ran a suite of property tests against a modified version of the reference implementation, designed to simulate simple user errors such as omitting a character from, or inserting an extra character into, pseudo-randomly generated Bech32 strings.

@jonathanknowles jonathanknowles changed the title For certain Bech32 strings, deleting or inserting just a single character will produce a string that is **still valid**. For certain Bech32 strings, deleting or inserting a single character will produce a string that is still valid May 28, 2019
@jonathanknowles jonathanknowles changed the title For certain Bech32 strings, deleting or inserting a single character will produce a string that is still valid For certain Bech32 strings, deleting or inserting a single character produces a string that is still valid May 28, 2019
@sipa

This comment has been minimized.

Copy link
Owner

@sipa sipa commented May 29, 2019

Hi,

thanks for analysing this. This is not by design, but it can be explained from the specification. It's also a bit more general than what you explain (basically: take a bech32 string, xor a 1 into the last character, then push or pop as many 'q's as you like, and then xor a 1 into the last character again... should always give you a valid new bech32 string).

We should document this.

@jonathanknowles

This comment has been minimized.

Copy link
Author

@jonathanknowles jonathanknowles commented May 29, 2019

thanks for analysing this.

No worries! We discovered this in our testing, and wanted to share the finding.

@harding

This comment has been minimized.

Copy link

@harding harding commented Nov 18, 2019

I can't fully replicate the claims of the OP when using BIP173 addresses for Bitcoin. Specifically, the "Cases" section seems to imply one or more qs may always be added or removed in the penultimate position of a bech32 string ending in p. However, in my testing (code below), I find that:

  1. A q inserted in the penultimate position of a string ending with p will only result in a failure to catch the insertion with about 50% probability. (Obviously that's still bad.)

  2. A q deleted from the penultimate position of a string ending with p will almost always be caught (in my testing, they were always caught, but I only tested thousands of combinations for this analysis). Of course, if a q is successfully inserted as per above, it can later be removed.

Here's code for checking the frequency of detecting an insertion error that uses the reference python library. It's easy to modify the regex and slice to check for deletion errors.

from hashlib import sha256
import segwit_addr
import re

## Make results completely reproducible by using a shachain initialized
## from a seed.
SEED='foo'

## Print the seed into the log
print('Seed: "{}"'.format(SEED))

## Initialize the program value from our seed
program = sha256(bytes(SEED, 'utf-8')).digest()

typos_missed=0
typos_caught=0
iterations=0
while True:
    ## Derive a new (dummy) witness program
    program = sha256(bytes(program)).digest()

    ## Create a v1 segwit address for the program.  Use v1 because its
    ## only current restrictions are a length between 2 and 40 bytes (inclusive)
    addr = segwit_addr.encode('bc', 1, program)

    ## If the address ends with 'p', attempt inserting a 'q' in the
    ## penultimate position and then see if the address decodes
    if re.search('p$', addr):
        mod_addr = addr[0:-1] + 'qp'
        if segwit_addr.decode('bc', mod_addr) == (None, None):
            typos_caught += 1
        else:
            typos_missed += 1
    else:
        continue

    iterations += 1
    if iterations % 1e4 == 0:
        print("Missed:", typos_missed / iterations)
        print("Caught:", typos_caught / iterations)
        print("p$ addresses checked:", iterations)
        break

I don't think this changes anything significantly for this issue, but I wanted to mention it as I was doing some other analysis on the frequency of uncaught deletion errors and couldn't understand why, in millions of random addresses, I hadn't yet encountered a problem with any addresses mutated from '...qp' to '...p'.

@sipa

This comment has been minimized.

Copy link
Owner

@sipa sipa commented Nov 18, 2019

@harding Nice find. What you're seeing is the effects of the BIP173 address encoding beyond Bech32 itself.

A segwit address is encoded in Bech32 as one symbol for the witness version, plus an 8bit-to-5bit conversion of the witness program bytes. This conversion has the following properties:

  • Not every resulting length is legal (e.g. 32 bytes turns into 52 symbols, but 31 bytes turns into 50 symbols; anything with 51 symbols for the witness program is illegal).
  • Padding bits in the conversion must be 0.

When you're turning a final "qp" into "p" for an otherwise 32-byte witness program, you end up with an invalid length (turning "qqp" into "p" should sometimes work, though).

When you're turning "p" into "qp", one symbol from the checksum now becomes a data symbol. That data symbol becomes the final part of the 8-to-5 converted witness program in BIP173 addresses, so it must be correctly padded. If the original checksum bits on those places were not zero, the result will not be a valid address.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.