Decoding binary QR codes produces incorrect output #55

Closed
matheusmoreira opened this issue Jun 1, 2019 · 5 comments

matheusmoreira commented Jun 1, 2019

Decoding QR codes containing binary data with zbar 0.23 produces incorrect output.

I can reliably recover binary data that was base64-encoded first and then stored in either an alphanumeric or a binary QR code. However, binary data stored directly in a binary QR code isn't decoded correctly by zbar.

Here's a script that reproduces the issue:

dd if=/dev/urandom bs=1024 count=1 > test.bin
base64 > test.b64                  < test.bin

qrencode -8 -o test.bin.8.png < test.bin
qrencode -8 -o test.b64.8.png < test.b64
qrencode -c -o test.b64.c.png < test.b64

zbarimg --raw test.bin.8.png | head -c -1 > test.bin.8.decoded.raw
zbarimg --raw test.b64.8.png | head -c -1 > test.b64.8.decoded.raw
zbarimg --raw test.b64.c.png | head -c -1 > test.b64.c.decoded.raw

sha1sum test.bin test.bin.8.decoded.raw \
        test.b64 test.b64.8.decoded.raw test.b64.c.decoded.raw

# 39378ba14d291710ac6458ced16fb0b29e901b84  test.bin
# 35138ed378bde1975e1908b3467dd8453b333bb4  test.bin.8.decoded.raw
# c4983f3be7f4a2ef3a63c9c9aa162155ffead6db  test.b64
# c4983f3be7f4a2ef3a63c9c9aa162155ffead6db  test.b64.8.decoded.raw
# c4983f3be7f4a2ef3a63c9c9aa162155ffead6db  test.b64.c.decoded.raw

I want to decode binary data that's been QR encoded and printed on paper. I verified that the problem exists when trying to decode scanned images as well as video captured with zbarcam.

matheusmoreira changed the title from "Decoding binary" to "Decoding binary QR codes produces incorrect output" Jun 1, 2019
@martinvonwittich

Yes, I've noticed that too. I believe that zbarimg is trying to convert the binary result into UTF-8, and thereby accidentally mangles it. Note that many suspicious 0xc2 bytes are inserted into the result:

martin@dogmeat ~ % diff -u <(cat test.bin|xxd -c 1 -p) <(cat decoded.bin|xxd -c 1 -p) | head -n 40
--- /proc/self/fd/13    2019-09-12 12:41:45.593341520 +0200
+++ /proc/self/fd/14    2019-09-12 12:41:45.589341484 +0200
@@ -1,182 +1,275 @@
 6d
+c2
 8d
 3a
 14
 73
 79
+c2
 be
-ea
+c3
+aa
+c2
 82
 6e
+c2
 8e
 4c
+c2
 80
-f1
+c3
+b1
 4f
-c4
-f6
+c3
+84
+c3
+b6
 24
-c8
-f0
+c3
+88
+c3
+b0

From https://www.fileformat.info/info/unicode/utf8.htm:

For any character equal to or below 127 (hex 0x7F), the UTF-8 representation is one byte. It is just the lowest 7 bits of the full unicode value. This is also the same as the ASCII value.

For characters equal to or below 2047 (hex 0x07FF), the UTF-8 representation is spread across two bytes. The first byte will have the two high bits set and the third bit clear (i.e. 0xC2 to 0xDF). The second byte will have the top bit set and the second bit clear (i.e. 0x80 to 0xBF).

And all bytes in decoded.bin that are prefixed with 0xc2 or 0xc3 are in fact > 127, while all other bytes are not.

Also, when I open test.bin and decoded.bin in vim and run :set fileencoding?, vim identifies test.bin as latin1 and decoded.bin as utf-8. That cannot be a coincidence.
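
The pattern in the diff can be reproduced without zbar. This is a minimal sketch, assuming the `iconv` and `od` command-line tools are available; `hexbytes` is a hypothetical helper defined here, and the three input bytes are taken from the diff above:

```shell
# Hypothetical helper: print stdin as space-separated hex bytes on one line.
hexbytes() {
    od -An -v -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# 0x6d 0x8d 0xea, written as octal escapes so any POSIX printf accepts them.
original=$(printf '\155\215\352' | hexbytes)

# Treat the same bytes as ISO-8859-1 text and convert them to UTF-8.
converted=$(printf '\155\215\352' | iconv -f ISO-8859-1 -t UTF-8 | hexbytes)

echo "original:  $original"
echo "converted: $converted"
```

The ASCII byte 0x6d passes through unchanged, while 0x8d and 0xea each grow to two bytes, matching the `+c2` and `+c3 aa` lines in the diff.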

I had originally assumed that --raw would instruct zbarimg not to do any mangling, but that's apparently not the case. It only prevents zbarimg from prefixing the decoded result with QR-Code:.

It seems that what we need is a separate option, say --decode-raw, that tells zbarimg not to mangle the decoded data.
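
In the meantime, output that has already been mangled can sometimes be repaired by reversing the conversion. This is only a sketch of a workaround, assuming zbar guessed ISO-8859-1 for the whole payload (if it picked another charset for a given input, this will not recover the data); `unmangle` and `hexbytes` are hypothetical helper names:

```shell
# Hypothetical helper: undo a Latin-1 -> UTF-8 conversion applied to stdin.
unmangle() {
    iconv -f UTF-8 -t ISO-8859-1
}

# Hypothetical helper: print stdin as space-separated hex bytes on one line.
hexbytes() {
    od -An -v -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Simulate the mangling on three sample bytes (0x6d 0x8d 0xea), then undo it.
recovered=$(printf '\155\215\352' | iconv -f ISO-8859-1 -t UTF-8 | unmangle | hexbytes)
echo "recovered: $recovered"
```

Applied to the reproduction script, this would look like `zbarimg --raw test.bin.8.png | head -c -1 | unmangle > test.bin.8.decoded.raw`, followed by the same sha1sum comparison.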

matheusmoreira commented Sep 12, 2019

Thanks for your insight!

The binary data decoding code is in the qr_code_data_parse function:

case QR_MODE_BYTE: {
  unsigned char *buf;
  unsigned       c;
  int            len;

  len = qr_pack_buf_read(&qpb, LEN_BITS[len_bits_idx][2]);
  if (len < 0)
    return -1;

  /*Check to see if there are enough bits left now, so we don't have to
     in the decode loop.*/
  if (qr_pack_buf_avail(&qpb) < len << 3)
    return -1;

  entry->payload.data.buf = buf = (unsigned char *) malloc(len * sizeof(*buf));
  entry->payload.data.len = len;

  while (len-- > 0) {
    c = qr_pack_buf_read(&qpb, 8);
    self_parity ^= c;
    *buf++ = (unsigned char) c;
  }
} break;

It doesn't seem to be mangling the data at this stage. I'll keep looking for the source of the problem.

@matheusmoreira

Tracing the calls outward from the parser, innermost first:

  1. qr_code_data_parse
  2. qr_code_decode
  3. qr_reader_try_configuration
  4. qr_reader_match_centers
    • Recurses for double QR codes
  5. _zbar_qr_decode
    • Calls qr_code_data_list_extract_text after matching and parsing the QR code

The qr_code_data_list_extract_text function starts with these declarations:

iconv_t sjis_cd;
iconv_t utf8_cd;
iconv_t latin1_cd;
iconv_t big5_cd;

/*This is the encoding the standard says is the default.*/
latin1_cd = iconv_open("UTF-8", "ISO8859-1");
/*But this one is often used, as well.*/
sjis_cd = iconv_open("UTF-8", "SJIS");
/*This is a trivial conversion just to check validity without extra code.*/
utf8_cd = iconv_open("UTF-8", "UTF-8");
/* add support for big5 encoding. */
big5_cd = iconv_open("UTF-8", "BIG-5");

So it appears even binary QR codes are always treated as text. There's code to detect character encoding. Later it switches on the QR code mode:

/* DONE: This handles a multi-byte sequence split between
   multiple data blocks. */
case QR_MODE_BYTE:
case QR_MODE_KANJI: {
  // copy byte to bytebuf
  in = (char *) entry->payload.data.buf;
  inleft = entry->payload.data.len;
  memcpy(bytebuf_text + bytebuf_ntext, in, inleft * sizeof(*bytebuf_text));
  bytebuf_ntext += inleft;
} break;
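
The `utf8_cd` converter in the declarations above is opened from UTF-8 to UTF-8 purely as a validity check. The same trick works from the shell. This is a minimal sketch, assuming an `iconv` command that exits non-zero on invalid input (GNU iconv does), with `is_valid_utf8` as a hypothetical helper name; it does not reproduce zbar's detection logic:

```shell
# Hypothetical helper: succeed only if stdin is well-formed UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# 0xc3 0xaa is the UTF-8 encoding of U+00EA, so it passes the check.
if printf '\303\252' | is_valid_utf8; then encoded=valid; else encoded=invalid; fi

# A lone 0xea is a lead byte with no continuation bytes, so it fails.
if printf '\352' | is_valid_utf8; then lone=valid; else lone=invalid; fi

echo "c3 aa: $encoded"
echo "ea:    $lone"
```

Random binary data almost always contains sequences like the second one, so it rarely passes as UTF-8. That is consistent with the Latin-1 conversion seen in the diff earlier in the thread.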

matheusmoreira added a commit to matheusmoreira/zbar that referenced this issue Nov 5, 2019
If a QR code doesn't specify the text encoding of binary data through
Extended Channel Interpretation, ZBar tries to guess the encoding.
This unconditional character set conversion makes it impossible
to recover other types of binary data stored in the QR code.

The QR decoder now supports the ZBAR_CFG_RAW_BINARY config option.
If set, it will output the bytes without converting them to text.
This allows access to the actual bytes encoded in a binary QR code.

Closes: mchehab#55
Thanks: Martin von Wittich <martin.von.wittich@iserv.eu>
Signed-off-by: Matheus Afonso Martins Moreira <matheus.a.m.moreira@gmail.com>
matheusmoreira added a commit to matheusmoreira/zbar that referenced this issue Nov 5, 2019
If a QR code doesn't specify the text encoding of binary data through
Extended Channel Interpretation, ZBar tries to guess the encoding.
This unconditional character set conversion makes it impossible
to recover other types of binary data stored in the QR code.

The QR decoder now supports the ZBAR_CFG_BINARY configuration option.
If set, it will output the bytes without converting them to text.
This allows access to the actual bytes encoded in a binary QR code.

Closes: mchehab#55
Thanks: Martin von Wittich <martin.von.wittich@iserv.eu>
Signed-off-by: Matheus Afonso Martins Moreira <matheus.a.m.moreira@gmail.com>
matheusmoreira added a commit to matheusmoreira/zbar that referenced this issue Dec 11, 2019
matheusmoreira added a commit to matheusmoreira/zbar that referenced this issue Dec 26, 2019

hifi commented Feb 12, 2024

For anyone stumbling across this from a search engine: you need to pass the -Sbinary option to zbarimg to get the fixed behavior, which was released back in 0.23.1.

The man page was never updated to include that option so it's very easy to miss.
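
A minimal round-trip check in the spirit of the original reproduction script. `--raw` and `-Sbinary` come from this thread; `--oneshot` is an assumption about your build, used here so that no trailing newline needs stripping (check `zbarimg --help`). `roundtrip_binary_qr` is a hypothetical helper name, and the sketch only runs if qrencode and zbarimg are installed:

```shell
# Hypothetical helper: encode file $1 as a binary QR code, decode it with
# -Sbinary, and return 0 only if the decoded bytes match the input exactly.
roundtrip_binary_qr() {
    png=$(mktemp) && decoded=$(mktemp) || return 2
    qrencode -8 -o "$png" < "$1" &&
        zbarimg --raw --oneshot -Sbinary "$png" > "$decoded" 2>/dev/null &&
        cmp -s "$1" "$decoded"
    rc=$?
    rm -f "$png" "$decoded"
    return "$rc"
}

# Only attempt the round trip when both tools are available.
if command -v qrencode >/dev/null 2>&1 && command -v zbarimg >/dev/null 2>&1; then
    dd if=/dev/urandom bs=1024 count=1 2>/dev/null > test.bin
    if roundtrip_binary_qr test.bin; then
        echo "decoded bytes are identical"
    else
        echo "decoded bytes differ (is this zbar older than 0.23.1?)"
    fi
fi
```

Comparing with `cmp` rather than by eye avoids being misled by terminal rendering of binary output.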

matheusmoreira commented Feb 12, 2024

@hifi

That's true... I guess I forgot to update the manuals since I use the Arch Wiki so much. Back in the day I edited the paperkey article on the Arch Wiki with instructions on how to use the binary decoding mode, since that was the original use case I wanted this feature to enable. I also described the new options in the --help output and answered a Stack Overflow question about this.

Anyone want to make a pull request?
