UnicodeDecodeError #154

RWAP · 2016-08-17T14:14:07Z

I have connected a USB receipt printer to a BananaPi and am using Python to read the printer's descriptors.

The code contains the following lines:

callString = callString + 'idVendor=' + padded_hex(int(cfg.idVendor),4) + ' ' + \
    'idProduct=' + padded_hex(int(cfg.idProduct),4) + ' ' + \
    'bcdDevice=' + str(cfg.bcdDevice) + ' ' + \
    'iManufacturer="' + str(usb.util.get_string(cfg,cfg.iManufacturer) ) + '" ' + \
    'iProduct="' + str(usb.util.get_string(cfg,cfg.iProduct) ) + '" ' + \
    'iSerialNum="' + str(usb.util.get_string(cfg,cfg.iSerialNumber) ) + '" ' +\
    'IPNPstring="' + IPNPstring + '"'

Whilst my code works fine with an HP printer, with the USB receipt printer I get the error:

File "/g_printer_define.py", line 90, in <module>
'IPNPstring="' + IPNPstring + '"'
File "/usr/local/lib/python2.7/dist-packages/usb/util.py", line 330, in get_string
return buf[2:buf[0]].tostring().decode('utf-16-le')
File "/usr/lib/python2.7/encodings/utf_16_le.py", line 16, in decode
return codecs.utf_16_le_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode byte 0x31 in position 26: truncated data

I have asked the printer manufacturer which values their device returns for those strings, but I would have thought that pyusb should be able to handle any characters permitted by the USB protocol.
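
For context, a USB string descriptor is a length byte (bLength), a type byte (bDescriptorType, 0x03 for strings), and then the text as UTF-16LE code units. That is why the line in the traceback slices `buf[2:buf[0]]` before decoding. The following is a minimal sketch of that layout for Python 3, using a hand-built buffer rather than a real device. The helper name `make_string_descriptor` is hypothetical and not part of pyusb, and a plain `bytearray` stands in for the `array.array` buffer pyusb uses.

```python
def make_string_descriptor(text):
    """Build the bytes of a well-formed USB string descriptor (hypothetical helper)."""
    payload = text.encode('utf-16-le')
    # bLength counts the 2 header bytes as well; 0x03 marks a string descriptor.
    header = bytearray([len(payload) + 2, 0x03])
    return header + bytearray(payload)

buf = make_string_descriptor(u'BEIYANG')

# Same slice as in the traceback: skip the 2 header bytes, stop at bLength.
decoded = bytes(buf[2:buf[0]]).decode('utf-16-le')
print(decoded)
```

With a well-formed descriptor the payload length is always even, so the decode succeeds.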

RWAP · 2016-08-18T11:55:38Z

The lsusb -v output is:

Bus 004 Device 002: ID 154f:0517 SNBC CO., Ltd 
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               1.10
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0        64
  idVendor           0x154f SNBC CO., Ltd
  idProduct          0x0517 
  bcdDevice            1.f0
  iManufacturer           1 BEIYANG
  iProduct                2 BTP-R580(U)
  iSerial                 0 
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           39
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0xc0
      Self Powered
    MaxPower              100mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           3
      bInterfaceClass         7 Printer
      bInterfaceSubClass      1 Printer
      bInterfaceProtocol      2 Bidirectional
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02  EP 2 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               1
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               1
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x85  EP 5 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               1
Device Status:     0x0001
  Self Powered

Some more testing shows the error happens when I call:
callString = callString + 'iProduct="' + str(usb.util.get_string(cfg,cfg.iProduct) ) + '" '

It now reports:
UnicodeDecodeError: 'utf16' codec can't decode byte 0x31 in position 22: truncated data

Very odd indeed!

RWAP · 2016-08-18T12:47:59Z

Hmm, this is interesting (although I might be clutching at straws). After changing the decoding in util.py to read:
return buf[2:buf[0]].tostring().decode('utf-8')

I can see that the iProduct string ends with a single extra byte, hex 31.

My guess is that it is a mistake on the part of the manufacturer as the actual model is BTP-R580(U)II

Perhaps util.py needs to be able to pad out the string?
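
The failure can be reproduced without the hardware. This sketch assumes the descriptor payload is the 11-character name shown by lsusb followed by one stray byte 0x31. That assumption is consistent with the "position 22" in the error message, but it has not been confirmed against a raw dump of the descriptor.

```python
# 11 characters -> 22 bytes of valid UTF-16LE, followed by one stray byte.
payload = u'BTP-R580(U)'.encode('utf-16-le') + b'\x31'

try:
    payload.decode('utf-16-le')
    error = None
except UnicodeDecodeError as exc:
    error = exc

# The exception records where decoding stopped and why.
print(error)
```

The stray byte sits at offset 22, immediately after the last complete 2-byte code unit, which is where the decoder gives up.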

RWAP · 2016-08-18T13:16:45Z

After some testing, it is confirmed: this can be handled easily in util.py by changing the final lines from


    if hexversion >= 0x03020000:
        return buf[2:buf[0]].tobytes().decode('utf-16-le')
    else:
        return buf[2:buf[0]].tostring().decode('utf-16-le')

To read:

    if hexversion >= 0x03020000:
        return buf[2:buf[0]].tobytes().decode('utf-16-le')
    else:
        if buf[0]&1 != 0:
            descriptor = buf[2:buf[0]].tostring() + '\0'
            return descriptor.decode('utf-16-le')
        return buf[2:buf[0]].tostring().decode('utf-16-le')

Is it worth implementing this in the main code?
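
Note that the change above only touches the Python 2 branch. As a rough sketch of the same padding idea applied to plain bytes, independent of the interpreter branch (`decode_padded` is a hypothetical name, not a pyusb function):

```python
def decode_padded(payload):
    """Decode UTF-16LE bytes, zero-padding an odd-length payload first.

    Hypothetical helper for illustration; not part of pyusb.
    """
    payload = bytes(payload)
    if len(payload) % 2:
        # The stray byte becomes the low byte of one extra code unit.
        payload += b'\x00'
    return payload.decode('utf-16-le')

# Assumed payload: the name reported by lsusb plus one stray byte 0x31.
odd_payload = u'BTP-R580(U)'.encode('utf-16-le') + b'\x31'
padded_result = decode_padded(odd_payload)
print(padded_result)
```

One consequence of padding is that the stray byte surfaces as an extra character in the result (here a trailing "1"), so the returned string is valid but may not match what the manufacturer intended.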

walac · 2016-08-18T14:53:54Z

I am not a Unicode expert, but if I had to guess between a mistake by the Python implementers and a mistake by the printer manufacturer, I would guess the manufacturer. I don't feel comfortable implementing hacks to work around a bug in a single device, as this may affect valid Unicode strings I am not aware of.

RWAP · 2016-08-18T16:30:24Z

I have been speaking with the printer manufacturer about this and it will be interesting to see why the string has this extra character.

I cannot see how padding out the string can do any harm: an odd-length series of bytes will always break the decode('utf-16-le') call and force an error anyway, because UTF-16 encodes every character as either 2 or 4 bytes. Surely it is better to return some sort of valid string than to throw an error.
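
A quick way to check that claim is to truncate a valid UTF-16LE byte string at every possible length and see which prefixes decode in strict mode. This is only a sketch using the product name from the lsusb output as sample text; `decodes_cleanly` is a hypothetical helper.

```python
def decodes_cleanly(data):
    """Return True if data decodes as UTF-16LE in strict mode."""
    try:
        data.decode('utf-16-le')
        return True
    except UnicodeDecodeError:
        return False

encoded = u'BTP-R580(U)'.encode('utf-16-le')

# Try every prefix length from 1 byte up to the full string.
results = {n: decodes_cleanly(encoded[:n]) for n in range(1, len(encoded) + 1)}

odd_ok = [n for n, ok in results.items() if n % 2 and ok]
even_bad = [n for n, ok in results.items() if n % 2 == 0 and not ok]
print(odd_ok, even_bad)
```

For this sample text both lists come out empty: every odd-length prefix fails and every even-length prefix decodes.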

I notice that when the buffer is fetched, there is a check to see if it is an even number of bytes for the whole buffer and an error is thrown if not.

Possibly there should be a flag to allow programmers to decide on how to deal with these rare cases of odd length buffers / values...

walac · 2016-08-18T20:35:47Z

I see an odd-sized two-byte string as a device bug, and I prefer the bug be fixed in the right place, if possible. But if you want to submit a pull request for a new function that handles these cases, I think it is OK.

jonasmalacofilho · 2019-12-12T03:49:05Z

It's definitely a bug on the device side, but we could take the same approach as lsusb, which discards any odd trailing byte.
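
As an illustration of the discard approach described above, the following sketch drops a trailing odd byte before decoding. It is not the code from lsusb or from the commit that closed this issue, and `decode_truncated` is a hypothetical name.

```python
def decode_truncated(payload):
    """Decode UTF-16LE bytes, ignoring a trailing odd byte if there is one.

    Hypothetical helper for illustration; not part of pyusb.
    """
    payload = bytes(payload)
    # Round the length down to the nearest multiple of 2.
    even_length = len(payload) - (len(payload) % 2)
    return payload[:even_length].decode('utf-16-le')

# Assumed payload: the name reported by lsusb plus one stray byte 0x31.
odd_payload = u'BTP-R580(U)'.encode('utf-16-le') + b'\x31'
truncated_result = decode_truncated(odd_payload)
print(truncated_result)
```

Unlike padding, discarding the byte never invents a character, so the result matches what lsusb prints for iProduct.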

mcuee added the enhancement label Sep 19, 2018

jonasmalacofilho mentioned this issue Dec 12, 2019

Handle missing null terminator situation for device descriptors #204

Closed

jonasmalacofilho self-assigned this Dec 12, 2019

jonasmalacofilho closed this as completed in ca2433f Dec 12, 2019
