UnicodeDecodeError #154

Closed
RWAP opened this issue Aug 17, 2016 · 7 comments

RWAP commented Aug 17, 2016

I have connected a USB receipt printer to a BananaPi and then use Python to read the descriptors for the printer.

The code contains the following lines:

callString = callString + 'idVendor=' + padded_hex(int(cfg.idVendor), 4) + ' ' + \
    'idProduct=' + padded_hex(int(cfg.idProduct), 4) + ' ' + \
    'bcdDevice=' + str(cfg.bcdDevice) + ' ' + \
    'iManufacturer="' + str(usb.util.get_string(cfg, cfg.iManufacturer)) + '" ' + \
    'iProduct="' + str(usb.util.get_string(cfg, cfg.iProduct)) + '" ' + \
    'iSerialNum="' + str(usb.util.get_string(cfg, cfg.iSerialNumber)) + '" ' + \
    'IPNPstring="' + IPNPstring + '"'

Whilst my code works fine with an HP printer, with the USB receipt printer I get the error:

File "/g_printer_define.py", line 90, in <module>
'IPNPstring="' + IPNPstring + '"'
File "/usr/local/lib/python2.7/dist-packages/usb/util.py", line 330, in get_string
return buf[2:buf[0]].tostring().decode('utf-16-le')
File "/usr/lib/python2.7/encodings/utf_16_le.py", line 16, in decode
return codecs.utf_16_le_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode byte 0x31 in position 26: truncated data

I have asked the printer manufacturer for details of what their device returns for those strings, but I would have thought that pyusb should be able to handle any characters permitted by the USB protocol?

Author

RWAP commented Aug 18, 2016

The lsusb -v output is:

Bus 004 Device 002: ID 154f:0517 SNBC CO., Ltd 
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               1.10
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0        64
  idVendor           0x154f SNBC CO., Ltd
  idProduct          0x0517 
  bcdDevice            1.f0
  iManufacturer           1 BEIYANG
  iProduct                2 BTP-R580(U)
  iSerial                 0 
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           39
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0xc0
      Self Powered
    MaxPower              100mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           3
      bInterfaceClass         7 Printer
      bInterfaceSubClass      1 Printer
      bInterfaceProtocol      2 Bidirectional
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02  EP 2 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               1
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               1
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x85  EP 5 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               1
Device Status:     0x0001
  Self Powered

Some more testing shows the error happens when I call:
callString = callString + 'iProduct="' + str(usb.util.get_string(cfg,cfg.iProduct) ) + '" '

It now reports:
UnicodeDecodeError: 'utf16' codec can't decode byte 0x31 in position 22: truncated data

Very odd indeed!
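
The reported position is consistent with a descriptor whose payload is the 11 characters shown by lsusb (22 bytes of UTF-16-LE) followed by one stray byte, 0x31. A minimal sketch of that situation, written for Python 3; the descriptor bytes below are a reconstruction for illustration, not data captured from the printer, and `decode_strict` is a hypothetical helper that only mirrors the slice-and-decode step quoted from util.py:

```python
# Hypothetical reconstruction of the iProduct string descriptor.
# Assumption: 22 valid UTF-16-LE bytes followed by one stray byte (0x31).
payload = "BTP-R580(U)".encode("utf-16-le") + b"\x31"

# A USB string descriptor starts with bLength and bDescriptorType (0x03 = STRING).
descriptor = bytes([2 + len(payload), 0x03]) + payload


def decode_strict(buf):
    """Mirror the slice-and-decode step quoted from usb/util.py."""
    return bytes(buf[2:buf[0]]).decode("utf-16-le")


try:
    decode_strict(descriptor)
    error = None
except UnicodeDecodeError as exc:
    error = exc

if error is not None:
    # Offset of the offending byte within the payload, its value, and the reason.
    print(error.start, hex(error.object[error.start]), error.reason)
```

With this reconstruction bLength is 25, an odd number, so the payload cannot be a whole number of 2-byte code units.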

Author

RWAP commented Aug 18, 2016

Hmm - that is interesting (although I might be clutching at straws), by changing the encoding in util.py to read:
return buf[2:buf[0]].tostring().decode('utf-8')

I can see that at the end of the iProduct identifier there is a single stray byte, hex 31.

My guess is that it is a mistake on the part of the manufacturer, as the actual model is BTP-R580(U)II.

Perhaps util.py needs to be able to pad out the string?

Author

RWAP commented Aug 18, 2016

After some testing, it is confirmed - this can be handled easily in util.py by changing the final lines from


    if hexversion >= 0x03020000:
        return buf[2:buf[0]].tobytes().decode('utf-16-le')
    else:
        return buf[2:buf[0]].tostring().decode('utf-16-le')

To read:

    if hexversion >= 0x03020000:
        return buf[2:buf[0]].tobytes().decode('utf-16-le')
    else:
        if buf[0]&1 != 0:
            descriptor = buf[2:buf[0]].tostring() + '\0'
            return descriptor.decode('utf-16-le')
        return buf[2:buf[0]].tostring().decode('utf-16-le')

Is it worth implementing this in the main code?
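
The change above only touches the Python 2 branch, and it appends a `str` to the result of `tostring()`, which would not work on `bytes` in Python 3. A version-neutral sketch of the same padding idea, operating on the payload bytes directly; `decode_padded` is a hypothetical name, not a pyusb function, and the sample payload is a reconstruction rather than captured device data:

```python
def decode_padded(raw):
    """Decode UTF-16-LE bytes, padding an odd-length input with one NUL byte.

    Checking len(raw) % 2 is equivalent to the buf[0] & 1 test above,
    because the payload is buf[0] - 2 bytes long.
    """
    if len(raw) % 2:
        raw = raw + b"\x00"
    return raw.decode("utf-16-le")


# Assumed payload: "BTP-R580(U)" in UTF-16-LE plus one stray byte (0x31).
sample = "BTP-R580(U)".encode("utf-16-le") + b"\x31"
print(decode_padded(sample))
```

Note the side effect: the stray byte 0x31 padded with 0x00 forms the code unit U+0031, so the decoded string gains a trailing "1" that the device may never have intended to send.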

Member

walac commented Aug 18, 2016

I am not a Unicode expert, but if I had to guess between a mistake by the Python implementers and one by the printer manufacturer, I would guess the manufacturer made the mistake. I don't feel comfortable implementing hacks to work around a bug in a single case, as this may affect valid Unicode strings I am not aware of.

Author

RWAP commented Aug 18, 2016

I have been speaking with the printer manufacturer about this and it will be interesting to see why the string has this extra character.

I cannot see how padding out the string can do any harm: UTF-16-LE encodes every character in 2 bytes (or 4 bytes for a surrogate pair), so any odd-length series of bytes will always break the decode('utf-16-le') call and force an error in any event. Surely it is better to return some sort of valid string than to throw an error.
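
The point about odd lengths holds for the default strict decoding. Python's codecs also accept a standard `errors` argument, and the following sketch (Python 3, with a made-up 5-byte input) compares what each handler does with the trailing byte:

```python
# Made-up input: two valid UTF-16-LE code units plus one stray byte.
odd = "AB".encode("utf-16-le") + b"\x31"

results = {}
for handler in ("strict", "ignore", "replace"):
    try:
        results[handler] = odd.decode("utf-16-le", errors=handler)
    except UnicodeDecodeError as exc:
        # Strict decoding rejects the incomplete code unit.
        results[handler] = "error: " + exc.reason

for handler, outcome in results.items():
    print(handler, repr(outcome))
```

Strict decoding raises, "ignore" silently drops the stray byte, and "replace" substitutes U+FFFD for it, so the choice is between failing loudly, losing a byte, or marking the damage.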

I notice that when the buffer is fetched, there is a check to see if it is an even number of bytes for the whole buffer and an error is thrown if not.

Possibly there should be a flag to allow programmers to decide on how to deal with these rare cases of odd length buffers / values...

Member

walac commented Aug 18, 2016

I see an odd-sized UTF-16 string as a device bug, and I prefer that the bug be fixed in the right place, if possible. But if you want to submit a pull request for a new function that handles these cases, I think that is OK.

Member

jonasmalacofilho commented Dec 12, 2019

It's definitely a bug on the device side, but we could take the same approach as lsusb, which discards any odd trailing byte.
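
A rough sketch of the discard-the-trailing-byte approach, combined with the opt-in flag suggested earlier in the thread. `decode_string_descriptor` and its `strict` parameter are hypothetical names for illustration; this is neither pyusb's API nor lsusb's actual code, and the sample descriptor is a reconstruction rather than captured device data:

```python
def decode_string_descriptor(buf, strict=False):
    """Decode a USB string descriptor: bLength, bDescriptorType, UTF-16-LE payload.

    With strict=False an odd trailing byte is discarded before decoding;
    with strict=True an odd-length payload raises ValueError instead.
    """
    length = min(buf[0], len(buf))   # never read past the bytes actually received
    payload = bytes(buf[2:length])
    if len(payload) % 2:
        if strict:
            raise ValueError("string descriptor payload has an odd length")
        payload = payload[:-1]       # drop the incomplete code unit
    return payload.decode("utf-16-le")


# Assumed descriptor: "BTP-R580(U)" in UTF-16-LE plus one stray byte (0x31).
body = "BTP-R580(U)".encode("utf-16-le") + b"\x31"
sample = bytes([2 + len(body), 0x03]) + body
print(decode_string_descriptor(sample))
```

Unlike padding, truncation never invents a character from the stray byte, and the result for this sample matches the iProduct string shown in the lsusb output above.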
