query_devices() name decoding problem (MME, DirectSound) #72

Closed
Takashi-T opened this issue Feb 1, 2017 · 17 comments

@Takashi-T

Takashi-T commented Feb 1, 2017

query_devices() crashes:

    File "C:\Program Files\Anaconda3\lib\site-packages\sounddevice.py", line 712, in query_devices
      _lib.Pa_HostApiTypeIdToHostApiIndex(_lib.paMME)) else 'utf-8'),
    UnicodeDecodeError: 'mbcs' codec can't decode bytes in position 0--1: No mapping for the Unicode character exists in the target code page.

Changing the "mbcs" at line 710 to "utf-8" fixed the problem in my environment.

OS: Windows 7 SP1 and Windows 10 (Japanese)
Python 3.5
sounddevice ver 0.3.6 (wheel from http://www.lfd.uci.edu/~gohlke/pythonlibs/)

@mgeier
Member

mgeier commented Feb 2, 2017

Can you please provide a screenshot of the output of query_devices() after your fix?

Does the same error happen if you run the following in a terminal?

python3 -m sounddevice

I don't have a Windows system myself, so you'll have to help me out a bit.
I changed MME to use 'mbcs' because of what @raecke said in #30 (comment).
So are some MME devices using 'mbcs' and others 'utf-8'?

@Takashi-T
Author

Takashi-T commented Feb 3, 2017

Thank you for looking into this.

Here is the screen shot of query_devices() call.
screenshot-querydevices

And this is the result of a run from the console (without the fix). The same Python error occurred.
run-from-console

As I understand it, localized versions of Windows originally used locally devised encoding schemes (in the 1980s). For instance, in Japan, Microsoft used a proprietary 2-byte coding scheme called Shift-JIS. Later, Microsoft adopted Unicode as its internal string representation, with thin translation layers to and from the legacy encodings so that it can still talk to old applications and drivers that only speak the legacy one.
So my guess is that some audio driver or some old PortAudio implementation is still producing strings in the old local encoding. Unfortunately, I don't know how to identify the encoding in use from the OS version etc.

@raecke

raecke commented Feb 3, 2017

python -m sounddevice works on my computer without problems. My Windows version is Windows 7.

>>> sounddevice.__version__
'0.3.6'
>>> sounddevice.get_portaudio_version()
(1246720, u'PortAudio V19.6.0-devel, revision 396fe4b6699ae929d3a685b3ef8a7e97396139a4')

I cannot find it at the moment, but I once saw a commit in the PortAudio sources that switched the device name encoding to UTF-8 for WMME and DSOUND as well.

@mgeier
Member

mgeier commented Feb 3, 2017

Would it make sense to try it with 'mbcs' first and in case of a UnicodeDecodeError do it again with 'utf-8'?

@raecke

raecke commented Feb 3, 2017

No, decoding UTF-8-encoded data with MBCS does not raise an exception if MBCS is, e.g., cp1252 (Germany). The result is just wrong.
Trying to decode with UTF-8 first and falling back to MBCS only when that fails could work reasonably well.

>>> u"Gerät".encode('utf8').decode('mbcs')
u'Ger\xc3\xa4t'
>>> u"Gerät".encode('mbcs').decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 3: unexpected end of data
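
The UTF-8-first idea described above could be sketched roughly as follows. This is only an illustration and not the code used in the sounddevice module: the helper name `decode_device_name` and its `fallback` parameter are hypothetical. Since the 'mbcs' codec exists only on Windows, the checks pass an explicit code page (cp1252 or cp932) as a stand-in for it.

```python
# -*- coding: utf-8 -*-

def decode_device_name(raw, fallback="mbcs"):
    """Decode a device name given as bytes (hypothetical helper).

    Strict UTF-8 is tried first because it rejects most legacy-encoded
    byte strings; the fallback codec is used only if UTF-8 fails.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


# Names as a UNICODE build might deliver them (UTF-8 bytes).
assert decode_device_name(u"Gerät".encode("utf-8"), fallback="cp1252") == u"Gerät"

# Names as a non-UNICODE build might deliver them (legacy code page bytes).
assert decode_device_name(u"Gerät".encode("cp1252"), fallback="cp1252") == u"Gerät"
assert decode_device_name(u"音".encode("cp932"), fallback="cp932") == u"音"
```

On Windows the default `fallback="mbcs"` would select whatever ANSI code page the system uses, which is why the order of the two attempts matters more than the concrete code page.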

@mgeier
Member

mgeier commented Feb 3, 2017

@raecke OK, thanks for checking this out!

@taktot Can you please try the same thing on your system with a few kana? And probably some kanji as well, just to be sure ...

@raecke

raecke commented Feb 3, 2017

I have found a discussion about device names in PortAudio (ticket #224).

The change to convert them to UTF-8 was made on 2014-04-11, but it only applies if the PortAudio library is compiled with _UNICODE or UNICODE defined.

@Takashi-T
Author

Takashi-T commented Feb 4, 2017

@mgeier I tried what @raecke did with some Japanese characters. I can make it fail both ways. I think it really depends on whether the encoded bit pattern happens to be a legal pattern in the other encoding.

Example 1: Failed both ways.

testmsg = u"サウンドデバイス音響装置さうんどでばいす"

testmsg.encode("utf-8").decode("mbcs")
UnicodeDecodeError Traceback (most recent call last)
in ()
----> 1 testmsg.encode("utf-8").decode("mbcs")

UnicodeDecodeError: 'mbcs' codec can't decode bytes in position 0--1: No mapping for the Unicode character exists in the target code page.

testmsg.encode("mbcs").decode("utf-8")
UnicodeDecodeError Traceback (most recent call last)
in ()
----> 1 testmsg.encode("mbcs").decode("utf-8")

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x83 in position 0: invalid start byte

Example 2: With a lucky string, I can go from utf-8 to mbcs without an exception, although the decoded result does not make any sense.

testmsg = u"音"

testmsg.encode("utf-8").decode("mbcs")
Out[16]: '髻ウ'

testmsg.encode("mbcs").decode("utf-8")
UnicodeDecodeError Traceback (most recent call last)
in ()
----> 1 testmsg.encode("mbcs").decode("utf-8")

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
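
For readers without a Japanese Windows system, the direction that matters for a UTF-8-first fallback can be reproduced on any platform with an explicit code page. The sketch below assumes that 'mbcs' on such a system corresponds to cp932; the constant `LEGACY` and the helper `decodes_as` are hypothetical names introduced only for this illustration.

```python
# -*- coding: utf-8 -*-

# Assumption: 'mbcs' on a Japanese Windows system behaves like code page 932.
LEGACY = "cp932"


def decodes_as(raw, encoding):
    """Return True if raw can be decoded strictly with the given encoding."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


long_name = u"サウンドデバイス音響装置さうんどでばいす"
short_name = u"音"

# Legacy-encoded bytes: the strict UTF-8 decoder rejects both examples,
# matching the "invalid start byte" errors shown above.
legacy_as_utf8 = [decodes_as(s.encode(LEGACY), "utf-8")
                  for s in (long_name, short_name)]
assert legacy_as_utf8 == [False, False]

# UTF-8 bytes read as the legacy code page: whether this raises depends on
# the byte pattern, so the outcome is computed here but not asserted.
utf8_as_legacy = [decodes_as(s.encode("utf-8"), LEGACY)
                  for s in (long_name, short_name)]
```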

@mgeier mgeier changed the title from "query_devices() 0.3.6 host API name decoding problem" to "query_devices() name decoding problem (MME, DirectSound)" Feb 5, 2017
@mgeier
Member

mgeier commented Feb 5, 2017

@taktot Thanks for testing, that's good news! We only have a problem if 'mbcs' raises an exception and 'utf-8' doesn't.
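
One residual case is worth keeping in mind with any decode-and-fall-back heuristic: a legacy-encoded name can be valid UTF-8 by coincidence, in which case no exception is raised and the first decoder's result is used. The snippet below is a contrived illustration that assumes cp1252 as the legacy code page; it makes no claim about how often this occurs with real device names.

```python
# -*- coding: utf-8 -*-

# Assumption: the legacy code page is cp1252 (Western European Windows).
legacy_bytes = u"Ã¤".encode("cp1252")

# The same two bytes are also the UTF-8 encoding of a single character,
# so a strict UTF-8 decoder accepts them without complaint.
as_utf8 = legacy_bytes.decode("utf-8")
as_legacy = legacy_bytes.decode("cp1252")

assert legacy_bytes == b"\xc3\xa4"
assert as_utf8 == u"ä"
assert as_legacy == u"Ã¤"
```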

I made a PR that relies on this behavior: #73.
Can you please check if it works for you?
Can you please try it with both the bundled DLL and the Gohlke DLL?

@raecke Thanks for finding the relevant PortAudio changes!

Do we know (from inside of the sounddevice module) if the library was compiled with UNICODE/_UNICODE or not?
I didn't specify anything like this in the bundled DLLs (https://github.com/spatialaudio/portaudio-binaries), but it may have been set by some other process ...
What about the Gohlke DLLs?

@raecke

raecke commented Feb 5, 2017

What are the Gohlke DLLs? I know the website with all the WHL files, but where are the PortAudio DLLs?

@mgeier
Member

mgeier commented Feb 6, 2017

@raecke AFAIK the DLLs are not individually available for download, but you can simply unzip the WHL files to get at the respective DLLs.

@raecke

raecke commented Feb 9, 2017

The portaudio_x86.dll found in sounddevice-0.3.6-cp27-cp27m-win32.whl is compiled with UNICODE. The bundled DLL is not compiled with UNICODE.
The MME device names from portaudio_x86.dll are encoded in UTF-8.
The MME device names from the bundled DLL are encoded in MBCS.

I think there is no official way to check if the DLL was compiled with UNICODE.

@raecke

raecke commented Feb 9, 2017

The branch try-first-utf8-then-mbcs works with both DLLs as expected. (Windows7, 32-bit, Python 2.7)

@Takashi-T
Author

The "try-first-utf8-then-mbcs" branch also works fine in my environment (Win10, 64-bit, Python 3.5).
Thank you for the fix.

@mgeier
Member

mgeier commented Feb 10, 2017

@raecke Thanks a lot for checking this out! Should I try to define UNICODE/_UNICODE for future builds of the bundled DLLs? Or does it also have some disadvantages?

@taktot Thanks for testing! Did you have a chance to test it on both versions of the DLL?

@mgeier mgeier closed this as completed in e04dc1c Feb 12, 2017
@raecke

raecke commented Feb 14, 2017

I am not sure if you should use UNICODE for the bundled DLLs.

  • I am not sure if the ASIO SDK has problems when compiling with UNICODE.
  • The PortAudio code might also have some problems. For example, I think the error message for
    paUnanticipatedHostError generated by the function PaWinUtil_CoInitialize in
    pa_win_coinitialize.c will just be missing if compiled with UNICODE.

@mgeier
Member

mgeier commented Feb 14, 2017

@raecke Thanks for the information!

I'll keep the DLLs unchanged for now.
