Export to mid/wav fails when file/directory tree contains diacritics or Cyrillic characters #1957

Open
msjasinski opened this issue Mar 25, 2024 · 14 comments

Comments

@msjasinski

msjasinski commented Mar 25, 2024

Hydrogen version: 1.2.3
Operating system + version: Windows 11 64-bit (and earlier versions too)
Audio driver + version: PortAudio


Export to mid or wav files fails if the filename/directory structure (e.g. D:\Hydrogen\emptySong.wav) contains letters/characters such as ążźćśł (Polish) or фыва (Russian). On the other hand, these characters work fine in .h2song files.

@theGreatWhiteShark
Contributor

Hey @msjasinski ,

Thanks for reporting!

Exporting the song to Lilypond or the current drumkit does not work either. But only on Windows. On Linux everything works fine.

I'll have a look.

@theGreatWhiteShark
Contributor

Could you check whether you are able to export MIDI or WAV files using this version of Hydrogen?

Drumkit import and export, however, will still not work with Cyrillic scripts or the Polish additions to the Latin alphabet in either filename or parent folders. That's a limitation of the compression library we use.

@msjasinski
Author

Hello @theGreatWhiteShark !
I tested it thoroughly and it works!!! Thank you very much!
Keep up the good work!
All the best

@msjasinski
Author

A related bug - when trying to open Hydrogen files (.h2song), containing characters as described above, from File Explorer (or similar - but not from Hydrogen open menu), I still get an error message
[Screenshot of the error message: hydrogen1_31gg8sugTI]

@theGreatWhiteShark
Contributor

A related bug - when trying to open Hydrogen files (.h2song), containing characters as described above, from File Explorer (or similar - but not from Hydrogen open menu), I still get an error message

Phew. That's a hard one. I can reproduce it, but I am afraid I cannot fix it.

The encoding bugs within Hydrogen I was able to fix by enforcing UTF-8 encoding and relying on the built-in file-handling functions of Qt, the framework we use. Qt probably uses the UTF-16 versions of the Windows API, and everything works fine.

But arguments passed to the application during startup seem to be more difficult to handle. There Qt uses the system encoding, with seemingly no way to override this behavior. Since both your system encoding and mine are set to one that does not allow for Cyrillic characters, Hydrogen only receives a mangled path in which all non-Latin-1 characters are lost, with no way to determine the song's original location.

(Why is the encoding off after installing the language kit and being able to set the keyboard to e.g. Russian and write Cyrillic letters? No idea. I'm not a Windows person. But from Hydrogen's perspective, Windows is telling us that it does not support these characters.)
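
To illustrate why the original path cannot be recovered once it has passed through a narrow encoding, here is a minimal sketch in portable C++. It is not the actual Windows or Qt conversion: `narrowToLatin1` is a hypothetical helper, and treating the system encoding as plain Latin-1 with `?` as the substitute character is a simplifying assumption based on the description above.

```cpp
#include <string>

// Hypothetical helper (not part of Qt, Windows, or Hydrogen): narrow Unicode
// code points to Latin-1, substituting '?' for anything above U+00FF.
// This only mimics a lossy conversion to a legacy 8-bit encoding.
std::string narrowToLatin1(const std::u32string& text) {
    std::string out;
    out.reserve(text.size());
    for (char32_t cp : text) {
        // Code points up to U+00FF map one-to-one onto Latin-1 bytes;
        // everything else has no representation and is replaced.
        out.push_back(cp <= 0xFF ? static_cast<char>(cp) : '?');
    }
    return out;
}
```

With this sketch, a path such as `D:\Hydrogen\фыва.h2song` comes out as `D:\Hydrogen\????.h2song`, and any other four-letter Cyrillic name in the same folder yields the same result. The substitution is many-to-one, so the receiving application has nothing left to reconstruct the real file name from.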

@theGreatWhiteShark
Contributor

@msjasinski could you do me a favor and install this version of Hydrogen and attach the log messages?

I patched it to report the system's encoding. Just to be entirely sure we are talking about the same issue here.

@elpescado
Contributor

Drumkit import and export, however, will still not work with Cyrillic scripts or the Polish additions to the Latin alphabet in either filename or parent folders. That's a limitation of the compression library we use.

That shortcoming of libarchive might already have been addressed:

libarchive/libarchive#2016

@elpescado
Contributor

Alternatively, maybe using archive_read_open_fd with an fd obtained from _wopen, instead of archive_read_open_filename, would work on Windows?

@theGreatWhiteShark
Contributor

That shortcoming of libarchive might already have been addressed:

libarchive/libarchive#2016

Hmm. I'm not sure. Within the PR they stated that the patch only affects native Windows builds. But we ship a version obtained from the MSYS2 repos. I don't know much about our Windows toolchain or libarchive in particular, but I wouldn't be surprised if the library was configured to use the POSIX interface provided by MSYS instead of the underlying Windows API.

Alternatively, maybe using archive_read_open_fd with fd obtained from _wopen on Windows instead of archive_read_open_filename would work on Windows?

I thought about this too but decided not to implement it. I'm just not familiar enough with stability and backward compatibility of the Windows API, possible friction when putting it next to MSYS2 code etc. Handling archives is such a vital part of Hydrogen that I'm a little afraid to break things for Windows users. Especially since I am not using this OS.

I read this document: https://github.com/libarchive/libarchive/wiki/Filenames#the-problem and got the impression UTF-8 support is not yet "solved" in libarchive. But I get that this is an important topic for some users and I will have another look (and come up with at least a workaround).

@theGreatWhiteShark
Contributor

A related bug - when trying to open Hydrogen files (.h2song), containing characters as described above, from File Explorer (or similar - but not from Hydrogen open menu), I still get an error message

@msjasinski I added a wiki page on how to fix this issue by tweaking the Windows settings.

@elpescado
Contributor

Hmm. I'm not sure. Within the PR they stated the patch is only affecting native Windows builds. But we ship a version obtained from the MSYS2 repos. I don't know much about our Windows toolchain or libarchive in particular but I wouldn't be surprised if the library was configured to use the POSIX interface provided by MSYS instead of the underlying Windows API.

It's been a while since I've used Windows, but I was under the impression that MSYS is a collection of POSIX shell utilities (bash, fileutils, etc.), while the actual compiler is MinGW, i.e. the "native" Windows build of GCC that links with msvcrt, as opposed to Cygwin, a.k.a. "POSIX-on-Windows GCC". But I might be wrong; GCC on Windows is ultra-confusing.

@theGreatWhiteShark
Contributor

I took a look at the source code of libarchive and things are much easier than I thought. They have dedicated functions for UTF-16 Windows API calls, like archive_write_open_filename_w. I wasn't aware of them previously because I used the man pages linked on their official GitHub page for reference. It seems these are generated on FreeBSD, so all the Windows-specific parts were removed by #ifdefs. How inconvenient!

I'll rewrite import/export using these functions. Import with Cyrillic characters in the drumkit path already works.
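
The wide variants such as archive_write_open_filename_w take a `wchar_t` file name, and `wchar_t` holds UTF-16 code units on Windows, so a UTF-8 path has to be converted before it can be passed in. The following is a minimal sketch of that conversion in portable C++, not the code used in Hydrogen. `utf8ToUtf16` is a hypothetical helper, and it uses `char16_t` instead of `wchar_t` so it behaves the same on every platform. It rejects truncated input and bad continuation bytes but does not check for overlong encodings.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of libarchive, Qt, or Hydrogen):
// decode a UTF-8 byte string into UTF-16 code units.
std::u16string utf8ToUtf16(const std::string& utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= utf8.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(utf8[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");

        if (cp >= 0x10000) {
            // Outside the Basic Multilingual Plane: emit a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

In a Qt application the same job is presumably better left to `QString::toStdWString()`, which yields a `std::wstring` that can be handed to the `_w` functions directly. Whether Hydrogen does it that way is an assumption, not something stated in this thread.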

@msjasinski
Author

msjasinski commented Apr 8, 2024

@msjasinski could you do me a favor and install this version of Hydrogen and attach the log messages?

I patched it to report the system's encoding. Just to be entirely sure we are talking about the same issue here.

Here it is:
log.txt

The same happens if the directory has no diacritics but the filename does.
If there are no diacritics at all, the file loads OK.

@theGreatWhiteShark
Contributor

Here it is:
log.txt

Same if directory has no diacritics but filename does have them.
If case of no diacritics, the file loads OK.

👍🏿 Nice. That's exactly how things are on my local machine.
