-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove Non-ASCII Character Clobbering #324
Conversation
IT.TXT even explicitly mentions the ability to use characters that libxmp currently converts to periods for some reason: 3) A couple of people have asked about ASCII characters > 128. I'm sorry - I can't let you have 'em on anywhere else except the message editor. Furthermore, this comment only pertains to the interface of IT.EXE and not the IT spec. In other words, instrument names and other strings should not care about character encodings (but I will leave those fixes for other pull requests).
Libxmp doesn't touch 'em, so nothing needs to be done. 👍 |
I don't personally see an issue with allowing characters >127 but IMO control codes should probably still be filtered out unless there's a good reason for them. With -DDEBUG enabled said control codes will be printed out, causing terminal beeps and potentially other unwanted side effects. |
The song message is an encoding-independent array of bytes. We are unable to assume what constitutes a control code. |
I don't really have the desire to argue about what is and isn't a control code since that is well-defined for most platforms libxmp supports. Intentionally providing unsanitized strings to the API (and console) by default seems in bad taste to me, though. Deferring this one to @cmatsuoka. |
I agree with @AliceLR |
This illustrates a fundamental misunderstanding of the issue. Perhaps I failed to communicate well. Whether or not any given byte represents a control code is constituted by the character encoding, not the platform. |
In an ideal world it would be great if we could convert the platform-specific characters to something similar in the running system so we could show messages as the original author intended, but dealing with encodings, especially inside the library, is not worth the pain. I would add proper message display to the next generation module player wishlist and leave libxmp as is for now. One possible way to implement it in libxmp-ng could be to have an "original platform" attribute in formats and offer a way to retrieve sanitized or raw bytes for each displayable piece of data, so that the application using the library could convert and display it (but for that we would also need a better API that doesn't expose structure fields directly.) |
Regarding the specific usage with IT modules, I see the specs allow it but I think the actual output the user would see (typically utf-8) would not match what the author intended. Also I agree that control characters for the host system should be filtered out to prevent it to mess with terminal output. What encoding is Windows using these days? @sagamusix How does OpenMPT deal with characters > 127 in the module message? |
OpenMPT itself currently doesn't do any codepage conversion for a number of reasons. libopenmpt however does faithful codepage conversion to UTF-8 (for some formats it's trivial, for other formats it requires guesswork based on which tracker was used to save the file, etc...). Its library interface has used UTF-8 since day 1. |
I think that it would be better to expose the raw text without conversion, and to include a function to request the IBM code page number of the text (or zero if it is unknown). (I, personally, would rather to deal with the raw text directly and not deal with UTF-8.) I think that the command-line interface should include filtering out control codes, but the library should not do so. |
This looks like a WONTFIX, yes? |
Not sure why it needs to be rejected - if this sort of "fixup" was only done for IT files, I see no reason to keep it, because arguably with all other file formats it will already be possible to retrieve text with unknown encoding through the API. If libxmp wanted to make sure that no extended characters with unknown encoding are provided through its API, this should be applied in a central place for all formats, not just for IT. |
IT.TXT even explicitly mentions the ability to use characters that libxmp currently converts to periods for some reason:
Furthermore, this comment only pertains to the interface of IT.EXE and not the IT spec. In other words, instrument names and other strings should not care about character encodings (but I will leave those fixes for other pull requests).