Remove Non-ASCII Character Clobbering #324

NoSuck · 2021-03-11T14:07:34Z

IT.TXT even explicitly mentions the ability to use characters that libxmp currently converts to periods for some reason:

3) A couple of people have asked about ASCII characters > 128. I'm
   sorry - I can't let you have 'em on anywhere else except the message
   editor.

Furthermore, this comment only pertains to the interface of IT.EXE and not the IT spec. In other words, instrument names and other strings should not care about character encodings (but I will leave those fixes for other pull requests).

IT.TXT even explicitly mentions the ability to use characters that libxmp currently converts to periods for some reason: 3) A couple of people have asked about ASCII characters > 128. I'm sorry - I can't let you have 'em on anywhere else except the message editor. Furthermore, this comment only pertains to the interface of IT.EXE and not the IT spec. In other words, instrument names and other strings should not care about character encodings (but I will leave those fixes for other pull requests).

NoSuck · 2021-03-11T14:22:48Z

... I will leave those fixes for other pull requests.

Libxmp doesn't touch 'em, so nothing needs to be done. 👍

AliceLR · 2021-03-17T01:28:13Z

I don't personally see an issue with allowing characters >127 but IMO control codes should probably still be filtered out unless there's a good reason for them. With -DDEBUG enabled said control codes will be printed out, causing terminal beeps and potentially other unwanted side effects.

NoSuck · 2021-03-23T02:11:49Z

The song message is an encoding-independent array of bytes. We are unable to assume what constitutes a control code.

AliceLR · 2021-03-23T08:00:15Z

I don't really have the desire to argue about what is and isn't a control code since that is well-defined for most platforms libxmp supports. Intentionally providing unsanitized strings to the API (and console) by default seems in bad taste to me, though. Deferring this one to @cmatsuoka.

sezero · 2021-03-23T08:34:10Z

I agree with @AliceLR

NoSuck · 2021-03-24T09:11:11Z

I don't really have the desire to argue about what is and isn't a control code since that is well-defined for most platforms libxmp supports.

This illustrates a fundamental misunderstanding of the issue. Perhaps I failed to communicate well.

Whether or not any given byte represents a control code is constituted by the character encoding, not the platform.

cmatsuoka · 2021-03-29T20:20:50Z

In an ideal world it would be great if we could convert the platform-specific characters to something similar in the running system so we could show messages as the original author intended, but dealing with encodings, especially inside the library, is not worth the pain. I would add proper message display to the next generation module player wishlist and leave libxmp as is for now.

One possible way to implement it in libxmp-ng could be to have an "original platform" attribute in formats and offer a way to retrieve sanitized or raw bytes for each displayable piece of data, so that the application using the library could convert and display it (but for that we would also need a better API that doesn't expose structure fields directly.)

cmatsuoka · 2021-03-29T20:37:37Z

Regarding the specific usage with IT modules, I see the specs allow it but I think the actual output the user would see (typically utf-8) would not match what the author intended. Also I agree that control characters for the host system should be filtered out to prevent it to mess with terminal output.

What encoding is Windows using these days?

@sagamusix How does OpenMPT deal with characters > 127 in the module message?

sagamusix · 2021-03-29T21:51:50Z

OpenMPT itself currently doesn't do any codepage conversion for a number of reasons. libopenmpt however does faithful codepage conversion to UTF-8 (for some formats it's trivial, for other formats it requires guesswork based on which tracker was used to save the file, etc...). Its library interface has used UTF-8 since day 1.

zzo38 · 2022-08-08T20:21:10Z

I think that it would be better to expose the raw text without conversion, and to include a function to request the IBM code page number of the text (or zero if it is unknown).

(I, personally, would rather to deal with the raw text directly and not deal with UTF-8.)

I think that the command-line interface should include filtering out control codes, but the library should not do so.

sezero · 2023-06-20T23:20:56Z

This looks like a WONTFIX, yes?

sagamusix · 2023-06-23T17:38:57Z

Not sure why it needs to be rejected - if this sort of "fixup" was only done for IT files, I see no reason to keep it, because arguably with all other file formats it will already be possible to retrieve text with unknown encoding through the API. If libxmp wanted to make sure that no extended characters with unknown encoding are provided through its API, this should be applied in a central place for all formats, not just for IT.

AliceLR mentioned this pull request Jan 28, 2022

libxmp 4.6 API changes? #560

Open

sezero closed this Jun 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove Non-ASCII Character Clobbering #324

Remove Non-ASCII Character Clobbering #324

NoSuck commented Mar 11, 2021

NoSuck commented Mar 11, 2021

AliceLR commented Mar 17, 2021

NoSuck commented Mar 23, 2021

AliceLR commented Mar 23, 2021

sezero commented Mar 23, 2021

NoSuck commented Mar 24, 2021

cmatsuoka commented Mar 29, 2021 •

edited

Loading

cmatsuoka commented Mar 29, 2021 •

edited

Loading

sagamusix commented Mar 29, 2021

zzo38 commented Aug 8, 2022

sezero commented Jun 20, 2023

sagamusix commented Jun 23, 2023

Remove Non-ASCII Character Clobbering #324

Remove Non-ASCII Character Clobbering #324

Conversation

NoSuck commented Mar 11, 2021

NoSuck commented Mar 11, 2021

AliceLR commented Mar 17, 2021

NoSuck commented Mar 23, 2021

AliceLR commented Mar 23, 2021

sezero commented Mar 23, 2021

NoSuck commented Mar 24, 2021

cmatsuoka commented Mar 29, 2021 • edited Loading

cmatsuoka commented Mar 29, 2021 • edited Loading

sagamusix commented Mar 29, 2021

zzo38 commented Aug 8, 2022

sezero commented Jun 20, 2023

sagamusix commented Jun 23, 2023

cmatsuoka commented Mar 29, 2021 •

edited

Loading

cmatsuoka commented Mar 29, 2021 •

edited

Loading