-
-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ability to change text encoding post-parse #27
Comments
Just a btw i'm one of the clone hero developers that we were talking on discord a while back, for which I also messaged this basic idea in that group message @melanchall |
Hi Matthew, Do you need to have ability to change encoding somewhere in the future after file is parsed or you need to try different encodings during file reading? For first case I suggest just get bytes from string and convert it to string using different encodings. Then you can set For second case I suggest to introduce some callback for text parsing to Please describe your task in more details so we can come to best solution. Max |
@melanchall The issue with doing that is that is that decoding text from bytes into a string is not a lossless process. For example if you have a latin1 encoding decoded as UTF-8 or ASCII unsupported characters will be replaced with alternative codepoints in those encoding ranges. |
I don't really specifically need access to the text parsing stage of decoding as mentioned in option 2, that wouldn't really work too well in how the application is setup. Having access to change encoding after the file is parsed would be most useful like i mentioned originally. Basically what I'm doing is parsing lyrics from midi files, and some of these lyrics are in various different encodings. Mainly UTF-8 and ISO-8859-1. Currently i'm checking if "U+FFFD" is produced and re trying parsing that phrase as a different encoding through a tweak in ReadContent but i'd like to avoid this hack if i can. |
I bow to option 2. My arguments:
When I'm designing API I should always think about different cases and most versatile solution. Introducing something like "encoding" in MIDI file creates more troubles than profit. As I mentioned in option 2 Also conversion between different encodings is lossless in terms of bytes. At now you can just use this code without modifying the library code: var originalEncoding = readingSettings.TextEncoding ?? Encoding.ASCII;
var bytes = originalEncoding.GetBytes(textEvent.Text);
// convert bytes to any encoding and set Text property of event Not good but will work. So it seems implementing text parsing callback is a best option. You wrote:
Can you explain why it wouldn't really work too well? |
@mdsitton So is it good to have text parsing callback? |
I've added |
So right now if you need to try and handle multiple midi files of various text encodings at runtime you are somewhat out of luck and would need to continuously re-parse the whole midi until the correct encoding is found.
Going from this:
to something like this (this isnt 100% working code but just something to get an idea of what i'm saying here)
It would need some other api changes perhaps with how settings and/or encodings are handled.
The text was updated successfully, but these errors were encountered: