
lsmd: decode Shift JIS from halfwidth titles #55

Merged
merged 1 commit into from Mar 27, 2020


@mistydemeo
Contributor

mistydemeo commented Mar 20, 2020

Japanese minidiscs can contain single-byte characters in this field. I found this on a disc that had track titles written exclusively using single-byte katakana. That means it's not safe to print this field without decoding it, too.
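A minimal Python 3 sketch of what decoding this field means, assuming the raw title arrives as a `bytes` object. `decode_title` is a hypothetical helper, not part of netmd. It relies on Shift JIS being a superset of JIS X 0201, so Python's `shift_jis` codec can also read single-byte katakana titles.

```python
def decode_title(raw: bytes) -> str:
    """Decode a raw MiniDisc title field into a str.

    Hypothetical helper: assumes the field is JIS X 0201 (single-byte)
    or Shift JIS, both of which Python's shift_jis codec can read.
    """
    # errors="replace" keeps a listing tool from crashing on a corrupt byte;
    # undecodable sequences become U+FFFD instead of raising.
    return raw.decode("shift_jis", errors="replace")


# Halfwidth katakana "ﾈｺ" is stored as one byte per character.
raw_title = b"\xc8\xba"
print(decode_title(raw_title))   # ﾈｺ
print(decode_title(b"Track 1"))  # Track 1
```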

@thp
thp approved these changes Mar 20, 2020
Collaborator

thp left a comment

Looks good to me, but I wasn't able to test it. Would I be able to test it e.g. by pasting certain text into SonicStage to rename a track on a normal MiniDisc and then try to list the tracks with lsmd?

@mistydemeo

Contributor Author

mistydemeo commented Mar 20, 2020

Good question. I was using an existing disc instead of creating one for it.

cc @ticky, who may know how to safely create a disc that can exercise this!

@ticky

Contributor

ticky commented Mar 20, 2020

As with #56, I think it currently requires you to either have SonicStage running in Japanese, or to paste text converted into JIS into netmdcli. I’m hoping to look into getting JIS conversion working for setting titles with netmdcli soon, and then both of these should be easily testable!

@ticky

Contributor

ticky commented Mar 20, 2020

Oh, and for what it’s worth this was tested with the same player and discs as #56 😄

@ticky

Contributor

ticky commented Mar 23, 2020

Okay, came up with a way to test: if your system has iconv installed (it almost certainly does if you’re running on a unix-like!), you should be able to run this to get a valid JIS_X0201 string to write:

echo "ネコ" | iconv -t JIS_X0201

(Noting that JIS_X0201 can only handle halfwidth katakana characters, so if you want to come up with your own track title string you’ll need to look those up! This just says “cat” 🐈)

Then, you can combine that with netmdcli (you don’t need #56 for this to work, though note that you will be seeing garbled text from netmdcli afterwards without it) to do something like this:

netmdcli retitle 0 "$(echo "ネコ" | iconv -t JIS_X0201)"

Which will set the first track title to a JIS_X0201 representation of that string. Thereafter you will see that the lsmd track listing shows you something like this as the track title:

Ⱥ

Which is not what we want! Compare that with this patched version, which should show the original ネコ as expected!
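Why the unpatched listing shows `Ⱥ`: the two JIS X 0201 bytes for halfwidth ﾈｺ (0xC8 0xBA) happen to form a valid two-byte UTF-8 sequence for U+023A, so a UTF-8 terminal renders them as a single Latin letter. A short Python 3 sketch of both readings; the byte values come from the JIS X 0201 code chart rather than from a dump of a real disc, and the NFKC step is just one possible way to turn the halfwidth result back into fullwidth ネコ, not something lsmd is known to do.

```python
import unicodedata

raw = b"\xc8\xba"  # JIS X 0201 encoding of halfwidth "ﾈｺ"

# Unpatched behaviour: raw bytes interpreted as UTF-8 by the terminal.
as_utf8 = raw.decode("utf-8")
print(as_utf8, hex(ord(as_utf8)))  # Ⱥ 0x23a

# Patched behaviour: decode as Shift JIS (a superset of JIS X 0201).
as_sjis = raw.decode("shift_jis")
print(as_sjis)  # ﾈｺ

# Optional: NFKC maps halfwidth katakana to their fullwidth forms.
print(unicodedata.normalize("NFKC", as_sjis))  # ネコ
```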

@ticky

Contributor

ticky commented Mar 23, 2020

It’s now also possible to write halfwidth katakana titles using the patched version from #56 😃

@ticky

Contributor

ticky commented Mar 23, 2020

Oh, one note on this: I discovered it’s missing a complementary update to the disc title reporting, @mistydemeo!

Traceback (most recent call last):
  File "netmd/lsmd.py", line 82, in <module>
    show_uuids=options.uuids)
  File "netmd/lsmd.py", line 9, in main
    listMD(md, show_uuids)
  File "netmd/lsmd.py", line 43, in listMD
    md_iface.getDiscTitle(True).decode('shift_jis_2004'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc8 in position 22: ordinal not in range(128)
@thp

This comment has been minimized.

Copy link
Collaborator

thp commented Mar 23, 2020

Here's some related info (might be outdated):
http://www.minidisc.org/brian_youn/mzn1/page2.html#kanjisupport

@ticky

Contributor

ticky commented Mar 23, 2020

Thanks! I just worked out last night how the multi-byte encoding works, and it looks like it goes unused in libnetmd. It looks like you can opt to switch the encoding from JIS X0201 to Shift JIS, but Shift JIS is supported only by Japanese players and the specifically kanji-capable remote control models. Thus, Shift JIS wide katakana titles work on my RM-MC35ELK, but don’t work on my US MZ-M10. I’ll look to expand my PR to handle them both.

This also means that if anyone’s written titles using the wide character support in SonicStage, current versions of netmdcli ignore the titles entirely!
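The trade-off described here suggests preferring the single-byte JIS X 0201 field whenever a title fits in it, since every player can show it, and falling back to the wide Shift JIS field only when needed. A Python 3 sketch under that assumption; `encode_title` and the "narrow"/"wide" labels are hypothetical and do not describe how libnetmd or SonicStage actually selects the field.

```python
from typing import Tuple


def encode_title(text: str) -> Tuple[str, bytes]:
    """Return ("narrow", bytes) if text fits in single-byte JIS X 0201,
    otherwise ("wide", Shift JIS bytes). Hypothetical helper."""
    # Raises UnicodeEncodeError for characters Shift JIS cannot represent.
    encoded = text.encode("shift_jis")
    # Shift JIS stores ASCII and halfwidth katakana in one byte each and
    # everything else (kanji, fullwidth kana) in two. If the byte count
    # equals the character count, every character was single-byte.
    if len(encoded) == len(text):
        return "narrow", encoded
    return "wide", encoded


print(encode_title("ﾈｺ"))     # ('narrow', b'\xc8\xba')
print(encode_title("ネコ")[0])  # wide
```

Checking the byte length avoids maintaining a separate table of JIS X 0201 characters, at the cost of depending on Python's `shift_jis` codec.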

@thp

Collaborator

thp commented Mar 23, 2020

I've tried creating a Python 3 branch in #60; maybe you can implement your changes on top of that? If you grep for shift_jis in netmd/ you can already find some mentions of it in dump_md.py and upload.py. It looks like it would make more sense for getTrackTitle() and getDiscTitle() to return proper objects and not move the responsibility to the caller (in fact, both have a boolean wchar argument already).

My proposal:

  • If wchar=True, try to decode (Shift JIS?) and return a Python 3 str (Unicode string)
  • If wchar=False, return a normal Python 3 bytes object (Byte string)

This way, all code that calls getTrackTitle() and getDiscTitle() would behave the same way.

@mistydemeo

Contributor Author

mistydemeo commented Mar 23, 2020

> If wchar=False, return a normal Python 3 bytes object (Byte string)

Non-widechar strings are still JIS X 0201, an ASCII-based encoding that adds halfwidth katakana, so they still need to be decoded.
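Following this correction, one option is for the title getters to decode in both cases so callers always receive a `str`. A Python 3 sketch under that assumption: `FakeInterface` stands in for the real NetMD interface, its `getTrackTitle(track, wchar)` signature is modelled on the discussion rather than copied from netmd, and `get_title_text` is hypothetical.

```python
class FakeInterface:
    """Stand-in for a NetMD interface that returns raw title bytes.

    Hypothetical: the real netmd methods may take different arguments.
    """

    def __init__(self, narrow: bytes, wide: bytes):
        self._titles = {False: narrow, True: wide}

    def getTrackTitle(self, track: int, wchar: bool = False) -> bytes:
        # A real device would look up the title for `track`; this stub
        # returns the same narrow or wide title for every track.
        return self._titles[wchar]


def get_title_text(md_iface, track: int, wchar: bool = False) -> str:
    raw = md_iface.getTrackTitle(track, wchar)
    # Narrow titles are JIS X 0201 and wide titles are Shift JIS. Shift JIS
    # is a superset of JIS X 0201, so one codec decodes both fields.
    return raw.decode("shift_jis", errors="replace")


iface = FakeInterface(narrow=b"\xc8\xba", wide="ネコ".encode("shift_jis"))
print(get_title_text(iface, 0))              # ﾈｺ
print(get_title_text(iface, 0, wchar=True))  # ネコ
```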

@thp

Collaborator

thp commented Mar 23, 2020

> Okay, came up with a way to test: if your system has iconv installed (it almost certainly does if you’re running on a unix-like!), you should be able to run this to get a valid JIS_X0201 string to write: [ ...]

I was able to test it on the first track of a normal MD using:

netmdcli/netmdcli rename 0 "$(echo "ネコ" | iconv -t Shift_JIS)" 

Note that retitle changes group names, while rename changes track names.

I was using an MZ-RH1 and the RM-MC38EL that came with it showed the correct characters (at least they "look" similar to the ones in the echo command).

I can also confirm that with the current "master" branch, Ⱥ is what gets erroneously shown when just running netmdcli/netmdcli.
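To script the same test from Python 3 instead of the shell, the `iconv` step can be replaced by `str.encode`. A sketch, assuming the `netmdcli/netmdcli` path and the `rename` argument order from the command above; `build_rename_argv` is a hypothetical helper, and the command is only built here, not run.

```python
import subprocess  # only needed if you actually run the command
from typing import List, Union


def build_rename_argv(track: int, title: str,
                      netmdcli: str = "netmdcli/netmdcli") -> List[Union[str, bytes]]:
    """Build argv equivalent to:
    netmdcli rename N "$(echo "TITLE" | iconv -t Shift_JIS)"

    $(...) strips echo's trailing newline, so none is added here.
    """
    # On POSIX, subprocess accepts bytes arguments, so the encoded title
    # reaches netmdcli unchanged regardless of the current locale.
    return [netmdcli, "rename", str(track), title.encode("shift_jis")]


argv = build_rename_argv(0, "ネコ")
print(argv[:3])  # ['netmdcli/netmdcli', 'rename', '0']
# subprocess.run(argv, check=True)  # uncomment to write the title to a disc
```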

@thp

Collaborator

thp commented Mar 23, 2020

Furthermore, when putting the titled disc into an MZ-R909 and using an RM-MC11EL remote, both the device's display and the remote show the title the same way, so Shift JIS seems to work fine for at least those specific characters (though there might be some overlap between the different JIS encodings, so I'm not sure which JIS encoding is the "correct" one).

@ticky

Contributor

ticky commented Mar 23, 2020

Yep, for half-width katakana like the string I pasted for you it’ll likely work everywhere; those are present at the same code points across both JIS X0201 and Shift_JIS.

@mistydemeo

Contributor Author

mistydemeo commented Mar 23, 2020

Shift JIS is a superset of JIS X 0201. All single-byte codepoints in JIS X 0201 are the same in Shift JIS.
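This can be spot-checked with Python's `shift_jis` codec. A short Python 3 sketch that decodes each single-byte position one byte at a time: the halfwidth katakana block 0xA1 to 0xDF should map to U+FF61 to U+FF9F and round-trip as one byte, and the printable 0x20 to 0x7E range should decode to one character per byte. It checks Python's codec rather than the JIS standards themselves.

```python
def check_single_byte_compat() -> None:
    # Halfwidth katakana block: JIS X 0201 bytes 0xA1..0xDF correspond to
    # U+FF61..U+FF9F, and Shift JIS keeps them as single bytes.
    for b in range(0xA1, 0xE0):
        ch = bytes([b]).decode("shift_jis")
        assert ch == chr(0xFF61 + b - 0xA1), hex(b)
        assert ch.encode("shift_jis") == bytes([b]), hex(b)
    # The printable 0x20..0x7E range also decodes to one character per byte.
    for b in range(0x20, 0x7F):
        assert len(bytes([b]).decode("shift_jis")) == 1, hex(b)


check_single_byte_compat()
print("ok")
```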

@thp thp merged commit 20c6c57 into linux-minidisc:master Mar 27, 2020
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
3 participants