Half-width katakana, a couple kanji, and a lot of revisions #151
bluetoad07 wants to merge 9 commits into the-moonwitch:main from bluetoad07:various_jp_additions
The rationale to include the Also @bluetoad07, did you include the
Yes, I included those. I think I copy-pasted the wrong list there.
If you ever feel like doing a lot more, you could support JIS X 0208, which is the standard used by pretty much all legacy Japanese computers, such as the NEC PC-98 and IBM/MS DOS/V. Here is the list of JIS X 0208 Level 1 kanji: JIS X 0208 Level 2: Special characters, NEC (PC-98), and IBM (DOS/V) extensions: If you include all of these, you have all the kanji available in IBM/Microsoft code page 932, which is the DOS/V and legacy Windows (3.x/9x) character set. I think we can consider that the exhaustive kanji set for legacy computers, as that was the dominant system from 1990 to 2000, and it includes all the NEC PC-98 kanji as well. Of course, that's a whole 6716 kanji to add… Edit: Forgot 1 kanji (
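For anyone sizing up this task, the JIS X 0208 kanji can be enumerated straight from Python's standard `euc_jp` codec instead of being copied from a list. This is a minimal sketch, assuming the kuten layout of JIS X 0208 (Level 1 in rows 16 to 47, Level 2 in rows 48 to 84, 94 cells per row) and that the codec raises on unassigned cells; the function name is hypothetical. The standard defines 2965 Level 1 and 3390 Level 2 kanji (6355 in total), so those are the counts to expect.

```python
def jis_x0208_chars(first_row, last_row):
    """Collect the JIS X 0208 characters in kuten rows first_row..last_row.

    Each kuten cell (row, cell) becomes the EUC-JP byte pair
    (0xA0 + row, 0xA0 + cell), which Python's euc_jp codec decodes.
    Cells the codec rejects are unassigned and are skipped.
    """
    chars = []
    for row in range(first_row, last_row + 1):
        for cell in range(1, 95):  # every row has 94 cells
            raw = bytes([0xA0 + row, 0xA0 + cell])
            try:
                chars.append(raw.decode("euc_jp"))
            except UnicodeDecodeError:
                pass  # unassigned cell, e.g. the tail of rows 47 and 84
    return chars


level1 = jis_x0208_chars(16, 47)  # Level 1: ordered by reading
level2 = jis_x0208_chars(48, 84)  # Level 2: ordered by radical, then stroke count
print(len(level1), len(level2), len(level1) + len(level2))
```

The NEC and IBM extensions are not part of JIS X 0208, so they would have to come from a CP932-aware source instead.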
It'll be a long time from now, but I'll try to support as many as I can eventually! Thanks for the lists!
Made some small touch-ups to existing characters while this is still open. Mostly subtle changes, but う, ぅ, ゔ, ウ, ゥ, and ヴ all have new designs for consistency and clarity (the U sounds were bothering me today, apparently).
Okay, now I'm done.
This also fixes カ/ガ being swapped in current releases, thanks :)
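A swap like the カ/ガ one can be caught with a rough heuristic: a voiced kana is its base kana plus a dakuten or handakuten mark, so its glyph usually has more lit pixels than the base. The sketch below is a hypothetical check, not part of Cozette's build; it finds voiced/base pairs through Unicode canonical decomposition (NFD) and flags pairs where that expectation is reversed. `ink_counts` is an assumed input (lit-pixel count per character) that would have to be extracted from the font separately.

```python
import unicodedata

DAKUTEN = "\u3099"     # COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK
HANDAKUTEN = "\u309A"  # COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK


def voiced_pairs(first=0x3041, last=0x30FF):
    """Yield (voiced, base) kana pairs such as ('ガ', 'カ') from NFD decomposition."""
    for cp in range(first, last + 1):
        ch = chr(cp)
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) == 2 and decomposed[1] in (DAKUTEN, HANDAKUTEN):
            yield ch, decomposed[0]


def suspicious_pairs(ink_counts):
    """Flag pairs whose voiced glyph has no more lit pixels than its base glyph.

    ink_counts (hypothetical input) maps a character to its lit-pixel count.
    """
    return [
        (voiced, base)
        for voiced, base in voiced_pairs()
        if voiced in ink_counts
        and base in ink_counts
        and ink_counts[voiced] <= ink_counts[base]
    ]


print(suspicious_pairs({"カ": 20, "ガ": 15}))  # prints [('ガ', 'カ')]
```

Designs that shrink the base shape to make room for the marks can trip this check, so a flagged pair is a prompt to look at the glyphs, not proof of a swap.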
A couple more consistency changes; I also redid a lot of the half-width katakana with better designs. I'm going to update the root comment shortly to show the new things.
You forgot to include in your test grid the more recent foreign sounds available in the full-width Katakana block: ヴヵヶヷヸヹヺ (
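A test grid that walks whole Unicode ranges, rather than a hand-typed list, makes characters like these harder to miss. This is a minimal sketch with a hypothetical `block_grid` helper. It uses the full-width Katakana block (U+30A0 to U+30FF) and the half-width katakana and punctuation in Halfwidth and Fullwidth Forms (U+FF61 to U+FF9F).

```python
def block_grid(first, last, per_row=16):
    """Lay out every code point from first to last, per_row per line.

    Each line is prefixed with the code point of its first cell.
    """
    lines = []
    for row_start in range(first, last + 1, per_row):
        row_end = min(row_start + per_row, last + 1)
        cells = " ".join(chr(cp) for cp in range(row_start, row_end))
        lines.append(f"U+{row_start:04X}  {cells}")
    return lines


# Full-width Katakana block, then the half-width katakana range.
for line in block_grid(0x30A0, 0x30FF) + block_grid(0xFF61, 0xFF9F):
    print(line)
```

The last Katakana row, starting at U+30F0, is where ヴヵヶヷヸヹヺ live, so a grid built this way cannot silently drop them.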
Fixed! Thanks for pointing that out :)
I went through all of CP932 and realized I forgot 5 kanji ( Also, here is the list of all the other characters that are currently missing from Cozette to complete the CP932 support: This means these characters and the 6720 kanji from the three groups above would make all of CP932 supported. I already included So that should leave us with I also built a complete charmap of CP932 so it's easier to find all the characters: Edit: I included the large circle and the circled digits and kanji in my PR #144.
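Besides a charmap, the list of characters a font still lacks can be computed directly. This is a sketch, assuming Python's built-in `cp932` codec as the reference mapping and a hypothetical `font_codepoints` set taken from the font's character map (for a TrueType build, the keys of fontTools' `TTFont.getBestCmap()` are one option). Private-use results, such as the user-defined area, are filtered out because they are not real characters to design.

```python
import unicodedata


def cp932_repertoire():
    """Return the set of Unicode characters reachable from code page 932.

    Single bytes: printable ASCII and half-width katakana (0xA1-0xDF).
    Double bytes: lead 0x81-0x9F or 0xE0-0xFC, trail 0x40-0x7E or 0x80-0xFC.
    """
    chars = {bytes([b]).decode("cp932") for b in range(0x20, 0x7F)}
    chars |= {bytes([b]).decode("cp932") for b in range(0xA1, 0xE0)}
    leads = list(range(0x81, 0xA0)) + list(range(0xE0, 0xFD))
    trails = list(range(0x40, 0x7F)) + list(range(0x80, 0xFD))
    for lead in leads:
        for trail in trails:
            try:
                ch = bytes([lead, trail]).decode("cp932")
            except UnicodeDecodeError:
                continue  # unassigned code
            if unicodedata.category(ch) != "Co":  # drop private-use mappings
                chars.add(ch)
    return chars


def missing_from_font(font_codepoints):
    """List CP932 characters whose code points the font does not cover, sorted."""
    return sorted(ch for ch in cp932_repertoire() if ord(ch) not in font_codepoints)
```

Because the NEC-selected IBM extensions and the IBM extensions map to the same Unicode characters, collecting into a set counts each duplicate only once.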
Thank you!! The map makes it much easier to visualize what's left. In my next PR, which will be for the JLPT N5 kanji set, I will hopefully include all the remaining non-kanji you listed, as well as Also, is the ideographic space not already in the font? It doesn't show up in the changelog, but it has worked properly and appeared correctly in FontForge since my characters were first added.
Yeah, sorry, I based my list on the Also, if you want more tools to help you with your enormous task, I just finished building a FIGlet font that contains all the CP932 characters from the original DOS/V using octants. This makes it quick and easy to display any of the Japanese characters in your terminal and check how they were designed back then as 16×16 glyphs: You can get my
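The idea behind such a terminal preview is to pack several pixels into each character cell. Unicode's octant characters (added in Unicode 16.0) cover 2×4 pixels per cell, so a 16×16 glyph fits in 8×4 cells, but the octant set reuses some existing block-element characters, so its mapping is not a simple offset. This sketch swaps in the simpler 2×2 quadrant block elements to show the same technique (16×16 becomes 8×8 cells); it does not reproduce the FIGlet font's actual encoding, and the `#`/`.` bitmap format is an assumption.

```python
# Quadrant characters indexed by a 4-bit mask:
# bit 1 = upper-left, 2 = upper-right, 4 = lower-left, 8 = lower-right.
QUADRANTS = (
    " \u2598\u259D\u2580\u2596\u258C\u259E\u259B"
    "\u2597\u259A\u2590\u259C\u2584\u2599\u259F\u2588"
)


def render_quadrants(rows, ink="#"):
    """Render a bitmap (list of strings, ink marks a lit pixel) with 2x2 quadrant blocks.

    Odd widths or heights are padded with blank pixels.
    """
    height = len(rows) + len(rows) % 2
    width = max(len(r) for r in rows)
    width += width % 2

    def lit(x, y):
        return y < len(rows) and x < len(rows[y]) and rows[y][x] == ink

    out = []
    for y in range(0, height, 2):
        line = ""
        for x in range(0, width, 2):
            mask = (lit(x, y) * 1 + lit(x + 1, y) * 2
                    + lit(x, y + 1) * 4 + lit(x + 1, y + 1) * 8)
            line += QUADRANTS[mask]
        out.append(line)
    return out


for line in render_quadrants(["#..#", ".##.", ".##.", "#..#"]):
    print(line)
```

Octants halve the vertical cell count again, which is why they suit 16×16 CJK glyphs in a terminal with tall character cells.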
Actually, I was wrong; there was only 1 extra kanji I finished examining the whole set of glyphs from DOS/V and can confirm this list contains them all. Everything matches perfectly! I suspect
I'll add Edit: Now in my PR. I forgot to mention it in the commit, but it was bundled in
I've been thinking about this CJK thing… Considering:
I think it could be smart to create a separate It would probably also make merging pull requests easier, since someone working on kanji would not be affected by other PRs touching the main
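To make the "merge files at build time" idea concrete, here is a minimal sketch assuming each glyph set lived in its own BDF file. That layout is hypothetical; it does not reconstruct the elided file name above or assume anything about Cozette's actual source format. It keeps the base font's header, adds glyphs from the extra files by their `ENCODING` value (the base wins on conflicts), and regenerates the `CHARS` count. It does not reconcile differing bounding boxes or properties between files.

```python
def split_bdf(text):
    """Split BDF source into header lines and {encoding: STARTCHAR..ENDCHAR lines}.

    Assumes every glyph has a unique, non-negative ENCODING; unencoded
    glyphs (ENCODING -1) would overwrite each other in this sketch.
    """
    header, glyphs, block = [], {}, None
    for line in text.splitlines():
        if line.startswith("STARTCHAR"):
            block = [line]
        elif block is not None:
            block.append(line)
            if line.startswith("ENDCHAR"):
                enc_line = next(l for l in block if l.startswith("ENCODING"))
                glyphs[int(enc_line.split()[1])] = block
                block = None
        elif not line.startswith(("CHARS ", "ENDFONT")):
            header.append(line)  # CHARS and ENDFONT are regenerated on output
    return header, glyphs


def merge_bdf(base_text, *extra_texts):
    """Merge glyphs from extra BDF sources into the base font; the base wins on conflicts."""
    header, glyphs = split_bdf(base_text)
    for extra in extra_texts:
        for enc, block in split_bdf(extra)[1].items():
            glyphs.setdefault(enc, block)
    lines = header + [f"CHARS {len(glyphs)}"]
    for enc in sorted(glyphs):
        lines.extend(glyphs[enc])
    lines.append("ENDFONT")
    return "\n".join(lines) + "\n"
```

A lightweight build would then simply pass fewer extra files, which is the "merge rather than prune" advantage described above.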
I like this idea! I don't intend to support the full 6716 in JIS X 0208, partly because some kanji are definitely too dense to represent at this size and partly because I don't want to. But I am planning to do at least ~300, so this would help a lot with organization and minimizing merging issues. I do think some of the more common kanji should remain in the base font, though, maybe just what I've added so far. What do you think?
I'm pretty sure there are stroke-reduction rules that make it possible to include almost all of them, and DOS/V included all of them as 16×16 glyphs with 1px on the left and 1px at the bottom that always seem to be clear for spacing, so they're 15×15, not that far off from 11×12 if you make them as large as possible. It is common for kana and kanji to be larger than Latin letters, so that could be acceptable to maximize the chances of being able to include them all someday. As for the number you're working on, 300 is already a nice number of glyphs to design. I'm thinking more about the long term: for example, if Japanese users decide they want them all and 22 users contribute 300 glyphs each, you suddenly have the whole JIS X 0208 and need an adequate build system if you want to build lightweight versions of the font. It's easier to merge files than to prune out sets of characters that are in different Unicode blocks to reduce the set of characters. @slavfox will probably look at our discussions next week and have the final say on both design decisions and the build system.
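The observation that DOS/V glyphs always leave the left column and bottom row blank can be checked mechanically by computing each glyph's ink bounding box. This is a minimal sketch over hypothetical `#`/`.` bitmaps (how the glyphs are extracted from a real font is left out). If every glyph passes, the usable drawing area of a 16×16 cell is 15×15.

```python
def ink_bbox(rows, ink="#"):
    """Return the inclusive (left, top, right, bottom) box of lit pixels, or None if blank."""
    xs = [x for row in rows for x, px in enumerate(row) if px == ink]
    ys = [y for y, row in enumerate(rows) if ink in row]
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


def keeps_spacing_gap(rows, ink="#"):
    """True if the leftmost column and the bottom row are blank (the 1px spacing gap)."""
    box = ink_bbox(rows, ink)
    return box is None or (box[0] >= 1 and box[3] <= len(rows) - 2)


# A 16x16 cell inked everywhere except column 0 and the last row.
glyph = ["." + "#" * 15] * 15 + ["." * 16]
print(ink_bbox(glyph), keeps_spacing_gap(glyph))
```

Running the same check over a whole font would confirm (or refute) that the gap is consistent before choosing a target size.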
BTW, I noticed the existing
Microsoft "MS Gothic" at 12pt can display the whole set of JIS X 0208 kanji as 11×11 pixels. You can You can follow the Note that the font is copyrighted; this is just to show that if we pick that size, it's possible to achieve the complete JIS X 0208 with some effort.
Yeah, I noticed that with the units too. I will redesign that when I do those. By the way, thank you very much for your help with this!! I am still early on in learning Japanese, and your knowledge of both the language and its codecs is incredibly useful. Do you know where I can find more info on kanji stroke reduction? I can't seem to find much on it. Edit: You answered my question as I typed it, thanks!
Unfortunately, I don't have any documentation on stroke reduction; I only saw it mentioned on the Wikipedia page about Meiryo. MS Gothic and MS Mincho were the original Windows 3.1 Japanese fonts, and because of the low resolution of screens when they were designed, they used embedded bitmaps and apparently applied stroke reduction as part of their TrueType hinting (guides that displace pixels to where they look better instead of where they should be according to the original character shape) to stay legible with a limited number of pixels. Meiryo, on the other hand, is a newer font that takes advantage of higher resolutions and ClearType (Microsoft's sub-pixel rendering), so it doesn't contain embedded bitmaps and doesn't adjust the strokes for lower resolutions. I guess one way to find out which strokes are OK to simplify would be to experiment with MS Gothic at several resolutions and see how each kanji behaves when it reaches the embedded bitmaps. I wonder if there is a public domain font that contains similarly optimized 11×11 embedded bitmaps. Maybe Kochi Gothic/Mincho is worth investigating.
I did some more tests and managed to merge the Shinonome12/東雲12 public domain font into Cozette. It definitely looks better than anything I could design myself to bring 6355 kanji to Cozette, and adding the 361 missing NEC+IBM extensions in the same style to achieve full CP932 compatibility seems like a reachable goal. Another candidate would be Naga10, as 10×10 might better fit the size of Cozette's Latin characters, but that's yet another format; I'll have a look when I have some free time. Also, looking at other fonts, kanji often take up almost the whole square. I agree with your observation about kanji vs. kana sizes, but kana are also often larger than Latin alphanumerics. So maybe the sweet spot for legibility would be to use 11×11 kanji and make the kana slightly larger, between Latin and kanji.
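One step such a merge needs is re-encoding: JIS-registry bitmap fonts typically store JIS X 0208 codes (for example 0x3021 for 亜) in their BDF `ENCODING` lines, while a Unicode font needs code points. This sketch assumes the source BDF uses such JIS codes; the helper names and the `uniXXXX` glyph naming are hypothetical. Setting the high bit of both JIS bytes yields EUC-JP, which Python's `euc_jp` codec decodes.

```python
def jis_to_unicode(jis_code):
    """Map a JIS X 0208 code such as 0x3021 to its Unicode character.

    Setting the high bit of both bytes turns the JIS code into EUC-JP.
    """
    raw = bytes([(jis_code >> 8) | 0x80, (jis_code & 0xFF) | 0x80])
    return raw.decode("euc_jp")


def reencode_bdf_char(block_lines):
    """Rewrite one BDF STARTCHAR..ENDCHAR block from a JIS encoding to Unicode."""
    enc_line = next(l for l in block_lines if l.startswith("ENCODING"))
    ch = jis_to_unicode(int(enc_line.split()[1]))
    out = []
    for line in block_lines:
        if line.startswith("STARTCHAR"):
            line = f"STARTCHAR uni{ord(ch):04X}"
        elif line.startswith("ENCODING"):
            line = f"ENCODING {ord(ch)}"
        out.append(line)
    return out


print(reencode_bdf_char(["STARTCHAR 3021", "ENCODING 12321", "ENDCHAR"]))
```

The bitmap rows themselves are left untouched; only the glyph's identity changes, so any resizing toward an 11×11 or 9×9 design would be a separate step.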
I considered making kanji bigger, as most fonts I've seen have them around 10% bigger than kana glyphs. But at 11px, they would be 22% bigger, and I find the size difference somewhat jarring. I've tried 10px too, but that puts them off center and looks odd as well, so I ultimately decided against it. Luckily, today, in my own quest for a 9px font to use as a reference, I ran across this Stack Exchange post, which mentions a 9px font used in Etrian Odyssey on the DS that is apparently easy to read, especially compared to slightly smaller sizes. Unfortunately, I cannot track this font down, but it's still reassuring that it's actually possible to make a full, readable set at this size.
The other fonts you've mentioned will be very helpful for learning stroke reduction patterns and techniques for making efficient use of the space I have, but I definitely want to stick with 9×9 glyphs. Thank you!
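Working through the size ratios above, with 9 px kana as the baseline: the roughly 10% increase seen in other fonts would call for 9.9 px, which a bitmap grid cannot draw, so the whole-pixel options either overshoot a little or a lot.

$$
\frac{10\ \text{px}}{9\ \text{px}} \approx 1.11 \;(\approx 11\%\ \text{larger}), \qquad
\frac{11\ \text{px}}{9\ \text{px}} \approx 1.22 \;(\approx 22\%\ \text{larger}), \qquad
9\ \text{px} \times 1.10 = 9.9\ \text{px}
$$

So 10 px is the closest match to the usual proportion, which leaves centering as the deciding issue, while 11 px is about twice the typical increase.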
I completely agree. Naga10's original website at http://www.vector.co.jp/authors/VA013391/ is not online anymore. I believe that Naga10 was probably used as the base or inspiration for the font in your screenshot, but it isn't the same font. Some kanji are identical ( We could even support both Naga10 (or an improved version of it) and Shinonome12 as options, as I believe Shinonome12 will stay more legible for people less used to some characters. With my idea of having kanji in separate files we can merge, this could be easily achieved, and the work to import Shinonome12 is already done; I have a file ready. Keep in mind that old fonts might be public domain or free software because, back in the early days of the Internet, users wanted information to be freely available and would contribute to such projects to make life better for everyone, but a font used in a DS game might come from a commercial company and be unusable because of licensing terms. I believe the Naga10 license could work for Cozette; here's a translation of its terms: I'm not well versed in open source licensing details, though, so it would be good to have someone more knowledgeable look at it. Anyway, this is all interesting and good news in general. I didn't even think we'd be able to support all 6355 JIS Level 1 & 2 kanji, and now I have one design working and probably soon another size option. At least we'll be able to experiment, pick the best size from the start, and get on a good track if we design specific glyphs.
Hi! Is this ready for me to start merging?
Yep! Thank you!
Merged manually and releasing now, thank you!

I actually saved the changes this time :]
Anyways, more JP stuff from me!
This PR includes:
Here's a map with all the updated kana designs, as well as the new kanji:

And the new half-width katakana:

The kanji were surprisingly fun to make, and given how derivative they are of each other I will absolutely do a lot more soon!
Thanks! <3