
Assigning a unicode code #30

Open
philipt18 opened this issue Mar 16, 2016 · 22 comments

Comments

@philipt18

Is there a way to assign a unicode code to a key? I actually want to assign an emoji to a key, and figure the easiest way to do that is to assign the unicode code for that emoji, but I'm not sure how to do it. Thanks.

@haata
Member

haata commented Mar 16, 2016

Oh, I so want this too. tl;dr: depending on the OS, there might be a simple way.

Otherwise, read this for how I plan on supporting this in general -> https://deskthority.net/workshop-f7/hid-io-sideband-input-device-protocol-spec-t12927.html

@philipt18
Author

That's an impressive bit of work. What is the 'simple way' you mentioned? For which OSes?

Does the USB HID standard support sending 16-bit unicode characters with HID LANGIDs? I might be misreading this, but is there some way to use this mechanism to support sending unicode directly?

Right now I'm planning on building a keyboard with my daughter to use initially with a Raspberry Pi 3 hooked up to an old monitor. I was hoping to turn the number pad into an emoji pad, and have custom keycaps printed up. Eventually she'll graduate from the Pi to something else more mainstream, but I'd like her to be able to keep using the keyboard.

@dhardy

dhardy commented Mar 16, 2016

I don't know, but 16-bit numbers have not been sufficient for all Unicode code points for a while now. Additionally "characters" can be composed of multiple code-points (e.g. combining diacritics). A better option would be to let a key send any UTF-8 sequence, possibly with a maximum length (this would also allow a key to type short sequences like 'th', 'the', 'and').

Don't know what you mean by LANGIDs; the whole point of Unicode is that you don't need language identifiers.

@philipt18
Author

So I e-mailed the USB HID e-mail address on their site asking if there was a plan to support sending Unicode codes via HID, and amazingly already got the following response from Steve McGowan at Intel:


Hi Philip,

The short answer is 'no'. This is the first time that I have heard it suggested.

When we originally defined the HID keyboard support, we needed a means of easily transitioning from PS2 scan codes to HID Usages. The values of PS2 scan codes were based on the position of the various keys in the key matrix of the keyboard, and the mapping of PS2 scan codes to ASCII characters was not intuitive. The HID keyboard Usages define a 1:1 mapping of the scan codes, but HID provides a more intuitive mapping to ASCII characters.

A problem at the time was that keyboard vendors used the same silicon in all their keyboards, no matter what country they shipped them to. They would simply populate the keyboards with different key tops depending on the country they were shipping to. The USB Country Code was supposed to be the way for a vendor to identify the key tops that a keyboard was populated with, but because keyboard silicon at the time was severely resource-limited compared to the technology available today, many vendors used the same Country Code for all their keyboards, irrespective of the key tops that were used.

At the time that HID Keyboards were defined (1995) we used Nadine Kano's book "Developing International Software for Windows 95 and Windows NT" as the reference for translating country codes/scan codes in to Unicode. Her book is now out of print, but it is very easy to pick up cheaply. I realize that this is not an optimal solution, but HID keyboard definition has not changed for over 20 years.

However, with today's technology, the capabilities of low-cost USB keyboard silicon could be greatly enhanced, i.e. it could directly generate Unicode characters rather than simplified HID Usages that need to be translated to Unicode.

You could use "vendor defined" usages to define a proprietary method to support HID Unicode keyboards.

Or if you wanted to standardize it, write a HID Review Request to get the new Usages added to the HID Usage Table document. The requirements for a Review Request are that at least 3 companies need to sponsor it, and at least one of them needs to be a USB-IF member.

In either case, a driver would need to be written to route the Unicode characters to existing keyboard drivers on each OS. Initially, vendors of HID Unicode keyboards would be responsible for providing drivers, but over time, assuming Unicode keyboards became popular, OSes would offer driver support.

Regards,

Steve

@haata
Member

haata commented Mar 16, 2016

Neat! Well, my company (Input Club) would definitely support some sort of Unicode proposal (though we don't really have the money to become a USB-IF member right now...).

The way I want to treat the HID-IO spec is as a place to test out "interesting" functionality that isn't possible using HID, or where adoption by OS vendors is rather slow. However, if additional HID support were added to the spec (say, Unicode), I'd fall back to that if available.

One other interesting issue with Unicode is application support. How do you deal with press/release of Unicode characters in an application? I'm sure most programs would blow up. So my plan right now is just to "enter the character" rather than treat them as normal keys (I can also emulate repeat rate fairly easily).

What is the 'simple way' you mentioned? For which OSes?

Most OSes have a way to enter a "compose mode" where you can just enter the number of the Unicode character. In theory you could set up a KLL macro that entered all these keys quickly and you'd get a Unicode character. Unfortunately this sequence is OS-dependent (if available at all).
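As a concrete illustration, here is a minimal C sketch of such a macro, assuming the IBus/GTK convention on Linux (Ctrl+Shift+U, the code point in hex, then Enter); send_tap() is a hypothetical firmware helper, not part of any real API:

#include <stdint.h>

/* Hypothetical firmware hook: press and release one key with the given modifiers. */
void send_tap(uint8_t modifiers, uint8_t usage);

#define MOD_LCTRL  0x01
#define MOD_LSHIFT 0x02

/* HID Keyboard/Keypad Page usages: '1'..'9' are 0x1E..0x26, '0' is 0x27, 'a'..'f' are 0x04..0x09. */
static uint8_t hex_digit_usage(uint8_t d) {
    if (d == 0) return 0x27;           /* '0' */
    if (d <= 9) return 0x1E + (d - 1); /* '1'..'9' */
    return 0x04 + (d - 10);            /* 'a'..'f' */
}

void type_unicode_linux(uint32_t cp) {
    send_tap(MOD_LCTRL | MOD_LSHIFT, 0x18);   /* 'u' starts hex entry */
    int started = 0;
    for (int shift = 28; shift >= 0; shift -= 4) {
        uint8_t d = (cp >> shift) & 0xF;
        if (d || started || shift == 0) {     /* skip leading zeros */
            send_tap(0, hex_digit_usage(d));
            started = 1;
        }
    }
    send_tap(0, 0x28);                        /* Enter commits the character */
}

macOS (with the Unicode Hex Input layout) and Windows (Alt codes, or the registry-enabled hex numpad entry) each need a different sequence, which is exactly the portability problem.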

Does the USB HID standard support sending 16-bit unicode characters with HID LANGIDs?

Not really (as confirmed in the email from Intel). Even if HID could be coerced into sending UTF-8 data, you still need a HID driver on the other end to deal with all this, regardless of how it's sent over USB (and Steve's in agreement with me on this one :S )

In any case, unicode support IS possible. But it's a big task (especially on the host computer side).
If HID-IO is any indication, I'm committed to getting unicode support (by any means necessary), but it may take me years at this rate without help (among all the other keyboard stuff I'm doing).

@jrus

jrus commented Mar 16, 2016

In theory you could setup a KLL macro that just entered all these keys quickly and you'd get a unicode character.

I’ve done this for specific letters on some keyboards. As you say, it works okay, as long as you’re only targeting one OS. Also, it doesn’t support arbitrary Unicode characters.

@philipt18
Author

The trick is getting a USB-IF member to support what you're doing. Since you need three companies but only one USB-IF member company, if you get someone who is already a member to work with you, you can do it without having to front the money for membership.

@philipt18
Author

There are over 800 member companies (https://www.usb.org/members_landing/directory?complex_search_companies=1) so finding one or two that might find a benefit from being able to send unicode directly shouldn't be too hard, right?

@haata
Member

haata commented Mar 17, 2016

In theory, yeah. I'm starting to poke around my contacts to see if I can utilize some of them. This is only half the battle, still got to convince them that what we're doing is worthwhile and worth their time.

Worst case, I can get Input Club to become a USB-IF member (once we have money).


@jackhumbert

@fredizzimo mentioned this in QMK - I'm not sure how much it means, but OLKB would be happy to support this too :)

@heangfat

heangfat commented Jun 4, 2018

Hello, how is your progress on the Unicode keyboard implementation?

I am about to start making a big keyboard with about 500 to 600 Unicode-character keys. From the Device Class Definition for HID and the HID Usage Tables documents, I understand that the range of Usage IDs for key code reports is 0 to 0xFFFF (HID Usage Tables, page 59), which is similar in size to the Unicode BMP; also, the large range from 0xE8 to 0xFFFF is unused.

For an OS-dependent solution, I can make my own OS layout after defining my own Usage IDs for key code reports (say, within the range 0x1000 to 0x1300).

There might be another way that uses both the Keyboard/Keypad Page (0x07) and the Unicode Page (0x10, page 108 of the HID Usage Tables), though this only supports Plane 0. But I am not sure whether it would still support the boot protocol.

@haata
Member

haata commented Jun 4, 2018

@heangfat neat!

I am going to be doing something a bit different. Instead of using HID descriptors, I'm implementing a datastream using Raw HID where packet data can be free form.
This isn't too much more work, as a custom driver is needed for every OS anyway, and for systems like Linux and macOS it is hard to override the default HID driver for a device. I worry a little bit about using the existing keyboard Usage Page because it might cause issues with some OSes and break cross-platform support (it might not, but it requires lots of testing). It also makes NKRO support virtually impossible because the packet size would be enormous (10KRO using 16-bit packets would require 20 bytes minimum in every packet, vs. 8-bit which is only 10 bytes).

(for boot protocol, you cannot change it much; you'll need to use an NKRO/non-boot descriptor for the keyboard to work in places like the BIOS)

Using this approach, I think the best idea would be to use a new Page entirely. Use a fairly high number so it doesn't conflict with other pages (I believe 0x84 is the most recent addition; Review Request 79; http://www.usb.org/developers/hidpage/).

Now, the next problem is how to format the protocol. Personally, I think sending UTF-8 is probably the safest (but I'm not an expert on this topic). The main problem with UTF-8 is that it isn't fixed length (which is also an advantage, but I'll get into that next). USB descriptors expect that everything has a pre-defined length. This makes it easy to write drivers and design hardware. However, it makes strings very hard to design for. AFAICT, USB HID does not have a field specified for variable-length data (I think it would be hard to define a custom field in this case).

The advantage of UTF-8 (or UTF-16 or UTF-32) being variable length is that new languages can be added over time without the character set becoming obsolete. UCS-2 only supports 65k characters. There are currently 136,755 total characters defined, so UCS-2 doesn't even cover half of them.

I haven't thought about it a ton, but here's how I would probably deal with the variable length of UTF-8 + USB HID.

  1. Only send "changes" in state. (Conventional USB HID keyboards send all the active keys in boot mode and all the keys all the time in NKRO mode.)
  2. When the OS gets confused (like missing a character; this happens due to a bad USB cable or a slow OS), it can request an update for that character. The active counter is used to make sure such missed changes can be detected.
  3. Configurable update packet size. It is useful on some keyboards to only have to send 1 character at a time, but others might be able to handle packing a 64-byte USB 2.0 FS packet.

Suggested format
Keyboard -> Host

<active counter:16 bit><packet type:2 bit><number of characters in packet:6 bit><array buffer for unicode data:size depends on implementation>

Array Buffer
[<size of character:8 bit><unicode character:variable>...]

Active counter = Incremented on each Unicode symbol add or removal. Rolls back to 0 when it overflows. Think of it like a version number, where each Unicode character change should increment the version.
Number of characters in packet = Total number of characters in this packet; 64 characters should be sufficient.
Packet type = Kind of packet
  00 - Add Unicode character
  01 - Remove Unicode character
  10 - Update Unicode character (from OS request, or keyboard forced update)
  11 - Reserved
Size of character = Size of Unicode character in bytes
Unicode character = UTF-8 symbol in as many bytes as are needed (as long as it can fit inside the USB packet)

Host -> Keyboard

<Request update>

It might also be possible to use SET_REPORT, but I wouldn't recommend it.
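To make the layout concrete, here is a minimal C sketch of the keyboard-to-host packing; the bit order within the type/count byte, the little-endian counter, and the pack_report() helper are all assumptions on top of the format above:

#include <stdint.h>
#include <string.h>

enum packet_type { PKT_ADD = 0, PKT_REMOVE = 1, PKT_UPDATE = 2 };

/* Pack one report into buf; returns bytes written, or 0 if it doesn't fit. */
size_t pack_report(uint8_t *buf, size_t buflen,
                   uint16_t active_counter, enum packet_type type,
                   const uint8_t *const *utf8_chars, const uint8_t *sizes,
                   uint8_t nchars) {
    size_t pos = 0;
    if (buflen < 3 || nchars > 63) return 0;  /* count field is 6 bits */
    buf[pos++] = active_counter & 0xFF;       /* active counter, little-endian */
    buf[pos++] = active_counter >> 8;
    buf[pos++] = (uint8_t)(((type & 0x3) << 6) | (nchars & 0x3F));
    for (uint8_t i = 0; i < nchars; i++) {    /* array buffer entries */
        if (pos + 1 + sizes[i] > buflen) return 0;
        buf[pos++] = sizes[i];                /* size of character in bytes */
        memcpy(&buf[pos], utf8_chars[i], sizes[i]);
        pos += sizes[i];
    }
    return pos;
}

With a 64-byte USB 2.0 FS report, the 3-byte header leaves 61 bytes for size-prefixed UTF-8 entries.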

These are just some of my thoughts, I'm happy to discuss further. I'm likely going to implement something like this with a custom driver first, but I'm glad to help you work on this.

@heangfat

heangfat commented Jun 5, 2018

@haata Cool! Defining a new page is a great idea.

I do need some keys with Unicode characters outside the BMP (UCS-2). The second way I suggested (using the Unicode Page 0x10) would cause some small issues.

For the encoding, I would still suggest fixed length, but not UTF-32. We may trim the useless leading 0x00 byte of UTF-32 to make it a 3-byte “UTF-24”. Theoretically this “UTF-24” can have 256 planes, which is far more than Unicode's 17 planes; thus the range 0 to 0x10FFFF is defined for Unicode's 17 planes, and the range from 0x110000 up to 0xFFFFFF is reserved. At the application level, the value of a “UTF-24” (or UTF-32) code is simply equal to the Unicode code point itself, with no surrogates involved.
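A "UTF-24" codec really is trivial; a sketch in C (the little-endian byte order here is an assumption, see the LE/BE discussion below):

#include <stdint.h>

void utf24_encode(uint32_t cp, uint8_t out[3]) {
    out[0] = cp & 0xFF;
    out[1] = (cp >> 8) & 0xFF;
    out[2] = (cp >> 16) & 0xFF;
}

uint32_t utf24_decode(const uint8_t in[3]) {
    /* Values above 0x10FFFF fall in the reserved range described above. */
    return (uint32_t)in[0] | ((uint32_t)in[1] << 8) | ((uint32_t)in[2] << 16);
}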

But if you prefer variable length, I suggest UTF-16 instead of UTF-8, as the purpose is a non-ASCII keyboard. Within Plane 0, UTF-16 ensures the length of each code is fixed at 2 bytes with exactly the same value as the code point itself, while UTF-8 varies from 1 to 3 bytes. Outside the BMP, the length of UTF-16 is fixed at 4 bytes, while UTF-8 is also 4 bytes (and theoretically up to 6 bytes in its original 31-bit design). UTF-16 would be more economical to implement, more friendly to developers, and more robust for users than UTF-8.
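For reference, the byte counts behind that comparison (modern RFC 3629 UTF-8, which caps sequences at 4 bytes):

#include <stdint.h>

int utf8_len(uint32_t cp) {
    if (cp < 0x80)    return 1;  /* ASCII */
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;  /* rest of the BMP */
    return 4;                    /* Planes 1-16 */
}
/* UTF-16: 2 bytes anywhere in the BMP, 4 bytes (a surrogate pair) beyond it. */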

I guess this is not boot-compatible, that is, not usable in BIOS and DOS. Do correct me if I am wrong.

For the format, let me study and think about it before suggesting.

By the way, have you tried using the existing Keyboard Page and/or the Unicode Page? I'm curious about which issues it might cause on which OSes.

Thank you for being willing to help me, and I'm also glad to help you with my knowledge. Let's keep each other updated.

@haata
Member

haata commented Jun 5, 2018

Yeah, I'm ok with UTF-32. Unicode seems to be ok with it, so it should be ok for at least a little while, haha. And like you said, not being variable will make this so much easier.

UTF-24 isn't really a standard, so it will be harder to work with (i.e. more bugs).

The next decision is whether to use UTF-32LE or UTF-32BE. USB Descriptor string text (as per Intel, because they created USB) uses UTF-16LE encoding, so we should probably stick with UTF-32LE to be consistent. Thoughts?

As for boot-compatible, yeah, it's not possible to make anything boot compatible outside of 6 keys + 8 modifiers. So I wouldn't worry about it.

I haven't tried implementing a new page (or utilizing an old page). My suspicion is that augmenting/adapting the standard OS HID drivers will be difficult. Linux may be the easiest, if only because all of the code is available, and you can technically modify anything. For macOS much of the code for the HID implementation is available online, but I'm not sure it can be modified/fixed. And I have no idea when it comes to Windows.

Also, once we can decide on a packet format, I think I will implement something similar (if not the same) in HID-IO for Unicode.

And, if we can get a working version of the new USB HID Usage Page, I can look into actually getting it proposed to USB-IF. That will take some work (probably money too), but having a prototype, use cases, and some companies using it will help a lot. If anything, all of the programmable keyboard firmwares will likely be quite interested in proper Unicode support.

@heangfat

heangfat commented Jun 5, 2018

BE is more natural than LE, as BE's byte order looks the same as Unicode's. Using LE, you need to reverse the byte order, at least for human reading. I am not sure whether USB Descriptor strings are LE or BE. It's good to keep consistent.

I just realized the Unicode Page 0x10 may be enough for all Unicode planes. In UTF-16, from Plane 1 (0x10000) onwards every code point is represented by a 4-byte surrogate pair: the first 2 bytes fall in the range 0xD800 to 0xDBFF, while the last 2 bytes fall in the range 0xDC00 to 0xDFFF. Normally, when applications supporting Plane 1 onwards see 2 bytes within the range 0xD800 to 0xDBFF, they do not treat this as a character but proceed to read the next 2 bytes, which must fall in the range 0xDC00 to 0xDFFF (otherwise the sequence is invalid). This is the difference between UTF-16 and UCS-2: UCS-2 just treats 0xD800 to 0xDFFF as normal characters.
In this case, when a key with code 0xHHHHH is pressed, it needs to send 4 bytes at one time. The report descriptor for the Unicode Page would then have to be customized to accept variable-length input.
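A minimal C sketch of the surrogate encoding just described:

#include <stdint.h>

/* Returns the number of 16-bit units written (1 for BMP, 2 beyond it). */
int utf16_encode(uint32_t cp, uint16_t out[2]) {
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                 /* BMP: identical to UCS-2 */
        return 1;
    }
    cp -= 0x10000;                             /* 20 bits remain */
    out[0] = 0xD800 | (uint16_t)(cp >> 10);    /* high surrogate: top 10 bits */
    out[1] = 0xDC00 | (uint16_t)(cp & 0x3FF);  /* low surrogate: bottom 10 bits */
    return 2;
}

For example, U+1F600 becomes the pair 0xD83D 0xDE00.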

And for our UTF-32 usage page, the simplest way would be to expand the existing Unicode Page 0x10 to a four-octet form. I agree that we should propose to USB-IF after we get prototypes working. I guess if we propose it, they will consider upgrading their Unicode Page 0x10 to a four-octet form directly. I will try commercializing my big keyboard in the future, provided the functionality can be achieved.

@dhardy

dhardy commented Jun 5, 2018

BE is more natural than LE

Depends on the CPU. With Intel you can just cast an LE-byte-array pointer to a 32- or 64-bit integer pointer. It's fairly arbitrary.
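For the wire format, that arbitrariness goes away if the firmware serializes byte-by-byte, which works on either kind of CPU; a sketch:

#include <stdint.h>

/* Write a code point as UTF-32LE, independent of host endianness. */
void utf32le_write(uint32_t cp, uint8_t out[4]) {
    out[0] = cp & 0xFF;
    out[1] = (cp >> 8) & 0xFF;
    out[2] = (cp >> 16) & 0xFF;
    out[3] = (cp >> 24) & 0xFF;
}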

@jordiorlando

@haata instead of a "change counter", how about just sending a checksum byte with each packet? It could be a hash of the current keyboard state; that way, if the OS misses a change, its hash will no longer match the keyboard's and it can request a full update.

@haata
Member

haata commented Jun 10, 2018

Yeah, I think I like that a bit better. The checksum could even be 8 bits, instead of 16 bits.
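A minimal sketch of that state checksum, assuming the keyboard keeps its active code points in an array; CRC-8 with polynomial 0x07 is an arbitrary choice here, any 8-bit hash both sides agree on would do:

#include <stddef.h>
#include <stdint.h>

uint8_t state_crc8(const uint32_t *active, size_t n) {
    uint8_t crc = 0;
    for (size_t i = 0; i < n; i++) {
        for (int b = 0; b < 4; b++) {          /* feed each code point LSB-first */
            crc ^= (uint8_t)((active[i] >> (8 * b)) & 0xFF);
            for (int j = 0; j < 8; j++)
                crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07) : (uint8_t)(crc << 1);
        }
    }
    return crc;  /* host recomputes this and requests a full update on mismatch */
}

Note the CRC is order-sensitive, so keyboard and host would have to keep the active list in an agreed canonical order (e.g. sorted) before hashing.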

As for LE vs BE. MCUs such as Cortex-M4 have instructions for endianness swapping as well (which is what I use for keyboards).

Also, not a link to the spec directly, but this explains what USB uses for string descriptors today.
zephyrproject-rtos/zephyr#6594

@petermarkley

petermarkley commented Dec 31, 2018

Well I'm a newbie in all this hardware talk and don't have money to contribute, but I'll add my voice to say I think what you guys are doing is cool!

I'm very sad about the news from Steve McGowan, but excited that we could be at the cutting edge of a new advancement in technology. Unicode blew my mind when I learned about it; I really appreciate it from an engineering perspective. More technology should use it.

My interest is for a fantasy world-building project that involves fictional languages. I'm using the Private Use Area (possible future entrant in the ConScript Unicode Registry? 😅), and it would make building and maintaining my languages so much easier if I could build a keyboard for them (not to mention for IPA ... real linguists could also benefit from that).

Looks like for now I'll have to suffer through a custom keyboard layout on my desktop config (or settle for an on-screen keyboard 😖) ... but I'm rooting for you guys!

@42sol-eu

Hi, I am also not so deep into the topic. I always thought that the problem lay in the keyboard itself; after owning an ErgoDox I learned differently. It is so sad that one of the most effective means of data entry is, to this day, so poorly developed in principle. The missing Unicode support in the standard keyboard protocol is a real setback for me.
Thanks for the great details in this thread.
Felix

@ncwhale

ncwhale commented Jun 14, 2020

How about just creating a keymap for a custom Unicode-input keyboard?

For Linux it's easy to do, and keymaps support Unicode:

https://wiki.archlinux.org/index.php/Linux_console/Keyboard_configuration#Creating_a_custom_keymap

For Windows, we would need to create a keyboard layout DLL for this keyboard to work, without writing a driver for it.
