How can I output Chinese with send? #45

Closed
lyonferris opened this issue May 29, 2019 · 10 comments
Labels
help wanted, question

Comments

@lyonferris

Hello,
How can I send Chinese text with send?

@spyoungtech spyoungtech added the question label May 29, 2019
@spyoungtech
Owner

Hmm. A quick search seems to indicate that AutoHotkey itself does not have great support for Unicode. If you want to use the send commands to send Chinese input, you may be running into a limitation of AutoHotkey.

There are some documented workarounds for this in the AHK forums.

If I can find a good workaround for sending Unicode that can be cleanly implemented, I'd be happy to add it as a function in the library.

@spyoungtech spyoungtech added the help wanted label May 29, 2019
@spyoungtech
Owner

spyoungtech commented Jul 17, 2019

I think what I'll plan to do is add a method that takes Unicode characters and converts them to the Unicode sequences used in AHK scripts. E.g. a character like '\u2F72' in a Python string will translate to an AHK Unicode sequence like {U+2F72}.

In the meantime, it should be possible to use the .send and .send_input methods with the AHK Unicode sequences directly.

ahk.send_input('Hello unicode {U+2F72}')

Eventually, maybe the .type method will have the capability to do this for you, or maybe a .type_unicode method will be added.
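
The conversion described above could be sketched roughly as follows. This is only an illustration: to_ahk_unicode is a hypothetical helper name, not part of the ahk library, and it assumes that leaving ASCII untouched is acceptable (it does not escape characters that are special to Send, such as { } ! ^ + #).

```python
def to_ahk_unicode(text: str) -> str:
    """Replace every non-ASCII character with an AHK {U+nnnn} sequence.

    Hypothetical helper for illustration only. ASCII characters are passed
    through unchanged, so Send's special characters are NOT escaped here.
    """
    parts = []
    for ch in text:
        if ord(ch) < 128:
            parts.append(ch)
        else:
            # Hexadecimal code point, padded to at least four digits.
            parts.append('{U+%04X}' % ord(ch))
    return ''.join(parts)


converted = to_ahk_unicode('Hello unicode \u2f72')
print(converted)
```

The resulting string could then be passed to ahk.send_input in the same way as the hand-written example above.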

@ClericPy

ClericPy commented Jul 20, 2019

ahk.image_search cannot take an image path containing Chinese characters either.
The title for ahk.find_window can be set with '中文-chinese-words'.encode('gbk').

Will the Unicode build of AutoHotkey work for this?

and...
_run_script
https://github.com/spyoungtech/ahk/blob/master/ahk/script.py#L55

        script_bytes = bytes(script_text, 'utf-8')
                return result.stdout.decode()

Could this support an encoding argument? For example:
bytes(script_text, 'gbk')
Setting the encoding to gbk sometimes fixes garbled output.
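
A minimal sketch of what such an encoding argument might look like. The helper names encode_script and decode_output are hypothetical and are not the library's actual API; the sketch only shows the encode and decode steps, not the subprocess call itself.

```python
def encode_script(script_text: str, encoding: str = 'utf-8') -> bytes:
    """Encode the script text with a caller-chosen encoding (hypothetical helper)."""
    return script_text.encode(encoding)


def decode_output(raw: bytes, encoding: str = 'utf-8') -> str:
    """Decode process output with the same encoding, replacing undecodable bytes."""
    return raw.decode(encoding, errors='replace')


gbk_bytes = encode_script('中文', 'gbk')
utf8_bytes = encode_script('中文')
print(gbk_bytes.hex(), utf8_bytes.hex())
print(decode_output(gbk_bytes, 'gbk'))
```

The same text produces different byte sequences under gbk and utf-8, which is why both sides of the pipe would need to agree on the encoding.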

@spyoungtech
Owner

Yeah, this makes sense.

From what I understand, the version of AHK that I test against is already unicode compatible. I've been trying to figure out the behavior of how AHK interprets the encoding sent to it.

For instance, when reading from a file, AHK seems to read UTF-8 unicode just fine. When sent via stdin to the subprocess, it seems that there is some sort of encoding mismatch. I suspect a locale or system-default encoding is being used.

When sending the following as UTF-8 encoded bytes to the subprocess:

SendInput ⽲

AutoHotkey ends up sending â½² (which indicates to me it chose to interpret the bytes as cp1252 encoding, rather than UTF-8)
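
The mismatch can be reproduced in Python alone, without AutoHotkey. This short sketch encodes the character as UTF-8 and then decodes the same bytes as cp1252, which yields the text reported above.

```python
original = '\u2f72'  # the character ⽲ from the example above

# The three bytes that are written to the subprocess's stdin.
utf8_bytes = original.encode('utf-8')
print(utf8_bytes.hex())

# What a reader sees if it assumes cp1252 instead of UTF-8.
misread = utf8_bytes.decode('cp1252')
print(misread)

# The damage is reversible as long as no bytes were dropped.
recovered = misread.encode('cp1252').decode('utf-8')
```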

@ClericPy

Is there some way to set a global encoding for different countries, instead of always using utf-8?

@spyoungtech
Owner

spyoungtech commented Oct 9, 2019

What's important is to identify how AHK does encoding detection. I believe it may rely on Windows semantics, which will use the preferred encoding for the locale if the bytes make sense in that encoding.

For example, you can use locale.getpreferredencoding() to identify the locale encoding. However, this may not necessarily be the encoding that AHK uses. In the above case, the bytes that represent the UTF-8 character are ambiguous with the locale encoding CP1252, which would interpret the bytes as â½². I'm not sure the behavior would be the same if the bytes sent were not valid in CP1252.

So knowing the preferred encoding alone may not be enough. I'll have to do some more testing around this and maybe ask some folks more knowledgeable about Windows/AHK's behavior when reading from stdin.
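
As a rough way to explore this from the Python side, the sketch below prints the locale's preferred encoding and checks whether a given UTF-8 byte sequence also happens to decode under cp1252. The helper decodes_cleanly is a hypothetical name. Note the assumption: Python's cp1252 codec is strict about the five undefined byte values, whereas Windows itself may map them differently, so this does not prove what AHK will do.

```python
import locale


def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if data is valid in the given encoding (hypothetical helper)."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# The encoding Python reports for the current locale; AHK may use something else.
preferred = locale.getpreferredencoding(False)
print('Preferred locale encoding:', preferred)

# U+2F72 encodes to e2 bd b2, all of which are defined in cp1252.
ambiguous = '\u2f72'.encode('utf-8')
# U+4E0D encodes to e4 b8 8d, and 0x8d is undefined in Python's cp1252 codec.
not_cp1252 = '\u4e0d'.encode('utf-8')

print(decodes_cleanly(ambiguous, 'cp1252'))
print(decodes_cleanly(not_cp1252, 'cp1252'))
```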

@kymikoloco

kymikoloco commented Jul 21, 2020

(Edit: Oops, didn't realize this was a Python wrapper, this probably won't solve anything, but it might help someone like me who stumbled upon this issue first :) )

Try saving the *.ahk file as UTF-8 with BOM to make sure AHK sends your UTF-8 text as the correct encoding.

https://www.reddit.com/r/AutoHotkey/comments/9zz9q3/why_do_hotstrings_sometimes_dont_work_depending/

AutoHotkey treats files as ANSI unless it has a very good reason not to. A file in UTF-8 encoding will work sometimes, but only if it has the Byte-Order-Mark (BOM) explicitly declared.

Encoding          AHK treats as
ANSI/ASCII        ANSI
UTF-8 (no BOM)    ANSI
UTF-8-BOM         Unicode
UTF-16            Unicode

UTF-8-BOM is the preferred encoding for AutoHotkey scripts because it is a variable-width encoding, which means that when a character can be encoded using fewer bytes, it is. A script with only standard ASCII characters will only be as large as a regular ASCII/ANSI file (plus 3 bytes for the BOM), and any other Unicode characters will take up an extra byte or more each.
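
From Python, a script file can be saved as UTF-8 with a BOM by using the standard 'utf-8-sig' codec. The sketch below writes a temporary .ahk file and confirms that it starts with the three BOM bytes; the function name and file name are made up for the example.

```python
import codecs
import os
import tempfile


def write_script_with_bom(path: str, script_text: str) -> None:
    """Write script_text as UTF-8 with a leading BOM (illustrative helper)."""
    # 'utf-8-sig' prepends the BOM automatically when writing.
    with open(path, 'w', encoding='utf-8-sig', newline='') as f:
        f.write(script_text)


with tempfile.TemporaryDirectory() as tmp:
    script_path = os.path.join(tmp, 'example.ahk')
    write_script_with_bom(script_path, 'SendInput \u2f72\n')
    with open(script_path, 'rb') as f:
        raw = f.read()

has_bom = raw.startswith(codecs.BOM_UTF8)
print(has_bom, len(codecs.BOM_UTF8))
```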

@spyoungtech
Owner

spyoungtech commented Jul 21, 2020

Yeah. Earlier on in this project, the implementation of calling AHK scripts was to write the .ahk script to a temporary file and then call the AHK executable, providing the temp filename as an argument. I think with that earlier implementation, a lot of the unicode stuff wasn't an issue.

However, the current implementation passes the script text (as bytes) to the AHK process via stdin, which avoids writing the generated script out to a file (and with it, things like permission errors or failing to clean up tempfiles).

For whatever reason, AHK doesn't seem to use UTF-8 when the script text is passed through stdin. 🤷‍♀️

@spyoungtech
Owner

So, the BOM doesn't work when passing the script via stdin, but there is a flag to set the codepage, which seems promising. See: #132
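
A rough sketch of how the codepage flag might be combined with stdin input. It assumes AutoHotkey v1's /CPn command-line switch (65001 being the UTF-8 codepage) and the * placeholder that tells AutoHotkey to read the script from stdin. build_ahk_command is a hypothetical helper, and the actual change in #132 may look different.

```python
import subprocess


def build_ahk_command(executable_path: str, codepage: int = 65001) -> list:
    """Build an argument list asking AHK to read a script from stdin.

    Hypothetical helper. Assumes the /CPn switch and the '*' stdin
    placeholder behave as described in the AutoHotkey v1 documentation.
    """
    return [executable_path, '/CP%d' % codepage, '*']


def run_script(executable_path: str, script_text: str) -> bytes:
    """Send UTF-8 encoded script text to AHK via stdin and return raw stdout."""
    result = subprocess.run(
        build_ahk_command(executable_path),
        input=script_text.encode('utf-8'),
        stdout=subprocess.PIPE,
    )
    return result.stdout


command = build_ahk_command('AutoHotkey.exe')
print(command)
```

run_script is not called here because it needs a real AutoHotkey executable on the machine.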

@spyoungtech
Owner

This should be essentially resolved with #132

One major exception is daemon mode, in which case you must work around this by using Unicode sequences (e.g. {U+nnnn}) to ensure Unicode characters are correctly understood.

This issue was closed.