unicode (utf8 actually) characters not handled correctly #4
Comments
|
Comment by @skralg |
|
todo: fix encoding of language files |
|
Just checked out and still needed to edit tcl.c to get utf-8 output. |
|
[Edit/Note: these tests are on TCL 8.6.8 bots. This may be a TCL version-specific issue.] I can't duplicate this with the current version (1.8.3 / 1080303 [RC?]). The original poster used PUTMSG, so: Seems fine, using braces or double-quotes, raw output or through PUTMSG. |
|
I think I'm having a similar/related issue, and it's easy to reproduce. Try the commands below from an eggdrop console and notice how TCL seems to properly set and return the skull-and-crossbones emoji ( https://www.fileformat.info/info/unicode/char/1f571/index.htm ). However, the moment you try to output it with I'm guessing that putlog/putserv are mangling the UTF-8 string by encoding it with iso8859-1 or something before output. I'm using the latest stable eggdrop, version 1.8.4, and the tcl version is 8.6. This might be of interest, too: |
|
Hi @makk-mma thanks for taking the time to ask about this here. Someone with more knowledge on the topic than I should hopefully get a chance to give you a better answer here soon, but I believe it has to do with Tcl itself not supporting emoji-range unicode characters without some special compilation features in it. I don't have a fix for it handy here, but hang on for a little bit and someone else should chime in soon. Thanks! |
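For what it's worth, the "emoji-range" distinction can be checked outside of Tcl. The sketch below (Python, standard library only; the helper name `describe` is made up for illustration) shows that the accented letters from the original report and U+2620 sit at or below U+FFFF and need at most three UTF-8 bytes, while U+1F571 sits above U+FFFF and needs four. My understanding is that a stock Tcl 8.6 build assumes at most three bytes per character, which would explain why only the emoji breaks, but treat that as an assumption rather than a statement about your build.

```python
def describe(ch):
    """Return (code point, UTF-8 byte count, True if above U+FFFF)."""
    cp = ord(ch)
    return cp, len(ch.encode("utf-8")), cp > 0xFFFF

# Two letters from the original report, U+2620, and the emoji U+1F571.
for ch in ["\u00f3", "\u015b", "\u2620", "\U0001F571"]:
    cp, nbytes, above_bmp = describe(ch)
    print(f"U+{cp:04X}: {nbytes} UTF-8 bytes, above U+FFFF: {above_bmp}")
```

If a character reports four bytes here, it is in the range this comment is talking about.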
|
So here's what we've found out, after looking into this (finally!) in a somewhat-robust fashion: The encoding scheme used by Eggdrop's Tcl interface is set based on the locale settings of the host machine. You can check which locale your host machine is using by running the

So in short, the popular patch at http://eggwiki.org/Bugs/Utf-8 only works if the locale is not set/found on the host machine. If you want Eggdrop to use a specific encoding scheme that it is not currently using, you can view the available locales on your machine via the

If your experience with this differs, please don't just flame a response here; find us at #eggdrop on Freenode so we can learn more about your system environment and better address this issue.

Another comment raised by @makk-mma talked about emoji that were not supported. That is a "feature" of Tcl. The very helpful Tcl wiki page on the subject simply states "emoji support isn't enabled by default, recompiling with TCL_UTF_MAX=6 is needed". I am unaware of a package-manager install that would remedy this at this time; if I find one I will update this post, or hopefully one of you geniuses out there will post a better/more thorough set of steps below.

Edit: This proposal provides additional information on the 'why' behind this: https://core.tcl-lang.org/tips/doc/trunk/tip/389.md

So to summarize that last paragraph: if you think UTF-8 isn't working for you, try some of the lower-numbered characters like

I hope this helps, and am looking forward to feedback that may clarify or enhance this post. I'll add something to the wikis/docs on this subject as well, to help get the word out. If no other comments are raised to the contrary, we'll (finally!) close this issue shortly.

Edit: To recompile Tcl, download the source and edit generic/tcl.h. Look for the line

Anecdotally, a user used the following line to compile Eggdrop and found success: |
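Returning to the locale point made at the start of this comment: here is a rough sketch in Python (standard library only) of how a POSIX-style locale name maps to a character set. The function names `effective_ctype_locale` and `codeset_of` are hypothetical, and the sketch only mirrors the usual POSIX precedence of LC_ALL over LC_CTYPE over LANG. It is not Eggdrop's or Tcl's actual code, and how Tcl turns the resulting codeset into one of its own encoding names is not shown.

```python
def effective_ctype_locale(env):
    """First non-empty of LC_ALL, LC_CTYPE, LANG; "C" if none is set."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"

def codeset_of(locale_name):
    """Return the codeset part of 'lang_TERRITORY.codeset@modifier', or None."""
    base = locale_name.split("@", 1)[0]
    if "." not in base:
        return None  # no explicit codeset; the system picks its own default
    return base.split(".", 1)[1]

# Hypothetical environments, for illustration only.
print(codeset_of(effective_ctype_locale({"LANG": "en_US.UTF-8"})))
print(codeset_of(effective_ctype_locale({"LANG": "en_US.UTF-8",
                                         "LC_ALL": "pl_PL.ISO-8859-2"})))
print(codeset_of(effective_ctype_locale({})))
```

Passing `os.environ` instead of the sample dictionaries shows what the shell that starts the bot would hand over.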
|
For others stumbling onto this thread, another cause of issues (incorrectly attributed to Eggdrop) was a user connecting to the shell/Eggdrop with PuTTY, using a terminal setup that did not support UTF-8 (either the terminal or the font; I was unable to deduce which from the troubleshooting). Switching to a different terminal program resolved the issue. There were also issues with copy/pasting Unicode characters instead of using ctrl-shift u [code] (unix) or [code] alt-X (windows) to create the character. |
|
For those interested in emoji support, the Tcl KitCreator can build Tcl, libraries, etc. compiled with |
|
@tlcu Thanks a ton for that discovery. For those who are interested, you can download a compiled library and SDK from rkeene (he's good people). Select your OS (probably Linux/amd64), pick your version of Tcl, and make your selections. For emoji support, definitely choose "TCL_UTF_MAX=6 (incompatibility with standard Tcl)" as an option. You can add in things like TLS and Tcllib if you think you'll need those (good to have, just in case...), but you'll also want to make sure you select the "Build Library (KitDLL)" option. Once that builds for you, grab the .so, and also click the link at the top of the page that says "SDK URL"; that will give you things like TclConfig.sh and tcl.h, which you'll want to compile against. Put those up on your shell and use the ./configure options listed above to point Eggdrop at that library, and you should be good to go. Thanks to @rkeene for an awesome build system for those without the means to compile by themselves. |
thommey commented Feb 2, 2010
My setup:
eggdrop 1.6.17
tcl 8.5a6
Linux 2.6 with latest version of libraries and all stuff
freenode network
irssi client
Testing is simple, tcl script that echoes data back to channel:
bind pub - "utf" pub_proc
proc pub_proc { nick idx handle channel szoveg } {
putmsg $channel "$szoveg"
}
Now on original 1.6.17 when entering utf8 characters this happens:
21:39 < arekm8> utf óąłńś
21:39 < utftest> �EBD[
(some crap is echoed)
After patching src/tcl.c utf_convert() with:
I get:
21:57 < arekm8> utf óąłńś
21:57 < utftest> óąłńś
It works - proper characters are echoed back.
No idea why ByteArray is used in utf_convert() so I'm not sure if the fix is
correct. Is it?
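On the ByteArray question, one possible reading of the garbage above: if each character is cut down to its low 8 bits, which is my understanding of what converting a Tcl string to a byte array does, then óąłńś becomes the bytes F3 05 42 44 5B, and 42 44 5B are exactly the "BD[" in the echoed line. The sketch below (Python, standard library only; `low_bytes` and `utf8_bytes` are made-up helpers, not anything from src/tcl.c) compares that truncation with a real UTF-8 encoding. That the 05 byte shows up as "E" because of how the client draws control characters is only a guess.

```python
def low_bytes(text):
    """Keep only the low 8 bits of each code point (lossy, for illustration)."""
    return bytes(ord(ch) & 0xFF for ch in text)

def utf8_bytes(text):
    """Encode the text as UTF-8."""
    return text.encode("utf-8")

# The five letters from the test above, written as escapes so the
# result does not depend on the encoding of this file.
sample = "\u00f3\u0105\u0142\u0144\u015b"

print(low_bytes(sample).hex(" "))   # f3 05 42 44 5b
print(utf8_bytes(sample).hex(" "))  # c3 b3 c4 85 c5 82 c5 84 c5 9b
```

The first line is five bytes that no longer identify the original letters; the second is the ten-byte sequence a UTF-8 client would decode back to óąłńś.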