New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8-bit locales treated as UTF-8 #56

Closed
clbr opened this Issue Jun 14, 2013 · 9 comments

Comments

Projects
None yet
4 participants
@clbr

clbr commented Jun 14, 2013

I use a 8-bit locale due to various advantages over UTF-8 (fi_FI.ISO-8859-15@euro).

With an Xft-enabled JWM, it treats text as UTF-8, which only works for the ASCII subset. When having chars in the upper segment, such as with the clock string set to display the month name (Kesäkuu - note the ä), the string is chopped and only displays up to the non-ascii letter ("Kes").

A possible solution would be to query for the system's charset at startup, and if it's not UTF-8, convert strings from it to UTF-8 every time when strings need to be rendered.

The system's char encoding can be queried with nl_langinfo(CODESET), and converted with the iconv functions.

@joewing

This comment has been minimized.

Show comment
Hide comment
@joewing

joewing Jun 17, 2013

Owner

I'm looking into this. I think it should be a fairly easy change.

Owner

joewing commented Jun 17, 2013

I'm looking into this. I think it should be a fairly easy change.

@Magicloud

This comment has been minimized.

Show comment
Hide comment
@Magicloud

Magicloud Jul 11, 2014

Seems like it is not an easy change. :-)

I have a quick patch for my issue (trayer clock) that force the string to be UTF-8.

diff --git a/src/Makefile.in b/src/Makefile.in
index 9191939..0d8781e 100644
--- a/src/Makefile.in
+++ b/src/Makefile.in
@@ -2,7 +2,7 @@
 CC = @CC@
 CFLAGS = @CFLAGS@
 CPPFLAGS = @CPPFLAGS@
-LDFLAGS = @LDFLAGS@
+LDFLAGS = -liconv @LDFLAGS@
 BINDIR = $(DESTDIR)@BINDIR@

 VPATH=.:os
diff --git a/src/timing.c b/src/timing.c
index f6ffd8a..f6013a5 100644
--- a/src/timing.c
+++ b/src/timing.c
@@ -10,6 +10,9 @@
 #include "jwm.h"
 #include "timing.h"

+#include <langinfo.h>
+#include <iconv.h>
+
 static const unsigned long MAX_TIME_SECONDS = 60;

 /** Get the current time in milliseconds since midnight 1970-01-01 UTC. */
@@ -92,8 +95,30 @@ const char *GetTimeString(const char *format, const char *zone)
       strftime(str, sizeof(str), format, localtime(&t));
    }

-   return str;
-
+   char *codeset = nl_langinfo (CODESET);
+   if (strncmp (codeset, "UTF-8", 5) == 0) {
+      return str;
+   }
+   iconv_t cd = iconv_open ("UTF-8", codeset);
+   char * inBuf = str;
+   size_t inLeft = strlen (str);
+   size_t outLeft = 4 * inLeft;
+   if (outLeft >= 512) {
+/* should not happen since str is 80. but leave it here in case future changes. */
+      fprintf (stderr, "Time string too long.\n");
+      return str;
+   }
+   static char ob[512];
+   char * outBuf = (char *) ob;
+   size_t ret = iconv (cd, &inBuf, &inLeft, &outBuf, &outLeft);
+   iconv_close (cd);
+   *((wchar_t *) outBuf) = L'\0';
+   if (ret == -1) {
+      fprintf (stderr, "iconv error: %d, %d\n", inLeft, outLeft);
+      return str;
+   } else {
+      return ob;
+   }
  }

Seems like it is not an easy change. :-)

I have a quick patch for my issue (trayer clock) that force the string to be UTF-8.

diff --git a/src/Makefile.in b/src/Makefile.in
index 9191939..0d8781e 100644
--- a/src/Makefile.in
+++ b/src/Makefile.in
@@ -2,7 +2,7 @@
 CC = @CC@
 CFLAGS = @CFLAGS@
 CPPFLAGS = @CPPFLAGS@
-LDFLAGS = @LDFLAGS@
+LDFLAGS = -liconv @LDFLAGS@
 BINDIR = $(DESTDIR)@BINDIR@

 VPATH=.:os
diff --git a/src/timing.c b/src/timing.c
index f6ffd8a..f6013a5 100644
--- a/src/timing.c
+++ b/src/timing.c
@@ -10,6 +10,9 @@
 #include "jwm.h"
 #include "timing.h"

+#include <langinfo.h>
+#include <iconv.h>
+
 static const unsigned long MAX_TIME_SECONDS = 60;

 /** Get the current time in milliseconds since midnight 1970-01-01 UTC. */
@@ -92,8 +95,30 @@ const char *GetTimeString(const char *format, const char *zone)
       strftime(str, sizeof(str), format, localtime(&t));
    }

-   return str;
-
+   char *codeset = nl_langinfo (CODESET);
+   if (strncmp (codeset, "UTF-8", 5) == 0) {
+      return str;
+   }
+   iconv_t cd = iconv_open ("UTF-8", codeset);
+   char * inBuf = str;
+   size_t inLeft = strlen (str);
+   size_t outLeft = 4 * inLeft;
+   if (outLeft >= 512) {
+/* should not happen since str is 80. but leave it here in case future changes. */
+      fprintf (stderr, "Time string too long.\n");
+      return str;
+   }
+   static char ob[512];
+   char * outBuf = (char *) ob;
+   size_t ret = iconv (cd, &inBuf, &inLeft, &outBuf, &outLeft);
+   iconv_close (cd);
+   *((wchar_t *) outBuf) = L'\0';
+   if (ret == -1) {
+      fprintf (stderr, "iconv error: %d, %d\n", inLeft, outLeft);
+      return str;
+   } else {
+      return ob;
+   }
  }
@joewing

This comment has been minimized.

Show comment
Hide comment
@joewing

joewing Jul 11, 2014

Owner

It took me awhile to get to this primarily because I didn't know how to test it, and then I just forgot about it... The patch was helpful.
Anyway, commit 0b53361 attempts to fix this for everything, but I haven't tested it other than confirm that UTF-8 still works.

Owner

joewing commented Jul 11, 2014

It took me awhile to get to this primarily because I didn't know how to test it, and then I just forgot about it... The patch was helpful.
Anyway, commit 0b53361 attempts to fix this for everything, but I haven't tested it other than confirm that UTF-8 still works.

@clbr

This comment has been minimized.

Show comment
Hide comment
@clbr

clbr Jul 12, 2014

Can't test right now, but noting that on glibc systems the libiconv dep is an unnecessary one. Glibc provides all the iconv functions, so the configure check should not require libiconv should the libc provide the functions.

clbr commented Jul 12, 2014

Can't test right now, but noting that on glibc systems the libiconv dep is an unnecessary one. Glibc provides all the iconv functions, so the configure check should not require libiconv should the libc provide the functions.

@jdoe0154

This comment has been minimized.

Show comment
Hide comment
@jdoe0154

jdoe0154 Aug 26, 2014

Somehow it's not perfect yet. I just upgraded from JWM 933 to JWM 946. Now I get extra characters at the end of JWM's menu strings. Screenshot, with JWM's built-in "right-click on the title bar" menu open:

lang_without_utf-8

A little experimentation revealed the following: If JWM 946 is compiled with "./configure --disable-nls", then the extra characters appear always.

If JWM 946 is configured without the "--disable-nls" switch, then things depend on the environment variable "LANG":

  • if LANG is not set, or is set to a non-UTF locale (f. ex., LANG=C), then the extra characters appear;
  • if LANG is set to an UTF locale (f. ex., LANG=en_US.UTF-8), then there are no extra characters. Screenshot for this case:

lang_with_utf-8

Somehow it's not perfect yet. I just upgraded from JWM 933 to JWM 946. Now I get extra characters at the end of JWM's menu strings. Screenshot, with JWM's built-in "right-click on the title bar" menu open:

lang_without_utf-8

A little experimentation revealed the following: If JWM 946 is compiled with "./configure --disable-nls", then the extra characters appear always.

If JWM 946 is configured without the "--disable-nls" switch, then things depend on the environment variable "LANG":

  • if LANG is not set, or is set to a non-UTF locale (f. ex., LANG=C), then the extra characters appear;
  • if LANG is set to an UTF locale (f. ex., LANG=en_US.UTF-8), then there are no extra characters. Screenshot for this case:

lang_with_utf-8

joewing added a commit that referenced this issue Aug 26, 2014

@joewing

This comment has been minimized.

Show comment
Hide comment
@joewing

joewing Aug 26, 2014

Owner

Thanks for the report, @jdoe0154. This should be fixed as of commit ec0b4cb.
I wasn't properly null-terminating the strings coming out of iconv.

Owner

joewing commented Aug 26, 2014

Thanks for the report, @jdoe0154. This should be fixed as of commit ec0b4cb.
I wasn't properly null-terminating the strings coming out of iconv.

@joewing joewing modified the milestones: Version 2.3, Version 2.2.3 Aug 26, 2014

@jdoe0154

This comment has been minimized.

Show comment
Hide comment
@jdoe0154

jdoe0154 Aug 26, 2014

I downloaded jwm-master.zip (with the commit included), and the extra characters are now gone. Thank you!

But there is one thing more:

root [~] startx
DEBUG: main.c[116]: debug mode started
The XKEYBOARD keymap compiler (xkbcomp) reports:
> Warning:          Type "ONE_LEVEL" has 1 levels, but <RALT> has 2 symbols
>                   Ignoring extra symbols
Errors from xkbcomp are not fatal to the X server
DEBUG: Unknown event type: 34
*** Error in `jwm': free(): invalid pointer: 0x0982eb70 ***
======= Backtrace: =========
/lib/libc.so.6(+0x7ad06)[0xf745bd06]
/lib/libc.so.6(+0x837a1)[0xf74647a1]
/lib/libc.so.6(cfree+0x5d)[0xf7469c9d]
/lib/libc.so.6(+0x1a0b5)[0xf73fb0b5]
/lib/libc.so.6(iconv_close+0x1d)[0xf73fa5ed]
jwm[0x8069b90]
/lib/libc.so.6(__libc_start_main+0xf0)[0xf73f8e10]

The environment variable LANG was not set. I ran "startx", opened a terminal, checked JWM's right-click menu, then I stopped Xorg with Ctrl-Alt-Backspace. Then the console showed the backtrace. This occurred for JWM 946 already (but never for JWM 933 and before). It is followed by a long memory map, more than my scroll-back buffer can hold. I have only now found a trick to capture the start of it, too.

The backtrace comes from jwm-master compiled with --enable-debug. At the moment I don't know how to get more debug information (short of recompiling my C library ...).

I downloaded jwm-master.zip (with the commit included), and the extra characters are now gone. Thank you!

But there is one thing more:

root [~] startx
DEBUG: main.c[116]: debug mode started
The XKEYBOARD keymap compiler (xkbcomp) reports:
> Warning:          Type "ONE_LEVEL" has 1 levels, but <RALT> has 2 symbols
>                   Ignoring extra symbols
Errors from xkbcomp are not fatal to the X server
DEBUG: Unknown event type: 34
*** Error in `jwm': free(): invalid pointer: 0x0982eb70 ***
======= Backtrace: =========
/lib/libc.so.6(+0x7ad06)[0xf745bd06]
/lib/libc.so.6(+0x837a1)[0xf74647a1]
/lib/libc.so.6(cfree+0x5d)[0xf7469c9d]
/lib/libc.so.6(+0x1a0b5)[0xf73fb0b5]
/lib/libc.so.6(iconv_close+0x1d)[0xf73fa5ed]
jwm[0x8069b90]
/lib/libc.so.6(__libc_start_main+0xf0)[0xf73f8e10]

The environment variable LANG was not set. I ran "startx", opened a terminal, checked JWM's right-click menu, then I stopped Xorg with Ctrl-Alt-Backspace. Then the console showed the backtrace. This occurred for JWM 946 already (but never for JWM 933 and before). It is followed by a long memory map, more than my scroll-back buffer can hold. I have only now found a trick to capture the start of it, too.

The backtrace comes from jwm-master compiled with --enable-debug. At the moment I don't know how to get more debug information (short of recompiling my C library ...).

@joewing

This comment has been minimized.

Show comment
Hide comment
@joewing

joewing Aug 26, 2014

Owner

That part should be fixed as of commit 66ebcd6. iconv_close was getting called multiple times with the same descriptor.

Owner

joewing commented Aug 26, 2014

That part should be fixed as of commit 66ebcd6. iconv_close was getting called multiple times with the same descriptor.

@jdoe0154

This comment has been minimized.

Show comment
Hide comment
@jdoe0154

jdoe0154 Aug 26, 2014

Yes, now all six combinations work: JWM compile option "--disable-nls" (not used vs. used) X "LANG" environment variable (unset vs. set to "C" vs. set to an UTF-8 locale). There are no extra characters, and no backtraces. Thanks again!!

Yes, now all six combinations work: JWM compile option "--disable-nls" (not used vs. used) X "LANG" environment variable (unset vs. set to "C" vs. set to an UTF-8 locale). There are no extra characters, and no backtraces. Thanks again!!

@joewing joewing closed this Sep 5, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment