config files fail to read in python3 if incompatible locale #273

Closed
kwirk opened this Issue Jun 30, 2013 · 8 comments

Contributor

kwirk commented Jun 30, 2013

Mic92 on Arch Linux AUR reported an issue with python3 failing to read config files, due to UnicodeDecodeError. jail.conf has a few non-ascii characters which trip this.

Do we want to make sure that default config files only contain ASCII, and then if someone customises the log file, it's up to them to ensure they enter characters of a compatible encoding?
Or is there a better alternative solution?

searching with grep -rP '[^\x00-\x7f]' config/:

config/action.d/iptables-xt_recent-echo.conf:# Author: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
config/jail.conf:#   ALERT – tried to register forbidden variable ‘GLOBALS’

Owner

yarikoptic commented Jul 1, 2013

I am confused here... "UnicodeDecodeError" -- doesn't it mean that the files are actually non-utf8 (not just non-ascii)? I hope that Python 3.x's internal libraries (such as ConfigParser) handle utf8 files just fine.

I do not mind changing jail.conf to avoid non-ASCII UTF-8 characters, since there is no strong reason to have them there. But as for Zbigniew's last name -- I do not see why we should alter it without good reason ;)

Contributor

kwirk commented Jul 5, 2013

Confusingly, any failed decode raises a UnicodeDecodeError, including "ascii"…
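
To illustrate the point above, here is a minimal sketch (not fail2ban code) showing that decoding UTF-8 bytes as ASCII raises the same `UnicodeDecodeError` class, and that the exception's `encoding` attribute tells you which codec actually failed. The helper name `try_decode` is made up for this example.

```python
# UTF-8 bytes for "ę" (U+0119), the ogonek character discussed in this issue.
data = "ę".encode("utf-8")  # two bytes: 0xC4 0x99


def try_decode(raw, encoding):
    """Return the decoded text, or the exception raised while decoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


ascii_result = try_decode(data, "ascii")  # fails: bytes >= 0x80 are not ASCII
utf8_result = try_decode(data, "utf-8")   # succeeds

# The exception class is the same for every codec; .encoding names the culprit.
print(type(ascii_result).__name__, ascii_result.encoding)
print(utf8_result)
```

So a `UnicodeDecodeError` on its own does not say the file is invalid UTF-8; the codec named in the message is what matters.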

ConfigParser will open files in the system's current locale by default. It appears that the config parser can be forced to use a particular encoding (3.2+), but I think it makes sense to use the system locale by default (as the log files are also opened with system locale).
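
For reference, a minimal sketch of forcing the encoding with the standard library's `configparser` (Python 3.2+), where `read()` accepts an `encoding` argument. This is only an illustration, not how fail2ban's own config reader is written; the file content and the `bantime` option are placeholders.

```python
import configparser
import os
import tempfile

# Placeholder config with a non-ASCII comment, similar to the lines grep found.
content = "# Author: Zbigniew Jędrzejewski-Szmek\n[DEFAULT]\nbantime = 600\n"

# Write the file as UTF-8 regardless of the current locale.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".conf", delete=False
) as handle:
    handle.write(content)
    path = handle.name

parser = configparser.ConfigParser()
try:
    # Without encoding=..., the file is opened with open()'s default,
    # which is normally derived from the locale (e.g. ASCII under LC_ALL=C).
    read_ok = parser.read(path, encoding="utf-8")
finally:
    os.remove(path)

bantime = parser["DEFAULT"]["bantime"]
print(read_ok == [path], bantime)
```

With an explicit encoding, the result no longer depends on the locale the process was started in.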

I would have just made the change had it only been the jail.conf file, but I agree that changing someone's name isn't ideal…

@keszybz Would you be okay with us changing your name to ascii in the config file? (and add your name to the THANKS file in UTF-8)

Contributor

keszybz commented Jul 8, 2013

I'm fine with removing the ogonek from my name :)

Owner

yarikoptic commented Jul 8, 2013

damn thing didn't post my reply via email:

>    ConfigParser will open files in the system's current locale by default. It
>    appears that the config parser can be forced to use a particular encoding
>    (3.2+), but I think it makes sense to use the system locale by default (as
>    the log files are also opened with system locale).

hm... I would disagree -- files shipped with a piece of software are best opened in the encoding they were written in, and the software knows which one that is. I guess we should just enforce utf8 throughout? Or am I missing something (i.e. do we have some configs intentionally in old-fashioned encodings? Then we might need to add some meta header to trigger a specific encoding if needed)

Contributor

kwirk commented Jul 10, 2013

I agree that it would be good to force UTF-8 encoding for config files. My hesitation was the fact that this would then require python3 >= 3.2. However, given this is for version 0.9, which isn't released yet, it's likely that any future release of a distro that uses 0.9 will have at least python 3.2 (if not 3.3).

Therefore, if you're in agreement, I'll put a fix in to use UTF-8, and update the readme to say python3 >= 3.2?

Owner

yarikoptic commented Jul 10, 2013

sounds good. I wonder how we could reproduce the original report, since for me with python 3.2.4 I see only 1 failing locale-related unittest:

$> LANGUAGE=ru_RU.KOI8-R LANG=ru_RU.KOI8-R LC_ALL=ru_RU.KOI8-R locale                                               
LANG=ru_RU.KOI8-R
LANGUAGE=ru_RU.KOI8-R
LC_CTYPE="ru_RU.KOI8-R"
LC_NUMERIC="ru_RU.KOI8-R"
LC_TIME="ru_RU.KOI8-R"
LC_COLLATE="ru_RU.KOI8-R"
LC_MONETARY="ru_RU.KOI8-R"
LC_MESSAGES="ru_RU.KOI8-R"
LC_PAPER="ru_RU.KOI8-R"
LC_NAME="ru_RU.KOI8-R"
LC_ADDRESS="ru_RU.KOI8-R"
LC_TELEPHONE="ru_RU.KOI8-R"
LC_MEASUREMENT="ru_RU.KOI8-R"
LC_IDENTIFICATION="ru_RU.KOI8-R"
LC_ALL=ru_RU.KOI8-R

$> LANGUAGE=ru_RU.KOI8-R LANG=ru_RU.KOI8-R LC_ALL=ru_RU.KOI8-R python3 bin/fail2ban-testcases                       
Fail2ban 0.9.0a1 test suite. Python 3.2.4 (default, May  8 2013, 20:55:18) [GCC 4.7.3]. Please wait...
..........................................................s.............F................................
======================================================================
FAIL: testGetFailures03 (fail2ban.tests.filtertestcase.GetFailures)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./fail2ban/tests/filtertestcase.py", line 805, in testGetFailures03
    _assert_correct_last_attempt(self, self.filter, output)
  File "./fail2ban/tests/filtertestcase.py", line 129, in _assert_correct_last_attempt
    _assert_equal_entries(utest, found, output, count)
  File "./fail2ban/tests/filtertestcase.py", line 100, in _assert_equal_entries
    utest.assertEqual(found[1], count or output[1])   # count
AssertionError: 7 != 6

----------------------------------------------------------------------
Ran 105 tests in 67.281s

FAILED (failures=1, skipped=1)

where it seems to start understanding those dates previously skipped because of non-utf8 format:

$> diff -Naur /tmp/d{1,2}_       
--- /tmp/d1_    2013-07-10 13:48:57.993521276 -0400
+++ /tmp/d2_    2013-07-10 13:48:54.977438302 -0400
@@ -3,8 +3,8 @@
 I: Skipping systemd backend testing. Got exception 'No module named systemd'
 Setting usedns = warn for FileFilter(None)
 Created FileFilter(None)
-Set jail log file encoding to UTF-8
-Added logfile = /home/yoh/.tmp/tmp_fail2ban0v8pwacrlf
+Set jail log file encoding to KOI8-R
+Added logfile = /home/yoh/.tmp/tmp_fail2bano7z_zkcrlf
 Found a match for 'Dc 31 11:59:59 [sshd] error: PAM: Authentication failure for kevin from 193.168.0.128\n' but no valid date/time found for 'Dc 31 11:59:59 [sshd] error: PAM: Authentication failure for kevin from 193.168.0.128'. Please file a detailed issue on https://github.com/fail2ban/fail2ban/issues in order to get support for this format.
 Found a match for 'Dc 31 11:59:59 [sshd] error: PAM: Authentication failure for kevin from 193.168.0.128\n' but no valid date/time found for 'Dc 31 11:59:59 [sshd] error: PAM: Authentication failure for kevin from 193.168.0.128'. Please file a detailed issue on https://github.com/fail2ban/fail2ban/issues in order to get support for this format.
 Found a match for 'Dc 31 11:59:59 [sshd] error: PAM: Authentication failure for kevin from 193.168.0.128\n' but no valid date/time found for 'Dc 31 11:59:59 [sshd] error: PAM: Authentication failure for kevin from 193.168.0.128'. Please file a detailed issue on https://github.com/fail2ban/fail2ban/issues in order to get support for this format.
@@ -90,14 +90,14 @@
 Ignore line since time 1104490799.0 < 1124013600 - 6000
 Setting usedns = warn for FileFilter(None)
 Created FileFilter(None)
-Set jail log file encoding to UTF-8
+Set jail log file encoding to KOI8-R
 Added logfile = ./fail2ban/tests/files/testcase01.log
-Error decoding line from './fail2ban/tests/files/testcase01.log' with 'UTF-8': b'D\xe9c 31 11:59:59 [sshd] error: PAM: Authentication failure for kevin from 193.168.0.128\n'
-Found a match for 'Dc 31 11:59:59 [sshd] error: PAM: Authentication failure for kevin from 193.168.0.128\n' but no valid date/time found for 'Dc 31 11:59:59 [sshd] error: PAM: Authentication failure for kevin from 193.168.0.128'. Please file a detailed issue on https://github.com/fail2ban/fail2ban/issues in order to get support for this format.
-Error decoding line from './fail2ban/tests/files/testcase01.log' with 'UTF-8': b'D\xe9c 31 11:59:59 [sshd] error: PAM: Authentication failure for kevin from 193.168.0.128\n'
-Found a match for 'Dc 31 11:59:59 [sshd] error: PAM: Authentication failure for kevin from 193.168.0.128\n' but no valid date/time found for 'Dc 31 11:59:59 [sshd] error: PAM: Authentication failure for kevin from 193.168.0.128'. Please file a detailed issue on https://github.com/fail2ban/fail2ban/issues in order to get support for this format.
-Error decoding line from './fail2ban/tests/files/testcase01.log' with 'UTF-8': b'D\xe9c 31 11:59:59 [sshd] error: PAM: Authentication failure for kevin from 193.168.0.128\n'
-Found a match for 'Dc 31 11:59:59 [sshd] error: PAM: Authentication failure for kevin from 193.168.0.128\n' but no valid date/time found for 'Dc 31 11:59:59 [sshd] error: PAM: Authentication failure for kevin from 193.168.0.128'. Please file a detailed issue on https://github.com/fail2ban/fail2ban/issues in order to get support for this format.
+Matched time template MONTH Day Hour:Minute:Second
+Found a match for ' [sshd] error: PAM: Authentication failure for kevin from 193.168.0.128\n' but no valid date/time found for 'Déc 31 11:59:59'. Please file a detailed issue on https://github.com/fail2ban/fail2ban/issues in order to get support for this format.
+Matched time template MONTH Day Hour:Minute:Second
+Found a match for ' [sshd] error: PAM: Authentication failure for kevin from 193.168.0.128\n' but no valid date/time found for 'Déc 31 11:59:59'. Please file a detailed issue on https://github.com/fail2ban/fail2ban/issues in order to get support for this format.
+Matched time template MONTH Day Hour:Minute:Second
+Found a match for ' [sshd] error: PAM: Authentication failure for kevin from 193.168.0.128\n' but no valid date/time found for 'Déc 31 11:59:59'. Please file a detailed issue on https://github.com/fail2ban/fail2ban/issues in order to get support for this format.
 Matched time template MONTH Day Hour:Minute:Second
 Correcting deduced year from 2005 to 2004 since 1136026799.000000 > 1124013600.000000
 Got time using template MONTH Day Hour:Minute:Second
@@ -180,7 +180,7 @@
 Ignore line since time 1104490799.0 < 1124013600 - 6000
 Setting usedns = warn for FileFilter(None)
 Created FileFilter(None)
-Set jail log file encoding to UTF-8
+Set jail log file encoding to KOI8-R
 Added logfile = ./fail2ban/tests/files/testcase02.log
 Matched time template MONTH Day Hour:Minute:Second
 Matched time template MONTH Day Hour:Minute:Second
@@ -217,7 +217,7 @@
 Matched time template MONTH Day Hour:Minute:Second
 Setting usedns = warn for FileFilter(None)
 Created FileFilter(None)
-Set jail log file encoding to UTF-8
+Set jail log file encoding to KOI8-R
 Added logfile = ./fail2ban/tests/files/testcase03.log
 Matched time template MONTH Day Hour:Minute:Second
 Got time using template MONTH Day Hour:Minute:Second
@@ -246,13 +246,16 @@
 Processing line with time:1124013424.0 and ip:203.162.223.135
 Found 203.162.223.135
 Total # of detected failures: 5. Current failures from 1 IPs (IP:count): 203.162.223.135:5
-Matched time template MONTH Day Hour:Minute:Second
-Found a match for ' HOSTNAME courieresmtpd: error,relay=::ffff:203.162.223.135,from=<firozquarl@aclunc.org>,to=<BOGUSUSER@HOSTEDDOMAIN.org>: 550 User unknown.\n' but no valid date/time found for 'Aoü 14 11:58:04'. Please file a detailed issue on https://github.com/fail2ban/fail2ban/issues in order to get support for this format.
+Replacing 'Aou' with 'Aug' in 'Aou 14 11:57:04'
+Got time using template MONTH Day Hour:Minute:Second
+Processing line with time:1124013424.0 and ip:203.162.223.135
+Found 203.162.223.135
+Total # of detected failures: 6. Current failures from 1 IPs (IP:count): 203.162.223.135:6
 Matched time template MONTH Day Hour:Minute:Second
 Got time using template MONTH Day Hour:Minute:Second
 Processing line with time:1124013544.0 and ip:203.162.223.135
 Found 203.162.223.135
-Total # of detected failures: 6. Current failures from 1 IPs (IP:count): 203.162.223.135:6
+Total # of detected failures: 7. Current failures from 1 IPs (IP:count): 203.162.223.135:7

but note that jail.conf seems to be read just fine with a non-utf8 locale -- so where is the "bug"? ;)

Contributor

kwirk commented Jul 10, 2013

I've been able to recreate the original issue with LC_ALL=C. I guess the particular characters are valid for ru_RU.KOI8-R and UTF-8, but invalid ASCII.

Interesting results, but it does seem that the dates in testcase01.log are indeed invalid UTF-8:
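
A quick sketch that is consistent with that guess: KOI8-R assigns a character to every byte value, so decoding UTF-8 bytes with it never raises (it just produces the wrong characters), whereas ASCII rejects any byte above 0x7F. The `decodes` helper is made up for this example.

```python
# One of the lines grep found, encoded the way it is stored on disk.
line = "# Author: Zbigniew Jędrzejewski-Szmek"
raw = line.encode("utf-8")


def decodes(raw_bytes, encoding):
    """Return True if raw_bytes can be decoded with the given codec."""
    try:
        raw_bytes.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


results = {enc: decodes(raw, enc) for enc in ("utf-8", "koi8_r", "ascii")}

# KOI8-R only "works" in the sense of not raising; the text comes out garbled.
garbled = raw.decode("koi8_r")

print(results)
print(garbled == line)
```

That would explain why the KOI8-R run above read jail.conf without complaint while LC_ALL=C does not.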

----> 1 open("testcase01.log", encoding="utf-8").read()

/usr/lib/python3.3/codecs.py in decode(self, input, final)
    298         # decode input (taking the buffer into account)
    299         data = self.buffer + input
--> 300         (result, consumed) = self._buffer_decode(data, self.errors, final)
    301         # keep undecoded input until the next call
    302         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 1: invalid continuation byte
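
The same failure can be reproduced from the bytes shown in the diff above, without the test file. A minimal sketch: treating the file as Latin-1 is an assumption here (0xE9 is "é" in Latin-1, which fits the French "Déc"), and the `errors="ignore"` line only shows one way a 'Dc' string could arise, not necessarily what fail2ban does.

```python
# First bytes of the log line from the diff output: "D", 0xE9, "c", ...
raw = b"D\xe9c 31 11:59:59"

try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    # 0xE9 starts a multi-byte UTF-8 sequence, but "c" is not a continuation byte.
    utf8_error = exc

# Assumption: the file is Latin-1 encoded.
latin1_text = raw.decode("latin-1")

# Dropping undecodable bytes yields the 'Dc' form seen in the test output.
ignored_text = raw.decode("utf-8", errors="ignore")

print(utf8_error.start, utf8_error.reason)
print(latin1_text)
print(ignored_text)
```
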
Contributor

kwirk commented Jul 27, 2013

Resolved by #285. Closing…

kwirk closed this Jul 27, 2013
