Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error handling when attempting to check log files #236

Merged
merged 2 commits into from Jul 30, 2020

Conversation

chris-allan
Copy link
Member

I think this is a Python2 --> Python3 hangover. Python2 would have just silently corrupted incorrect encodings whereas Python3 will unicode everything and try to use what's specified. UnicodeDecodeError's can then be thrown during omero admin diagnostics if log files contain UTF-8. For example:

Log dir:    /opt/omero/OMERO.current/var/log exists
Log files:  Blitz-0.log                    339.8 MB      errors=273  warnings=1024
Log files:  Blitz-0.log.1                  Traceback (most recent call last):
  File "/opt/omero/OMERO.venv36/bin/omero", line 118, in <module>
    rv = omero.cli.argv()
  File "/opt/omero/OMERO.venv36/lib64/python3.6/site-packages/omero/cli.py", line 1754, in argv
    cli.invoke(args[1:])
  File "/opt/omero/OMERO.venv36/lib64/python3.6/site-packages/omero/cli.py", line 1187, in invoke
    stop = self.onecmd(line, previous_args)
  File "/opt/omero/OMERO.venv36/lib64/python3.6/site-packages/omero/cli.py", line 1264, in onecmd
    self.execute(line, previous_args)
  File "/opt/omero/OMERO.venv36/lib64/python3.6/site-packages/omero/cli.py", line 1346, in execute
    args.func(args)
  File "/opt/omero/OMERO.venv36/lib64/python3.6/site-packages/omero/install/windows_warning.py", line 26, in wrapper
    return func(self, *args, **kwargs)
  File "/opt/omero/OMERO.venv36/lib64/python3.6/site-packages/omero/plugins/prefs.py", line 79, in open_and_close_config
    return func(*args, **kwargs)
  File "/opt/omero/OMERO.venv36/lib/python3.6/site-packages/omero/plugins/admin.py", line 1417, in diagnostics
    parse_logs()
  File "/opt/omero/OMERO.venv36/lib/python3.6/site-packages/omero/plugins/admin.py", line 1371, in parse_logs
    self._exists(old_div(log_dir, x))
  File "/opt/omero/OMERO.venv36/lib64/python3.6/site-packages/omero/cli.py", line 1117, in _exists
    for l in p.lines():
  File "/opt/omero/OMERO.venv36/lib64/python3.6/site-packages/omero_ext/path.py", line 938, in lines
    return f.readlines()
  File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1809: ordinal not in range(128)

Default encoding of our vendored path.py is specified as ascii here:

Since we are using PyPI for nearly everything now might be worth looking into removing our vendored version and adding a dependency on the path module.

/cc @emilroz

Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed change makes full sense. I have been wondering why we didn't encounter it during the Python 2->3 migration as we fixed other diagnostics bugs. I suspect the environment of our CI and production servers, all C7 servers with LANG and LC_ALL set to en_US.UTF-8, sets the correct encoding under the hood.

Tested the bug flagged this PR by creating and importing მიკროსკოპის.fake so that Unicode characters are in the Blitz log. By default, omero admin diagnostics completes as expected on our C7 based CI servers. After setting LANG and LC_ALL to C, omero admin diagnostics fails with the stack trace described in this PR.

With this PR included, the same workflow as described above results in omero admin diagnostics returning a non-zero status code with both the default and the C locale.

As a side note, there seem to be another error related to the encoding further down the stack

Log dir:    /home/omero/workspace/OMERO-server/OMERO.server/var/log exists
Log files:  Blitz-0.log                    8.6 MB        errors=0    warnings=1344
Log files:  DropBox.log                    1.2 KB       
Log files:  FileServer.log                 114 B        
Log files:  Indexer-0.log                  23.0 KB      
Log files:  MonitorServer.log              1.1 KB       
Log files:  PixelData-0.log                5.5 KB       
Log files:  Processor-0.log                94.8 KB       errors=0    warnings=1   
Log files:  Tables-0.log                   20.3 KB       errors=0    warnings=3   
Log files:  TestDropBox.log                n/a
Log files:  master.err                     empty
Log files:  master.out                     empty
Log files:  Total size                     8.78 MB

Error while parsing logs

The following logic likely needs to be updated to set the encoding

try:
for file in ('Blitz-0.log',):
p = self.ctx.dir / "var" / "log" / file
import fileinput
for line in fileinput.input([str(p)]):
lno = fileinput.filelineno()
for k, v in list(issues.items()):
if k.match(line):
self._item('Parsing %s' % file,
"[line:%s] %s" % (lno, v))
self.ctx.out("")
break
except Exception:
self.ctx.err("Error while parsing logs")

Do you want to look into this change or capture and/or handle separately?

@manics
Copy link
Member

manics commented Jul 28, 2020

Is it possible for non-utf8 characters to end up in master.err or master.out? If so how does this behave?

@sbesson
Copy link
Member

sbesson commented Jul 28, 2020

At least in the context of the diagnostic command, I think the parsing of master.err and master.out and generally all files under the log directory is covered by the same logic as the one fixed in this PR

files = log_dir.files()
files = set([x.basename() for x in files])
# Adding known names just in case
for x in known_log_files:
files.add(x)
files = list(files)
files.sort()
for x in files:
self._item("Log files", x)
self._exists(old_div(log_dir, x))

@chris-allan
Copy link
Member Author

@sbesson: In our case LANG=en_GB.UTF-8. I had lots of issues reproducing consistently with a მიკროსკოპის.fake. In fact on the system in question there are also import issues with such a file. Let's find some time to try and debug together over the next few days.

@chris-allan
Copy link
Member Author

To summarize today's sleuthing, the consensus was:

  1. Where possible the locale and default encodings of Python should be respected (on Linux this is via LANG and LC_ALL)
  2. Go with something similar to what @joshmoore has started in UTF-8 Fixes #224 and ignore decoding errors. After all, we are just trying to count errors and warnings here.

There seems to be a separate issue with Java 11 handling unicode somewhere resulting in ????????.fake files ending up in the managed repository when filenames are UTF-8. I will try to track this down separately. For completeness, some examples on how to check the encoding of various dependencies of our stack follow:

$ jshell
|  Welcome to JShell -- Version 11.0.7
|  For an introduction type: /help intro
jshell> import java.nio.charset.Charset;
jshell> import java.util.Locale;
jshell> Locale.getDefault();
$3 ==> en_GB
jshell> Charset.defaultCharset();
$4 ==> UTF-8
jshell> System.getProperty("file.encoding");
$5 ==> "UTF-8"
jshell> System.getProperty("sun.jnu.encoding");
$6 ==> "UTF-8"
$ ipython
Python 3.6.8 (default, Apr  2 2020, 13:34:55)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import sys
In [2]: import locale
In [3]: locale.getpreferredencoding()
Out[3]: 'UTF-8'
In [4]: sys.getdefaultencoding()
Out[4]: 'utf-8'
$ psql omero
psql (9.6.18)
Type "help" for help.
omero=> SHOW SERVER_ENCODING;
 server_encoding
-----------------
 UTF8
(1 row)
omero=> SHOW CLIENT_ENCODING;
 client_encoding
-----------------
 UTF8
(1 row)

@chris-allan chris-allan changed the title Use UTF-8 when attempting to check log files Error handling when attempting to check log files Jul 29, 2020
@joshmoore joshmoore mentioned this pull request Jul 30, 2020
2 tasks
Copy link
Member

@joshmoore joshmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Merging along with #224.

@joshmoore joshmoore merged commit fa8d3b9 into ome:master Jul 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants