UnicodeDecodeError on starting alot #693

Closed
evnu opened this issue Feb 22, 2014 · 12 comments

@evnu

evnu commented Feb 22, 2014

This is a similar error to #673, but not quite the same. Backtrace:

/usr/lib/python2.7/site-packages/alot/helper.py:517: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  return cmp(a.lower(), b.lower())

Traceback (most recent call last):
  File "/usr/bin/alot", line 20, in <module>
    main()
  File "/usr/lib/python2.7/site-packages/alot/init.py", line 187, in main
    UI(dbman, cmd)
  File "/usr/lib/python2.7/site-packages/alot/ui.py", line 87, in __init__
    self.mainloop.run()
  File "/usr/lib/python2.7/site-packages/urwid/main_loop.py", line 272, in run
    self.screen.run_wrapper(self._run)
  File "/usr/lib/python2.7/site-packages/urwid/raw_display.py", line 242, in run_wrapper
    return fn()
  File "/usr/lib/python2.7/site-packages/urwid/main_loop.py", line 312, in _run
    self.draw_screen()
  File "/usr/lib/python2.7/site-packages/urwid/main_loop.py", line 563, in draw_screen
    canvas = self._topmost_widget.render(self.screen_size, focus=True)
  File "/usr/lib/python2.7/site-packages/urwid/widget.py", line 141, in cached_render
    canv = fn(self, size, focus=focus)
  File "/usr/lib/python2.7/site-packages/urwid/decoration.py", line 225, in render
    canv = self._original_widget.render(size, focus=focus)
  File "/usr/lib/python2.7/site-packages/urwid/widget.py", line 141, in cached_render
    canv = fn(self, size, focus=focus)
  File "/usr/lib/python2.7/site-packages/urwid/container.py", line 1058, in render
    focus and self.focus_part == 'body')
  File "/usr/lib/python2.7/site-packages/alot/buffers.py", line 37, in render
    return self.body.render(size, focus)
  File "/usr/lib/python2.7/site-packages/urwid/widget.py", line 141, in cached_render
    canv = fn(self, size, focus=focus)
  File "/usr/lib/python2.7/site-packages/urwid/listbox.py", line 457, in render
    (maxcol, maxrow), focus=focus)
  File "/usr/lib/python2.7/site-packages/urwid/listbox.py", line 339, in calculate_visible
    self._set_focus_complete( (maxcol, maxrow), focus )
  File "/usr/lib/python2.7/site-packages/urwid/listbox.py", line 704, in _set_focus_complete
    (maxcol,maxrow), focus)
  File "/usr/lib/python2.7/site-packages/urwid/listbox.py", line 674, in _set_focus_first_selectable
    (maxcol, maxrow), focus=focus)
  File "/usr/lib/python2.7/site-packages/urwid/listbox.py", line 402, in calculate_visible
    next, pos = self.body.get_next( pos )
  File "/usr/lib/python2.7/site-packages/alot/walker.py", line 32, in get_next
    return self._get_at_pos(start_from + self.direction)
  File "/usr/lib/python2.7/site-packages/alot/walker.py", line 58, in _get_at_pos
    widget = self._get_next_item()
  File "/usr/lib/python2.7/site-packages/alot/walker.py", line 71, in _get_next_item
    next_widget = self.containerclass(next_obj, **self.kwargs)
  File "/usr/lib/python2.7/site-packages/alot/widgets/search.py", line 27, in __init__
    self.rebuild()
  File "/usr/lib/python2.7/site-packages/alot/widgets/search.py", line 148, in rebuild
    minw, maxw, align_mode)
  File "/usr/lib/python2.7/site-packages/alot/widgets/search.py", line 119, in _build_part
    lambda tag_widget: tag_widget.translated)
  File "/usr/lib/python2.7/site-packages/alot/helper.py", line 517, in tag_cmp
    return cmp(a.lower(), b.lower())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

The workaround from #637 (setting LANG) did not work here. I could work around by adding a try..except around the comparison in alot/helper.py:

try:
    if min(len(a), len(b)) == 1 and max(len(a), len(b)) > 1:
        return cmp(len(a), len(b))
    else:
        return cmp(a.lower(), b.lower())
except UnicodeDecodeError:
    # Log the offending pair and treat the two tags as equal.
    import sys
    sys.stderr.write("Error with tags %r, %r\n" % (a, b))
    return 0

With this, I get the following output when starting alot:

Error with tags u'signed', '\xe2\x9a\xb7'
Error with tags u'inbox', '\xe2\x9a\xb7'
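
The output above pairs a unicode tag (u'signed') with a UTF-8 byte string ('\xe2\x9a\xb7'). On Python 2, mixing the two in a comparison triggers an implicit decode of the byte string with the default ASCII codec, which is where byte 0xe2 gets rejected. A minimal sketch of that step follows, written so that the decode is spelled out explicitly; the names `raw_tag`, `text_tag` and `ascii_decode_fails` are illustrative and do not come from alot.

```python
# Bytes as they appear in the error output above: UTF-8 encoding of U+26B7.
raw_tag = b"\xe2\x9a\xb7"
text_tag = u"signed"


def ascii_decode_fails(data):
    """Return True if decoding `data` as ASCII raises UnicodeDecodeError."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return True
    return False


# Python 2 attempted this ASCII decode implicitly when mixing str and unicode.
print(ascii_decode_fails(raw_tag))

# Decoding explicitly as UTF-8 yields a text string that sorts cleanly
# alongside the other tags.
decoded = raw_tag.decode("utf-8")
print(sorted([text_tag, decoded], key=lambda s: s.lower()))
```

The try..except workaround hides the failure, whereas decoding the value up front removes the mixed comparison altogether.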
@pazz
Owner

pazz commented Mar 19, 2014

It seems you are using a terminal incapable of displaying UTF-8 characters, and you are also translating tag strings to Unicode symbols in your config, aren't you?
Can you confirm that this error does not occur if you turn off this feature by commenting out the respective lines in the config?

Another sanity check: is your config file perhaps not encoded in UTF-8?
HTH

@evnu
Author

evnu commented Mar 19, 2014

Terminal: rxvt-unicode-256color (unicode capable)
Config file encoding:

file config 
config: UTF-8 Unicode text

Turning off the translation of the tag encrypted to a Unicode character fixes it, though. It works if I remove the following lines:

[[encrypted]]
translated = ⚷

But the character can be displayed in alot itself, e.g. when I edit a new email and use the character as the content of the mail.

pazz added the bug label Mar 21, 2014
@pazz
Owner

pazz commented Mar 27, 2014

your output

Error with tags u'signed', '\xe2\x9a\xb7'

suggests that the string b is not a unicode string as i would expect.
I just checked: if I add the line

logging.debug("comparing %r and %r" % (a,b))

somewhere in tag_cmp then i see lots of unicode strings in my log:

...
DEBUG:helper:comparing u'\u2197' and u'\U0001f4c3'
DEBUG:helper:comparing u'\U0001f4c3' and u'alot'
DEBUG:helper:comparing u'\u2197' and u'\U0001f4c3'
DEBUG:helper:comparing u'\U0001f4c3' and u'offlineimap'
DEBUG:helper:comparing u'\u2197' and u'\U0001f4c3'
DEBUG:helper:comparing u'\xae' and u'\u2197'
DEBUG:helper:comparing u'notmuch' and u'\xae'
DEBUG:helper:comparing u'notmuch' and u'\U0001f4c3'
DEBUG:helper:comparing u'notmuch' and u'offlineimap'
...

I followed the code to see where we get the second string from, and it seems that the
origin is https://github.com/pazz/alot/blob/master/alot/settings/manager.py#L292
Here, it is taken directly from a configobj object.

Just for comparison: i'm running

ipython -c "import configobj; configobj.__version__"
Out[1]: '4.7.2'

I think somewhere around the linked line in alot.settings.Manager we should
use alot.helper.string_decode for safety.
let me know how it goes,
/p
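
One way to read the suggestion above is to normalise the value to a unicode string right where it is read from the config. The sketch below uses a hypothetical helper, `ensure_text`, as a stand-in. It is not alot's `string_decode`, whose exact signature is not shown in this thread, and it assumes the config file is UTF-8 encoded.

```python
def ensure_text(value, encoding="utf-8"):
    """Hypothetical helper: return `value` as a text string.

    Byte strings are decoded with `encoding`; undecodable bytes are
    replaced rather than raising. Text strings pass through unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding, errors="replace")
    return value


# A mix like the one in the error output: byte strings and text strings.
translated_tags = [b"\xe2\x9a\xb7", u"signed", u"inbox"]
normalised = [ensure_text(tag) for tag in translated_tags]

# All entries are now text, so a case-insensitive sort no longer mixes types.
print(sorted(normalised, key=lambda s: s.lower()))
```

Decoding at the point where the value enters the program keeps the comparison code free of encoding concerns.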

@pazz
Owner

pazz commented Mar 27, 2014

FTR: configobj development gained some momentum recently:
http://sourceforge.net/p/configobj/mailman/message/31980282/
It could be that this is the cause for the observed difference in our setups.

@pazz
Owner

pazz commented Apr 2, 2014

i just updated to configobj 5.0.2 from pip, and i still get the same result (unicode strings everywhere).

@evnu
Author

evnu commented Apr 2, 2014

I am running configobj with version 5.0.2 as well. Interestingly, running your ipython command from above fails. The module configobj does not offer __version__ here.

ipython2 -c "import configobj; configobj.__version__"                             
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-0ca86862e46c> in <module>()
----> 1 import configobj; configobj.__version__

AttributeError: 'module' object has no attribute '__version__'

For completeness, I tried to make sure that the character ⚷ is actually well-represented in my configuration file. In urxvt:

$ echo ⚷ > chiron # by entering: shift+ctrl+26b7
$ xxd chiron
0000000: e29a b70a

Then cross-check the config:

$ vim <(xxd config)
...skip to relevant lines..
00008d0: 2020 2020 7472 616e 736c 6174 6564 203d      translated =
00008e0: 20e2 9ab7 0a20 205b 5b74 6f64 6f5d 5d0a   ....  [[todo]].
           ^--------^
               |---- there it is

Can you check the character representation on your system as well? Maybe the above is wrong?
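
The same cross-check can also be done in Python, independent of the terminal. This is a small sketch, assuming the three bytes from the hexdump above (e2 9a b7) are meant to be the UTF-8 encoding of the code point entered as 26b7; the variable names are illustrative.

```python
import unicodedata

# Bytes taken from the xxd output above, without the trailing newline (0a).
config_bytes = bytes(bytearray([0xE2, 0x9A, 0xB7]))

# Decode them as UTF-8; this raises UnicodeDecodeError if they are malformed.
char = config_bytes.decode("utf-8")

# The code point should match the 26b7 entered in urxvt.
print(hex(ord(char)))
print(unicodedata.name(char))
```

If the decode succeeds and the code point matches, the file content itself is fine and the problem lies in how the value is read.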

Maybe I will find more time next week to investigate further. Sorry for taking so long to respond, alot is happening right now (pun partly intended).

@pazz
Owner

pazz commented Apr 3, 2014

That's it! I involuntarily used configobj v4.7.2 when I ran import configobj, although I had a newer version installed locally. The new version indeed has no __version__ attribute, and I can now reproduce this error.
Since the configobj people claim full backwards compatibility, it's clearly an issue with configobj. I will file an issue there and try to come up with a fix. Until then, you can revert to an earlier version to make alot work.
Sorry about this,
/p


pazz added a commit that referenced this issue Apr 3, 2014
regarding utf8 chars in the config being read as str, not unicode
in python v2.7. This patch introduces an additional
`alot.helper.string_decode` around the translated tagname read from the
config, and fixes issue #693. In the long run, we expect configobj
to be fully backwards compatible.
@pazz
Owner

pazz commented Apr 3, 2014

I pushed a workaround to branch 0.3.5-fix-workaround-new-configobj-693, and also to testing for now.

@evnu
Author

evnu commented Apr 3, 2014

Great, thanks for debugging this. I will try the workaround as soon as possible.

@robdennis

Hello there, I'm one of the configobj maintainers, and wanted to note here, as a backup, that I believe this issue would be fixed as a result of the newly released configobj 5.0.3.

@pazz
Owner

pazz commented Apr 7, 2014

Hi @robdennis! Unfortunately, it seems this issue is not yet resolved, at least for me.
It could be me not using configobj properly, though. I still get a byte string like '\xf0\x9f\x90\x9c'
instead of a unicode object for the relevant string (and alot master blows up as initially reported).

@robdennis

This could be a result of you not passing in an encoding via a parameter to the constructor. By default, file contents are left as byte strings, but passing in the encoding argument makes them Unicode (and that value is used to decode).

The change in 5.0.3 addresses a bug where the encoding value wasn't being used and we'd default to ASCII-encoded Unicode. This is on display in the comment the other commenter (sorry, on my phone) posted.

Sounds like for every constructor call of configobj, you probably want to set encoding='utf-8'.
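
The distinction described above (byte strings by default, Unicode once an encoding is given) can be illustrated without configobj. The sketch below uses only the standard library and a temporary file. It is an analogy for what passing `encoding='utf-8'` is described as doing, not a demonstration of configobj's own code path, and the config fragment is the one quoted earlier in this thread.

```python
import io
import os
import tempfile

# Write a config fragment like the one in this thread as UTF-8 bytes.
fragment = u"[[encrypted]]\ntranslated = \u26b7\n"
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as handle:
    handle.write(fragment.encode("utf-8"))

# Without an encoding, the file contents come back as raw bytes.
with io.open(path, "rb") as handle:
    as_bytes = handle.read()

# With an explicit encoding, the contents are decoded to text.
with io.open(path, "r", encoding="utf-8") as handle:
    as_text = handle.read()

os.remove(path)

print(type(as_bytes).__name__, type(as_text).__name__)
print(as_text == fragment)
```

Whichever layer reads the file, the point is the same: state the encoding once at read time so that everything downstream sees text rather than bytes.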


pazz closed this as completed Aug 2, 2014