UnicodeDecodeError #32
Comments
Hi, I've encountered similar issues with a migration I've been doing.
To replicate this, you can grab the 2015-2016 archive files from the
I used the Python script available at https://blogs.gnome.org/muelli/2012/11/converting-mailman-archives-mboxes-to-maildir/ to convert the
Unfortunately, this script seems to have produced a number of duplicates, so I tried to use your script to remove them. I've encountered several different exceptions:
Subject: Re: [Freeipa-users] 389 DS & admin consoles
Traceback (most recent call last):
File "/usr/bin/mdedup", line 11, in <module>
sys.exit(cli())
File "/usr/lib/python2.7/site-packages/click/core.py", line 716, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/click/core.py", line 696, in main
rv = self.invoke(ctx)
File "/usr/lib/python2.7/site-packages/click/core.py", line 1060, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/python2.7/site-packages/click/core.py", line 889, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/python2.7/site-packages/click/core.py", line 534, in invoke
return callback(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/usr/lib/python2.7/site-packages/maildir_deduplicate/cli.py", line 140, in deduplicate
dedup.run()
File "/usr/lib/python2.7/site-packages/maildir_deduplicate/deduplicate.py", line 220, in run
sorted_messages_size = self.size_sort(messages)
File "/usr/lib/python2.7/site-packages/maildir_deduplicate/deduplicate.py", line 328, in size_sort
size = len(''.join(body).decode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0x96 in position 15: ordinal not in range(128)
I got around this exception by commenting out line 328 in the
Subject: Re: [Freeipa-users] Squid authentication in FreeIPA
Traceback (most recent call last):
File "/usr/bin/mdedup", line 11, in <module>
sys.exit(cli())
File "/usr/lib/python2.7/site-packages/click/core.py", line 716, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/click/core.py", line 696, in main
rv = self.invoke(ctx)
File "/usr/lib/python2.7/site-packages/click/core.py", line 1060, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/python2.7/site-packages/click/core.py", line 889, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/python2.7/site-packages/click/core.py", line 534, in invoke
return callback(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/usr/lib/python2.7/site-packages/maildir_deduplicate/cli.py", line 140, in deduplicate
dedup.run()
File "/usr/lib/python2.7/site-packages/maildir_deduplicate/deduplicate.py", line 220, in run
sorted_messages_size = self.size_sort(messages)
File "/usr/lib/python2.7/site-packages/maildir_deduplicate/deduplicate.py", line 326, in size_sort
body = cls.get_lines_from_message_body(message)
File "/usr/lib/python2.7/site-packages/maildir_deduplicate/deduplicate.py", line 342, in get_lines_from_message_body
header_text, sep, body = message.as_string().partition("\n\n")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 3584: ordinal not in range(128) |
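Both tracebacks above fail while the message is being turned into text, only so that its body can be split off and measured. A minimal sketch of one alternative, assuming Python 3 and access to the raw bytes of each mail file, is to stay in bytes for that step. The helper name body_size_in_bytes is hypothetical and not part of maildir-deduplicate, and it counts bytes rather than characters, so the sizes it returns may differ from what the project computes.

```python
def body_size_in_bytes(raw_message: bytes) -> int:
    """Return the size of a message body without decoding it to text."""
    # Normalise line endings so CRLF and LF copies of a mail compare equal.
    normalised = raw_message.replace(b"\r\n", b"\n")
    # Headers end at the first blank line; everything after it is the body.
    _headers, _separator, body = normalised.partition(b"\n\n")
    return len(body)
```

Because nothing is decoded, bytes such as 0x96 or 0xc2 cannot raise a UnicodeDecodeError here, whatever charset the mail declares.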
I could solve it by adding
import sys
sys.setdefaultencoding("latin-1")
to |
I removed the
I still think we would not need this hack if we handled strings properly in
A first step might be to provide a unit test that clearly exposes the issue discussed here, so we can try to find a cleaner way to handle that case. |
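As a rough idea of what such a unit test could look like, here is a sketch for Python 3. The function collapse_header_whitespace is a hypothetical stand-in written for this example, not the project's real header code, and decoding with errors="replace" is an assumption about what "handling strings properly" might mean.

```python
import re
import unittest


def collapse_header_whitespace(value):
    """Hypothetical stand-in: collapse runs of whitespace in a header value."""
    if isinstance(value, bytes):
        # Decode explicitly so the regex never sees undecoded bytes.
        # Undecodable bytes become U+FFFD instead of raising.
        value = value.decode("utf-8", errors="replace")
    return re.sub(r"\s+", " ", value).strip()


class NonAsciiHeaderTest(unittest.TestCase):
    def test_invalid_utf8_byte_does_not_raise(self):
        # 0x96 is the byte reported in the first traceback of this thread.
        result = collapse_header_whitespace(b"389 DS  \x96 admin ")
        self.assertEqual(result, "389 DS \ufffd admin")

    def test_valid_utf8_is_preserved(self):
        result = collapse_header_whitespace(b"caf\xc3\xa9\t menu")
        self.assertEqual(result, "caf\u00e9 menu")

    def test_text_input_is_only_normalised(self):
        self.assertEqual(collapse_header_whitespace("a   b "), "a b")
```

Running the file with python -m unittest would execute the three cases. The first one is the kind of test that would have failed before any fix.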
Fixed by #33. |
I still get this error (using the develop branch):
Traceback (most recent call last):
File "/home/duncan/.pyenv/versions/2.7.9/bin/mdedup", line 9, in <module>
load_entry_point('maildir-deduplicate==1.2.1', 'console_scripts', 'mdedup')()
File "/home/duncan/.pyenv/versions/2.7.9/lib/python2.7/site-packages/click/core.py", line 716, in __call__
return self.main(*args, **kwargs)
File "/home/duncan/.pyenv/versions/2.7.9/lib/python2.7/site-packages/click/core.py", line 696, in main
rv = self.invoke(ctx)
File "/home/duncan/.pyenv/versions/2.7.9/lib/python2.7/site-packages/click/core.py", line 1060, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/duncan/.pyenv/versions/2.7.9/lib/python2.7/site-packages/click/core.py", line 889, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/duncan/.pyenv/versions/2.7.9/lib/python2.7/site-packages/click/core.py", line 534, in invoke
return callback(*args, **kwargs)
File "/home/duncan/.pyenv/versions/2.7.9/lib/python2.7/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/space/git/tmp/maildir-deduplicate/maildir_deduplicate/cli.py", line 145, in deduplicate
dedup.add_maildir(maildir)
File "/space/git/tmp/maildir-deduplicate/maildir_deduplicate/deduplicate.py", line 82, in add_maildir
mail_file, message, self.use_message_id)
File "/space/git/tmp/maildir-deduplicate/maildir_deduplicate/deduplicate.py", line 106, in compute_hash
canonical_headers_text = cls.canonical_headers(mail_file, message)
File "/space/git/tmp/maildir-deduplicate/maildir_deduplicate/deduplicate.py", line 128, in canonical_headers
canonical_value = cls.canonical_header_value(header, value)
File "/space/git/tmp/maildir-deduplicate/maildir_deduplicate/deduplicate.py", line 155, in canonical_header_value
value = re.sub('\s+', ' ', value).strip()
File "/home/duncan/.pyenv/versions/2.7.9/lib/python2.7/re.py", line 155, in sub
return _compile(pattern, flags).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 36: ordinal not in range(128) |
@dmacvicar: OK, I reopened the issue. We definitely need unit tests to cover this area. |
Put the |
I put these three lines back in
import sys
reload(sys)
sys.setdefaultencoding('latin-1')
If I put
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 72: invalid start byte
I installed it today using |
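The errors in this thread involve bytes such as 0x92 and 0x96, which are not valid on their own in UTF-8 but are punctuation in Windows-1252. A sketch of an explicit alternative to changing the interpreter-wide default encoding, assuming Python 3, is to decode each value with a chain of candidate encodings. The helper decode_with_fallback and its default encoding list are assumptions for illustration, not something the project provides.

```python
def decode_with_fallback(raw: bytes, encodings=("utf-8", "cp1252")) -> str:
    """Decode raw bytes by trying each encoding in order."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            # This encoding does not fit the data; try the next one.
            continue
    # latin-1 maps all 256 byte values, so this last resort never raises.
    return raw.decode("latin-1")
```

The order matters: UTF-8 is strict, so it goes first, while latin-1 accepts anything and therefore has to come last. A wrong guess still produces text, just with the wrong characters, which is usually acceptable for hashing or size comparison but not for display.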
I tried to use mdedup on my maildir with 8276 and got the following error:
Unfortunately, I can't identify the message that causes this error. mdedup -v doesn't show more information. How can I find the problematic message?
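One way to locate the problematic message, sketched here for Python 3 with the standard library mailbox module, is to run the failing step on every mail separately and record which ones raise. The function find_failing_messages and its check parameter are hypothetical names for this example; mdedup itself does not offer such a function as far as this thread shows.

```python
import mailbox


def find_failing_messages(maildir_path, check):
    """Run check(message) on every mail and collect the ones that raise."""
    failures = []
    box = mailbox.Maildir(maildir_path, factory=None, create=False)
    for key in box.iterkeys():
        try:
            # Loading the message is inside the try block as well,
            # so a mail that cannot even be read is reported too.
            check(box.get_message(key))
        except Exception as exc:
            failures.append((key, exc))
    return failures
```

A call such as find_failing_messages("/path/to/maildir", lambda m: m.as_string()) would try to serialise each mail and return the keys of those that fail. The key normally matches the start of the file name inside the maildir's new or cur directory, which is enough to find the file on disk.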