This repository has been archived by the owner on Dec 10, 2023. It is now read-only.

Can't encode character u'\xf6' #1

Open
habi opened this issue Sep 28, 2016 · 15 comments

@habi

habi commented Sep 28, 2016

Whenever I run wa2latex.py (Ubuntu 16.04, Python 2.7.12 (Anaconda custom (64-bit))), I get the error

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 13: ordinal not in range(128)

from line 169 of the script.

I suppose this could be because I'm using a German chat log with lots of umlauts, and u'\xf6' corresponds to an ö.
Is there any way I can make a book from my chat log (other than replacing all the umlauts)?

@habi
Author

habi commented Sep 28, 2016

PS: If I change all the umlauts to their written-out forms (ä > ae, ö > oe, etc.), the script works :)
But the text is then not very readable in German...

@pbeck
Owner

pbeck commented Sep 28, 2016

Hey David,

Thanks for reporting, I’ll look into it! I’ve used lots of umlauts as well (chats in Swedish), but I don’t remember having issues with them.

Any chance you could try running wa2latex with Python 3? And could you upload a sample snippet that causes the error?

@pbeck pbeck added the bug label Sep 28, 2016
@habi
Author

habi commented Sep 28, 2016

Running the command below with Python 3.5.2 (Thanks to Anaconda)

python wa2latex.py _chat_with_umlauts.txt > whatsbook-folio.tex

gives me

Traceback (most recent call last):
  File "wa2latex.py", line 148, in <module>
    line = emojis.replace_emoji(line)
  File "wa2latex.py", line 85, in replace_emoji
    text = text.replace(emoji, "\\emoji{" + emoji.encode('unicode-escape').encode('utf-8') + "}")
AttributeError: 'bytes' object has no attribute 'encode'

That's why I tried with Python2 :)
Might there be a problem with the encoding of the exported chat TXT file?
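
The `AttributeError` above comes from calling `.encode()` twice: in Python 3, `str.encode()` returns `bytes`, and `bytes` has no `.encode()` method. A minimal sketch of one possible fix, assuming the intent is to get the escaped code point back as text; the standalone `replace_emoji(text, emojis)` signature is hypothetical and not the script's actual method:

```python
def replace_emoji(text, emojis):
    """Wrap each known emoji in a LaTeX \\emoji{...} macro (Python 3)."""
    for emoji in emojis:
        # str.encode() returns bytes; decode it back to str instead of
        # calling .encode() a second time, which bytes does not support.
        escaped = emoji.encode("unicode-escape").decode("ascii")
        text = text.replace(emoji, "\\emoji{" + escaped + "}")
    return text


# Hypothetical input: one message containing U+1F604.
result = replace_emoji("ok \U0001F604", ["\U0001F604"])
print(result)
```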

@pbeck
Owner

pbeck commented Sep 28, 2016

What’s the encoding of your txt file?

@habi
Author

habi commented Sep 28, 2016

I’ve tried under Linux (at work), where I don’t have access to the file now.
At home (on OS X 10.11.6), the chat.txt file is UTF-8 encoded, and I get this error with Python 2.7.11

anomalocaris:whatsbook habi$ python wa2latex.py _chat.txt > whatsbook-folio.tex
Traceback (most recent call last):
  File "wa2latex.py", line 26, in <module>
    import pandas as pd
  File "/usr/local/lib/python2.7/site-packages/pandas/__init__.py", line 44, in <module>
    from pandas.core.api import *
  File "/usr/local/lib/python2.7/site-packages/pandas/core/api.py", line 9, in <module>
    from pandas.core.groupby import Grouper
  File "/usr/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 17, in <module>
    from pandas.core.frame import DataFrame
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 41, in <module>
    from pandas.core.series import Series
  File "/usr/local/lib/python2.7/site-packages/pandas/core/series.py", line 2909, in <module>
    import pandas.tools.plotting as _gfx
  File "/usr/local/lib/python2.7/site-packages/pandas/tools/plotting.py", line 28, in <module>
    import pandas.tseries.converter as conv
  File "/usr/local/lib/python2.7/site-packages/pandas/tseries/converter.py", line 7, in <module>
    import matplotlib.units as units
  File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 1131, in <module>
    rcParams = rc_params()
  File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 975, in rc_params
    return rc_params_from_file(fname, fail_on_error)
  File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 1100, in rc_params_from_file
    config_from_file = _rc_params_in_file(fname, fail_on_error)
  File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 1018, in _rc_params_in_file
    with _open_file_or_url(fname) as fd:
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 1000, in _open_file_or_url
    encoding = locale.getdefaultlocale()[1]
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/locale.py", line 543, in getdefaultlocale
    return _parse_localename(localename)
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/locale.py", line 475, in _parse_localename
    raise ValueError, 'unknown locale: %s' % localename
ValueError: unknown locale: UTF-8

@pbeck
Owner

pbeck commented Sep 28, 2016

What happens if you do

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8

or perhaps even better

export LC_ALL=de_CH.UTF-8
export LANG=de_CH.UTF-8

(if you’re in German speaking Switzerland)

and then run wa2latex.py with python2?

@habi
Author

habi commented Sep 28, 2016

If I export the Swiss German locale variables on OS X, I get the same error as on Linux:

Traceback (most recent call last):
  File "wa2latex.py", line 168, in <module>
    print(u"\section*{%s}" % date)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 13: ordinal not in range(128)
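
For context: in Python 2, when stdout is redirected to a file (as with `> whatsbook-folio.tex`), `sys.stdout.encoding` is `None`, so `print` falls back to the ASCII codec for `unicode` strings. One way around this is to encode the output explicitly. A minimal sketch, assuming the heading is written to a binary stream; `write_section` and the sample value are hypothetical, not taken from the script:

```python
import io


def write_section(binary_stream, date):
    """Write a LaTeX section heading as UTF-8 bytes."""
    # Encode explicitly so the result does not depend on the locale or on
    # whether stdout is a terminal or a redirected file.
    heading = u"\\section*{%s}\n" % date
    binary_stream.write(heading.encode("utf-8"))


# BytesIO stands in for a binary stdout (sys.stdout.buffer in Python 3).
buffer = io.BytesIO()
write_section(buffer, u"sch\xf6n")
output = buffer.getvalue()
```

Setting the environment variable `PYTHONIOENCODING=utf-8` before running the script is another option that may avoid the error without changing the code.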

@pbeck
Owner

pbeck commented Sep 30, 2016

Running wa2latex.py on a file containing åäöÅÄÖ works with Python 2.7.10 on macOS 10.11.5.
My locales are set to en_US.UTF8.

Any chance you could send me a sample of your chatlog? It’s hard for me to debug without proper (non-working 😄) data.

@habi
Author

habi commented Sep 30, 2016

I just sent the file to the email address in your GitHub profile.

@pbeck
Owner

pbeck commented Oct 1, 2016

I tried running wa2latex with your chat log, and it worked without any issues on macOS with Python 2.7. I’ll be traveling abroad next week, but I can hopefully figure something out when I’m back.

@laserjay

Guys, I have the same problem on Ubuntu 16.04 with Python 2.7.12, LANG and LC_ALL both set to en_US.UTF-8 and the chat history being in UTF-8 (text contains Swiss German characters as well).
I get the following error:

Traceback (most recent call last):
  File "wa2latex.py", line 168, in <module>
    print(u"\section*{%s}" % date)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15: ordinal not in range(128)

Also, the section markers are sometimes correct:
\section*{01.01.2000}
but other times, when a new line does not start with a date, the first word of that line is used instead:
\section*{word}

I checked the file with a hex editor and found that a line gets a word-based section marker if there's a 0x200A before the new line/word; before the other new lines there is a 0x0D0A, and the date is parsed correctly.
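
A minimal sketch of one way to avoid word-based section markers: treat a line as a new message only if it starts with a date, and otherwise append it to the previous message. The date pattern (DD.MM.YYYY, taken from the `\section*{01.01.2000}` example), the sample lines, and the helper name `merge_continuation_lines` are assumptions, not part of wa2latex:

```python
import re

# Assumed date format: DD.MM.YYYY at the very start of a line.
DATE_AT_START = re.compile(r"^\d{2}\.\d{2}\.\d{4}")


def merge_continuation_lines(lines):
    """Join lines that do not start with a date onto the previous message."""
    messages = []
    for line in lines:
        line = line.rstrip("\r\n")
        if DATE_AT_START.match(line) or not messages:
            # A dated line (or the very first line) starts a new message.
            messages.append(line)
        else:
            # No date: this is the continuation of a multi-line message.
            messages[-1] += "\n" + line
    return messages


# Hypothetical export lines with mixed line endings.
merged = merge_continuation_lines([
    "01.01.2000, 10:00 - A: first line\n",
    "word on a continuation line\r\n",
    "02.01.2000, 11:00 - B: next message\n",
])
```

With the lines merged this way, only dated lines would reach the code that emits `\section*{...}`.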

@pbeck
Owner

pbeck commented Dec 26, 2016

I haven’t been able to reproduce @habi’s issues, but I’m sure they’re valid, even more so if you also have issues, @laserjay. My ambition is to rewrite parts of wa2latex for Python 3 as soon as possible. I’m hoping this will, if not solve your issues, at least make them easier to debug.

@laserjay

Hey @pbeck

Similar to @habi, I was able to make it work by

  • converting the umlauts äöü to ae, oe and ue
  • replacing « » with " "
  • removing all other Unicode characters (Cyrillic and Chinese) entirely

However, the issue with new lines not starting with dates, and therefore creating arbitrary section markers remained even after that.
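
For anyone who wants to apply the same workaround, here is a minimal Python 3 sketch of the three steps listed above. The uppercase mappings and the name `to_ascii` are my own additions, and the approach is lossy: every character outside ASCII that is not in the table is dropped.

```python
# Replacement table for the workaround; uppercase entries are an assumption.
REPLACEMENTS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "«": '"', "»": '"',
}


def to_ascii(text):
    """Transliterate umlauts and guillemets, then drop remaining non-ASCII."""
    text = text.translate(str.maketrans(REPLACEMENTS))
    # errors="ignore" silently removes anything that is still non-ASCII,
    # e.g. Cyrillic or Chinese characters.
    return text.encode("ascii", "ignore").decode("ascii")


ascii_text = to_ascii("«schön» Привет")
```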

If you're rewriting it anyway, could you also add a function to optionally include the timestamp as well?

Thank you very much, really looking forward to it and let me know if I can provide further help, i.e. by testing it! :)

Cheers!
laserjay

@bakshi-varun

hi

I am facing a similar issue to laserjay's: "However, the issue with new lines not starting with dates, and therefore creating arbitrary section markers remained even after that." Any leads on how to solve that?

@pbeck
Owner

pbeck commented Apr 3, 2017

@bakshi-varun Maybe laserjay's latest comment will help?
Unfortunately, I haven’t had time to update the script yet, and I have no ETA for when that will happen.
