Encoding error (non ascii characters are not valid in gTTS()) #71

Fatallis · 2017-05-18T23:23:24Z

Hello, I recently found this amazing new project. Congratulations!!! I found an error while using a file with Spanish text in it.

This is the error message:
'ascii' codec can't encode character u'\xbf' in position 0: ordinal not in range(128)
The text in the input file:
¿Cómo sabes que amas a alguien? Filosofía Martha Nussbaum Incomplegencia Teorema de la Verdad del Corazón, de Platón a Proust.

The command:
gtts-cli -o test.mp3 -f test.txt -l 'es'
I am not an expert with codecs and this stuff, so I added these lines to gtts-cli.py:

# encoding=utf8
reload(sys)
sys.setdefaultencoding('utf8')

It worked well; however, I don't know if it's the optimal solution.
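
For comparison, a common alternative to changing the interpreter-wide default encoding is to decode the input file explicitly when reading it. The snippet below is only a sketch of that idea, not the change that went into gTTS: `read_text` is a hypothetical helper, and the file is assumed to be UTF-8. `io.open` is available on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Hypothetical helper: return the file's contents as Unicode text."""
    # io.open decodes while reading, so callers never handle raw bytes.
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()


# Round-trip part of the Spanish sample through a temporary UTF-8 file.
sample = u"¿Cómo sabes que amas a alguien?"
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(sample)
    text = read_text(path)
finally:
    os.remove(path)
```

Here `text` is a Unicode string, so no implicit ASCII conversion is involved when it is passed on.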


antropophob · 2017-07-25T21:42:11Z

I can confirm this (or a very similar) issue with Russian.
Here is the stack trace:
Traceback (most recent call last):
  File "/home/parallels/Documents/talk.py", line 394, in <module>
    SendSpeech(FileNameTmp)
  File "/home/parallels/Documents/talk.py", line 207, in SendSpeech
    Say(choicyfication(result))
  File "/home/parallels/Documents/talk.py", line 146, in Say
    tts = gTTS(text, targetLanguage)
  File "/usr/local/lib/python2.7/dist-packages/gtts/tts.py", line 97, in __init__
    text_parts = self._tokenize(text, self.MAX_CHARS)
  File "/usr/local/lib/python2.7/dist-packages/gtts/tts.py", line 169, in _tokenize
    min_parts += self._minimize(p, " ", max_size)
  File "/usr/local/lib/python2.7/dist-packages/gtts/tts.py", line 176, in _minimize
    if self._len(thestring) > max_size:
  File "/usr/local/lib/python2.7/dist-packages/gtts/tts.py", line 154, in _len
    return len(text.decode('utf8'))
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd0 in position 0: unexpected end of data

The reason for this behaviour seems to be that gtts splits a long string into small chunks without respecting character boundaries in the encoding, so a multi-byte UTF-8 character can be cut in the middle.
An example string where gtts fails:
Для продолжения обслуживания требуется переключение на оператора Контактного центра Банка. Вы согласны?
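
To illustrate the point about chunking, here is a minimal sketch of splitting on Unicode code points instead of bytes. It is not the gTTS implementation: `split_unicode` is a hypothetical function, and the limit of 100 characters is an assumption based on the title of the related issue. The last part shows that cutting the encoded bytes after one byte produces the same kind of decode error as in the trace.

```python
# -*- coding: utf-8 -*-


def split_unicode(text, max_chars):
    """Hypothetical helper: split a Unicode string into chunks of at most
    max_chars code points, breaking at spaces where possible."""
    chunks = []
    current = u""
    for word in text.split(u" "):
        # A single word longer than the limit is cut by code point,
        # never by byte, so no character is ever split in half.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = u""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = word if not current else current + u" " + word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


sample = (u"Для продолжения обслуживания требуется переключение на "
          u"оператора Контактного центра Банка. Вы согласны?")

# Split the decoded text first, then encode each chunk separately.
chunks = split_unicode(sample, 100)  # 100 is an assumed limit
encoded_chunks = [chunk.encode("utf-8") for chunk in chunks]

# Slicing the encoded bytes instead can cut a two-byte character in half.
raw = sample.encode("utf-8")
try:
    raw[:1].decode("utf-8")
    byte_slice_ok = True
except UnicodeDecodeError:
    byte_slice_ok = False
```

Because every chunk is cut from the decoded string, each element of `encoded_chunks` decodes cleanly on its own.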

XueWei · 2017-07-31T18:33:55Z

I encounter similar issues with zh-cn and ja when I input long text.

pndurette · 2017-08-03T03:24:09Z

Hey! Thanks @Fatallis! Sorry it took so long to look at this, glad you had a working workaround.

Everyone: this should be fixed in gTTS v1.2.1 that was just released. I used all the examples above for testing as well. It was an issue with Python 2.7. Let me know how it goes.

pndurette added the bug label Jul 31, 2017

This was referenced Aug 2, 2017:

Encoding error : Cannot handle more than 100 non-ascii characters. #73 (Closed)

Fix split long encoding #75 (Merged)

pndurette closed this as completed Aug 3, 2017

github-actions bot locked as resolved and limited conversation to collaborators Feb 25, 2021
