German Umlauts(ä, ö, ü) seem to throw ValueError exceptions #20

Closed
yucongo opened this issue Apr 10, 2017 · 5 comments

Comments

yucongo commented Apr 10, 2017

from googletrans import Translator
translator = Translator()
dest = 'zh-CN'
text='Mädchen'
translator.translate(text, src='de', dest=dest).text
text = 'schön'
translator.translate(text, src='de', dest=dest).text
text = 'Prüfung'
translator.translate(text, src='de', dest=dest).text

----> 2 translator.translate(text, src='de', dest=dest).text

d:\python34\lib\site-packages\googletrans\client.py in translate(self, text, dest, src, delay)
147
148 origin = text
--> 149 data = self._translate(text, dest, src)
150
151 # this code will be updated when the format is changed.

d:\python34\lib\site-packages\googletrans\client.py in _translate(self, text, dest, src)
89 LOGGER.debug(" from_cache: %s", self.from_cache)
90
---> 91 data = utils.format_json(r.text)
92 return data
93

d:\python34\lib\site-packages\googletrans\utils.py in format_json(original)
49 text = text[:p] + states[j][1] + text[nxt:]
50
---> 51 converted = json.loads(text)
52 return converted
53

d:\python34\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
316 parse_int is None and parse_float is None and
317 parse_constant is None and object_pairs_hook is None and not kw):
--> 318 return _default_decoder.decode(s)
319 if cls is None:
320 cls = JSONDecoder

d:\python34\lib\json\decoder.py in decode(self, s, _w)
341
342 """
--> 343 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
344 end = _w(s, end).end()
345 if end != len(s):

d:\python34\lib\json\decoder.py in raw_decode(self, s, idx)
359 obj, end = self.scan_once(s, idx)
360 except StopIteration as err:
--> 361 raise ValueError(errmsg("Expecting value", s, err.value)) from None
362 return obj, end

ValueError: Expecting value: line 1 column 1 (char 0)

Environment: Python 3.4, 32-bit Windows 7. German text without umlauts translates fine:

text='Maedchen'
translator.translate(text, src='de', dest=dest).text

-- End pasted text --

googletrans.client - client.py[line:89] 2017-04-10 19:06:38,270 : DEBUG : from_cache: False
Out[220]: '女孩'

ssut added the bug label Apr 10, 2017

Author

yucongo commented Apr 10, 2017

I added some debug lines to utils.py
def format_json(original):
    # save state
    states = []
    text = original

    LOGGER.debug("\n      text: %s", text)

The output seems to indicate the following: for text containing an umlaut, r.text from requests is not valid JSON but an HTML error page: <title>Error 403 (Forbidden)!!1</title>

I do hope you'll be able to fix it.
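The failure mode described above (an HTML 403 page reaching json.loads) can be surfaced with a clearer error by checking the status code before parsing. The following is only a minimal sketch: parse_translate_response and TranslateResponseError are hypothetical names, not part of googletrans, and the sketch uses plain json.loads instead of the preprocessing that utils.format_json performs.

```python
import json


class TranslateResponseError(ValueError):
    """Hypothetical error type for a response that is not usable JSON."""


def parse_translate_response(status_code, body):
    """Return the decoded JSON body, or raise a descriptive error.

    status_code and body stand in for r.status_code and r.text of a
    requests response; no network access happens in this sketch.
    """
    if status_code != 200:
        # A 403 page is HTML, so json.loads would only report
        # "Expecting value: line 1 column 1 (char 0)".
        raise TranslateResponseError(
            "HTTP %d returned instead of JSON: %r" % (status_code, body[:60])
        )
    try:
        return json.loads(body)
    except ValueError as err:
        raise TranslateResponseError("response body is not valid JSON") from err
```

With such a check, the traceback would name the HTTP status directly instead of ending inside the JSON decoder.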

$ python trans_clipboard.py
Mädchen
requests.packages.urllib3.connectionpool-connectionpool.py[ln:805]:INFO: Starting new HTTPS connection (1): translate.google.cn
requests.packages.urllib3.connectionpool-connectionpool.py[ln:401]:DEBUG: "GET /translate_a/single?client=t&dt=at&dt=bd&dt=ex&dt=ld&dt=md&dt=qca&dt=rw&dt=rm&dt=ss&dt=t&hl=zh-CN&ie=UTF-8&oe=UTF-8&otf=1&q=M%C3%A4dchen&sl=de&ssel=0&tk=651525.1024958&tl=zh-CN&tsel=0 HTTP/1.1" 403 1841
googletrans.client-client.py[ln:89]:DEBUG: from_cache: False
googletrans.utils-utils.py[ln:33]:DEBUG:
text:

<title>Error 403 (Forbidden)!!1</title> <style> *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px} </style>

403. That’s an error.

Your client does not have permission to get URL /translate_a/single?client=t&dt=at&dt=bd&dt=ex&dt=ld&dt=md&dt=qca&dt=rw&dt=rm&dt=ss&dt=t&hl=zh-CN&ie=UTF-8&oe=UTF-8&otf=1&q=M%C3%A4dchen&sl=de&ssel=0&tk=651525.1024958&tl=zh-CN&tsel=0 from this server. That’s all we know.

Traceback (most recent call last):
File "trans_clipboard.py", line 18, in <module>
pyperclip.copy(translator.translate(text, src='de', dest=dest).text)
File "D:\Python34\lib\site-packages\googletrans\client.py", line 149, in translate
data = self._translate(text, dest, src)
File "D:\Python34\lib\site-packages\googletrans\client.py", line 91, in _translate
data = utils.format_json(r.text)
File "D:\Python34\lib\site-packages\googletrans\utils.py", line 57, in format_json
converted = json.loads(text)
File "D:\Python34\lib\json\__init__.py", line 318, in loads
return _default_decoder.decode(s)
File "D:\Python34\lib\json\decoder.py", line 343, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "D:\Python34\lib\json\decoder.py", line 361, in raw_decode
raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 1 (char 0)

$ python trans_clipboard.py
Machen
requests.packages.urllib3.connectionpool-connectionpool.py[ln:805]:INFO: Starting new HTTPS connection (1): translate.google.cn
requests.packages.urllib3.connectionpool-connectionpool.py[ln:401]:DEBUG: "GET /translate_a/single?client=t&dt=at&dt=bd&dt=ex&dt=ld&dt=md&dt=qca&dt=rw&dt=rm&dt=ss&dt=t&hl=zh-CN&ie=UTF-8&oe=UTF-8&otf=1&q=Machen&sl=de&ssel=0&tk=88832.461243&tl=zh-CN&tsel=0 HTTP/1.1" 200 None
googletrans.client-client.py[ln:89]:DEBUG: from_cache: False
googletrans.utils-utils.py[ln:33]:DEBUG:
text: [[["使","Machen",null,null,0],[null,null,"Shǐ"]],null,"de",null,null,[["Machen",null,[["使",539,false,false],["让",281,false,false],["令",115,false,false],["做",49,false,false],["使得",14,false,false]],[[0,6]],"Machen",0,1]],1,null,[["de"],null,[1],["de"]],null,null,null,null,null,[["fertig machen","sauber machen","rückg\xe4ngig machen","bekannt machen","aufmerksam machen","kaputt machen","Urlaub machen","deutlich machen","sich Sorgen machen","Sorgen machen"]]]
使
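A side check on the request logs above: the failing URL carries q=M%C3%A4dchen, which is the ordinary percent-encoding of the UTF-8 bytes of "Mädchen". The short sketch below reproduces that value with the standard library, which suggests (but does not prove) that the query text itself was encoded correctly and that the 403 has another cause.

```python
from urllib.parse import quote

# "ä" is U+00E4, which UTF-8 encodes as the two bytes 0xC3 0xA4.
encoded = quote("Mädchen")
print(encoded)

# ASCII-only text is passed through unchanged.
print(quote("Machen"))
```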

Owner

ssut commented Apr 11, 2017

This is caused by an invalid token. It will take some time to fix, so please bear with me.

Author

yucongo commented Apr 12, 2017

remove_accents from the EDIT in the third answer in http://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-in-a-python-unicode-string

Seems to do the trick in the meantime... I was doing replace(umlauts, ae|oe|ue) for German, but remove_accents is much cleaner... and works for French as well...

# -*- coding: utf-8 -*-
import unicodedata


def remove_accents(input_str):
    nfkd_form = unicodedata.normalize('NFKD', input_str)
    return u"".join([c for c in nfkd_form if not unicodedata.combining(c)])

if __name__ == '__main__':
    from googletrans import Translator
    dest = 'zh-CN'
    translator = Translator()
    googletrans_text = lambda x: translator.translate(x, dest=dest).text

    text = 'Tatsächlich stimmten die Diagnosen der zuvor behandelnden Ärzte in nur zwölf Prozent (36 Fälle)'

    print(googletrans_text(remove_accents(text)))

    text = 'Une grande partie du camp d’accueil de migrants de la Linière, ouvert en mars 2016 à Grande-Synthe (Nord), a été ravagée par un incendie dans la nuit de lundi à mardi.'
    print(googletrans_text(remove_accents(text)))

Author

yucongo commented Apr 12, 2017

remove_accents can't handle "Weiß".
There may be other cases as well.
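The reason "Weiß" slips through is that "ß" has no Unicode decomposition, so NFKD leaves it untouched and there is no combining mark to drop. One possible stopgap, sketched below, combines the earlier ae/oe/ue replacement idea with the accent stripping: German letters are transliterated first, then any remaining accents are removed. The name transliterate and the mapping table are assumptions made for illustration, and the function still changes the input text, so it is a workaround rather than a fix.

```python
import unicodedata

# Hypothetical mapping for German letters, applied before accent stripping.
_GERMAN_MAP = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def transliterate(text):
    # Compose first so that "a" + combining diaeresis becomes a single "ä"
    # and is matched by the mapping table.
    composed = unicodedata.normalize("NFC", text)
    german = composed.translate(_GERMAN_MAP)
    # Strip the accents that remain (for example French "é").
    decomposed = unicodedata.normalize("NFKD", german)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


if __name__ == "__main__":
    for word in ("Weiß", "Mädchen", "Prüfung", "été"):
        print(word, "->", transliterate(word))
```

Note that the umlaut rule is applied to every language, so a non-German word containing "ü" would also become "ue".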

Owner

ssut commented Apr 14, 2017

I'm trying to fix this issue as soon as possible, but it may take some time because it is related to the token generator.


2 participants