UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-3: ordinal not in range(128) #96

Closed
luvensaitory opened this issue Aug 20, 2018 · 3 comments


@luvensaitory

[screenshot of the traceback ending in the error quoted in the issue title]

My data consists of Chinese math questions, for example:
小蓉吃了8顆水餃,小宇吃了10顆水餃,誰吃的水餃比較多? ( )吃的多
(Xiaorong ate 8 dumplings and Xiaoyu ate 10 dumplings; who ate more dumplings? ( ) ate more.)

And the training data (one character per line: character, word, POS tag, label) is:
小 小蓉 人名 B-人名
蓉 小蓉 人名 E-人名
吃 吃 VC S
了 了 Di S
8 8 Neu S
顆 顆 Nf S
水 水餃 Na S
餃 水餃 Na S
, , COMMACATEGORY S
小 小宇 人名 B-人名
宇 小宇 人名 E-人名
吃 吃 VC S
了 了 Di S
1 10 Neu S
0 10 Neu S
顆 顆 Nf S
水 水餃 Na S
餃 水餃 Na S
, , COMMACATEGORY S
誰 誰 Nh S
吃 吃 VC S
的 的 DE S
水 水餃 Na S
餃 水餃 Na S
比 比較 Dfa S
較 比較 Dfa S
多 多 VH S
? ? QUESTIONCATEGORY S
( ( PARENTHESISCATEGORY S
) ) PARENTHESISCATEGORY S
吃 吃 VC S
的 的 DE S
多 多 VH S
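
A minimal sketch of how rows like the above are typically turned into feature sequences and appended to a pycrfsuite trainer. The feature names (char=, word=, postag=) and the Trainer usage are assumptions for illustration only, since the original post shows just the data and a screenshot; on affected versions, the append call is where the UnicodeEncodeError surfaces once a non-ASCII character ends up in a feature or label string.

import pycrfsuite

# Hypothetical feature extraction for the character-level rows above;
# each row is "char word POS label", e.g. "小 小蓉 人名 B-人名".
rows = [
    ("小", "小蓉", "人名", "B-人名"),
    ("蓉", "小蓉", "人名", "E-人名"),
    ("吃", "吃", "VC", "S"),
]

xseq = [[f"char={c}", f"word={w}", f"postag={p}"] for c, w, p, _ in rows]
yseq = [label for *_, label in rows]

trainer = pycrfsuite.Trainer(verbose=False)
# On affected versions this call raises:
#   UnicodeEncodeError: 'ascii' codec can't encode characters ...
trainer.append(xseq, yseq)
trainer.train("demo.crfsuite")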

@umoqnier

umoqnier commented Apr 2, 2019

I have the same problem with Otomí (a Mexican indigenous language).
My traceback looks like this:

'ascii' codec can't encode character '\xe9' in position 8: ordinal not in range(128)

And the first three elements of the xseq list look like this:

[[b'bias', b'letterLowercase=d', b'postag=unkwn', b'BOS', b'nxtpostag=cnj', b'BOW', b'nxtletter=<i', b'nxt2letters=<ig', b'nxt3letters=<ige', b'nxt4letters=<igeh'], [b'bias', b'letterLowercase=i', b'postag=unkwn', b'BOS', b'nxtpostag=cnj', b'letterposition=-7', b'prevletter=d>', b'nxtletter=<g', b'nxt2letters=<ge', b'nxt3letters=<geh', b'nxt4letters=<geh\xc3\xb1'], [b'bias', b'letterLowercase=g', b'postag=unkwn', b'BOS', b'nxtpostag=cnj', b'letterposition=-6', b'prev2letters=di>', b'prevletter=i>', b'nxtletter=<e', b'nxt2letters=<eh', b'nxt3letters=<eh\xc3\xb1', b'nxt4letters=<eh\xc3\xb1a']]

In a previous step I tried this for the encoding, but it doesn't seem to work properly:

featurelist.append([f.encode('utf-8') for f in features])
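
A small diagnostic sketch, not the library's documented API, that may help narrow this down: featurelist, labellist, and trainer are assumed names (only featurelist appears in the comment above). It reports which sequence still triggers the error and which of its features contain non-ASCII characters, so the offending item can be inspected directly.

def has_non_ascii(feat):
    # Features may already be bytes (as in the xseq shown above) or plain str.
    text = feat.decode("utf-8", "replace") if isinstance(feat, bytes) else feat
    return any(ord(ch) > 127 for ch in text)

for i, (xseq, yseq) in enumerate(zip(featurelist, labellist)):
    try:
        trainer.append(xseq, yseq)
    except UnicodeEncodeError as err:
        flagged = [f for item in xseq for f in item if has_non_ascii(f)]
        print(f"sequence {i} raised {err!r}; non-ASCII features: {flagged[:5]}")
        raise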

@Weber12321

Has this problem been solved? I get the same error while trying to train an NER model on Chinese text, too...

@fgregg
Contributor

fgregg commented Sep 30, 2024

closed by 4014eb0

@fgregg fgregg closed this as completed Sep 30, 2024
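
Since the issue was closed by the referenced commit, installing a release that includes it should avoid the ASCII-codec error. A hedged way to check what is installed, assuming the distribution is published on PyPI as python-crfsuite:

# Assumption: the package is distributed as "python-crfsuite".
from importlib.metadata import version
print(version("python-crfsuite"))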