
After segmenting with jieba, I only want to extract the Chinese tokens without any punctuation. How do I handle this in Python? Thanks #528

Open
tianke0711 opened this issue Sep 27, 2017 · 4 comments

Comments

@tianke0711

No description provided.

@cbzhuang

I handled it with a regular expression: new_sentence = re.sub(r'[^\u4e00-\u9fa5]', ' ', old_sentence), and then ran the segmentation. \u4e00-\u9fa5 is the Unicode code-point range for Chinese characters.
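A minimal runnable sketch of this pre-filter approach (the sample sentence is made up; after the substitution you would pass `new_sentence` to `jieba.cut`, which is omitted here so the snippet needs only the standard library):

```python
import re

old_sentence = "我想用jieba分词,不要标点符号!"  # hypothetical input

# Replace every character outside the CJK range U+4E00..U+9FA5 with a space,
# so punctuation and Latin letters disappear before segmentation.
new_sentence = re.sub(r'[^\u4e00-\u9fa5]', ' ', old_sentence)

print(new_sentence)  # only Chinese characters and spaces remain
# Afterwards you would segment with: words = jieba.cut(new_sentence)
```

Replacing with a space (rather than deleting outright) keeps the removed characters acting as word boundaries, which can prevent unrelated characters on either side of punctuation from being glued together.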

@tianke0711
Author

@cbzhuang Thank you very much for the reply! I used this, though I'm not sure it's correct. #169

@kn45

kn45 commented Aug 7, 2018

Actually, CJK characters are encoded together so there's no critical range for Chinese characters. A punctuation dict could be used to do the filtering.
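As a sketch of that alternative, one can filter tokens after segmentation by their Unicode category instead of relying on a code-point range (the token list below is a hypothetical jieba output, so the example stays self-contained; a real punctuation dict could replace the category check):

```python
import unicodedata

# Hypothetical tokens as jieba.lcut might return them
tokens = ["我", "想", "用", "jieba", "分词", ",", "不要", "标点符号", "!"]

def is_punct(tok):
    # True if every character is Unicode punctuation (P*) or a symbol (S*)
    return all(unicodedata.category(ch).startswith(("P", "S")) for ch in tok)

filtered = [t for t in tokens if not is_punct(t)]
print(filtered)  # ['我', '想', '用', 'jieba', '分词', '不要', '标点符号']
```

Note that unlike the regex pre-filter, this keeps non-Chinese words such as "jieba", which matches the point above: it drops punctuation rather than everything outside one character range.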

@Zhya1124

@cbzhuang Nice, but you typed an extra space inside that ' ', didn't you? It should be new_sentence = re.sub(r'[^\u4e00-\u9fa5]', '', old_sentence)
