Chinese text classification with convolutional and recurrent neural networks
For sentence classification with CNNs, see the paper: Convolutional Neural Networks for Sentence Classification
See also dennybritz's blog post Implementing a CNN for Text Classification in TensorFlow, and the character-level CNN paper Character-level Convolutional Networks for Text Classification
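This project is a simplified TensorFlow implementation for a Chinese dataset. It classifies Chinese text with character-level CNN and RNN models and achieves fairly good results.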
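As a rough illustration of what "character-level" input means for Chinese text, the sketch below (Python 3) builds a character vocabulary and turns each text into a fixed-length list of ids. The function names, the vocabulary size of 5000, the sequence length of 600, and the choice of id 0 for both padding and unknown characters are assumptions made for illustration; they are not taken from this project's code.

```python
from collections import Counter

PAD_ID = 0  # assumed: id 0 is shared by padding and unknown characters


def build_vocab(texts, max_size=5000):
    """Map the (max_size - 1) most frequent characters to ids 1, 2, ..."""
    counts = Counter(ch for text in texts for ch in text)
    most_common = [ch for ch, _ in counts.most_common(max_size - 1)]
    return {ch: i + 1 for i, ch in enumerate(most_common)}


def encode(text, vocab, seq_length=600):
    """Convert text to character ids, truncated or right-padded to seq_length."""
    ids = [vocab.get(ch, PAD_ID) for ch in text[:seq_length]]
    return ids + [PAD_ID] * (seq_length - len(ids))


vocab = build_vocab(["体育新闻", "体育比赛"])
print(encode("体育快讯", vocab, seq_length=6))  # [1, 2, 0, 0, 0, 0]
```

Working on characters rather than segmented words avoids needing a Chinese word segmenter; the unseen characters 快 and 讯 simply fall back to the shared id 0.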
The Conv1D used here differs somewhat from the one described in the paper; for details, see the official documentation for tf.nn.conv1d.
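To make the Conv1D step concrete, here is a plain-Python sketch (Python 3, no TensorFlow) of what a stride-1, 'VALID'-padded tf.nn.conv1d computes. It is followed by ReLU and the max-over-time pooling described in the sentence-classification paper. The kernel layout [width][in_channels][out_channels] mirrors the filter shape tf.nn.conv1d expects. The helper names are hypothetical, and this is not the project's model code.

```python
def conv1d_valid(x, kernels, bias):
    """Stride-1, VALID-padded 1D convolution (cross-correlation, no kernel flip).

    x:       T time steps, each a list of C input-channel values
    kernels: K offsets; kernels[k][c][f] is the weight for offset k,
             input channel c and output filter f
    bias:    F biases, one per filter
    Returns T - K + 1 rows of F values.
    """
    width = len(kernels)
    n_filters = len(bias)
    feature_map = []
    for t in range(len(x) - width + 1):
        row = []
        for f in range(n_filters):
            s = bias[f]
            for k in range(width):
                for c, value in enumerate(x[t + k]):
                    s += value * kernels[k][c][f]
            row.append(s)
        feature_map.append(row)
    return feature_map


def relu_max_over_time(feature_map):
    """Apply ReLU, then keep the largest value of each filter over all positions."""
    n_filters = len(feature_map[0])
    return [max(max(0.0, row[f]) for row in feature_map) for f in range(n_filters)]


# One input channel, two filters of width 2:
# filter 0 adds neighbouring values, filter 1 negates the first of each pair.
x = [[1.0], [2.0], [3.0], [4.0]]
kernels = [[[1.0, -1.0]], [[1.0, 0.0]]]
fmap = conv1d_valid(x, kernels, bias=[0.0, 0.0])
print(fmap)                      # [[3.0, -1.0], [5.0, -2.0], [7.0, -3.0]]
print(relu_max_over_time(fmap))  # [7.0, 0.0]
```

With a width-2 kernel on a length-4 sequence, the feature map has 4 - 2 + 1 = 3 rows. Pooling then reduces it to one value per filter regardless of text length, which lets texts of different lengths feed a fixed-size classifier layer.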
Requirements: TensorFlow >= 1.3, Python 2/3, numpy, scikit-learn, scipy
Usage: python run.py [cnn|rnn] [train|test]
Choose one model (cnn or rnn) and train it first, then evaluate it on the test set. Finally, you can use the trained model to predict the categories of new text.
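The `python run.py cnn/rnn train/test` command takes two positional choices. The following is a minimal, hypothetical sketch of how such arguments could be validated and dispatched. It is not the project's actual run.py; `parse_args`, `main` and the placeholder print are assumptions.

```python
MODELS = ("cnn", "rnn")
MODES = ("train", "test")
USAGE = "usage: python run.py [cnn|rnn] [train|test]"


def parse_args(argv):
    """Return (model, mode) from argv, or None if the arguments are invalid."""
    if len(argv) != 3 or argv[1] not in MODELS or argv[2] not in MODES:
        return None
    return argv[1], argv[2]


def main(argv):
    """Hypothetical dispatcher: real training/testing code would replace the print."""
    args = parse_args(argv)
    if args is None:
        print(USAGE)
        return 1
    model, mode = args
    print("%s the %s model" % (mode, model))  # placeholder for the actual work
    return 0
```

In a script, `sys.exit(main(sys.argv))` would run it. Training has to come first, presumably because the test step evaluates the model that training produced.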
The training data is too large to upload; download it from HERE and put it in the data/cnews directory.
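The layout of the files under data/cnews is not described here. Assuming, hypothetically, one UTF-8 sample per line in the form `label<TAB>content`, a loader could look like the sketch below. Both the format and the function names are assumptions to check against the actual download.

```python
import io


def parse_lines(lines):
    """Split lines of the assumed form "label<TAB>content" into two parallel lists."""
    labels, contents = [], []
    for line in lines:
        line = line.rstrip("\r\n")
        label, sep, content = line.partition("\t")
        if not sep:  # skip blank or malformed lines that contain no tab
            continue
        labels.append(label)
        contents.append(content)
    return labels, contents


def read_labeled_file(path):
    """Read a UTF-8 text file line by line and parse it with parse_lines."""
    with io.open(path, encoding="utf-8") as f:
        return parse_lines(f)
```

`io.open` with an explicit encoding behaves the same on Python 2 and 3, which matters for Chinese text.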