monkeyshare / novels Public

New word discovery based on information entropy and cohesion

Files (13 commits)
README.md
chapdics.txt
sentences.txt
worddic.txt
基于信息熵的新词发现.py

New word discovery based on information entropy and cohesion

Test!!!

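Data

The full text of each novel is stored in the chapdics data table.
The text is then split into sentences, using every punctuation mark as a delimiter.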
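
The splitting step could be sketched roughly as follows. This is a minimal illustration, not the repository's own code: the function name `split_sentences` is hypothetical, and it assumes that "punctuation" means any run of characters that are not CJK ideographs, ASCII letters, or digits.

```python
import re

# Assumption: every character outside CJK ideographs, ASCII letters and
# digits (punctuation, whitespace, symbols) acts as a sentence delimiter.
_DELIMITERS = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9]+")


def split_sentences(text):
    """Split text on runs of delimiter characters and drop empty pieces."""
    return [piece for piece in _DELIMITERS.split(text) if piece]


if __name__ == "__main__":
    for sentence in split_sentences("你好，世界！今天天气不错。"):
        print(sentence)
```

Splitting on every punctuation mark keeps candidate strings from spanning a clause boundary, which is presumably why the sentences are cut this finely.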

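Method

For each novel, strings with frequency > 5, cohesion > 30, and information entropy > 3 are taken as words and stored in the worddic data table.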
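
The README does not define how cohesion and entropy are computed, so the sketch below fills in one common choice and should be read as an assumption rather than the repository's method. Cohesion is taken as the minimum, over all two-part splits of a candidate, of p(word) / (p(left part) * p(right part)), and entropy as the smaller of the left-neighbor and right-neighbor entropies in natural-log units. The names `discover_words` and `neighbor_entropy` and the `max_len` parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict


def neighbor_entropy(neighbors):
    """Shannon entropy (natural log) of a Counter of neighboring characters."""
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in neighbors.values())


def discover_words(sentences, max_len=4, min_freq=5,
                   min_cohesion=30.0, min_entropy=3.0):
    """Return {candidate: (frequency, cohesion, entropy)} above all thresholds."""
    counts = Counter()
    left = defaultdict(Counter)   # characters seen just before each candidate
    right = defaultdict(Counter)  # characters seen just after each candidate
    total = 0                     # total number of characters in the corpus

    for s in sentences:
        n = len(s)
        total += n
        for i in range(n):
            for k in range(1, min(max_len, n - i) + 1):
                gram = s[i:i + k]
                counts[gram] += 1
                if i > 0:
                    left[gram][s[i - 1]] += 1
                if i + k < n:
                    right[gram][s[i + k]] += 1

    words = {}
    for gram, freq in counts.items():
        if len(gram) < 2 or freq <= min_freq:
            continue
        p = freq / total
        # Assumed cohesion: worst-case ratio over every way to cut the
        # candidate into two parts.
        cohesion = min(
            p / ((counts[gram[:i]] / total) * (counts[gram[i:]] / total))
            for i in range(1, len(gram))
        )
        if cohesion <= min_cohesion:
            continue
        entropy = min(neighbor_entropy(left[gram]), neighbor_entropy(right[gram]))
        if entropy <= min_entropy:
            continue
        words[gram] = (freq, cohesion, entropy)
    return words
```

The default thresholds mirror the values stated above. Under these assumed definitions an entropy above 3 requires more than twenty distinct neighbors on each side, so small test inputs need lower thresholds to produce any output.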
Then the number of occurrences of each word in worddic is counted, to find the words specific to each novel.
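
One way to read this counting step, offered as an interpretation rather than the repository's logic: count in how many novels each discovered word appears, and treat words that appear in exactly one novel as specific to it. The function name `novel_specific_words` and the dictionary input format are hypothetical stand-ins for the worddic table.

```python
from collections import Counter


def novel_specific_words(words_by_novel):
    """Map each novel to the sorted words found in that novel only.

    words_by_novel: dict of novel id -> iterable of discovered words
    (an assumed in-memory stand-in for the worddic table).
    """
    novel_count = Counter()
    for words in words_by_novel.values():
        # Count each word at most once per novel.
        novel_count.update(set(words))
    return {
        novel: sorted(w for w in set(words) if novel_count[w] == 1)
        for novel, words in words_by_novel.items()
    }


if __name__ == "__main__":
    sample = {"A": ["张三", "江湖"], "B": ["李四", "江湖"]}
    print(novel_specific_words(sample))
```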

Languages

Python 100.0%