CC98

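#Crawler: set the ID of the board to crawl. The crawler then follows that board's thread listing and, for every floor (reply) of every thread on the board, stores the poster, post time, floor number, and post content in a MongoDB database.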
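The storage step described above could look roughly like the following. This is a minimal sketch, not the code in `crawl_cc98.py`: the field names, the `cc98`/`posts` database and collection names, and the helper functions are all assumptions made for illustration. It uses pymongo's `update_one` with `upsert=True` so that crawling the same thread twice overwrites a floor instead of duplicating it.

```python
from datetime import datetime


def build_post_doc(board_id, thread_id, floor, author, posted_at, content):
    """Build one MongoDB document for a single floor (reply) of a thread.

    The field names are hypothetical; the original schema is not documented.
    """
    return {
        "board_id": board_id,
        "thread_id": thread_id,
        "floor": floor,
        "author": author,
        "posted_at": posted_at,
        "content": content,
    }


def save_posts(collection, docs):
    """Upsert each floor, keyed by (thread_id, floor), and return how many were written."""
    for doc in docs:
        key = {"thread_id": doc["thread_id"], "floor": doc["floor"]}
        collection.update_one(key, {"$set": doc}, upsert=True)
    return len(docs)


def open_collection(uri="mongodb://localhost:27017", db="cc98", name="posts"):
    """Return a pymongo collection; URI, database, and collection names are placeholders."""
    # Imported lazily so the helpers above can be used without pymongo installed.
    from pymongo import MongoClient

    return MongoClient(uri)[db][name]
```

With a MongoDB server running, `save_posts(open_collection(), [build_post_doc(...)])` would write the crawled floors. Any object that exposes `update_one` can stand in for the collection while testing.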

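#Hot-word statistics: threads longer than 30 pages are selected, their text is segmented into words and the hot words are counted, uninformative hot words are filtered out, and each thread's hot words are stored in the MongoDB database.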
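The hot-word step above can be sketched as a frequency count over segmented text. This is only an illustration under stated assumptions: `is_long_thread`, `hot_words`, the stop-word set, and the rule that drops single-character tokens are hypothetical and not taken from the repository. The tokenizer is passed in as a parameter, so `jieba.lcut` can be used for Chinese text.

```python
from collections import Counter


def is_long_thread(page_count, min_pages=30):
    """Select only threads with more than `min_pages` pages."""
    return page_count > min_pages


def hot_words(texts, tokenize, stopwords=frozenset(), top_n=20):
    """Count word frequencies across `texts` and return the `top_n` most common.

    `tokenize` is any callable that maps a string to a list of words.
    Tokens shorter than two characters and tokens in `stopwords` are
    skipped (an assumed filtering rule).
    """
    counts = Counter()
    for text in texts:
        for word in tokenize(text):
            word = word.strip()
            if len(word) < 2 or word in stopwords:
                continue
            counts[word] += 1
    return counts.most_common(top_n)
```

With jieba installed, a call such as `hot_words(floor_texts, jieba.lcut, stopwords=my_stopwords)` would return `(word, count)` pairs that can be stored alongside the thread in MongoDB. `floor_texts` and `my_stopwords` are placeholder names.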

#Dependencies

beautifulsoup4
Parses HTML pages and locates and extracts the information to be stored.
```
pip install beautifulsoup4
```
lxml
Third-party parser used by Beautiful Soup.
```
pip install lxml
```
pymongo
Python interface for MongoDB.
```
pip install pymongo
```
jieba
Used for Chinese word segmentation.
```
pip install jieba
```
