a crawler for zhihu

# Zhihu_Crawler


This is a web crawler for zhihu.com.

The crawler uses Redis to check whether a URL has already been crawled, and MongoDB to store the scraped data.
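
A minimal sketch of the Redis-based "already crawled?" check, assuming all visited URLs live in a single Redis set. The key name `zhihu:crawled` and the helpers `mark_if_new` and `crawl_frontier` are hypothetical, not taken from `engine.py`. The sketch relies on redis-py's `Redis.sadd`, which returns the number of members actually added, so a return value of 1 means the URL had not been seen before:

```python
from typing import Protocol


class SetClient(Protocol):
    """Anything with a redis-py style sadd(), e.g. redis.Redis."""

    def sadd(self, name: str, *values: str) -> int: ...


# Hypothetical key name; the real crawler may use a different one.
CRAWLED_KEY = "zhihu:crawled"


def mark_if_new(client: SetClient, url: str) -> bool:
    """Record `url` as crawled and report whether it was new.

    SADD adds the member and returns how many members were added,
    so the check and the mark happen in a single command.
    """
    return client.sadd(CRAWLED_KEY, url) == 1


def crawl_frontier(client: SetClient, urls: list[str]) -> list[str]:
    """Return only the URLs that have not been crawled yet, in order."""
    return [url for url in urls if mark_if_new(client, url)]
```

With a running server this could be called as `crawl_frontier(redis.Redis(host="localhost", port=6379), urls)`. Using one SADD instead of a SISMEMBER check followed by a separate SADD avoids a race when several crawler threads share the same set.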

To store the crawled data in MongoDB, run:

    python engine.py --mongo

If you run it without the flag:

    python engine.py

the crawler prints each user's profile to the console instead, for example:

    ************************************************************
    Username: Mingo鸣哥
    Gender: female
    Location: Hong Kong
    Upvotes received: 59960
    Thanks received: 14474
    Followers: 39055
    Following: 806
    Employment: Journalist/
    Education: Chinese University of Hong Kong/New Media
    ************************************************************
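
A minimal sketch of how the `--mongo` switch could choose between storing and printing, assuming each crawled profile is a plain dict. The dict keys and the functions `format_profile`, `handle_profile` and `parse_args` are hypothetical; the Chinese labels are the ones the crawler prints. Any object with a pymongo-style `insert_one` (for example `pymongo.MongoClient()["zhihu"]["users"]`, with hypothetical database and collection names) can be passed in for storage:

```python
import argparse
import sys
from typing import Any, Optional, TextIO

# (hypothetical dict key, label the crawler prints) in display order.
FIELDS = [
    ("name", "用户名"),        # username
    ("gender", "用户性别"),    # gender
    ("location", "用户地址"),  # location
    ("agrees", "被同意"),      # upvotes received
    ("thanks", "被感谢"),      # thanks received
    ("followers", "被关注"),   # followers
    ("following", "关注了"),   # following
    ("employment", "工作"),    # employment
    ("education", "教育"),     # education
]
SEPARATOR = "*" * 60  # row of asterisks framing each profile


def format_profile(profile: dict[str, Any]) -> str:
    """Render one profile as label:value lines between two separators."""
    lines = [SEPARATOR]
    lines += [f"{label}:{profile.get(key, '')}" for key, label in FIELDS]
    lines.append(SEPARATOR)
    return "\n".join(lines)


def handle_profile(
    profile: dict[str, Any],
    use_mongo: bool,
    collection: Optional[Any] = None,
    out: TextIO = sys.stdout,
) -> None:
    """Store the profile in MongoDB if use_mongo is set, else print it."""
    if use_mongo:
        if collection is None:
            raise ValueError("--mongo requires a MongoDB collection")
        # insert_one adds an "_id" field to the dict it receives,
        # so pass a copy to leave the caller's profile untouched.
        collection.insert_one(dict(profile))
    else:
        print(format_profile(profile), file=out)


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    """Parse the command line; --mongo is a simple on/off switch."""
    parser = argparse.ArgumentParser(description="Zhihu crawler (sketch)")
    parser.add_argument(
        "--mongo",
        action="store_true",
        help="store profiles in MongoDB instead of printing them",
    )
    return parser.parse_args(argv)
```

Wired together, the entry point would call `args = parse_args()` once and then `handle_profile(profile, args.mongo, collection)` for each crawled profile. Passing the collection in, rather than opening a connection inside `handle_profile`, keeps the print path free of any MongoDB dependency.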