hellozjj/web-crawlers

Five Python crawlers, each targeting a different site.

#### Difficulties

  • damai
    • content is loaded via AJAX
  • douban
    • captcha
    • IP blocking
  • zhihu
    • dynamically rendered pages
  • weibo
    • POST data includes a random id
  • songtaste

#### Solutions

  • douban
    • capture the captcha image and enter the characters manually
    • set an interval between requests, or use a proxy
  • zhihu
    • use Selenium 2 with PhantomJS instead of urllib2, so the page's JavaScript is executed
  • weibo
    • capture the random id before sending the POST request
  • songtaste
    • the simplest one
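
Two of the solutions above (an interval between requests, and capturing a random id) can be sketched with the standard library alone. This is a minimal illustration, not the code from this repository: it is written for Python 3 rather than the urllib2-era Python the crawlers use, and the `Throttle` class, the `extract_hidden_field` helper, and the field name `rid` are all hypothetical. It also assumes the random id appears as a hidden form field in the page HTML, which may not match the real site.

```python
import re
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between successive requests."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, None before the first one

    def wait(self):
        """Block until at least `interval` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def extract_hidden_field(html, name):
    """Return the value of an <input> named `name`, or None if it is absent.

    Assumption: the `name` attribute appears before `value` in the tag.
    A real crawler would likely use an HTML parser instead of a regex.
    """
    pattern = (
        r'<input[^>]*name=["\']%s["\'][^>]*value=["\']([^"\']*)["\']'
        % re.escape(name)
    )
    match = re.search(pattern, html)
    return match.group(1) if match else None
```

In a crawl loop, one would call `throttle.wait()` before each request, then pass the fetched login page to `extract_hidden_field` and include the returned value in the POST data.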
