Skip to content
scrapy spider of forum
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
lichking
README
scrapy.cfg

README

Tips:没有抓不到的数据,只有不努力的爬虫 fight!

1.运行
    cd /home/bastion/spider_lenovo/lichking
    nohup scrapy crawl ithome > ../spider_logs/ithome.log &
    nohup scrapy crawl ihei5 > ../spider_logs/ihei5.log &
    nohup python run.py >> /home/work/spider_lenovo/spider_daily.log &
2.依赖
    sudo pip install -t /usr/local/lib/python2.7/site-packages/
    pip install scrapy
    pip install beautifulsoup4
    pip install lxml
    pip install mongoengine
    pip install apscheduler
    sudo /usr/local/bin/pip install supervisor
3 数据
    mongo localhost:27017/yuqing -u yuqing -p   123456
    db.y_lenovo_forum_item.count()


4.spider启动注意事项:
    spider crawl myspider -a category=pc
    # 在 init 方法中获取
    scrapy genspider mydomain mydomain.com
    import logging
    logging.log(logging.WARNING, "waring")

5.mongo 运维:
    查看连接数:db.serverStatus().connections

6.debug
    from scrapy import cmdline
    cmdline.execute('scrapy crawl amazon_products -o items.csv -t csv'.split())
You can’t perform that action at this time.